PHYD57 Advanced Comp. in Physics - Midterm preparation materials.

Introduction

Review all the materials posted on the course web page before the exam, especially lecture notes and code page, and for the Linux commands + languages + OpenMP, also the Links section at the end of course page.

You'll find below problems intended to give you an idea af the kind of quiz questions and problems that you will encounted in a 1-hr midterm exam on the 1 Mar 2022 (first hour of lectures).

You can (and should, as it helps to remember things later) make up to 3 facts sheets (6 pages) of hand-written notes (no other materials allowed, calculator only allowed as technical aid).

Quiz

There will be about 30 quiz questions in the actual midterm. Instructions below apply to midterm and final exams. Ignore typos and grammar, look for more significant issues. Sometimes only 1 word may be incorrect, so read carefully.
................................................................................
Note: k-th power of x is denoted as x^k.

If you think the statement is fully correct, or Yes seems much more
correct than No as a judgment on the truth of the statement, then delete N and
leave Y.
If you answer is No then leave only N, and enclose at least one word or number, 
which is incorrect, in << >> brackets. If you do not indicate what is wrong 
with the statement, your N answer, even if correct, is treated as a guess and 
earns zero points for that question. 

If you want to leave extra explanation on why you selected Y or N, do it on the
marging or just above the statement. 

Since an occasional ambiguity may happen in the formulation of the question, 
about 10% point will be given for free while marking the quiz. 
................................................................................

[Y N] Antikythera Mechanism was probably built in Sicily, in the the workshop
	of Archimedes.

[Y N] Motivation for many early computer builders was creation of 
	astronomical tables (Wilhelm Schickard, Gottrfied Leibnitz, Charles Babbage).

[Y N] In 1600s Blaise Pascal built machines that could add and subtract numbers,
	 to aid	in tax collection. No multiplication or division was implemented.

[Y N] The first known use of binary system in mechanical calculators was the 
   calculator known as 'steped reckoner' by Alan Turing from the beginning of
   20th century. 

[Y N] Babbages unfinished, hand-cranked, Differential Engine contained 8000 parts
 and weighed 5 tons.  

[Y N] Charles Babbage has built the first 4-operations general-purpose calculator
	which used punched cards and was programmable (called Analytical Engine). 

[Y N] The pre-WWII versions of German coding machine ENIGMA were deciphered and
	reverse-engineered by Polish cryptologists Rejewski, Zygalski and Rozycki.  

[Y N] Alan Turing was computer theorist and a head of British counter-intelligence
	group tasked with deciphering ENIGMA-coded messages during the 2nd world war,
	using electromechanical relays.

[Y N] Colossus, a British programmable electronic computer used to decrypt 
  messages from German coding machine called Lorenz SZ-40, has the first high 
  speed reader of punched paper tapes (5000 characters per second). 

[Y N] Since information about Colossus was classified until 1970, and other 
 projects like those of K. Zuse were unknown, for a long time it was thought 
 that the first computers were built by the US military (ENIAC in 1946)

[Y N] ENIAC used binary numbers but its programs had no conditional branching

[Y N] The basic principle of von Neumann architecture of our computers is that CPU,
 the central processign unit has access to operating memory storing both program
 and data, but which is physically separate from the memory. 

[Y N] High-level programming languages and compilers first appeared on mainframe
	computers in the late 1950s. Compiler is a programmer who operates the main
	console of the computer.

[Y N] Control, data, and address bus are sub-parts of a system bus that sends
  data and memory addresses from/to CPU to/from RAM (non-volatile memory), 
  as well as peripheral devices, such as volatile memory. 

[Y, N] Compiler is translator from higher-level to lower-level programming 
	language. It may combine the function of linker, a program that translates
	assembly language into a binary form (0 and 1 digits) and attaches all 
	necessary executable libraries.   

[Y N] Fortran compiler, released in 1957, was the first high level 
 language, freeing the programmer from working with low-level (elementary) CPU
 instructions, such as is pracaticed in assembly language. 

[Y N] IBM computers in 1960s were mostly leased at high cost to wealthy 
	businesses and universities. In the 1970s, Digital Equipment Corp (DEC) 
	disrupted that model by selling a 100x cheaper minicomputer series PDP-8 to 
	PDP-11. 

[Y N] Minicomputers in 1970s were sold with open (meaning non-proprietary) 
	internal architecture to knowledgable clients in business or academia. 
	OEMs (Original Equipm. manufacturers) started producing compatible devices, 
	such as printers. 

[Y N] C language was created in 1973 in conjunction with UNIX operating system.
They stayed tightly connected. Today, every Linux system has C compiler as a 
necessary part of software stack.  

[Y N] C and later C++, were created mainly for scientific computions. 

[Y N] First microprocessor (using integrated circuits) was Intel's 8086 CPU

[Y N] 8086 was the name of a popular Intel 16-bit processor. Subsequnt generations 
	of Intel processors with similar instruction set have a name retaining part of
	that name. For instance, x86_64 means all modern 64-bit processors compatible
	with Intel's instruction set.

[Y N] Python was created as a high-level, compiled language similar to Fortran. 

[Y N] Apple personal computers were invented in 1980s, developing the idea of
	interactive use of terminals initiated by DEC Corporation on minicomputers.

[Y N] There were two popular versions of Unix, from GNU project and Berkeley 
	Software Distribution (BSD). Both were used as inspiration and parts of the 
	new OS called Linux, created in 1991 by Linus Torvalds in Finland. 

[Y N] Currently, almost all supercomputers and all web servers, as well as
    Android and Chrome phone operating systems use Linux kernel, one of the most
	successful community-based open, freely available software.

[Y N] In Linux, ls -l | more  is a command that pipes the output of directory
	listing command (long format) to 'more' command which shows the output page
	by page or using other size of block of text.

[Y N] In Linux, cp files* path1  command will copy all files whose names start 
	with string 'files' to subdirectory given by the string path1.

[Y N] In order to suspend the execution of a foreground program and put it in 
	a background execution mode in Linux, one needs to press CTLR-Z  to get the 
	prompt, then type 'bg' +RETURN. 

[Y N] Linux command pwd returns the working directory's name and path.

[Y N] Linux command cd path1 changes the working directory to path1

[Y N] Root or superuser in Linux is the user with full priviledges. Other users 
	cannot issue certain dangerous commands, and access files restricted for use
	by the root.  

[Y N] Bit's value can either be True or False. 8 bits make one Byte.

[Y N] ASCII is a table specifying how characters (including Latin alphabet and 
	some text formatting symbols) are mapped to unsigned 1-Byte integer values 

[Y N] Hard disk with capacity 1 TB can hold up to 1 thousand million Bytes. 

[Y N] An 8-Byte integer can represent numbers from 0 to 2^32 -1 = 

[Y N] A 1-Byte integer can represent numbers from 0 to 16*16 = 256 so 8-bit Bytes
	are enough to code more natural languages like English.

[Y N] Break-in attempts to linux systems can be seen in /var/log/secure logfiles 

[Y N] GNU project provides: g++ = C++ compiler, gfortran = Fortran95 compiler  

[Y N] ifort -openmp    provides access to OpenMP by a fortran program 

[Y N] Laplace operator has a shape of a cross and computes the gradient of the
 input function. It is used in convolution, unsharp masking and other algorithms

[Y N] OpenMP consists of 3 components: compiler options, library of functions 
	and OMP compiler pragmas (directives)

[Y N] Timing of the sections of code in multithreaded applications written in 
	C or Fortran should be done using double precision function omp_get_wtime() 

[Y N] A loop: for(i=0; i < N; i++) { ...work...} must be preceded by OMP 
	pragma #pragma omp parallel for private(i,...), otherwise the loop counter i
	will be a single variable common to all program threads, which will lead to 
	race condition and overwriting of correct values of i. 

[Y N] All the variables not declared as private in a parallel loop construct
	will be automatically public, that is common to all threads. 

[Y N] OMP allows the programmer to specify a different number of threads 
	to be spawned by each fork-join parallel block section, for instance one 
	such section may use 8 threads and another 16 threads.

[Y N] OpenMP language assumes shared memory access, that is all the threads
	it creates must have access to the same operational memory (RAM). 

[Y N] The latest 3 CPU technologies of integrated circuit fabrications are
	called: 130 nm, 90 nm, and 65 nm.
 
[Y N] After 2003 or 2004, internal clocks processors stopped having gradually 
higher frequencies; stagnation happened at 1-4 GHz frequency, depending on 
the processor type. 

[Y N] CPU powere usage per unit area of IC chip equals P = N f C V^2, 
where N is the number of cores on the chip, f their clock frequency, 
C is capacitance of one transistor, and V its working voltage.  

[Y N] During the reign of Moore's law, power supply to a unit area of a microchip 
P = N f C V^2 was doubling after the switch to the next level of miniaturization. 
 
[Y N] Automatic vectorization of a program is performed by compiler, and can 
   be requested via directives by the programmer. Compiler creates code 
   using a lareg number of wide (256 or 512 bit) AVX registers which can perform 
   simultaneous multiplications and/or additions. 

[Y N] In multiple, nested loops, vectorization should occur in the innermost loop

[Y N] Multithreading via OpenMP creates fork-join parallel region encompassing 
a loop construct

[Y N] In multiple, nested loops, OpenMP mutithreading should be applied to 
	the innermost loop

[Y N] Number of cores in CPUs produced after 2005 started climbing up from 1 to 
   currently about 20 per processor, while the uniprocessor (core) performance 
	stagnatesd.

[Y N]  The most powerful supercomputers today have arithmetic speed capability 
of order 1000 PetaFLOPS = 1 ExaFLOPS. 

[Y N] In near-guture, indicators of performance of the processor will improve 
by a factor of two not in 2 years like in the last half-century, but in ~7 years 
Original Moore's law has ended, or alternatively, it still holds but the progress
slowed down considerably. 

[Y N] Up to several years ago, every ~2 years, 15 times, the size of transistors
etched onto a silicon wafer was decreased by a factor sqrt(2).  

[Y N] The end of energy scaling characterizing the Moore's law after 2003-2008
necessitated freezing the single core performance and introduction of more cores
per processor. This, in turn, necessitated transition to parallel programming
(multithreading).  

[Y N] We do not need to learn methods of parallel programming if we use efficient
languages like C, Julia or Fortran.

[Y N] In 1999, 1 out of 500 supercomputers was in China. In 2019, 220 (44%) of 
the top 500 supercomputers in the world were in China.

[Y N] The top supercomputers use 10 to 20 MW of power, and cost ~300 million 
dollars. These numbers are similar for just one big airliner.  

[Y N] No.1 supercomputer Summit has about 4600 nodes, containing tens of 
thousands of Nvidia graphics cards. Summit can perform 150 TFLOPS (150 * 10^12 
floating point operations per second). 

[Y N] If 1 million particles interact with each other pairwise, and calculation 
of one interaction takes 100 ns (nanoseconds), then a well-written sequential 
program will take almost 1.5 hours to evaluate all interactions. 

[Y N] Random walk of 10000 steps of length 1 will on average end at location 
100 units away from the starting point.

[Y N] Compiler lists all the errors a program generates during the compilation, 
while the interpreter (such as Python) lists them only after encountering errors
during execution of the script. We thus know all that can go wrong with a 
compiled program before we run it.

[Y N] Arrays in C are zero-based (start with index 0), while in Fortran they are
1-based.

[Y N] This loop statement in C has correct syntax: for(a=0,b=10;a+b<10;)

[Y N] In fortran, loop 0,4,8,...,512 starts like:  do  i = 0,512,4

[Y N]  k=1; if(k<=2) k=2*k+1;   This line compiles OK as C, C++, and F95 code 

[Y N] Scanning RAM allocated to a 3-dimensional array in C/C++ from start to end, 
the first index of the array changes fastest and the last index most slowly. 

[Y N] Communication speed between the RAM and GPU (graphics processing unit) 
on certain graphics card is 240 GB/s. You want to process 0.5 GB of data. 
Arithmetic throughput for 4-Byte floats is of order 2 TFLOPS (TeraFLOPS). 
The bottleneck is thus in arithmetic computations, not in communication 
(bandwidth). 

[Y N] Buildings on campus are connected with 1 Gbit/s ethernet network 
(one-directional throughput of data is in practice, let's say, 0.8 Gbit/s). 
To transfer a database of 800 million double precision C or Fortran floating 
point numbers between the buildings, one needs almost 11 minutes.

[Y N] The knapsack problem is this: from among N given items, each having a 
value and a weight, a thief wants to put as much value in a knapsack, which can
only hold a certain weight of items. How to best choose the items? This problem 
has an easy, non-optimal, solution: consider all possible combinations of items.
Of order O(N * 2**N) additions are needed for the solution. 
 
[Y N] I have read all Lecture notes and all the pages linked to our course page.

Practice Problems

There will be only one problem in the midterm.

Algorithms should be implemented in C (or C++) or Fortran95. We are looking for correct algorithms and will overlook what we judge as simple syntax errors. For example, if you forget the precise syntax of calling the Fortran subroutime:
call random_number(tab)
and write some other similar name or mistake it for a function and write tab = random_number(), but explain satisfactorily in comments what the function or routine does , your markes will not be reduced.
[REMEMBER to thoroughly comment your code!!! Short, single-line comments should appear in the program. Number and place long explanations outside the program, only putting reference to them in a comment line. Lack of thorough comments will lead to reduced mark.]

Glaring lack of knowledge of basic language constructs such as loops, arrays, conditional and arithmetic statements, writing code in Python, or only outlining the program structure in some metalanguage, will result in reduction of mark down to about 70% of maximum, but not to zero (it is much better to write a Python code, or an awful/buggy C code than nothing).

_____________________________________________________________________


Problem 1. 

Write a main program that declares a 4-Byte float (or real) two-dimensional
array Tab holding 6 floats representing x,y,z,vx,vy,vz, for N particles,
where N is a constant equal one million. Fill the array with 
values randomly (uniformly) distributed in the the range 0. < val < 1.,
for every variable var = x,y,...

Particles attract each other pairwise with force 
F_ij_x = -(dx) * (dx*dx + dy*dy + dz*dz)**(-3./2.) 
F_ij_y = -(dy) * (dx*dx + dy*dy + dz*dz)**(-3./2.) 
F_ij_z = -(dz) * (dx*dx + dy*dy + dz*dz)**(-3./2.) 
where position of particle i with respect to particle j is a vector
(dx,dy,dz), and F_ij  vector acts on particle i. All particles have mass
equal m = 1 in code units. Time step is equal dt = 0.001 in code units. 

Show that the above equations represent a Newtonian or Coulomb -1/r^2 force law.

Write a function step() containing a code that performs one time step of the 
simulation of the evolution of the system of N particles. The algorithm can be 
Euler (1st order) or leapfrog (2nd order accuracy in time) but in each case, 
meaning of variables and the step-by-step scheme of the algorithm, should be explained.   

Insert OpenMP pragma/directive to parallelize the execution with K threads, 
where K is integer constant defined at the beginning of the program. 



Problem 2. 

Given a double precision type function fun(x), satisfying
fun(a)*fun(b) <= 0.  for [a,b] = [0.,1.], implement in C or Fortran 
the Newton-Raphson search for the root of fun() in interval [a,b]. In other
words, find x such that fun(x)=0.
 
Print information about every iteration of the method, including an estimate 
of the current value of fun(x) and the estimated error of x (assuming fun(x) 
is linear near the zero). 
Final accuracy of root position should be better than 3e-5 * |a-b|. 
If it can't be achieved, appropriate statement after 30 iterations 
should be printed and the program should finish.


Problem 3. 

N = 120 small model cars whose size can be neglected in this problem are
initially distributed uniformly on a frictionless, straight, air track extending 
from x=0.0 to x=L=12 m. Initial position of particle k is
x_k = k * L/N, k = 1,...,N. 
Initial speed is +-10 cm/s, left or right direction being chosen at random. 
Upon encountering a neighbor, cars bounce off elastically (without dissipation 
of energy, therefore preserving speed). Cars fall off the ends of track.

Simulate the motion of cars until the last one falls off the track. 
Optimize the speed of the program using OpenMP with 12 threads. Print final time. 

Note: There is a simple analytical solution. 

Hint: Random number generation in Fortran. You can fill the whole array 
x or a single value x with random number(s) between 0. and 1. like so:    
 call random_number(x);   v = 20.*(x-0.5).
Array v will hold random values +-10.

Hint: Random number generation in C/C++. 
There is a standard library function (so you must 
 #include  in the header of your code) called 'rand()'.
It returns one integer from 0 to a large number much exceeding the 
number of objects you simulate. In C/C++, you randomize the 
array of initial velocities by creating N remainders of division by 2
in a loop. Expression 
 10 * (2 *(rand()%2) -1) 
has a value +-10 with equal probability. 
 


Problem 4. 

You are given a function U(x,y,z) describing the stationary gravitational 
potential per unit mass of test particle, as a function of coordinates x,y,z, 
each in interval [-10,10]. U(x) has a single minimum at (0,0,0). Notice that U 
is not force but the potential per unit mass and its units are J/kg =(m/s)^2. 
Acceleration is gradient of U times (-1).  

In the beginning of simulation, randomly distribute N non-interacting, massless 
point-like test particles in a sphere of radius r=10 around (0,0,0). Initial 
speeds are zero. You can choose Cartesian coordinates in appropriate boundaries 
forming a cube, but do not accept particles which are located outside
radius r. (Notice that the loop generating particles must be 
appropriately longer than N passages!) 

Using function U(x,y,z), write a simulation of motion of the cloud of
particles. Stop the simulation after time t=100 and print the number of 
particles located at r>10 then.



Problem 5. 

Fortran array A(2000,2000) or C array A[2000][2000] contain 4-Byte floats
and are declared outside of the main program (to avoid allocation in stack. 
If you program in Fortran, declare this array in a module.)

Write a program in C of F95, which scans the array to find and print 
the minumum value of the Laplace operator d^2A/dx^2 +  d^2A/dy^2. 
The array represents a square grid, each cell represents 0.01 mm x 0.01 mm. 
Laplace operator must be assembled from two 2nd order-accurate methods in two 
directions. Symmetric stencils will assure that.

Optimize the execution speed with OpenMP. 
Pay attention to indices i and j when accessing A(i,j) elements near boundaries:
do not refer to elements which are outside the array. Boundaries should be 
periodic, i.e. wrap around as if A was living on a torus or donut.

Print the result in an informative way (value, description of what it
represents, time of execution according to wall clock, using omp_get_wtime()).