PHYD57. Midterm exam 1 Mar 2022 - Problems. Solutions.
Outline a well commented C(++)/Fortran program. Use as many comment lines as you please,
but do explain each line that does something. Small mistakes in syntax will not be
counted against you.
Random photon path
Write in C(++) or Fortran program that simulates a random walk in 3-D
of a photon inside a star. The photon starts at the origin of (x,y,z) system and
each coordinate changes from -1 cm to +1 cm in each of N = 1024**3 (about 1 billion)
steps.
To provide the random numbers in the [-1,+1] range with a physically plausible
probability distribution, a separate fast procedure called Rnd(tab) is available.
That is, if tab is a 1-D float/real array of length 3N, then calling the
procedure as "Rnd(tab);" in C, or in Fortran "call Rnd(tab)", you will quickly
fill the whole array with random numbers.
The program needs to be multithreaded via OpenMP (omp) to make simultaneous use of
16 available CPU threads. Make sure that you construct 1 continuous path, not 16
different paths in your code, and that the code is parallel not serial, yet if run
with 1 thread only would return the same results. The final trajectory should be
contained in 3 float/real arrays of length N, called x, y, and z.
SOLUTION OUTLINE
There can be different approaches, but in general the problem is called
a parallel scan (equivalent to prefix sum, or integration of a function sampled
at N points.) Given enough threads, one can try to implement O(ln N) - type
algorithms (see wikipedia on prefix sum), but here, with 16 threads, a more
reasonable, simple, and very fast algorithm is simply divide-and-conquer
algorithm, where each of the 16 threads does N/16 summations.
In real life "summation" may not simply be a summation, but some involved and
difficult calculation of its own.
However, there is one problem here, since the final result must be one
continous result, while each of the 16 pieces of integration (they should
start from x=0,y=0,z=0) is a disconnected, discontinuous partial random walks.
They need to be connected at the end.
I see two varians of a program part to reconnect the pieces.
One is actually building a new array of continuous trajectory, the other
is essentially the same but apply to the whole block of data, saving a few
operations. The second method leaves the partial integrations data as they are
(that's memory-efficient) and only on demand provides a value of the position
after step i, F(i), by doing the following steps.
1. Compute deltas: delta_k = f((k+1)*N/16) -f(k*N/16) =
sum of all steps in segment k, k=0...15.
2. For any step number i at which we want to evaluate the position, compute
j = i/16 (Fortran) or i//16; (C/C++)
(integer division provides the number j labeling the section, j = 0...15).
3. Sum up all deltas up to j-1, and add the result to the f(i). F(i) is ready.
Time needed for integration is of order O(N/16), as is the time needed to
complete the construction of the whole tracectory, so the cost is of order
O(N/8). This is to be compared with the cost O(N) of sequential calculation
in this particular problem which is extremely light on computations.
But if, say, evaluating each step takes 100 arithmetic operations, then
the amount of time is O(N/16*100 + N/16)= O(N/16*101) and the adjustment
cost is only 1% of the essential computation.
PHYD57. Midterm Exam 2022. QUIZ. Solutions.
Edit this text file. Leave only one, Y[es] or N[o], then sign and submit this file.
Statements may be tricky, so read carefully. A single word or number
may be incorrect. Any "[N]" answer MUST have at least one wrong word
or number highlighted by enclosing them in << >> brackets.
Please disregard typos. Symbol % stands for command-line prompt, symbol ^
is a power operation, e.g. 3^2 = 9.
To make up for unintended ambiguity of the questions, 2 points will be
added to your result. Good luck!
[Y] Antikythera Mechanism was the first known digital+analog portable computer
of times of lunar and solar eclipses, positions of planets, olympiads etc.
[Y] IP packets sent over Internet contain detailed information such as
the so-called MAC addresses, allowing unique identification of the sending
equipment.
[N] Numpy is faster than straight Python because it is <> and
pre-compiled in other languages
[N] icpc is a <> compiler able to traslate C, C++,<< Fortran and Python>> programs
[Y] OpenMP language assumes shared memory access, that is all the threads
it creates must have access to the same operational memory (RAM).
[N] <>, Atanassov and Berry computer was built by <> motion in air.
It also solved linear algebra equations. It used thousands of vacuum tubes.
[N] Break-in attempts to Linux can be blocked by removing or changing <> file
[Y] Simpson's integration method is 4th order
[Y] Assembly program is a series of low-level processor instructions, e.g. using registers
[N] A 4-Byte integer can represent all numbers from 0 to 2^{32} <>
[N] Scanning the momory allocated to a 3-d array in Fortran from start to end, the
<> index of the array changes fastest and the <> index most slowly.
[N] Option -openmp of Intel compiler makes sure OpenMP is <>
[Y/N - was not discussed in detail so far!]
In the same number of clock cycles a CPU can perform 30 aritmetic
operations or fetch a randomly positioned number from RAM; thus fetching blocks of
RAM into cache
[N] Density of transistors on an integrated circuit has <> 1 MT/mm^2
(1 million transistors per mm^2).
[N] Double semicolon in the C loop: for (i,j,k=0;;) {(...)} <>
[Y] IBM computers in 1960s were mostly leased at high cost to wealthy
businesses and universities. In the 1970s, Digital Equipment Corp (DEC)
disrupted that model by selling a 100x cheaper minis: PDP-7 to PDP-11.
[N] <> is a relatively new language, created for parallel computing of a
scientific problems.
[Y] Operating system Unix was created at AT&T Bell Labs on PDP-series minicomputers
[N] Currently, the <> performance of processors grows by a factor of 2
in about 2 years.
[Y] In Linux, a wave (tilde ~) means home directory of user; .. means parent of
the current directory
[N] %hostname in Linux prints the name of host computer; %ssh does secure <>
[N] Command %rm /home/user/output*2.dat removes files in /home/user,
such that any <> can be in place of "*".
[Y] Intel corp. produced a microprocessor called 8086 in late 70's. It was a 16-bit CPU.
Later processors had compatible instruction sets, called x86_, for instance x86_64
is a family of 64-bit processors by Intel and AMD.)
[N] Debian, Linux, Fedora, Red Hat etc. are << flavors (versions) of Unix OS >>
[Y] OpenMP has directives for making variables private, i.e. creating copies of
variables that are only known to a given thread of the program.
[N] Single precision floating point number is <<8>> Bytes long, and has 6 to 7
accurate decimal places
[N] If you parallelize a loop by OpenMP directive, the threads remain alive and
usable after the loop ends. A separate directive needs to be issued to collapse
the fork of multiple threads to a single thread.
[Y] Charles Babbage obtained in early 19th century a grant to build "Differential
Engine" to compute astronomical & math tables. The project was mostly successful
but it was not a general calculator that could do */+- operations.