Caltech (California Institute of Technology) students strictly and proudly adhere to their Honor Code. UofT students should do the same. Copying code produced by "AI" tools or found in a browser (i) is plagiarism in this course, (ii) is easily detectable, and (iii) usually does not actually solve the problems. As an exception (not a recommendation), you may use internet tools to resolve syntax errors in your code if you get stuck on them (which we judge unlikely).
Initial conditions (position of the point mass in meters and its velocity in m/s) are: $$ \vec{r}_0 = (x,y,z)_0 = (-0.1, 0.2, 0.1)\; {\rm m}$$ $$ \vec{v}_0 = (v_x,v_y,v_z)_0 = (0, 0, -0.1)\; {\rm m/s}.$$
The choice of integration scheme is up to you. If you choose a leapfrog method,
then remember to offset the positions and velocities by dt/2 (except when
measuring the energy $E$), and take dt = $10^{-3}$ s.
If you use the Euler method, compensate for its lower accuracy
with a smaller dt = $10^{-4}$ s. If you use a more sophisticated algorithm
(not necessary!), choose an appropriate, larger timestep dt, making sure that
the relative energy error (ending vs. starting value) is $\ll 10^{-3}$.
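A minimal leapfrog (kick-drift-kick) sketch of one way to set this up; note that the values of k/m, l0, g and the end time T are assumptions here, not values given in this section:

    program elastic_pendulum
      implicit none
      real(8), parameter :: g = 9.81d0, dt = 1.0d-3
      real(8), parameter :: k_over_m = 100.0d0, l0 = 1.0d0, T = 100.0d0  ! assumed
      real(8) :: r(3), v(3), a(3), l
      integer :: i, nsteps
      r = (/ -0.1d0, 0.2d0, 0.1d0 /)
      v = (/ 0.0d0, 0.0d0, -0.1d0 /)
      nsteps = nint(T/dt)
      l = sqrt(sum(r**2))
      a = -k_over_m*(l - l0)*r/l
      a(3) = a(3) - g
      v = v + 0.5d0*dt*a                ! offset velocity by dt/2
      do i = 1, nsteps
         r = r + dt*v                   ! drift
         l = sqrt(sum(r**2))
         a = -k_over_m*(l - l0)*r/l     ! spring acceleration per unit mass
         a(3) = a(3) - g                ! plus gravity
         v = v + dt*a                   ! kick
      end do
      ! (to measure E, undo the dt/2 offset: v_sync = v - 0.5d0*dt*a)
      print *, 'steps:', nsteps, ' r:', r, ' l:', l
    end program elastic_pendulum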
Analytically show that the specific (i.e., per unit mass) total mechanical
energy of the pendulum is given by
$E = \frac{1}{2} [v^2 + \frac{k}{m}(l-l_0)^2] +g\,z$. Evaluate the relative
error of the total energy, i.e. ${\Delta}E/E = [E(T)-E_0]/E_0$, at $t= T$,
where $E_0=E(t=0)$.
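One way to organize the derivation (a sketch, assuming the spring anchor at the origin and zero gravitational potential at $z=0$):
$$ E_{\rm tot} = \underbrace{\tfrac{1}{2} m v^2}_{\rm kinetic} + \underbrace{\tfrac{1}{2} k (l-l_0)^2}_{\rm elastic} + \underbrace{m g z}_{\rm gravitational}, \qquad E = \frac{E_{\rm tot}}{m} = \frac{1}{2}\left[v^2 + \frac{k}{m}(l-l_0)^2\right] + g\,z. $$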
As output, print at the end of simulation: the number of timesteps
made, final position $(x,y,z)$ and length $l$, as well as $\Delta E/E$.
There was a misprint in the energy formula;
as a result, energy conservation will not be checked in the solutions.
2. Put an outer loop around the integration, doing three different simulations: first, the standard case specified above; second, a version with a 2 times larger dt, to verify the accuracy of the calculation; and third, a version with a slightly different value of $y_0$: $\vec{r}_0 = (x,y,z)_0 = (-0.1, 0.21, 0.1)\; {\rm m}$. Compare the results with the standard case and provide numerical and physical commentary. Is the elastic pendulum solution very sensitive to initial conditions? That sensitivity (together with the non-periodicity of the solution, which you are not asked to explore) would make our elastic pendulum chaotic. Can the evolution be reliably computed over the timeframe $T$?
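A minimal sketch of the three-case outer loop (only the parameter bookkeeping is shown; the Problem 1 integration goes inside the loop):

    program three_runs
      implicit none
      integer :: icase
      real(8) :: dt, y0
      do icase = 1, 3
         dt = 1.0d-3                    ! standard timestep
         y0 = 0.20d0                    ! standard initial y
         if (icase == 2) dt = 2.0d-3    ! case 2: doubled dt
         if (icase == 3) y0 = 0.21d0    ! case 3: perturbed y_0
         print *, 'case', icase, ': dt =', dt, ' y0 =', y0
         ! ... Problem 1 integration loop goes here ...
      end do
    end program three_runs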
3. Comment in one sentence on whether the calculation would benefit from OpenMP or CUDA, and why.
Next, write an approximate formula for the roundoff error, assuming it results from random accumulation (like a random walk) over N steps of calculation, each of which requires 6 additions and multiplications. Every arithmetic operation contributes $\varepsilon^2$ to the variance (the sum of squares of errors), where $\varepsilon = 10^{-7}$ is the machine epsilon, the possible error in one arithmetic operation. Show that this error drops rather than grows with $h$ (with what power and front coefficient?) because the number of steps $N$ decreases with growing $h$, making the roundoff error smaller. Useful analogy: roundoff error growth is a random walk. How does the root-mean-square distance covered in a random walk depend on N?
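In compact form (a sketch; $N = 1/h$ on a unit interval, and the RMS distance of an $N$-step random walk grows as $\sqrt{N}$):
$$ \sigma^2 \approx 6\,N\,\varepsilon^2 \quad\Rightarrow\quad \delta_{\rm round} \approx \sqrt{6N}\,\varepsilon = \sqrt{6}\,\varepsilon\,h^{-1/2}. $$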
Compare the two types of error and establish the optimum spatial discretization $h$ that guarantees the least combined truncation and roundoff error in trapezoidal integration of the considered smooth function on the [0,1] interval.
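A sketch of the balance, using the composite trapezoidal truncation error $\approx (|f''|/12)\,h^2$ on $[0,1]$ together with the roundoff estimate above:
$$ \delta(h) \approx \frac{|f''|}{12}\,h^2 + \sqrt{6}\,\varepsilon\,h^{-1/2}, \qquad \delta'(h_{\rm opt}) = 0 \;\Rightarrow\; h_{\rm opt} = \left(\frac{3\sqrt{6}\,\varepsilon}{|f''|}\right)^{2/5} \sim \varepsilon^{2/5}. $$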
Your task is to digitally add Massey Hall reverb to a CD-quality sound recorded in a non-reverberant, padded-wall studio. Let's call that recording $f(t)$, where the acoustic pressure amplitude $f$ is sampled at dt = 0.02267 ms intervals (so there is a very large number N of sampling intervals, perhaps $10^8$!), i.e. at the standard 44.1 kHz frequency used on CDs. Mathematically, the operation of replacing each sharp sound with a dragged-out copy of it (having a shape described by $K$) is a convolution.
A. First, compute a discretized version of the convolution $$F(t) = \int f(t-x) K(x) \,dx = (f \star K)(t).$$ You may think of it as creating $N_K$ copies of $f(t)$, each shifted forward in time by $x$ (done by subtracting $x$ from $t$ in the argument of $f$), multiplying each copy by the decay function $K(x)$, and adding them together. $N_K$ is the number of samples in the reverb function; $K$ does not need to have as many as $N$ samples, because the reverberation decays to fairly small, barely perceptible levels in, say, one second ($N_K = 44k$ samples). Shifted copies will stick out beyond the end time of the recording, i.e. beyond $N$ samples. It would be OK to discard any output beyond $N$ samples in our example below, but in general you want to keep the reverb continuing after the original input stops. You'll have to reserve a little more space in memory for the output than for the input, by however many samples you think your reverb has non-negligible amplitude; 1 s, or 44100 more samples in the output, seems plenty to me. If you follow the above description, you can see that if there is a single signal of duration $dx = 0.02267$ ms at position $t_1$ in $f(t)$, then it will correctly be replaced by a copy of itself, followed by an exponentially decaying tail of shape $K(t-t_1)$.
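In discrete form (a sketch; here the $dx$ factor is absorbed into the kernel samples $K_j$, and $f_{i-j}$ is taken as zero outside $0 \le i-j \le N-1$):
$$ F_i = \sum_{j=0}^{N_K-1} f_{i-j}\,K_j, \qquad i = 0,\dots,N+N_K-1. $$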
The issue with this direct convolution procedure is that its complexity scales as $O(2 N_K \,N)$. That is not a very nice scaling, because $N_K$ is very large. (We encountered a similar but graver issue in the direct-summation gravity solver, where the equivalent of $N_K$ equaled $N$ and was much larger still, because the pattern of force extends throughout physical space, while the reverb pattern is compact in the time domain.) But write a direct convolution subroutine or procedure in Fortran or C(++) anyway, and remove any syntax/compile-time errors from it if you manage. This will be pseudocode that you won't actually execute, because there is no time for testing it, especially on real recordings. But address the complexity issue by trying to use OpenMP or CUDA, whichever you prefer; multithreading/CUDA could alleviate the slow speed of direct convolution somewhat. A sketch of such a routine follows below.
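A sketch of one possible direct routine with OpenMP (names fin, fout, Kr are illustrative; note that in Fortran, f and F would be the same identifier, hence the renaming):

    subroutine direct_reverb(fin, N, Kr, NK, fout)
      implicit none
      integer, intent(in)  :: N, NK
      real(8), intent(in)  :: fin(0:N-1), Kr(0:NK-1)
      real(8), intent(out) :: fout(0:N+NK-1)
      integer :: i, j
      ! each output sample is independent, so the outer loop parallelizes
      !$omp parallel do private(j)
      do i = 0, N+NK-1
         fout(i) = 0.0d0
         do j = max(0, i-N+1), min(i, NK-1)
            fout(i) = fout(i) + fin(i-j)*Kr(j)   ! copy shifted by j, scaled by Kr(j)
         end do
      end do
      !$omp end parallel do
    end subroutine direct_reverb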
B. There are two much better methods than direct-summation convolution, thousands of times faster in practice. One is the FFT, which we will skip here (it allows an arbitrary convolution kernel $K(x)$, of arbitrary length and shape). The other, almost miraculously, allows you to write a streamlined method for adding reverb using only one pass through the input recording. That algorithm's arithmetic complexity is only about $O(2N)$. But it only works with the exponential decay kernel $K(t)$ above.
Hint: notice this special property of exponential decay: the ratio between its value and its value at the previous sample is a constant factor (smaller than one). We can write this factor as $K(t)/K(t-dt) = e^{-t/\tau} / e^{-(t-dt)/\tau} = e^{-dt/\tau}$. The reverb of all previously encountered sound amplitudes (no matter how long ago the original sound first arrived at the microphone) is carried by the previous sample of the output amplitude: it carries over to the current sample diminished by that constant factor. At the same time, new input arrives from the original recording and is added to the output without reduction.
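In recurrence form (a sketch; the overall normalization of $K$ is up to you):
$$ F_i = q\,F_{i-1} + f_i, \qquad q = e^{-dt/\tau}. $$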
Write a fast routine that converts any recording of N samples to N+44100 samples (in order not to lose the reverb of the last note or sound played by your favorite musician). You don't need to bother with actual reading of an input stream or writing of an output stream (we describe how to create a test input below). The N you will use for testing your code will be N = 44100, so only 1 second of the original recording $f(t)$ will be processed. In fact, to make the result easily verifiable, create an empty array $f$ (N zero entries), and only at the indices corresponding to times t = 0.02267 s, 0.25 s, 0.30 s, and 0.50 s enter the value 1.0 (amplitude 1 lasting for one sample's duration). Compute on a calculator what output you expect at time 1 s and compare it with the printed output array value at that particular time/index.
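A self-contained sketch of this test (the decay time tau = 0.1 s is an assumed value, not one given in the problem; array names again avoid the f/F clash):

    program fast_reverb
      implicit none
      integer, parameter :: N = 44100, NK = 44100
      real(8), parameter :: dt = 1.0d0/44100.0d0, tau = 0.1d0   ! tau assumed
      real(8) :: fin(0:N-1), fout(0:N+NK-1), q
      integer :: i
      fin = 0.0d0
      fin(1000)  = 1.0d0      ! t = 0.02267 s
      fin(11025) = 1.0d0      ! t = 0.25 s
      fin(13230) = 1.0d0      ! t = 0.30 s
      fin(22050) = 1.0d0      ! t = 0.50 s
      q = exp(-dt/tau)
      fout(0) = fin(0)
      do i = 1, N+NK-1
         fout(i) = q*fout(i-1)                   ! decayed reverb of all past input
         if (i < N) fout(i) = fout(i) + fin(i)   ! fresh input, added unreduced
      end do
      print *, 'output at t = 1 s (index 44100):', fout(44100)
    end program fast_reverb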
Since all of you can read a working, uncomplicated code in Fortran, I have removed all the syntax errors and other compile-time and runtime errors that were originally introduced.
This is the program; for your convenience, it also resides in the top directory of the phyd57 user on art-1.
This program tests various ways, including fast, parallel ways, of computing a 2-D Laplacian operator that acts to smooth an array that may be an image. 4/5 of a pixel's value is taken away from the pixel and added to the neighboring 4 cells (north, south, east & west). The sum of coefficients is 1, preserving the total sum of values in the array.
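The described smoothing, written as a gather-form stencil sketch (a and b are illustrative input/output array names, not the program's; boundary handling here simply leaves the edge cells unchanged):

    program smooth_sketch
      implicit none
      integer, parameter :: n = 8, m = 8
      real(8) :: a(n,m), b(n,m)
      integer :: i, j
      a = 1.0d0
      b = a                     ! untouched boundary keeps its values
      do j = 2, m-1
         do i = 2, n-1
            b(i,j) = 0.2d0*a(i,j) &
                   + 0.2d0*(a(i-1,j) + a(i+1,j) + a(i,j-1) + a(i,j+1))
         end do
      end do
      print *, 'sum before:', sum(a), ' after:', sum(b)
    end program smooth_sketch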
Compile it on your machine or on art-1 in your subaccount like this
\$ pgf95d prob4_w_issue
It will compile. Run the program. It will run correctly:
art-1[187]:~\$ pgf95d prob4_w_issue
art-1[188]:~\$
art-1[188]:~\$ prob4_w_issue.x
prob4_w_issue.x
result = 54.06276 CUDA kernel
result = 53.61340 host OMP slices
result = 52.78773 host OMP
result = 52.79328 host (CPU)
Look at the output (above). Read the text of the program. What does it do? Should the results of different subroutines/procedures executed by the main program be identical, almost identical, or different? Something is a bit wrong!
Describe in the solutions file what's wrong with the algorithm and what causes the differences. The issue is not in programming details, like using different random numbers or different syntax; it is in the logic of what the algorithm does with the data, and it can probably be fixed in 15 minutes. For almost full credit, you can just find and describe the cause of the difference, but if you rename, fix, and recompile the program, that's even better! In that case, submit the new, fixed version with the last 4 digits of your student number in the filename.
It will also reside for your convenience in the top directory of phyd57 user
on art-1.
Copy it to your subdirectory (or computer).
Change the name to include the last four digits of your student number.
Edit & submit to Quercus > Assignments > Final 2024.
Now you can do Problem 4 in full, or relax. Thank you for your participation & effort. And happy holidays soon!