Caltech (California Institute of Technology) students strictly and proudly adhere to their Honor Code. UofT students should do the same. Copying code produced by "AI" tools or found in a browser (i) is plagiarism in this course, (ii) is easily detectable, and (iii) usually does not actually solve the problems. As an exception (not a recommendation), you may use internet tools to resolve syntax errors in your code if you get stuck on them (which we judge unlikely).
Initial conditions (position of the point mass in meters and its velocity in m/s) are: $$ \vec{r}_0 = (x,y,z)_0 = (-0.1, 0.2, 0.1)\; {\rm m}$$ $$ \vec{v}_0 = (v_x,v_y,v_z)_0 = (0, 0, -0.1)\; {\rm m/s}.$$
The choice of integration scheme is up to you. If you choose a leapfrog method,
then remember to offset the positions and velocities by dt/2 (except when
measuring the energy $E$), and take dt = $10^{-3}$ s.
If you use the Euler method, compensate for its lower accuracy
with a smaller dt = $10^{-4}$ s. If you use a more sophisticated algorithm
(not necessary!), choose an appropriate, larger timestep dt, making sure that
the relative energy error (ending vs. starting value) is $\ll 10^{-3}$.
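A minimal leapfrog (kick-drift-kick) sketch of one way to set this up; note that the values of k/m, l0, g and the end time T are assumptions here, not values given in this section:

    program elastic_pendulum
      implicit none
      real(8), parameter :: g = 9.81d0, dt = 1.0d-3
      real(8), parameter :: k_over_m = 100.0d0, l0 = 1.0d0, T = 100.0d0  ! assumed
      real(8) :: r(3), v(3), a(3), l
      integer :: i, nsteps
      r = (/ -0.1d0, 0.2d0, 0.1d0 /)
      v = (/ 0.0d0, 0.0d0, -0.1d0 /)
      nsteps = nint(T/dt)
      l = sqrt(sum(r**2))
      a = -k_over_m*(l - l0)*r/l
      a(3) = a(3) - g
      v = v + 0.5d0*dt*a                ! offset velocity by dt/2
      do i = 1, nsteps
         r = r + dt*v                   ! drift
         l = sqrt(sum(r**2))
         a = -k_over_m*(l - l0)*r/l     ! spring acceleration per unit mass
         a(3) = a(3) - g                ! plus gravity
         v = v + dt*a                   ! kick
      end do
      ! (to measure E, undo the dt/2 offset: v_sync = v - 0.5d0*dt*a)
      print *, 'steps:', nsteps, ' r:', r, ' l:', l
    end program elastic_pendulum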
Analytically show that the specific (i.e., per unit mass) total mechanical
energy of the pendulum is given by
$E = \frac{1}{2} [v^2 + \frac{k}{m}(l-l_0)^2] +g\,z$. Evaluate the relative
error of the total energy, i.e. ${\Delta}E/E = [E(T)-E_0]/E_0$, at $t= T$,
where $E_0=E(t=0)$.
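One way to organize the derivation (a sketch, assuming the spring anchor at the origin and zero gravitational potential at $z=0$):
$$ E_{\rm tot} = \underbrace{\tfrac{1}{2} m v^2}_{\rm kinetic} + \underbrace{\tfrac{1}{2} k (l-l_0)^2}_{\rm elastic} + \underbrace{m g z}_{\rm gravitational}, \qquad E = \frac{E_{\rm tot}}{m} = \frac{1}{2}\left[v^2 + \frac{k}{m}(l-l_0)^2\right] + g\,z. $$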
As output, print at the end of simulation: the number of timesteps
made, final position $(x,y,z)$ and length $l$, as well as $\Delta E/E$.
There was a misprint in the energy formula;
as a result, energy conservation will not be checked in the solutions.
2. Put an outer loop around the integration, doing three different simulations: first, the standard case specified above; second, a version with a 2 times larger dt, to verify the accuracy of the calculation; and third, a version with a slightly different value of $y_0$: $\vec{r}_0 = (x,y,z)_0 = (-0.1, 0.21, 0.1)\; {\rm m}$. Compare the results with the standard case and provide numerical and physical commentary. Is the elastic pendulum solution very sensitive to initial conditions? That sensitivity (together with the non-periodicity of the solution, which you are not asked to explore) would make our elastic pendulum chaotic. Can the evolution be reliably computed over the timeframe $T$?
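A minimal sketch of the three-case outer loop (only the parameter bookkeeping is shown; the Problem 1 integration goes inside the loop):

    program three_runs
      implicit none
      integer :: icase
      real(8) :: dt, y0
      do icase = 1, 3
         dt = 1.0d-3                    ! standard timestep
         y0 = 0.20d0                    ! standard initial y
         if (icase == 2) dt = 2.0d-3    ! case 2: doubled dt
         if (icase == 3) y0 = 0.21d0    ! case 3: perturbed y_0
         print *, 'case', icase, ': dt =', dt, ' y0 =', y0
         ! ... Problem 1 integration loop goes here ...
      end do
    end program three_runs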
3. Comment in one sentence on whether the calculation would benefit from OpenMP or CUDA, and why.
Next, write an approximate formula for the roundoff error, assuming it results from random accumulation (like a random walk) over N steps of calculation, each of which requires 6 additions and multiplications. Every arithmetic operation contributes $\varepsilon^2$ to the variance (the sum of squares of errors), where $\varepsilon = 10^{-7}$ is the machine epsilon, the possible error in one arithmetic operation. Show that this error drops rather than grows with $h$ (with what power and front coefficient?) because the number of steps $N$ decreases with growing $h$, making the roundoff error smaller. Useful analogy: roundoff error growth is a random walk. How does the root-mean-square distance covered in a random walk depend on N?
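In compact form (a sketch; $N = 1/h$ on a unit interval, and the RMS distance of an $N$-step random walk grows as $\sqrt{N}$):
$$ \sigma^2 \approx 6\,N\,\varepsilon^2 \quad\Rightarrow\quad \delta_{\rm round} \approx \sqrt{6N}\,\varepsilon = \sqrt{6}\,\varepsilon\,h^{-1/2}. $$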
Compare the two types of error and establish the optimum spatial discretization $h$ that guarantees the least combined truncation and roundoff error in trapezoidal integration of the considered smooth function on the [0,1] interval.
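A sketch of the balance, using the composite trapezoidal truncation error $\approx (|f''|/12)\,h^2$ on $[0,1]$ together with the roundoff estimate above:
$$ \delta(h) \approx \frac{|f''|}{12}\,h^2 + \sqrt{6}\,\varepsilon\,h^{-1/2}, \qquad \delta'(h_{\rm opt}) = 0 \;\Rightarrow\; h_{\rm opt} = \left(\frac{3\sqrt{6}\,\varepsilon}{|f''|}\right)^{2/5} \sim \varepsilon^{2/5}. $$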
Your task is to digitally add Massey Hall reverb to a CD-quality sound recorded in a non-reverberant, padded-wall studio. Let's call that recording $f(t)$, where the acoustic pressure amplitude $f$ is sampled at dt = 0.02267 ms intervals (so there is a very large number N of sampling intervals, perhaps $10^8$!), i.e. at the standard 44.1 kHz frequency used on CDs. Mathematically, the operation of replacing each sharp sound with a dragged-out copy of it (having a shape described by $K$) is a convolution.
A. First, compute a discretized version of the convolution $$F(t) = \int f(t-x) K(x) \,dx = (f \star K)(t).$$ You may think of it as creating $N_K$ copies of $f(t)$, each shifted forward in time by $x$ (done by subtracting $x$ from $t$ in the argument of $f$), multiplying each copy by the decay function $K(x)$, and adding them together. $N_K$ is the number of samples in the reverb function; $K$ does not need to have as many as $N$ samples, because the reverberation decays to fairly small, barely perceptible levels in, say, one second ($N_K = 44k$ samples). Shifted copies will stick out beyond the end time of the recording, i.e. beyond $N$ samples. It would be OK to discard any output beyond $N$ samples in our example below, but in general you want to keep the reverb continuing after the original input stops. You'll have to reserve a little more space in memory for the output than for the input, by however many samples you think your reverb has non-negligible amplitude; 1 s, or 44100 more samples in the output, seems plenty to me. If you follow the above description, you can see that if there is a single signal of duration $dx = 0.02267$ ms at position $t_1$ in $f(t)$, then it will correctly be replaced by a copy of itself, followed by an exponentially decaying tail of shape $K(t-t_1)$.
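In discrete form (a sketch; here the $dx$ factor is absorbed into the kernel samples $K_j$, and $f_{i-j}$ is taken as zero outside $0 \le i-j \le N-1$):
$$ F_i = \sum_{j=0}^{N_K-1} f_{i-j}\,K_j, \qquad i = 0,\dots,N+N_K-1. $$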
The issue with this direct convolution procedure is that its complexity scales as $O(2 N_K \,N)$. That is not a very nice scaling, because $N_K$ is very large. (We encountered a similar but graver issue in the direct-summation gravity solver, where the equivalent of $N_K$ equaled $N$ and was much larger still, because the pattern of force extends throughout physical space, while the reverb pattern is compact in the time domain.) But write a direct convolution subroutine or procedure in Fortran or C(++) anyway, and remove any syntax/compile-time errors from it if you manage. This will be pseudocode that you won't actually execute, because there is no time for testing it, especially on real recordings. But address the complexity issue by trying to use OpenMP or CUDA, whichever you prefer; multithreading/CUDA could alleviate the slow speed of direct convolution somewhat. A sketch of such a routine follows below.
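A sketch of one possible direct routine with OpenMP (names fin, fout, Kr are illustrative; note that in Fortran, f and F would be the same identifier, hence the renaming):

    subroutine direct_reverb(fin, N, Kr, NK, fout)
      implicit none
      integer, intent(in)  :: N, NK
      real(8), intent(in)  :: fin(0:N-1), Kr(0:NK-1)
      real(8), intent(out) :: fout(0:N+NK-1)
      integer :: i, j
      ! each output sample is independent, so the outer loop parallelizes
      !$omp parallel do private(j)
      do i = 0, N+NK-1
         fout(i) = 0.0d0
         do j = max(0, i-N+1), min(i, NK-1)
            fout(i) = fout(i) + fin(i-j)*Kr(j)   ! copy shifted by j, scaled by Kr(j)
         end do
      end do
      !$omp end parallel do
    end subroutine direct_reverb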
B. There are two much better methods than direct-summation convolution, thousands of times faster in practice. One is the FFT, which we will skip here (it allows an arbitrary convolution kernel $K(x)$, of arbitrary length and shape). The other, almost miraculously, allows you to write a streamlined method for adding reverb using only one pass through the input recording. That algorithm's arithmetic complexity is only about $O(2N)$. But it only works with the exponential decay kernel $K(t)$ above.
Hint: notice this special property of exponential decay: the ratio between its value and its value at the previous sample is a constant factor (smaller than one). We can write this factor as $K(t)/K(t-dt) = e^{-t/\tau} / e^{-(t-dt)/\tau} = e^{-dt/\tau}$. The reverb of all previously encountered sound amplitudes (no matter how long ago the original sound first arrived at the microphone) is carried by the previous sample of the output amplitude: it carries over to the current sample diminished by that constant factor. At the same time, new input arrives from the original recording and is added to the output without reduction.
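In recurrence form (a sketch; the overall normalization of $K$ is up to you):
$$ F_i = q\,F_{i-1} + f_i, \qquad q = e^{-dt/\tau}. $$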
Write a fast routine that converts any recording of N samples to N+44100 samples (in order not to lose the reverb of the last note or sound played by your favorite musician). You don't need to bother with actual reading of an input stream or writing of an output stream (we describe how to create a test input below). The N you will use for testing your code will be N = 44100, so only 1 second of the original recording $f(t)$ will be processed. In fact, to make the result easily verifiable, create an empty array $f$ (N zero entries), and only at the indices corresponding to times t = 0.02267 s, 0.25 s, 0.30 s, and 0.50 s enter the value 1.0 (amplitude 1 lasting for one sample's duration). Compute on a calculator what output you expect at time 1 s and compare it with the printed output array value at that particular time/index.
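A self-contained sketch of this test (the decay time tau = 0.1 s is an assumed value, not one given in the problem; array names again avoid the f/F clash):

    program fast_reverb
      implicit none
      integer, parameter :: N = 44100, NK = 44100
      real(8), parameter :: dt = 1.0d0/44100.0d0, tau = 0.1d0   ! tau assumed
      real(8) :: fin(0:N-1), fout(0:N+NK-1), q
      integer :: i
      fin = 0.0d0
      fin(1000)  = 1.0d0      ! t = 0.02267 s
      fin(11025) = 1.0d0      ! t = 0.25 s
      fin(13230) = 1.0d0      ! t = 0.30 s
      fin(22050) = 1.0d0      ! t = 0.50 s
      q = exp(-dt/tau)
      fout(0) = fin(0)
      do i = 1, N+NK-1
         fout(i) = q*fout(i-1)                   ! decayed reverb of all past input
         if (i < N) fout(i) = fout(i) + fin(i)   ! fresh input, added unreduced
      end do
      print *, 'output at t = 1 s (index 44100):', fout(44100)
    end program fast_reverb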
Since all of you can read a working, uncomplicated code in Fortran, I have removed all the syntax errors and other compile-time and runtime errors that were originally introduced.
This is the program; for your convenience, it also resides in the top directory of the phyd57 user on art-1.
This program tests various ways, including fast, parallel ways, of computing a 2-D Laplacian operator that acts to smooth an array that may be an image. 4/5 of a pixel's value is taken away from the pixel and added to the neighboring 4 cells (north, south, east & west). The sum of coefficients is 1, preserving the total sum of values in the array.
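The described smoothing, written as a gather-form stencil sketch (a and b are illustrative input/output array names, not the program's; boundary handling here simply leaves the edge cells unchanged):

    program smooth_sketch
      implicit none
      integer, parameter :: n = 8, m = 8
      real(8) :: a(n,m), b(n,m)
      integer :: i, j
      a = 1.0d0
      b = a                     ! untouched boundary keeps its values
      do j = 2, m-1
         do i = 2, n-1
            b(i,j) = 0.2d0*a(i,j) &
                   + 0.2d0*(a(i-1,j) + a(i+1,j) + a(i,j-1) + a(i,j+1))
         end do
      end do
      print *, 'sum before:', sum(a), ' after:', sum(b)
    end program smooth_sketch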
Compile it on your machine or on art-1 in your subaccount like this
\$ pgf95d prob4_w_issue
It will compile. Run the program. It will run correctly:
art-1[187]:~\$ pgf95d prob4_w_issue
art-1[188]:~\$
art-1[188]:~\$ prob4_w_issue.x
prob4_w_issue.x
result = 54.06276 CUDA kernel
result = 53.61340 host OMP slices
result = 52.78773 host OMP
result = 52.79328 host (CPU)
Look at the output (above). Read the text of the program. What does it do? Should the results of different subroutines/procedures executed by the main program be identical, almost identical, or different? Something is a bit wrong!
Describe in the solutions file what's wrong with the algorithm and what causes the differences. The issue is not in programming details, like using different random numbers or different syntax; it is in the logic of what the algorithm does with the data, and it can probably be fixed in 15 minutes. For almost full credit, you can just find and describe the cause of the difference, but if you rename, fix, and recompile the program, that's even better! In that case, submit the new, fixed version with the last 4 digits of your student number in the filename.
It will also reside for your convenience in the top directory of phyd57 user
on art-1.
Copy it to your subdirectory (or computer).
Change the name to include the last four digits of your student number.
Edit & submit to Quercus > Assignments > Final 2024.
Now you can do Problem 4 in full, or relax. Thank you for your participation & effort. And happy holidays soon!