Assignment 3, PHYD57
Due date and time: 8 Nov (Fri) at 11am
We denote first and second -derivative as ≡
, and ≡
.
Problem 3.1 [25 pts] Research the history of GPGPU (i.e., GPU computing)
We are going to talk about GPU computing during Lectures, but will not
cover in detail the history of GPU computing. Study it using numerous
resources on:
(i) course web page, Links > Parallel computing - CUDA C & CUDA Fortran,
(ii) auxiliary page /xx with books which sometimes provide historical introduction,
(iii) internet
In your 2-page writeup (max. 3 pages please) mention not just the history
in the sense of who did what & when, but also how GPU computing differs
from CPU computing. You may include some diagrams or short code snippets
(in Word or Latex or jupyter notebook format).
Try to capture the essential things; it's easy to get carried away
citing various interesting but not essential information and suddenly
run out space (no more than 3 pages please). Cite more than two sources,
mention the types of applications GPU has found, and personal experience
with graphics computing, if any.
Problem 3.2 [25 pts] Stencils, stability and CFL numbers
Introduction and the definition of the CFL number
Much of numerical Physics revolves around the solution of PDEs (partial
differential eqs.) Everything that happens in n-dimensional space and
involves time evolution will likely be described by a PDE, specifying how
fast the quantities involved change in time at a given location on a
stationary or moving grid (discretizing the space with local
cell sizes , , etc.). Time is discretized too,
into intervals of duration .
The CFL (Courant-Fredrichs-Levy) condition connects , ,
... and in wave propagation and fluid flow problems, governed by
the so-called hyperbolic equations like Navier-Stokes equation, ideal
gasdynamics equations called Euler equations, various water wave equations
(deep water, shallow water equations). Essentially, the meaning of such
conditions is to specify when a given computational stencil, i.e.
pattern of combining the data from previous time step to produce the updated
value at a given location in the next time step, works ok,
and when it produces instabilities. Instabilities are not just truncation errors,
meaning - inaccuracies caused by numerical approximations to math oprations like
the derivatives and integrals. Instabilities introduce expopnentially growing
errors quickly making the solution non-physical.
There are so-called implicit schemes,
which use stencils transferring information about the future (next timestep) state
of the spatial neighbors of a given cell, in order to compute the future
state of that cell, and many of them don't have a CFL-like condition, because
they always are stable, no matter how large one uses. Such schemes are
said to be unconditionally stable. The problem is that they require
information (about neighbors) which is not yet available. In practice, one can device
iterative scheme, which will derive better and better approximations to all
the future values of variables on the grid, but there is a price to pay for this:
implicit schemes are computationally expensive. This forces the user to
adopt larger timesteps than in explicit schemes, with adverse effect on accuracy.
Some of the best available CFD algorthms (Computational Fluid Dynamics) are therefore
explicit.
You have learned much about the approximators of the 1st and 2nd
derivatives on a grid. Now is the time to see whether they work in the context of
some popular PDEs. One example that we consdered in Lectures is the von Neumann
analysis of error propagation in the heat (or diffusion)
equation ,
when the first order, forward time differencing is used for time, and 2nd order
symmetric formula for the r.h.s. 2nd derivative is used. The stencil looks like this
*
|
o___o___o
The time is going up by one timestep, 3 spatial gridpoints are shown as o. In more
dimensions we have 5 o's in 2-D, and 7 o's in 3-D. The discretization of 1-D heat
equation PDE is goes as follows:
or
We mark the timestep number as upper index, and the spatial
cell number as lower index. Notice that we could have denoted the
nondimensional constant as a CFL number.
We found that CFL is sufficient and necessary for the stability
of the considered scheme, i.e. the timestep must satisfy the inequality
. Check the lecture notes.
The physical meaning of is the diffusion time after
which a local peak or trough of of width gets significantly
leveled by diffusion. Timestep is a couple of times smaller than diffusion
timescale across one cell, for the algorithm to be able to adequately
describe the diffusion or temperature changes.
Stability of schemes for advection and Schrödinger equation
The 1-D advection equation describes transport of quantity described by function
across the -axis:
where is the propagation speed (its sign determines direction of
propagation). In multidimensional space we'd write
. In 1-D ,
in 2-D it is and so on.
The time-dependent Schrödinger equation describes the time evolution
of , the quantum-mechanical wave function:
where is the mass of the particle represented by the wave function,
is reduced Planck's constant, and is Laplace
operator. In 1-D, ,
in 2-D it is , and so on. We are
interested in the simple case of free-space propagation of the wave function,
.
Show that discretization of either of the above equations using the same
numerical approximations to and as we discussed above
in case of the diffusion equation, is unconditionally unstable. That is, there
is no CFL relation guaranteeng stability, because the numerical solution always
blows up.
So what do we do in such a situation?
We have to employ more sophisticated explicit methods, or implicit methods.
The Klein-Gordon and the wave equation. CFL limit
There is a number of wave equations in physics that describe the propagation of
disturbance in some medium, or quantum-mechanical probability wave functions in
vacuum. Notice that Schrödinger equation does not contain the speed of light
. This is because it applies to nonrelativistic particles only, a fact that
initially saddened its discoverer so much that he seriously considered not
publishing it.
One attempt was to formulate a wave equation patterned after the
relativistic energy triangle equation
where is the rest mass of a particle, its relativistic momentum
(in general not equal to since that product would be limited in value
because of requirement), and total energy. This gave the
Klein-Gordon equation which, as we now know, describes the spin-zero
particles such as the Higgs boson.
Let's consider the application to massles particles first. If , the
Klein-Gordon equation looks exactly like the familiar wave equation of
clasical physics, although the function is the same quantum-mechanical
wave function in Schrödinger equation, and is in general complex-valued.
Use von Neuman's analysis, where spatial and time dependence of
error is assumed to be split into Fourier components that depend on position
and time like ,
the being the wave vector and the wave's frequency.
Show that any wave equation (including the homogeneous, mass-less
Klein-Gordon equation) can be successfully solved numerically with the 2nd
derivative approximator (2nd order accuracy) used in heat equation.
Derive the critical CFL number. The explicit numerical stencil has the form
*
|
o___o___o
|
o
Time is running upwards (2 time steps) and space is extending horizontally; we
illustrate only one spatial dimension, but analogous multidimensional stencils
can be written down. The CFL number will depend on the dimension of space!
This is how we define CFL number in this equation, for brevity denoting it
as in 1-D:
or
Let us illustrate the 2-D wave equation by labeling spatial locations with indices
or
I don't think that having different cell sizes in different directions on the
grid makes the task much more difficult. But if you prefer - you can consider
a spatial grid with .
Interpret your result by comparing the numerical speed of propagation of
information on the uniform space-time grid, and the physical wave-crest
velocity .
HINT: Follow the philosophy that was used for diffusion equation in L5,
use the trick . Judge at which
values of there are no real-valued 's that satisfy the
equation, assuming a real-valued (or if dimension ).
OPTIONAL PART:
Finally, if you dare, derive the stability criterion for the Klein-Gordon
equation with , discretized as above. This is not necessary to get full
points for this problem! But if you do it, some mistakes in the earlier parts
of this problem will be forgiven.
Problem 3.3 [40 pts] N-body problem: Gravitational collapse
Write a code in C or Fortran that solves the N-body gravitational problem
by 4th order symplectic method (cf. L7). The code should be modular, i.e. write
separate procedures/subroutines to: initiate the positions and velocities,
compute accelerations by direct summetion (that part is the bottleneck, as
it is complexity, so it needs to be well parallelized by OpenMP
multithreading), compute instantaneous value of total energy and momentum
of the system for monitoring accuracy, and the integrator that does kick
and push substeps, as well as the main program that calls those procedures.
Unless you have other ideas, keep all the variables for all particles:
coordinates, velocity components, accelerations, in one array dimensioned
9 by N, or N by 9 (depending on the C/Fortran language, since C and F store
rows and columns of arrays differently, and it is usually better
to keep all data belonging to one particle physically close in RAM).
Consider an initially motionless, uniform-density, spherical cloud of
randomly placed points, extending to radius . The total mass
is equal , and each body's mass is . The gravitational
constant is . [You are free to use the actual physical constants and
dimensional values. I'll not omit () or in the formulae below.
The task you're facing will not depend on what values of and you use.
Only the and the simulated times will have different values, but the
number of steps taken or wall clock time should not be much different.]
The acceleration acting on particle is equal
where ,
, and
the sum is done over all particles other than .
It's not required, but you may use non-zero softening parameter
, effectively limiting the strength of gravity
force at distances . It's rare that
particles are that close to each other, but then their trajectories will
be computed more reasonably (as if gravity was actually non-Newtonian at
small distances, or if gravity was Newtonian but particles were
really fuzzy balls of radius of much smaller components.
In astronomy, such objects with potential
are known as Plummer spheres.)
Monitor the relative change of the total mechanical energy
and the instantaneous total momentum of the system divided by ,
where and
Choose a constant timestep that assures good conservations
of energy and nearly zero total momentum during the simulation.
State the absolute value of momentum divided by .
Try to keep the relative energy error below . Consider any
results where the relative error is larger than 0.01 garbage,
affected by a very close encounter of particles.
If the running time is too long, kill the program by CTRL-C
after 10 or 15 minutes and run it again with much smaller ,
it should then finish quickly. Establish the biggest N
that gives reasonably short runtime by trial and error guided
by the method of bisection. Hopefully you won't have to decrease N,
but who knows..
You will finish the computation once the following happens. Write a
subroutine or procedure, which you will call once in a while, not after
every timestep, since that would slow down the working of the code
unnecessarily, but maybe every 10 or more steps. It will return
some information about the system, such as and , as well
as the mean radius of the system
and the mean velocity, .
The ratio tells you the average crossing time, which is
similar to the typical infall timescale (initially) and the
orbital period of a particle at the end.
The code should finish when the time exceeds , by which time we
will consider that the system settled into equilibrium.
The main goal of your code is to find by what factor the mean radius
of the system in final equilibrium is smaller than the initial mean radius.
(The virial theorem can be used to make a prediction for that ratio,
and you can try to make the prediction, but that is not required.)
Some particles may escape. Don't include in the average
radius those, which escaped beyond 10 times the initial ball's radius, and don't
plot them (you can choose to plot only the particles within 3 initial radii).
But study wherether they escaped because of code inaccuracies- by varying dt.
If there is a noticeble change in results when making energy more
exactly conserved (smaller dt) then it might mean that even dE/E ~ 1e-5 is not
sufficient to converge to a reliable answer. Consider adding a small number
called softening parameter ε2 ~ 10-10
to the square of every particle-particle distance to limit acceleration
and decrease errors due to close encounters. Cf. the
for a proper way to modify the equation.
Cluster evaporation is a physical phenomenon, but the theoretical timescale
of it (to lose significant part of the cluster) much exceeds 20 crossing times.
Code details need to be described in comments, and in a separate text
that will also include the results and their discussion.
Problem 3.4 [10 pts] A variant of code in 3.3
Once the code in problem 3.3 is debugged and works well, create a separate
version with a variable timestep, updated after each timestep to a value
depending on the minimum distance of any two masses (this
value should be established as part of the acceleration calculation,
not by a separate procedure finding all pairs of particles again!).
The dependence should be , with
being a constant you change to obtain a satisfactory conservation
of the integrals of motion and . Submit the file with the
variant of the code. Compare the results and timing of the two kinds of
time-integration.
What about the following idea for another optimization. The method, as
described above, computes a given pair twice, once while
summing the acceleration on particle , and another time doing summation
for particle . Meanwhile, Newton's 3rd law states that the second
calculation of force is redundant, because the force is equal and opposite
to the one already computed (both acceleration can be updated simultaneously).
One can only consider pairs that satisfy or , to reduce
the number of 3-D distances calculated. But will that work well with OMP?
You can try and tell us what happened. (We are looking forward to
your bright ideas on code optimization. E.g., you might check whether
9 by N or N by 9 arrays work faster, whether putting them on stack or
heap, by palcing the declaration in or outside a procedure or fortran module,
works better. Just remember to increase OS stack size if you get error
messages having to do with memory allocations, or segmentation faults,
which is the typical symptom of those troubles. This is just a suggestion.
Describe your efforts even if they didn't bring much benefit. We ask you to
describe your solution to 3.3 and 3.4 in a text file, points will be
cut for submitting only the codes.)
Pawel Artymowicz /
pawel.artymowicz@utoronto.ca