PHYD57. Final exam 12 December 2024
    
  QUIZ 

Type your student number here: ....... 

Edit this text file. Insert [Y] for Yes, or [N] for No into the square brackets.   
Statements may be tricky, so read carefully. A single word or number may be 
incorrect. Any "[ ]" answer MUST have at least one wrong word or number 
highlighted by enclosing them in << >>   brackets like this <<2+2=5>>. 
 
Please disregard typos. Symbol % stands for command-line prompt, Ftn means
Fortran, and symbol ^ is raising to a power, e.g. 2^3 = 8. 

Each question earns 1 pt. To make up for unintended ambiguity of questions, 3 
points will be added to your result. 
___________________________________________________________________________________
[ ] If one billion single precision floating point numbers (describing 100M 
 particles) are transferred between the CPU and GPU by a data bus at 10 GB/s, then
 it takes at least 2 s to transfer the data back and forth, to/from GPU. 

[ ] Programs in CUDA can be started and stay totally on GPU, without the help
 of CPU.   

[ ] Moore's law states that the computational performance of integrated-circuit
 processors doubles in about 2 years

[ ] A double precision, i.e. 8-Byte, float or real number (in C or Ftn)
 can represent real numbers with accuracy of 15 to 16 digit after decimal point.   

[ ] In Linux, %ls ~/one.ext two.ext  creates a soft link two.f90 that
 points to the file one.ext residing in top directory of the user.   

[ ] Intel compiler icc is able to compile C/C++ programs that include
CUDA C constructs into executable programs

[ ] OpenMP directive-driven language assumes shared memory access, i.e. all the
 created program threads must have access to the same operational memory (RAM). 

[ ] In 2003-2004, the rate of improvement of one-core processor performance 
 slowed down considerably, because the so-called power barrier no longer 
 allowed exponential increase of density of transistors on an integrated 
 circuit. 

[ ] Today, performance of supercomputers in Top 500 lists doubles on a timescale
 two times longer than that timescale in the period 1995-2005

[ ] In 1942, Atanassov and Berry computer was built to correlate signals 
 at read at the speed in excess of 4 kilobits/s from paper tapes.
 It also solved linear algebra equations, using thousands of vacuum tubes.

[ ] You sometimes could not login to art-1 linux node because your IP address
 was not listed as in the file specifying the IP subdomains allowed to use ssh

[ ] Supercomputers use up to dozens of MW of electrical power. About 1/2 of it
is emitted as heat, and 1/2 is used for computation.

[ ] Simpson's 1/3 integration method is 4th order, which means that error of
 a computed definite integral increases as N^4 (N = number of subintervals) 

[ ] Direct CPU register content manipulation can be performed in C++ language 
 
[ ] Scanning the momory allocated to a 3-d array in Ftn from start to end, 
 the first index of the array changes slowest and the last index fastest.
 The opposite is true of C/C++.   

[ ] Loading of two float from RAM to CPU can last up to 10^2 times longer than
 their multiplication. Thus bandwith to/from RAM often is a limiting factor on 
 program execution speed.  

[ ] Density of transistors on an integrated circuit has just reached 14
  MT/mm^2 (14 million transistors per mm^2). Total number of transistors 
   exceeded 140 billion. 

[ ] In gravitational N-body codes,  smoothing length is primarily introduced 
 due to the danger of encountering zero distance between particles, and then 
 NaN (not a number) values after division by such distance.

[ ] Had the curent slowdown of performance improvements of processors occurred
  in 1975 rather than 2005, most phones today would not have graphics display, 
  except for maybe 1-line calculator-style ones, and laptops would not play video.

[ ]  P ~ N^2 C V f, where P=energy usage per second to change the state of N 
 transistors from representing 1s to representing 0s or vice versa, f = clock 
 frequency of processor, C = capacitance, and V = voltage on a transistor. 

[ ] Dennard's law says that parallel performance of n processors cannot exceed 
 the single-processor performance by more than a factor 1/(1-q) where q = fraction
 of the program which can be parallelized & done by n processors concurrently.  

[ ] Operating system ARM = Advanced RISC Machine was created at AT&T Bell Labs 
 on PDP-series minicomputers 

[ ] %sftp filename.ext in Linux secures the file filename.ext in current 
directory against transfer by users other than the file owner. 

[ ] Intel corp. produced a microprocessor called 8086 in late 70's. It was a 
 16-bit CPU. Later processors had compatible instruction sets, called x86_, for
 instance x86_64 is a family of 64-bit processors by Intel and AMD.)

[ ] A six-core Intel CPU on art-1 node typically uses six program threads
and cannot speed up the single threded code by more than a factor 6 or so.
 
[ ] OpenMP has directives for making variables private, i.e. creating copies of 
    variables that are only known to a given thread of the program. 
  
[ ] After on OMP-instrumented loop in C or Ftn ends, a separate directive 
 needs to be issued to collapse the fork of multiple threads to a single thread.

[ ] Top supercomputer Frontier at Oakridge Ntnl Lab uses AMD processors and 
 exceeds the arithmetic performance of 1 EFLOP (exaflop) both nominally and in 
 tests. This means it does 10^18 double precision floating point operations/s. 

[ ] Symplectic integrators are preferable to RK4 and higher order Runge-Kutta
 schemes for the same reason ARM/RISC processor architecture is winnig with 
 CISC architecture (complex instruction set computer): simplicity of program.

[ ] Besides this, symplectic integrators have no monotonically growing errors of
 any kind. 

[ ] The number of bodies N simulated in a single program on UTSC phi cluster using 
 Intel Xeon Phi or on the whole art cluster, can exceed 1 billion. The largest 
 cosmological simulations in the world (e.g., Bolshoi, Millenium XXL) have used 
 an order of magnitude more particles.


[ ] The simulations of gas flow by UTSC graduate Jeffrey Fung have shown that a
  planet like proto-Earth, growing in a gas disk, sheds 4 counterrotating vortices. 

[ ] Calculation by Fung and Artymowicz showed that optically thin, transparent,
 gas disks surrounding a single star and subject to its radiation pressure, are
 unconditionally unstable and devolop large-scale spiral growing modes.

[ ] SPH or Smoothed Particle Hydrodynamics needs a fast neighbor-searching 
 component. One solution to this is offered by the so-called hash tables.  

[ ] Type-III migration was discovered in CFD (Computational Fluid Dynamics)
 simulations. It is a mode of migration of early planets in gas disks, in which 
 they force the gas to flow at high speed across a severely underdense gap 
 surrounding their orbits. 

[ ] Aerodynamical forces on a wing of an airliner can be computed using neural
 network of vortices. 

[ ] AI teached us that 'prediction is difficult, especially of the future', as 
 one baseball player (nicknamed Yogi Berra) once said. 

[ ] Ftn can use handwritten CUDA kernels (routines, threads), or the compiler
 can write turn loops into kernels by writing them automatically

[ ] In practice, one can achieve a typical speedup by a factor 10^2 from 
 a consumer graphics card (GPU) over the  preformance of a modern CPU program
 multithreaded with OMP

[ ] Typical memory sizes and floating point processing speeds of small computers
 at the end of 1970s were expressed in units starting with the prefix "kilo" or 
 10^3. Nowadays the prefix is "giga" or 10^9. The increase is million-fold. 
 Thanks to this computers parse and translate spoken words in real time, and 
 reliable weather prediction time increased from a few days to a few weeks.  

[ ] Neural nets are capable of balancing an inverted pendulum consisting of 2
 or even 3 arms, free to rotate around the joints.