PHYD57. Final exam 12 December 2024 QUIZ Type your student number here: ....... Edit this text file. Insert [Y] for Yes, or [N] for No into the square brackets. Statements may be tricky, so read carefully. A single word or number may be incorrect. Any "[ ]" answer MUST have at least one wrong word or number highlighted by enclosing them in << >> brackets like this <<2+2=5>>. Please disregard typos. Symbol % stands for command-line prompt, Ftn means Fortran, and symbol ^ is raising to a power, e.g. 2^3 = 8. Each question earns 1 pt. To make up for unintended ambiguity of questions, 3 points will be added to your result. ___________________________________________________________________________________ [ ] If one billion single precision floating point numbers (describing 100M particles) are transferred between the CPU and GPU by a data bus at 10 GB/s, then it takes at least 2 s to transfer the data back and forth, to/from GPU. [ ] Programs in CUDA can be started and stay totally on GPU, without the help of CPU. [ ] Moore's law states that the computational performance of integrated-circuit processors doubles in about 2 years [ ] A double precision, i.e. 8-Byte, float or real number (in C or Ftn) can represent real numbers with accuracy of 15 to 16 digit after decimal point. [ ] In Linux, %ls ~/one.ext two.ext creates a soft link two.f90 that points to the file one.ext residing in top directory of the user. [ ] Intel compiler icc is able to compile C/C++ programs that include CUDA C constructs into executable programs [ ] OpenMP directive-driven language assumes shared memory access, i.e. all the created program threads must have access to the same operational memory (RAM). [ ] In 2003-2004, the rate of improvement of one-core processor performance slowed down considerably, because the so-called power barrier no longer allowed exponential increase of density of transistors on an integrated circuit. [ ] Today, performance of supercomputers in Top 500 lists doubles on a timescale two times longer than that timescale in the period 1995-2005 [ ] In 1942, Atanassov and Berry computer was built to correlate signals at read at the speed in excess of 4 kilobits/s from paper tapes. It also solved linear algebra equations, using thousands of vacuum tubes. [ ] You sometimes could not login to art-1 linux node because your IP address was not listed as in the file specifying the IP subdomains allowed to use ssh [ ] Supercomputers use up to dozens of MW of electrical power. About 1/2 of it is emitted as heat, and 1/2 is used for computation. [ ] Simpson's 1/3 integration method is 4th order, which means that error of a computed definite integral increases as N^4 (N = number of subintervals) [ ] Direct CPU register content manipulation can be performed in C++ language [ ] Scanning the momory allocated to a 3-d array in Ftn from start to end, the first index of the array changes slowest and the last index fastest. The opposite is true of C/C++. [ ] Loading of two float from RAM to CPU can last up to 10^2 times longer than their multiplication. Thus bandwith to/from RAM often is a limiting factor on program execution speed. [ ] Density of transistors on an integrated circuit has just reached 14 MT/mm^2 (14 million transistors per mm^2). Total number of transistors exceeded 140 billion. [ ] In gravitational N-body codes, smoothing length is primarily introduced due to the danger of encountering zero distance between particles, and then NaN (not a number) values after division by such distance. [ ] Had the curent slowdown of performance improvements of processors occurred in 1975 rather than 2005, most phones today would not have graphics display, except for maybe 1-line calculator-style ones, and laptops would not play video. [ ] P ~ N^2 C V f, where P=energy usage per second to change the state of N transistors from representing 1s to representing 0s or vice versa, f = clock frequency of processor, C = capacitance, and V = voltage on a transistor. [ ] Dennard's law says that parallel performance of n processors cannot exceed the single-processor performance by more than a factor 1/(1-q) where q = fraction of the program which can be parallelized & done by n processors concurrently. [ ] Operating system ARM = Advanced RISC Machine was created at AT&T Bell Labs on PDP-series minicomputers [ ] %sftp filename.ext in Linux secures the file filename.ext in current directory against transfer by users other than the file owner. [ ] Intel corp. produced a microprocessor called 8086 in late 70's. It was a 16-bit CPU. Later processors had compatible instruction sets, called x86_, for instance x86_64 is a family of 64-bit processors by Intel and AMD.) [ ] A six-core Intel CPU on art-1 node typically uses six program threads and cannot speed up the single threded code by more than a factor 6 or so. [ ] OpenMP has directives for making variables private, i.e. creating copies of variables that are only known to a given thread of the program. [ ] After on OMP-instrumented loop in C or Ftn ends, a separate directive needs to be issued to collapse the fork of multiple threads to a single thread. [ ] Top supercomputer Frontier at Oakridge Ntnl Lab uses AMD processors and exceeds the arithmetic performance of 1 EFLOP (exaflop) both nominally and in tests. This means it does 10^18 double precision floating point operations/s. [ ] Symplectic integrators are preferable to RK4 and higher order Runge-Kutta schemes for the same reason ARM/RISC processor architecture is winnig with CISC architecture (complex instruction set computer): simplicity of program. [ ] Besides this, symplectic integrators have no monotonically growing errors of any kind. [ ] The number of bodies N simulated in a single program on UTSC phi cluster using Intel Xeon Phi or on the whole art cluster, can exceed 1 billion. The largest cosmological simulations in the world (e.g., Bolshoi, Millenium XXL) have used an order of magnitude more particles. [ ] The simulations of gas flow by UTSC graduate Jeffrey Fung have shown that a planet like proto-Earth, growing in a gas disk, sheds 4 counterrotating vortices. [ ] Calculation by Fung and Artymowicz showed that optically thin, transparent, gas disks surrounding a single star and subject to its radiation pressure, are unconditionally unstable and devolop large-scale spiral growing modes. [ ] SPH or Smoothed Particle Hydrodynamics needs a fast neighbor-searching component. One solution to this is offered by the so-called hash tables. [ ] Type-III migration was discovered in CFD (Computational Fluid Dynamics) simulations. It is a mode of migration of early planets in gas disks, in which they force the gas to flow at high speed across a severely underdense gap surrounding their orbits. [ ] Aerodynamical forces on a wing of an airliner can be computed using neural network of vortices. [ ] AI teached us that 'prediction is difficult, especially of the future', as one baseball player (nicknamed Yogi Berra) once said. [ ] Ftn can use handwritten CUDA kernels (routines, threads), or the compiler can write turn loops into kernels by writing them automatically [ ] In practice, one can achieve a typical speedup by a factor 10^2 from a consumer graphics card (GPU) over the preformance of a modern CPU program multithreaded with OMP [ ] Typical memory sizes and floating point processing speeds of small computers at the end of 1970s were expressed in units starting with the prefix "kilo" or 10^3. Nowadays the prefix is "giga" or 10^9. The increase is million-fold. Thanks to this computers parse and translate spoken words in real time, and reliable weather prediction time increased from a few days to a few weeks. [ ] Neural nets are capable of balancing an inverted pendulum consisting of 2 or even 3 arms, free to rotate around the joints.