You'll find below problems intended to give you an idea af the kind of quiz questions and problems that you will encounted in a 1-hr midterm exam on the 1 Mar 2022 (first hour of lectures).
You can (and should, as it helps to remember things later) make up to 3 facts sheets (6 pages) of hand-written notes (no other materials allowed, calculator only allowed as technical aid).
................................................................................ Note: k-th power of x is denoted as x^k. If you think the statement is fully correct, or Yes seems much more correct than No as a judgment on the truth of the statement, then delete N and leave Y. If you answer is No then leave only N, and enclose at least one word or number, which is incorrect, in << >> brackets. If you do not indicate what is wrong with the statement, your N answer, even if correct, is treated as a guess and earns zero points for that question. If you want to leave extra explanation on why you selected Y or N, do it on the marging or just above the statement. Since an occasional ambiguity may happen in the formulation of the question, about 10% point will be given for free while marking the quiz. ................................................................................ [Y N] Antikythera Mechanism was probably built in Sicily, in the the workshop of Archimedes. [Y N] Motivation for many early computer builders was creation of astronomical tables (Wilhelm Schickard, Gottrfied Leibnitz, Charles Babbage). [Y N] In 1600s Blaise Pascal built machines that could add and subtract numbers, to aid in tax collection. No multiplication or division was implemented. [Y N] The first known use of binary system in mechanical calculators was the calculator known as 'steped reckoner' by Alan Turing from the beginning of 20th century. [Y N] Babbages unfinished, hand-cranked, Differential Engine contained 8000 parts and weighed 5 tons. [Y N] Charles Babbage has built the first 4-operations general-purpose calculator which used punched cards and was programmable (called Analytical Engine). [Y N] The pre-WWII versions of German coding machine ENIGMA were deciphered and reverse-engineered by Polish cryptologists Rejewski, Zygalski and Rozycki. [Y N] Alan Turing was computer theorist and a head of British counter-intelligence group tasked with deciphering ENIGMA-coded messages during the 2nd world war, using electromechanical relays. [Y N] Colossus, a British programmable electronic computer used to decrypt messages from German coding machine called Lorenz SZ-40, has the first high speed reader of punched paper tapes (5000 characters per second). [Y N] Since information about Colossus was classified until 1970, and other projects like those of K. Zuse were unknown, for a long time it was thought that the first computers were built by the US military (ENIAC in 1946) [Y N] ENIAC used binary numbers but its programs had no conditional branching [Y N] The basic principle of von Neumann architecture of our computers is that CPU, the central processign unit has access to operating memory storing both program and data, but which is physically separate from the memory. [Y N] High-level programming languages and compilers first appeared on mainframe computers in the late 1950s. Compiler is a programmer who operates the main console of the computer. [Y N] Control, data, and address bus are sub-parts of a system bus that sends data and memory addresses from/to CPU to/from RAM (non-volatile memory), as well as peripheral devices, such as volatile memory. [Y, N] Compiler is translator from higher-level to lower-level programming language. It may combine the function of linker, a program that translates assembly language into a binary form (0 and 1 digits) and attaches all necessary executable libraries. [Y N] Fortran compiler, released in 1957, was the first high level language, freeing the programmer from working with low-level (elementary) CPU instructions, such as is pracaticed in assembly language. [Y N] IBM computers in 1960s were mostly leased at high cost to wealthy businesses and universities. In the 1970s, Digital Equipment Corp (DEC) disrupted that model by selling a 100x cheaper minicomputer series PDP-8 to PDP-11. [Y N] Minicomputers in 1970s were sold with open (meaning non-proprietary) internal architecture to knowledgable clients in business or academia. OEMs (Original Equipm. manufacturers) started producing compatible devices, such as printers. [Y N] C language was created in 1973 in conjunction with UNIX operating system. They stayed tightly connected. Today, every Linux system has C compiler as a necessary part of software stack. [Y N] C and later C++, were created mainly for scientific computions. [Y N] First microprocessor (using integrated circuits) was Intel's 8086 CPU [Y N] 8086 was the name of a popular Intel 16-bit processor. Subsequnt generations of Intel processors with similar instruction set have a name retaining part of that name. For instance, x86_64 means all modern 64-bit processors compatible with Intel's instruction set. [Y N] Python was created as a high-level, compiled language similar to Fortran. [Y N] Apple personal computers were invented in 1980s, developing the idea of interactive use of terminals initiated by DEC Corporation on minicomputers. [Y N] There were two popular versions of Unix, from GNU project and Berkeley Software Distribution (BSD). Both were used as inspiration and parts of the new OS called Linux, created in 1991 by Linus Torvalds in Finland. [Y N] Currently, almost all supercomputers and all web servers, as well as Android and Chrome phone operating systems use Linux kernel, one of the most successful community-based open, freely available software. [Y N] In Linux, ls -l | more is a command that pipes the output of directory listing command (long format) to 'more' command which shows the output page by page or using other size of block of text. [Y N] In Linux, cp files* path1 command will copy all files whose names start with string 'files' to subdirectory given by the string path1. [Y N] In order to suspend the execution of a foreground program and put it in a background execution mode in Linux, one needs to press CTLR-Z to get the prompt, then type 'bg' +RETURN. [Y N] Linux command pwd returns the working directory's name and path. [Y N] Linux command cd path1 changes the working directory to path1 [Y N] Root or superuser in Linux is the user with full priviledges. Other users cannot issue certain dangerous commands, and access files restricted for use by the root. [Y N] Bit's value can either be True or False. 8 bits make one Byte. [Y N] ASCII is a table specifying how characters (including Latin alphabet and some text formatting symbols) are mapped to unsigned 1-Byte integer values [Y N] Hard disk with capacity 1 TB can hold up to 1 thousand million Bytes. [Y N] An 8-Byte integer can represent numbers from 0 to 2^32 -1 = [Y N] A 1-Byte integer can represent numbers from 0 to 16*16 = 256 so 8-bit Bytes are enough to code more natural languages like English. [Y N] Break-in attempts to linux systems can be seen in /var/log/secure logfiles [Y N] GNU project provides: g++ = C++ compiler, gfortran = Fortran95 compiler [Y N] ifort -openmp provides access to OpenMP by a fortran program [Y N] Laplace operator has a shape of a cross and computes the gradient of the input function. It is used in convolution, unsharp masking and other algorithms [Y N] OpenMP consists of 3 components: compiler options, library of functions and OMP compiler pragmas (directives) [Y N] Timing of the sections of code in multithreaded applications written in C or Fortran should be done using double precision function omp_get_wtime() [Y N] A loop: for(i=0; i < N; i++) { ...work...} must be preceded by OMP pragma #pragma omp parallel for private(i,...), otherwise the loop counter i will be a single variable common to all program threads, which will lead to race condition and overwriting of correct values of i. [Y N] All the variables not declared as private in a parallel loop construct will be automatically public, that is common to all threads. [Y N] OMP allows the programmer to specify a different number of threads to be spawned by each fork-join parallel block section, for instance one such section may use 8 threads and another 16 threads. [Y N] OpenMP language assumes shared memory access, that is all the threads it creates must have access to the same operational memory (RAM). [Y N] The latest 3 CPU technologies of integrated circuit fabrications are called: 130 nm, 90 nm, and 65 nm. [Y N] After 2003 or 2004, internal clocks processors stopped having gradually higher frequencies; stagnation happened at 1-4 GHz frequency, depending on the processor type. [Y N] CPU powere usage per unit area of IC chip equals P = N f C V^2, where N is the number of cores on the chip, f their clock frequency, C is capacitance of one transistor, and V its working voltage. [Y N] During the reign of Moore's law, power supply to a unit area of a microchip P = N f C V^2 was doubling after the switch to the next level of miniaturization. [Y N] Automatic vectorization of a program is performed by compiler, and can be requested via directives by the programmer. Compiler creates code using a lareg number of wide (256 or 512 bit) AVX registers which can perform simultaneous multiplications and/or additions. [Y N] In multiple, nested loops, vectorization should occur in the innermost loop [Y N] Multithreading via OpenMP creates fork-join parallel region encompassing a loop construct [Y N] In multiple, nested loops, OpenMP mutithreading should be applied to the innermost loop [Y N] Number of cores in CPUs produced after 2005 started climbing up from 1 to currently about 20 per processor, while the uniprocessor (core) performance stagnatesd. [Y N] The most powerful supercomputers today have arithmetic speed capability of order 1000 PetaFLOPS = 1 ExaFLOPS. [Y N] In near-guture, indicators of performance of the processor will improve by a factor of two not in 2 years like in the last half-century, but in ~7 years Original Moore's law has ended, or alternatively, it still holds but the progress slowed down considerably. [Y N] Up to several years ago, every ~2 years, 15 times, the size of transistors etched onto a silicon wafer was decreased by a factor sqrt(2). [Y N] The end of energy scaling characterizing the Moore's law after 2003-2008 necessitated freezing the single core performance and introduction of more cores per processor. This, in turn, necessitated transition to parallel programming (multithreading). [Y N] We do not need to learn methods of parallel programming if we use efficient languages like C, Julia or Fortran. [Y N] In 1999, 1 out of 500 supercomputers was in China. In 2019, 220 (44%) of the top 500 supercomputers in the world were in China. [Y N] The top supercomputers use 10 to 20 MW of power, and cost ~300 million dollars. These numbers are similar for just one big airliner. [Y N] No.1 supercomputer Summit has about 4600 nodes, containing tens of thousands of Nvidia graphics cards. Summit can perform 150 TFLOPS (150 * 10^12 floating point operations per second). [Y N] If 1 million particles interact with each other pairwise, and calculation of one interaction takes 100 ns (nanoseconds), then a well-written sequential program will take almost 1.5 hours to evaluate all interactions. [Y N] Random walk of 10000 steps of length 1 will on average end at location 100 units away from the starting point. [Y N] Compiler lists all the errors a program generates during the compilation, while the interpreter (such as Python) lists them only after encountering errors during execution of the script. We thus know all that can go wrong with a compiled program before we run it. [Y N] Arrays in C are zero-based (start with index 0), while in Fortran they are 1-based. [Y N] This loop statement in C has correct syntax: for(a=0,b=10;a+b<10;) [Y N] In fortran, loop 0,4,8,...,512 starts like: do i = 0,512,4 [Y N] k=1; if(k<=2) k=2*k+1; This line compiles OK as C, C++, and F95 code [Y N] Scanning RAM allocated to a 3-dimensional array in C/C++ from start to end, the first index of the array changes fastest and the last index most slowly. [Y N] Communication speed between the RAM and GPU (graphics processing unit) on certain graphics card is 240 GB/s. You want to process 0.5 GB of data. Arithmetic throughput for 4-Byte floats is of order 2 TFLOPS (TeraFLOPS). The bottleneck is thus in arithmetic computations, not in communication (bandwidth). [Y N] Buildings on campus are connected with 1 Gbit/s ethernet network (one-directional throughput of data is in practice, let's say, 0.8 Gbit/s). To transfer a database of 800 million double precision C or Fortran floating point numbers between the buildings, one needs almost 11 minutes. [Y N] The knapsack problem is this: from among N given items, each having a value and a weight, a thief wants to put as much value in a knapsack, which can only hold a certain weight of items. How to best choose the items? This problem has an easy, non-optimal, solution: consider all possible combinations of items. Of order O(N * 2**N) additions are needed for the solution. [Y N] I have read all Lecture notes and all the pages linked to our course page.
Algorithms should be implemented in C (or C++) or Fortran95.
We are looking for correct algorithms and will overlook what we
judge as simple syntax errors. For example, if you forget the precise
syntax of calling the Fortran subroutime:
call random_number(tab)
and write some other similar name or mistake it for a function and write
tab = random_number(), but explain satisfactorily in comments
what the function or routine does , your markes will not be reduced.
[REMEMBER to thoroughly comment your code!!! Short, single-line comments
should appear in the program. Number and place long explanations
outside the program, only putting reference to them in a comment line.
Lack of thorough comments will lead to reduced mark.]
Glaring lack of knowledge of basic language constructs such as loops, arrays, conditional and arithmetic statements, writing code in Python, or only outlining the program structure in some metalanguage, will result in reduction of mark down to about 70% of maximum, but not to zero (it is much better to write a Python code, or an awful/buggy C code than nothing).
_____________________________________________________________________ Problem 1. Write a main program that declares a 4-Byte float (or real) two-dimensional array Tab holding 6 floats representing x,y,z,vx,vy,vz, for N particles, where N is a constant equal one million. Fill the array with values randomly (uniformly) distributed in the the range 0. < val < 1., for every variable var = x,y,... Particles attract each other pairwise with force F_ij_x = -(dx) * (dx*dx + dy*dy + dz*dz)**(-3./2.) F_ij_y = -(dy) * (dx*dx + dy*dy + dz*dz)**(-3./2.) F_ij_z = -(dz) * (dx*dx + dy*dy + dz*dz)**(-3./2.) where position of particle i with respect to particle j is a vector (dx,dy,dz), and F_ij vector acts on particle i. All particles have mass equal m = 1 in code units. Time step is equal dt = 0.001 in code units. Show that the above equations represent a Newtonian or Coulomb -1/r^2 force law. Write a function step() containing a code that performs one time step of the simulation of the evolution of the system of N particles. The algorithm can be Euler (1st order) or leapfrog (2nd order accuracy in time) but in each case, meaning of variables and the step-by-step scheme of the algorithm, should be explained. Insert OpenMP pragma/directive to parallelize the execution with K threads, where K is integer constant defined at the beginning of the program. Problem 2. Given a double precision type function fun(x), satisfying fun(a)*fun(b) <= 0. for [a,b] = [0.,1.], implement in C or Fortran the Newton-Raphson search for the root of fun() in interval [a,b]. In other words, find x such that fun(x)=0. Print information about every iteration of the method, including an estimate of the current value of fun(x) and the estimated error of x (assuming fun(x) is linear near the zero). Final accuracy of root position should be better than 3e-5 * |a-b|. If it can't be achieved, appropriate statement after 30 iterations should be printed and the program should finish. Problem 3. N = 120 small model cars whose size can be neglected in this problem are initially distributed uniformly on a frictionless, straight, air track extending from x=0.0 to x=L=12 m. Initial position of particle k is x_k = k * L/N, k = 1,...,N. Initial speed is +-10 cm/s, left or right direction being chosen at random. Upon encountering a neighbor, cars bounce off elastically (without dissipation of energy, therefore preserving speed). Cars fall off the ends of track. Simulate the motion of cars until the last one falls off the track. Optimize the speed of the program using OpenMP with 12 threads. Print final time. Note: There is a simple analytical solution. Hint: Random number generation in Fortran. You can fill the whole array x or a single value x with random number(s) between 0. and 1. like so: call random_number(x); v = 20.*(x-0.5). Array v will hold random values +-10. Hint: Random number generation in C/C++. There is a standard library function (so you must #includein the header of your code) called 'rand()'. It returns one integer from 0 to a large number much exceeding the number of objects you simulate. In C/C++, you randomize the array of initial velocities by creating N remainders of division by 2 in a loop. Expression 10 * (2 *(rand()%2) -1) has a value +-10 with equal probability. Problem 4. You are given a function U(x,y,z) describing the stationary gravitational potential per unit mass of test particle, as a function of coordinates x,y,z, each in interval [-10,10]. U(x) has a single minimum at (0,0,0). Notice that U is not force but the potential per unit mass and its units are J/kg =(m/s)^2. Acceleration is gradient of U times (-1). In the beginning of simulation, randomly distribute N non-interacting, massless point-like test particles in a sphere of radius r=10 around (0,0,0). Initial speeds are zero. You can choose Cartesian coordinates in appropriate boundaries forming a cube, but do not accept particles which are located outside radius r. (Notice that the loop generating particles must be appropriately longer than N passages!) Using function U(x,y,z), write a simulation of motion of the cloud of particles. Stop the simulation after time t=100 and print the number of particles located at r>10 then. Problem 5. Fortran array A(2000,2000) or C array A[2000][2000] contain 4-Byte floats and are declared outside of the main program (to avoid allocation in stack. If you program in Fortran, declare this array in a module.) Write a program in C of F95, which scans the array to find and print the minumum value of the Laplace operator d^2A/dx^2 + d^2A/dy^2. The array represents a square grid, each cell represents 0.01 mm x 0.01 mm. Laplace operator must be assembled from two 2nd order-accurate methods in two directions. Symmetric stencils will assure that. Optimize the execution speed with OpenMP. Pay attention to indices i and j when accessing A(i,j) elements near boundaries: do not refer to elements which are outside the array. Boundaries should be periodic, i.e. wrap around as if A was living on a torus or donut. Print the result in an informative way (value, description of what it represents, time of execution according to wall clock, using omp_get_wtime()).