General remarks first, applicable to all assignments. Please include your thoroughly documented/commented code and a description of methods and results in a Jupyter notebook submitted via Quercus to TA Fergus H. Every important line needs to be commented to show that you correctly understand the algorithm (do not comment "z gets multiplied by y"; we can read that. We want to know why you do certain things).

DEADLINE DELAYED 3 days to: 8 November 2019 in the evening.

Problem 1 - Truncation error of second differentiation
_______________________________________________________

Read the Turner et al. textbook chapter 3.2 on Numerical Differentiation, and review the material in Lecture 7. Demonstrate the well-known fact that the basic, symmetric, three-point stencil

   y''(x)_2 = [y(x+h) - 2 y(x) + y(x-h)]/(h*h),

where h is the constant x-spacing between data values represented in the y(x) array, has a truncation term proportional to the square of h. We say it is a method of order p=2 (meaning that error ~ h^p). Follow the style of proofs in the book, using Taylor series. Derive and write the full truncation error term with the correct numerical coefficient and higher derivative at x.

Consider now a more accurate, 5-point second-derivative stencil of order p=4, given by equation (3.8) on p. 43 in Turner's book:

   y''(x)_4 = [-y(x+2h) + 16 y(x+h) - 30 y(x) + 16 y(x-h) - y(x-2h)]/(12 h*h)

Derive the full truncation term and demonstrate the h^4 convergence of the formula (i.e. prove that the order of accuracy is p=4).

Numerically compute the second derivative of y = sqrt(x) at x=1 with the second- and fourth-order methods, for 200 values of log10(h) spread uniformly between -8 and -1, and plot on a log-log diagram the absolute values of the errors with respect to the true analytical value of the second derivative y''(1). On the same diagram, overplot your derived truncation term. Comment on whether you obtained good agreement of the numerical and analytical error estimates, and why they diverge for small h. Assume that computer arithmetic causes a 10*(machine epsilon) = 2.22e-15 round-off error in the numerator of the differentiation formula of order p=2, and 15*(machine epsilon) in the p=4 numerator. Can you quantitatively explain the behavior of the numerical error for small h, based on these assumptions? Hint: recall the analysis of the first differentiation formulae in Lecture 7. Create a plot of truncation errors in the same general style as you see in the slides of L7, and the output of http://planets.utsc.utoronto.ca/pyth/err_d1f.py (1st-order derivative study), which you can use as a template for this assignment (plagiarism allowed!).
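For the numerical part, a minimal sketch (not the err_d1f.py template itself) could look like the lines below; the variable names x0, d2_3pt, d2_5pt, err2, err4 are illustrative, not prescribed, and the overplotting of your derived truncation terms is left as a comment, since deriving them is part of the exercise.

import numpy as np
import matplotlib.pyplot as plt

x0 = 1.0                                   # point at which we differentiate
y = np.sqrt                                # test function y(x) = sqrt(x)
true_d2 = -0.25                            # analytic y''(1) = -(1/4) x**(-3/2) at x = 1

h = 10.0**np.linspace(-8.0, -1.0, 200)     # 200 step sizes, uniform in log10(h)

d2_3pt = (y(x0+h) - 2.0*y(x0) + y(x0-h)) / h**2                                     # p=2 stencil
d2_5pt = (-y(x0+2*h) + 16*y(x0+h) - 30*y(x0) + 16*y(x0-h) - y(x0-2*h)) / (12*h**2)  # p=4 stencil, eq. (3.8)

err2 = np.abs(d2_3pt - true_d2)            # absolute error of the p=2 formula
err4 = np.abs(d2_5pt - true_d2)            # absolute error of the p=4 formula

plt.loglog(h, err2, 'k.', label='3-point, p=2')
plt.loglog(h, err4, 'b.', label='5-point, p=4')
# ... overplot your derived truncation terms and the assumed round-off levels
#     10*eps/h**2 and 15*eps/(12*h**2), with eps = 2.22e-16, here ...
plt.xlabel('h'); plt.ylabel("|error of y''(1)|")
plt.legend(); plt.show()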
Problem 2 - Another form of the Titius Rule for the Solar System
_______________________________________________________

Planets in the Solar System are at mean distances from the Sun (semi-major axes "a", to be more precise) which seem to follow some rather regular spacing rule, called the Titius or Titius-Bode rule, expressing the distance of planet number n, a(n), as a simple function of n. See Wikipedia for more historical background. In this exercise you will explore one promising variant of such a rule, more modern than the Titius rule itself. Any such rule, and in the last couple of centuries there were some quite successful proposals, is of unknown origin (your lecturer will willingly share his opinion on how the regularities arise; ask a question during or after one of the lectures).

Find the planetary distance law of the form a(n) = x^n, where a(n) is in astronomical units (AU), and n is the number -2, -1, 0, ..., 6 corresponding to: Mercury(-2), Venus(-1), Earth(0), Mars(1), Ceres(2), Jupiter(3), Saturn(4), Uranus(5), and Neptune(6). Ceres is the largest asteroid. In Wikipedia, find the observed semi-major axes a (in AU) for all these objects, and place them in a numpy array of length 9, called a_obs. Place the theoretical predictions in a numpy array called d, of the same length. If you keep the object numbers in a numpy array called n, then d = x**n will calculate all 9 predicted distances for a given value of the parameter x.

The parameter x is unknown and needs to be found by minimization of the quadratic deviations between theory and observations. Let E(x) be the mean squared relative deviation of predictions vs. observations (essentially a variance measure). Write a function for E(x), calculating it as the arithmetic average of the squared relative deviations:

   E = np.sum( ((d - a_obs)/a_obs)**2 )/len(d)

where len(d) returns the number 9. Notice again: you need to minimize the sum of squares of the relative, not of the absolute deviations of theory from observations; the latter would be coded without the "/a_obs" part of the instruction.

For some x between x=1.5 and x=2, there is a best fit to the observational data, or a minimum of the error E. Compute the best-fit x as a root-finding problem, using the bisection method (described in thousands of web pages, the Turner et al. textbook, and our course). The function f(x) which satisfies f(x) = 0 at the optimum x is the slope of the previously discussed variance E(x), f = dE/dx. To the left of the minimum of E(x) the slope will be negative, and to the right positive. Which stencil you use for the calculation of the slope, that is, the first derivative dE/dx, is up to you, but we will appreciate efforts to go beyond the second-order, two-point formula, and if you do that, also your comments on how much it changes the estimate of x. Plot the best-fit theory and observations on a semi-log plot of a(n), and print the relative errors of prediction of the 9 planetary distances.
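As a rough illustration of the root-finding step only, here is a minimal sketch, assuming approximate semi-major axes (look up the exact Wikipedia values yourself) and the simplest two-point central difference for dE/dx; the helper names E, dEdx and best_x are not prescribed by the assignment.

import numpy as np

# approximate semi-major axes in AU, in the order Mercury, Venus, Earth, Mars,
# Ceres, Jupiter, Saturn, Uranus, Neptune (replace with the Wikipedia values)
a_obs = np.array([0.39, 0.72, 1.00, 1.52, 2.77, 5.20, 9.54, 19.19, 30.07])
n = np.arange(-2, 7)                     # object numbers -2, -1, ..., 6

def E(x):                                # mean squared relative deviation of d = x**n from a_obs
    d = x**n
    return np.sum(((d - a_obs)/a_obs)**2) / len(d)

def dEdx(x, h=1.0e-6):                   # slope of E(x); simplest two-point central difference
    return (E(x+h) - E(x-h)) / (2.0*h)

lo, hi = 1.5, 2.0                        # bracket in which dE/dx changes sign
for _ in range(60):                      # 60 halvings shrink the bracket to double-precision width
    mid = 0.5*(lo + hi)
    if dEdx(lo)*dEdx(mid) <= 0.0:        # sign change in the left half-interval
        hi = mid
    else:                                # otherwise the root is in the right half
        lo = mid
best_x = 0.5*(lo + hi)
print(best_x, E(best_x))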
Problem 3 - Newton's method for finding the reciprocal of a number c
________________________________________________________________

This problem is inspired by exercises 5 and 6 in the Turner et al. 2018 textbook (our textbook #2), section 5.4, on p. 166 and 167. Show that Newton's method for finding the reciprocal of c by solving 1/x − c = 0 results in the iteration

   x_{n+1} = x_{n} (2 − c x_{n}).

Prove that the (n+1)-th and n-th errors are connected via

   (x_{n+1} − 1/c) = −c (x_{n} − 1/c)**2

(quadratic decrease of the truncation error). Show that x_{n+1} < 1/c, and also that if x_{n} < 1/c then x_{n+1} > x_{n}. It follows that for n ≥ 1, x_{n} is an increasing sequence, which converges quadratically to 1/c. Demonstrate those properties numerically for c = 7, starting the iteration from x_{1} = 0.09 < 1/7 and displaying the series until x does not change any more. How does the number of iterations compare with the bisection method, to reach the same near-machine accuracy?
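A minimal sketch of the numerical demonstration for c = 7 might look like this (the iteration cap of 60 and the stopping test "iterate no longer changes" are choices made here, not prescribed by the problem):

c = 7.0
x = 0.09                            # x_1 < 1/7, so the sequence should increase monotonically
for i in range(1, 60):              # cap the loop; convergence is very fast
    x_new = x*(2.0 - c*x)           # Newton iteration for f(x) = 1/x - c
    print(i, x_new, x_new - 1.0/c)  # iterate and its error with respect to 1/c
    if x_new == x:                  # stop once the iterate no longer changes
        break
    x = x_new

The value of i at the break gives the iteration count you can compare against bisection on the same bracket.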
Problem 4 - Time Series Analysis
________________________________

http://planets.utsc.utoronto.ca/~pawel/pyth/IXIC.dat
http://planets.utsc.utoronto.ca/~pawel/pyth/AAPL.dat
http://planets.utsc.utoronto.ca/~pawel/pyth/INTC.dat

are simple text files containing 8821 lines of floats, one per line, holding some price information from the Yahoo Finance site on 8821 days, from market close on 22-10-1984 to market close on 18-10-2019, i.e., full 35 calendar years (give or take a couple of weekend days). IXIC is the symbol of the composite index (average over all stocks) of the U.S. electronic stock market Nasdaq, the first electronic stock exchange and the 2nd largest stock exchange in the world, trading technological companies such as Microsoft, Apple or Nvidia. AAPL stands for Apple Inc., and INTC for Intel Corp. The time series have been normalized such that they all start from the value 1.000, and end at different values. There are about 8821/35 = 252 market days every year. Knowing this, you are asked to display the time axis in the more informative units of years.

Read in the data with the np.loadtxt function and overplot it on a logarithmic price axis and a linear time axis, using the matplotlib.pyplot routine plt.semilogy(x, y, color=(r,g,b)). Use r, g, b values of 0 or 1, such that the color is black for Nasdaq, red for AAPL, and blue for INTC.

A. The data shows much growth of price over the decades, including the inflation of dollars as a unit of measuring prices. Remove the inflation trend from the data, by assuming a smooth growth of all sorts of U.S. prices at the historic average rate of 3% per year. Comment on whether the historic, long-term growth of price seen in the data is mostly due to inflation. Secondly, renormalize the data (multiply by appropriate constants) such that the 3 sets begin with different values and end at the value 1. Let's call such a de-inflated and normalized time series of 8821 samples p(t).

B. Using numerical differentiation, from the values p(t) derive the daily change dp(t), for instance by subtracting the previous day's value. To train vectorized methods, you can avoid doing this in a loop, and instead issue instructions for whole arrays, similar to the data averaging in the Oxford weather analysis posted as http://planets.utsc.utoronto.ca/~pawel/pyth/oxford-IO-4.py

C. Display the histograms of dp values using the pyplot function plt.hist(), to make sure that the average is much smaller than the standard deviation dp.std() [numpy arrays can be processed by the .mean(), .sum(), .std() and other methods]. For example, the instruction Arrayname[i1:i2].mean() returns the arithmetic mean of a slice of the array named Arrayname, from index i1 to (and including) index i2-1.

D. Prove that if two time series X(t) and Y(t) are strictly linearly coupled as in Y = c*X + b (c, b = const), then the coefficient of correlation r_XY is either r = +1 or r = -1, and in fact r = sign(c).

E. Create your own function computing (without using any specialized statistics modules, just Numpy) the three correlation coefficients r_NA, r_NI, and r_IA (denoting N=Nasdaq(IXIC), A=AAPL, I=INTC, for short). Definitions are placed in the lecture notes to L7. Since such correlation coefficients may change with time, print a listing of 35 yearly values, as well as the average value covering all 35 years. Describe in words your results and conclusions.
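For part E, a minimal sketch of a plain-Numpy correlation coefficient is given below; the function name corr and the array names dp_N, dp_A, dp_I are hypothetical placeholders, and the yearly slicing assumes exactly 252 market days per year.

import numpy as np

def corr(x, y):                     # Pearson correlation coefficient, plain numpy only
    dx = x - x.mean()               # deviations from the mean of x
    dy = y - y.mean()               # deviations from the mean of y
    return np.sum(dx*dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# yearly listing, assuming dp_N, dp_A, dp_I are the daily-increment arrays from part B:
# for yr in range(35):
#     s = slice(252*yr, 252*(yr+1))
#     print(yr+1, corr(dp_N[s], dp_A[s]), corr(dp_N[s], dp_I[s]), corr(dp_I[s], dp_A[s]))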

Extra challenge part, not for credit (for local fame only): Using the simple averaging method .mean(), create a function performing in a loop the operation of boxcar convolution (top-hat kernel smoothing) over an arbitrary number of days nD. A typical nD=11 would mean that from an array named Array the function creates a new array sArray, in which all but the 10 border values (why 10?) are replaced by the limited-time mean values in Array, extending 5 days back and 5 days forward in time.

This will allow you to compute another widely used variant of correlation, in which the mean value <X> = sArray(t). The data in X are the priceline p(t), not the daily increments. <X> is not a global constant but a smoothly varying function of time. With such <X> one can do the summation to obtain things like the covariance <(X-<X>)(Y-<Y>)>, or the variance <(X-<X>)^2> if you like that. <X> is called the boxcar average or running average over nD = 2*5+1 or some other odd number of days. Display the smoothed time series <p> and the deviations from it, p - <p>, before you compute the statistical indicators such as the 35 yearly (or 17 biannual) correlation coefficients. Do the correlations and/or Beta coefficients computed this way differ from the correlations between the differential data of daily increments dp(t)? How strong are all those correlations, and what may be the reason for it?
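A minimal sketch of such a boxcar-smoothing function, assuming it is acceptable to leave the border values simply unsmoothed (the names Array, sArray and nD follow the text above; the function name boxcar is a placeholder):

import numpy as np

def boxcar(Array, nD=11):
    # running (boxcar / top-hat) average over nD days, nD odd;
    # the nD-1 border values (5 at each end for nD=11, hence 10) are left untouched
    half = nD // 2                                    # 5 for nD = 11
    sArray = Array.copy()                             # keep the border values as they are
    for i in range(half, len(Array) - half):          # loop over the interior points
        sArray[i] = Array[i-half : i+half+1].mean()   # mean of 2*half+1 = nD samples
    return sArray

# e.g. smoothed = boxcar(p, 11); deviations = p - smoothed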