# Using NumPy Arrays to Calculate pi ````{admonition} Overview :class: overview Questions: - What are the differences between numpy arrays and lists? Objectives: - Be able to name the differences between Python lists and numpy arrays. - Understand the idea of broadcasting. ```` ## Rewriting our calculation of pi using NumPy We will now return to our calculation of $\pi$. Instead of using the Python Standard Library functions, we will examine how this calculation could have been made much simpler using the NumPy library. Recall our former code for calculating $\pi$ ````{tab-set-code} ```{code-block} python n_samples = 100 num_inside = 0 for i in range(n_samples): # Generate a random point between 0 and 1 for x. x = random.random() # Generate a random point between 0 and 1 for y. y = random.random() # Calculate the radius of this point r = math.sqrt(x ** 2 + y ** 2) if r < 1: num_inside += 1 pi = 4 * (num_inside) / n_samples ``` ```` To start using numpy, we will first import it. ````{tab-set-code} ```{code-block} python import numpy as np ``` ```` We set the same number of samples: ````{tab-set-code} ```{code-block} python n_samples = 100 ``` ```` Next, we generate 100 random points. In our previous code, we did this using the `random.random` function in a `for` loop. NumPy also has a `random.random` function. You can try it out: ````{tab-set-code} ```{code-block} python np.random.random() ``` ```` ````{tab-set-code} ```{code-block} output 0.627385792598836 ``` ```` When no argument is given to `np.random.random` it behaves exactly as we are used to `random.random`. It gives us one random number. However, you can also give the function a size to generate an array of random numbers. For us, we would like a random array that is size `n_samples, 2`. ````{tab-set-code} ```{code-block} python random_numbers = np.random.random((n_samples, 2)) ``` ```` If you check this variable, you will see it is an array with 100 rows and 2 columns. The next thing we need to do is to square and sum the x and y values. We can square all the values at once because of element-wise operations: ````{tab-set-code} ```{code-block} python r_squared = random_numbers ** 2 ``` ```` Next, we want to sum all of the rows of r_squared. We will want to use the numpy function `sum` which we learned about in the last module. Executing ````{tab-set-code} ```{code-block} python np.sum(r_squared) ``` ```` will give us the value of the sum of all the elements. We want a sum of the rows, so we will want to use the `axis` arugment. To figure out the axis we want, check the shape of `r_squared` ````{tab-set-code} ```{code-block} python r_squared.shape ``` ```` ````{tab-set-code} ```{code-block} output (100, 2) ``` ```` If we want the sum of the rows, we should use `axis=1 `. We expect that this sum should have 100 values, so check this after your calculation: ````{tab-set-code} ```{code-block} python vals = np.sum(r_squared, axis=1) vals.shape ``` ```` ````{tab-set-code} ```{code-block} output (100, ) ``` ```` Next, we can easily count the number of points inside the circle using logical comparison with the numpy array. ````{tab-set-code} ```{code-block} python points_inside = vals < 1 num_inside = np.sum(points_inside) ``` ```` Finally, we estimate $\pi$ ````{tab-set-code} ```{code-block} python pi = 4 * num_inside / n_samples ``` ```` Final code: ````{tab-set-code} ```{code-block} python import math import numpy as np # Start by using 100 samples. n_samples = 100 random_numbers = np.random.random(size=(n_samples,2)) vals = np.sum(random_numbers**2, axis=1) vals = np.sqrt(vals) num_inside = np.sum(vals < 1) pi = 4 * num_inside / n_samples ``` ```` ## Comparing Performace : NumPy vs Python Standard Library Let's try timing how long it takes to calculate $$\pi$$ with 10 million samples for each method: ````{tab-set-code} ```{code-block} python import time import random start = time.time() # Start by using 100 samples. n_samples = 1000000 num_inside = 0 for i in range(n_samples): # Generate a random point between 0 and 1 for x. x = random.random() # Generate a random point between 0 and 1 for y. y = random.random() ## Calculate the radius of this point. r = math.sqrt(x ** 2 + y ** 2) # if this value is less than 1, the point is inside the circle. if r < 1: num_inside += 1 4 * num_inside / n_samples end = time.time() elapsed_time = end - start print(elapsed_time) ``` ```` ````{tab-set-code} ```{code-block} output 0.6184561252593994 ``` ```` Do the same thing with the method which uses NumPy: ````{tab-set-code} ```{code-block} python import time start = time.time() n_samples = 10000000 random_numbers = np.random.random(size=(n_samples,2)) vals = np.sum(random_numbers**2, axis=1) num_inside = np.sum(vals < 1) pi = 4 * num_inside / n_samples end = time.time() elapsed_time = end - start print(elapsed_time) ``` ```` ````{tab-set-code} ```{code-block} output 0.059281349182128906 ``` ```` We can see that our NumPy version is about 10 times faster than the one written in pure Python. This is a huge advantage to using NumPy - it can make your Python code much faster. In our case, we were only waiting 5 seconds for our first calculation, so the pure Python version may not bother us that much. However, there are some cases (such as our MC simulation) which become very slow using pure Python for large systems. For your homework, your task (it's a bonus), is to rewrite your MC simulation to use NumPy and to test the speed up. See the description on the next page for more information! ````{admonition} Key Points :class: key - NumPy arrays which are the same size use element-wise operations when added or subtracted. - NumPy uses something called *broadcasting* for arrays which are not the same size to allow arrays to be added or multiplied. - NumPy has extensive documentation online - you should check this out if you need to do a computation. ````