Numpy Tutorial 1

From Quantwiki

########################################
# A brief introduction to numpy arrays #
########################################
#
# http://wiki.quantsoftware.org/index.php?title=Numpy_Tutorial_1
#
# Prereqs: Basic python. "import", built-in data types (numbers, lists, 
#          strings), range
#
# This short tutorial is mostly about introducing numpy arrays, how they're
# different from basic python lists/tuples, and the various ways you can
# manipulate them.  It's intended to be both a runnable python script, and
# a step by step tutorial. 
#
# This tutorial does NOT cover
# 	1) Installing numpy/dependencies. For that see 
#
#			http://docs.scipy.org/doc/numpy/user/install.html
#
#	2) Basic python. This includes getting, installing, running the python
#		interpreter, the basic python data types (strings, numbers, sequences),
#		if statements, or for loops. If you're new to python an excellent place
#		to start is here:
#
#			http://docs.python.org/2/tutorial/
#
#	3) Any numpy libraries in depth. It may include references to utility
#		functions where necessary, but this is strictly a tutorial for 
#		beginners. More advanced documentation is available here:
#
#			(Users guide)
#			http://docs.scipy.org/doc/numpy/user/index.html
#			(Reference documentation)
#			http://docs.scipy.org/doc/numpy/reference/
#
#
#
#
## Let's get started!
print "Importing numpy"
import numpy as np

## This loads the numpy library and lets us refer to it by the shorthand "np",
## which is the convention used in the numpy documentation and in many
## online tutorials/examples 

print "Creating arrays"
## Now let's make an array to play around with. You can make numpy arrays in
## a number of ways.
## Filled with zeros:
zeroArray = np.zeros( (2,3) ) # [[ 0.  0.  0.]
print zeroArray               #  [ 0.  0.  0.]]

## Or ones:
oneArray = np.ones( (2,3) )   # [[ 1.  1.  1.]
print oneArray                #  [ 1.  1.  1.]]

## Or filled with junk:
emptyArray = np.empty( (2,3) ) 
print emptyArray

## Note, emptyArray might look random, but it's just uninitialized which means
## you shouldn't count on it having any particular data in it, even random
## data! If you do want random data you can use random():
randomArray = np.random.random( (2,3) )
print randomArray

## If you're following along and trying these commands out, keep in mind that
## the two calls do different amounts of work. np.empty(...) only reserves
## memory, while np.random.random(...) runs a random number generator to fill
## each spot in the array with a number sampled from the interval [0, 1)
## (0 is possible, 1 is not). On an array this small you won't notice, but on
## large arrays empty(...) is much faster.
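
## A small sketch of the point above. np.random.seed(...) is not used anywhere
## else in this tutorial, and the demo* names are made up for illustration.
## It checks that the samples fall in [0, 1) and shows that fixing the seed
## makes the "random" numbers repeatable:

```python
import numpy as np

np.random.seed(0)                        # fix the seed so reruns match
demoRandom = np.random.random( (2,3) )   # samples from [0, 1)
print(demoRandom.shape)                  # (2, 3)
print(bool(((demoRandom >= 0) & (demoRandom < 1)).all()))  # True

np.random.seed(0)                        # same seed again...
demoRepeat = np.random.random( (2,3) )   # ...gives the same numbers
print(np.array_equal(demoRandom, demoRepeat))              # True
```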

## You can also create an array by hand:
foo = [ [1,2,3],
        [4,5,6]]

myArray = np.array(foo) # [[1 2 3] 
print myArray           #  [4 5 6]]


print "Reshaping arrays"
## Of course, if you're typing out a range for a larger matrix, it's easier to
## use arange(...):
rangeArray = np.arange(6,12).reshape( (2,3) ) # [[ 6  7  8]
print rangeArray                              #  [ 9 10 11]]

## There are two things going on here. First, the arange(...) function returns a
## 1D array similar to what you'd get from using the built-in python function
## range(...) with the same arguments, except it returns a numpy array
## instead of a list.
print np.arange(6,12) # [ 6  7  8  9 10 11]
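
## To make the comparison with range(...) concrete, here is a short sketch
## (the demo* names are hypothetical). Like range, arange stops before the
## second argument; unlike range, it also accepts a fractional step:

```python
import numpy as np

demoList = list(range(6,12))            # plain python list
demoRange = np.arange(6,12)             # numpy array with the same values
print(demoList)                         # [6, 7, 8, 9, 10, 11]
print(demoRange.tolist() == demoList)   # True
print(demoRange.shape)                  # (6,)  a 1D array with 6 elements

demoQuarters = np.arange(0, 1, 0.25)    # 0, 0.25, 0.5, 0.75
print(demoQuarters.tolist())            # [0.0, 0.25, 0.5, 0.75]
```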

## Second, the reshape method takes the data in an existing array, and stuffs it into
## an array with the given shape and returns it.  
print rangeArray.reshape( (3,2) ) # [[ 6  7]
                                  #  [ 8  9]
                                  #  [10 11]]

## The original array's shape doesn't change though.
print rangeArray # [[ 6  7  8]
                 #  [ 9 10 11]]
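
## One caveat worth sketching (the demo* names are made up for illustration):
## for an array like this one, reshape(...) returns a "view" that has its own
## shape but shares its data with the original. Writing through the reshaped
## array therefore shows up in the original, unless you copy() first:

```python
import numpy as np

demoOriginal = np.arange(6,12).reshape( (2,3) )
demoReshaped = demoOriginal.reshape( (3,2) )
print(demoOriginal.shape)     # (2, 3)
print(demoReshaped.shape)     # (3, 2)

demoReshaped[0,0] = 100       # write through the reshaped view
print(demoOriginal[0,0])      # 100

demoIndependent = demoOriginal.reshape( (3,2) ).copy()
demoIndependent[0,0] = -1     # the copy has its own data
print(demoOriginal[0,0])      # 100
```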

## When you use reshape(...) the total number of things in the array must stay
## the same. So reshaping an array with 2 rows and 3 columns into one with 
## 3 rows and 2 columns is fine, but 3x3 or 1x5 won't work
#print rangeArray.reshape( (3,3) ) #ERROR
squareArray = np.arange(1,10).reshape( (3,3) ) #this is fine, 9 elements
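
## If you'd like to see the failure without stopping the script, a sketch
## along these lines should work (demo* names are hypothetical). It also
## shows that you can pass -1 for one dimension and let numpy work it out:

```python
import numpy as np

demoSix = np.arange(6,12)               # 6 elements
demoFailed = False
try:
    demoSix.reshape( (3,3) )            # a 3x3 array needs 9 elements
except ValueError as err:
    demoFailed = True
    print("reshape failed: " + str(err))
print(demoFailed)                       # True

demoInferred = demoSix.reshape( (3,-1) )  # -1 means "whatever fits"
print(demoInferred.shape)               # (3, 2)
```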


print "Accessing array elements"
## Accessing an array is also pretty straightforward. You access a specific
## spot in the table by referring to its row and column inside square brackets
## after the array:
print rangeArray[0,1] #7

## Note that row and column numbers start from 0, not 1! Numpy also lets you 
## refer to ranges inside an array:
print rangeArray[0,0:2] #[6 7]
print squareArray[0:2,0:2] #[[1 2]  # the top left corner of squareArray
                           # [4 5]]

## These ranges work just like slices of python lists. n:m:t specifies a range
## that starts at n, and stops before m, in steps of size t. If any of these
## are left off, they're assumed to be the start, one past the end, and 1
## respectively.
print squareArray[:,0:3:2] #[[1 3]   #skip the middle column
                           # [4 6]
                           # [7 9]]
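
## A few more slices as a sketch of the n:m:t rules (demoSquare is a fresh,
## made-up array so the examples don't depend on squareArray):

```python
import numpy as np

demoSquare = np.arange(1,10).reshape( (3,3) )  # rows: 1 2 3 / 4 5 6 / 7 8 9
print(demoSquare[1:, :2].tolist())     # [[4, 5], [7, 8]]  n or m left off
print(demoSquare[::2, ::2].tolist())   # [[1, 3], [7, 9]]  the four corners
print(demoSquare[::-1, 0].tolist())    # [7, 4, 1]  first column, reversed
print(demoSquare[1].tolist())          # [4, 5, 6]  one index = a whole row
```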

## Also like python lists, you can assign values to specific positions, or
## ranges of values to slices
squareArray[0,:] = np.array(range(1,4)) #set the first row to 1,2,3
squareArray[1,1] = 0                    # set the middle spot to zero
squareArray[2,:] = 1                    # set the last row to ones
print squareArray                       # [[1 2 3]
                                        #  [4 0 6]
                                        #  [1 1 1]]

## Something new to numpy arrays is indexing using an array of indices:
fibIndices = np.array( [1, 1, 2, 3] )
randomRow = np.random.random( (10,1) ) # a column of 10 random numbers
print randomRow
print randomRow[fibIndices] # the second, second, third and fourth elements
                             # of randomRow (indices start from 0)
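
## Because the random values make it hard to see which elements were picked,
## here is the same idea sketched on predictable data (the demo* names are
## made up for illustration):

```python
import numpy as np

demoValues = np.arange(10, 20)               # 10, 11, ..., 19
demoIdx = np.array( [1, 1, 2, 3] )
print(demoValues[demoIdx].tolist())          # [11, 11, 12, 13]

demoColumn = demoValues.reshape( (10,1) )    # same data as a 10x1 column
print(demoColumn[demoIdx].shape)             # (4, 1)  four rows picked
print(demoColumn[demoIdx].tolist())          # [[11], [11], [12], [13]]
```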

## You can also use an array of true/false values to index:
boolIndices = np.array( [[ True, False,  True],
                          [False,  True, False],
                          [ True, False,  True]] )
print squareArray[boolIndices] # a 1D array with the selected values
                                # [1 3 0 1 1]

## It gets a little more complicated with 2D (and higher) arrays.  You need
## two index arrays for a 2D array:
rows = np.array( [[0,0],[2,2]] ) #get the corners of our square array
cols = np.array( [[0,2],[0,2]] )
print squareArray[rows,cols]     #[[1 3]
                                 # [1 1]]
boolRows = np.array( [False, True, False] ) # just the middle row
boolCols = np.array( [True, False, True] )  # Not the middle column
print squareArray[boolRows,boolCols]        # [4 6]
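
## A caveat, sketched with made-up demo* names: when you pass two boolean
## arrays like this, numpy turns each one into a list of positions and pairs
## them up element by element. The example above works because only one row
## is selected. To get every selected row combined with every selected
## column, np.ix_(...) is one option:

```python
import numpy as np

demoSq = np.array( [[1,2,3],
                    [4,0,6],
                    [1,1,1]] )
demoRowMask = np.array( [True, False, True] )   # first and last row
demoColMask = np.array( [True, False, True] )   # first and last column

# paired up: positions (0,0) and (2,2) only
print(demoSq[demoRowMask, demoColMask].tolist())          # [1, 1]
# every selected row with every selected column
print(demoSq[np.ix_(demoRowMask, demoColMask)].tolist())  # [[1, 3], [1, 1]]
```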

print "Operations on arrays"
## One useful trick is to create a boolean matrix based on some test and use
## that as an index in order to get the elements of a matrix that pass the
## test:
sqAverage = np.average(squareArray) # average(...) returns the average of all
                                    # the elements in the given array
betterThanAverage = squareArray > sqAverage
print betterThanAverage             #[[False False  True]
                                    # [ True False  True]
                                    # [False False False]]
print squareArray[betterThanAverage] #[3 4 6]
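
## Boolean matrices can also be counted, combined and inverted. A minimal
## sketch, using a made-up demoData array with the same values squareArray
## has at this point:

```python
import numpy as np

demoData = np.array( [[1,2,3],
                      [4,0,6],
                      [1,1,1]] )
demoMask = demoData > np.average(demoData)   # True where above average
print(int(demoMask.sum()))                   # 3  (True counts as 1)

demoBoth = (demoData > 0) & (demoData < 4)   # & combines two tests
print(demoData[demoBoth].tolist())           # [1, 2, 3, 1, 1, 1]
print(demoData[~demoMask].tolist())          # [1, 2, 0, 1, 1, 1]  ~ inverts
```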

## Indexing like this can also be used to assign values to elements of the
## array. This is particularly useful if you want to filter an array, say by 
## making sure that all of its values are above/below a certain threshold:
sqStdDev = np.std(squareArray) # std(...) returns the standard deviation of
                               # all the elements in the given array
clampedSqArray = np.array(squareArray, dtype=float)
                                    # make a copy of squareArray that will
                                    # be "clamped". It will only contain
                                    # values within one standard deviation
                                    # of the mean. Values that are too low
                                    # or too high will be set to the lower
                                    # and upper limits respectively. We set
                                    # dtype=float because sqAverage
                                    # and sqStdDev are floating point
                                    # numbers, and we don't want to
                                    # truncate them down to integers.
clampedSqArray[ (squareArray-sqAverage) > sqStdDev ] = sqAverage+sqStdDev
clampedSqArray[ (squareArray-sqAverage) < -sqStdDev ] = sqAverage-sqStdDev
print clampedSqArray # [[ 1.          2.          3.        ]
                     #  [ 3.90272394  0.31949828  3.90272394]
                     #  [ 1.          1.          1.        ]]
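
## Clamping is common enough that numpy has a helper for it, np.clip(...).
## The sketch below (demo* names are made up) does the boolean-index version
## and the np.clip version side by side and checks that they agree:

```python
import numpy as np

demoSqArr = np.array( [[1,2,3],
                       [4,0,6],
                       [1,1,1]] )
demoLow = np.average(demoSqArr) - np.std(demoSqArr)    # lower limit
demoHigh = np.average(demoSqArr) + np.std(demoSqArr)   # upper limit

# by hand, with boolean indexing as above
demoManual = np.array(demoSqArr, dtype=float)
demoManual[demoSqArr > demoHigh] = demoHigh
demoManual[demoSqArr < demoLow] = demoLow

# the same thing in one call
demoClipped = np.clip(demoSqArr, demoLow, demoHigh)
print(np.allclose(demoClipped, demoManual))            # True
```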


## Multiplying and dividing arrays by numbers does what you'd expect. It
## multiplies/divides element-wise.
print squareArray * 2 # [[ 2  4  6]
                      #  [ 8  0 12]
                      #  [ 2  2  2]]

## Addition works similarly:
print squareArray + np.ones( (3,3) ) #[[ 2.  3.  4.]
                                     # [ 5.  1.  7.]
                                     # [ 2.  2.  2.]]
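
## Note that np.ones(...) makes floating point numbers by default, so adding
## it to an integer array gives a floating point result. A short sketch of
## how the element type follows the inputs (demoInts is a made-up name):

```python
import numpy as np

demoInts = np.array( [[1,2,3],
                      [4,0,6],
                      [1,1,1]] )
print(demoInts.dtype.kind)                        # i  (an integer type)
print((demoInts + np.ones( (3,3) )).dtype)        # float64
print((demoInts + np.ones( (3,3), dtype=int )).dtype.kind)   # i
print((demoInts + 1).tolist())    # [[2, 3, 4], [5, 1, 7], [2, 2, 2]]
```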

## Multiplying two arrays together (of the same size) is also element-wise:
print squareArray * np.arange(1,10).reshape( (3,3) ) #[[ 1  4  9]
                                                     # [16  0 36]
                                                     # [ 7  8  9]]

## Unless you use the dot(...) function, which does matrix multiplication
## from linear algebra:
matA = np.array( [[1,2],[3,4]] )
matB = np.array( [[5,6],[7,8]] )
print np.dot(matA,matB) #[[19 22]
                        # [43 50]]
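
## To see the difference side by side, here is a sketch with two made-up
## matrices (demoA and demoB). Element-wise multiplication doesn't care about
## order, but matrix multiplication does:

```python
import numpy as np

demoA = np.array( [[1,2],[3,4]] )
demoB = np.array( [[5,6],[7,8]] )
print((demoA * demoB).tolist())         # [[5, 12], [21, 32]]   element-wise
print(np.dot(demoA, demoB).tolist())    # [[19, 22], [43, 50]]  matrix product
print(np.dot(demoB, demoA).tolist())    # [[23, 34], [31, 46]]  order matters

demoIdentity = np.eye(2, dtype=int)     # 2x2 identity matrix
print(np.dot(demoA, demoIdentity).tolist())   # [[1, 2], [3, 4]]
```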

## And that's it! There's a lot more to the numpy library, and there are a few
## things I skipped over here, such as what happens when array dimensions
## don't line up when you're indexing or multiplying them together, so if 
## you're interested, I strongly suggest you head over to the scipy wiki's
## numpy tutorial for a more in depth look at using numpy arrays:
##
##			http://www.scipy.org/Tentative_NumPy_Tutorial