This notebook corresponds to what was shown in class on September 3 on the topic of least-squares fitting of a linear or polynomial model to data.

The following line sets up a Matlab-like environment for scientific computing and data analysis. It should be issued at the start of each new notebook. It is possible to add it to the local startup file so that it is executed automatically, but here we keep it explicit so that the code runs on a default install of the Anaconda Python distribution.

In [1]:
%pylab inline
Populating the interactive namespace from numpy and matplotlib

We take the data from MLS ("Mathematics for the Life Sciences") Example 3.1:

In [2]:
x=[2,5,2,4,6]
y=[4,7,5,8,11]

A plot of the data. The format string 'o' tells plot to draw a circular marker at each data point and no connecting line.

In [3]:
plot(x,y,'o')
Out[3]:
[<matplotlib.lines.Line2D at 0x7fb1f0f8b290>]

We now fit a straight line (a polynomial of degree 1) to the data, minimizing the mean square deviation. polyfit returns an array containing the coefficients of the fitted polynomial, highest degree first, so here the slope comes before the intercept.

In [4]:
p=polyfit(x,y,1)
p
Out[4]:
array([ 1.40625,  1.65625])
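To see where these two numbers come from, the fit can be reproduced by hand. The following is a minimal sketch, not necessarily how polyfit works internally: it assumes the model y ≈ a·x + b, builds the design matrix A with columns [x, 1], and solves the normal equations (AᵀA)c = Aᵀy. The names A, coeffs, slope and intercept are introduced here for illustration only.

```python
import numpy as np

# Data from MLS Example 3.1, as used above
x = np.array([2, 5, 2, 4, 6], dtype=float)
y = np.array([4, 7, 5, 8, 11], dtype=float)

# Design matrix with columns [x, 1]; this column order matches
# polyfit's convention of listing the highest degree first
A = np.column_stack([x, np.ones_like(x)])

# Solve the normal equations (A^T A) c = A^T y for c = [slope, intercept]
coeffs = np.linalg.solve(A.T @ A, A.T @ y)
slope, intercept = coeffs

print(slope, intercept)
```

For larger or badly scaled problems, a dedicated least-squares solver such as np.linalg.lstsq is usually preferable to forming AᵀA explicitly; the normal equations are used here only because they make the minimization condition visible.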

We can evaluate the polynomial using polyval, here at the point x=5.

In [5]:
polyval(p,5)
Out[5]:
8.6874999999999964
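The evaluation itself is simple enough to write out. As a sketch, the hypothetical helper horner below (a name chosen for this example, not a NumPy function) evaluates a polynomial from a coefficient list in the same highest-degree-first order that polyfit returns, using Horner's scheme. The coefficients are typed in from the rounded output above rather than taken from the fit.

```python
import numpy as np

def horner(coeffs, t):
    """Evaluate a polynomial at t; coeffs are ordered highest degree first."""
    result = 0.0
    for c in coeffs:
        # Multiply the running value by t, then add the next coefficient
        result = result * t + c
    return result

# Slope and intercept as displayed in Out[4]
p = [1.40625, 1.65625]
value = horner(p, 5)

print(value, np.polyval(p, 5))
```

With the displayed coefficients the result is 8.6875. The notebook output above differs from this in the last few digits because the coefficients actually returned by polyfit carry a small floating-point rounding error that the printed array does not show.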

To plot the result of the fit, we define a vector of points at which to evaluate the model. Here, we use 50 equidistant points in the interval [0,10].

In [6]:
xx=linspace(0,10,50)

Now the plot command, where we first plot the fitted curve, then the raw data points as above.

In [7]:
plot(xx,polyval(p,xx),x,y,'o')
Out[7]:
[<matplotlib.lines.Line2D at 0x7fb1f0e93150>,
 <matplotlib.lines.Line2D at 0x7fb1f0e933d0>]

We can also fit higher-degree polynomials, here a cubic. But note that this is very likely "overfitting" the data, i.e., getting a seemingly good fit only because the model has many parameters, which introduce spurious features that are not robustly reproducible by the underlying experiment.

In [8]:
p=polyfit(x,y,3)
plot(xx,polyval(p,xx),x,y,'o')
Out[8]:
[<matplotlib.lines.Line2D at 0x7fb1f0dccc10>,
 <matplotlib.lines.Line2D at 0x7fb1f0dcce90>]
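One way to put numbers on this is to compare the residual sum of squares of the two fits on the data they were fitted to. The sketch below is only an illustration; the helper rss and the variable names are assumptions of this example, not part of the notebook.

```python
import numpy as np

# Data from MLS Example 3.1, as used above
x = np.array([2, 5, 2, 4, 6], dtype=float)
y = np.array([4, 7, 5, 8, 11], dtype=float)

def rss(degree):
    """Residual sum of squares of a least-squares polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.sum(residuals**2))

rss_line = rss(1)
rss_cubic = rss(3)

print("degree 1:", rss_line)
print("degree 3:", rss_cubic)
```

The residual drops from about 4.69 for the straight line to 0.5 for the cubic. The cubic has four coefficients and the data contain only four distinct x values, so the curve passes exactly through three of the points and through the average of the two measurements at x=2. The remaining 0.5 is just the scatter between those two measurements, which no function of x can remove. A small residual on the fitting data therefore says little about how well the model would describe new measurements.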