We start by setting up the environment.

In [1]:
%pylab inline
Populating the interactive namespace from numpy and matplotlib

The data is from Chapter 4 of MLS. The variable x corresponds to the weight of various animals measured in grams; the variable y corresponds to the heart rate of the animal in beats per minute.

In [2]:
x=[4,25,200,300,2000,5000,30000,50000,70000,450000,500000,3000000]  # body weight [g]
y=[660,670,420,300,205,120,85,70,72,38,40,48]  # heart rate [bpm]
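
Since x and y are plain Python lists, the elementwise arithmetic used further below (such as raising x to a power) only works because NumPy's scalar types coerce the lists to arrays on the fly. A more robust pattern, not used in this session, is to convert them once up front:

x = array(x)  # arrays support elementwise operations such as x**k directly
y = array(y)
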
In [3]:
plot(x,y,'o')
Out[3]:
[<matplotlib.lines.Line2D at 0x7f048f4e6110>]

The data is clustered in one corner of the coordinate system, which strongly suggests some kind of logarithmic scale. Let's first put only the y-axis on a logarithmic scale. If y depended on x exponentially, we would see a straight line.
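
To see why, note that y = a*exp(b*x) implies log(y) = log(a) + b*x, which is linear in x. A quick synthetic check, with arbitrary values a = 3 and b = -0.5, illustrates this:

x_demo = linspace(0, 10, 50)       # arbitrary grid
y_demo = 3.0 * exp(-0.5 * x_demo)  # arbitrary exponential decay
semilogy(x_demo, y_demo)           # shows up as a straight line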

In [4]:
semilogy(x,y,'o')
Out[4]:
[<matplotlib.lines.Line2D at 0x7f048f3f97d0>]

Not much better. So let's test for an allometric relationship, which would show up as a straight line in a doubly logarithmic plot.
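
This works because a power law y = c*x**k turns into log(y) = log(c) + k*log(x), i.e. a straight line in log(x) with slope k. A synthetic example, with arbitrary values c = 700 and k = -0.2:

x_demo = logspace(0, 6, 50)      # arbitrary log-spaced grid
y_demo = 700.0 * x_demo**(-0.2)  # arbitrary power law
loglog(x_demo, y_demo)           # shows up as a straight line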

In [5]:
loglog(x,y,'o')
Out[5]:
[<matplotlib.lines.Line2D at 0x7f048f3b1790>]

This rather strongly indicates an allometric relationship. So let's fit a line on the log-log scale.

In [6]:
l1=polyfit(log(x),log(y),1)  # straight-line fit on the log-log scale
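
polyfit returns the coefficients highest power first, so l1[0] is the slope of the log-log line (the allometric exponent k) and l1[1] is the intercept (the logarithm of the prefactor c). One way to inspect them:

k, logc = l1  # slope and intercept of the straight-line fit
print('exponent k = %.3f, prefactor c = %.3f' % (k, exp(logc)))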

The first and last data points look like outliers, so we might want to exclude them when fitting:

In [7]:
l2=polyfit(log(x[1:-1]),log(y[1:-1]),1)  # same fit, excluding the first and last points

Let's now define the fitting functions for both cases and plot them together with the data:

In [8]:
def f1(x):
    # allometric fit to all data: y = c * x**k with k = l1[0] and c = exp(l1[1])
    return exp(l1[1]) * x**l1[0]

def f2(x):
    # the same fit, excluding the first and last data points
    return exp(l2[1]) * x**l2[0]

loglog(x, y, 'o',
       x,f1(x),
       x,f2(x))

legend(('Data','Allometric Fit','Allometric Fit Excluding Endpoints'))

xlabel('Weight [g]')
ylabel('Heart rate [bpm]')
Out[8]:
<matplotlib.text.Text at 0x7f048f206650>
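
Note that the fitted curves are evaluated only at the twelve data points; since a power law is a straight line on log-log axes, connecting these points is visually fine here. For a smooth curve one could instead evaluate the fits on a dense log-spaced grid, for example:

xs = logspace(log10(min(x)), log10(max(x)), 200)  # dense grid spanning the data
loglog(x, y, 'o', xs, f1(xs), xs, f2(xs))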

We can also quantify the goodness of fit using the correlation coefficient. Note that NumPy's corrcoef does not return a single number; it always returns the full correlation matrix containing all pairwise correlation coefficients between the given data vectors.

In [9]:
corrcoef(x,y)
Out[9]:
array([[ 1.        , -0.33406368],
       [-0.33406368,  1.        ]])

The value we are interested in is the off-diagonal element with index [0,1] (the matrix is symmetric, so [1,0] would do equally well). The 1s on the diagonal simply reflect that each data vector is perfectly correlated with itself.

In [10]:
corrcoef(x,y)[0,1]
Out[10]:
-0.33406368101442896

This shows that the linear correlation is weak, in agreement with what we see in the first plot above. Similarly, the correlation

In [11]:
corrcoef(x,log(y))[0,1]
Out[11]:
-0.44281354486633406

which tests for an exponential relationship, is still weak, as seen in the second plot above. Testing for an allometric relationship instead,

In [12]:
corrcoef(log(x),log(y))[0,1]
Out[12]:
-0.97679283764252967

shows a rather strong correlation. Excluding the first and last data points,

In [13]:
corrcoef(log(x[1:-1]),log(y[1:-1]))[0,1]
Out[13]:
-0.99554869734545437

is even better. This result says that

In [14]:
corrcoef(log(x[1:-1]),log(y[1:-1]))[0,1]**2 * 100
Out[14]:
99.111720878623117

percent of the variance on the doubly logarithmic scale is explained by the fit.
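
The same quantity, usually called R-squared, can be computed in one step; a small helper function (hypothetical, not part of the original session) makes the calculation explicit:

def variance_explained(u, v):
    # squared correlation coefficient between u and v, in percent
    r = corrcoef(u, v)[0, 1]
    return r**2 * 100

variance_explained(log(x[1:-1]), log(y[1:-1]))  # reproduces the value above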