## How to do a Simple Linear Regression in Python

Have you ever wanted to fit a line to a set of data points using Python? Well, if you ever do, here’s how to do it:

**Method 1:**

SciPy (part of the NumPy/SciPy stack) has a built-in function, `stats.linregress`, that you can use. I didn’t choose this method because I didn’t want to fool with installing a package for my simple one-off need. There’s an example of using SciPy’s linear regression function here.
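For reference, a minimal sketch of what the SciPy route might look like, assuming SciPy is installed and using the same four test points that appear in the Method 2 test further down. `linregress` returns the correlation coefficient `rvalue`, so R^2 is obtained by squaring it.

```python
from scipy import stats

# Same test data as the linreg test case in Method 2
X = [1, 2, 3, 4]
Y = [357.14, 53.57, 48.78, 10.48]

# linregress returns slope, intercept, rvalue, pvalue and stderr
result = stats.linregress(X, Y)
slope = result.slope
intercept = result.intercept
r_squared = result.rvalue ** 2  # rvalue is r, not R^2

print(slope, intercept, r_squared)
```

The three printed values should agree with the slope, intercept, and R^2 quoted in the Method 2 test comment (about -104.477, 378.685, and 0.7025).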

**Method 2:**

I found and modified the linreg function on this helpful website: Simple Recipes in Python by William Park.

Here is my modified version. I only cleaned up the formatting and added the R^2 value to the return values:

    from math import sqrt

    def linreg(X, Y):
        """
        Summary
            Linear regression of y = ax + b
        Usage
            real, real, real = linreg(list, list)
        Returns coefficients to the regression line "y=ax+b" from x[] and y[], and R^2 Value
        """
        if len(X) != len(Y):  raise ValueError, 'unequal length'
        N = len(X)
        Sx = Sy = Sxx = Syy = Sxy = 0.0
        for x, y in map(None, X, Y):
            Sx = Sx + x
            Sy = Sy + y
            Sxx = Sxx + x*x
            Syy = Syy + y*y
            Sxy = Sxy + x*y
        det = Sxx * N - Sx * Sx
        a, b = (Sxy * N - Sy * Sx)/det, (Sxx * Sy - Sx * Sxy)/det
        meanerror = residual = 0.0
        for x, y in map(None, X, Y):
            meanerror = meanerror + (y - Sy/N)**2
            residual = residual + (y - a * x - b)**2
        RR = 1 - residual/meanerror
        ss = residual / (N-2)
        Var_a, Var_b = ss * N / det, ss * Sxx / det
        #print "y=ax+b"
        #print "N= %d" % N
        #print "a= %g \\pm t_{%d;\\alpha/2} %g" % (a, N-2, sqrt(Var_a))
        #print "b= %g \\pm t_{%d;\\alpha/2} %g" % (b, N-2, sqrt(Var_b))
        #print "R^2= %g" % RR
        #print "s^2= %g" % ss
        return a, b, RR

    if __name__=='__main__':
        #testing
        X=[1,2,3,4]
        Y=[357.14,53.57,48.78,10.48]
        print linreg(X,Y)
        #should be:
        #Slope      Y-Int    R^2
        #-104.477   378.685  0.702499064
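The listing above is Python 2: `raise ValueError, '...'`, the `print` statement, and `map(None, X, Y)` all fail under Python 3. Below is a sketch of how the same function might be ported, assuming Python 3 and swapping `map(None, X, Y)` for `zip(X, Y)`. It is an adaptation rather than the original recipe: the variance terms (`Syy`, `ss`, `Var_a`, `Var_b`) are left out because they only fed the commented-out prints.

```python
def linreg(X, Y):
    """Return (a, b, R^2) for the least-squares line y = a*x + b."""
    if len(X) != len(Y):
        raise ValueError('unequal length')
    N = len(X)

    # Accumulate the sums needed for the normal equations
    Sx = Sy = Sxx = Sxy = 0.0
    for x, y in zip(X, Y):
        Sx += x
        Sy += y
        Sxx += x * x
        Sxy += x * y

    det = Sxx * N - Sx * Sx
    a = (Sxy * N - Sy * Sx) / det    # slope
    b = (Sxx * Sy - Sx * Sxy) / det  # intercept

    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    mean_y = Sy / N
    meanerror = residual = 0.0
    for x, y in zip(X, Y):
        meanerror += (y - mean_y) ** 2
        residual += (y - a * x - b) ** 2
    RR = 1 - residual / meanerror

    return a, b, RR


if __name__ == '__main__':
    X = [1, 2, 3, 4]
    Y = [357.14, 53.57, 48.78, 10.48]
    print(linreg(X, Y))
```

Because the lengths are checked up front, `zip` visits exactly the same pairs that `map(None, X, Y)` did in Python 2.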

By the way, I used Excel’s built-in function `linest` to make sure `linreg` is working properly.

*Please don’t hesitate to post any questions as comments. I wasn’t sure of the statistics background of my audience (all 3 of you, hi mom!).*


Instead of using the map(None, X, Y), you could also use the built-in zip function and do zip(X,Y). I had never seen the idiom of using map with function=None before and had to look up what it did…

Good point, Stan. It looks like the code I modified was written in 1998. Perhaps they didn’t have zip() back then?

Hey, thanks a lot for making the linear regression script! I’m using it in a small project on the colour patterns of flounders!

Great work,

Rob