How to do a Simple Linear Regression in Python

Have you ever wanted to fit a line to a set of data points using Python? Well, if you ever do, here’s how to do it:

Method 1:
The SciPy package (which builds on NumPy) has a built-in function, stats.linregress, that you can use. I didn’t choose this method because I didn’t want to fool with installing a package for my simple one-off need. There’s an example of using SciPy’s linear regression function here.
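
For readers who do have SciPy installed, a minimal sketch of Method 1 might look like the following. The data points are made up for illustration (they lie exactly on y = 2x + 1) and are not from this post. Note that linregress reports the correlation coefficient r, so it is squared here to get R^2.

```python
from scipy import stats  # third-party package; assumes SciPy is installed

# Illustrative data only (an assumption for this sketch): points on y = 2x + 1
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

result = stats.linregress(x, y)

slope = result.slope          # fitted slope of the line
intercept = result.intercept  # fitted y-intercept
r_squared = result.rvalue ** 2  # rvalue is r, not R^2

print(slope, intercept, r_squared)
```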

Method 2:
I found and modified the linreg function on this helpful website: Simple Recipes in Python by William Park.

Here is my modified version. I only cleaned up the formatting and added the R^2 value to the return values:

from math import sqrt  # only needed by the commented-out print statements below

def linreg(X, Y):
    """
    Linear regression of y = ax + b
    Usage: real, real, real = linreg(list, list)
    Returns coefficients to the regression line "y=ax+b" from x[] and y[], and R^2 Value
    """
    if len(X) != len(Y):
        raise ValueError('unequal length')
    N = len(X)
    Sx = Sy = Sxx = Syy = Sxy = 0.0
    # Accumulate the sums of x, y, x*x, y*y and x*y over the paired points
    for x, y in map(None, X, Y):
        Sx = Sx + x
        Sy = Sy + y
        Sxx = Sxx + x*x
        Syy = Syy + y*y
        Sxy = Sxy + x*y
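    # det equals N times the sum of squared deviations of x from its mean
    det = Sxx * N - Sx * Sx
    # a is the slope, b is the y-intercept
    a, b = (Sxy * N - Sy * Sx)/det, (Sxx * Sy - Sx * Sxy)/det
    meanerror = residual = 0.0
    for x, y in map(None, X, Y):
        meanerror = meanerror + (y - Sy/N)**2  # total sum of squares about the mean of y
        residual = residual + (y - a * x - b)**2  # squared residuals from the fitted line
    RR = 1 - residual/meanerror  # coefficient of determination, R^2
    ss = residual / (N-2)  # residual variance estimate, s^2
    Var_a, Var_b = ss * N / det, ss * Sxx / det  # variances of the slope and intercept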
    #print "y=ax+b"
    #print "N= %d" % N
    #print "a= %g \\pm t_{%d;\\alpha/2} %g" % (a, N-2, sqrt(Var_a))
    #print "b= %g \\pm t_{%d;\\alpha/2} %g" % (b, N-2, sqrt(Var_b))
    #print "R^2= %g" % RR
    #print "s^2= %g" % ss
    return a, b, RR

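if __name__=='__main__':
    # X and Y must be defined as equal-length lists of numbers before this call;
    # the data set behind the numbers below is not included in this post.
    print(linreg(X, Y))
    #should be:
    #Slope	Y-Int	R
    #-104.477	378.685	0.702499064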
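
For reference, the sums accumulated in linreg correspond to the usual closed-form least-squares estimates. Writing S_x, S_y, S_xx and S_xy for the sums of x, y, x*x and x*y over the N points, and \bar{y} = S_y / N for the mean of y (a symbol introduced here, not in the code), the quantities the function computes can be written as:

```latex
\begin{aligned}
\mathrm{det} &= N S_{xx} - S_x^2 \\
a &= \frac{N S_{xy} - S_x S_y}{\mathrm{det}} \\
b &= \frac{S_{xx} S_y - S_x S_{xy}}{\mathrm{det}} \\
R^2 &= 1 - \frac{\sum_i (y_i - a x_i - b)^2}{\sum_i (y_i - \bar{y})^2}
\end{aligned}
```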

By the way, I used Excel’s built-in LINEST function to make sure this function is working properly.
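
The listing above is written for Python 2: map(None, X, Y) pairs up the two lists there, but Python 3 rejects it. A minimal Python 3 sketch of the same computation follows. The name linreg3, the guard against identical x values, and the check data are assumptions added for this sketch and are not part of the original recipe. It uses zip to pair the points, which is equivalent here because the lengths are checked first.

```python
def linreg3(xs, ys):
    """Least-squares fit of y = a*x + b; returns (a, b, r_squared)."""
    if len(xs) != len(ys):
        raise ValueError('unequal length')
    n = len(xs)
    sx = sy = sxx = sxy = 0.0
    for x, y in zip(xs, ys):  # zip replaces the Python 2 map(None, X, Y)
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    det = sxx * n - sx * sx
    if det == 0:
        # Added guard (not in the original): a vertical line has no slope
        raise ValueError('all x values are identical')
    a = (sxy * n - sy * sx) / det   # slope
    b = (sxx * sy - sx * sxy) / det  # y-intercept
    mean_y = sy / n
    total = residual = 0.0
    for x, y in zip(xs, ys):
        total += (y - mean_y) ** 2         # total sum of squares
        residual += (y - a * x - b) ** 2   # sum of squared residuals
    # Like the original, this divides by zero if every y value is the same
    r_squared = 1 - residual / total
    return a, b, r_squared


# Made-up check data: points lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b, r_squared = linreg3(xs, ys)
print(a, b, r_squared)
```

Because the check points lie exactly on a line, the fit should recover a slope of 2, an intercept of 1, and an R^2 of 1. That makes for a quick sanity test, alongside a comparison with a spreadsheet.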

Please don’t hesitate to post any questions as comments. I wasn’t sure of the statistics background of my audience (all 3 of you ;-) , hi mom!).


3 Responses to “How to do a Simple Linear Regression in Python”

  1. Stan Seibert says:

    Instead of using the map(None, X, Y), you could also use the built-in zip function and do zip(X,Y). I had never seen the idiom of using map with function=None before and had to look up what it did…

  2. Good point, Stan. It looks like the code I modified was written in 1998. Perhaps they didn’t have zip() back then?

  3. robert rankin says:

    Hey, thanks a lot for making the linear regression script! I’m using it in a small project on the colour patterns of flounders!

    Great work,