How to do a Simple Linear Regression in Python
Have you ever wanted to fit a line to a set of data points using Python? Well, if you ever do, here’s how to do it:
Method 1:
The SciPy package has a built-in function you can use: scipy.stats.linregress. I didn’t choose this method because I didn’t want to bother installing a package for my simple one-off need. There’s an example of using SciPy’s linear regression function here.
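For reference, a minimal sketch of the SciPy route is shown below. It assumes SciPy is installed; the helper name fit_with_scipy is made up for this illustration, and it returns None when SciPy is missing rather than failing. The R^2 value is taken as the square of the rvalue attribute that linregress reports.

```python
def fit_with_scipy(x, y):
    """Return (slope, intercept, R^2) using scipy.stats.linregress.

    fit_with_scipy is a hypothetical helper name used only for this sketch.
    Returns None if SciPy is not installed.
    """
    try:
        from scipy import stats
    except ImportError:
        return None
    result = stats.linregress(x, y)
    # rvalue is the correlation coefficient r; square it to get R^2.
    return result.slope, result.intercept, result.rvalue ** 2


fit = fit_with_scipy([1, 2, 3, 4], [357.14, 53.57, 48.78, 10.48])
if fit is None:
    print("SciPy is not installed")
else:
    print("slope=%g intercept=%g R^2=%g" % fit)
```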
Method 2:
I found and modified the linreg function on this helpful website: Simple Recipes in Python by William Park.
Here is my version, changed only to clean up the formatting and to include the R^2 value in the return values:
# Written for Python 2 (print statement, map(None, ...)).
from math import sqrt

def linreg(X, Y):
    """
    Summary
        Linear regression of y = ax + b
    Usage
        real, real, real = linreg(list, list)
    Returns coefficients to the regression line "y=ax+b" from x[] and y[], and R^2 Value
    """
    if len(X) != len(Y): raise ValueError, 'unequal length'
    N = len(X)
    Sx = Sy = Sxx = Syy = Sxy = 0.0
    for x, y in map(None, X, Y):
        Sx = Sx + x
        Sy = Sy + y
        Sxx = Sxx + x*x
        Syy = Syy + y*y
        Sxy = Sxy + x*y
    det = Sxx * N - Sx * Sx
    a, b = (Sxy * N - Sy * Sx)/det, (Sxx * Sy - Sx * Sxy)/det
    meanerror = residual = 0.0
    for x, y in map(None, X, Y):
        meanerror = meanerror + (y - Sy/N)**2
        residual = residual + (y - a * x - b)**2
    RR = 1 - residual/meanerror
    ss = residual / (N-2)
    Var_a, Var_b = ss * N / det, ss * Sxx / det
    #print "y=ax+b"
    #print "N= %d" % N
    #print "a= %g \\pm t_{%d;\\alpha/2} %g" % (a, N-2, sqrt(Var_a))
    #print "b= %g \\pm t_{%d;\\alpha/2} %g" % (b, N-2, sqrt(Var_b))
    #print "R^2= %g" % RR
    #print "s^2= %g" % ss
    return a, b, RR

if __name__=='__main__':
    #testing
    X=[1,2,3,4]
    Y=[357.14,53.57,48.78,10.48]
    print linreg(X,Y)
    #should be:
    #Slope     Y-Int    R^2
    #-104.477  378.685  0.702499064
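The listing above uses Python 2 syntax (the print statement, the raise ValueError, '...' form, and map(None, X, Y)), so it will not run unchanged on Python 3. Below is a rough Python 3 sketch of the same calculation. It is my own adaptation, not William Park's code: it pairs the values with zip, and it leaves out the variance terms because they only fed the commented-out print lines.

```python
def linreg(X, Y):
    """Fit y = a*x + b by ordinary least squares.

    Returns (a, b, RR): slope, intercept, and the R^2 value.
    """
    if len(X) != len(Y):
        raise ValueError('unequal length')
    N = len(X)
    # Accumulate the sums used by the closed-form least-squares solution.
    Sx = Sy = Sxx = Sxy = 0.0
    for x, y in zip(X, Y):
        Sx += x
        Sy += y
        Sxx += x * x
        Sxy += x * y
    det = Sxx * N - Sx * Sx
    a = (Sxy * N - Sy * Sx) / det
    b = (Sxx * Sy - Sx * Sxy) / det
    # R^2 = 1 - (residual sum of squares) / (total sum of squares about the mean).
    meanerror = residual = 0.0
    for x, y in zip(X, Y):
        meanerror += (y - Sy / N) ** 2
        residual += (y - a * x - b) ** 2
    RR = 1 - residual / meanerror
    return a, b, RR


X = [1, 2, 3, 4]
Y = [357.14, 53.57, 48.78, 10.48]
print(linreg(X, Y))
```

On the test data from the post this should reproduce the slope, intercept, and R^2 values listed above, up to floating-point rounding.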
By the way, I checked the results against Excel’s built-in LINEST function to make sure this function is working properly.
Please don’t hesitate to post any questions as comments. I wasn’t sure of the statistics background of my audience (all 3 of you, hi mom!).
Instead of using the map(None, X, Y), you could also use the built-in zip function and do zip(X,Y). I had never seen the idiom of using map with function=None before and had to look up what it did…
Good point, Stan. It looks like the code I modified was written in 1998. Perhaps they didn’t have zip() back then?
Hey, thanks a lot for making the linear regression script! I’m using it in a small project on the colour patterns of flounders!
Great work,
Rob