Linear Regression in Python

Linear regression is one of the most familiar statistical models used for machine learning in Python. A crucial part of data science is understanding the algorithm and how it works. Linear regression lets us model the relationship between one dependent variable and one or more independent variables.


Understanding regression in simple steps.
Regression analysis is a form of predictive modeling that describes the relationship between a dependent variable and one or more independent variables. Common types include:

  • Linear Regression
  • Logistic Regression
  • Polynomial Regression
  • Stepwise Regression

Linear regression is used in many areas of business and helps in understanding the market. A few examples show where it is useful.

  1. Evaluating sales and estimating progress

Linear regression forecasts growth and estimates the direction of business trends. It can chart how next season's sales are likely to go based on previous sales records.

  2. Understanding the impact of price changes

When the question is how a change in product price will affect the business, linear regression estimates the impact on consumers and how their behavior shifts in response. This helps the business make difficult decisions.

  3. Analysing risk factors

Business always involves risk. When a plan is put into action, regression on previous records and observations can help minimize that risk.

How to use Linear Regression through different techniques in Python

Least Square Method

The method of least squares is a standard approach in regression analysis to approximate the solution of overdetermined systems, i.e., sets of equations in which there are more equations than unknowns (Wikipedia).

Regression line: y = mx + c, where y = dependent variable, x = independent variable, m = slope of the line, and c = y-intercept.
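For reference, the least-squares estimates of m and c for a single independent variable are usually written as below. Here n is the number of observations and the barred symbols are the sample means of x and y; these are the standard closed-form expressions, stated as a sketch rather than a derivation.

```latex
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
c = \bar{y} - m\,\bar{x}
```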


Implementation using Python

Let’s understand this using a dataset of head sizes and brain weights of various people.

# Importing Necessary Libraries
# (%matplotlib inline is a Jupyter notebook magic; remove it in a plain script)
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (20.0, 10.0)

# Reading Data
data = pd.read_csv('headbrain.csv')

# Collecting X and Y
X = data['Head Size(cm^3)'].values
Y = data['Brain Weight(grams)'].values

To get the values of m and c, the means of X and Y must be calculated first.

# Mean X and Y
mean_x = np.mean(X)
mean_y = np.mean(Y)

# Total number of values
n = len(X)

# Using the formula to calculate m and c
numer = 0
denom = 0
for i in range(n):
    numer += (X[i] - mean_x) * (Y[i] - mean_y)
    denom += (X[i] - mean_x) ** 2
m = numer / denom
c = mean_y - (m * mean_x)

# Print coefficients
print(m, c)
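The loop above can also be written without explicit iteration. The following is a minimal sketch that applies the same formula with NumPy array operations and cross-checks the result against np.polyfit. The arrays X_demo and Y_demo are made-up toy values (not taken from headbrain.csv), chosen so that the exact fit is y = 2x + 1.

```python
import numpy as np

# Hypothetical toy data for illustration only (not from headbrain.csv).
X_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y_demo = 2.0 * X_demo + 1.0

# Vectorized form of the same least-squares formula used in the loop.
dx = X_demo - X_demo.mean()
dy = Y_demo - Y_demo.mean()
m_demo = np.sum(dx * dy) / np.sum(dx ** 2)
c_demo = Y_demo.mean() - m_demo * X_demo.mean()

# np.polyfit with degree 1 returns [slope, intercept] and should agree.
m_ref, c_ref = np.polyfit(X_demo, Y_demo, 1)

print(m_demo, c_demo)
```

On real data the two approaches should agree up to floating-point rounding; the vectorized form is typically faster for large arrays.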

The calculated values are then substituted into the regression equation: brainWeight = c + m * headSize

This gives a predicted value of y for each value of x, so the line can be plotted alongside the data.

# Plotting Values and Regression Line
max_x = np.max(X) + 100
min_x = np.min(X) - 100

# Calculating line values x and y
x = np.linspace(min_x, max_x, 1000)
y = c + m * x

# Plotting Line
plt.plot(x, y, color='#52b920', label='Regression Line')

# Plotting Scatter Points
plt.scatter(X, Y, c='#ef4423', label='Scatter Plot')
plt.xlabel('Head Size in cm3')
plt.ylabel('Brain Weight in grams')

# Show the labels defined above and display the figure
plt.legend()
plt.show()
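Besides plotting, the fitted line can be used to predict y for new values of x. A minimal sketch follows; the helper name predict and the coefficient values are hypothetical and chosen only for illustration, they are not the coefficients fitted to headbrain.csv.

```python
import numpy as np

def predict(x, m, c):
    """Return c + m * x for a scalar or an array of x values."""
    return c + m * np.asarray(x, dtype=float)

# Hypothetical coefficients for illustration only.
m_demo, c_demo = 2.0, 1.0

# Predict y for two new x values.
predictions = predict([6.0, 7.0], m_demo, c_demo)
print(predictions)
```

With the real data, the values of m and c computed earlier would be passed in place of m_demo and c_demo.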


R-square Method

R-square measures how close the data are to the fitted regression line.

y = actual value
ȳ = mean value of y
yp = predicted value of y
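
These symbols are commonly combined as shown below. This is the standard textbook definition of R-square, given here on the assumption that it is the formula this section intends; the index i runs over all observations.

```latex
R^2 = 1 - \frac{\sum_{i} \left(y_i - y_{p,i}\right)^2}{\sum_{i} \left(y_i - \bar{y}\right)^2}
```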

This method doesn’t tell you whether the regression model is correct. You can have a low R-square value for a good model, or a high R-square value for a model that doesn’t fit the data well.

Implementation using Python
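
A minimal sketch of the calculation, assuming the common definition of R-square as one minus the ratio of the residual sum of squares to the total sum of squares. The function name r_squared and the sample values are hypothetical and not taken from headbrain.csv.

```python
import numpy as np

def r_squared(y_actual, y_predicted):
    """Compute R-square as 1 - SS_residual / SS_total."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    ss_total = np.sum((y_actual - y_actual.mean()) ** 2)
    ss_residual = np.sum((y_actual - y_predicted) ** 2)
    return 1.0 - ss_residual / ss_total

# Hypothetical values for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])

# Predictions that match the data exactly.
perfect = r_squared(y_true, y_true)

# Predicting the mean for every point explains none of the variation.
baseline = r_squared(y_true, np.full(y_true.shape, y_true.mean()))

print(perfect, baseline)
```

For the head size data, the predicted values would be c + m * X with the coefficients computed in the least squares section.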

Scikit Learn Method

Scikit-learn is a machine learning library that provides a ready-made linear regression model.

It reduces the effort by handling the fitting through the library instead of hand-written formulas.
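
As a rough sketch of what this can look like, the example below fits scikit-learn's LinearRegression to made-up toy data (not headbrain.csv) constructed so that the exact fit is y = 2x + 1. Note that scikit-learn expects the inputs as a 2-D array with one column per feature, hence the reshape.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data for illustration only.
X_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0]).reshape(-1, 1)  # shape (n_samples, 1)
Y_demo = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Fit the model; this estimates the slope and the intercept.
model = LinearRegression()
model.fit(X_demo, Y_demo)

slope = model.coef_[0]            # corresponds to m
intercept = model.intercept_      # corresponds to c
r2 = model.score(X_demo, Y_demo)  # R-square on the data used for fitting

print(slope, intercept, r2)
```

With the real dataset, X would be reshaped in the same way before calling fit, and score would report the R-square of the fitted line.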

These are the techniques used in Python to perform linear regression.