Sayan Saha
5 min read · May 15, 2020

Machine Learning Basics: Polynomial Regression

We have seen the linear regression model. But not all kinds of data are fit well by a linear regression model.

Suppose our data points follow a curve. To represent such curved data, we need a curve and not our straight line. Hence, here comes Polynomial Regression.

Required Modules:

  • pandas: for dataset reading & extraction.
  • sklearn: for polynomial regression.
  • matplotlib: for plotting data.

Now, let’s start.

Importing dataset

import pandas as pd

data = pd.read_csv('..\Datasets\polynomial.csv')
data.head()
Output: the first five rows of the dataset

.head() prints the first 5 rows of a dataset.

Extracting X_data and Y_data

X_data = data.iloc[ : , 0:1]  # first column (Age); 0:1 keeps it 2-D, as sklearn expects
Y_data = data.iloc[ : , 1]    # second column (Height)

Plotting the data points

import matplotlib.pyplot as plt

plt.scatter(x= X_data, y= Y_data)
plt.xlabel('Age')
plt.ylabel('Height')
Output: Scatter plot of datapoints

Here, matplotlib.pyplot is used to plot the dataset.

.scatter() is a function that plots each (x, y) pair as a point; this is what we call a scatter plot. We need to supply the x values and the y values to draw it.

The graph shows that the points do not lie on a straight line. Hence, linear regression will not fit well.

Transforming the x_data

We have to transform the x data because we only have the 'x' column, but the form of polynomial regression is:

y = w₀ + w₁*x + w₂*x² + … + wₙ*xⁿ

We need the values of x², x³, …, xⁿ.

We will not compute these values manually; instead, we will use a transformer from sklearn, i.e., PolynomialFeatures().

This transformer computes the required powers of x and returns a matrix whose columns hold those computed powers of the input. In the case of only one input variable, i.e. 'x', the following pattern is followed. (If the input has two variables, say 'x' and 'w', then the i-th column may not represent the i-th power; see the sketch after the list below. However, that is not our concern here.)

0th index column → 0th power

1st index column → 1st power

2nd index column → 2nd power

3rd index column → 3rd power
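
To illustrate that caveat, here is a minimal sketch (assuming a recent scikit-learn version that provides get_feature_names_out(); the sample values are made up) showing the column order for two input variables:

from sklearn.preprocessing import PolynomialFeatures

# one sample with two input variables, say x = 2 and w = 3
poly = PolynomialFeatures(degree = 2)
print(poly.fit_transform([[2, 3]]))
# [[1. 2. 3. 4. 6. 9.]]

# the columns mix powers and cross terms, so the i-th column
# is not simply the i-th power of one variable
print(poly.get_feature_names_out(['x', 'w']))
# ['1' 'x' 'w' 'x^2' 'x w' 'w^2']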

Let x = 2 and the degree of the polynomial = 3.

Output: Example of Polynomial Feature
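
That output can be reproduced with a quick check (using the same x = 2 and degree = 3 as above):

from sklearn.preprocessing import PolynomialFeatures

# a single sample with x = 2, expanded up to degree 3
print(PolynomialFeatures(degree = 3).fit_transform([[2]]))
# [[1. 2. 4. 8.]]  i.e. x⁰, x¹, x², x³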

After getting this type of matrix, we feed it as the input to a simple linear regression model.

This transformation is what turns our simple linear regression model into a polynomial regression model.
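
As a side note, sklearn's Pipeline can chain the transformation and the regression into a single estimator; a minimal sketch of this alternative (not used in the rest of this post):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# the pipeline applies PolynomialFeatures first, then fits LinearRegression
poly_pipeline = make_pipeline(PolynomialFeatures(degree = 3), LinearRegression())
# poly_pipeline.fit(X_data, Y_data) would then train both steps at once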

from sklearn.preprocessing import PolynomialFeatures

poly_feat = PolynomialFeatures(degree = 3)
x_poly = poly_feat.fit_transform(X_data)
print(x_poly[0:5])
Output: polynomial features matrix (only the first 5 rows shown)

Model creation

from sklearn.linear_model import LinearRegression

poly_model = LinearRegression()

An object of the LinearRegression class is created.

Training the model

poly_model.fit(x_poly,Y_data)

We trained our model using .fit(), providing the x data and the y data.
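
After fitting, the learned weights w₀ … w₃ from the formula above can be inspected (the attribute names are standard sklearn; the exact numbers depend on the dataset):

# intercept_ holds w₀; coef_ holds the weight of each polynomial column
# (coef_[0] belongs to the constant column and comes out as 0, since the
# intercept is fitted separately)
print(poly_model.intercept_)
print(poly_model.coef_)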

Testing the model (Prediction)

y_pred_poly = poly_model.predict(x_poly)
print(y_pred_poly)
Output: Prediction Matrix

The array above holds our predicted values.
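
As a quick sanity check (illustrative; the printed numbers depend on the dataset), the first few predictions can be compared with the actual heights:

# side-by-side look at actual vs. predicted heights for the first 5 rows
for actual, pred in zip(Y_data[:5], y_pred_poly[:5]):
    print(f'actual: {actual}  predicted: {pred:.2f}')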

Plotting the graph (scatter plot & line plot)

plt.scatter(x= X_data, y= Y_data)
plt.plot(X_data, y_pred_poly, color='red')
plt.xlabel('Age')
plt.ylabel('Height')
Output: Blue represents original scatter plot and Red represents our polynomial regression

.scatter() is used to plot the points.

.plot() is used to connect the points as a line.
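
One caveat: .plot() connects the points in the order they appear, so if the Age column is not sorted, the red curve will zigzag. A minimal sketch that sorts by age first:

import numpy as np

# sort by age so the line is drawn left to right without zigzagging
order = np.argsort(X_data.iloc[:, 0].values)
plt.plot(X_data.values[order], y_pred_poly[order], color= 'red')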

Finding r2_score

accuracy_score() is a classification metric and cannot handle continuous targets like ours, so it cannot be used here. Hence, we use a regression metric instead, i.e., r2_score.
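
For reference, r2_score computes the coefficient of determination, R² = 1 − SS_res / SS_tot. A minimal sketch of the same computation by hand:

import numpy as np

# residual sum of squares: how far the predictions are from the actual values
ss_res = np.sum((Y_data - y_pred_poly) ** 2)
# total sum of squares: spread of the actual values around their mean
ss_tot = np.sum((Y_data - Y_data.mean()) ** 2)
print(1 - ss_res / ss_tot)  # should match r2_score(Y_data, y_pred_poly)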

from sklearn.metrics import r2_score

r2_score(Y_data, y_pred_poly)
Output: r2_score

Now, what would the r2_score be if we had used simple linear regression? Let's see.

lin_model = LinearRegression()
lin_model.fit(X_data, Y_data)
y_pred_lin = lin_model.predict(X_data)
plt.scatter(x= X_data, y= Y_data)
plt.plot(X_data, y_pred_lin, color='red')
plt.xlabel('Age')
plt.ylabel('Height')
Output: Linear regression

What is the r2_score ?

r2_score(Y_data, y_pred_lin)
Output: r2_score of linear model

Thus, we can see visually that linear regression is not the best fit, while polynomial regression of degree 3 fits well. Also, the r2_score of the polynomial regression model is higher than that of the linear regression model.

Below is the full implementation of the above polynomial regression:

# import all required modules
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import matplotlib.pyplot as plt

# dataset import and value extraction
data = pd.read_csv('..\Datasets\polynomial.csv')
data.head()
X_data = data.iloc[ : , 0:1]
Y_data = data.iloc[ : , 1]

# plot the datapoints to visually see the data
plt.scatter(x= X_data, y= Y_data)
plt.xlabel('Age')
plt.ylabel('Height')

# transform the X_data to polynomial features
poly_feat = PolynomialFeatures(degree = 3)
x_poly = poly_feat.fit_transform(X_data)
print(x_poly)

# model creation and training
poly_model = LinearRegression()
poly_model.fit(x_poly, Y_data)

# prediction
y_pred_poly = poly_model.predict(x_poly)

# plot the polynomial regression
plt.scatter(x= X_data, y= Y_data)
plt.plot(X_data, y_pred_poly, color= 'red')
plt.xlabel('Age')
plt.ylabel('Height')

# find out r2_score
print(r2_score(Y_data, y_pred_poly))

CONCLUSION

Here, we fitted a curve with the help of polynomial regression. The results were much better than those of linear regression.

Which model you should select for prediction depends on the dataset. So, we first plot the data points to get an idea of which model might fit.


Written by Sayan Saha

I am a student, currently studying BTech Information Technology @Techno Main Salt Lake.
