Machine Learning Basics: Polynomial Regression
We have seen the linear regression model. But not all data is best fit by a straight line: when the data points follow a curve, we need a model that can represent a curve rather than a line. This is where polynomial regression comes in.
Required Modules:
- pandas: for dataset reading & extraction.
- sklearn: for polynomial regression.
- matplotlib: for plotting data.
Now, let's start.
Importing dataset
import pandas as pd
data = pd.read_csv('../Datasets/polynomial.csv')
data.head()
.head() returns the first 5 rows of the dataset.
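If you don't have this exact CSV at hand, a small stand-in with the same two columns works for following along (the column names Age and Height match the plots below; the values here are entirely made up):

import numpy as np
import pandas as pd
# hypothetical stand-in for polynomial.csv: made-up, curved data
rng = np.random.default_rng(0)
age = np.arange(1, 21)
height = 50 + 20 * np.log1p(age) + rng.normal(0, 1, age.size)
data = pd.DataFrame({'Age': age, 'Height': height})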
Extracting X_data and Y_data
X_data = data.iloc[:, 0:1]   # first column (Age), kept 2-D for sklearn
Y_data = data.iloc[:, 1]     # second column (Height)
Plotting the data points
import matplotlib.pyplot as plt
plt.scatter(x=X_data, y=Y_data)
plt.xlabel('Age')
plt.ylabel('Height')
Here, matplotlib.pyplot is used to plot the dataset.
.scatter() plots each data point as an individual dot; such a plot is called a scatter plot. We supply the x values and the y values to it.
The graph shows that the points do not lie on a straight line, so linear regression will not fit well.
Transforming the x_data
We have to transform the x data because we only have an 'x' column, but the form of polynomial regression is:
y = w₀ + w₁*x + w₂*x² + … + wₙ*xⁿ
We need the values of x², x³, …, xⁿ.
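To make the formula concrete, here is a tiny numeric sketch (the weights below are made up, purely for illustration):

import numpy as np
# y = w0 + w1*x + w2*x^2 + w3*x^3 evaluated at x = 2
w = np.array([1.0, 0.5, -0.2, 0.1])   # hypothetical weights w0..w3
powers = np.array([1, 2, 4, 8])       # x^0, x^1, x^2, x^3 for x = 2
print(w @ powers)                     # 1 + 1 - 0.8 + 0.8 = 2.0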
We will not compute these values manually; instead, we use a transformer from sklearn, i.e. PolynomialFeatures().
This transformer computes the required powers of x and returns a matrix whose columns hold those computed powers. With a single input variable x, the columns follow the pattern below. (If the input has two variables, say 'x' and 'w', the ith column no longer simply represents the ith power, but that does not concern us here.)
- column at index 0 → 0th power of x
- column at index 1 → 1st power of x
- column at index 2 → 2nd power of x
- column at index 3 → 3rd power of x
Let x = 2 and the degree of the polynomial be 3; the transformed row is then [1, 2, 4, 8].
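You can verify this pattern directly with a quick check (only the sklearn import is assumed):

from sklearn.preprocessing import PolynomialFeatures
import numpy as np
x = np.array([[2]])   # a single sample with x = 2
print(PolynomialFeatures(degree=3).fit_transform(x))
# [[1. 2. 4. 8.]] -> columns are x^0, x^1, x^2, x^3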
After obtaining this matrix, we feed it as the input of a simple linear regression model. This works because the model is still linear in the weights w₀…wₙ; it is this transformation of the inputs that turns a simple linear regression model into a polynomial regression model.
from sklearn.preprocessing import PolynomialFeatures
poly_feat = PolynomialFeatures(degree=3)
x_poly = poly_feat.fit_transform(X_data)
print(x_poly[0:5])
Model creation
from sklearn.linear_model import LinearRegression
poly_model = LinearRegression()
An object of LinearRegression is created.
Training the model
poly_model.fit(x_poly, Y_data)
We train the model using .fit(), providing the transformed x data and the y data.
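As a side note, sklearn can also chain the transformation and the regression into a single estimator with a Pipeline, so you fit on the raw X_data directly. A sketch of the equivalent setup (not used in the rest of this post):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
# transform + regression chained into one estimator
pipe = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
pipe.fit(X_data, Y_data)            # accepts the raw X_data
y_pred_pipe = pipe.predict(X_data)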
Testing the model (Prediction)
y_pred_poly = poly_model.predict(x_poly)
print(y_pred_poly)
The printed array contains our predicted values.
Plotting the graph (scatter plot & line plot)
plt.scatter(x=X_data, y=Y_data)
plt.plot(X_data, y_pred_poly, color='red')
plt.xlabel('Age')
plt.ylabel('Height')
.scatter() plots the individual points, while .plot() connects the points as a line.
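One caveat: .plot() connects the points in the order they appear, so if the dataset were not already sorted by Age the red curve would zig-zag across the figure. A defensive sketch that sorts first (reusing X_data and y_pred_poly from above):

import numpy as np
# sort by x so the line segments run left to right
order = np.argsort(X_data.values.ravel())
plt.plot(X_data.values.ravel()[order], y_pred_poly[order], color='red')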
Finding r2_score
accuracy_score() is a classification metric and cannot handle continuous targets like ours, so it cannot be used here. Hence, we use a regression metric instead, i.e. r2_score.
from sklearn.metrics import r2_score
r2_score(Y_data, y_pred_poly)
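For intuition, r2_score is 1 minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch with made-up numbers that matches sklearn's result:

import numpy as np
from sklearn.metrics import r2_score
y_true = np.array([3.0, 5.0, 7.0, 9.0])         # made-up targets
y_pred = np.array([2.8, 5.1, 7.2, 8.9])         # made-up predictions
ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
print(1 - ss_res / ss_tot)                      # 0.995, computed by hand
print(r2_score(y_true, y_pred))                 # 0.995, from sklearn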
Now, if we had used simple linear regression instead, what would the r2_score be? Let's see.
lin_model = LinearRegression()
lin_model.fit(X_data, Y_data)
y_pred_lin = lin_model.predict(X_data)
plt.scatter(x=X_data, y=Y_data)
plt.plot(X_data, y_pred_lin, color='red')
plt.xlabel('Age')
plt.ylabel('Height')
What is the r2_score?
r2_score(Y_data, y_pred_lin)
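To put the two models side by side, you can print both scores together (reusing the predictions computed above):

print('linear   r2:', r2_score(Y_data, y_pred_lin))
print('degree-3 r2:', r2_score(Y_data, y_pred_poly))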
Thus we can see visually that linear regression is not the best fit, while a polynomial regression of degree 3 fits well. The r2_score of the polynomial regression model is also higher than that of the linear model.
Below is the full implementation of the above polynomial regression:
# import all required modules
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import matplotlib.pyplot as plt

# dataset import and value extraction
data = pd.read_csv('../Datasets/polynomial.csv')
data.head()
X_data = data.iloc[:, 0:1]
Y_data = data.iloc[:, 1]

# plot the datapoints to visually see the data
plt.scatter(x=X_data, y=Y_data)
plt.xlabel('Age')
plt.ylabel('Height')
plt.show()

# transform the X_data to polynomial features
poly_feat = PolynomialFeatures(degree=3)
x_poly = poly_feat.fit_transform(X_data)
print(x_poly)

# model creation and training
poly_model = LinearRegression()
poly_model.fit(x_poly, Y_data)

# prediction
y_pred_poly = poly_model.predict(x_poly)

# plot the polynomial regression
plt.scatter(x=X_data, y=Y_data)
plt.plot(X_data, y_pred_poly, color='red')
plt.xlabel('Age')
plt.ylabel('Height')
plt.show()

# find out r2_score
print(r2_score(Y_data, y_pred_poly))
CONCLUSION
Here, we fitted a curve with the help of polynomial regression, and the results were much better than those of linear regression.
Which model you should select for prediction depends on the dataset, so we first plot the data points to get an idea of a suitable model.