Machine Learning Basics : Our first model (Linear Regression)

3 min read · May 13, 2020

What is Machine Learning?

Machine learning is a subset of artificial intelligence (AI) in which systems use algorithms to learn from data and improve with experience, without being explicitly programmed for each task.

Today we will look at one of the simplest algorithms: linear regression.

y = w*x + b

This is the equation of a straight line in x, which we all know from mathematics.

Here, we supply the input as “x” and get the output as “y”.

But before that, we need to know the values of the coefficient “w” and the intercept “b”. That is why we train our linear regression model.

After training, the model holds the values of “w” and “b” that best fit the dataset we provided.

Suppose training gives w = 2 and b = 1.

Then the equation becomes y = 2*x + 1

Now if we provide the input x = 5 (say), we get y = 2*5 + 1 = 11.
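
The arithmetic above can be written as a tiny function. This is only a sketch: the name predict is my own choice, and w = 2 and b = 1 are the example values from above, not values learned from any data.

```python
def predict(x, w=2, b=1):
    """Return y for the straight line y = w*x + b."""
    return w * x + b


y = predict(5)
print(y)  # 11
```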

The same workflow applies to almost any model and can be broken into these steps:

1. Get the dataset.
2. Split the dataset into train and test.
3. Create an appropriate model.
4. Train the model.
5. Test the model (prediction).
6. Measure how well the model performs.

Before we start coding, we need a few modules:

sklearn (installed under the package name scikit-learn)
pandas

If you don’t have these, open a command prompt and install them using pip install package_name

pip install scikit-learn pandas

You can also use Jupyter Notebook for writing the code. It lets you execute individual blocks of code independently of the rest.

pip install notebook

Now let's start.

1. Getting Dataset

Datasets are available from many websites. I use the ones from Kaggle.

Download link: Real estate.csv

extract data from dataset

We import the pandas module, which helps us read datasets in various formats. We read the dataset's CSV file using .read_csv('full_file_path.csv')

.head() shows the first 5 rows of our dataset.

.iloc[row_start_index : row_final_index, column_start_index : column_final_index] selects rows and columns by position. Indexing starts from 0, and the final index itself is not included. We use it to extract our input column from the dataset into X_data.

Similarly, we extract the output column from our dataset into Y_data.

X_data holds the 3rd column of the CSV file, “X2 house age” (index 2).

Y_data holds the 8th column of the CSV file, “Y house price of unit area” (index 7).
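
Here is a minimal sketch of this step. To keep it runnable without the download, it reads a small in-memory CSV instead of the real file. The two rows of numbers are made up, and every column name other than “X2 house age” and “Y house price of unit area” is my assumption about the file's layout. With the real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Made-up rows for illustration only; NOT the actual Kaggle data.
# Column names other than "X2 house age" and "Y house price of unit area"
# are assumptions.
csv_text = """No,X1 transaction date,X2 house age,X3 distance to the nearest MRT station,X4 number of convenience stores,X5 latitude,X6 longitude,Y house price of unit area
1,2013.0,10.0,100.0,5,25.0,121.5,50.0
2,2013.0,20.0,200.0,4,25.0,121.5,40.0
3,2013.0,30.0,300.0,3,25.0,121.5,30.0
"""

data = pd.read_csv(io.StringIO(csv_text))
print(data.head())  # first rows of the dataset (at most 5)

# All rows, column index 2. The slice 2:3 keeps the result 2-D,
# which is the shape scikit-learn expects for its input.
X_data = data.iloc[:, 2:3]

# All rows, column index 7 (the output column).
Y_data = data.iloc[:, 7]
```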

2. Splitting the dataset into train and test

split the whole dataset into train & test part

Here we import the train_test_split function from sklearn.model_selection and split X_data and Y_data into a training set (70%) and a test set (30%).

We store the split data in the respective variables: x_train, x_test, y_train and y_test.
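
A sketch of the split, using ten made-up data points in place of the real columns. The random_state value is my addition so that the split is the same on every run; it is not required.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data (an assumption): 10 inputs and outputs on the line y = 2*x + 1.
X_data = np.arange(10).reshape(-1, 1)
Y_data = 2 * X_data.ravel() + 1

# test_size=0.3 keeps 30% of the rows for testing and 70% for training.
x_train, x_test, y_train, y_test = train_test_split(
    X_data, Y_data, test_size=0.3, random_state=0
)

print(len(x_train), len(x_test))  # 7 3
```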

3. Creating appropriate model

The right model depends on your problem statement, that is, on what you need to predict: a category (a classification problem) or a continuous number (a regression problem).

Here, I am using an example in which the input and output have a linear relationship.

create model

An object of LinearRegression() was created.

4. Training the model

train model

We train our model by calling .fit() and providing x_train as input and y_train as output.

As explained above, the model now holds the values of “w” and “b”.
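
To see the learned “w” and “b”, here is a small sketch that creates and trains a model on made-up points lying exactly on y = 2*x + 1 (my assumption, chosen to match the earlier example). scikit-learn exposes the fitted values as coef_ and intercept_.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data on the line y = 2*x + 1.
x_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = 2 * x_train.ravel() + 1

model = LinearRegression()   # step 3: create the model
model.fit(x_train, y_train)  # step 4: train it

w = model.coef_[0]    # learned coefficient, close to 2 here
b = model.intercept_  # learned intercept, close to 1 here
print(w, b)
```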

5. Testing the model (prediction)

prediction

We predict values by calling .predict() and providing x_test as input.

We store the predicted values in a variable named “y_pred”.
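
A sketch of prediction, again on made-up points from y = 2*x + 1 rather than the real dataset. The test inputs 5 and 10 are arbitrary values I picked.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Train on made-up points from the line y = 2*x + 1.
x_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = 2 * x_train.ravel() + 1
model = LinearRegression().fit(x_train, y_train)

# Predict for inputs the model has not seen during training.
x_test = np.array([[5.0], [10.0]])
y_pred = model.predict(x_test)
print(y_pred)  # approximately [11. 21.]
```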

6. Finding accuracy of model

accuracy score

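accuracy_score() is meant for classification and raises an error on continuous targets such as house prices, so for a regression model we measure the goodness of fit with r2_score() instead.

The function is provided with two parameters.

1st -> true output values (stored in y_test)

2nd -> predicted values (stored in y_pred)

Finally, we print the score. A value close to 1 means the predictions closely match the true values.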
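
One caveat when trying this yourself: accuracy_score() from sklearn.metrics is a classification metric, so for continuous outputs a regression metric is the safer choice. The sketch below uses r2_score() and mean_absolute_error() on four made-up values. Both take the true values first and the predicted values second.

```python
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up true and predicted values, for illustration only.
y_test = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

r2 = r2_score(y_test, y_pred)              # 1.0 would be a perfect fit
mae = mean_absolute_error(y_test, y_pred)  # average size of the errors

print(r2, mae)  # about 0.975 and 0.25
```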

That completes the code. Below is a full implementation of the above.

Full Implementation
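
The six steps can be put together roughly as follows. Treat this as a sketch, not the exact original code: the function name run_pipeline is my own, the score is R² rather than a classification accuracy, and the data is randomly generated so the block runs without the CSV file. With the real dataset, you would replace the generated frame with pd.read_csv and the path to your downloaded file.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def run_pipeline(data, input_col=2, output_col=7):
    """Split, train, predict and score; return (model, r2)."""
    X_data = data.iloc[:, [input_col]]  # list index keeps the input 2-D
    Y_data = data.iloc[:, output_col]

    x_train, x_test, y_train, y_test = train_test_split(
        X_data, Y_data, test_size=0.3, random_state=0
    )

    model = LinearRegression()
    model.fit(x_train, y_train)

    y_pred = model.predict(x_test)
    return model, r2_score(y_test, y_pred)


# Generated stand-in for the real file (an assumption): 100 rows, 8 columns,
# where column 7 depends linearly on column 2 plus a little noise.
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.uniform(0, 40, size=(100, 8)),
    columns=[f"col{i}" for i in range(8)],
)
data["col7"] = 50 - 0.5 * data["col2"] + rng.normal(0, 1, size=100)

model, r2 = run_pipeline(data)
print("w =", model.coef_[0], "b =", model.intercept_, "R2 =", r2)
```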

Here we took only 1 column as input and 1 column as output. It is also possible to take multiple columns as input, which can help the model predict more accurately. This is called Multiple Linear Regression.

In the same example, we can select multiple columns by using .iloc[row_start : row_end, column_start : column_end]

In the above code, we wrote data.iloc[ : , 2]. It took all rows and only the column with index 2 (the 3rd column).

Now suppose we want the first three columns, i.e. the columns with index 0, 1 and 2, as input.

So, we write: data.iloc[ : , 0 : 3]. It will take all rows and the first 3 columns. We write 3 because the end index itself is not included.

The rest of the code is the same as above.
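
A sketch of the multi-column case, using a generated table with made-up column names (a, b, c, target) instead of the real dataset. After training, the model holds one weight per input column.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Generated stand-in data (an assumption): target = 1*a + 2*b + 3*c + 4.
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.uniform(0, 10, size=(50, 4)),
    columns=["a", "b", "c", "target"],
)
data["target"] = 1.0 * data["a"] + 2.0 * data["b"] + 3.0 * data["c"] + 4.0

X_data = data.iloc[:, 0:3]  # all rows, columns 0, 1 and 2 (3 is excluded)
Y_data = data.iloc[:, 3]    # all rows, the output column

model = LinearRegression().fit(X_data, Y_data)

print(X_data.shape)      # (50, 3)
print(model.coef_)       # three weights, close to 1, 2 and 3 here
print(model.intercept_)  # close to 4 here
```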

CONCLUSION:

We learnt how to implement a simple linear regression model that predicts values from a dataset whose data points have a linear relationship.