Machine Learning (Part 2)
Linear Regression
Predicting the outputs using a single feature
A method for predicting a dependent variable (Y) from the values of an independent variable (X), assuming the two are linearly related.
Goal: find a linear function that predicts the dependent variable (Y) as a function of the feature or independent variable (X).
Linear function:
y = b0 + (b1 * x)
y = dependent variable
x = independent variable
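To make the formula concrete, here is a minimal sketch that evaluates the linear function for one input; the intercept b0 and slope b1 below are illustrative values, not fitted ones:

```python
# Hypothetical coefficients: b0 is the intercept, b1 the slope.
b0, b1 = 2.0, 0.5
x = 10
y = b0 + b1 * x  # the model's prediction for this x
print(y)  # 7.0
```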
How to find the best fit line?
In this regression model, we try to minimize the prediction error by finding the line of best fit. In other words, we minimize the distance between the actual value (Yi) and the value predicted by our model (Yp).
We usually use the mean squared error (MSE) to quantify this error:
min { (1/n) * SUM (Yi - Yp)² }
Step 1: Data Preprocessing
We will follow the same steps as in my previous article Machine Learning (Part 1).
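As a minimal stand-in for those preprocessing steps, the sketch below builds a toy dataset and splits it into training and testing sets; the column names and the 80/20 split are assumptions for illustration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# A toy dataset standing in for the CSV loaded in Part 1;
# the column names here are hypothetical.
df = pd.DataFrame({'YearsExperience': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Salary': [40, 45, 50, 60, 65, 70, 80, 85, 90, 100]})

X = df[['YearsExperience']].values  # independent variable, shape (n, 1)
Y = df['Salary'].values             # dependent variable

# Hold out 20% of the rows as the testing set.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0)
print(X_train.shape, X_test.shape)  # (8, 1) (2, 1)
```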
Step 2: Feature Engineering
We can filter for the independent variables (X) that have a high correlation with the dependent variable (Y) by plotting the correlation matrix. We use seaborn to draw the matrix as a color-coded heatmap.
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

df = pd.read_csv(filename)
corrMatrix = df.corr()
sn.heatmap(corrMatrix, annot=True)
plt.show()
Step 3: Fitting linear regression model to the training set
from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor = regressor.fit(X_train, Y_train)
Step 4: Predicting the outputs
Y_pred = regressor.predict(X_test)
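Beyond eyeballing the predictions, we can score them with the MSE from earlier and the R² coefficient. This is a sketch on a toy dataset (the data below is made up; in the article, X_train, X_test, Y_train, and Y_test come from Step 1):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Toy data that happens to follow y = 2x + 1 exactly (illustrative).
X_train = np.array([[1], [2], [3], [4]])
Y_train = np.array([3, 5, 7, 9])
X_test = np.array([[5], [6]])
Y_test = np.array([11, 13])

regressor = LinearRegression().fit(X_train, Y_train)
Y_pred = regressor.predict(X_test)

# MSE near 0 and R² near 1 indicate a good fit on this toy data.
print(mean_squared_error(Y_test, Y_pred))
print(r2_score(Y_test, Y_pred))
```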
Step 5: Visualization
We will use the matplotlib.pyplot library to make scatter plots of our training and testing sets to see how closely our model predicts the values.
import matplotlib.pyplot as plt

# visualize the training dataset
plt.scatter(X_train, Y_train, color='blue')
plt.plot(X_train, regressor.predict(X_train), color='red')
plt.show()

# visualize the testing dataset
plt.scatter(X_test, Y_test, color='blue')
plt.plot(X_test, regressor.predict(X_test), color='red')
plt.show()