Improving Linear Regression — The Why and the How
Linear Regression is a linear model used for regression problems. It has long been overtaken by other models in terms of accuracy, but it is extremely useful as a baseline: it gives you fast results and stays explainable while doing it. Moreover, it is a perfect ‘first model’ for people diving into Machine Learning, neatly conveying the gist of the field to the uninitiated. Thus, this article starts by implementing the humble Linear Regression model using the scikit-learn API in Python, then goes further, introducing techniques to improve on it. These techniques apply to a wide range of models, and the ideas behind them are central to machine learning.
But what is a linear model?
Put simply, a linear model tries to find linear relationships in your data. It is a perfect fit when the target value is expected to be a linear combination of the features.
from sklearn import linear_model
linear_model.LinearRegression
Regression is used when the target value of our data is continuous, i.e., real-valued. For Linear Regression, we want to build a model f(w, b, x) as a linear combination of the features of example x:

f(w, b, x) = W·x + b

Our aim is to find the W and b that minimize the objective function, in this case the Mean Squared Error. Thus, the algorithm tries to minimize:

MSE = (1/N) Σᵢ (f(w, b, xᵢ) − yᵢ)²

where N is the number of data samples.
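To make the objective concrete, here is a minimal NumPy sketch (the numbers are made up purely for illustration, not taken from any dataset) that computes the MSE for one hand-picked choice of W and b:

import numpy as np

#Toy data, invented for illustration
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

#A candidate model f(w, b, x) = w*x + b
w, b = 2.0, 0.0
predictions = w * x + b

#Mean Squared Error: the average of the squared differences
mse = np.mean((predictions - y) ** 2)
print(mse)

Linear Regression simply searches for the w and b that make this number as small as possible.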
The Dataset
We use the diabetes dataset, which ships with scikit-learn itself. For our purposes, consider it toy or mock data, used simply because it is easy and convenient to do so.
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
#importing the dataset for our use
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)
plt.scatter(diabetes_X[:, np.newaxis, 2], diabetes_y) #plot the feature at index 2 against the target
plt.xticks(())
plt.yticks(())
plt.show()
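If you are curious what the feature at index 2 actually is, you can load the full dataset object instead of just the arrays and inspect its metadata (a small optional sketch; load_diabetes() returns a Bunch with a feature_names attribute):

#Load the full dataset object to inspect its metadata
diabetes = datasets.load_diabetes()
print(diabetes.feature_names) #the third entry, 'bmi', is the column plotted above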
Let us now see how we can use Linear Regression on this data. With sklearn, we can do so in three simple lines of code.
from sklearn import linear_model
regr = linear_model.LinearRegression() #Create a Linear Regression model
diabetes_X = diabetes_X[:, np.newaxis, 2] #Reformat the data for the model
regr.fit(diabetes_X, diabetes_y) #train the model on the data
We can now use our model to make predictions on new data and see how well it does. We can also dig a bit to find the coefficient (W) and the intercept (b).
W = regr.coef_ #Returns the weight W
b = regr.intercept_ #Returns the intercept b
X = diabetes_X
print('Weight Value: ',W)
print('Intercept Value: ',b)
plt.scatter(X[:50], diabetes_y[:50], color="black") #plot the first 50 data points
plt.plot(X, W * X + b) #plot the fitted regression line
plt.show()
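As a quick sanity check (this snippet is an addition, not part of the original walkthrough), we can call predict() on a few samples and compare the output with the true targets:

#Predict on the first five samples and compare with the actual targets
print("Predictions:", regr.predict(diabetes_X[:5]))
print("Actual:     ", diabetes_y[:5])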
And that is it for linear regression. Many variations exist which build on the humble regression model. We shall look at some of them, but first,
Regularization
Regularization methods force the learning algorithm to create less complex models; in other words, they prevent over-fitting. Several models simply add a regularization term to the LinearRegression() model’s objective function, so it is useful to understand this concept.
There are three models which add regularization to the Linear Regression’s objective function.
+------------------+----------------------------------+-----------------------+
| Model            | Command                          | Type of regularization|
+------------------+----------------------------------+-----------------------+
| Ridge Regression | sklearn.linear_model.Ridge()     | l2                    |
| Lasso Regression | sklearn.linear_model.Lasso()     | l1                    |
| ElasticNet       | sklearn.linear_model.ElasticNet()| l1+l2                 |
+------------------+----------------------------------+-----------------------+
Regularization comes in two types. Recall that the objective function for linear regression was:

(1/N) Σᵢ (f(w, b, xᵢ) − yᵢ)²

Let us take a hyperparameter ‘C’, multiply it by the absolute values of the weights W, and add that term to the objective function, turning the equation into:

(1/N) Σᵢ (f(w, b, xᵢ) − yᵢ)² + C Σⱼ |wⱼ|

This is referred to as l1 regularization. If we instead multiply ‘C’ by the squares of the weights:

(1/N) Σᵢ (f(w, b, xᵢ) − yᵢ)² + C Σⱼ wⱼ²

we get l2 regularization. The main practical difference is that l2 regularization only shrinks the weights towards 0, while l1 regularization can drive some weights to exactly 0, effectively removing features. In practice, start by using Ridge (l2 regularization) on your model if you suspect it is overfitting.
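To see how this looks in code, here is a minimal sketch (not from the original walkthrough) that fits Ridge and Lasso on the same single-feature data and prints their coefficients; with more features, Lasso will typically push some coefficients to exactly zero while Ridge only shrinks them:

from sklearn import linear_model

#alpha plays the role of 'C': the strength of the regularization penalty
ridge = linear_model.Ridge(alpha=1.0) #l2 penalty
lasso = linear_model.Lasso(alpha=1.0) #l1 penalty

ridge.fit(diabetes_X, diabetes_y)
lasso.fit(diabetes_X, diabetes_y)

print("Ridge coefficient:", ridge.coef_)
print("Lasso coefficient:", lasso.coef_)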
Cross Validation and Hyperparameter Tuning
We referred to ‘C’ in the previous section as a hyperparameter. That means it is a numeric value which we, the programmers, feed into the model (unlike W and b, which the algorithm learns for us). Each model has its own set of hyperparameters, and the programmer tunes them experimentally. Normally, this means a whole lot of trial and error. Luckily, sklearn gives us functionality to automate this process somewhat. That said, manually tuning the hyperparameters for a while can help a data scientist understand the data better, and is generally considered good practice.
Imagine you wish to try Ridge Regression with the following values of C: [0.001, 0.01, 0.1, 1, 10, 100], and would like to keep the model that scores best. You can do so by using:
regr = linear_model.RidgeCV(alphas = [0.001,0.01,0.1,1,10,100],cv = 3)
regr.fit(diabetes_X, diabetes_y)
Scikit-learn uses the name alpha in place of the ‘C’ we have been using. But what does ‘cv = 3’ mean? It means the dataset is divided into three parts: in each round, two parts are used for training (the training set) and one part is used to evaluate the hyperparameter (the validation set). The parts are rotated so that each one serves as the validation set once, and the scores are averaged.
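After fitting, RidgeCV stores the winning candidate in its alpha_ attribute, so you can check which value the cross-validation picked:

#The alpha value that scored best across the three folds
print("Best alpha:", regr.alpha_)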
More often than not, you won’t have just one hyperparameter to tune, and that is where Grid Search comes in. For example, the ElasticNet regressor takes two hyperparameters: l1_ratio (the mix between the l1 and l2 penalties) and alpha (the overall strength of the penalty). If we want to try [0.1, 0.2, 0.3] for l1_ratio and [1, 2, 3] for alpha, we can use linear_model.ElasticNetCV(), which builds every combination of the hyperparameters for us. A total of 9 combinations (3*3) are evaluated this way, and the best one is kept. (For arbitrary models, scikit-learn also offers a general-purpose grid search, sketched below.)
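ElasticNetCV is a convenience specific to ElasticNet. For any model, scikit-learn offers GridSearchCV in sklearn.model_selection, which runs the same kind of exhaustive search over an arbitrary parameter grid. A rough sketch of what that would look like for Ridge (the candidate values here are illustrative):

from sklearn.model_selection import GridSearchCV
from sklearn import linear_model

#Try every combination in the grid, scoring each with 3-fold cross-validation
param_grid = {"alpha": [0.001, 0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(linear_model.Ridge(), param_grid, cv=3)
search.fit(diabetes_X, diabetes_y)

print("Best parameters:", search.best_params_)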
Let us show this in practice, with the same dataset as before:
#We import the dataset as usual
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)
diabetes_X = diabetes_X[:, np.newaxis, 2]
#Hold out the last 20 samples to test how good our model is
diabetes_X_test = diabetes_X[-20:]
diabetes_y_test = diabetes_y[-20:]
diabetes_X = diabetes_X[:-20]
diabetes_y = diabetes_y[:-20]
regr = linear_model.ElasticNetCV(
    l1_ratio = [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1],
    #candidate values for the l1/l2 mix
    fit_intercept = True,
    #fit_intercept takes a single boolean (do we want an intercept?), not a list of candidates
    alphas = [0.0125, 0.025, 0.05, 0.125, 0.25, 0.5, 1.0, 2.0, 4.0],
    #candidate values for the overall penalty strength
    cv = 3
)
regr.fit(diabetes_X,diabetes_y)
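Once fitting is done, ElasticNetCV exposes the winning hyperparameters through its alpha_ and l1_ratio_ attributes, which is worth checking before moving on:

#The combination that performed best across the cross-validation folds
print("Best alpha:   ", regr.alpha_)
print("Best l1_ratio:", regr.l1_ratio_)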
Let us now write a simple helper function to test our model, and plot it against the data.
from sklearn.metrics import mean_squared_error, r2_score

def plot_regr(model, X, Y):
    W = model.coef_
    b = model.intercept_
    Y_pred = model.predict(X) #predict on the data passed in, not a global variable
    print(f"Mean squared error: {mean_squared_error(Y, Y_pred)}")
    print(f"R2 score: {r2_score(Y, Y_pred)}")
    plt.scatter(X, Y, color="black")
    plt.plot(X, W * X + b)
    plt.show()
plot_regr(regr,diabetes_X_test,diabetes_y_test)
Mean squared error: 2554.4111933368745
R2 score: 0.47126338325849815
Naturally, this model is…awful. Remember that regularization is useful for making a model less complicated, and LinearRegression, which ElasticNet mathematically builds on, is already about as uncomplicated as it gets. Then again, LinearRegression itself isn’t all that great. By now, however, we know that it is fast and perfect as a baseline model.
I hope you found this useful. Make sure to follow me to keep up with my content! Feedback appreciated!