Support Vector Regressors#
Support vector regressors are built on many of the same concepts as support vector machines, but adapted for regression problems rather than classification. The main difference is the loss function: the ε-insensitive loss is used in place of the hinge loss.
Instead of optimising a separating hyperplane, support vector regression constructs an ε-tube around the regression line. The optimisation then finds a (typically linear) function, \(f(x)\), that deviates from the target values by no more than ε for as many data points as possible, while keeping the model complexity to a minimum.
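Written out, the ε-insensitive loss that replaces the hinge loss takes the standard form

\[
L_\varepsilon\big(y, f(x)\big) = \max\big(0,\; \lvert y - f(x)\rvert - \varepsilon\big),
\]

so residuals that fall inside the ε-tube cost nothing, while residuals outside it are penalised linearly. The regularisation parameter (named C in scikit-learn) then controls the trade-off between tolerating points outside the tube and keeping the function flat.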
Support vector regression can still use the kernel trick that makes support vector machines so powerful. Let’s look at using support vector regressors for the student performance dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
data = pd.read_csv('../data/student-performance.csv')
# Encode the categorical Extracurricular Activities column as 0/1
data['Encoded EA'] = [1 if x == 'Yes' else 0 for x in data['Extracurricular Activities']]
train, test = train_test_split(data, test_size=0.2, random_state=42)
X = train.drop(['Performance Index', 'Extracurricular Activities'], axis=1)
y = train['Performance Index']
Similar to the sklearn.svm.SVC class used for classification, the sklearn.svm.SVR class applies the RBF kernel by default.
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
# Fit an SVR with the default (RBF) kernel
svr = SVR()
svr.fit(X, y)
X_test = test.drop(['Performance Index', 'Extracurricular Activities'], axis=1)
y_test = test['Performance Index']
mean_squared_error(y_test, svr.predict(X_test))
5.386855029604294
We see that the RBF kernel produces a worse model than plain linear regression. This makes sense: the data is essentially linear (as we saw when fitting the polynomial regression), so switching to a different kernel, namely the linear one, should help.
# Refit using a linear kernel instead of the default RBF
svr_lin = SVR(kernel='linear')
svr_lin.fit(X, y)
X_test = test.drop(['Performance Index', 'Extracurricular Activities'], axis=1)
y_test = test['Performance Index']
mean_squared_error(y_test, svr_lin.predict(X_test))
4.084595877236296
This returns an MSE very similar to the previous (linear) methods.
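Both models above use the default hyperparameters. If we wanted to squeeze out more performance, the regularisation strength C and the tube width epsilon are the usual knobs to tune. As a rough sketch (the parameter grid below is purely illustrative, not a recommendation), a cross-validated search could look like this, reusing X, y, X_test and y_test from above:

from sklearn.model_selection import GridSearchCV

# Illustrative grid over the two main SVR hyperparameters
param_grid = {
    'C': [0.1, 1, 10],
    'epsilon': [0.01, 0.1, 1],
}

# 5-fold cross-validated search on the linear-kernel SVR,
# scored with negated mean squared error
search = GridSearchCV(
    SVR(kernel='linear'),
    param_grid,
    scoring='neg_mean_squared_error',
    cv=5,
)
search.fit(X, y)

mean_squared_error(y_test, search.predict(X_test))

Note that support vector models are also sensitive to feature scaling, so standardising the inputs (e.g. with StandardScaler) is often worth trying before tuning.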