Regression models using a single predictor

- August 15, 2020

The same datasets used in the classification blog post are used here to train and test various regression models. The datasets, stored in an Excel spreadsheet, are imported into MATLAB as tables. The training data is used to train the relevant regression models. In this use case there is only one predictor, the gate oxide thickness, and the response is either the threshold voltage or the transconductance. In all the models, the number of cross-validation folds is set to 10. This guards against over-fitting: the dataset is partitioned into 10 groups (folds), and each fold in turn is held out while the model is trained on the remaining nine, so accuracy is always estimated on data the model did not see in that round of training. The models differ in the underlying mathematical function used to estimate the relationship between the predictor(s) and the response.
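To make the 10-fold procedure concrete, here is a minimal sketch in plain Python rather than MATLAB. It assumes a simple least-squares line as the model, and the data is synthetic: the names `thickness_nm` and `threshold_v` and every numeric value are hypothetical stand-ins for the spreadsheet columns, not values from the actual dataset. The sketch pools the held-out squared errors across folds before taking the square root; MATLAB may aggregate fold errors differently.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def kfold_rmse(xs, ys, k=10, seed=0):
    """Cross-validated RMSE: each fold is held out once and predicted by a
    model trained on the other k - 1 folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal groups
    squared_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in held_out:
            prediction = intercept + slope * xs[i]
            squared_errors.append((ys[i] - prediction) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical data: a linear trend plus small Gaussian noise.
rng = random.Random(1)
thickness_nm = [1.0 + 0.1 * i for i in range(50)]
threshold_v = [0.2 + 0.15 * t + rng.gauss(0, 0.01) for t in thickness_nm]

cv_rmse = kfold_rmse(thickness_nm, threshold_v, k=10)
print(f"10-fold cross-validated RMSE: {cv_rmse:.4f} V")
```

Because every prediction is made for a point that was excluded from the fit, the resulting RMSE is less optimistic than one computed by fitting and scoring on the same rows.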

The training of various models produces the following. As in the case of classification models, the accuracy of a regression model is computed on the training data itself in MATLAB. The accuracy of a model is expressed in terms of the Root Mean Square Error (RMSE), a frequently used measure of the differences between the values predicted by a model and the values actually observed. These individual differences are called residuals when they are calculated over the training data that was used for estimation, and prediction errors when they are computed on test data. The differences may arise from randomness or because the model does not include information that could produce a more accurate estimate.

RMSE aggregates the magnitudes of these differences across all cases into a single measure of error. It serves as a good measure of accuracy, but only for comparing the errors of different models for a given response, not for comparing between different variables, because it is scale dependent.

RMSE is the square root of the Mean Squared Error (MSE), which is a risk function corresponding to the expected value of the squared error loss (quadratic loss). MSE measures the average of the squares of the differences. The MSE is the second moment (about the origin) of the error, and thus incorporates both the variance of the estimator and its bias. For an unbiased estimator, the MSE is the variance of the estimator.
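The definitions above can be checked numerically with a short sketch in plain Python. The observed and predicted values are hypothetical, chosen only for illustration. The sketch computes MSE and RMSE, verifies the sample analogue of the decomposition (mean squared error equals the variance of the errors plus the square of their mean), and shows the scale dependence by re-expressing the same values in millivolts.

```python
import math


def mse(observed, predicted):
    """Mean of the squared differences between predictions and observations."""
    errors = [p - o for o, p in zip(observed, predicted)]
    return sum(e * e for e in errors) / len(errors)


def rmse(observed, predicted):
    """Square root of the MSE, in the same units as the response."""
    return math.sqrt(mse(observed, predicted))


def error_mean_and_variance(observed, predicted):
    """Mean error (sample bias) and population variance of the errors."""
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_error = sum(errors) / len(errors)
    variance = sum((e - mean_error) ** 2 for e in errors) / len(errors)
    return mean_error, variance


# Hypothetical threshold voltages in volts (not taken from the dataset).
observed_v = [0.40, 0.45, 0.50, 0.55]
predicted_v = [0.42, 0.44, 0.53, 0.56]

bias, variance = error_mean_and_variance(observed_v, predicted_v)
print("MSE:", mse(observed_v, predicted_v))
print("variance + bias^2:", variance + bias ** 2)
print("RMSE (V):", rmse(observed_v, predicted_v))

# Same data in millivolts: the RMSE grows by the same factor of 1000,
# which is why RMSE values are only comparable for the same response.
observed_mv = [1000 * v for v in observed_v]
predicted_mv = [1000 * v for v in predicted_v]
print("RMSE (mV):", rmse(observed_mv, predicted_mv))
```

The two printed MSE figures agree up to floating-point rounding, and the millivolt RMSE is 1000 times the volt RMSE even though the fit quality is identical.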