A Comparison of Supervised Learning Algorithms (Part II)

For the three classification problems, we followed the same procedure as in Part I. Instead of repeating the details, we only show the model selection results for these three problems:

Table 1. Model Selection Results by Data Set
Model (tuning parameters)                           Wdbc                    Ionosphere                 Hypothyroid
Gradient Boosting (n.trees, depth, n.minobsinnode)  500, 9, 10              450, 9, 15                 300, 4, 10
Random Forest (mtry)                                6                       16                         18
Neural Networks (neurons in the hidden layer)       10                      5                          3
SVM (kernel type, degree/sigma, cost)               Linear, 1, 0.32         Radial, 5, 1               Linear, 1, 1.32
Ridge regression (lambda)                           0.0001                  0.0001                     100000
Logistic regression                                 No parameter to tune (all data sets)
Model Averaging (logistic regression, 3 models)     AIC: 31.0; 35.9; 41.3   AIC: 217.2; 231.6; 248.8   AIC: 176.28; 176.64; 177.79

Model Averaging

To improve the performance of linear ridge regression and logistic regression, we used the R packages glmulti and MuMIn for model averaging. For each data set, we averaged the best 3 or 5 models to see whether averaging improves linear ridge regression (for regression) and logistic regression (for classification).
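As a rough illustration of how the averaged models are weighted, the sketch below computes Akaike weights, w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i = AIC_i - min AIC. It assumes that glmulti and MuMIn use these standard weights, and the function name `akaike_weights` is our own, not part of either package. Applied to the AICc values of the top 5 Boston Housing models listed below, the ratio of the first two weights comes out at about 0.854, which matches the ratio of the first two entries in the `weights` column of the `weightable(avg.model)` output. The absolute values differ because glmulti normalizes its weights over every candidate model it evaluated (the five listed weights sum to only about 0.21), whereas here they are normalized over the top 5 alone.

```python
import math

def akaike_weights(scores):
    """Return Akaike weights for a list of AIC (or AICc) scores.

    w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
    with delta_i = score_i - min(scores); lower scores get larger weights.
    """
    best = min(scores)
    rel_likelihood = [math.exp(-(s - best) / 2) for s in scores]
    total = sum(rel_likelihood)
    return [r / total for r in rel_likelihood]

# AICc values of the top 5 Boston Housing models (weightable() output).
aicc = [1327.777772, 1328.092560, 1328.147654, 1329.091435, 1329.205842]
weights = akaike_weights(aicc)
for rank, w in enumerate(weights, start=1):
    print(f"model {rank}: weight {w:.3f}")
```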

1. Regression Problem (Boston Housing)

We selected the top 5 ridge regression models based on their AICc scores:

weightable(avg.model)
                                                                                  model        aicc        weights
1                             y ~ 1 + CRIM + NOX + RM + AGE + DIS + TAX + PTRATIO + B + LSTAT 1327.777772 0.056535233839
2                  y ~ 1 + CRIM + ZN + NOX + RM + AGE + DIS + RAD + TAX + PTRATIO + B + LSTAT 1328.092560 0.048301873324
3                        y ~ 1 + CRIM + ZN + NOX + RM + AGE + DIS + TAX + PTRATIO + B + LSTAT 1328.147654 0.046989452154
4                       y ~ 1 + CRIM + NOX + RM + AGE + DIS + RAD + TAX + PTRATIO + B + LSTAT 1329.091435 0.029313049215
5                                     y ~ 1 + CRIM + NOX + RM + AGE + DIS + TAX + PTRATIO + B 1329.205842 0.027683297814

Averaging the top 5 models shows that the variables {CRIM, RM, AGE, DIS, TAX, PTRATIO, B} are all significant in both the full-averaged and the conditional-averaged model.
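MuMIn reports two kinds of averaged coefficients. In the full average, a model that excludes a variable contributes an estimate of 0 for it, so the average is pulled toward zero; in the conditional average, only the models that include the variable are averaged, with their weights renormalized over that subset. The sketch below computes both point estimates only (not the standard errors or significance tests MuMIn also reports). It takes the `weights` column from the Boston output above, renormalized over the top 5, and uses made-up ZN coefficients (0.04 and 0.05) because the fitted values are not shown; `averaged_coefficient` is a hypothetical helper.

```python
def averaged_coefficient(weights, estimates):
    """Return (full, conditional) model-averaged estimates of one coefficient.

    weights   -- model weights; renormalized here so they sum to 1
    estimates -- the coefficient's estimate in each model, or None when the
                 variable is absent from that model
    """
    total = sum(weights)
    w = [wi / total for wi in weights]
    present = [(wi, b) for wi, b in zip(w, estimates) if b is not None]
    if not present:
        raise ValueError("the variable does not appear in any model")
    # Full average: absent models count as an estimate of 0, so only the
    # models that contain the variable add anything to the sum.
    full = sum(wi * b for wi, b in present)
    # Conditional average: renormalize the weights over the models that
    # contain the variable (their summed weight is its relative importance).
    importance = sum(wi for wi, _ in present)
    conditional = full / importance
    return full, conditional

# Weights column of the top 5 Boston Housing models (weightable() output).
boston_weights = [0.056535233839, 0.048301873324, 0.046989452154,
                  0.029313049215, 0.027683297814]
# HYPOTHETICAL ZN estimates: ZN appears only in models 2 and 3 of that list.
zn_estimates = [None, 0.04, 0.05, None, None]
full, conditional = averaged_coefficient(boston_weights, zn_estimates)
print(f"full: {full:.4f}  conditional: {conditional:.4f}")
```

The full average always equals the conditional average multiplied by the variable's relative importance (here, the renormalized weight of models 2 and 3).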


2. Classification Problems

For each classification problem, we chose the 3 best logistic regression models based on their AIC scores and averaged them. The results are shown below:

  • Wdbc
Table 3.1 Model Averaging Results of Wdbc
Model 1: y ~ V4 + V8 + V9 + V15 + V22 + V23 + V29 + V30 + V32
Model 2: y ~ V3 + V4 + V8 + V9 + V15 + V22 + V23 + V29 + V30 + V32
Model 3: y ~ V3 + V4 + V7 + V8 + V9 + V15 + V22 + V23 + V29 + V30 + V32
Relative variable importance:
                     V15  V22  V23  V29  V30  V32  V4   V8   V9   V3   V7
Importance:          1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.31 0.08
N containing models:    3    3    3    3    3    3    3    3    3    2    1

Note: None of the model-averaged coefficients is significant, in either the full-averaged or the conditional-averaged model.

  • Ionosphere
Table 3.2 Model Averaging Results of Ionosphere
Model 1: y ~ V3 + V4 + V5 + V6 + V8 + V9 + V10 + V11 + V13 + V14 + V15 + V16 + V18 + V22 + V23 + V26 + V27 + V30 + V31
Model 2: y ~ V3 + V4 + V5 + V6 + V8 + V9 + V10 + V11 + V13 + V14 + V15 + V16 + V18 + V22 + V23 + V24 + V26 + V27 + V28 + V30 + V31
Model 3: y ~ V3 + V4 + V5 + V6 + V8 + V9 + V10 + V11 + V13 + V14 + V15 + V16 + V18 + V22 + V23 + V24 + V26 + V27 + V30 + V31
Relative variable importance:
                     V10  V11  V13  V14  V15  V16  V18  V22  V23  V26  V27  V3   V30  V31  V4   V5   V6   V8   V9   V24  V28
Importance:          1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.53 0.23
N containing models:    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    2    1

Note: The variables {V3, V4, V5, V9, V14, V15, V16, V22, V23, V26, V27, V30, V31} are significant in both the full-averaged and the conditional-averaged model.

  • Hypothyroid
Table 3.3 Model Averaging Results of Hypothyroid
Model 1: y ~ V4 + V6 + V14 + V15 + V16 + V24
Model 2: y ~ V4 + V6 + V7 + V14 + V15 + V16 + V24
Model 3: y ~ V4 + V6 + V7 + V14 + V15 + V16 + V17 + V24
Relative variable importance:
                     V14  V15  V16  V24  V4   V6   V7   V17
Importance:          1.00 1.00 1.00 1.00 1.00 1.00 0.57 0.20
N containing models:    3    3    3    3    3    3    2    1

Note: The variables {V16, V24} are significant in both the full-averaged and the conditional-averaged model.
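The relative variable importance in Tables 3.1 to 3.3 is the sum of the model weights of the models that contain a variable. The sketch below recomputes it for Hypothyroid from Akaike weights, assuming that the three AIC values in Table 1 belong to Models 1 to 3 in the order listed. Under that assumption it reproduces the reported importances of 0.57 for V7 and 0.20 for V17. This is plain Python written for illustration, not MuMIn's own implementation.

```python
import math

# Hypothyroid models (Table 3.3) paired with their AIC values (Table 1),
# ASSUMING the AIC values are listed in the same order as the models.
models = {
    "Model 1": ({"V4", "V6", "V14", "V15", "V16", "V24"}, 176.28),
    "Model 2": ({"V4", "V6", "V7", "V14", "V15", "V16", "V24"}, 176.64),
    "Model 3": ({"V4", "V6", "V7", "V14", "V15", "V16", "V17", "V24"}, 177.79),
}

# Akaike weights: w_i proportional to exp(-(AIC_i - AIC_min) / 2).
best = min(aic for _, aic in models.values())
raw = {name: math.exp(-(aic - best) / 2) for name, (_, aic) in models.items()}
total = sum(raw.values())
weights = {name: r / total for name, r in raw.items()}

# Relative importance of a variable = sum of the weights of the models containing it.
variables = sorted(set().union(*(terms for terms, _ in models.values())),
                   key=lambda v: int(v[1:]))
importance = {
    v: sum(weights[name] for name, (terms, _) in models.items() if v in terms)
    for v in variables
}
for v in variables:
    print(f"{v}: {importance[v]:.2f}")
```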

 

Performance by Data Set

The tables and plots below show the estimated accuracy on the hold-out test data set. The best-performing model for each data set is boldfaced, and the worst one is italicized.

1. Boston Housing

Table 4 shows the test-set MSE for the Boston Housing data. Random Forest has the lowest MSE (32.64723), while ridge regression performs the worst (MSE 303.57214). However, ridge regression can be improved with model averaging (MSE 264.95846) and, much more, with feature selection (MSE 55.51494).

Table 4. MSE (test) for Boston Housing Data
Model                                        MSE (test)   Note
Gradient Boosting                            33.30788
Random Forest                                32.64723
Neural Networks                              87.52996     Used scaled data set
SVM                                          97.98975     Used scaled data set
Ridge regression                             303.57214    1. Used scaled data set
                                                          2. The test error can be reduced to 55.51494 with feature selection
Logistic regression                          0.69170
Model Averaging (linear ridge regression)    264.95846
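The test MSE in Table 4 is the average squared difference between the observed and the predicted responses on data the model never saw during fitting. A minimal sketch of that evaluation follows; the 30% test fraction, the seed, the toy data and the mean-only baseline model are all assumptions for illustration, not the split or models used for Table 4, and the helper names are hypothetical.

```python
import random

def holdout_split(rows, test_fraction=0.3, seed=1):
    """Shuffle rows and split them into (train, test); test rows are never used for fitting."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def mse(y_true, y_pred):
    """Mean squared error: the average of (y - y_hat)^2 over the test set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

# HYPOTHETICAL toy data and a trivial baseline that predicts the
# training-set mean of y for every test row.
data = [(x, 2.0 * x + 1.0) for x in range(20)]
train, test = holdout_split(data)
baseline = sum(y for _, y in train) / len(train)
print("baseline test MSE:", mse([y for _, y in test], [baseline] * len(test)))
```

Any fitted model can replace the baseline; the important design choice is that the test rows are only used to compute the final error.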

2. Classification Problems’ MSE (test) by Data Sets

Table 5 shows the test-set MSE for the classification problems. Random forests perform best on both the Wdbc and Hypothyroid data sets, while SVM is the best model on Ionosphere.

Overall, gradient boosting has the best average performance. Linear ridge regression performs the worst, because it is not well suited to classification problems. Logistic regression also has poor average performance, but it performs very well on Hypothyroid. As the No Free Lunch Theorem states, no single learning algorithm is best for all data sets.

Model averaging improves on the original logistic regression for Wdbc and Hypothyroid, but not for Ionosphere.

Table 5. Classification Problems’ MSE (test) by Data Set
Model                                   Wdbc     Ionosphere   Hypothyroid   Average performance
Gradient Boosting                       0.0335   0.0611       0.0168        0.0371
Random Forest                           0.0260   0.0763       0.0150        0.0391
Neural Networks                         0.0446   0.0687       0.0273        0.0469
SVM                                     0.0372   0.0534       0.0273        0.0393
Ridge regression                        0.5279   0.1832       0.5988        0.4366
Logistic regression                     0.0595   0.1298       0.0256        0.0716
Model Averaging (logistic regression)   0.0483   0.1374       0.0247        0.0701
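With 0/1 class labels and 0/1 predictions, each squared error is either 0 or 1, so the MSE reported in Table 5 equals the misclassification rate. The sketch below checks this on made-up labels, then shows that the Random Forest entry for Wdbc (0.0260223048327138 at full precision) is exactly 7/269, consistent with, for example, 7 errors on 269 test cases. The helper names and labels are hypothetical.

```python
from fractions import Fraction

def squared_error_rate(y_true, y_pred):
    """Mean squared error between 0/1 labels and 0/1 predicted labels."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def misclassification_rate(y_true, y_pred):
    """Fraction of test cases whose predicted class is wrong."""
    return sum(y != p for y, p in zip(y_true, y_pred)) / len(y_true)

# HYPOTHETICAL labels: 10 test cases, 2 of them misclassified.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
print(squared_error_rate(y_true, y_pred), misclassification_rate(y_true, y_pred))

# Random Forest / Wdbc entry of Table 5, written as the closest simple fraction.
print(Fraction(0.0260223048327138).limit_denominator(2000))
```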

3. ROC plots for Classification Problems

For all 3 classification problems, SVM, random forests, and gradient boosting achieve an excellent area under the ROC curve (AUC), while logistic regression and ridge regression perform the poorest. Comparing the plots, the Wdbc models tend to have larger AUCs than the Ionosphere models, which is consistent with their lower test MSE in Table 5.
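The area under the ROC curve can be computed without drawing the curve: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half (the Mann-Whitney formulation). The sketch below implements that definition directly; it is not necessarily how the original plots were computed, and the function name, scores and labels are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney formulation.

    AUC = P(score of a random positive > score of a random negative),
    counting ties as 1/2. Compares every positive/negative pair, which is
    O(n_pos * n_neg) and fine for small test sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# HYPOTHETICAL predicted probabilities for 6 test cases.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(auc(scores, labels))
```

Here one positive (0.4) is ranked below one negative (0.5), so 8 of the 9 positive/negative pairs are ordered correctly and the AUC is 8/9.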

Conclusion

Given their excellent performance on all 4 data sets, gradient boosting and random forests are the best algorithms overall.

Table 6. Differences Between the Algorithms
Algorithm            Problem type     Training speed   Prediction speed   Needs scaled data?        Handles many features well?
Gradient Boosting    Either           Slow             Fast               No                        Yes
Random Forest        Either           Slow             Moderate           No                        Yes
Neural Networks      Either           Slow             Moderate           Yes                       Yes
SVM (with kernel)    Either           Fast             Fast               Yes                       Yes
Ridge regression     Regression       Fast             Fast               Yes                       No (needs feature selection)
Logistic regression  Classification   Fast             Fast               No (unless regularized)   No (needs feature selection)
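Table 6 marks neural networks, SVM and ridge regression as needing scaled data, and Table 4 notes that scaled data sets were used for them. The exact scaling is not described; a common choice, assumed in the sketch below, is z-score standardization, where each feature's mean and standard deviation are estimated on the training set only and then applied unchanged to the test set, so that no test information leaks into training. The data and helper names are hypothetical.

```python
import statistics

def fit_scaler(train_columns):
    """Estimate the mean and sample standard deviation of each feature on training data only."""
    params = []
    for col in train_columns:
        mu = statistics.mean(col)
        sd = statistics.stdev(col)
        params.append((mu, sd if sd > 0 else 1.0))  # guard against constant features
    return params

def transform(columns, params):
    """Standardize each feature: z = (x - mean_train) / sd_train."""
    return [[(x - mu) / sd for x in col] for col, (mu, sd) in zip(columns, params)]

# HYPOTHETICAL data: two features on very different scales, stored column-wise.
train = [[1.0, 2.0, 3.0, 4.0], [100.0, 300.0, 200.0, 400.0]]
test = [[2.5], [250.0]]
params = fit_scaler(train)
print(transform(train, params))
print(transform(test, params))
```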

For regression problems, traditional linear ridge regression can be improved with model selection and model averaging, whereas using logistic regression to classify the response is not a good idea. For classification problems, SVM, gradient boosting, and random forests perform very well. Model averaging does not improve logistic regression on every data set: it helps on Wdbc and Hypothyroid but not on Ionosphere.

As Rich Caruana and Alexandru Niculescu-Mizil put it in their great paper "An Empirical Comparison of Supervised Learning Algorithms":

"Even the best models sometimes perform poorly, and models with poor average performance occasionally perform exceptionally well."

 

 
