The statistics behind stepwise regression models

Updated over a year ago

Stepwise regression is an automated way of selecting predictive variables. The approach works in forward and backwards modes. In forward mode, the starting point is the intercept-only model, and at each step the term that gives the most statistically significant improvement in model fit is added to the model. This is repeated until no remaining term gives a statistically significant improvement. In backwards mode, the starting point is the full model with all possible terms, and at each step the term whose removal causes the least statistically significant deterioration in model fit is removed. This is repeated until no term can be removed without a statistically significant decrease in model fit.
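The two search directions can be sketched in Python as a greedy loop over a scoring function where lower is better, such as an information criterion. This is only an illustrative sketch, not the stepAIC implementation: the names forward_select, backward_eliminate and toy_score are hypothetical, the toy scores are made-up numbers, and the stopping rule "no candidate step lowers the score" is an assumption standing in for the significance test described above.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Score = Callable[[FrozenSet[str]], float]
Path = List[Tuple[str, float]]


def forward_select(candidates: Iterable[str], score: Score) -> Tuple[FrozenSet[str], Path]:
    """Start from the intercept-only model and add one term per step."""
    selected: FrozenSet[str] = frozenset()
    best = score(selected)
    path: Path = [("<none>", best)]
    while True:
        remaining = sorted(set(candidates) - selected)
        if not remaining:
            break
        # Score every one-term addition and keep the lowest (best) one.
        trial, term = min((score(selected | {t}), t) for t in remaining)
        if trial >= best:  # no addition improves the score: stop
            break
        selected, best = selected | {term}, trial
        path.append(("+ " + term, best))
    return selected, path


def backward_eliminate(candidates: Iterable[str], score: Score) -> Tuple[FrozenSet[str], Path]:
    """Start from the full model and remove one term per step."""
    selected: FrozenSet[str] = frozenset(candidates)
    best = score(selected)
    path: Path = [("<none>", best)]
    while selected:
        # Score every one-term removal and keep the lowest (best) one.
        trial, term = min((score(selected - {t}), t) for t in sorted(selected))
        if trial >= best:  # every removal makes the score worse: stop
            break
        selected, best = selected - {term}, trial
        path.append(("- " + term, best))
    return selected, path


def toy_score(terms: FrozenSet[str]) -> float:
    """Hypothetical score: made-up fit gain per term, minus a penalty of 2 per term."""
    gain = {"a": 10.0, "b": 5.0, "c": 0.5}
    return 100.0 - sum(gain[t] for t in terms) + 2.0 * len(terms)


forward_terms, forward_path = forward_select(["a", "b", "c"], toy_score)
backward_terms, backward_path = backward_eliminate(["a", "b", "c"], toy_score)
```

With these made-up scores both directions settle on the terms a and b, because the small gain from c does not cover its penalty. In general the two directions are not guaranteed to agree.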

Statistics

Stepwise regression is implemented using the stepAIC function from the MASS R package (MASS Documentation). The k value passed to stepAIC is multiplied by the number of degrees of freedom to form the penalty term of the information criterion. It is given by:

k = 2 for AIC, or k = log(n) for BIC

where n is the number of data points and log is the natural logarithm. The user can choose the metric, which corresponds to either the Akaike information criterion (AIC) or the Bayesian information criterion (BIC).
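As a rough illustration of how the penalty enters the criterion, the Python sketch below computes k for each metric (2 for AIC, log(n) for BIC, as documented for stepAIC) and then a criterion value of the form n * log(RSS / n) + k * edf. That second formula is the one R's extractAIC uses for linear models, so it assumes a Gaussian linear model; the function names penalty_k and lm_criterion are hypothetical and the numbers in the usage lines are made up.

```python
import math


def penalty_k(metric: str, n: int) -> float:
    """Penalty multiplier per degree of freedom: 2 for AIC, log(n) for BIC."""
    if metric == "AIC":
        return 2.0
    if metric == "BIC":
        return math.log(n)
    raise ValueError("metric must be 'AIC' or 'BIC'")


def lm_criterion(rss: float, n: int, edf: int, k: float) -> float:
    """Criterion for a linear model (assumption): n * log(RSS / n) + k * edf.

    rss is the residual sum of squares, n the number of data points and
    edf the number of fitted parameters (equivalent degrees of freedom).
    """
    return n * math.log(rss / n) + k * edf


# Made-up example: 100 data points, 3 fitted parameters, RSS of 100.
aic_value = lm_criterion(rss=100.0, n=100, edf=3, k=penalty_k("AIC", 100))
bic_value = lm_criterion(rss=100.0, n=100, edf=3, k=penalty_k("BIC", 100))
```

Because log(n) exceeds 2 once n is 8 or more, BIC penalises each extra term more heavily than AIC on all but very small data sets, and so tends to select smaller models.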

stepAIC is run twice, once in forward mode and once in backwards mode. The outcomes of these runs are shown in the table in the “Term Selection” section of the “Create Models” tab:

The top row corresponds to the intercept-only model, and each subsequent row shows the effect of adding the term in the “Change” column when the algorithm runs in forward mode. The remaining rows show the steps taken by the algorithm in backwards mode, starting with the full model (row 3 in the figure above) and showing the effect of removing terms on, in this case, the AIC.

The model chosen by the algorithm is the final model produced by the forward-mode run.

Citations

  • R version: 4.0.5

  • MASS version: 7.3-58.1

To learn more about stepwise regression models, click here.

To learn how to apply stepwise regression to your data in Synthace click here (Coming Soon)

To learn about other modelling techniques, click here.

To learn how to assess the quality of your fitted model, click here.
