What are stepwise regression models?

Updated over a year ago

Stepwise regression is a search procedure that attempts to find a good model in one of two ways:

  • Starting with an empty model and sequentially adding terms until none of the remaining terms significantly improves it (forwards regression)

  • Starting with the model that includes all possible terms and sequentially removing terms until none of the remaining terms can be removed without significantly harming the model's fit (backwards regression)

Generally, forwards regression finds smaller models than backwards regression.
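The forward procedure can be sketched in a few lines. This is an illustrative example only, not Synthace's implementation: it fits ordinary least squares models to synthetic data with four candidate terms (of which only two genuinely matter) and greedily adds whichever term most improves the AIC, stopping when no addition helps.

```python
# Minimal sketch of forward stepwise selection with AIC (illustrative,
# not Synthace's actual code). Synthetic data: 4 candidate terms, 2 real.
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

def aic(X_sub, y):
    """AIC for an OLS fit with an intercept: n*log(RSS/n) + 2k."""
    design = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ beta) ** 2)
    return len(y) * np.log(rss / len(y)) + 2 * design.shape[1]

selected, remaining = [], list(range(X.shape[1]))
current = aic(X[:, selected], y)        # AIC of the intercept-only model
improved = True
while improved and remaining:
    improved = False
    # Score every candidate addition and keep the best one
    best_score, best_j = min((aic(X[:, selected + [j]], y), j) for j in remaining)
    if best_score < current:            # only add if it improves AIC
        selected.append(best_j)
        remaining.remove(best_j)
        current = best_score
        improved = True

print(sorted(selected))
```

Backwards regression is the mirror image: start with all four terms selected and, at each step, drop the term whose removal most improves (or least worsens) the metric, stopping when every removal would make it worse.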

At each step the algorithm needs to assess how well the model fits the data. Synthace does this with one of two metrics which you can choose between: the Akaike information criterion (AIC) and the Bayesian information criterion (BIC).

Both metrics attempt to summarize how well the model fits the data while applying a penalty for the number of terms it contains. In general, adding more terms to a model will improve how well it fits the data. However, as terms with smaller and smaller effect sizes are included, the model will essentially be modelling random noise rather than real effects. This means it will not generalize well, a situation known as overfitting.

AIC and BIC differ slightly in the way that they apply this penalty, with BIC generally penalizing extra terms more aggressively and leading to smaller models.
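The difference comes down to the penalty term. With log-likelihood lnL, k model terms, and n data points, the standard formulas are AIC = 2k − 2·lnL and BIC = k·ln(n) − 2·lnL. The following sketch (my own illustration with made-up numbers, not Synthace output) compares a small and a large hypothetical model:

```python
# Compare AIC and BIC penalties on two hypothetical models (illustrative
# numbers, not real Synthace results).
import math

def aic(log_likelihood, k):
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, n, k):
    return k * math.log(n) - 2 * log_likelihood

n = 50  # number of data points
# The larger model fits slightly better (higher log-likelihood)
# but carries three extra terms.
small = {"lnL": -120.0, "k": 3}
large = {"lnL": -118.0, "k": 6}

print(aic(small["lnL"], small["k"]), aic(large["lnL"], large["k"]))  # 246.0 248.0
print(bic(small["lnL"], n, small["k"]), bic(large["lnL"], n, large["k"]))
```

Because ln(50) ≈ 3.9 is larger than AIC's flat penalty of 2 per term, BIC widens the gap in favour of the smaller model, which is why BIC-guided stepwise regression tends to stop at smaller models.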

Synthace runs both forwards and backwards regression with the chosen metric and reports the final model from the forward selection.

To learn about the statistics behind fitting stepwise regression models, click here.

To learn how to apply stepwise regression to your data in Synthace, click here (Coming Soon).