Model fundamentals, and what does linear mean anyway?

Updated over a year ago

In general, any model defines a mathematical relationship between some inputs and some outputs. In DOE the raw inputs are called factors, which can be combined and transformed into effects, and the outputs are called responses. You can think of a model as a black box: you feed in some labels and numbers, and it gives you something (usually a number) in return.

Model Anatomy: Effects and Parameters

Models typically have two parts: the model effects and the model parameters.

The effects are derived from the values which are set as inputs to the system: think temperatures, concentrations, amounts and types of substances. Importantly these can be any function of these inputs, so we might calculate the square or the cube or the square root or the product of any set of the base (or “main”) effects.

The parameters quantify how the input effects cause a change in the response. Each one can be thought of as the amount of change in the response which results from a unit change in the given effect, with everything else held fixed. So if the parameter corresponding to the main effect of temperature is 2, then we expect a temperature increase of 1℃ to change the response by 2 units.
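As a minimal sketch of that temperature example, the snippet below uses a model with one main effect. The intercept of 10 and the function name predict are made up for illustration; only the parameter value of 2 comes from the text above.

```python
# Hypothetical parameter values, chosen only for illustration.
intercept = 10.0
temperature_parameter = 2.0  # response units per 1 degree C, as in the text


def predict(temperature_c):
    """Predicted response for a model with a single main effect of temperature."""
    return intercept + temperature_parameter * temperature_c


# Raising the temperature by 1 degree C changes the prediction by the parameter value.
change = predict(31.0) - predict(30.0)
print(change)  # 2.0
```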

What puts the “Linear” in “Linear Model”?

From a mathematical point of view, what makes a model linear is that the response is a weighted sum of the effects, with the parameters acting as the weights. No parameter to be estimated sits inside the transformations used to derive the effects; the two are entirely separate. For example, b·x^2 is linear in the parameter b, whereas exp(k·x) with an unknown k is not. This technicality may not seem important or easy to intuit, but it has a very important consequence: “linear” models in the statistical sense can fit nonlinear data.

DOE makes a lot of use of this fact. For optimization purposes, including quadratic terms in the model allows it to capture peaks, which of course can’t be captured using straight lines.
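To make this concrete, here is a small sketch that fits a peaked curve using ordinary linear least squares. The data are made up and noise-free, and NumPy is assumed to be available; this is not how Synthace fits models internally, just an illustration of the principle.

```python
import numpy as np

# Made-up, noise-free data with a peak at x = 3: y = 5 - (x - 3)^2
x = np.linspace(0.0, 6.0, 13)
y = 5.0 - (x - 3.0) ** 2

# Effects: a constant, x and x^2. The squaring happens before fitting,
# so no unknown parameter sits inside a transformation.
X = np.column_stack([np.ones_like(x), x, x ** 2])

# Ordinary linear least squares estimates one parameter per effect.
params, *_ = np.linalg.lstsq(X, y, rcond=None)
c, b, a = params

# The fitted curve peaks where its slope is zero: x = -b / (2a).
x_peak = -b / (2.0 * a)
print(params, x_peak)
```

Expanding the formula for y gives -4 + 6x - x^2, so the estimated parameters should come out close to -4, 6 and -1, with the peak located at x = 3.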

Linear models are therefore extremely general. DOE typically makes use of a specific subset of linear models, polynomial models. These have the great property that they can be as simple or complicated as you like (Figure 1).

Figure 1. A depiction of two linear models. The left-hand model can only fit straight lines while the model to the right, which includes a quadratic term, can fit some curved shapes as well. What makes these both linear is that the parameters m, a, b and c are just coefficients of the independent variable x and related terms like x^2.

The simplest model is just a constant: whatever the inputs are, it always returns the same value. The next simplest is a straight line, which can move up or down and change slope. You can keep going, adding first a squared term, then a cubed term, and so on.
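The ladder of increasingly flexible models can be sketched as follows, using made-up curved data and NumPy's polynomial fitting routine. The data and variable names are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Made-up, noise-free curved data: y = 1 + 2x + 0.5x^2
x = np.linspace(0.0, 4.0, 9)
y = 1.0 + 2.0 * x + 0.5 * x ** 2

# Fit a constant (degree 0), a straight line (degree 1) and a quadratic (degree 2),
# recording the residual sum of squares for each.
rss = []
for degree in (0, 1, 2):
    coefs = P.polyfit(x, y, degree)   # coefficients, lowest order first
    fitted = P.polyval(x, coefs)
    rss.append(float(np.sum((y - fitted) ** 2)))

print(rss)
```

Each added term can only reduce the residual sum of squares. Here it falls to essentially zero at degree 2, because the data were generated from a quadratic.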

For the kinds of physical systems DOE is usually applied to, it’s rarely necessary to go above a cubic term, and in many cases only quadratic terms are needed. Synthace DOE allows models with at most quadratic terms to be built, so we will restrict discussion to these for now. Fitting more complex models requires an external DOE package such as JMP, DesignExpert or MODDE.

Interactions: DOE’s secret sauce

So far we’ve only talked about effects which derive from a single factor. A key property of DOE is the ability to determine when two or more factors combine to behave differently: detecting interactions between them. Interaction effects simply look like the product of two or more other effects.

The most important thing to know for this case is that the model should be specified hierarchically: if we include the interaction between two terms (X1 and X2, say) then we must include both X1 and X2 in the model as main effects as well. This is not strictly essential from a technical point of view but substantially simplifies the interpretation of the model.
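A hierarchical two-factor model with an interaction can be sketched as below. The two-level factorial design in coded units (-1 and +1) and the response values are invented for illustration; X1 and X2 follow the naming used above.

```python
import itertools

import numpy as np

# A two-level full factorial in two factors, in coded units (-1 and +1).
runs = np.array(list(itertools.product([-1.0, 1.0], repeat=2)))
x1, x2 = runs[:, 0], runs[:, 1]

# Made-up responses generated from known parameters, including an interaction.
y = 10.0 + 2.0 * x1 + 3.0 * x2 + 1.5 * x1 * x2

# Hierarchical model: the X1*X2 interaction is included together with
# both of its parent main effects, X1 and X2.
X = np.column_stack([np.ones(len(runs)), x1, x2, x1 * x2])
params, *_ = np.linalg.lstsq(X, y, rcond=None)

print(params)  # intercept, X1, X2, X1*X2
```

The interaction column is simply the product of the two main-effect columns, and the fit should recover parameters close to 10, 2, 3 and 1.5.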

Modelling Noise

The final important parameter of a statistical model aims to capture the amount of noise in the system, since real world systems are always noisy to a greater or lesser extent.

In the linear models used in DOE, the “noise” term can be thought of as the standard deviation of a normal distribution. This is one of the key assumptions of linear modelling, and one which is important for model validation: once we take out all the predictable effects captured by the model, the random variation left over (the residuals) should look like it comes from a normal distribution.
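As a rough sketch of this idea, the snippet below simulates a noisy straight-line system, fits the model, and estimates the noise standard deviation from the residuals. The true parameters, the noise level of 0.5 and the random seed are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated system: a straight line plus normally distributed noise.
true_sigma = 0.5  # assumed noise level for this example
x = np.linspace(0.0, 10.0, 200)
y = 1.0 + 2.0 * x + rng.normal(0.0, true_sigma, size=x.size)

# Fit the model, then remove the predictable part to obtain the residuals.
X = np.column_stack([np.ones_like(x), x])
params, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ params

# Estimate the noise standard deviation, dividing by the number of runs
# minus the number of estimated parameters.
dof = x.size - X.shape[1]
sigma_hat = float(np.sqrt(np.sum(residuals ** 2) / dof))
print(sigma_hat)
```

The estimate should land near the assumed value of 0.5. In practice, a histogram or normal quantile plot of the residuals is a common way to check whether the normality assumption looks reasonable.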

To learn more about the modelling process, click here.
