Design of experiments (DOE) is fundamentally the application of a reasonably simple yet very flexible type of statistical model to the experimental exploration of a system of interest.

The main tools of DOE help you manage the whole process of creating and interpreting models: from designing an experiment that lets you fit the right kind of model, through to interpreting what the model built from your data says about your research questions.

This document provides a brief summary of the modelling techniques behind DOE, so that you understand enough of what’s going on to use the tools effectively.

What Does “Model” Mean?

The word ‘model’ can be used in many specific ways depending on the context: a model organism, for example, or in the sense of a specific variant of a particular make of car. For our purposes we can think of a model as a way to try to capture the relationship between a set of influences on a system or process and some observable behaviour that results from being exposed to a particular set of influences.

There are of course many possible ways to do this. What all models generally have in common, however, is that one part of them is fixed and another part is variable (Figure 1). The fixed part - the structure of the model - dictates the range of possible behaviours it can accurately capture. The variable parts - the parameters of the model - are then chosen based on data about a specific system, so that the model reproduces the observed behaviour as closely as possible.

Figure 1. Examples of linear models and their different working parts. Despite the name, linear models are not limited to straight lines; they can also fit curves. Linear models comprise the fixed structure (black) and variable parameters (red). The structure determines the type of behaviour that a model can capture, while the parameters determine the shape of that behaviour.

What Kind of Models Do We Use in DOE?

In its most general sense, design of experiments as a discipline is about doing the best experiment to build the kind of model you’re looking for. In principle, it can be applied to any kind of model. However, the term “DOE” most commonly refers to applying these techniques to a specific family of models: linear statistical models (or just linear models).

You may have already encountered these in some form or another. The term ‘linear’ is very likely familiar to you, but please try to reserve judgement if so. The terminology in this area of statistics is unfortunately confusing, and ‘linear’ is often taken to mean that all these models can do is fit straight lines through things. This is not true: ‘linear’ describes how the parameters enter the model (each term is a parameter multiplied by some fixed function of the inputs), not the shape of the fitted curve. For example, both of the models in Figure 1 above are linear models, even though the second contains a quadratic term.
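As a minimal sketch of this point, the snippet below fits a curve with a peak using ordinary least squares. The data are made up for illustration and the variable names are our own; the point is only that a squared term sits in the model as one more column, so the fit is still a linear model.

```python
import numpy as np

# Hypothetical, noise-free data from a curve with a peak (illustrative only).
x = np.linspace(0.0, 4.0, 9)
y = 1.0 + 4.0 * x - 1.0 * x**2

# Fixed structure: one column per term (intercept, x, x squared).
X = np.column_stack([np.ones_like(x), x, x**2])

# Variable part: the parameters, estimated by ordinary least squares.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 3))
```

Because the data here contain no noise, the estimated parameters should come back very close to the values used to generate them (1, 4 and -1).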

For the purposes of getting started with DOE it’s important to understand the following about linear models:

They are very flexible, and with a few tricks can fit a wide variety of shapes of data
They have proven very successful in modelling all sorts of real world systems both for the purposes of prediction and explanation
They have a natural hierarchy which means they can be made simple at first and gradually refined as needed when the data prove this is required

What are Linear Models Made of?

As we said above, linear models, like all models, have a fixed part and a variable part. The fixed part of a linear model is the set of terms or (in DOE speak) effects it contains (Figure 2). There are typically three kinds of effect you will use:

Main effects (a.k.a. first order effects): capture how each input variable affects your system independently of everything else.
Interaction terms: capture how pairs of input variables work together to influence the response, so that the effect of one variable depends on the setting of the other.
Quadratic effects (a.k.a. second order effects): allow you to capture peaks or troughs in the response - allowing models to be curved rather than just flat surfaces. Essentially this is like asking how an effect interacts with itself.

The parameters of the model are then numbers which define how strong each of these effects is and whether it affects the response negatively or positively.

Figure 2. Linear models and how the terminology relates to DOE. In DOE, the terms that make up the fixed structure of a linear model are called effects, and the parameters determine the magnitude of those effects. Commonly you will be interested in main effects, two-factor interactions and quadratic effects.
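Written out for two hypothetical input variables, a model containing all three kinds of effect would look something like the equation below. The symbols are our own notation, chosen for illustration rather than taken from the figures.

```latex
y = \beta_0
  + \underbrace{\beta_1 x_1 + \beta_2 x_2}_{\text{main effects}}
  + \underbrace{\beta_{12}\, x_1 x_2}_{\text{interaction}}
  + \underbrace{\beta_{11}\, x_1^2 + \beta_{22}\, x_2^2}_{\text{quadratic effects}}
  + \varepsilon
```

Here y is the response, the x terms form the fixed structure, the β values are the parameters to be estimated, and ε stands for the variation the model does not explain.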

How Does DOE Use Linear Models?

This is essentially the subject of a lot of the documents in our collection! For introductory purposes we can sketch out the process of doing a particular DOE experiment as follows:

1. Choose an experimental design
2. Collect data
3. Build a model from the data
4. Interpret the model and decide what to do next

Modelling and Design

At stage 1 we are in effect defining the key features of the model we hope to fit later: different models require different designs.

For linear models the choice is typically whether we include:

Just main effects
Main effects and interactions
Everything up to quadratic effects

This choice dictates what design you need and can be used to assess how well a given design will work to let you build that model.
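One simple way to see why the model dictates the design is to check whether a design has enough distinct information to estimate every term. The sketch below is our own illustration, not a Synthace function: it assumes two factors in coded units and uses the rank of the model matrix as the check.

```python
import itertools
import numpy as np

def model_matrix(design, quadratic):
    """Columns: intercept, main effects, the interaction, optionally squares."""
    x1, x2 = design[:, 0], design[:, 1]
    cols = [np.ones(len(design)), x1, x2, x1 * x2]
    if quadratic:
        cols += [x1**2, x2**2]
    return np.column_stack(cols)

def estimable(design, quadratic):
    """True if every term in the model can be estimated from this design."""
    X = model_matrix(design, quadratic)
    return bool(np.linalg.matrix_rank(X) == X.shape[1])

# Two-level full factorial: every combination of -1 and +1 (4 runs).
two_level = np.array(list(itertools.product([-1.0, 1.0], repeat=2)))
# Three-level full factorial: adds a centre level of 0 (9 runs).
three_level = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=2)))

print(estimable(two_level, quadratic=False))
print(estimable(two_level, quadratic=True))
print(estimable(three_level, quadratic=True))
```

With only two levels per factor, the squared columns are identical to the intercept column, so the two-level design supports main effects and the interaction but not quadratic effects. Adding a third level makes the quadratic model estimable.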

Model Building

At stage 3 we have collected response data, so we can build the model. This is the process of estimating the parameters of the model. The estimation itself is automatic, but you can choose to leave some potential parameters out (effectively changing the structure of the model), or to treat the data in other ways that help to produce a better model.

At this point we need to scrutinise the model carefully to help understand whether it accurately represents what the data are saying. You can read more about this in the later section on assessing your model.
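The sketch below illustrates the idea of estimating parameters and then leaving a term out. Everything in it is an assumption made for illustration: the response values are simulated, the helper `fit` is our own, and plain least squares stands in for whatever fitting routine your software uses.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
design = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=2)))
x1, x2 = design[:, 0], design[:, 1]

# Hypothetical "measured" response: simulated, with a little noise added.
y = 10.0 + 2.0 * x1 - 1.0 * x2 + 1.5 * x1 * x2 + rng.normal(0.0, 0.1, len(design))

# Candidate terms for the model structure.
terms = {"intercept": np.ones(len(design)), "x1": x1, "x2": x2, "x1*x2": x1 * x2}

def fit(names):
    """Estimate parameters for the chosen terms; return them with the residual sum of squares."""
    X = np.column_stack([terms[n] for n in names])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return dict(zip(names, beta)), float(np.sum(residuals**2))

full, rss_full = fit(["intercept", "x1", "x2", "x1*x2"])
reduced, rss_reduced = fit(["intercept", "x1", "x2"])

print(full)
print(rss_full, rss_reduced)
```

In this simulated case the interaction is real, so dropping it leaves much more unexplained variation in the reduced model. That kind of comparison is one of the things to look at when scrutinising a model.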

Using the Model

Finally, at stage 4, once the model is built, we can look at it in different ways to understand what it tells us about our system.

There are two main ways we can do this:

We can look at the estimated model parameters and see what they tell us about how the different influences seem to affect the behaviour of the system
We can try to predict which settings of the input variables give us optimal behaviour in terms of getting the best response

Based on this interpretation we can then say what we need to do next: if the model is good and we just want to use it to understand our system better we may well be done. If we’re making predictions with it we almost certainly want to do another experiment to test whether those predictions are accurate.
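As a rough sketch of the second approach, the snippet below takes a fitted two-factor quadratic model and searches a grid of input settings for the highest predicted response. The parameter values are invented for illustration, and a simple grid search is our own choice of method, not necessarily what any particular tool does.

```python
import numpy as np

# Hypothetical fitted parameters for a two-factor quadratic model (illustrative values).
beta = {"intercept": 10.0, "x1": 2.0, "x2": -1.0,
        "x1*x2": 1.5, "x1^2": -3.0, "x2^2": -2.0}

def predict(x1, x2):
    """Predicted response at the given input settings (coded units)."""
    return (beta["intercept"] + beta["x1"] * x1 + beta["x2"] * x2
            + beta["x1*x2"] * x1 * x2
            + beta["x1^2"] * x1**2 + beta["x2^2"] * x2**2)

# Evaluate the model over the experimental range, here -1 to +1 for each input.
grid = np.linspace(-1.0, 1.0, 201)
g1, g2 = np.meshgrid(grid, grid, indexing="ij")
pred = predict(g1, g2)

# Locate the settings with the highest predicted response.
i, j = np.unravel_index(np.argmax(pred), pred.shape)
best_x1, best_x2, best_y = grid[i], grid[j], pred[i, j]
print(best_x1, best_x2, best_y)
```

The settings found this way are only a prediction, which is why a follow-up experiment at or near those settings is worth running.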

Summary: Don’t Fear the Modeller

Building models can seem like a very abstract and mathematically motivated activity, and indeed in many cases that’s what it is. But models are tools, and used properly they are extremely effective at organising experimental knowledge and helping to guide how you explore a system experimentally. The beauty and power of DOE is that it puts these tools in a practical context and makes them accessible to any researcher doing any kind of research.

While modelling is a whole discipline in its own right, it’s possible to apply it in a wide range of situations without having to become an expert. The first step is to understand how it underpins all the steps you take in a DOE, even if you don’t yet understand how it all fits together. Hopefully that is the step we’ve helped you to take in this document.

To learn how to get started on your DOE journey in Synthace, click here.
