
What are the statistics behind optimal designs?

Updated over a year ago

Optimal designs are the most general class of experimental design: they give you a way to generate efficient designs for almost any design problem.

This article will give you enough background to understand the components which are used to define a particular optimal design, how they are created and how to use them. We’ll also consider how they differ from other design types available in Synthace.

Optimal Designs vs Other Types

Optimal designs are like any other type of experimental design in that they have a number of runs which sample different combinations of levels for a set of factors of interest. Each run gives you information about how choosing a particular combination of levels for the factors affects the system, and when combined with the other runs can be used to build a model to explore the behaviour and potentially make predictions for use in various ways.

What distinguishes optimal designs from many other design types used in DOE is the way they are created. Consider factorial designs as a comparison: factorial designs are created by applying a particular set of rules. Full factorials result from enumerating all possible combinations of the given factor levels. Fractional factorials can be created from full factorials by removing some of these combinations (in practice this isn’t an efficient way to actually go about generating them, but for our purposes it’s sufficient that this is conceptually true).
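As a minimal sketch of the rule-based construction described above, the snippet below enumerates a full factorial and then takes a half fraction of a two-level design. The factor names, levels and the defining relation (keep runs where A*B*C = +1) are illustrative assumptions, not anything specific to Synthace.

```python
from itertools import product

# Hypothetical factors and levels, chosen only for illustration.
factors = {
    "temperature": [25, 37],
    "pH": [6.0, 7.0, 8.0],
    "enzyme": ["A", "B"],
}

def full_factorial(factors):
    """Return every combination of factor levels, one dict per run."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

runs = full_factorial(factors)
print(len(runs))  # 2 * 3 * 2 = 12 runs

# Conceptually, a fractional factorial keeps a structured subset of the runs.
# Here: three two-level factors in coded units, keeping runs with A*B*C = +1.
coded = list(product([-1, 1], repeat=3))
half_fraction = [run for run in coded if run[0] * run[1] * run[2] == 1]
print(len(half_fraction))  # 4 of the 8 runs
```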

The properties of factorial designs and others created in the same way derive from the symmetries which result from the rules used to create them. These symmetries, happily, correspond to the sorts of properties you want in designs with specific goals in mind. However, the connection is indirect: the desirable properties are a consequence of the construction rules rather than something the rules target explicitly. The need to satisfy the specific rules of symmetry also means there are inevitably restrictions on things like the number of runs, types of factors and other features of the design.

Creating Optimal Designs

Optimal designs by contrast aim to take a much more direct approach. The designer specifies what they are trying to do by providing a model they want to try and fit, a means to say how well a design will perform for their purpose (the design criterion) and a run budget (N). They can also optionally provide constraints on the combinations of levels which can be used in the design.

The design engine then applies an optimization method (hence the name optimal design) to try to find the best set of N runs it can given the model and design criterion, subject to any constraints. This process typically consists of generating a starting design at random then trying to improve it across a large number of steps, each step consisting of removing the least informative runs (as judged using the design criterion) and trying to find more informative runs to replace them with.
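The search described above can be illustrated with a deliberately simplified greedy point exchange. This is a sketch under stated assumptions, not the algorithm used by Synthace or skpr: it assumes two factors at three coded levels, a model with an intercept, two main effects and their interaction, and uses the determinant of the information matrix as the criterion. All function names are hypothetical.

```python
import random
from itertools import product

def model_row(x1, x2):
    # Assumed model: intercept, two main effects, one two-factor interaction.
    return [1.0, x1, x2, x1 * x2]

def information_matrix(design):
    rows = [model_row(*run) for run in design]
    p = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]

def determinant(matrix):
    # Gaussian elimination with partial pivoting on a copy of the matrix.
    m = [row[:] for row in matrix]
    n, det = len(m), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def d_criterion(design):
    return determinant(information_matrix(design))

def point_exchange(candidates, n_runs, rng):
    # Random starting design, then swap single runs while the criterion improves.
    design = [rng.choice(candidates) for _ in range(n_runs)]
    best = d_criterion(design)
    improved = True
    while improved:
        improved = False
        for i in range(n_runs):
            for cand in candidates:
                trial = design[:i] + [cand] + design[i + 1:]
                score = d_criterion(trial)
                if score > best + 1e-9:
                    design, best, improved = trial, score, True
    return design, best

def best_of_restarts(candidates, n_runs, restarts=20, seed=0):
    # Each restart begins from a fresh random design; keep the best result.
    rng = random.Random(seed)
    results = (point_exchange(candidates, n_runs, rng) for _ in range(restarts))
    return max(results, key=lambda result: result[1])

candidates = list(product([-1.0, 0.0, 1.0], repeat=2))
design, score = best_of_restarts(candidates, n_runs=6)
print(design, score)
```

Because each restart can stall in a different local optimum, the best result over several restarts is kept. Even then, nothing guarantees that the global optimum has been found.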

Note that optimization of this type is heuristic and is not guaranteed to find the global optimum, so (perhaps ironically) an “optimal” design is often only near-optimal for the chosen criterion.

Components of Optimal Designs

The information we need to specify an optimal design described above determines what kind of design we will end up with and how useful it will be to us. It’s therefore important to understand the different components and how they work together.

Model

The model is the most important defining feature of the design. Only terms included in the model are part of the assessment of the design’s optimality. This doesn’t mean that the terms in your initial model are the only ones you can ever fit, but they are the only terms the design is guaranteed to let you estimate (assuming the design process is successful).

The terms in the model are the same ones you’ve seen in other contexts: main effects, interactions and power terms like squares. In statistical packages you typically have full control over exactly which of these to include: for example, you can have all main effects and quadratic terms but only a subset of the two-factor interactions. In Synthace we only offer three choices of model for simplicity: main effects, main effects + 2-factor interactions and main effects + 2-factor interactions + quadratic effects.

Including or excluding certain sets of model terms effectively defines the way the resulting data can be used to learn about your system. Main effects only models are used in some screening experiments. Models including 2-factor interactions as well as main effects are most useful for screening and iteratively exploring the design space to find optima. Models which also include quadratic terms allow you to do optimization once you have found a region containing an optimum.

Design Criterion

The second-most important defining feature of a design is the criterion used to judge how well it performs. There are many possibilities for defining how well a design will perform, mostly named using single alphabetical letters. The most commonly used include D, I, A, G and E.

Why are there so many? There are essentially two reasons:

  • The different purposes you might be designing for - such as screening and optimization - have different concerns and therefore need different measures of fitness for purpose. For example D-optimality is used for designs intended for screening, while I-optimality is used for designs intended for optimization experiments.

  • For any given purpose it is usually not possible to define a uniquely best measure of how well it suits that purpose.

We therefore often encounter different criteria which are used for the same purpose. Sometimes opinion on which one should be used changes, or varies depending on who you ask. Most design packages offer only a couple of choices as options although they may report more criteria than you can actually directly optimize for.

Behind the scenes, Synthace uses two criteria: D-optimality and I-optimality. It does not, however, give you a choice between them directly. The reason for this is simplicity: we aim to give you only the choices you most need to make and work out the details which follow. So the choice of criterion is entirely dictated by the choice of model: if you choose a model which includes quadratic effects we assume you are optimizing and therefore pick I-optimality as the criterion. For the other two choices (main effects or up to 2-factor interactions) we assume you are screening and choose D-optimality.

Mathematically speaking, the criteria are measures applied to the Fisher information matrix of the design. This matrix essentially summarizes how the different model terms are correlated with one another and therefore how well they can be distinguished. D-optimality is calculated by finding the determinant of this matrix (larger is better), while I-optimality uses the matrix to determine the average variance of predictions made with the model (smaller is better).
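For readers who prefer formulas, the two criteria can be written as follows. The notation is an assumption introduced here for illustration: X is the model matrix with one row per run and one column per model term, f(x) is the vector of model terms evaluated at a point x, and R is the region of factor settings over which predictions are averaged. Variances are expressed relative to the noise variance.

```latex
M = X^{\top} X, \qquad
\text{D-criterion: maximize } \det(M), \qquad
\text{I-criterion: minimize } \frac{1}{\int_{R} dx} \int_{R} f(x)^{\top} M^{-1} f(x) \, dx
```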

Run Budget

The run budget defines how many runs you will perform: in effect, the size of the design. This is much simpler to understand than the preceding two components, but it is worth a quick discussion to see how the different components interact.

The main thing to know about setting the run budget is that given a particular model there is a minimum number of runs required to estimate all the terms in that model. This is the sum of the degrees of freedom (typically written df) required to estimate each term, including the intercept. The main effect of a continuous factor always needs 1 df, while the main effect of a multi-level categorical factor needs the number of levels minus 1.
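The counting rule can be sketched in a few lines. This is an illustration rather than Synthace’s implementation, and it adds three standard assumptions that the text does not spell out: one df for the intercept, an interaction df equal to the product of the df of the factors involved, and quadratic terms for continuous factors only. The factor names and model labels are hypothetical.

```python
from itertools import combinations

def factor_df(levels):
    """df for a main effect: 1 if continuous (levels is None), else levels - 1."""
    return 1 if levels is None else levels - 1

def minimum_runs(factors, model):
    """factors maps name -> number of levels (categorical) or None (continuous).
    model is one of "main", "2fi" or "quadratic" (hypothetical labels)."""
    if model not in ("main", "2fi", "quadratic"):
        raise ValueError(f"unknown model: {model}")
    dfs = [factor_df(levels) for levels in factors.values()]
    total = 1 + sum(dfs)  # intercept plus main effects
    if model in ("2fi", "quadratic"):
        # Assumed rule: interaction df is the product of the factors' df.
        total += sum(a * b for a, b in combinations(dfs, 2))
    if model == "quadratic":
        # Assumed rule: one squared term per continuous factor.
        total += sum(1 for levels in factors.values() if levels is None)
    return total

factors = {"temperature": None, "pH": None, "buffer": 3}
print(minimum_runs(factors, "main"))       # 1 + (1 + 1 + 2) = 5
print(minimum_runs(factors, "2fi"))        # 5 + (1 + 2 + 2) = 10
print(minimum_runs(factors, "quadratic"))  # 10 + 2 = 12
```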

However, you are usually advised to do more runs than this. This helps in two ways. Firstly, doing more runs will increase your power to detect real effects. Secondly, as you add more runs the designer will naturally use them to help distinguish terms which are currently highly correlated, reducing the amount of aliasing in the design.

Note that the design criterion does not affect the minimum number of runs required.

Synthace restricts your maximum run budget based on the size of the equivalent full factorial.

Constraints and Hard to Change Factors

For completeness, we’ll briefly cover the subject of constraints. The main point to note is that Synthace does not presently allow you to constrain your runs; where this is necessary, we recommend designing with an external package.

Constraints allow you to prevent certain combinations of factor levels appearing in the design itself. For example, if you are using your design to explore volumes of 4 different liquids A-D to mix, you may very well need to specify that the sum of the four volumes cannot exceed a particular value (in practice it’s not usually a good idea to explore mixtures like this because you can easily end up with the same mixture at different volumes cropping up multiple times, but it’s an intuitive example). For some situations there are alternative ways to achieve the same result; however, constraints typically require less effort from the user.

A related feature which optimal designs can accommodate well is hard-to-change factors. Declaring a factor as hard to change and creating an optimal design produces what is known as a split-plot design. The full description of these is covered in the article on hard-to-change factors. Briefly the idea here is to accommodate a restriction in randomization, such as you often encounter when incubating mixtures. In a plate-based system the whole plate is incubated at a given temperature rather than the individual wells. This means that in this case the plate itself (rather than the wells) is the experimental unit. Split plot designs account for this situation and ensure that the analyses work statistically.

Assessing Optimal Designs

Assessing the quality of your design is much the same as with other design types. There are two principal features to consider: resolution and power.

Resolution

Resolution is about how well different effects can be distinguished. Two effects which are completely indistinguishable are said to be (fully) aliased. Two effects which are completely distinguishable are said to be clear or independent.

Aliasing between effects occurs when the underlying columns in the design matrix are correlated with one another, with full aliasing corresponding to an absolute correlation of 1 between the columns (that is, a correlation of +1 or -1). For optimal designs it is usually the case that effects have intermediate absolute correlation values (between 0 and 1), which is called being partially aliased.

Unlike full aliasing (in which there is no way to tell which of the possible effects in the aliased set are actually active) partial aliasing still allows you to potentially identify which specific effects are active but makes it harder to distinguish the signals from background noise: in effect the aliasing between the effects means that each contributes some noise to the others, making everything somewhat noisier.

Resolution can be assessed using the correlation matrix, which shows darker colours for pairs of effects that are more highly correlated. Typically there will be some degree of aliasing between related effects, particularly between terms which involve the same single factor (e.g. the main effect x1 and the related quadratic term x1^2).
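The correlations discussed in this section can be computed directly from the columns of the design. The sketch below uses two small, hypothetical designs in coded units: a half fraction in which factor C is fully aliased with the A*B interaction, and the same design with one extra run, which makes the aliasing partial.

```python
from math import sqrt

def correlation(a, b):
    """Pearson correlation between two equally long columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def alias_c_with_ab(design):
    """Absolute correlation between the C column and the A*B interaction column."""
    c = [run[2] for run in design]
    ab = [run[0] * run[1] for run in design]
    return abs(correlation(c, ab))

# Hypothetical half fraction of a 2^3 design, built so that C = A*B.
half_fraction = [(-1, -1, 1), (1, -1, -1), (-1, 1, -1), (1, 1, 1)]
# One extra run breaks the exact relationship between C and A*B.
extended = half_fraction + [(1, 1, -1)]

print(alias_c_with_ab(half_fraction))  # 1.0: fully aliased
print(alias_c_with_ab(extended))       # strictly between 0 and 1: partially aliased
```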

Power

Power describes how well an effect can be distinguished from random noise. The key determinant of power is the number of runs which contribute to estimating that effect. This means that typically power is highest for main effects and gets smaller as the number of factors involved in the effect increases, since higher-order effects typically have fewer runs contributing to their estimation.

Calculating power for any given effect is framed as a question of whether that effect would be deemed significant using a significance test. This calculation depends on several things:

  • The design

  • The whole model being estimated (because of the partial aliasing which usually occurs, see above)

  • The significance level used to declare an effect significant (typically 0.05)

  • The size of the effect

  • The amount of noise in the system

Synthace simplifies this for optimal designs (as well as other design types) by applying standard assumptions about the model, significance criterion and amount of noise. This gives you some idea of the power to detect specific effects at particular effect sizes. Since calculating power is always an exercise in educated guesswork this is not overly restrictive and can help to understand how allocating resources affects your ability to detect weaker effects.
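To make the dependence on effect size, noise and run count concrete, here is a rough approximation. It is not the calculation Synthace or skpr performs: it assumes an orthogonal two-level design in coded units (so the standard error of a coefficient is sigma divided by the square root of the number of runs), a known noise level, and a two-sided normal test rather than a t or F test. The function name and arguments are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect, sigma, n_runs, alpha=0.05):
    """Approximate power to detect a coefficient of size `effect`.

    Assumes an orthogonal design coded -1/+1, a known noise standard
    deviation `sigma`, and a two-sided z-test at significance level `alpha`.
    """
    std_normal = NormalDist()
    standard_error = sigma / sqrt(n_runs)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / standard_error
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

for n_runs in (8, 16, 32):
    print(n_runs, round(approx_power(effect=0.5, sigma=1.0, n_runs=n_runs), 3))
```

Under these assumptions power rises with the number of runs and with the ratio of effect size to noise. Partial aliasing inflates the standard error, so real designs will usually have somewhat lower power than this idealized figure.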

Implementation Details

General design calculations

Optimal designs are created using the R package skpr (v1.1.6).

Designs are generated by first creating an initial design and then running the skpr function gen_design to optimize using the initial design as a starting point.

The candidate set for design generation is created using a full factorial design. Where this design is very large (>10^5 runs) it is reduced to 10^5 runs by random sampling to improve performance.

The candidate set is then supplied to skpr, and the routine gen_design is invoked with the model selected by the user. Hard-to-change factors are automatically parsed to supply the appropriate split-plot model to the package.

Search parameters are determined as follows:

  • If the total number of runs in the full factorial is > 100,000 only 1 restart is used

  • Otherwise we use the default 20 restarts

The search algorithm in either case is the modified Fedorov algorithm.
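The candidate-set and restart rules above can be sketched as follows. This is an illustration of the stated thresholds only, not Synthace’s code: the function names are hypothetical, and the index-decoding trick is simply one way to sample rows of a large factorial without enumerating all of it.

```python
import random
from math import prod

MAX_CANDIDATES = 100_000   # threshold stated in the text (10^5 runs)
DEFAULT_RESTARTS = 20      # default number of restarts stated in the text

def decode(index, levels_per_factor):
    """Map an integer index to one row of the full factorial (mixed-radix)."""
    row = []
    for levels in reversed(levels_per_factor):
        index, position = divmod(index, len(levels))
        row.append(levels[position])
    return tuple(reversed(row))

def search_setup(levels_per_factor, seed=0):
    """Return (candidate set, number of restarts) for the given factor levels."""
    total = prod(len(levels) for levels in levels_per_factor)
    if total > MAX_CANDIDATES:
        # Large factorial: random subsample without replacement, single restart.
        rng = random.Random(seed)
        indices = rng.sample(range(total), MAX_CANDIDATES)
        restarts = 1
    else:
        indices = range(total)
        restarts = DEFAULT_RESTARTS
    candidates = [decode(i, levels_per_factor) for i in indices]
    return candidates, restarts

small_candidates, small_restarts = search_setup([[-1, 1]] * 3)
large_candidates, large_restarts = search_setup([[-1, 0, 1]] * 11)
print(len(small_candidates), small_restarts)  # 8 candidates, 20 restarts
print(len(large_candidates), large_restarts)  # 100000 candidates, 1 restart
```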

Model Diagnostics

Model diagnostics are reported using a combination of the results of skpr’s gen_design and eval_design routines, with eval_design being used for power calculations and correlation matrices.

Optimality criteria

The optimality criterion is chosen based on the model selected by the user: models containing main effects only or up to two-factor interactions use D-optimality, while models containing quadratic effects use I-optimality.

Restrictions on models

To prevent long requests degrading performance, we intentionally limit the size of optimal designs as follows:

  • Main effects models: 20 2-level factors

  • Up to two-factor interaction models: 12 2-level or 9 3-level factors

  • Up to quadratic models: 8 3-level factors

In cases where a particular model type cannot be calculated according to these rules it will not be offered as a choice to the user. Similarly, optimal designs are not available unless the user has at least one 3-level factor in the design.

Split-plot designs

For split-plot designs the skpr gen_design routine is invoked with two additions: a model for the hard-to-change factors and a number of whole plots.

The model for the plot-level effects is a main-effects only model in the set of hard-to-change factors.

The number of whole plots is calculated as 3 + the number of degrees of freedom required for the hard-to-change factors.
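As a worked example of this rule, the sketch below counts whole plots for a main-effects-only model in the hard-to-change factors. The function and factor names are hypothetical; the df per factor follows the usual convention of 1 for a continuous factor and the number of levels minus 1 for a categorical one.

```python
def whole_plots(hard_to_change):
    """hard_to_change maps name -> number of levels (categorical) or None (continuous).

    Returns 3 plus the df needed for the main effects of those factors.
    """
    df = sum(1 if levels is None else levels - 1 for levels in hard_to_change.values())
    return 3 + df

print(whole_plots({"incubation_temperature": None}))                   # 3 + 1 = 4
print(whole_plots({"incubation_temperature": None, "plate_type": 3}))  # 3 + 1 + 2 = 6
```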

Citations

To learn how to calculate an optimal design in Synthace, click here.

To learn about other design types, click here.
