What are the statistics behind space-filling designs?

Space-filling designs are a broad class of methods for sampling a multidimensional space at random while distributing the sample as ‘evenly’ as possible, in other words, filling the space.

The central idea is to explore as much of the space as possible with the available resources, while introducing as little bias as possible towards particular areas.

In practice there is no unique way to do this: when you sample only a small number of points relative to the space as a whole, some trade-offs are unavoidable. Different space-filling designs make these trade-offs in different ways, and the differences matter most for how useful a design is in a particular application. For example, some space-filling designs are well suited to building specific kinds of nonlinear models.

In Synthace, space-filling designs are intended mostly for initial exploration of a design space, to find areas of potential interest and to understand where the limits are. The most important consideration is therefore to sample the space as uniformly as possible: ideally we would get as complete a ‘map’ of the space as we can.

Since we are sampling only a small fraction of the space, we cannot hope to get a complete map. However, we can assume that the space has some kind of regularity, so that what we find at a sampled point tells us something about the points nearby. We can then zoom in on interesting regions in later stages of the experiment and explore the nearby space in detail.

Comparison With Other Design Types

Compared to more conventional DOE designs like D-optimal or factorial designs, space-filling designs have quite different properties:

  • They make very few assumptions about the underlying space

  • They are therefore usable for building a wide variety of types of model, linear and non-linear, and can work for a variety of applications

  • However they are far from optimal for any particular model type and specific application

By comparison, optimal and factorial designs are optimal or nearly optimal for applying linear models to screening or possibly prediction. However, they require some pretty strong assumptions about the space you are modelling and can fail to provide you with a usable model if those assumptions are not valid, for example when a large fraction of your runs generate no data.

In design evaluation, the fact that space-filling designs are not optimal for any one model shows up as an apparently lower statistical power (see this article for more on power) than these other designs. While this might seem problematic, it is important to understand that in some ways this is not a fair comparison: space-filling designs are not really intended to be good at detecting effects by building linear models. They can do this, and with enough runs they do it well, but they are not efficient at it. What they offer instead is a lot of information that other design types do not provide.

Synthace Implementation

Synthace implements space-filling designs using Latin Hypercube Sampling, as implemented by the R package lhs. Latin hypercube designs are intended to provide a reasonably uniform sample of the space, and are therefore a reasonable choice for our application.

The routine randomLHS from the lhs package is used to generate a random hypercube with all factors in the interval [0, 1]. Internally, for each factor, lhs first creates a random permutation of the integers from 1 to the number of levels requested, then maps each integer to a floating point value within the corresponding subinterval of [0, 1].

We run randomLHS with parameters n and k set to the user-defined number of runs and the number of factors respectively.
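The permute-then-jitter idea described above can be sketched in a few lines of Python. This is only an illustration of the general Latin hypercube technique, not the code of the lhs package or of Synthace. The function name random_lhs and its rng parameter are hypothetical, and the sketch indexes the subintervals from 0 rather than 1.

```python
import random


def random_lhs(n, k, rng=None):
    """Return an n-by-k Latin hypercube sample with values in [0, 1).

    Illustrative sketch only; not the lhs package's implementation.
    """
    if rng is None:
        rng = random.Random()
    columns = []
    for _ in range(k):
        # Random permutation of the n subinterval indices 0..n-1 for this factor.
        strata = list(range(n))
        rng.shuffle(strata)
        # Place one point uniformly at random inside each subinterval
        # [i/n, (i+1)/n), so every subinterval is used exactly once.
        columns.append([(i + rng.random()) / n for i in strata])
    # Transpose so that each row is one run and each column is one factor.
    return [list(row) for row in zip(*columns)]


# n = number of runs, k = number of factors.
design = random_lhs(n=5, k=2, rng=random.Random(42))
```

Each column of the result contains exactly one value in each of the n equal-width subintervals of [0, 1), which is the property that makes the sample reasonably uniform along every factor.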

The Synthace software then maps the floating point values to the appropriate domain for the given factor as chosen by the user, mapping discrete choices back to their relevant levels (e.g. for categoric factors) and scaling and transforming continuous values to the sampling intervals defined.

For example, suppose we have:

  • A categorical factor with levels [A, B, C] (Factor 1)

  • A continuous factor in range [-2, 4] (Factor 2)

The value 0.8 would be mapped as follows:

  • For factor 1, values in the range [0, 0.333) map to A, [0.333, 0.666) to B and [0.666, 1] to C, so the result is C

  • For factor 2, we scale the value by 6, the width of the range (4 - (-2)), to get 4.8, then translate by the lower bound -2 for the result 2.8
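The worked example above can be reproduced with the following minimal sketch. The helper names map_categorical and map_continuous are hypothetical and are not part of Synthace or lhs. The sketch assumes equal-width bins for categorical levels and a purely linear scaling for continuous factors, with no additional transformation.

```python
def map_categorical(u, levels):
    """Map u in [0, 1] to one of the levels using equal-width bins."""
    # min() keeps u == 1.0 inside the last bin, matching the closed
    # upper end of the final interval.
    index = min(int(u * len(levels)), len(levels) - 1)
    return levels[index]


def map_continuous(u, low, high):
    """Scale u in [0, 1] by the range width, then translate by the lower bound."""
    return u * (high - low) + low


factor_1 = map_categorical(0.8, ["A", "B", "C"])  # falls in the last bin: "C"
factor_2 = map_continuous(0.8, -2.0, 4.0)         # 0.8 * 6 - 2, approximately 2.8
```

Because of floating point rounding, the continuous result is only approximately 2.8, so compare it with a tolerance rather than with exact equality.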

Citations

To learn how to calculate a space-filling design in Synthace, click here.

To learn about other design types, click here.
