Design overview

Once you have defined your factors and their levels or ranges, you can explore the design quality in Calculate Design. There, you have four design options: Spacefill, Optimal, Full Factorial, or a Custom design, which you can create in a third-party tool and upload to Synthace. To learn more about these options, click here.

Design Diagnostics

We show three design diagnostics: power, coverage, and a correlation plot. The skpr package can generate power values for a sample of effect sizes for all design types, using the function eval_design. However, for the level of interactivity we want to provide, this approach would require storing a large number of power values to cover a reasonable range of effect sizes. We therefore use the power values reported by eval_design as the starting point for our own implementation, as follows.

Power

Power is the probability of correctly detecting an active effect, and it ranges between 0 and 100%.

In the Design Diagnostics plots, the power is graphically represented in two ways. In the bar-like plots, the bar height represents the power for each of the main effects (from the factors) and up to quadratic effects (including two-factor interactions). In the distribution plot, the power is represented as the yellow-shaded area under the H1 curve that lies beyond the F critical value (α = 0.05) of H0.

For a given design, power calculations are derived from values reported by the skpr routine eval_design. This routine is called once, for a standard effect size (signal-to-noise ratio) of 1, α = 0.05, and all effects in the current model given equal effect sizes.

To support interactivity in the power analysis slider, which can range from 0 to 3, the new power values are then recalculated without employing expensive calls to eval_design.

For the effect size set by the slider, manipulations of the non-central F distribution functions are used. Briefly, the overall logic is that the power value computed for effect size=1 can be used to find the appropriate non-centrality parameter (λ). This can then be rescaled according to the new desired effect size and used to recalculate power from the relevant null and alternative F distributions. A more detailed explanation, which includes the use of SciPy functions ncfdtr, ncfdtri, ncfdtrinc, is outlined below.

For a more fundamental exemplification, see, e.g., https://www.stat.cmu.edu/~hseltman/309/Book/chapter12.pdf.
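The overall logic above can be sketched in a few lines of Python. This is a minimal illustration, not the production implementation: the function name rescale_power and the example numbers are hypothetical, while ncfdtr, ncfdtri and ncfdtrinc are the scipy.special functions named in the text.

```python
from scipy.special import ncfdtr, ncfdtri, ncfdtrinc

ALPHA = 0.05  # significance level used throughout the text


def rescale_power(power_at_1, dfn, dfd, effect_size):
    """Estimate power at `effect_size` from the power reported at effect size 1.

    Hypothetical helper: dfn and dfd are the numerator and denominator
    degrees of freedom of the F test for the term of interest.
    """
    # Critical value of the null (central) F distribution, i.e. nc = 0.
    f_crit = ncfdtri(dfn, dfd, 0.0, 1.0 - ALPHA)
    # Non-centrality parameter whose H1 CDF at f_crit equals 1 - power.
    nc_1 = ncfdtrinc(dfn, dfd, 1.0 - power_at_1, f_crit)
    # Rescale by the square of the requested effect size.
    nc_x = nc_1 * effect_size ** 2
    # Power is the H1 probability mass above the critical value.
    return 1.0 - ncfdtr(dfn, dfd, nc_x, f_crit)


# Illustrative values only: power 0.6 reported at effect size 1.
for x in (0.5, 1.0, 2.0, 3.0):
    print(x, rescale_power(0.6, dfn=1, dfd=10, effect_size=x))
```

Only the cheap F-distribution calls are repeated when the slider moves; the expensive design evaluation is needed once to obtain the power at effect size 1.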

Power calculation

For all terms we first calculate the power using eval_design (α=0.05, effect_size=1). Then, given that we know the degrees of freedom for both the numerator and denominator from the model, the F critical value of the null distribution (hypothesis H0) is computed using ncfdtri (the inverse CDF) for a non-centrality parameter of zero. This is then used to compute the power for the alternative distribution (hypothesis H1), alongside the appropriate non-centrality parameter for the alternative, using ncfdtr (the CDF).

Power is calculated using the standard SciPy functions for working with the non-central F distribution (ncfdtr, ncfdtri, ncfdtrinc). In all cases we calculate the power for an α of 0.05 and equal effect sizes for all model terms. The logic is standard in most respects (see, e.g., https://www.stat.cmu.edu/~hseltman/309/Book/chapter12.pdf for an explanation). Given that we know the degrees of freedom for both the numerator and denominator from the model, the power is found in two steps. First, the critical value of the null distribution for the assumed α (fixed at 0.05) is calculated using ncfdtri (the inverse CDF) with a non-centrality parameter of zero. Second, the mass of the alternative distribution that exceeds this value is found using ncfdtr (the CDF) with the appropriate non-centrality parameter for the alternative, calculated from the user-defined expected coefficients.
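As a rough check of the two steps described above, the snippet below computes the critical value and the power for a known non-centrality parameter, then compares them with the equivalent scipy.stats calls. The degrees of freedom and the non-centrality parameter are made-up values chosen for illustration.

```python
from scipy.special import ncfdtr, ncfdtri
from scipy.stats import f, ncf

alpha = 0.05
dfn, dfd = 1, 10  # hypothetical numerator / denominator degrees of freedom
nc = 5.0          # hypothetical non-centrality parameter under H1

# Step 1: critical value of H0 (non-centrality parameter of zero).
f_crit = ncfdtri(dfn, dfd, 0.0, 1.0 - alpha)

# Step 2: mass of the H1 distribution that exceeds the critical value.
power = 1.0 - ncfdtr(dfn, dfd, nc, f_crit)

# Cross-check against the distribution objects in scipy.stats.
f_crit_ref = f.ppf(1.0 - alpha, dfn, dfd)
power_ref = ncf.sf(f_crit_ref, dfn, dfd, nc)

print(f_crit, f_crit_ref)
print(power, power_ref)
```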

The non-centrality parameter is calculated using the ncfdtrinc function, which is the inverse CDF with respect to the non-centrality parameter. The reason for doing this is that the calculation for split-plot designs is complex, and this approach has been shown to work well when compared to reported power values. For effect sizes different from 1, the non-centrality parameter is calculated as follows:

Firstly,

where λ is the non-centrality parameter, df_n are the degrees of freedom for n levels, and σ_e is a measure of error/noise. β_i are the model coefficients, which, for a linear model, are a measure of the effect size. In our case, ∀ i and ∀ j, β_i = β_j, i.e., the model coefficients are the same for all points. By assumption, variance is 1, which leads to σ_e = 1. Therefore, it follows that

where β_a denotes the effect size. Given a constant df_n, it follows that

Therefore, we can calculate the non-centrality parameter λ_x (for effect size x) from λ_1 (effect size 1) simply by scaling by x^2.
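The scaling argument can be written out as a short derivation. The exact form of the first expression is an assumption made here for illustration (a non-centrality parameter proportional to the sum of squared coefficients over the error variance); only the final x^2 scaling is stated explicitly in the text.

```latex
% Assumed general form (illustrative, not taken from the source):
\lambda \;\propto\; \frac{df_n \sum_i \beta_i^2}{\sigma_e^2}

% With \sigma_e = 1 and \beta_i = \beta_a for all i:
\lambda \;\propto\; df_n \, \beta_a^2

% With df_n held constant:
\lambda \;\propto\; \beta_a^2
\quad\Longrightarrow\quad
\lambda_x = x^2 \, \lambda_1
```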

Correlation Plots

Correlation plots are created using data (correlation.matrix) returned by skpr when running the eval_design routine for power calculations.
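For readers who want a feel for what such a matrix contains, here is a minimal NumPy sketch that computes pairwise correlations between model terms for a small coded design. It is not skpr's code and makes no claim about how correlation.matrix is computed internally; the design and the term names are hypothetical.

```python
import numpy as np

# Hypothetical 2-factor full factorial in coded units (-1 / +1).
a = np.array([-1.0, -1.0, 1.0, 1.0])
b = np.array([-1.0, 1.0, -1.0, 1.0])

# Model terms: two main effects and their two-factor interaction.
terms = {"A": a, "B": b, "A:B": a * b}
model_matrix = np.column_stack(list(terms.values()))

# Pairwise Pearson correlation between model-term columns.
corr = np.corrcoef(model_matrix, rowvar=False)

print(list(terms))
print(corr)
```

In this balanced example every off-diagonal entry is zero, meaning the terms can be estimated independently of one another; non-zero entries indicate terms whose estimates are partially confounded.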

Coverage Plots

The coverage plots are simple graphical representations of the chosen points as an upper-triangular scatterplot matrix, using jitter to ensure all points are visible. For all factors, pair-wise plots are presented. The size of the plotted circles is proportional to the number of runs that contain that specific factor combination.
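The bookkeeping behind such a plot can be sketched as follows: for every pair of factors, count how many runs share each combination of values (to set the circle size) and add a small random offset to the plotted coordinates. The runs, factor names and helper functions below are hypothetical and use only the Python standard library.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical design: one dict per run.
runs = [
    {"temperature": 20, "pH": 6, "time": 10},
    {"temperature": 20, "pH": 6, "time": 30},
    {"temperature": 20, "pH": 8, "time": 10},
    {"temperature": 40, "pH": 8, "time": 30},
]


def pairwise_counts(runs):
    """Count runs per value combination for every pair of factors."""
    factors = list(runs[0])
    return {
        (x, y): Counter((run[x], run[y]) for run in runs)
        for x, y in combinations(factors, 2)  # upper-triangular panels
    }


def jitter(value, scale, rng):
    """Shift a coordinate by a small uniform offset so points do not overlap."""
    return value + rng.uniform(-scale, scale)


counts = pairwise_counts(runs)
rng = random.Random(0)
for (x, y), combos in counts.items():
    for (vx, vy), n_runs in combos.items():
        # Circle size would be drawn proportional to n_runs.
        print(x, y, jitter(vx, 0.5, rng), jitter(vy, 0.5, rng), n_runs)
```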