
The Statistics Behind Model Prediction

Updated over a year ago

Model Exploration

Synthace lets you explore model predictions in two ways. The Explore Model tab provides tools for investigating one model at a time, such as model optimization, surface plots and the graphical profiler. To investigate several models simultaneously, use the Explore Multiple Models tab, where you can visualize multiple models on a contour plot and perform constrained multi-model optimization.

Explore Model

The Explore Model tab lets you find optimal values for the currently selected model. It also lets you visualize a surface plot and use sliders to explore how changing the level of one factor affects the others, according to the model.

Optimization

The values that the model predicts at the design points are already known, as they were part of the model fitting procedure. Optimization lets you check whether a better optimum (maximum or minimum) is predicted at intermediate points between the design points.

Optimization is performed using the scipy minimize function, with the current design space as bounds. The search is run once, starting from the design point at which the fitted model predicts the minimum/maximum value. The objective function that the optimizer minimizes is the model prediction itself for a minimum search, or the negative of the model prediction for a maximum search.
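The procedure described above can be sketched as follows. This is a minimal illustration, not Synthace's implementation: the model `predict`, its coefficients, the design points and the helper `optimize_model` are all hypothetical. Only `scipy.optimize.minimize` with `bounds` is taken from the description.

```python
import numpy as np
from scipy.optimize import minimize


def predict(x):
    """Hypothetical fitted model: quadratic in two coded factors (made-up coefficients)."""
    x1, x2 = x
    return 10.0 + 2.0 * x1 + 0.2 * x2 - 1.5 * x1**2 - 0.5 * x2**2 + 0.3 * x1 * x2


def optimize_model(predict, design_points, bounds, maximize=True):
    """Run one bounded local search, seeded at the best-predicted design point."""
    sign = -1.0 if maximize else 1.0  # minimize the negative prediction to maximize

    # Seed: the design point where the model already predicts the best value.
    predictions = np.array([predict(p) for p in design_points])
    seed_index = np.argmax(predictions) if maximize else np.argmin(predictions)
    seed = design_points[seed_index]

    result = minimize(lambda x: sign * predict(x), seed, bounds=bounds)
    return result.x, sign * result.fun


# Hypothetical design: a 2-level full factorial plus a centre point, in coded units.
design_points = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1], [0, 0]], dtype=float)
bounds = [(-1.0, 1.0), (-1.0, 1.0)]

x_opt, y_opt = optimize_model(predict, design_points, bounds, maximize=True)
best_design_value = max(predict(p) for p in design_points)
```

In this made-up example the predicted maximum lies between the design points, so `y_opt` is larger than `best_design_value`.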

Note that scipy minimize is a local optimizer. Unlike a global optimizer, the optimum it finds depends on the starting point of the search and might not be the best one across the entire space (the global optimum). Nonetheless, since the models created in Synthace are linear models with terms up to quadratic order, the optimization, starting from the seed point described above, is expected to converge towards the global optimum.

For models including categorical factors, the optimization is performed for all combinations of categorical levels, with the optimum reported being the best found across all combinations.
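A sketch of the loop over categorical combinations might look like the following. The factor names, levels, model and the helper `optimize_over_categories` are hypothetical. For brevity the sketch uses one fixed seed for every combination, which is a simplification of the seeding described earlier.

```python
from itertools import product

from scipy.optimize import minimize

# Hypothetical categorical factors and their levels.
LEVELS = {"buffer": ["tris", "hepes"], "additive": ["none", "dmso"]}


def predict(x, buffer, additive):
    """Hypothetical model: quadratic in one numeric factor, shifted by categorical levels."""
    shift = {"tris": 0.0, "hepes": 0.5}[buffer]
    bonus = {"none": 0.0, "dmso": 0.1}[additive]
    peak = {"none": 0.2, "dmso": 0.6}[additive]
    return 5.0 + shift + bonus - (x[0] - peak) ** 2


def optimize_over_categories(predict, levels, seed, bounds, maximize=True):
    """Optimize the numeric factors once per categorical combination; keep the best."""
    sign = -1.0 if maximize else 1.0
    best = None
    for combo in product(*levels.values()):
        settings = dict(zip(levels.keys(), combo))
        result = minimize(lambda x: sign * predict(x, **settings), seed, bounds=bounds)
        value = sign * result.fun
        is_better = best is None or (value > best[2] if maximize else value < best[2])
        if is_better:
            best = (settings, result.x, value)
    return best


best_settings, best_x, best_value = optimize_over_categories(
    predict, LEVELS, seed=[0.0], bounds=[(-1.0, 1.0)], maximize=True
)
```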

Surface plot

Surface plots are created using the vis.js library Surface3D routine. For numerical factors, whether continuous or discrete, the factor range is sampled at 20 evenly spaced points, and the model response is plotted across those points. For categorical factors, the model response is plotted across the categorical factor levels. The X and Y axes correspond to the selected plotting factors, while the Z axis and surface colour correspond to the model response. The numerical values on the axes and in the hover tooltip are rounded to 4 significant figures.
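The grid for two numerical factors could be built along these lines. The model `predict` and the helper `surface_grid` are hypothetical, and the sketch assumes that the 20 points apply to each plotted axis, which the description does not state explicitly.

```python
import numpy as np


def predict(x, y):
    """Hypothetical model response for two numerical factors (made-up coefficients)."""
    return 5.0 + 1.2 * x - 0.8 * y - 0.5 * x**2 + 0.3 * x * y


def surface_grid(predict, x_range, y_range, n_points=20):
    """Evaluate the model on an evenly spaced n_points x n_points grid."""
    xs = np.linspace(x_range[0], x_range[1], n_points)
    ys = np.linspace(y_range[0], y_range[1], n_points)
    X, Y = np.meshgrid(xs, ys)
    Z = predict(X, Y)  # response drives both the Z axis and the surface colour
    return X, Y, Z


X, Y, Z = surface_grid(predict, x_range=(0.0, 10.0), y_range=(20.0, 40.0))
```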

Model response value

In the Profiler section on the right-hand side of the page, under the Maximize/Minimize buttons, the response value for the currently selected model is shown. The reported value is the model fit (predicted response) for the combination of factor values selected on the sliders. For display purposes, the reported value is rounded to 4 significant figures, while the slider values are rounded to 3 significant figures.
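Rounding to significant figures differs from rounding to decimal places. A small sketch of one common way to do it is shown below; the helper `round_sig` is hypothetical and is not necessarily how Synthace rounds displayed values.

```python
import math


def round_sig(value, sig):
    """Round value to `sig` significant figures."""
    if value == 0:
        return 0.0
    # Position of the leading digit, e.g. 2 for 123.4 and -3 for 0.0012.
    exponent = math.floor(math.log10(abs(value)))
    return round(value, sig - 1 - exponent)


response_display = round_sig(123.4567, 4)   # response shown to 4 significant figures
slider_display = round_sig(0.00123456, 3)   # slider value shown to 3 significant figures
```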

Explore Multiple Models

The Explore Multiple Models tab provides multi-objective optimization with constraints on response values. Multi-objective optimization is particularly useful when the best conditions for one objective are not aligned with the best conditions for another, since it allows you to find trade-offs between conflicting objectives and balance the factor conditions. You can also restrict the expected response ranges, which further constrains the optimization, so that the optimizer finds an optimum within the desired ranges.

For each model response, you have the option to either maximize or minimize the objective.

Contour Plot

The contour plot shades, for each model, the region where the response lies within the range selected on the slider. It provides a single contour level, as it is only intended to illustrate the area covered by the slider range. The X and Y axes are a subset of the union of factors for the selected models. However, not all models need to depend on all factors. The plotted response values update with the settings of the non-plotted factors on the right-hand side. Additionally, the green lines and circle indicate the current value.

Note that, for display purposes, the slider values are rounded to 3 significant figures.

Optimization

As in the single-model case, the optimization is performed using scipy minimize, with the bounds restricted by the design space. For multi-model optimization, the seed point, for each involved factor, is set as the mid-point. Moreover, the optimization is constrained to find an optimum that is in the interval selected for each model response through the slider values. The space search is performed once.
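The setup described above (bounds from the design space, a mid-point seed, and response-range constraints) can be sketched with `scipy.optimize.minimize` as follows. The two models, their ranges and the helper `constrained_optimize` are hypothetical, and the objective used here is a simple placeholder rather than the objective J used by Synthace. The choice of the SLSQP method is also an assumption.

```python
import numpy as np
from scipy.optimize import minimize


def model_a(x):
    """Hypothetical response to maximize."""
    return x[0] + x[1]


def model_b(x):
    """Hypothetical second response, restricted to a range."""
    return x[0] - x[1]


def constrained_optimize(objective, models, ranges, bounds):
    """One local search from the mid-point, keeping each response inside [m, M]."""
    seed = np.array([(lo + hi) / 2.0 for lo, hi in bounds])

    constraints = []
    for model, (m, M) in zip(models, ranges):
        # SciPy "ineq" constraints require fun(x) >= 0.
        constraints.append({"type": "ineq", "fun": lambda x, f=model, lo=m: f(x) - lo})
        constraints.append({"type": "ineq", "fun": lambda x, f=model, hi=M: hi - f(x)})

    return minimize(objective, seed, method="SLSQP", bounds=bounds, constraints=constraints)


bounds = [(-1.0, 1.0), (-1.0, 1.0)]
ranges = [(0.0, 2.0), (0.5, 1.0)]  # hypothetical slider ranges (m, M) per model

# Placeholder objective: maximize model_a by minimizing its negative.
result = constrained_optimize(lambda x: -model_a(x), [model_a, model_b], ranges, bounds)
```

Note that the mid-point seed does not have to satisfy the response constraints itself; in this example it does not, and the solver moves into the feasible region during the search.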

Note that, as a local optimizer, scipy minimize might converge to a local optimum rather than the global one, depending on the shape of the objective surface produced by the combination of models.

The optimizer searches for a best solution by minimizing an objective function, J. This is given by the average L2 norm across the n selected models:

where the objective J_i for each model is computed as:

with (m,M) denoting the minimum and maximum values selected through the model sliders, and r denoting the model response.
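One form of J and J_i that is consistent with this description (each J_i between 0 and 1, and J between 0 and 1) is sketched below. It is a reconstruction under two assumptions, a root-mean-square reading of "average L2 norm" and a linear scaling of the response between m and M, and it is not necessarily the exact expression used by Synthace.

```latex
J = \sqrt{\frac{1}{n} \sum_{i=1}^{n} J_i^{2}},
\qquad
J_i =
\begin{cases}
\dfrac{M - r}{M - m}, & \text{when maximizing},\\[1.5ex]
\dfrac{r - m}{M - m}, & \text{when minimizing}.
\end{cases}
```

Under these assumptions, a response at the desired end of the slider range gives J_i = 0 and a response at the opposite end gives J_i = 1.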

Each individual objective J_i ranges between 0 and 1. It is 0 when the response equals the desired extreme (the maximum when maximizing, the minimum when minimizing) and 1 when it equals the opposite extreme (the minimum when maximizing, the maximum when minimizing). Because every individual objective is on the same scale, from 0 to 1, each model carries equal weight in the overall optimization. The overall objective J also ranges between 0 and 1, as a measure of error.
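The scaling behaviour described above can be illustrated with a short sketch. Both functions are hypothetical: the linear form of `individual_objective` and the root-mean-square combination in `overall_objective` are assumptions chosen to match the stated 0 to 1 ranges, not confirmed formulas.

```python
import math


def individual_objective(r, m, M, maximize=True):
    """Assumed linear scaling: 0 at the desired end of [m, M], 1 at the opposite end."""
    return (M - r) / (M - m) if maximize else (r - m) / (M - m)


def overall_objective(responses, ranges, directions):
    """Assumed root-mean-square of the individual objectives, so J stays in [0, 1]."""
    terms = [
        individual_objective(r, m, M, maximize)
        for r, (m, M), maximize in zip(responses, ranges, directions)
    ]
    return math.sqrt(sum(t * t for t in terms) / len(terms))


# Two hypothetical models with slider ranges (0, 10) and (2, 4).
# Model 1 is maximized and sits at its maximum; model 2 is minimized and sits at its maximum.
J = overall_objective(responses=[10.0, 4.0], ranges=[(0.0, 10.0), (2.0, 4.0)],
                      directions=[True, False])
```

Because each term is rescaled by its own slider range, a model whose responses are in the thousands does not dominate one whose responses are fractions.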

For models including categorical factors, the optimization is performed for all combinations of categorical levels. The optimum reported is the best one found across all combinations that satisfies the model response constraints.

Optimization desirability

Following the multi-model optimisation, a desirability percentage, rounded to 4 significant figures, is presented for each included model. The desirability percentage, d, ranges between 0 and 100% and is computed as:

where, for each model, r is the obtained optimised response value and m, M are the minimum and maximum set through the model slider.
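A desirability consistent with the 0 to 100% range described here can be sketched as a linear rescaling of the optimized response between m and M. This form is an assumption made for illustration and may differ from the expression Synthace uses.

```latex
d =
\begin{cases}
100 \times \dfrac{r - m}{M - m}, & \text{when maximizing},\\[1.5ex]
100 \times \dfrac{M - r}{M - m}, & \text{when minimizing}.
\end{cases}
```

For example, under this assumed form, maximizing with m = 2, M = 10 and an optimized response r = 8 gives d = 100 × (8 − 2) / (10 − 2) = 75%.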

References

To learn how to predict the best conditions from your model in Synthace, click here.

To learn how to predict the best conditions from multiple models in Synthace, click here.
