Technical Documentation for Logistic Curve Fits

Overview

This article describes, for reference purposes, how curve fitting is implemented for the 4- and 5-parameter logistic curve fits.

See here for details on how to use the feature.

Like other transforms in the platform, the logistic curve fitting functionality operates on data that have previously been reshaped (or pivoted) on some independent variable (e.g., time, substrate concentration) so that measurements are arranged in columns, one per value of the independent variable.

For each row, the appropriate logistic function (4 or 5 parameters, depending on the choice made by the user) is fitted as follows:

Fits are done in two stages. The first stage attempts to find an initial estimate of the parameters using a genetic algorithm, while the second refines the initial estimate using nonlinear regression.

Initial Parameter Estimation

Initial parameter estimation uses the differential evolution metaheuristic as implemented by SciPy (version 1.9). Briefly, this is a genetic algorithm method that attempts to do global optimization by simulating an evolutionary process over a population of potential solutions to the problem of interest, subject to any constraints.

An initial population is set up by generating a Latin hypercube of values within specified bounds. The quality of each population member as a parameter set for the curve fit (known as its fitness) is estimated by evaluating the logistic equation with those parameters at the independent values in the dataset and summing the squared deviations from the corresponding dependent values.
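As a rough illustration of the fitness calculation, the sketch below evaluates a logistic curve at each independent value and sums the squared deviations from the measurements. This article does not state the exact equation used, so the parameterization here, $y = D + (A - D) / (1 + (x/C)^B)^E$ with $E = 1$ for the 4-parameter case, is an assumption, as are the function names and the example data.

```python
def logistic(x, a, b, c, d, e=1.0):
    """Assumed logistic form; e=1.0 reduces it to the 4-parameter curve."""
    return d + (a - d) / (1.0 + (x / c) ** b) ** e


def sum_of_squares(params, xs, ys):
    """Fitness of one candidate parameter set: lower is better."""
    return sum((logistic(x, *params) - y) ** 2 for x, y in zip(xs, ys))


# Illustrative data generated from known parameters (A, B, C, D).
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
true_params = (0.0, 1.5, 3.0, 10.0)
ys = [logistic(x, *true_params) for x in xs]

exact = sum_of_squares(true_params, xs, ys)            # perfect candidate
worse = sum_of_squares((0.0, 1.5, 4.0, 10.0), xs, ys)  # shifted inflection point
```

A candidate that reproduces the data exactly scores zero, and any other candidate scores higher, which is what lets the search rank population members.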

Once all population members in the current generation are evaluated, a new population is generated by preferentially selecting fitter members of the current generation to reproduce. The daughter solution generated inherits a random set of parameters, which are based on the parents’ values. Newly generated solutions are mutated to introduce further random variation.

This process is repeated until either a convergence criterion is satisfied or a pre-set number of generations is reached. The best solution discovered in the whole search is then reported. If no valid solution was discovered, an error is reported.

The exact call is scipy.optimize.differential_evolution. The specific parameters used at this step are as follows:

seed=0  
maxiter=1000  
popsize=15  
mutation=(0.5, 1)  
recombination=0.7  
polish=False  
strategy="best1bin"  
tol=0.01  
init="latinhypercube"
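A minimal sketch of how a call with these settings might look is given below. The keyword arguments are the ones listed above. The objective function, the assumed 4-parameter equation, and the example data are illustrative assumptions, not the platform's actual code. The bounds follow the default scheme listed in this section.

```python
import numpy as np
from scipy.optimize import differential_evolution


def logistic_4pl(x, a, b, c, d):
    # Assumed 4-parameter form; the article does not give the equation.
    return d + (a - d) / (1.0 + (x / c) ** b)


def sum_of_squares(params, x, y):
    # Non-finite scores (e.g. from c == 0) are treated as infinitely bad.
    with np.errstate(all="ignore"):
        score = np.sum((logistic_4pl(x, *params) - y) ** 2)
    return score if np.isfinite(score) else np.inf


# Illustrative data, not taken from the article.
x = np.array([0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0])
y = logistic_4pl(x, 1.0, 0.8, 5.0, 10.0)

# Default bounds for A, B, C, D as listed in this section.
bounds = [(0.0, y.min()), (-1.0, 1.0), (0.0, x.max()), (0.0, y.max())]

result = differential_evolution(
    sum_of_squares, bounds, args=(x, y),
    seed=0, maxiter=1000, popsize=15, mutation=(0.5, 1),
    recombination=0.7, polish=False, strategy="best1bin",
    tol=0.01, init="latinhypercube",
)
initial_estimate = result.x  # passed on to the refinement stage
```

With polish=False, SciPy skips its own final local optimization, which is consistent with refinement being handled as a separate second stage.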

Default initial bounds are

A: [0, min(dependent variable)]  
B: [-1, 1]  
C: [0, max(independent variable)]  
D: [0, max(dependent variable)]  
E: [0.1, 10.0] (5PL fit only)

Where the user has specified bounds, these override the defaults. If all bounds are set equal, no fitting is performed.
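The default bounds and the override rule could be assembled roughly as follows. This is only a sketch: the function name, the dictionary layout, and the `overrides` argument are hypothetical and are not part of the platform's interface.

```python
def de_bounds(xs, ys, five_pl=False, overrides=None):
    """Search bounds for the initial estimate, as {name: (lower, upper)}.

    `overrides` is a hypothetical mapping of user-specified bounds.
    """
    bounds = {
        "A": (0.0, min(ys)),
        "B": (-1.0, 1.0),
        "C": (0.0, max(xs)),
        "D": (0.0, max(ys)),
    }
    if five_pl:
        bounds["E"] = (0.1, 10.0)
    # User-specified bounds replace the defaults for the parameters they name.
    bounds.update(overrides or {})
    return bounds


# Illustrative values only.
bounds = de_bounds([1.0, 2.0, 4.0], [2.0, 5.0, 9.0],
                   five_pl=True, overrides={"B": (0.5, 3.0)})
```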

Parameter Refinement

This uses the trust region reflective algorithm for nonlinear least squares fitting with the parameters found in the first step as the starting point for optimization.

Optimization runs until one of several convergence criteria or the pre-set limit of 10,000 function evaluations is reached.

This step uses scipy.optimize.curve_fit. The specific parameters used at this step are as follows:

maxfev 10000  
sigma None  
absolute_sigma False  
method trf  
jac None

Default initial bounds are

A: [0, ∞]  
B: [-∞, ∞]  
C: [0, ∞]  
D: [0, ∞]  
E: [0, ∞] (5PL fit only)

Where the user has specified bounds, these override the defaults. If all bounds are set equal, no fitting is performed.
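The refinement step described above might be invoked along these lines. The keyword arguments and bounds are those listed in this section. The model equation, the example data, and the starting values (standing in for the output of the first stage) are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic_4pl(x, a, b, c, d):
    # Assumed 4-parameter form; the article does not give the equation.
    return d + (a - d) / (1.0 + (x / c) ** b)


# Illustrative noise-free data and a stand-in for the stage-one estimate.
x = np.array([0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 200.0])
y = logistic_4pl(x, 1.0, 1.5, 5.0, 10.0)
initial_estimate = [0.8, 1.0, 4.0, 9.0]

# Default refinement bounds for A, B, C, D: (lower bounds, upper bounds).
bounds = ([0.0, -np.inf, 0.0, 0.0], [np.inf, np.inf, np.inf, np.inf])

popt, pcov = curve_fit(
    logistic_4pl, x, y, p0=initial_estimate, bounds=bounds,
    maxfev=10000, sigma=None, absolute_sigma=False, method="trf", jac=None,
)
```

Here popt holds the refined parameter values and pcov their estimated covariance matrix. With jac=None, SciPy approximates the Jacobian by finite differences.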

Return Values

When fitting is successful, the transforms create new columns and report row-wise values.

In the case of the 4-parameter fit, the $C$ parameter, the inflection point, is equivalent to the EC50.

In contrast, for the 5-parameter fit, the EC50 value is calculated from the fitted B, C and E values.
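The formula itself is not reproduced in this article. For orientation only: if the 5-parameter curve is assumed to take the common form $y = D + (A - D) / (1 + (x/C)^B)^E$, then setting $y$ to the midpoint between $A$ and $D$ gives $EC50 = C \, (2^{1/E} - 1)^{1/B}$. The sketch below checks that relationship numerically. Both the parameterization and the function names are assumptions, and the platform's own calculation may differ.

```python
def logistic_5pl(x, a, b, c, d, e):
    # Assumed 5-parameter form; not confirmed by the article.
    return d + (a - d) / (1.0 + (x / c) ** b) ** e


def ec50_5pl(b, c, e):
    """Value of x at which the assumed curve is halfway between A and D."""
    return c * (2.0 ** (1.0 / e) - 1.0) ** (1.0 / b)


a, b, c, d, e = 0.0, 1.5, 3.0, 10.0, 2.0  # illustrative values
ec50 = ec50_5pl(b, c, e)
midpoint = logistic_5pl(ec50, a, b, c, d, e)  # approximately (a + d) / 2
```

Under this assumed form, setting $E = 1$ reduces the expression to $C$, which agrees with the 4-parameter case described above.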

Details of the returns are as follows:

**Both 4 and 5 parameter fits return the following parameters**  
**Minimum asymptote** - fitted A value post refinement  
**Maximum asymptote** - fitted D value post refinement  
**Inflection point** - fitted C value post refinement  
**Slope factor** - fitted B value post refinement  
**Minimum asymptote error** - square root of the variance in the fitted A value  
**Maximum asymptote error** - square root of the variance in the fitted D value  
**Inflection point error** - square root of the variance in the fitted C value  
**Slope factor error** - square root of the variance in the fitted B value  
**Root mean squared error** - directly calculated from fitted values  
**Fit Error** - True where no fit could be produced for the row (see Error Values)  

**For 5 parameter fits only, the following parameters are also reported**  
**Asymmetry factor** - fitted E value post refinement  
**Asymmetry factor error** - square root of the variance in the fitted E value  
**EC50** - calculated from the B, C and E values; see above
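To make the error columns concrete, the sketch below shows one way such values can be derived: parameter errors as the square roots of the diagonal of a covariance matrix (such as the one returned by scipy.optimize.curve_fit), and the root mean squared error from the residuals. The function names and numbers are illustrative. The article does not say whether the mean in the RMSE is taken over the number of points or the degrees of freedom, so dividing by the number of points is an assumption.

```python
import math


def parameter_errors(covariance):
    """Square roots of the diagonal (variance) entries of a covariance matrix."""
    return [math.sqrt(covariance[i][i]) for i in range(len(covariance))]


def rmse(observed, fitted):
    """Root mean squared error between observed and fitted values."""
    squared = [(o - f) ** 2 for o, f in zip(observed, fitted)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative 2x2 covariance matrix and values, not real fit output.
errors = parameter_errors([[4.0, 0.5], [0.5, 9.0]])
fit_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```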

Error Values

The transforms will always return a fit if one is possible. Where one is not possible, errors are returned row-wise for each fit, and the fit_error column will contain a True value.
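The row-wise behaviour can be pictured with a sketch like the one below, in which a failed fit sets a flag instead of stopping the whole transform. The `fit_row` helper, its return layout, the model equation, and the data are hypothetical. Only the fit_error name and the curve_fit settings come from this article, and the platform's handling of missing values may differ from what is shown.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic_4pl(x, a, b, c, d):
    # Assumed 4-parameter form; the article does not give the equation.
    return d + (a - d) / (1.0 + (x / c) ** b)


def fit_row(x, y, p0):
    """Return fitted parameters plus a fit_error flag for one row."""
    lower = [0.0, -np.inf, 0.0, 0.0]
    upper = [np.inf, np.inf, np.inf, np.inf]
    try:
        popt, _ = curve_fit(logistic_4pl, x, y, p0=p0, method="trf",
                            bounds=(lower, upper), maxfev=10000)
    except (RuntimeError, ValueError):
        # SciPy raises RuntimeError on non-convergence and ValueError on
        # invalid input such as NaN; both are flagged rather than propagated.
        return {"params": None, "fit_error": True}
    return {"params": popt, "fit_error": False}


x = np.array([0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 200.0])
good_row = logistic_4pl(x, 1.0, 1.5, 5.0, 10.0)
bad_row = good_row.copy()
bad_row[3] = np.nan  # a missing measurement

results = [fit_row(x, row, [0.8, 1.0, 4.0, 9.0]) for row in (good_row, bad_row)]
```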

❗ Failures are typically down to convergence issues; in these cases, it’s best to check the raw data and determine whether the model is adequate for the data. Excessive noise or missing values are often the cause. In many cases, experimenting with parameter bounds can help enable fits that previously failed.

Summary

The underlying methods used to fit the two models are identical and have the same parameters by default.
