Transforming a response may be necessary when it violates the assumptions of linear modelling and is therefore unsuitable for it. Alternatively, it may be best practice or biologically meaningful to model the mean or variance between replicates instead of individual replicates. Synthace provides multiple options for data transformation.
Statistics
Log
Calculates the natural logarithm (base $e$) of the data. Uses the log function from the NumPy Python library. No pre-processing of the data is performed, so zero and negative values will be transformed to minus infinity and NaN respectively.
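The behaviour for zero and negative values can be seen in a minimal sketch. The array below is hypothetical example data, and the warning suppression is only there to keep the output clean; it is not something the source says Synthace does.

```python
import numpy as np

# Hypothetical response values, including a zero and a negative entry.
response = np.array([10.0, 1.0, 0.0, -2.0])

# np.log emits RuntimeWarnings for zero and negative input; silence them here.
with np.errstate(divide="ignore", invalid="ignore"):
    logged = np.log(response)

# The zero becomes -inf and the negative value becomes NaN.
print(logged)
```

If such values are expected in a response, they need to be handled before the transformation is applied, since nothing is done to them automatically.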
Box-Cox
Uses the boxcox function from the scipy.stats submodule of the SciPy Python library. No pre-processing of the data is performed, which means that all values of the response must be greater than zero for this transformation to be applied. SciPy Documentation
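As a rough illustration of the positivity requirement, the sketch below calls scipy.stats.boxcox on hypothetical data. Leaving the lmbda argument unset, so that SciPy estimates it by maximum likelihood, is an assumption; the source does not say how Synthace chooses lambda.

```python
import numpy as np
from scipy import stats

# Hypothetical, strictly positive response values.
response = np.array([1.2, 3.4, 0.8, 10.5, 2.2, 5.1])

# With lmbda unset, boxcox returns the transformed data together with the
# lambda that maximises the log-likelihood.
transformed, fitted_lambda = stats.boxcox(response)

# A zero (or negative) value makes boxcox raise a ValueError.
try:
    stats.boxcox(np.array([1.0, 0.0, 2.0]))
    rejected = False
except ValueError:
    rejected = True

print(fitted_lambda, rejected)
```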
Yeo-Johnson
Uses the yeojohnson function from the scipy.stats submodule of the SciPy Python library. SciPy Documentation
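Unlike Box-Cox, the Yeo-Johnson transformation is defined for zero and negative values. A minimal sketch with hypothetical data follows; as above, letting SciPy estimate lambda is an assumption rather than documented Synthace behaviour.

```python
import numpy as np
from scipy import stats

# Hypothetical response values spanning negative, zero and positive entries.
response = np.array([-3.0, -0.5, 0.0, 0.7, 2.0, 8.5])

# With lmbda unset, yeojohnson returns the transformed data and the
# maximum-likelihood estimate of lambda.
transformed, fitted_lambda = stats.yeojohnson(response)

# With lambda fixed at 1 the transformation leaves the data unchanged.
unchanged = stats.yeojohnson(response, lmbda=1)

print(fitted_lambda)
```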
Python Expression
Uses the eval function from the Pandas Python library to interpret code entered by the user to perform the transformation. Any infinity values, either positive or negative, in the resulting column are replaced by NaN values. Pandas Documentation
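The sketch below shows one way an expression could be evaluated and the infinities then replaced. The column names, the expression, and the use of DataFrame.eval (rather than the top-level pandas.eval) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "signal" and "blank" are made-up column names.
data = pd.DataFrame({"signal": [10.0, 4.0, 0.0], "blank": [2.0, 0.0, 0.0]})

# DataFrame.eval interprets the expression with column names as variables.
# Dividing by zero here produces inf (4 / 0) and NaN (0 / 0).
ratio = data.eval("signal / blank")

# Replace positive and negative infinity with NaN, as described above.
data["ratio"] = ratio.replace([np.inf, -np.inf], np.nan)

print(data)
```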
Mean
Calculated as:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Where n is the total number of x values. Uses the mean method of the DataFrame class from the Pandas Python library to calculate the mean of each row using values from the specified columns. NaN values can optionally be excluded when calculating the mean. Any infinity values, either positive or negative, in the resulting column are replaced by NaN values.
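A minimal sketch of a row-wise mean over replicate columns, with and without NaN exclusion. The column names are hypothetical, and mapping the "optionally excluded" setting onto the skipna parameter is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical replicate columns; the last row has no valid values.
data = pd.DataFrame({
    "rep_1": [1.0, 4.0, np.nan],
    "rep_2": [2.0, np.nan, np.nan],
    "rep_3": [3.0, 8.0, np.nan],
})
replicates = ["rep_1", "rep_2", "rep_3"]

# axis=1 averages across the columns of each row.
mean_excluding_nan = data[replicates].mean(axis=1, skipna=True)
mean_including_nan = data[replicates].mean(axis=1, skipna=False)

# Infinities in the result are replaced by NaN.
data["mean"] = mean_excluding_nan.replace([np.inf, -np.inf], np.nan)

print(data)
```

With NaN values excluded, the second row averages the two remaining values; with them included, any row containing a NaN yields NaN.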
Variance
Measures the variability of values with respect to their mean and is calculated as:
Uses the var method of the DataFrame class from the Pandas Python library to calculate the variance of each row using values from the specified columns. NaN values can optionally be excluded when calculating the variance. Any infinity values, either positive or negative, in the resulting column are replaced by NaN values.
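The sketch below computes a row-wise variance on hypothetical data. By default the pandas var method divides by n - 1 (ddof=1, the sample variance); whether Synthace keeps that default is not stated in the source, so both variants are shown.

```python
import numpy as np
import pandas as pd

# Hypothetical replicate columns; the second row has one missing value.
data = pd.DataFrame({
    "rep_1": [1.0, 2.0],
    "rep_2": [2.0, 2.0],
    "rep_3": [3.0, np.nan],
})

# pandas default: ddof=1, i.e. the sum of squared deviations divided by n - 1.
sample_variance = data.var(axis=1)

# ddof=0 divides by n instead (population variance).
population_variance = data.var(axis=1, ddof=0)

# Keeping NaN values makes any row that contains one evaluate to NaN.
variance_including_nan = data.var(axis=1, skipna=False)

print(sample_variance, population_variance, variance_including_nan)
```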
Standard Deviation
Measures the dispersion of values with respect to their mean and is calculated as:
Uses the std method of the DataFrame class from the Pandas Python library to calculate the standard deviation of each row using values from the specified columns. NaN values can optionally be excluded when calculating the standard deviation. Any infinity values, either positive or negative, in the resulting column are replaced by NaN values.
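A short sketch of the row-wise standard deviation on hypothetical data, showing that it is the square root of the row-wise variance. The pandas default of ddof=1 is used here as an assumption about the configuration.

```python
import numpy as np
import pandas as pd

# Hypothetical replicate columns; the second row has one missing value.
data = pd.DataFrame({
    "rep_1": [1.0, 2.0],
    "rep_2": [2.0, 4.0],
    "rep_3": [3.0, np.nan],
})

# Row-wise standard deviation with NaN values excluded (pandas default ddof=1).
std_excluding_nan = data.std(axis=1, skipna=True)

# The same calculation with NaN values kept: the second row becomes NaN.
std_including_nan = data.std(axis=1, skipna=False)

# The standard deviation is the square root of the variance.
variance = data.var(axis=1, skipna=True)

print(std_excluding_nan, std_including_nan)
```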
Fitted Slope
Performs a linear least-squares regression for each row using values from the specified columns as the y values and the manually entered values as the x values. The slope of the fit is given by:
$$\text{slope} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
Where $\bar{x}$ and $\bar{y}$ are the means of the x and y values and n is the number of points. NaN values can optionally be excluded when performing the regression. Uses the linregress function from the scipy.stats submodule of the SciPy Python library to perform the regression, with the slope property of the fitted model returned. Any infinity values, either positive or negative, in the resulting column are replaced by NaN values.
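The per-row regression can be sketched as follows. The helper row_slope, the column names, and the x values are hypothetical, and dropping NaN values pairwise before the fit is one plausible reading of "optionally excluded", not a confirmed implementation detail.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Manually entered x values, one per specified column (hypothetical).
x_values = np.array([0.0, 1.0, 2.0, 3.0])

# Hypothetical y values; the last row has one missing measurement.
data = pd.DataFrame({
    "t0": [1.0, 0.0, 5.0],
    "t1": [3.0, 0.5, np.nan],
    "t2": [5.0, 1.0, 3.0],
    "t3": [7.0, 1.5, 2.0],
})

def row_slope(row, x=x_values, drop_nan=True):
    """Fit y = intercept + slope * x for one row and return the slope."""
    y = row.to_numpy(dtype=float)
    if drop_nan:
        keep = ~np.isnan(y)
        x, y = x[keep], y[keep]
    if len(y) < 2:
        return np.nan  # a line cannot be fitted to fewer than two points
    return stats.linregress(x, y).slope

slopes = data.apply(row_slope, axis=1)

# Infinities in the result are replaced by NaN.
slopes = slopes.replace([np.inf, -np.inf], np.nan)

print(slopes)
```

Each row here lies exactly on a line, so the fitted slopes can be checked by eye against the data.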
Citations
Python version: 3.9.15
NumPy version: 1.23.5
SciPy version: 1.9.3
Pandas version: 1.4.4
G.E.P. Box and D.R. Cox, “An Analysis of Transformations”, Journal of the Royal Statistical Society B, 26, 211-252 (1964)
I. Yeo and R.A. Johnson, “A New Family of Power Transformations to Improve Normality or Symmetry”, Biometrika 87.4 (2000)