When an experiment finishes loading, it presents you with this table so you can make a choice.
Ideally, each of these choices has a tag assigned to it that explains why you would pick that experiment.
Locations: the treatment locations for that design.
Budget: the value to be added or taken away, depending on the experiment type.
Duration: how long the experiment should run.
Expected Lift: if we add or remove the budget amount over the next Y days, we would expect a Z% change in our output.
Once we select one of these experiments, we move on to a more detailed view:
Here we are presented with a simple natural-language recap of the experiment design.
Treatment Percentage: This indicates the percentage of the total audience that was exposed to the treatment (e.g., saw the ad), helping define the size of the test group.
MDE (Minimum Detectable Effect): This is the smallest change or lift in performance the test is designed to detect with statistical confidence, given the sample size and variability.
Power: Power is the probability that the test will correctly detect a real effect when one exists, usually targeted at 80% or higher to ensure reliability.
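As a rough illustration of how MDE, power, and the significance level fit together, here is a minimal sketch assuming a two-sided z-test on the estimated lift; the standard error value is a hypothetical placeholder, and this is not necessarily how the tool computes its numbers:

```python
# Minimal sketch: MDE for a two-sided z-test, given a target power and alpha.
# The standard error of the lift estimate is a hypothetical placeholder.
from scipy.stats import norm

alpha = 0.05         # significance level (assumed)
power = 0.80         # target power, as described above
se_of_lift = 0.015   # standard error of the estimated lift (hypothetical)

z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the two-sided test
z_power = norm.ppf(power)           # quantile corresponding to the target power

# Smallest lift detectable with this alpha, power, and standard error:
mde = (z_alpha + z_power) * se_of_lift
print(f"MDE ≈ {mde:.1%}")
```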
MAPE (Mean Absolute Percentage Error): MAPE measures how accurate a model's predictions are by calculating the average of the absolute percentage errors between predicted and actual values. It tells you, on average, how far off your predictions are as a percentage of the actual result; lower MAPE means better accuracy. Values under 20% are good, and values under 10% are excellent.
SMAPE (Symmetric Mean Absolute Percentage Error): SMAPE is a variation of MAPE that adjusts for cases where actual or predicted values are very small or zero, to avoid distortions. It calculates the error as a percentage of the average of the actual and predicted values, making it more stable when the values involved are small.
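Both error metrics are straightforward to compute by hand; the sketch below uses hypothetical actual and predicted values purely to show the formulas described above:

```python
# Minimal sketch of MAPE and SMAPE on hypothetical actual/predicted values.
import numpy as np

actual = np.array([120.0, 135.0, 128.0, 142.0])     # observed values (hypothetical)
predicted = np.array([118.0, 130.0, 131.0, 145.0])  # model predictions (hypothetical)

# MAPE: average absolute error as a percentage of the actual values.
mape = np.mean(np.abs((actual - predicted) / actual)) * 100

# SMAPE: absolute error as a percentage of the average of actual and
# predicted values, which keeps the metric stable when values are small.
smape = np.mean(np.abs(actual - predicted) / ((np.abs(actual) + np.abs(predicted)) / 2)) * 100

print(f"MAPE  = {mape:.2f}%")
print(f"SMAPE = {smape:.2f}%")
```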
Abs Lift in Zero: This value represents the average estimated lift our test markets had when we simulated a 0% lift in conversions. Great test market selections often have abs_lift_in_zero values very close to zero. (The value shown above is not representative; this is still to be bug-fixed.)
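Conceptually, this metric is just the average absolute value of the lift the model reports when the true injected lift is zero; a minimal sketch with hypothetical simulation outputs:

```python
# Minimal sketch: estimated lifts from simulations where the true lift was 0%
# (values are hypothetical placeholders, not output from the tool).
import numpy as np

estimated_lifts_at_zero = np.array([0.004, -0.006, 0.002, -0.003])
abs_lift_in_zero = np.mean(np.abs(estimated_lifts_at_zero))
print(f"abs_lift_in_zero = {abs_lift_in_zero:.4f}")  # close to 0 suggests a good market selection
```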
Scaled L2 Imbalance: The Scaled L2 Imbalance is a goodness-of-fit metric between 0 and 1 that represents how well the model predicts the pre-treatment observed values through the Synthetic Control. A value of 0 indicates a perfect match of the pre-treatment values; a value of 1 indicates that the Synthetic Control algorithm is not adding much value beyond what you could accomplish with a simple average of the control locations.
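One way to read this definition is as the pre-treatment fit error of the Synthetic Control divided by the fit error of a simple average of the control locations; the sketch below follows that reading with hypothetical series and may differ from the tool's exact formula:

```python
# Sketch of a scaled L2 imbalance, assuming the scaling baseline is a simple
# average of the control locations (pre-treatment series are hypothetical).
import numpy as np

observed = np.array([100.0, 104.0, 98.0, 110.0])     # treated market, pre-treatment
synthetic = np.array([101.0, 103.5, 99.0, 108.0])    # synthetic control prediction
control_avg = np.array([95.0, 99.0, 102.0, 104.0])   # simple mean of control markets

l2_synthetic = np.linalg.norm(observed - synthetic)   # fit error of the synthetic control
l2_baseline = np.linalg.norm(observed - control_avg)  # fit error of the naive average
scaled_l2_imbalance = l2_synthetic / l2_baseline
print(f"Scaled L2 imbalance ≈ {scaled_l2_imbalance:.2f}")  # 0 = perfect fit, ~1 = no better than the average
```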
Power Curve: The power curve visually shows how the probability of detecting an effect increases as the true effect size grows, helping to plan for adequate test sensitivity. We look for symmetric power curves that have a power of 0% at 0% simulated lift; anything else indicates that we were detecting an effect when none was simulated, which can be dangerous. The curve should be roughly symmetric between positive and negative simulated effects; otherwise the design is biased toward detecting effects in one direction.
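To make the desired shape concrete, here is a minimal plotting sketch with hypothetical simulation results; each point is the share of simulations at a given injected lift in which the test detected a significant effect:

```python
# Minimal sketch of a power curve from (hypothetical) simulation results.
import numpy as np
import matplotlib.pyplot as plt

simulated_lift = np.array([-0.10, -0.05, 0.0, 0.05, 0.10])  # injected lifts
power = np.array([0.92, 0.55, 0.00, 0.57, 0.93])            # detection rate per lift

plt.plot(simulated_lift * 100, power * 100, marker="o")
plt.axhline(80, linestyle="--", label="80% power target")
plt.xlabel("Simulated lift (%)")
plt.ylabel("Power (%)")
plt.title("Power curve (roughly symmetric, ~0% power at 0% lift)")
plt.legend()
plt.show()
```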