Significance level: Our tolerance for false positives in the experiment design. I would always keep it at 5%, and I plan to request that this setting be removed soon.
Maximum Treatment percentage: Upper limit on the share of output (revenue, conversions, etc.) from our Geos that will be targeted in the experiment. A larger percentage means a larger proportion is targeted, which makes the experiment more costly but higher powered. A maximum of 25-30% is a good balance.
Delta range: This parameter contains a vector of the different lifts we want to simulate. Since these are relative values, they depend on the product and vertical we’re analyzing. For instance, while a small single-digit percent increase in volume for a retailer can translate to millions of units, a large lift might be required for an automotive company that sells fewer products per day. By default this parameter is set to a sequence of lifts between 0% and 20% in 1% increments, that is, sequence from: 0% to: 20% by_steps_of: 1%. A smaller step size means a more refined search. For example, if the step size were 5%, we would search over [5%, 10%, 15%, 20%] and find that our MDE is 10%, even though 8% might also be viable; we simply never searched for it. The only downside to smaller steps is a longer runtime, because we simulate more values.
Period Range: List of treatment periods to calculate power for. If we’re unsure of the ideal test duration, we can specify several durations and assess the power of each. The ideal test length depends heavily on the product and vertical we are working with. Nevertheless, a good rule of thumb when deciding on a duration is to make sure the test period is at least as long as the funnel length of the campaign we are targeting. As with Delta range, the values here represent a start, end, and step, and the step size has the same implication: smaller steps give a finer search at the cost of a longer runtime.
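To make the start, end, step convention concrete, here is a minimal Python sketch. It is not the platform's code: `expand_range` and `smallest_on_grid` are hypothetical helper names, lifts are written as whole percentage points, and the 8% figure is simply the illustrative value from the Delta range description above.

```python
def expand_range(start, end, step):
    """Return every value from start to end (inclusive) in increments of step."""
    if step <= 0:
        raise ValueError("step must be positive")
    values = []
    current = start
    while current <= end:
        values.append(current)
        current += step
    return values


def smallest_on_grid(grid, true_minimum):
    """Smallest simulated value that is at least true_minimum, or None.

    true_minimum stands in for the (unknown) smallest lift the test could
    detect; it is an assumption used only to illustrate grid resolution.
    """
    candidates = [value for value in grid if value >= true_minimum]
    return min(candidates) if candidates else None


# Lifts in whole percentage points, which avoids floating-point drift.
fine_grid = expand_range(0, 20, 1)    # 0%, 1%, ..., 20%
coarse_grid = expand_range(5, 20, 5)  # 5%, 10%, 15%, 20%

# If 8% were detectable, the coarse grid would still report 10%.
coarse_mde = smallest_on_grid(coarse_grid, 8)
fine_mde = smallest_on_grid(fine_grid, 8)
```

The fine grid simulates 21 values against 4 for the coarse grid, which is where the extra runtime comes from. The same expansion applies to Period Range, with days in place of percentage points.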
Lookback Window: A set of numbers indicating how far back in time the simulations for the power analysis should go. In general, when the historical data is stable (it doesn’t vary wildly from one day or region to another), we can obtain a robust and reliable power analysis by looking only at the most recent possible test (in other words, lookback_window left empty). By increasing the lookback_window, we subtract the last period from the previous simulation’s total duration and repeat the process over the remaining periods. Finally, we calculate the average metrics for each treatment group over all of the runs in the different periods. On the platform we ask for up to 5 values. If passed 10 20 30, we would run the power analysis 4 times: once over the full period, then once each on the full period minus the last 10, 20, and 30 days.
Include Markets: Markets to be included in the treatment region, usually a business requirement. Ex: Cos asked to see Los Angeles and Chicago in their lift test to be sure that the incrementality measurement is also valid in their largest markets.
Exclude Markets: Markets to be excluded from the treatment region, usually a business requirement. Ex: Ferragamo asked us not to target New York in their holdout test, since it’s their largest market and they don’t want to disrupt business as usual.
Treatment Location Range: List of the numbers of test markets to calculate power for. The values in this list represent the different test-market sizes we want to explore. This parameter is often guided by the budget and scope of the test we want to implement. For our example, looking at Italy with 20 markets, we will analyse smaller tests with between 2 and 6 markets. For a DMA test, we may target around 30 or 40 markets.
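The Lookback Window behaviour described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's implementation: `run_lengths`, `average_over_runs`, and the `metric_for_run` callable are hypothetical names, and the history is measured in days.

```python
from statistics import mean

MAX_LOOKBACK_VALUES = 5  # the platform asks for up to 5 values


def run_lengths(history_days, lookback_window=()):
    """Length in days of the history used by each simulation run.

    An empty lookback_window yields a single run over the full period.
    """
    if len(lookback_window) > MAX_LOOKBACK_VALUES:
        raise ValueError("at most 5 lookback values are accepted")
    offsets = [0] + sorted(lookback_window)  # 0 means the full period
    if offsets[-1] >= history_days:
        raise ValueError("lookback value exceeds the available history")
    return [history_days - offset for offset in offsets]


def average_over_runs(metric_for_run, history_days, lookback_window=()):
    """Average a per-run metric across all lookback runs.

    metric_for_run is a hypothetical callable that takes the run length in
    days and returns a number (for example, the power of one treatment group).
    """
    return mean(metric_for_run(days) for days in run_lengths(history_days, lookback_window))


# With 90 days of history and lookback values 10 20 30, four runs are made:
# the full period, then the period minus its last 10, 20, and 30 days.
lengths = run_lengths(90, [10, 20, 30])
```

Each extra lookback value adds one more full simulation pass, so the runtime grows with the number of values supplied.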
Advanced configuration

Written by Rashan
Updated this week