Types of Geo-Experiments
Hold-out Tests
Reduce or completely stop spending in test regions
Measures baseline contribution of channel
Best for validating channel incrementality
Compare decrease in spend to decrease in revenue
Scale-up Tests
Increase spending in test regions
Measures potential for growth
Best for testing saturation points
Compare increase in spend to increase in revenue
New Channel Tests
Test new channel in specific regions
Measures incremental impact of new activity
Best for validating expansion plans
Compare increase in spend to increase in revenue
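All three test types compare a change in spend with a change in revenue. A minimal sketch of that comparison follows, assuming the revenue change has already been measured against the control regions. The function name and the sample figures are hypothetical and do not come from any specific tool.

```python
def incremental_roas(spend_change: float, revenue_change: float) -> float:
    """Incremental revenue per unit of incremental spend.

    Both arguments are signed: negative for a hold-out test (spend and
    revenue go down), positive for a scale-up or new channel test.
    """
    if spend_change == 0:
        raise ValueError("spend_change must be non-zero")
    return revenue_change / spend_change


# Hypothetical hold-out test: spend cut by 5,000, revenue fell by 20,000.
holdout_iroas = incremental_roas(-5_000, -20_000)

# Hypothetical scale-up test: spend raised by 2,000, revenue rose by 6,000.
scale_up_iroas = incremental_roas(2_000, 6_000)

print(holdout_iroas, scale_up_iroas)
```

Because both changes carry the same sign within a given test type, the ratio is positive whenever the channel is contributing revenue.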
Design Parameters
Duration: Typically at least 14 to 21 days
Budget: Determined by the expected lift and ROI
Geography Selection: Algorithm selects regions to create comparable test/control groups
Expected Lift: Minimum detectable effect needed for statistical significance
Channel-Specific Considerations
When testing upper-funnel activities (e.g., Meta Awareness, YouTube), consider the delayed effect:
Example Scenario:
Test Duration: 3 weeks
Channel’s Known Lag Effect: 2 weeks
Analysis Approach:
First Analysis: At the end of 3-week test period
Final Analysis: At 5 weeks (3 weeks test + 2 weeks lag)
This ensures we capture the full impact, including delayed conversions
This is particularly important for:
Brand awareness campaigns
Video advertising
Content marketing
Other upper-funnel activities with known lag effects
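The two-stage analysis schedule from the example scenario can be sketched as a small date calculation. This is only an illustration: the function name and the start date are assumptions, and the lag length has to come from what is known about the channel.

```python
from datetime import date, timedelta
from typing import Tuple


def analysis_dates(start: date, test_weeks: int, lag_weeks: int) -> Tuple[date, date]:
    """Return (first_analysis, final_analysis) dates for a geo-experiment.

    The first analysis falls at the end of the test period; the final
    analysis is pushed back by the channel's known lag.
    """
    first = start + timedelta(weeks=test_weeks)
    final = first + timedelta(weeks=lag_weeks)
    return first, final


# Hypothetical start date; 3-week test with a 2-week lag, as in the scenario.
first_analysis, final_analysis = analysis_dates(date(2024, 1, 1), test_weeks=3, lag_weeks=2)
print(first_analysis, final_analysis)
```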
Understanding Expected Lift
The expected lift shown in the experiment design represents:
The minimum change needed to validate the input ROI assumption
NOT a prediction of actual results
A threshold for statistical significance
Calculated based on:
Input ROI/ROAS
Historical performance
Geographic variance
Test duration
Example:
If input ROAS = 10
Expected Lift = -5%
This means: To validate a ROAS of 10, we need to see at least a 5% reduction in revenue when reducing spend. If we see less impact, it indicates the actual ROAS is lower than 10.
When designing the experiment, the initial estimate we input for iROAS or CPIC sets the lower bound of what we expect to detect. For example, if we input an iROAS of 10 and the experiment requires a minimum detectable lift of $10,000 to be statistically significant, the required investment would be $1,000 ($10,000 / 10). This $1,000 would theoretically return 10× its value, providing the minimum detectable lift needed.
However, if we overestimated the iROAS and the actual value is only 2, the same $1,000 would generate only about $2,000 of lift, well short of the $10,000 detection threshold, and the experiment would fail to produce a statistically significant result.
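The arithmetic in this example can be checked with a short sketch. The function names are hypothetical, and the model assumes lift scales linearly with spend (lift = spend × iROAS), which is a simplification.

```python
def required_investment(min_detectable_lift: float, assumed_iroas: float) -> float:
    """Spend needed to reach the minimum detectable lift at the assumed iROAS."""
    if assumed_iroas <= 0:
        raise ValueError("assumed_iroas must be positive")
    return min_detectable_lift / assumed_iroas


def generated_lift(investment: float, actual_iroas: float) -> float:
    """Lift the investment produces if the true iROAS is actual_iroas."""
    return investment * actual_iroas


MIN_DETECTABLE_LIFT = 10_000  # from the example above

# Budget planned under the input assumption of iROAS = 10.
spend = required_investment(MIN_DETECTABLE_LIFT, assumed_iroas=10)

# What that budget delivers if the true iROAS is only 2.
lift_if_overestimated = generated_lift(spend, actual_iroas=2)
detectable = lift_if_overestimated >= MIN_DETECTABLE_LIFT

print(spend, lift_if_overestimated, detectable)
```

Under these assumptions, a more conservative iROAS input leads to a larger required budget but lowers the risk of an inconclusive test.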