Data is often one of the biggest challenges in producing an accurate Marketing Mix Model and this guide should support you and your team in getting us the right data, in the right format.
The best way to get started is with a single dataset, with a column for each variable.
We can then work together to build integrations that will automate the data gathering and the model refreshes.
For all variables we need:
104 weeks minimum of historical data (recommended)
Weekly or daily granularity
Formatting Notes
Dates should be in format YYYY-MM-DD. The model requires weeks to start on Mondays (Monday through Sunday).
You should not include currency or other symbols in both column titles and cells Decimal separator should be “.” (DOT) and not “,” (COMMA)
Data
The data we required can be organized in 4 groups:
Output Variable
Spending Variables
Organic Variables
Context Variables
Output Variable
Is the backbone of the model and is the metric we’ll try to improve with our Budget Allocations.
It needs to be:
Stable
Accessible
Consistent
Usually a great candidate for the output variable is Sales, but we can also model for other metrics such as: Conversions, Conversions from new customers, CAC etc.
💡Please Note
It’s important that we place our output variable in the right historical place.
If you’re an ecommerce/retail company, revenues and conversions should be accounted on the day they occurred, if you’re a lead generation business we strongly advise you to reconcile the sales by bringing it back to the day the lead was generated (as we’ll look for correlation that will not be easily found if we have the sales conversion cycle to account for).
🥊 An example:
An ecommerce/retail gets a new sales on 31/01 for 100$ → will have that 100$ and that conversion on date 31/01 of the dataset
A lead generation business gets a lead on 01/01 and converts on 31/01 for 100$ → will have that 100$ and that conversion on 01/01 (as media will not have any power to affect their conversion and the time it takes)
Media Spending
We need to include ALL media spending that occurred during the modeling window.
We’ll split our spending into campaign types as it will give us the best level of actionability (if we keep it too generic, eg Google, we will have an hard time to execute when the model recommends the amount of budget to be allocated into that - how much in search? and how much in PMax?).
Eg. Your spending on Meta needs to be splitted into:
Prospecting
Retargeting
Shop
Etc.
Your spending on Google will be splitted into:
Search Brand
Search Not Brand
PMax
Demand Gen
Etc.
💡Please Note
It will be the model to guide us to the optimal level of data granularity.
The model will assess:
If there are spending variables that are below 2.5% of spend-share
If there are spending variables that have too little historical data
If there are spending variables that are highly correlated with one another
You’ll have the chance to then group “problematic” variables into newly created variables that will be compliant with our modeling best-practices.
We only need the daily/weekly spending. No Clicks, no Impressions.
Again it is crucial that we account for our spending as accurately as possible.
🥊 Challenges can present themselves with offline media (TV, Radio, OOH etc.) where reporting is usually on the monthly basis.
Eg. If you spend 100k for 4 weeks of billboards in a major city you should account for that by splitting the monthly sum into weekly/daily numbers. It would be 25k/week or 3.3k/day
Organic Variables
Here we’re looking to get the impressions you got from your organic posting on social media, the clicks you got on your organic search console, the number of SMS or Emails you sent every day/week (non transactional).
Below a quick summary of the main details we will need:
Item | KPI | Notes |
Social Media Posts | Impressions | // |
Search Console | Organic Clicks | // |
Blog Posts | Number | // |
Email Sent | Number | You can use also “Number of Opens” or “Number of Opened Emails” |
SMS Sent | Number | // |
Newsletter | Number | You can use also “Number of Opens” or “Number of Opened Emails” |
Context Variables
We don’t have a list of variables that can be used horizontally across different industries, as we use context variables to better refine our modeling and to give our users an holistic view that encompasses the contribution of promos, discounts, product launches, macro and microeconomic shifts.
🥊 An example:
An ecommerce/retail might use:
Discounts
Promos
Product Launches
Out-of-Stock periods
A Mortgage Broker might use:
Discounts
Promos
Interest rates value (as lower interest rates will drive more real estate purchases and thus the volume of conversions for brokers)
You can use this “template” to get an idea of what a starting dataset should look like
Date | Output_Variable | Meta_Prospecting_Spend | Meta_Retargeting_Spend | Meta_Shop_Spend | Google_Search_Spend | Google_Display_Spend | YouTube_Spend | TikTok_Spend | Organic_Social_Impressions | Organic_Search_Clicks | Email_Sent | Promo_Discount | Product_Launch | Macroeconomic_Indicator_1 | Macroeconomic_Indicator_2 |
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
YYYY-MM-DD |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.... |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|