Skip to main content

What data you need to provide for our MMM implementation

Rashan avatar
Written by Rashan
Updated today

Data is often one of the biggest challenges in producing an accurate Marketing Mix Model and this guide should support you and your team in getting us the right data, in the right format.

The best way to get started is with a single dataset, with a column for each variable.

We can then work together to build integrations that will automate the data gathering and the model refreshes.

For all variables we need:

  • 104 weeks minimum of historical data (recommended)

  • Weekly or daily granularity

Formatting Notes

Dates should be in format YYYY-MM-DD. The model requires weeks to start on Mondays (Monday through Sunday).

You should not include currency or other symbols in both column titles and cells Decimal separator should be “.” (DOT) and not “,” (COMMA)

Data

The data we required can be organized in 4 groups:

  • Output Variable

  • Spending Variables

  • Organic Variables

  • Context Variables

Output Variable

Is the backbone of the model and is the metric we’ll try to improve with our Budget Allocations.

It needs to be:

  • Stable

  • Accessible

  • Consistent

Usually a great candidate for the output variable is Sales, but we can also model for other metrics such as: Conversions, Conversions from new customers, CAC etc.

💡Please Note

It’s important that we place our output variable in the right historical place.

If you’re an ecommerce/retail company, revenues and conversions should be accounted on the day they occurred, if you’re a lead generation business we strongly advise you to reconcile the sales by bringing it back to the day the lead was generated (as we’ll look for correlation that will not be easily found if we have the sales conversion cycle to account for).

🥊 An example:

  • An ecommerce/retail gets a new sales on 31/01 for 100$ → will have that 100$ and that conversion on date 31/01 of the dataset

  • A lead generation business gets a lead on 01/01 and converts on 31/01 for 100$ → will have that 100$ and that conversion on 01/01 (as media will not have any power to affect their conversion and the time it takes)

Media Spending

We need to include ALL media spending that occurred during the modeling window.

We’ll split our spending into campaign types as it will give us the best level of actionability (if we keep it too generic, eg Google, we will have an hard time to execute when the model recommends the amount of budget to be allocated into that - how much in search? and how much in PMax?).

Eg. Your spending on Meta needs to be splitted into:

  • Prospecting

  • Retargeting

  • Shop

  • Etc.

Your spending on Google will be splitted into:

  • Search Brand

  • Search Not Brand

  • PMax

  • Demand Gen

  • Etc.

💡Please Note

It will be the model to guide us to the optimal level of data granularity.

The model will assess:

  • If there are spending variables that are below 2.5% of spend-share

  • If there are spending variables that have too little historical data

  • If there are spending variables that are highly correlated with one another

You’ll have the chance to then group “problematic” variables into newly created variables that will be compliant with our modeling best-practices.

We only need the daily/weekly spending. No Clicks, no Impressions.

Again it is crucial that we account for our spending as accurately as possible.

🥊 Challenges can present themselves with offline media (TV, Radio, OOH etc.) where reporting is usually on the monthly basis.

Eg. If you spend 100k for 4 weeks of billboards in a major city you should account for that by splitting the monthly sum into weekly/daily numbers. It would be 25k/week or 3.3k/day

Organic Variables

Here we’re looking to get the impressions you got from your organic posting on social media, the clicks you got on your organic search console, the number of SMS or Emails you sent every day/week (non transactional).

Below a quick summary of the main details we will need:

Item

KPI

Notes

Social Media Posts

Impressions

//

Search Console

Organic Clicks

//

Blog Posts

Number

//

Email Sent

Number

You can use also “Number of Opens” or “Number of Opened Emails”

SMS Sent

Number

//

Newsletter

Number

You can use also “Number of Opens” or “Number of Opened Emails”

Context Variables

We don’t have a list of variables that can be used horizontally across different industries, as we use context variables to better refine our modeling and to give our users an holistic view that encompasses the contribution of promos, discounts, product launches, macro and microeconomic shifts.

🥊 An example:

  • An ecommerce/retail might use:

    • Discounts

    • Promos

    • Product Launches

    • Out-of-Stock periods

  • A Mortgage Broker might use:

    • Discounts

    • Promos

    • Interest rates value (as lower interest rates will drive more real estate purchases and thus the volume of conversions for brokers)

You can use this “template” to get an idea of what a starting dataset should look like

Date

Output_Variable

Meta_Prospecting_Spend

Meta_Retargeting_Spend

Meta_Shop_Spend

Google_Search_Spend

Google_Display_Spend

YouTube_Spend

TikTok_Spend

Organic_Social_Impressions

Organic_Search_Clicks

Email_Sent

Promo_Discount

Product_Launch

Macroeconomic_Indicator_1

Macroeconomic_Indicator_2

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

YYYY-MM-DD

....

Did this answer your question?