Skip to main content
BigQuery Schema Normalization

Understanding how Flatly sanitizes your data for BigQuery

Support avatar
Written by Support
Updated over a year ago

BigQuery requires uniform data with consistent field types.

Example: Consider the "Deals" data set from a CRM. In all Deals a company name must be a String [i.e. a text word]. If any Deal has 1234 or [ Acme, Nike, Pepsi ] as a company name, BigQuery will get confused and reject it. BigQuery cannot unilaterally decide which is right: a String or an Integer or an Array.

Flatly performs basic normalization and sanitization on your data before it is uploaded to BigQuery.
​
Flatly will remove any fields that are abnormal relative to those fields defined in the first record fetched from your data source.

In our example: company_name will be removed from all Deals because it is abnormal.

To see a reason for each removal, do the following:

  1. In your job settings make sure Attach Metadata is enabled.

  2. Run your job.

  3. Look for the _metadata table created in your BigQuery data set, then follow these instructions: https://cloud.google.com/bigquery/docs/export-file
    ​
    ​Copy to Clipboard is the best because this metadata is plaintext. You can paste it into any document editor you prefer.

  4. If you decide the removed fields are of analytical importance, go about editing your records in your app (CRM, SaaS, etc.) to make sure they are all normalized.

  5. Rerun your job and the fields important to you will be included if you sufficiently normalized them.

Did this answer your question?