All Collections
Forward & Reverse Engineering
Google BigQuery - Direct Connect and Reverse Engineer
Google BigQuery - Direct Connect and Reverse Engineer

Reverse engineer BigQuery DDL manually or using direct connection

S
Written by Serge Gershkovich
Updated over a week ago

Google BigQuery (GBQ) projects can use direct connect to reverse engineer entire schemas in seconds. Users can obtain the DDL from GBQ for manual upload or use the direct connect option.

To use Direct Connect for GBQ, you will need a user with the requisite permission to obtain DDL and a private key file to authenticate. SqlDBM obtains the DDL by running a query against the information schema. The instructions for generating the key file, the SQL script, and a user with minimal required permissions to run it are covered in this article.

Creating a service account with metadata access

To use Direct Connect for GBQ, you will need a user with the requisite permission to obtain DDL. If such a user already exists, please skip to the next section, which describes generating the key file.

Required Permissions

The following object privileges will be required for obtaining BigQuery DDL.

(The DDL script can be located at the end of this article)

Permission

Description

Note

Bigquery.jobs.create

Run jobs (including queries) within the project

Required to run a job within the selected dataset in order to obtain the list of objects and DDL for each of them.

Bigquery.datasets.get

Get metadata about a dataset.

Required to obtain the information of datasets (schemas)

Bigquery.routines.get

Get routine definitions and metadata.

Required to obtain DDL of procedures and functions

Bigquery.routines.list

List routines and metadata on routines.

Required to get list of procedures and functions

Bigquery.tables.get

Get table metadata

Required to obtain DDL of tables and views

Bigquery.tables.list

List tables and metadata on tables.

Required to get list of tables and views

Create a Custom Role

  • Open the panel

  • IAM & Admin > Roles

  • Select "Create Role" from the top menu

  • Give your role a name and add a description

Now, add the required permissions.

  • Select "Add Permissions"

  • A pop-up will appear. Search for "BigQuery" in the filter

  • Select "BigQuery Job User" and "BigQuery Metadata Viewer"

  • Select "Ok"

  • Also add the following permissions

    1. Bigquery.datasets.get

    2. Bigquery.jobs.create

    3. Bigquery.routines.get

    4. Bigquery.routines.list

    5. Bigquery.tables.get

    6. Bigquery.tables.list

  • Review that the permissions you have added are correct.

  • Click "Create" to complete the process.

Direct Connect using a private key file

Users can connect securely and directly to their GBQ account from SqlDBM to obtain the DDL for selected schemas. To do so, you will need to generate a private key file from the Google Cloud Console (GCC) using the instructions provided here.

Obtain a private key file

Use the steps below to generate a private key file from your GCC.

1. Select your GBQ project and head to the "Credentials" area of your GCC using the menu or the following link to begin the process.

2. Click Create Credentials and select Service account (or go to step 4 if a service account already exists)

3. In your Service account details, add two new roles: BigQuery Data Viewer and BigQuery Job User. The latter is needed to run a query job to obtain table metadata from the INFORMATION_SCHEMA. BigQuery Admin role will also work but this may be a security concern.

Click "Done" to finish creating the Service Account

4. Click on the Service Account to open it and click on the "KEYS" tab.

4.1.

4.2 Click, "Create new key"

5. Select "JSON" and click "Create"

Save the generated file in a secure location.

Direct Connect

Once you have a private key file created, you can direct connect to pull DDL from your associated GBQ project. Direct connection parameters can be obtained by uploading the key file directly (a) or entering them separately into the UI (b).

a. Key File upload

Click "Drop your file" and select your private key file. You will be authenticated and taken to the list of datasets.

Skip to the "Import datasets" section below to select the datasets whose DDL you wish to import.

b. Provide key file parameters

Instead of uploading the key file, you can enter the necessary parameters directly into the UI. These include:

  • Client email – this can be found in the "Details" tab of your GCC Service Account or in the key file as the value for "client_email".

  • Private key – The full value contained in double quotes (including "-----BEGIN") in the key file for "private_key".

  • Project id – the project id from the GBQ console or the key file in the "project_id" parameter.

Note that the direct connection and its parameters will only persist in the browser session. These credentials are not stored or recorded by SqlDBM.

Click the "Connect to BigQuery" button to authenticate and import your datasets.

Import datasets

Select the desired datasets and press "Apply" to retrieve their DDL details.

The following options are also available:

  • Select all – highlight all datasets in your project

  • Only selected – filter and display only selected (checked) datasets

Hit the "Apply" button to retrieve the DDL.

Upload and review the DDL and import it into the project following the general "Reverse Engineering" instructions (related article link at the bottom).

Upload DDL from GBQ

DDL is obtained from GBQ by querying the INFORMATION_SCHEMA directly. Unfortunately, GBQ export settings for query results insert quotation marks around strings that may complicate DDL parsing. For this reason, we ask that you use the following query to generate a JSON file, which you can upload straight into SqlDBM.

Clean DDL without GBQ-generated quotation marks can be uploaded or pasted into the reverse engineering screen as usual (see the "Reverse engineering" link at the end of this article).

Query the DDL

Please run the following query in GBQ to obtain the required JSON string. The query only includes the statements for supported object types. More information schema tables will be added to this code as new GBQ object support is unveiled.

Replace <DATASET_NAME> with the name of your dataset before executing.

SELECT STRING_AGG(statement, '\n') FROM (
SELECT ddl AS statement, 1 AS sort_order FROM INFORMATION_SCHEMA.SCHEMATA WHERE schema_name = '<DATASET_NAME>'
UNION ALL
SELECT ddl AS statement, 2 AS sort_order FROM <DATASET_NAME>.INFORMATION_SCHEMA.TABLES
UNION ALL
SELECT ddl || ";" as statement, 3 as sort_order FROM <DATASET_NAME>.INFORMATION_SCHEMA.SEARCH_INDEXES
ORDER BY sort_order
);

Export to JSON and upload to SqlDBM

Once you successfully run the query to obtain the DDL string, export it to a JSON file using the corresponding option in GBQ.

Once the file is saved, upload it into SqlDBM using the reverse engineering screen (more details at the link at the end of this article).

The end-to-end process can be seen in the following video:

IP allowlisting and private networking

Corporate networking and security policies may interfere with connectivity between SqlDBM and GBQ. If you are unable to connect despite following the guidelines in this article, please make sure that IP blocking or private networking is not interfering.

An easy way to diagnose this type of issue is by attempting to connect to your GBQ account from a home computer, without using a VNP or any corporate software.

For instructions on allowing SqlDBM through the firewall or private network, please see the related article at the bottom of this page.


See also:

Did this answer your question?