Redshift projects can use direct connect to reverse engineer entire schemas in seconds. Please refer to the instructions below to configure and perform a direct connection to your Redshift account.
Direct Connect
To establish a direct connection to Redshift, you will need the following details:
Server: e.g., account.region.redshift.amazonaws.com
Port: 5439 (default)
Database
Username
Password
Using these details, follow the highlighted steps to establish the connection (or get more details in the related articles at the end of this post):
Navigate to the Reverse Engineering screen
Click the "Connect to DW" button
Enter your connection details from the section above
Connect and select the schema to import
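Once connected, you can optionally verify that the credentials and database are correct before importing. The query below is a minimal sanity check you could run manually from the Redshift query editor or any SQL client using the same credentials; it is not something SqlDBM runs for you:

-- Optional manual check: confirms which database and user you are connected as,
-- and the Redshift version of the cluster.
SELECT current_database(), current_user, version();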
Obtaining Redshift parameters
To obtain the connection parameters for your Redshift instance, head to the "Provisioned clusters dashboard" page in the AWS console.
Click on the cluster you wish to connect to in order to see its related properties.
In the "Endpoint," you will find the connection string that contains the server, port, and database in the following format:
server:port/database
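For example, a hypothetical endpoint such as the one below maps onto the connection fields as follows (the cluster name, hash, region, and database are placeholders):

examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev
-- Server:   examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com
-- Port:     5439
-- Database: dev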
Required database permissions
To obtain DDL, SqlDBM direct connect runs several queries that reference metadata in the information_schema and pg_catalog. Access to the database schemas/catalogs and their contents is not required.
The connecting user will require read access to:
pg_catalog
information_schema
SELECT privilege on the table being imported
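To verify this metadata access ahead of time, you could run the queries below manually as the connecting user; both should complete without a permission error. This is an optional sanity check, not something SqlDBM executes:

-- Optional manual check: confirms read access to the catalogs SqlDBM queries.
SELECT count(*) FROM pg_catalog.pg_namespace;
SELECT count(*) FROM information_schema.tables;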
To reverse engineer from SqlDBM, the connecting user requires SELECT on the desired tables. This is due to the row-level security of information_schema.tables, as well as the need to execute the generated SQL that produces the table DDL (show table {tablename}), which is also granted via SELECT privileges. You can apply these privileges via the following commands:
GRANT SELECT ON TABLE {schema.table} TO {user};
or
GRANT SELECT ON ALL TABLES IN SCHEMA {schemaName} TO {user};
Without SELECT privileges on the desired table, the row will not appear in information_schema.tables, and you cannot run show table {tablename}.
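To confirm that a grant took effect, you could check the privilege directly with Redshift's HAS_TABLE_PRIVILEGE function; the user, schema, and table names below are placeholders:

-- Optional manual check: returns true if the user can SELECT the table
-- (and therefore, per the requirement above, run SHOW TABLE on it).
SELECT has_table_privilege('{user}', '{schema.table}', 'select');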
For reference, the following SQL is run to obtain the DDL:
--Get schema list
SELECT table_schema as schema_name
FROM information_schema.tables
WHERE table_schema not in ('pg_internal', 'pg_catalog', 'information_schema')
AND table_catalog = @database
GROUP BY 1
ORDER BY 1
;
--------------------
--Create schema ddl
SELECT
'CREATE SCHEMA ' + QUOTE_IDENT(nspname) +
CASE
WHEN nspowner > 100
THEN ' AUTHORIZATION ' + QUOTE_IDENT(pg_user.usename)
ELSE ''
END
+ ';' AS ddl
FROM pg_catalog.pg_namespace as pg_namespace
LEFT OUTER JOIN pg_catalog.pg_user pg_user
ON pg_namespace.nspowner=pg_user.usesysid
WHERE nspName = any(@schemas)
ORDER BY nspname
;
--------------------
--Create table ddl
SELECT 'SHOW TABLE ' + quote_ident(t.table_schema) + '.' + quote_ident(t.table_name) + ';' as sqlText
FROM information_schema.tables t
WHERE t.table_schema = any(@schemas)
AND t.table_type = 'BASE TABLE'
;
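The last query produces one SHOW TABLE statement per table in the selected schemas, which SqlDBM then runs to retrieve each table's CREATE TABLE definition. For a hypothetical schema and table, a generated statement would look like this:

-- Example output of the "Create table ddl" query (schema and table names are hypothetical):
SHOW TABLE analytics.orders;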
IP allowlisting and private networking
Corporate networking and security policies may interfere with connectivity between SqlDBM and Redshift. If you are unable to connect despite following the guidelines in this article, please make sure that IP blocking or private networking is not interfering.
An easy way to diagnose this type of issue is by attempting to connect to your Redshift account from a home computer, without using a VPN or any corporate software.
For instructions on allowing SqlDBM through the firewall or private network, please see the related article at the bottom of this page.