Stream DB V3

Know the component and how to use it.

Written by Micaella Mazoni

IMPORTANT: This documentation has been discontinued. Read the updated Stream DB V3 documentation on our new documentation portal.

Stream DB V3 allows the establishment of a connection with a service that supports the JDBC (Java Database Connectivity) protocol and the execution of the SQL (Structured Query Language) instructions.

Unlike the DB V1 component, Stream DB V3 was designed to execute in batches, which means each return (resulting line or row) of the executed SQL instruction is treated individually through a subpipeline and can have its own processing flow. Learn more about subpipelines by clicking here.

Take a look at the configuration parameters of the component:

  • Account: for the component to authenticate to a JDBC service, it's necessary to use a BASIC or KERBEROS-type account (check the topic "Authentication via Kerberos").

  • Database URL: URL (Uniform Resource Locator) used to establish the connection to a database server that supports the JDBC protocol. This parameter supports Double Braces. See the example below.
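
A minimal illustration of a Database URL, assuming a hypothetical PostgreSQL server whose host is resolved at runtime through a Double Braces expression (the field name dbHost is purely illustrative):

jdbc:postgresql://{{ message.dbHost }}:5432/sales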

  • SQL Statement: SQL instruction to be executed. See the example below.
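
A simple illustration of an SQL Statement; the table and columns are hypothetical:

SELECT id, name, email FROM customers WHERE status = 'ACTIVE'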

  • Column Name: If an error occurs while processing the On Process subpipeline, the value associated with the column defined in this field will be added to the error message in a new field called "processedId" that can be handled by the On Exception subpipeline. See the following example:

{
"timestamp": 1600797662733,
"error": "Error message",
"code": 500,
"processedId": "2"
}

  • Parallel Execution Of Each Iteration: when activated, this option causes each one of the passes through the pipeline to be made in parallel, reducing the total execution time. However, there's no guarantee that the items will be executed in the order returned by the database.

  • Blob As File: if activated, this option causes blob-type fields to be stored in the pipeline context as files; otherwise, the fields are stored as normal texts (strings) and encoded in Base64, as follows:

// "Blob As File" true
{
"id": 12,
"blob": "d3X8YK.file",
}

// "Blob As File" false
{
"id": 12,
"blob": "iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAIAAACQkWg2AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAAeSURBVDhPY1Da6EMSYiBJNVDxqAZiQmw0lAZHKAEAaskfEED3lr0AAAAASUVORK5CYII="
}

  • Clob As File: if activated, this option causes clob-type fields to be stored in the pipeline context as files; otherwise, the fields are stored as normal texts (strings), as follows:

    // "Clob As File" true
    {
    "id": 15,
    "clob": "f7X9AS.file",
    }

    // "Clob As File" false
    {
    "id": 15,
    "clob": "AAAAABBBBBCCCC”

  • Charset: this option is shown only when the Clob As File option is activated. It allows you to set the encoding of the generated Clob file.

  • Fail On Error: when activated, this parameter suspends the pipeline execution only if a severe error occurs in the iteration structure itself, preventing its complete conclusion. Activating "Fail On Error" has no connection with errors that occur in the components used to build the subpipelines (onProcess and onException).

  • Custom Connection Properties: specific connection properties defined by the user.
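
As a purely hypothetical illustration (connectTimeout and useSSL are standard MySQL Connector/J driver properties, not requirements of the component), custom connection properties could look like this:

{
"connectTimeout": "30000",
"useSSL": "false"
}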

  • Keep Connections: if activated, this option keeps the connection with the database open for up to 30 minutes; otherwise, the connection is kept for only 5 minutes.

  • Advanced: advanced configurations.

  • Output Column From Label: for some databases, it's important to keep this option activated if your SELECT uses an alias, because it guarantees that the column name is returned exactly like the configured alias. See the example below.
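
For instance, with a hypothetical query like the one below, keeping the option activated guarantees the resulting field is named "customer", exactly like the alias:

SELECT customer_name AS customer FROM orders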

  • Connection Test Query: SQL instruction to be executed before each connection is established (e.g. SELECT 1 FROM DUAL). This parameter is optional and should be applied only to databases that don't provide reliable information about the connection status.
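
Typical test queries vary by database; the examples below are purely illustrative:

-- Oracle
SELECT 1 FROM DUAL

-- PostgreSQL, MySQL, SQL Server
SELECT 1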

Technology

Authentication via Kerberos

To authenticate to a database via Kerberos, it's necessary to:

  • inform a KERBEROS-type account

  • set the Kerberos principal

  • set a keytab (which must be the Base64 of the generated keytab file)
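
As a purely illustrative sketch (the field names depend on how the account is registered on the platform), a KERBEROS-type account could carry values like these:

{
"principal": "pipeline-user@EXAMPLE.COM",
"keytab": "<Base64 of the generated keytab file>"
}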

Messages flow

Message structure available in the onProcess subpipeline

Once the SQL instruction is executed, the subpipeline will be triggered receiving the execution result through a message in the following structure:

{
"column1": "data1",
"column2": "data2",
"column3": "data3"
}

Output with error

{
"code": error_code,
"error": error message,
"processId": the_id_column_value
}

Output

After the component execution, a message is returned in the following structure:

{
"total": 0,
"success": 0,
"failed": 0
}

  • total: total number of processed lines

  • success: total number of successfully processed lines

  • failed: total number of lines whose processing failed

IMPORTANT: to detect whether a line has been correctly processed, each onProcess subpipeline must respond with { "success": true } for each processed element.

Stream DB V3 performs batch processing. To better understand the concept, click here.

Connection pool

By default, we use a pool based on the configurations of the deployed pipeline. If it's a SMALL pipeline, the pool size will be 10; for MEDIUM it will be 20; and for LARGE, 40.

It's possible to manage the pool size at deployment time as well. To do that, it's necessary to enable the "Pool Size By Actual Consumers" property in the component. With it enabled, whatever is manually configured on the deployment screen will be used.

For example, consider a SMALL pipeline configured with 5 concurrent executions. If you want the pool of the database components (DB V2 and Stream DB V3) to use this size, it's necessary to enable the "Pool Size By Actual Consumers" property in all the existing components.

Be extra careful when configuring the pool size manually so there's no deadlock in concurrent calls to the same database.

Our pool is shared between the database components that access the same database inside the pipeline. If you need an exclusive pool for a determined component, enable the "Exclusive Pool" property.
