Pipeline Engine

Know the execution engine of the pipelines.

Micaella Mazoni avatar
Written by Micaella Mazoni
Updated over a week ago

IMPORTANT: This documentation has been discontinued. Read the updated Pipeline Engine documentation on our new documentation portal.

The Digibee Platform uses an engine that not only interprets the pipelines built through the interface, but also executes them. This engine is called Pipeline Engine. 

               
Take a look at the concepts and the operation architecture of Pipeline Engine. 

                      

Concepts

Have the high-level view of Pipeline Engine in relation to our other components of the Platform:

  • Pipeline Engine: component responsible for the execution of the flows built in our Platform  

  • Trigger: component that receives invocations from different technologies and forwards them to Pipeline Engine

  • Queues mechanism: queues management central mechanism of  our Platform 

                           

Operation Architecture

Each flow (pipeline) is converted into a Docker container, executed with the Kubernetes technology - base of the Digibee Platform. See the main guarantees of this operation model:

  • isolation: each container is executed in the infrastructure in an exclusive way (the memory spaces and CPU consumption are exclusive for each pipeline)

  • safety: pipelines DON'T talk to each other, unless it happens through the interfaces provided by our Platform 

  • specific scalability: it's possible to increase the number of pipelines "replicas" in a specific way, which means, increase or decrease that demands more or less performance 

                               
Pipeline Sizes

We use the serverless concept, that means, you don't need to worry about infrastructure details for the execution of your pipelines. That way, each pipeline must have its size defined during the implantation. The Platform allows the pipelines to be used in 3 different sizes:

  • SMALL, 10 concurrent executions, 20% of 1 CPU, 64 MB memory

  • MEDIUM, 20 concurrent executions, 40% of 1 CPU, 128 MB memory

  • LARGE, 40 concurrent executions, 80% of 1 CPU, 256 MB memory


Concurrent Executions

On top of each pipeline size, you define the maximum number of concurrent executions that the pipeline allows. Depending on the size, a maximum number of concurrent executions can be configured for the execution. That way, a pipeline with 10 concurrent executions can process 10 messages in parallel, while a pipeline with 1 concurrent execution can process 1 message only.                              


Resources (CPU and Memory)

Besides, the pipeline size also defines the performance and the amount of memory it has access to. The performance is defined by the cycles quantity of a CPU to which the pipeline has access to. On the other hand, the memory is given by the addressable space to treat messages and consume information. 

                           
Replicas

The messages are sent to the queues mechanism of the Platform through triggers, that are the entry points for the pipelines execution. There're triggers for different types of technology, such as REST, HTTP, Scheduler, Email, etc. As soon as the messages arrive at the queues mechanism, they become available for the pipelines consumption. During the pipeline implantation, you must define its replicas number. A SMALL pipeline with 2 replicas has twice the processing and scalability power and so on. Besides providing more processing and scalability power, the replicas guarantee more availability - if one of the replicas fail, there're others that take over.

Generally, the replicas deliver horizontal scalability, while the pipeline size delivers vertical scalability. For this reason, even if a pipeline of LARGE size is equivalent to 4 pipelines of SMALL size according to the infrastructure logic, it doesn't mean they're equivalent to all work loads. In many situations, mainly the ones that involve the processing of big messages, only the use of "vertically" bigger pipelines deliver the expected result. 

                               
Timeouts and Expiration

Triggers can be configured with the following main time control types for the messages processing:

  • timeout: indicates the maximum amount of time the trigger waits for the pipeline return

  • expiration: indicates the maximum amount of time a message can be at the queue until being captured by a Pipeline Engine

                                           
The timeout configuration is possible for all the triggers, but only some of them allow the expiration to be configured. That happens because the triggers characteristic can be synchronous or asynchronous. 

The events trigger makes executions in a synchronous way and can keep messages on queue for a long time until they're consumed. For this reason, it makes sense to define the expiration time of the messages produced by this type of trigger. 

However, REST Trigger depends on the pipeline answer to give a return. In this case, the expiration configuration doesn't make sense. Even so, internally, the synchronous triggers estimate the expiration time, non-configurable, for the messages. It guarantees that the messages don't get lost in the queueing process.  

                           
Execution Control

The messages processing is sequentially made for each concurrent execution. Therefore, if a pipeline is implemented with 1 concurrent execution, 1-by-1 messages are sequentially processed. While the message is processed, it receives a mark and no other pipeline replica will be able to receive it for processing. If a pipeline has any problem in the execution and needs to restart (eg.: OOM, crash, etc.), then the messages under execution will be returned to the queue mechanism. 

                               
Messages returned to the processing queue 

Messages returned to the processing queue become available for the consumption of other pipeline replicas or even of the same replica that has an issue and had to be restarted.

In this case, it's possible to configure the pipeline to define the approach in cases of messages reprocessing.  

All the triggers of the Platform have a configuration named "Allow Redeliveries". When this option is activated, the pipeline accepts the message to be reprocessed. When inactivated, the pipeline receives the message, detects it's a reprocessing and declines it with an error. 

It's also possible to detect if the message under execution is reprocessed or not. To do that, just use the Double Braces language by accessing the metadata scope. 

Example:  

{{ metadata.execution.redelivery }} 

Did this answer your question?