Kafka Trigger consumes messages from a Kafka broker.


To use this trigger, contact our Support Team to have it enabled.


This trigger has 2 configurable offset commit strategies:

1. Commit with no delivery guarantee

All messages received by the trigger are sent to the pipeline faster, but with no delivery guarantee: the trigger does not wait for the pipeline's response to confirm that a message was processed. With auto commit activated, the trigger uses Kafka's default commit implementation. The message dispatch can be configured as:

  • Batch dispatch

All messages returned by a consumer poll are sent together in an array. For example, if a poll returns 10 messages, the trigger sends one array with these 10 messages.

  • One-at-a-time message dispatch

The trigger goes through the polled array and dispatches only 1 message at a time. For example, if a poll returns 10 messages, the trigger makes 10 dispatches to the pipeline, each with a single message.

2. Commit with delivery guarantee

The trigger is responsible for committing the offsets, which it does after receiving a success message from the pipeline. Only batch dispatch is possible: all messages returned by a consumer poll are sent together in an array.

Example: if a poll returns 10 messages, the trigger sends an array with these 10 messages.

IMPORTANT: the consumers and/or Kafka's partitions might be rebalanced. If that happens before the pipeline's response returns to the trigger, the offsets will be committed, which can cause lost or duplicated messages.
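The two dispatch modes described above can be sketched as follows. This is a minimal illustration, not the trigger's actual implementation: `dispatch`, `send_to_pipeline`, and the message shape are hypothetical names, and the sketch assumes that a one-at-a-time dispatch still wraps the message in an array.

```python
from typing import Callable, List

Message = dict  # hypothetical stand-in for one consumed Kafka record


def dispatch(polled: List[Message], send_batch: bool,
             send_to_pipeline: Callable[[List[Message]], None]) -> int:
    """Dispatch one poll result and return the number of pipeline calls made."""
    if not polled:
        return 0
    if send_batch:
        # Batch dispatch: the whole poll goes out as a single array.
        send_to_pipeline(polled)
        return 1
    # One-at-a-time dispatch: one pipeline call per message
    # (assumption: each call carries a one-element array).
    for message in polled:
        send_to_pipeline([message])
    return len(polled)


# A poll that returned 10 messages, dispatched in both modes.
polled = [{"data": f"message-{i}"} for i in range(10)]
calls: List[List[Message]] = []
batch_calls = dispatch(polled, True, calls.append)
single_calls = dispatch(polled, False, calls.append)
```

With 10 polled messages, batch mode results in a single pipeline call and one-at-a-time mode results in 10 calls.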


Autocommit "false" and Batch Mode "true"
In this option, a poll can return an array of messages whose maximum size is defined by Max Poll Records. The messages are committed only after the pipeline returns a successful execution. If the pipeline execution times out, the messages are not committed.
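The commit-after-success behavior can be sketched with an in-memory stand-in for the consumer. Everything here is hypothetical (`FakeConsumer`, `consume_once`, `flaky_pipeline`), and the sketch assumes that uncommitted messages are read again on the next poll, which is how a consumer behaves when it resumes from the last committed offset.

```python
from typing import Callable, List


class FakeConsumer:
    """Hypothetical in-memory stand-in for a Kafka consumer."""

    def __init__(self, records: List[str]) -> None:
        self.records = records
        self.committed = 0  # offset of the next record to read

    def poll(self, max_poll_records: int) -> List[str]:
        # Simplification: always read from the last committed offset.
        return self.records[self.committed:self.committed + max_poll_records]

    def commit(self, count: int) -> None:
        self.committed += count


def consume_once(consumer: FakeConsumer, max_poll_records: int,
                 run_pipeline: Callable[[List[str]], bool]) -> bool:
    """Poll one batch and commit it only if the pipeline reports success."""
    batch = consumer.poll(max_poll_records)
    if not batch:
        return False
    try:
        ok = run_pipeline(batch)
    except TimeoutError:
        ok = False  # a timed-out execution is treated as a failure
    if ok:
        consumer.commit(len(batch))
    return ok


attempts: List[List[str]] = []


def flaky_pipeline(batch: List[str]) -> bool:
    """Times out on the first call, succeeds afterwards."""
    attempts.append(list(batch))
    if len(attempts) == 1:
        raise TimeoutError("pipeline exceeded Maximum Timeout")
    return True


consumer = FakeConsumer(["a", "b", "c"])
first = consume_once(consumer, 2, flaky_pipeline)
committed_after_first = consumer.committed
second = consume_once(consumer, 2, flaky_pipeline)
```

In this run the first attempt times out and commits nothing, so the second attempt receives the same two messages and commits them.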


Autocommit "false" and Batch Mode "false"
In this option, the poll sends only 1 message and not a message array. The message dispatch/receipt throughput decreases, but the guarantee of successful processing is greater, which means no messages are lost.

IMPORTANT: if the topic gets rebalanced in the Kafka broker while messages are being processed and the consumers have to take on other partitions, the messages will be committed if there's an error at the end of the pipeline execution. As a result, those messages won't be processed in the following poll. To solve this issue, use the Autocommit "false" and Batch Mode "false" configuration.

Take a look at the configuration parameters of the trigger:

  • Account: name of the account to be used.

  • Brokers: brokers of the server (HOST:PORT) to connect to. To inform multiple hosts, separate them with commas. Example: HOST1:PORT1,HOST2:PORT2,...,HOSTn:PORTn

  • Topic: name of the topic from which the records are retrieved.

  • Protocol: protocol used to communicate with the brokers.

  • Consumer Group Name: a unique string that identifies the consumer group to which this consumer belongs.

  • Auto Commit: if “true”, the message is committed automatically as soon as the trigger receives it; otherwise, the trigger commits manually after the pipeline confirms the processing.

  • Send Batch: it can only be used with Auto Commit. If "true", a poll that returns more than 1 message is sent as an array; otherwise, only 1 message is sent at a time.

  • Max Poll Records: maximum number of records retrieved in a single poll.

  • Include Headers: if the option is enabled, the message headers will be included in the pipeline input payload.

  • Binary Headers: if the option is enabled, the input header values are treated as binary and presented in base64. This option is displayed only when Include Headers is enabled.

  • Headers Charset: name of the character encoding used to encode the header values (default: UTF-8). This option is displayed only when Include Headers is enabled.

  • Maximum Timeout: maximum time a pipeline execution can take (in milliseconds).

  • Kerberos Service Name: value defined in the sasl.kerberos.service.name property configured on the Kafka broker side.
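To make the parameters above more concrete, the sketch below validates a Brokers value and maps a few trigger parameters to the standard Kafka consumer properties they most likely correspond to. The property names (`bootstrap.servers`, `group.id`, `enable.auto.commit`, `max.poll.records`) are real Kafka consumer settings, but the mapping itself and the trigger keys (`brokers`, `consumerGroupName`, `autoCommit`, `maxPollRecords`) are assumptions made for illustration.

```python
from typing import Dict, List


def parse_brokers(brokers: str) -> List[str]:
    """Split a comma-separated HOST:PORT list and validate each entry."""
    entries = [b.strip() for b in brokers.split(",") if b.strip()]
    for entry in entries:
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected HOST:PORT, got {entry!r}")
    return entries


def to_consumer_properties(trigger: Dict[str, object]) -> Dict[str, str]:
    """Map hypothetical trigger keys to Kafka consumer property names."""
    return {
        "bootstrap.servers": ",".join(parse_brokers(str(trigger["brokers"]))),
        "group.id": str(trigger["consumerGroupName"]),
        "enable.auto.commit": str(trigger["autoCommit"]).lower(),
        "max.poll.records": str(trigger["maxPollRecords"]),
    }


# Hypothetical trigger configuration.
trigger = {
    "brokers": "broker1:9092, broker2:9092",
    "consumerGroupName": "digibee",
    "autoCommit": False,
    "maxPollRecords": 10,
}
props = to_consumer_properties(trigger)
```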


IMPORTANT: the trigger accepts no more than 5 MB of messages per poll. Kafka is not meant for transmitting large messages; we recommend setting the message.max.bytes property in the broker to 1 MB at most.
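The per-poll limit and the recommended message size together bound how many records a poll can safely carry. The arithmetic can be sketched as below; `max_safe_poll_records` is a hypothetical helper, and reading "MB" as 1024 * 1024 bytes is an assumption.

```python
MIB = 1024 * 1024  # assumption: "MB" is read as 1024 * 1024 bytes


def max_safe_poll_records(poll_limit_bytes: int, max_message_bytes: int) -> int:
    """Largest record count that stays within the poll limit in the worst case,
    that is, when every message has the maximum allowed size."""
    if max_message_bytes <= 0:
        raise ValueError("max_message_bytes must be positive")
    return poll_limit_bytes // max_message_bytes


# 5 MB per poll with messages of up to 1 MB each.
worst_case = max_safe_poll_records(5 * MIB, 1 * MIB)
# Same poll limit with messages of up to 512 KB each.
smaller_messages = max_safe_poll_records(5 * MIB, 512 * 1024)
```

Under these assumptions, a Max Poll Records value above the computed count could exceed the limit if every message reached the maximum size.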

Consumers

The consumer configuration has a direct impact on the message input and output throughput when Kafka Trigger is activated. The ideal scenario is to configure as many consumers as there are partitions in a given topic.

If there are more consumers than partitions, the excess consumers stay idle until the number of partitions increases. When that happens, Kafka starts rebalancing the consumers.
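The relationship between consumers and partitions can be illustrated with a simplified assignment. The round-robin distribution below is an assumption made for the example (Kafka's actual assignment strategy is configurable), and `assign_partitions` and `idle_consumers` are hypothetical helpers.

```python
from typing import Dict, List


def assign_partitions(partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Distribute partition numbers across consumers in round-robin order."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


def idle_consumers(assignment: Dict[str, List[int]]) -> List[str]:
    """Consumers that received no partition and therefore consume nothing."""
    return [c for c, parts in assignment.items() if not parts]


consumers = [f"consumer-{i}" for i in range(5)]

# 3 partitions, 5 consumers: two consumers stay idle.
before = assign_partitions(3, consumers)
idle_before = idle_consumers(before)

# After the topic grows to 5 partitions, every consumer gets one.
after = assign_partitions(5, consumers)
idle_after = idle_consumers(after)
```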

Consumer Group

It's the consumer group with which your pipeline subscribes to the Kafka topic. A topic can have "n" consumer groups, and each of them can have "n" consumers that consume the topic's records.

  • Scenario 1

Let's say there's a topic named kafka-topic, a pipeline whose trigger is configured with a consumer group (Consumer Group Name) named digibee, and a second pipeline whose trigger is configured with the same topic, but with a consumer group named digibee-2. In this case, both pipelines receive the same messages.

  • Scenario 2

Let's say there's a topic named kafka-topic, a pipeline whose trigger is configured with a consumer group (Consumer Group Name) named digibee, and a second pipeline whose trigger is configured with the same topic and the same consumer group (digibee). Both pipelines receive messages from this topic, but Kafka is in charge of balancing the partitions between the consumers registered in the two triggers. In this case, the pipelines receive the messages in an interleaved way, according to the partition distribution.
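The two scenarios can be reproduced with a small simulation. This is a sketch under simplifying assumptions: each pipeline has a single consumer, partitions are split within a group by partition number, and `deliver` and the pipeline names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def deliver(messages: List[Tuple[int, str]],
            subscriptions: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Simulate delivery of (partition, value) messages.

    subscriptions is a list of (pipeline, consumer_group) pairs. Every group
    sees every message; inside a group, each partition has exactly one owner.
    """
    members: Dict[str, List[str]] = defaultdict(list)
    for pipeline, group in subscriptions:
        members[group].append(pipeline)
    received: Dict[str, List[str]] = {pipeline: [] for pipeline, _ in subscriptions}
    for partition, value in messages:
        for group_members in members.values():
            owner = group_members[partition % len(group_members)]
            received[owner].append(value)
    return received


# Messages spread over two partitions of kafka-topic.
messages = [(0, "m0"), (1, "m1"), (0, "m2"), (1, "m3")]

# Scenario 1: different consumer groups, both pipelines get every message.
scenario_1 = deliver(messages, [("pipeline-a", "digibee"), ("pipeline-b", "digibee-2")])

# Scenario 2: same consumer group, the partitions are split between the pipelines.
scenario_2 = deliver(messages, [("pipeline-a", "digibee"), ("pipeline-b", "digibee")])
```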

Technology

Authentication using Kerberos

To use Kerberos authentication in Kafka Trigger, the “krb5.conf” configuration file must be registered in the Realm parameter. If you haven't done it yet, get in touch with us through the chat service. After that, all you have to do is correctly set up a Kerberos-type account and use it in the component.

Message format in the pipeline input

Pipelines associated with Kafka trigger receive the following message as input:

{
  "data": [
    {
      "data": <STRING message content>,
      "topic": <STRING The topic from which the record is received>,
      "offset": <LONG The position of the record in the corresponding Kafka partition>,
      "partition": <INT The partition from which the record is received>,
      "success": <BOOLEAN Indicates whether the individual message was successfully consumed or not>,
      "headers": {
        "header1": "value1", … (when included)
      }
    }
  ],
  "success": <BOOLEAN Indicates whether all the messages were successfully consumed or not>
}
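As an illustration of how this input could be handled, the sketch below builds a sample payload in the format above, collects the offsets of records whose individual "success" flag is false, and decodes base64 header values as they would appear with Binary Headers enabled. The sample values and the helpers `failed_offsets` and `decode_binary_headers` are hypothetical, and decoding the headers as text assumes they actually contain text in the given charset.

```python
import base64
from typing import Dict, List


def failed_offsets(payload: dict) -> List[int]:
    """Offsets of the records whose individual "success" flag is false."""
    return [record["offset"] for record in payload["data"] if not record["success"]]


def decode_binary_headers(record: dict, charset: str = "utf-8") -> Dict[str, str]:
    """Decode base64 header values into text (assumes textual header content)."""
    headers = record.get("headers", {})
    return {name: base64.b64decode(value).decode(charset)
            for name, value in headers.items()}


# Hypothetical pipeline input with two records.
payload = {
    "data": [
        {
            "data": "hello",
            "topic": "kafka-topic",
            "offset": 42,
            "partition": 0,
            "success": True,
            "headers": {"header1": base64.b64encode(b"value1").decode("ascii")},
        },
        {
            "data": "world",
            "topic": "kafka-topic",
            "offset": 43,
            "partition": 0,
            "success": False,
        },
    ],
    "success": False,
}

failed = failed_offsets(payload)
headers = decode_binary_headers(payload["data"][0])
no_headers = decode_binary_headers(payload["data"][1])
```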
