Stream File Reader Pattern reads a local text file in blocks of lines, according to the configured pattern, and triggers subpipelines to process each block. Use this component for large files.

Take a look at the configuration parameters of the component:

  • File Name: name of the local file.

  • Tokenizer: XML, PAIR or REGEX. With the XML option, you specify the name of an XML tag, and the component sends each block enclosed by that tag. With the PAIR option, you configure a start token and an end token, and the component returns to the subflow all the lines between both tokens. With the REGEX option, you specify a regular expression, and the component returns the blocks between the matches of that expression.

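  • Token: token used to search for the pattern in the specified file. With the XML Tokenizer it is the tag name, with PAIR it is the start token, and with REGEX it is the regular expression.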

  • End Token: end token that closes each block. Used with the PAIR Tokenizer only.

  • Include Tokens: when enabled, the start and end tokens are included in each returned block. Used with the PAIR Tokenizer only.

  • Group: integer that sets how many blocks matching the defined pattern are grouped and returned together in a single subflow.

  • Element Identifier: attribute to be sent in case of errors.

  • Parallel Execution Of Each Iteration: when enabled, the loop iterations are executed in parallel.

  • Fail On Error: when enabled, this parameter stops the pipeline execution only if a severe error occurs in the iteration structure, preventing it from completing. Enabling “Fail On Error” has no connection with errors that occur in the components used to build the subpipelines (onProcess and onException).

Messages flow

Input

{
"filename": "fileName"
}

The “filename” field of the input message overrides the local file set in the “File Name” parameter.

Output

{
"total": 0,
"success": 0,
"failed": 0
}

  • total: total number of processed lines

  • success: total number of successfully processed lines

  • failed: total number of lines whose processing failed

IMPORTANT: for the component to know whether a line has been processed correctly, the processing of each line must return { "success": true }.
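The counting rule can be illustrated with a short Python sketch. It is not the component's actual implementation: `run_blocks` and `subpipeline` are hypothetical names, a subpipeline that raises an exception is assumed to count as a failure, and a thread pool merely stands in for the “Parallel Execution Of Each Iteration” option.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def run_blocks(
    blocks: Iterable[str],
    subpipeline: Callable[[str], dict],
    parallel: bool = False,
) -> dict:
    """Send each block to a (hypothetical) subpipeline and build the summary output."""

    def is_success(block: str) -> bool:
        try:
            response = subpipeline(block)
        except Exception:
            # Assumption: an exception in the subpipeline counts as a failure.
            return False
        # Only an explicit {"success": True} counts as a successful block.
        return isinstance(response, dict) and response.get("success") is True

    if parallel:
        # Stand-in for "Parallel Execution Of Each Iteration".
        with ThreadPoolExecutor() as pool:
            outcomes = list(pool.map(is_success, blocks))
    else:
        outcomes = [is_success(block) for block in blocks]

    succeeded = sum(outcomes)  # True counts as 1, False as 0
    return {"total": len(outcomes), "success": succeeded, "failed": len(outcomes) - succeeded}


if __name__ == "__main__":
    def subpipeline(block: str) -> dict:
        return {"success": "error" not in block}

    print(run_blocks(["a", "error b", "c"], subpipeline))
    # {'total': 3, 'success': 2, 'failed': 1}
```

A subpipeline that returns anything other than { "success": true }, for example an empty object, is counted in “failed”.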

The component throws an exception if the file specified in “File Name” doesn't exist or can't be read.

File manipulation inside a pipeline is protected: files can only be accessed through a temporary directory, and each pipeline key gives access only to its own set of files.
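As an illustration of this isolation, the Python sketch below confines a file name to a per-key directory. The directory layout (`temp_root/pipeline_key`) and the function name `resolve_pipeline_file` are hypothetical; the platform's real mechanism is not described in this document.

```python
from pathlib import Path


def resolve_pipeline_file(temp_root: str, pipeline_key: str, file_name: str) -> Path:
    """Resolve file_name inside the pipeline key's own directory (hypothetical layout)."""
    base = (Path(temp_root) / pipeline_key).resolve()
    candidate = (base / file_name).resolve()
    # Reject names that escape the key's directory, such as "../other-key/file.txt"
    # or absolute paths. Path.is_relative_to requires Python 3.9+.
    if not candidate.is_relative_to(base):
        raise PermissionError(f"{file_name!r} is outside the pipeline directory")
    return candidate
```

With this design, "file.txt" resolves inside the key's directory, while "../key-b/file.txt" raises PermissionError instead of reaching another key's files.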

Stream File Reader Pattern performs batch processing. To better understand this process, refer to the batch processing documentation.

Stream File Reader Pattern in Action

See below how the component behaves in specific situations and how it is configured in each one.

  • Using the XML Tokenizer to search for tag content that can span multiple lines

Suppose the following XML file must be read:

file.xml

<m:documents xmlns:m="urn:shop" xmlns:cat="urn:shop:catalog">
<m:hashes>
<m:hashe>4rt4</m:hashe>
<m:hashe>6565g</m:hashe>
</m:hashes>
<m:orders>
<m:order>
<id>1</id><date>2014-02-25</date>
</m:order>
<m:order>
<id>2</id><date>2014-02-25</date>
</m:order>
</m:orders>
</m:documents>

To return only the XML blocks of the "order" tag, configure the component as follows:

File Name: file.xml

Tokenizer: XML

Token: order

The result will be 2 subflows containing the values that are inside the “order” tag:

First:

<m:order>
<id>1</id><date>2014-02-25</date>
</m:order>

Second:

<m:order>
<id>2</id><date>2014-02-25</date>
</m:order>
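The XML Tokenizer behavior in this example can be approximated in Python with a regular expression that accepts an optional namespace prefix. This is a simplified sketch, not the component's implementation: `xml_blocks` is a hypothetical name, it reads the whole text into memory instead of streaming it, and it does not handle nested or self-closing elements with the same tag.

```python
import re
from typing import Iterator


def xml_blocks(text: str, tag: str) -> Iterator[str]:
    """Yield every <tag>...</tag> element, with or without a namespace prefix."""
    name = re.escape(tag)
    # \b after the tag name keeps "order" from matching "<m:orders>".
    pattern = re.compile(
        rf"<(?:[\w.-]+:)?{name}\b[^>]*>.*?</(?:[\w.-]+:)?{name}>",
        re.DOTALL,  # let ".*?" cross line breaks, since elements can span lines
    )
    for match in pattern.finditer(text):
        yield match.group(0)


sample = """<m:documents xmlns:m="urn:shop">
<m:orders>
<m:order>
<id>1</id><date>2014-02-25</date>
</m:order>
<m:order>
<id>2</id><date>2014-02-25</date>
</m:order>
</m:orders>
</m:documents>"""

if __name__ == "__main__":
    for block in xml_blocks(sample, "order"):
        print(block)
```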

  • Using the PAIR Tokenizer to read a file where there's a start token and an end token for each block

file.txt

###
Log1: Log info
Log2: Log info
--###
###
Log1: Log info
--###
###
Log1: Log info
Log2: Log info
Log3: Log info
--###

File Name: file.txt

Tokenizer: PAIR

Token: ###

End Token: --###

Include Tokens: disabled

The result will be 3 subflows containing the lines between the start token (###) and the end token (--###):

First:

Log1: Log info
Log2: Log info

Second:

Log1: Log info

Third:

Log1: Log info
Log2: Log info
Log3: Log info
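The PAIR behavior in this example can be sketched in Python as follows. `pair_blocks` is a hypothetical name. The sketch works on an in-memory string rather than a stream, skips a start token that has no matching end token, and trims the line breaks next to the tokens so that the blocks look like the ones above.

```python
from typing import Iterator


def pair_blocks(text: str, start: str, end: str, include_tokens: bool = False) -> Iterator[str]:
    """Yield the text between each start token and the next end token."""
    pos = 0
    while True:
        s = text.find(start, pos)
        if s == -1:
            return
        e = text.find(end, s + len(start))
        if e == -1:
            return  # unterminated block: ignored in this sketch
        if include_tokens:
            yield text[s:e + len(end)]
        else:
            # Drop the line breaks after the start token and before the end token.
            yield text[s + len(start):e].strip("\n")
        # Resume after the end token, so "###" inside "--###" is not reused.
        pos = e + len(end)


sample = "###\nLog1: Log info\nLog2: Log info\n--###\n###\nLog1: Log info\n--###\n"

if __name__ == "__main__":
    for block in pair_blocks(sample, "###", "--###"):
        print(block)
        print("----")
```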

  • Using the REGEX Tokenizer to return all the lines between pattern matches

file.txt

ID-3591d344-d74f-446e-867a-210d17345b50
Some text
xpto
ID-033e8b36-6b1e-42e8-aeb1-dc8498ffa6cb
Other text
xxx

The following pattern must be searched:

ID-\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b

File Name: file.txt

Tokenizer: REGEX

Token: ID-\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b

The result will be 2 subflows, each containing the lines that follow a match of the REGEX pattern, up to the next match or the end of the file:

First:

Some text
xpto

Second:

Other text
xxx
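A minimal Python approximation of the REGEX Tokenizer splits the text at every match of the pattern. `regex_blocks` is a hypothetical name, and the sketch makes two assumptions not stated above: text before the first match is returned as a block too, and empty blocks are dropped. It also reproduces the case in which the pattern is never found, where the whole file comes back as a single block.

```python
import re
from typing import List

ID_PATTERN = r"ID-\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b"


def regex_blocks(text: str, pattern: str) -> List[str]:
    """Return the text between consecutive matches of pattern (and after the last one)."""
    # re.split cuts the text at every match; the matched IDs themselves are dropped.
    # (The pattern has no capturing groups, so no match text is kept in the result.)
    pieces = re.split(pattern, text)
    # Trim the line breaks left around each match and skip empty pieces.
    return [piece.strip("\n") for piece in pieces if piece.strip("\n")]


sample = (
    "ID-3591d344-d74f-446e-867a-210d17345b50\nSome text\nxpto\n"
    "ID-033e8b36-6b1e-42e8-aeb1-dc8498ffa6cb\nOther text\nxxx\n"
)

if __name__ == "__main__":
    print(regex_blocks(sample, ID_PATTERN))
    # ['Some text\nxpto', 'Other text\nxxx']
```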

  • Using the REGEX Tokenizer to return all the lines between pattern matches, grouping every 2 results

file.txt

ID-3591d344-d74f-446e-867a-210d17345b50
Some text
xpto
ID-033e8b36-6b1e-42e8-aeb1-dc8498ffa6cb
Other text
xxx

The following pattern must be searched:

ID-\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b

File Name: file.txt

Tokenizer: REGEX

Token: ID-\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b

Group: 2

The result will be 1 subflow containing the 2 blocks delimited by the REGEX pattern:

Some text
xpto
ID-\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b
Other text
xxx

When Group is used with the REGEX Tokenizer, the pattern itself (the regular expression, not the matched text) appears in the output between the grouped blocks.
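Grouping can be layered on the same split-at-matches idea. In this hypothetical sketch, `grouped_regex_blocks` splits the text at pattern matches and joins every `group` blocks into one, placing the pattern text itself between them, as in the output of the grouping example. The exact separator and line breaks the component uses are assumptions.

```python
import re
from typing import List

ID_PATTERN = r"ID-\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b"


def grouped_regex_blocks(text: str, pattern: str, group: int) -> List[str]:
    """Split text at pattern matches, then join every `group` blocks into one subflow."""
    if group < 1:
        raise ValueError("group must be a positive integer")
    blocks = [p.strip("\n") for p in re.split(pattern, text) if p.strip("\n")]
    # Assumption based on the documented output: the pattern text itself
    # (not the matched ID) separates the blocks inside a group.
    separator = "\n" + pattern + "\n"
    return [separator.join(blocks[i:i + group]) for i in range(0, len(blocks), group)]


sample = (
    "ID-3591d344-d74f-446e-867a-210d17345b50\nSome text\nxpto\n"
    "ID-033e8b36-6b1e-42e8-aeb1-dc8498ffa6cb\nOther text\nxxx\n"
)

if __name__ == "__main__":
    for subflow in grouped_regex_blocks(sample, ID_PATTERN, 2):
        print(subflow)
```

With Group set to 1, the same function returns the 2 separate blocks of the previous example.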

IMPORTANT: if the specified pattern isn't found in the file, the whole file is returned as a single block and processed in a single execution. Be careful when specifying the REGEX.
