Stream XML File Reader reads a local XML file and, based on the configured node path and context fields, delivers an XML structure and context properties for each node found, triggering a subpipeline to process each message. Use the component for large files when parts of the file need to be read efficiently.

Take a look at the configuration parameters of the component:

  • File Name: name of the local XML file.

  • Charset: character encoding used to read the file (default: UTF-8).

  • Node Path: path of the desired node to stream from the XML file (e.g.: //root/level1/level2/desirednode).

  • Context Paths: define here tag paths that represent fields that add context to the desired node (e.g.: //root/node1/code,//root/node2/description).

  • Ignore Paths: define here paths that will be ignored and not returned in the desired node (e.g.: //root/node1/email,//root/node2/city).

  • Ignore Nested Child Nodes: if the option is enabled, nested child nodes (nodes that are not direct children of the desired node) are ignored. In this case, the direct children of the desired node are returned, but the nodes below them are ignored.

  • Element Identifier: attribute that will be sent in case of errors.

  • Parallel Execution Of Each Iteration: if the option is enabled, the iterations of the loop are executed in parallel.

  • Fail On Error: when enabled, this parameter suspends the pipeline execution if a severe error occurs in the iteration structure, preventing it from completing. Enabling “Fail On Error” has no connection with errors that occur in the components used to build the subpipelines (onProcess and onException).
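As a rough illustration of how these parameters fit together, the sketch below models them as a small Python structure. The class name `ReaderConfig`, its field names, the helper `split_paths`, and the boolean defaults are all hypothetical and not part of the component; only the UTF-8 default and the comma-separated path format come from the descriptions above.

```python
from dataclasses import dataclass


def split_paths(raw: str) -> list[tuple[str, ...]]:
    """Split a comma-separated path list into tuples of tag names."""
    paths = []
    for item in raw.split(","):
        tags = tuple(tag for tag in item.strip().split("/") if tag)
        if tags:
            paths.append(tags)
    return paths


@dataclass
class ReaderConfig:
    # Hypothetical field names mirroring the parameters listed above.
    file_name: str
    node_path: str
    charset: str = "UTF-8"                   # default stated in the parameter list
    context_paths: str = ""                  # comma-separated, as in the examples
    ignore_paths: str = ""                   # comma-separated, as in the examples
    ignore_nested_child_nodes: bool = False  # assumed default
    fail_on_error: bool = False              # assumed default

    def validate(self) -> None:
        """Reject a configuration without a file name or node path."""
        if not self.file_name or not self.node_path:
            raise ValueError("File Name and Node Path must be filled in")


config = ReaderConfig(
    file_name="file.xml",
    node_path="//root/products/product",
    context_paths="//root/node1/code,//root/node2/description",
)
config.validate()
print(split_paths(config.context_paths))
```

Representing each path as a tuple of tag names makes it straightforward to compare a configured path with the position of an element while the file is being read.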

Messages flow

Input

No specific input message is expected. The component only requires that an XML file exists in the pipeline local directory and that the “File Name” and “Node Path” fields are filled in.

Output

{
"total": 0,
"success": 0,
"failed": 0
}

  • total: total number of processed lines

  • success: total number of successfully processed lines

  • failed: total number of lines whose processing failed

IMPORTANT: when a line is processed correctly, its subpipeline returns { "success": true }.
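The relationship between the per-subpipeline results and the three counters can be sketched as follows. This is only an illustration of the arithmetic: the function name `summarize` is hypothetical, and treating any result without "success": true as a failure is an assumption, not documented behavior.

```python
from typing import Iterable


def summarize(results: Iterable[dict]) -> dict:
    """Aggregate subpipeline results into total/success/failed counters."""
    total = 0
    success = 0
    for result in results:
        total += 1
        # Assumption: only an explicit {"success": true} counts as a success.
        if result.get("success") is True:
            success += 1
    return {"total": total, "success": success, "failed": total - success}


summary = summarize([
    {"success": True},
    {"success": True},
    {"success": False},
    {"error": "timeout"},  # hypothetical failed result without a success flag
])
print(summary)
```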

The component throws an exception if the file specified in “File Name” doesn’t exist or can’t be read.

File manipulation inside a pipeline is protected. Files can be accessed only through a temporary directory, where each pipeline key gives access to its own set of files.

Stream XML File Reader performs batch processing. To better understand the concept, see the batch processing documentation.

Stream XML File Reader in Action

The scenarios below are based on the following XML file:

File name: file.xml

Content:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <list-info qty="4">products</list-info>
  <products>
    <product>
      <price>20.75</price>
      <product>Chair</product>
      <tags>
        <element>NEW</element>
        <element>FURNITURE</element>
      </tags>
    </product>
    <product>
      <price>399.99</price>
      <product>TV</product>
      <tags>
        <element>NEW</element>
        <element>FURNITURE</element>
      </tags>
    </product>
    <product>
      <price>100</price>
      <product>Couch</product>
      <tags>
        <element>NEW</element>
        <element>FURNITURE</element>
      </tags>
    </product>
    <product>
      <price>78.99</price>
      <product>Table</product>
      <tags>
        <element>NEW</element>
        <element>FURNITURE</element>
      </tags>
    </product>
  </products>
</root>

Streaming the file by specifying the desired node

Input

  • file.xml

File Name: file.xml

Node Path: //root/products/product

Output

{
"total": 4,
"success": 4,
"failed": 0
}

Each element identified by the desired node path will be processed independently:

  • First subflow

{
"node":"<product><price>20.75</price><product>Chair</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}

  • Second subflow

{
"node":"<product><price>399.99</price><product>TV</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}

  • Third subflow

{
"node":"<product><price>100</price><product>Couch</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}

  • Fourth subflow

{
"node":"<product><price>78.99</price><product>Table</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
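To make the scenario above more concrete, here is a minimal Python sketch that produces the same kind of per-node messages with the standard library's `xml.etree.ElementTree.iterparse`. It is an analogy, not the component's implementation: the helpers `parse_path`, `drop_blank_text`, and `stream_nodes` are hypothetical, and the sample is a shortened copy of file.xml with two products.

```python
import io
import xml.etree.ElementTree as ET

# Shortened copy of file.xml (two products instead of four).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
<list-info qty="4">products</list-info>
<products>
<product>
<price>20.75</price>
<product>Chair</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
<product>
<price>399.99</price>
<product>TV</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
</products>
</root>"""


def parse_path(path):
    """Turn "//root/products/product" into ["root", "products", "product"]."""
    return [tag for tag in path.split("/") if tag]


def drop_blank_text(elem):
    """Remove whitespace-only text so the node serializes on one line."""
    for e in elem.iter():
        if e.text is not None and not e.text.strip():
            e.text = None
        if e.tail is not None and not e.tail.strip():
            e.tail = None


def stream_nodes(source, node_path):
    """Yield one {"node": "<xml>"} message per element found at node_path."""
    target = parse_path(node_path)
    stack = []  # tags of the elements that are currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            continue
        if stack == target:
            drop_blank_text(elem)
            yield {"node": ET.tostring(elem, encoding="unicode")}
            elem.clear()  # release the subtree before reading further
        stack.pop()


messages = list(stream_nodes(io.BytesIO(SAMPLE), "//root/products/product"))
print(len(messages))
print(messages[0]["node"])
```

Comparing the full stack of open tags with the target path is what keeps the inner `<product>` element (the product name) from being mistaken for the desired node.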

Streaming the file by specifying the desired node and context fields

Input

  • file.xml

File Name: file.xml

Node Path: //root/products/product

Context Paths: //root/list-info

Output

{
"total": 4,
"success": 4,
"failed": 0
}

Each element identified by the desired node path will be processed independently:

  • First subflow

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>20.75</price><product>Chair</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}

  • Second subflow

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>399.99</price><product>TV</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}

  • Third subflow

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>100</price><product>Couch</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}

  • Fourth subflow

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>78.99</price><product>Table</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
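The "context" object in the messages above can be approximated with the sketch below, which collects the attributes and text of each element found at a context path and nests the result under its parent tag names. The function `read_context` is hypothetical, and the shape used for elements without attributes (an empty "attributes" object) is an assumption, since the scenarios only show an element that has one.

```python
import io
import xml.etree.ElementTree as ET

# Reduced version of file.xml, enough to show the context field.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
<list-info qty="4">products</list-info>
<products>
<product><price>20.75</price><product>Chair</product></product>
</products>
</root>"""


def read_context(source, context_paths):
    """Build a nested dict with attributes and value for each context path."""
    targets = {tuple(tag for tag in path.split("/") if tag) for path in context_paths}
    context = {}
    stack = []  # tags of the elements that are currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            continue
        if tuple(stack) in targets:
            holder = context
            for tag in stack[:-1]:
                holder = holder.setdefault(tag, {})
            holder[stack[-1]] = {
                "attributes": dict(elem.attrib),
                "value": (elem.text or "").strip(),
            }
        stack.pop()
    return context


context = read_context(io.BytesIO(SAMPLE), ["//root/list-info"])
print(context)
```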

Streaming the file by specifying the desired node, context fields, and nodes to be ignored

Input

  • file.xml

File Name: file.xml

Node Path: //root/products/product

Context Paths: //root/list-info

Ignore Paths: //root/products/product/tags

Output

{
"total": 4,
"success": 4,
"failed": 0
}

Each element identified by the desired node path will be processed independently:

  • First subflow

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>20.75</price><product>Chair</product></product>"
}

  • Second subflow

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>399.99</price><product>TV</product></product>"
}

  • Third subflow

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>100</price><product>Couch</product></product>"
}

  • Fourth subflow

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>78.99</price><product>Table</product></product>"
}
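The effect of Ignore Paths on a single node can be reproduced with a short sketch: every descendant whose absolute path matches an ignored path is removed before the node is serialized. The helper `drop_ignored` and the tuple representation of paths are hypothetical choices made for this illustration.

```python
import xml.etree.ElementTree as ET


def drop_ignored(elem, elem_path, ignore_paths):
    """Remove, in place, every descendant whose absolute path is ignored."""
    for child in list(elem):  # copy the list because children are removed
        child_path = elem_path + (child.tag,)
        if child_path in ignore_paths:
            elem.remove(child)
        else:
            drop_ignored(child, child_path, ignore_paths)


node = ET.fromstring(
    "<product><price>20.75</price><product>Chair</product>"
    "<tags><element>NEW</element><element>FURNITURE</element></tags></product>"
)
# Equivalent of "Ignore Paths: //root/products/product/tags".
ignore = {("root", "products", "product", "tags")}
drop_ignored(node, ("root", "products", "product"), ignore)
result = ET.tostring(node, encoding="unicode")
print(result)
```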

Streaming the file by specifying the desired node and ignoring nested child nodes

Input

  • file.xml

File Name: file.xml

Node Path: //root/products/product

Ignore Nested Child Nodes: enabled

Output

{
"total": 4,
"success": 4,
"failed": 0
}

Each element identified by the desired node path will be processed independently:

  • First subflow

{
"node": "<product><price>20.75</price><product>Chair</product><tags></tags></product>"
}

  • Second subflow

{
"node": "<product><price>399.99</price><product>TV</product><tags></tags></product>"
}

  • Third subflow

{
"node": "<product><price>100</price><product>Couch</product><tags></tags></product>"
}

  • Fourth subflow

{
"node": "<product><price>78.99</price><product>Table</product><tags></tags></product>"
}
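A minimal sketch of what ignoring nested child nodes means for one node: the direct children of the desired node are kept, while anything below them is removed, which leaves `<tags>` empty. The helper `strip_nested_children` is hypothetical; `short_empty_elements=False` is only used so that the empty element is written as `<tags></tags>`, matching the messages above.

```python
import xml.etree.ElementTree as ET


def strip_nested_children(elem):
    """Keep the direct children of elem but remove everything below them."""
    for child in elem:
        for grandchild in list(child):  # copy the list because items are removed
            child.remove(grandchild)


node = ET.fromstring(
    "<product><price>20.75</price><product>Chair</product>"
    "<tags><element>NEW</element><element>FURNITURE</element></tags></product>"
)
strip_nested_children(node)
result = ET.tostring(node, encoding="unicode", short_empty_elements=False)
print(result)
```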

Additional information

Stream XML File Reader uses an event-based reading mechanism, in which each type of data present in the file is an event to be processed. Some types of events are not covered during the file stream:

  • PROCESSING INSTRUCTION

  • START DOCUMENT

  • END DOCUMENT

  • SPACE

  • ENTITY REFERENCE

  • ENTITY DECLARATION

  • DTD

  • NOTATION DECLARATION
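The idea of an event-based reader that only covers some event types can be illustrated with Python's `xml.etree.ElementTree.iterparse`. This is an analogy under stated assumptions: the source does not say which parser the component uses, and the function `covered_events` is hypothetical. In the sketch, only element start and end events are requested, so the processing instruction and the comment in the sample never reach the caller.

```python
import io
import xml.etree.ElementTree as ET

DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
<?render mode="fast"?>
<!-- a comment -->
<item>1</item>
</root>"""


def covered_events(source):
    """Return (event, tag) pairs for element start and end events only."""
    return [
        (event, elem.tag)
        for event, elem in ET.iterparse(source, events=("start", "end"))
    ]


events = covered_events(io.BytesIO(DOC))
print(events)
```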
