A Pipes robot connects other robots and data together and can apply transformations to that data. Pipes can also interact with third-party services such as APIs, send HTTP requests, and more. We often refer to Pipes as a sort of super robot, as it is able to control and govern other robots.
A few examples of what you can use it for:
- Execute multiple extractors and/or crawlers, connecting outputs to inputs
- Transform text and images
- Perform calculations
- Join separate data sets
- Parse JSON and XML
- Send HTTP requests
- Send data to an SQL database
- Read data from datasets
- Do lookups in a dictionary
- Interact with third-party applications like Google Geocoding or Foursquare
In real-world use, this means easier price comparisons, simple image resizing, unifying data sets from multiple websites, and ordinary Extract-Transform-Load (ETL) operations.
A Pipes robot contains nodes, each of which performs a specific action. To add nodes to your pipe, simply drag and drop them from the right-hand list of actions onto the pipes grid.
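As a rough mental model only, a chain of nodes can be pictured as a series of small functions applied to each row in turn. The Python sketch below is not how Dexi implements Pipes, and Pipes robots are built in the visual editor rather than in code. The names `trim_title`, `add_total` and `run_pipe` are hypothetical and exist purely for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Node = Callable[[Row], Row]


def trim_title(row: Row) -> Row:
    """Hypothetical 'transform text' node: strips whitespace from the title."""
    return {**row, "title": str(row["title"]).strip()}


def add_total(row: Row) -> Row:
    """Hypothetical 'perform calculation' node: price times quantity."""
    return {**row, "total": row["price"] * row["quantity"]}


def run_pipe(rows: Iterable[Row], nodes: List[Node]) -> Iterator[Row]:
    """Pass each row through every node in order, one row at a time."""
    for row in rows:
        for node in nodes:
            row = node(row)
        yield row


rows = [{"title": "  Blue mug ", "price": 4, "quantity": 3}]
result = list(run_pipe(rows, [trim_title, add_total]))
print(result)
```

Each function here stands in for one node on the grid, and the order of the list stands in for the connections between nodes.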
You can also debug Pipes robots, both while they're running and once they've finished. To do so, open the execution and click the green "bug" icon next to each result.
Because Pipes robots are extremely flexible, and for technical reasons, there are some limitations on their design and the amount of data they support. Read more below.
Multiple inputs to the same connector
A node in a Pipes robot needs values for all connectors that have inputs before it can run. However, if a single input connector receives values from multiple nodes, it can lead to unpredictable results:
In the example above, the "title" connector has multiple input values.
Dexi will try to figure out how to use the inputs, but this might not always produce the result you expect. We advise against using this design.
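To see why this design is fragile, consider a toy model of a node that runs once every connector has a value. The sketch below assumes that a later value simply replaces an earlier one on the same connector. That rule is an assumption made for illustration, not a description of how Dexi actually resolves multiple inputs, and the `NodeInputs` class and `run_with` helper are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class NodeInputs:
    """Toy model of a node that can run once every connector has a value."""

    def __init__(self, connectors: List[str]) -> None:
        self.values: Dict[str, Optional[str]] = {name: None for name in connectors}

    def receive(self, connector: str, value: str) -> None:
        # Assumption for illustration: a later value replaces an earlier one.
        self.values[connector] = value

    def ready(self) -> bool:
        """True when every connector has received at least one value."""
        return all(v is not None for v in self.values.values())


def run_with(arrivals: List[Tuple[str, str]]) -> Dict[str, Optional[str]]:
    """Feed values to a two-connector node in the given arrival order."""
    node = NodeInputs(["title", "price"])
    for connector, value in arrivals:
        node.receive(connector, value)
    assert node.ready()
    return node.values


# Two upstream nodes both feed the "title" connector; only the order differs.
first = run_with([("title", "From extractor A"), ("title", "From extractor B"), ("price", "9.99")])
second = run_with([("title", "From extractor B"), ("title", "From extractor A"), ("price", "9.99")])
print(first["title"], "|", second["title"])
```

Under this assumed rule, the same inputs give a different title depending on which upstream node delivers its value last, which is the kind of unpredictability the warning refers to.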
Unlimited number of rows not supported
Pipes robots are fully streaming, i.e. they process rows on a true row-by-row basis. For technical reasons, however, they might not always support very large data sets.
We cannot provide a hard-and-fast limit because it depends on the size of each row as well as the design of the Pipes robot. The recommendation is therefore to create multiple Pipes robots when needed.
Specifically, the "Collect" and "Collect Bulk" actions can cause issues. These actions wait for all input data from all previous (ancestor) nodes before they run. An execution of a Pipes robot with this issue can fail with an empty result log, because the process running the Pipes robot runs out of memory and crashes.
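The difference between a streaming node and a collecting node can be sketched with Python generators. This is a conceptual illustration under simplifying assumptions, not Dexi's implementation. The functions `source`, `double` and `collect` are hypothetical stand-ins for an upstream node, a row-by-row action and a Collect-style action.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, int]


def source(n: int) -> Iterator[Row]:
    """Yield rows one at a time, like a streaming upstream node."""
    for i in range(n):
        yield {"id": i}


def double(rows: Iterable[Row]) -> Iterator[Row]:
    """Streaming node: holds only the current row in memory."""
    for row in rows:
        yield {"id": row["id"] * 2}


def collect(rows: Iterable[Row]) -> Iterator[List[Row]]:
    """Collect-style node: buffers every upstream row, then emits once."""
    buffer = list(rows)  # memory use grows with the total number of rows
    yield buffer


# Streaming: rows pass through one by one and are never all held at once.
streamed = sum(1 for _ in double(source(1000)))

# Collecting: all 1000 rows sit in memory before anything is emitted.
collected = next(collect(double(source(1000))))
print(streamed, len(collected))
```

In the streaming case, memory use stays roughly constant regardless of row count. In the collecting case, it scales with the whole data set, which is why splitting the work across several Pipes robots can help.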