What are a parent task and a child task?
In Octoparse, you can link more than 2 and fewer than 100 tasks together through a URL field in the extracted data, so that one task supplies the URLs for the next. The task that provides the URLs is called a parent task, and the task that uses the URLs from a parent task is called a child task.
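To make the relationship concrete, here is a minimal Python sketch of the handoff. It is only an illustration of the idea, not Octoparse code: the function `urls_from_parent`, the field name `detail_url`, and the sample rows are all hypothetical, and skipping empty values is an assumption rather than documented behavior.

```python
from typing import Dict, List


def urls_from_parent(rows: List[Dict[str, str]], url_field: str) -> List[str]:
    """Collect the values of one URL field from the parent task's rows.

    Assumption: rows with an empty or missing value are skipped.
    """
    urls = []
    for row in rows:
        url = row.get(url_field, "").strip()
        if url:
            urls.append(url)
    return urls


# Hypothetical parent-task output: each dict is one extracted row.
parent_rows = [
    {"title": "Item A", "detail_url": "https://example.com/a"},
    {"title": "Item B", "detail_url": "https://example.com/b"},
    {"title": "Item C", "detail_url": ""},
]

# The child task would start from this list of URLs.
child_input = urls_from_parent(parent_rows, "detail_url")
print(child_input)
```

In this picture, the parent task is whatever produces `parent_rows`, and the child task is whatever loops over `child_input`.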
How to set up the child task?
There are two methods to set up the child task, depending on whether a child task is ready or not.
1. If the child task is ready:
Click the Loop Item box
Click the Edit button next to the Manual Input box
On the following page, choose Import from task
Select the right parent task
Select the correct data field in the parent task (which contains URLs)
Click Save to save the settings
The URLs from the parent task are now imported into the child task.
Notes:
Importing URLs from another task is supported only in Octoparse Cloud Extraction.
If there is no data extracted in the parent task, you'll need to paste at least one URL manually to start configuring the child task.
Check Batch URL input if you'd like to know more about how to input URLs.
2. If the child task is not ready:
Click the +New button at the sidebar on the Octoparse home screen
Choose Custom Task
On the settings page, repeat the same steps as in method 1:
Choose Import from task
Select the right parent task
Select the correct data field in the parent task (which contains URLs)
Click Save to save the settings
If the parent task doesn't have any results yet, a notification reading "Invalid URLs found" will appear. Make sure you have run the parent task before setting up the child task.
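Before linking the tasks, it can help to confirm that the parent's URL field actually contains usable URLs. The sketch below shows one way to check this on exported data. It is an assumption about what "valid" means (an http or https scheme plus a host), not a description of how Octoparse validates URLs, and the helper names are hypothetical.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def is_valid_url(value: str) -> bool:
    """Treat a value as valid if it has an http(s) scheme and a host (assumed rule)."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_parent_field(values: List[str]) -> Tuple[List[str], List[str]]:
    """Split the parent's field values into valid and invalid URLs."""
    valid = [v for v in values if is_valid_url(v)]
    invalid = [v for v in values if not is_valid_url(v)]
    return valid, invalid


# Hypothetical values taken from the parent task's URL field.
valid, invalid = check_parent_field(["https://example.com/a", "", "not a url"])
print(f"{len(valid)} valid, {len(invalid)} invalid")
```

If the list of valid URLs comes back empty, the parent task has most likely not been run yet or the wrong data field was selected.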
How to schedule parent tasks and child tasks
Scheduling a parent task works the same way as scheduling a normal task. See Schedule tasks to run for details.
A child task can be scheduled to run based on the status of its parent task.
Click on Run -> Parent task settings
You can choose to run the child task once the parent task is completed or stopped.
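The status-based trigger can be pictured with a short Python sketch. This is a conceptual model only: the status names and the `should_start_child` function are hypothetical, and the sketch assumes that "completed" and "stopped" can each be chosen as a trigger.

```python
from enum import Enum
from typing import Set


class ParentStatus(Enum):
    # Hypothetical status names used for illustration.
    RUNNING = "running"
    COMPLETED = "completed"
    STOPPED = "stopped"


def should_start_child(parent_status: ParentStatus, trigger_on: Set[ParentStatus]) -> bool:
    """Return True when the parent's current status is one of the chosen triggers."""
    return parent_status in trigger_on


# Example: start the child when the parent is either completed or stopped.
triggers = {ParentStatus.COMPLETED, ParentStatus.STOPPED}
for status in ParentStatus:
    print(status.value, should_start_child(status, triggers))
```

Restricting `triggers` to `{ParentStatus.COMPLETED}` would model a child task that only runs after the parent finishes normally.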
Tips: