What is Batch URL input?
The Batch URL input feature imports a large number of URLs into Octoparse at once. Octoparse supports batch (bulk) URL import from local files (text or spreadsheet) and from another task, and it can also generate URLs from a pre-defined pattern.
How to batch input URLs?
Click +New in the sidebar menu and select Custom Mode. The URL import panel then appears.
There are three ways to batch import URLs to any single task/crawler (up to a million URLs):
TIP: Once the number of imported or generated URLs reaches the limit of 1 million, Octoparse stops importing or generating immediately.
1. Import URLs from a file
You can import URLs from any of these file formats: CSV, TXT, or Excel (.xlsx and .xls).
Select "Import from file".
Click "Select", choose the file containing the URLs, and then select the sheet and column that contain the URLs.
Click "Save" to complete the import process.
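If the URLs come from your own script, a file suitable for "Import from file" can be prepared ahead of time. The following is a minimal Python sketch, not part of Octoparse: the function name `write_url_csv`, the column header `url`, and the example.com addresses are assumptions made for illustration. It writes a single-column CSV and stops at the 1 million limit mentioned above.

```python
import csv
import io

MAX_URLS = 1_000_000  # import limit stated in this article


def write_url_csv(urls, fileobj, column_name="url", limit=MAX_URLS):
    """Write URLs as a single-column CSV and return how many were written.

    The header name and this helper are hypothetical; Octoparse only needs
    a column that you can select during import.
    """
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow([column_name])
    count = 0
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank entries
        if count >= limit:
            break  # do not exceed the import limit
        writer.writerow([url])
        count += 1
    return count


# Example: write to an in-memory buffer; use open("urls.csv", "w", newline="")
# to produce a real file instead.
buffer = io.StringIO()
written = write_url_csv(
    ["https://example.com/a", " ", "https://example.com/b"], buffer
)
```

The resulting file has one header row followed by one URL per row, so the sheet and column selection in the import dialog stays unambiguous.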
2. Import URLs from another task
This feature links two tasks seamlessly when the URLs must first be extracted by a separate task. Manual URL export and import are no longer needed.
Select "Import from task".
Select the task containing the target URLs, then specify the proper data field.
Click "Save" to complete the import process.
Note that the selected task (the one that contains the URLs needed for further crawling) is referred to as the parent task, and the new task configured with those URLs becomes the child task. The two tasks are associated automatically and can be run in association with one another.
TIPS:
1. You can set the child task to run according to the status of the parent task in the Cloud. If you set up an associated run by selecting an option from Parent task settings, both tasks are executed in the cloud via Octoparse Cloud Service. Associated runs are not available for Local Extraction.
2. When an associated run is set up, task scheduling is not available for the child task.
3. Importing from another task supports importing more than 1 million URLs.
3. Batch generate URLs based on a pre-defined pattern
With the "Batch generate" feature, you can easily generate a large number of URLs following specific patterns by modifying various parameters of one given URL.
Select "Batch generate".
Input one URL as a base for batch generating.
Highlight the selected URL parameter and click "Add parameter".
Select from the four Parameter Type options to define the pattern you need and click "Save URL" to save the list.
Four Parameter Type options
1. Numbers
You can enter the initial number, choose whether to increase (+) or decrease (-) it by a fixed amount each time, and enter either a Repeat count or an end value. For example, to generate URLs for pages 1 to 100, enter 1 as the initial number, set the step to +1, and set Repeat to 100 times. The end value is then filled in automatically as 100.
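The relationship between the initial number, the step, and the Repeat count can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Octoparse's implementation; the function names, the `[page]` placeholder, and the example.com URL are assumptions.

```python
def number_values(start, step, repeat):
    """Return `repeat` numbers beginning at `start`, changing by `step` each time."""
    return [start + i * step for i in range(repeat)]


def page_urls(template, start, step, repeat, placeholder="[page]"):
    """Substitute each generated number into a URL template (hypothetical helper)."""
    return [
        template.replace(placeholder, str(n))
        for n in number_values(start, step, repeat)
    ]


# Initial number 1, every time +1, Repeat 100 times.
pages = number_values(1, 1, 100)
end_value = pages[-1]  # equals start + (repeat - 1) * step

urls = page_urls("https://www.example.com/list?page=[page]", 1, 1, 100)
```

With a negative step the same formula counts downward, which corresponds to the decrease (-) option.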
2. Letters
You can enter the starting letter and the ending letter.
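As a rough illustration, a letter range can be expanded as shown below. The sketch assumes that both the starting and the ending letter are included and that the range runs forward through the alphabet; the article does not state either point, and `letter_values` is a hypothetical name.

```python
def letter_values(first, last):
    """Return the letters from `first` to `last`, both included (assumed behavior)."""
    if ord(first) > ord(last):
        raise ValueError("starting letter must not come after ending letter")
    return [chr(code) for code in range(ord(first), ord(last) + 1)]


letters = letter_values("a", "e")
```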
3. Time
4. Custom list
You can enter your own list, like a list of search keywords or product numbers.
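TIP: You can set up multiple parameters to generate URLs. For example, if the base URL is www.octoparse.com/[parameter1]/[parameter2], with Parameter1 = {A, B} and Parameter2 = {1, 2}, the final URL list would contain every combination:
www.octoparse.com/A/1
www.octoparse.com/A/2
www.octoparse.com/B/1
www.octoparse.com/B/2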
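Combining several parameters amounts to taking every combination of their values. The Python sketch below reproduces the example from the tip with `itertools.product`. It is an illustration only: the `expand` helper is hypothetical, and the order in which Octoparse lists the generated URLs is an assumption.

```python
from itertools import product


def expand(base_url, parameters):
    """Fill each [name] placeholder with every combination of parameter values.

    `parameters` maps a placeholder name to its list of values. The output
    order (last parameter varies fastest) is an assumption of this sketch.
    """
    names = list(parameters)
    urls = []
    for combo in product(*(parameters[name] for name in names)):
        url = base_url
        for name, value in zip(names, combo):
            url = url.replace(f"[{name}]", str(value))
        urls.append(url)
    return urls


urls = expand(
    "www.octoparse.com/[parameter1]/[parameter2]",
    {"parameter1": ["A", "B"], "parameter2": [1, 2]},
)
```

The number of generated URLs is the product of the sizes of the value lists (here 2 x 2 = 4), so a few long lists can approach the 1 million limit quickly.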