Batch URL input
Updated over 8 months ago

What is Batch URL input?

The Batch URL input feature lets you import a large number of URLs into Octoparse at once. Octoparse supports batch/bulk URL import from local files (text or spreadsheet) or from another task, and it can also generate URLs from a predefined pattern.


How to batch input URLs?

Click +New in the sidebar menu and select Custom Task to open the URL import panel.

There are three ways to batch import URLs into a single task/crawler (up to 1 million URLs):

TIP: Once the number of imported or generated URLs reaches the limit of 1 million, Octoparse stops importing or generating immediately.


1. Import URLs from a file

You can import URLs from any of these file formats: CSV, TXT, or Excel (.xlsx and .xls).

  • Select Import from file

  • Click Select, choose the file containing the URLs, and then select the sheet and column that contain the URLs.

  • Click Save to complete the import process.


NOTE:

  1. Only the first 100 URLs will be shown for preview purposes.

  2. When importing from a CSV file, make sure the file has only one column, containing the URLs. If the file has several columns, the URLs won't be imported and will be treated as invalid.
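If your spreadsheet has several columns, one way to meet the one-column requirement is to extract the URL column before importing. The Python sketch below is only an illustration and is not part of Octoparse. `to_single_column_csv` is a hypothetical helper. It assumes the source CSV has a header row and that you know the URL column's name (here `link`). It also assumes the output file should have no header row.

```python
import csv
import io
from urllib.parse import urlparse


def to_single_column_csv(src_text: str, url_column: str) -> str:
    """Return CSV text holding only the URLs from one column of src_text.

    Hypothetical helper: rewrites a multi-column CSV (with a header row)
    into a one-column file with no header, one URL per line.
    """
    reader = csv.DictReader(io.StringIO(src_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        url = (row.get(url_column) or "").strip()
        parsed = urlparse(url)
        # Keep only absolute http(s) URLs; skip blanks and malformed values.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            writer.writerow([url])
    return out.getvalue()


if __name__ == "__main__":
    sample = "name,link\nHome,https://example.com/\nBad,not-a-url\nDocs,https://example.com/docs\n"
    print(to_single_column_csv(sample, "link"), end="")
```

Save the returned text as a .csv file, then import it with Import from file.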


2. Import URLs from another task

This feature connects two tasks seamlessly when URL extraction needs to be done separately by another task, so no manual URL export or import is needed.

  • Select Import from task

  • Select the Task Group and the task containing the target URLs

  • Specify the field that contains the URLs

  • Click Save to complete the import process

Tips:

  • The selected task (the one containing the URLs that need further crawling) is called the parent task, and the new task configured with those URLs is the child task. The two tasks are associated automatically and can be run in association with one another.

  • Child tasks can only use URLs scraped from Cloud runs.

  • When the parent task gets new URLs, the URLs in the child task will be updated too.

  • You can schedule the child task in the Cloud according to the status of the parent task.


  • Importing from another task supports more than 1 million URLs.


3. Batch generate URLs based on a pre-defined pattern

With the "Batch generate" feature, you can generate a large number of URLs that follow a specific pattern by varying one or more parameters of a single base URL.

  • Select Batch generate

  • Enter one URL to serve as the base for batch generation

  • Highlight the part of the URL to use as a parameter and click Add parameter

  • Select one of the four Parameter Type options to define the pattern you need, then click Save URL to save the list.


Four Parameter Type options

3.1 Numbers

You can enter an initial number, choose whether to increase (+) or decrease (-) it by a step each time, and then enter either a Repeat count or an end value. For example, to generate URLs for 100 pages, set the page-number parameter to run from 1 to 100: enter 1 as the initial number, + 1 each time, and Repeat 100 times. The end value is filled in automatically as 100.
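A Numbers parameter behaves like an arithmetic sequence. Given an initial number s, a step d (negative for "decrease") and a Repeat count n, the values are s, s + d, ..., s + (n - 1)d, so the end value is s + (n - 1)d. The sketch below reproduces the 100-page example. The base URL `https://example.com/list?page={page}` and its `{page}` placeholder are hypothetical. They stand in for the part of the URL you highlight in Octoparse.

```python
def number_values(start: int, step: int, repeat: int) -> list[int]:
    """Values for a Numbers parameter: start, start + step, ... (repeat values).

    A negative step models the "decrease (-)" option.
    """
    return [start + i * step for i in range(repeat)]


def end_value(start: int, step: int, repeat: int) -> int:
    # The last value generated, i.e. the end value filled in from the Repeat count.
    return start + (repeat - 1) * step


BASE = "https://example.com/list?page={page}"  # hypothetical base URL and placeholder

if __name__ == "__main__":
    pages = number_values(start=1, step=1, repeat=100)
    urls = [BASE.format(page=p) for p in pages]
    print(len(urls), urls[0], urls[-1], end_value(1, 1, 100))
```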

3.2 Letters

You can enter the starting letter and the ending letter.
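The sketch below shows what a starting letter and an ending letter expand to: every letter in between, with both ends included. It assumes both ends are single ASCII letters of the same case and that the starting letter comes first. It does not cover how Octoparse handles other input. The `{letter}` URL placeholder in the usage line is hypothetical.

```python
import string


def letter_values(first: str, last: str) -> list[str]:
    """Letters from first to last inclusive, e.g. 'a'..'e' -> a, b, c, d, e."""
    # Assumption for this sketch: both ends are single ASCII letters of the same case.
    for alphabet in (string.ascii_lowercase, string.ascii_uppercase):
        if len(first) == 1 and len(last) == 1 and first in alphabet and last in alphabet:
            i, j = alphabet.index(first), alphabet.index(last)
            if i > j:
                raise ValueError("starting letter must not come after ending letter")
            return list(alphabet[i : j + 1])
    raise ValueError("expected two single letters of the same case")


if __name__ == "__main__":
    base = "https://example.com/directory/{letter}"  # hypothetical base URL
    for url in (base.format(letter=c) for c in letter_values("A", "E")):
        print(url)
```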


3.3 Time

You can specify the date format and select the date range to generate. This is helpful when you are scraping hotel websites and want to put the check-in and check-out dates in the URL.
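For the hotel use case, the sketch below expands a date range into one URL per date and formats each date with a `strftime` pattern. Pairing each check-in date with a check-out on the following day is an assumption made for illustration. The example URL and its `{checkin}`/`{checkout}` placeholders are also assumptions. Octoparse's own Time parameter may structure this differently.

```python
from datetime import date, timedelta


def date_range(start: date, end: date) -> list[date]:
    """Every calendar date from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def stay_urls(base: str, start: date, end: date, fmt: str = "%Y-%m-%d") -> list[str]:
    """One URL per one-night stay: check in on each date, check out the next day."""
    urls = []
    for checkin in date_range(start, end):
        checkout = checkin + timedelta(days=1)
        urls.append(base.format(checkin=checkin.strftime(fmt), checkout=checkout.strftime(fmt)))
    return urls


# Hypothetical hotel search URL with check-in and check-out placeholders.
BASE = "https://example.com/hotels?checkin={checkin}&checkout={checkout}"

if __name__ == "__main__":
    for url in stay_urls(BASE, date(2024, 3, 1), date(2024, 3, 3)):
        print(url)
```

Changing `fmt` (for example to `"%d/%m/%Y"`) corresponds to choosing a different date format.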


3.4 Custom list

You can enter your own list, like a list of search keywords or product numbers.
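A custom list is substituted into the URL one entry at a time. Search keywords often contain spaces or symbols, so this sketch URL-encodes each entry with `urllib.parse.quote_plus`. This article does not say whether Octoparse encodes list values for you. Treat the encoding step, the example search URL and the `{q}` placeholder as assumptions.

```python
from urllib.parse import quote_plus

BASE = "https://example.com/search?q={q}"  # hypothetical base URL and placeholder


def keyword_urls(base: str, keywords: list[str]) -> list[str]:
    """Substitute each custom-list entry into base, URL-encoding spaces and symbols."""
    # Strip surrounding whitespace and skip blank entries in the list.
    return [base.format(q=quote_plus(k.strip())) for k in keywords if k.strip()]


if __name__ == "__main__":
    for url in keyword_urls(BASE, ["running shoes", "C++"]):
        print(url)
```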


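TIP: You can set up multiple parameters to generate URLs. For example, suppose the base URL is www.octoparse.com/[parameter1]/[parameter2] and the parameters are:

Parameter1={A, B}, Parameter2={1, 2}

The final URL list then contains every combination of the two parameters:

  • www.octoparse.com/A/1
  • www.octoparse.com/A/2
  • www.octoparse.com/B/1
  • www.octoparse.com/B/2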
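Combining parameters produces every combination of their values (a Cartesian product), so two parameters with 2 values each yield 2 × 2 = 4 URLs. The sketch below models this with `itertools.product`. It stops at 1 million URLs to mirror the limit described at the top of this article. The `{parameter1}`-style placeholders stand in for the bracketed parameters above. `generate_urls` and its `max_urls` argument are hypothetical.

```python
from itertools import islice, product

MAX_URLS = 1_000_000  # limit on imported/generated URLs per task, per this article


def generate_urls(base: str, max_urls: int = MAX_URLS, **params: list[str]) -> list[str]:
    """Fill every combination of parameter values into base, up to max_urls URLs."""
    names = list(params)
    # product() varies the last parameter fastest: A/1, A/2, B/1, B/2.
    combos = product(*(params[name] for name in names))
    # islice stops after max_urls combinations, like generation halting at the limit.
    return [base.format(**dict(zip(names, values))) for values in islice(combos, max_urls)]


if __name__ == "__main__":
    urls = generate_urls(
        "www.octoparse.com/{parameter1}/{parameter2}",
        parameter1=["A", "B"],
        parameter2=["1", "2"],
    )
    print("\n".join(urls))
```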
