What is a task in Octoparse

Updated over 5 months ago

Everything you do in Octoparse starts with building a task. A scraping task in Octoparse is also referred to as "a bot", "an agent", or "a crawler". Regardless of what it is called, a task is essentially a set of configurations for the program to follow. One task usually scrapes a page or multiple pages with the same page design.

Building a task in Octoparse is straightforward. First, load your target webpage in Octoparse and click to select the data you need to fetch. Once you've finished selecting the data, a workflow is auto-generated based on how you've interacted with the webpage, for example, whether you've clicked a certain button, hovered over the navigation menu, or selected data on the page.

Octoparse simulates real browsing actions as it clicks, searches, paginates, etc., finally reaching and fetching the target data, all by following the steps in the workflow. This is how Octoparse extracts data from a webpage.


Custom Task vs. Task Template

There are two ways to create a scraping task in Octoparse: you can build a task under Custom Task, or pick a Task Template right off the bat.

Custom Task

With Custom Task, you'll get to customize your own scraping task in any way you like, such as searching with keywords, logging into your account, clicking through a dropdown, and much more. Simply put, Custom Task is all you need to scrape data from any website.

Task Template

In contrast to Custom Task, Task Template provides a large number of pre-set scraping templates for some of the most popular websites. These tasks are pre-built, so you only need to input certain variables, such as the search term or the target page URL, to fetch a pre-defined set of data from that website.

Ready to get your hands on some data? Follow the introductory lessons for step-by-step guidance on how to create your first task.

NOTE:

  1. The interfaces of version 7 and version 8 are different; the auto-detect feature is only available in version 8.

  2. You can use the auto-detection feature to generate a basic workflow first, then modify or optimize it to meet your needs.

  3. Usually, one task/crawler is used to scrape data from one website (or URLs under one domain), because a single task/crawler can only scrape pages with a similar page structure. However, you can scrape email addresses from a list of dissimilar websites with one crawler; see this tutorial for reference: Can I extract email addresses from a series of websites without similarities?


Tips on managing your tasks

1. Task Information Editing

A task name is automatically generated from the URL you enter when you save the task.

  • To modify the task name, click the textbox above the workflow panel and enter a new name.

  • Alternatively, click the edit button to rename a saved task.


2. More Actions for Task Management

Quick actions

  • "Duplicate" – Replicate a task

  • "Delete" – Delete a task

Actions

Export

Export the task file. The task file can be saved on your device or submitted to the support team for troubleshooting.

Task ID (API)

ID for the task. Can be utilized in API requests.
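As a minimal sketch of how a Task ID might be used, the snippet below builds a request URL for fetching a task's data. The base URL, endpoint path, and parameter names (`taskId`, `size`) are assumptions for illustration, not the actual Octoparse API; consult the official API documentation for the real endpoints and authentication flow.

```python
def build_task_data_url(base_url: str, task_id: str, size: int = 100) -> str:
    """Build a hypothetical request URL for fetching a task's scraped data.

    The path and query parameter names here are placeholders, not the
    real Octoparse API; replace them with the documented endpoints.
    """
    return f"{base_url}/data/all?taskId={task_id}&size={size}"


# Example: plug in the Task ID copied from the task's action menu.
url = build_task_data_url("https://openapi.example.com", "abc-123")
print(url)  # https://openapi.example.com/data/all?taskId=abc-123&size=100
```

In a real script you would send this URL with an HTTP client (e.g. `requests.get`) along with the access token required by the API.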

Local Run

Options for Local Run: Start/Stop, or Schedule

Cloud Run

Options for Cloud Run: Start/Stop, Schedule, or Cloud Run History

Move to Group

Relocate the task to a different group.

View Data

View the Cloud or Local data.

More Actions

Edit, Rename, or Task Settings

To batch manage tasks:

  • Select multiple tasks (selecting a single task also works).

  • Choose from the available options to perform batch operations.
