What is a Custom Task?

Updated over 11 months ago

There are two types of tasks within Octoparse:

  1. Template task (built from a ready-to-use template)

  2. Custom Task

Before starting a scraping task, check whether a ready-to-use template is available for your target website. If there is none, create a Custom Task.


How to set up a Custom Task?

There are two ways to quickly start a new task using Custom Task:

Method 1

Paste your target website URL into the designated box and click Start (or press Enter).

Method 2

From the sidebar menu, hover over the New option and select Custom Task.


The Custom Task interface

  • Built-in Browser: Once you've entered a target webpage URL, the webpage is loaded in Octoparse's built-in browser. You can browse the website in Browse mode, or click to extract the data you need in Select mode.

  • Workflow: As you interact with the webpage, for example by opening a page or clicking a page element or button, Octoparse automatically records the entire process as a workflow.

  • Tips panel: Octoparse uses smart Tips to "talk" to you and guide you through the task-building process.

  • Data Preview: Preview the selected data. You can also rename data fields or remove the ones you don't need.


How to use Custom Task to build tasks manually

To build a task manually using Custom Task, simply click on the target data on the webpage. Follow the tips provided on the Tips panel to proceed with the task-building process. The general building steps are straightforward:

Select the data you need on the webpage >> Follow the instructions provided in the Tips panel >> Check your workflow >> Run the task to get data

Web pages change all the time, and different people need different sets of data. Custom Task is designed to be flexible enough to handle all kinds of scraping needs while remaining friendly to non-coders, with step-by-step guidance provided in Action Tips.


1. Select your target data on the web page

Within the built-in browser, use simple clicks to select any data you'd like to extract from the webpage. As you hover over the web page, Octoparse tries to "understand" what you'd like to fetch and highlights the page elements around your cursor. If the highlighted area does not quite match what you'd like to extract, move your cursor slightly.

Once the data you need is highlighted in blue, click to confirm the selection. The selected page element should now be highlighted in green, indicating that it has been selected successfully.

Repeat the same process if you'd like to extract multiple elements on the same page.

2. Follow the instructions provided in the Tips panel

Octoparse attempts to guide you through the task-building process by offering all possible next steps in the Action Tips Panel. It is a way for Octoparse to "talk" to you.

Every time you select an element, the Action Tips panel pops up with a number of options to choose from. Follow the instructions provided and choose how you'd like to proceed with the selected data. For example, if you'd like to scrape the text of the selected elements, choose "Text"; if you'd like to click the selected element to go to the linked page, choose "Click element".

Below are the most frequently used actions:

  • Text - capture the text of the selected page element

  • Click element - click the selected page element

  • InnerHtml & OuterHtml - capture the source code string of the selected element

  • Loop click - click the selected element repeatedly (similar to Loop click next page)

  • Link - capture the URL of the selected link (when a link is selected)

  • Image URL - capture the image URL (when an image is selected)
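To make the difference between the capture actions concrete, here is a minimal Python sketch of what each one would return for a single selected link. It is not Octoparse code and does not use any Octoparse API: the snippet, the example.com URLs, and the helper names `text`, `inner_html`, and `outer_html` are assumptions made for illustration, and the standard-library XML parser stands in for a real browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a selected link on a page.
SNIPPET = (
    '<a href="https://example.com/item/1">'
    '<img src="https://example.com/thumb/1.png" alt="thumbnail"/>'
    'Blue <b>Widget</b>'
    '</a>'
)


def text(el):
    """Text: all text inside the element, with the tags removed."""
    return "".join(el.itertext())


def inner_html(el):
    """InnerHtml: the markup between the element's own opening and closing tags."""
    # ET.tostring() on a child also emits the text that follows it (its "tail").
    children = "".join(ET.tostring(child, encoding="unicode") for child in el)
    return (el.text or "") + children


def outer_html(el):
    """OuterHtml: the element's own tags plus everything inside them."""
    return ET.tostring(el, encoding="unicode")


link = ET.fromstring(SNIPPET)
image = link.find("img")

# The parser serializes as XML, so the exact markup may differ slightly
# from what a browser would report for the same element.
print("Text:     ", text(link))
print("InnerHtml:", inner_html(link))
print("OuterHtml:", outer_html(link))
print("Link:     ", link.get("href"))
print("Image URL:", image.get("src"))
```

In this sketch, Text keeps only the readable words, InnerHtml keeps the nested tags, and OuterHtml additionally keeps the enclosing link tag itself.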

Tips:

  • In instances where a target element is difficult to pinpoint with the cursor, you can use the HTML tags located at the bottom of the Tips panel to refine the selection.

  • The Expand the selection button at the end can be used to expand the current selection to include the outer HTML tag. For example, if you'd like to extract the entire part surrounding the selected element, you can keep clicking on the expand button until the entire part gets highlighted in green.
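Expanding a selection amounts to stepping from an element to the tag that encloses it. The sketch below illustrates that idea in plain Python under stated assumptions: the product card markup, the `parent_of` map, and the `expand` helper are hypothetical and are not how Octoparse implements the button.

```python
import xml.etree.ElementTree as ET

# Hypothetical product card; imagine the user first clicked the price <span>.
PAGE = (
    '<div class="card">'
    '<h2>Blue Widget</h2>'
    '<p class="price">Price: <span>19.99</span></p>'
    '</div>'
)

root = ET.fromstring(PAGE)

# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def expand(selection):
    """Return the enclosing element, or the same element if it is already the root."""
    return parent_of.get(selection, selection)


selection = root.find(".//span")
print(selection.tag, "->", "".join(selection.itertext()))

# Each "click" on expand widens the selection by one enclosing tag.
while selection is not root:
    selection = expand(selection)
    print(selection.tag, "->", "".join(selection.itertext()))
```

Each pass through the loop covers more of the card, which mirrors clicking the expand button until the whole block is selected.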

3. Check the workflow

As you build the scraping task, Octoparse simultaneously creates a workflow based on how you've interacted with the web page and the Tips panel.

Tip: Check out this tutorial to learn more about how to test your workflow step-by-step: Lesson 4: Test-run the task
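One way to picture a workflow is as an ordered list of recorded steps, where a loop step contains the steps that repeat inside it. The following Python sketch models that idea only; the `Step` class, its field names, the action labels, and the example.com URL are assumptions for illustration and do not reflect Octoparse's actual task format.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One recorded interaction; the field names here are illustrative only."""
    action: str
    target: str = ""
    children: list = field(default_factory=list)  # nested steps, e.g. inside a loop


def outline(steps, depth=0):
    """Flatten a nested workflow into indented lines, in execution order."""
    lines = []
    for step in steps:
        label = f"{step.action}: {step.target}" if step.target else step.action
        lines.append("  " * depth + label)
        lines.extend(outline(step.children, depth + 1))
    return lines


# Hypothetical workflow: open a page, then extract two fields on every page.
workflow = [
    Step("Open web page", "https://example.com/products"),
    Step("Loop click", "next-page button", children=[
        Step("Extract text", "product title"),
        Step("Extract link", "product URL"),
    ]),
]

print("\n".join(outline(workflow)))
```

Reading the printed outline from top to bottom is a reasonable way to check that the steps run in the order you intended.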

4. Run the task

Now that you've finished building and testing your task, you can run the task by clicking the Run button. You can run the task on your device or run it in the Cloud.