Add triggers to an Extract Data action

A trigger in Octoparse is a set of conditions that Octoparse checks against each data line to decide quickly whether to keep or discard it. It lets you filter for the data you want during extraction, so you don't need to scrape the whole dataset and then delete unwanted rows after exporting it to Excel or CSV files.

When to use a Trigger?

Use Case 1

If you are scraping products from an e-commerce website and only want products priced under $100, you can use a trigger to dump the data lines you don't need (any product priced at $100 or more) and keep the rest.

To achieve this, create a trigger like this: if the data field "price" is equal to or greater than "100", do "dump the line of data". Octoparse then checks each data line against this condition before saving it, so the final dataset contains only the data you want.
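
Octoparse configures triggers in its interface rather than in code, but the logic of this trigger can be sketched in Python as a filter applied to each data line before it is kept. The field names `name` and `price` and the sample rows are assumptions for illustration only.

```python
# Hypothetical scraped rows standing in for data lines from an Extract Data action.
rows = [
    {"name": "Desk lamp", "price": 45.0},
    {"name": "Office chair", "price": 100.0},
    {"name": "Monitor", "price": 189.99},
    {"name": "Keyboard", "price": 79.5},
]

def should_dump(row):
    # Trigger condition: "price" is equal to or greater than 100 -> dump the line of data.
    return row["price"] >= 100

# Only lines that do not fire the trigger are kept.
kept = [row for row in rows if not should_dump(row)]
print([row["name"] for row in kept])  # ['Desk lamp', 'Keyboard']
```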

Use Case 2

Another useful application is extracting data associated with a specific date, say, all news articles published today (e.g., 2020-01-01). To achieve this, create a trigger: if the data field "date" is not "2020-01-01", do "dump the line of data". As a result, you will only fetch articles published on 2020-01-01.

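Multiple conditions can be combined in one trigger. For example, to extract only news articles that were published on 2020-01-01 and whose titles contain the word "CPI", use the following two conditions:

Condition 1: If the data field "date" is not "2020-01-01", do "dump the line of data"

[OR]

Condition 2: If the data field "title" does not contain "CPI", do "dump the line of data"

Because the action is "dump", the conditions are joined with [OR]: a line is dropped if it fails either requirement, so only lines that meet both are kept. Joining them with [AND] would drop a line only when it fails both, which would still keep, for example, 2020-01-01 articles without "CPI" in the title.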
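
The effect of joining these two dump conditions with [AND] versus [OR] can be checked with a short Python sketch. The sample articles below are made up for illustration; Octoparse itself evaluates the conditions in its interface, not through this code.

```python
# Hypothetical sample rows standing in for scraped news articles.
articles = [
    {"date": "2020-01-01", "title": "CPI rises 0.3%"},
    {"date": "2020-01-01", "title": "Markets open higher"},
    {"date": "2019-12-31", "title": "CPI outlook for 2020"},
]

def fails_date(row):
    # Condition 1: the "date" field is not 2020-01-01.
    return row["date"] != "2020-01-01"

def fails_title(row):
    # Condition 2: the "title" field does not contain "CPI" (case-sensitive).
    return "CPI" not in row["title"]

# Joined with [OR]: dump a row if it fails either requirement.
kept_or = [r for r in articles if not (fails_date(r) or fails_title(r))]
# Joined with [AND]: dump a row only if it fails both requirements.
kept_and = [r for r in articles if not (fails_date(r) and fails_title(r))]

print([r["title"] for r in kept_or])   # ['CPI rises 0.3%']
print([r["title"] for r in kept_and])  # ['CPI rises 0.3%', 'Markets open higher', 'CPI outlook for 2020']
```

Only the [OR] version keeps exactly the articles that satisfy both the date and the title requirement.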


How to use a Trigger?

STEP 1. Create a new Trigger

  • Go to the Extract Data action

  • Click "Add a Trigger" in the Options tab to create a new trigger

STEP 2. Name your Trigger

  • Name the Trigger by entering a name directly in the Trigger Name box

STEP 3. Choose the target field and set up the condition

  • Select one target field from the dropdown menu

Note: If you rename the field after setting up the trigger condition, go back and reselect the target field.

  • Set the conditions for the selected data field. You can set conditions based on "text", "numerals" or "time"

These three condition types cover most needs, from text and numbers to times and dates.

a. For text

There are six options for text: is, is not, contains, does not contain, is blank, and is not blank.

For example, if you select the field "title", choose "contains", and type the word "SKIRT" in the text box, the whole condition will be: if the data field "title" contains the word "SKIRT".

Note: The text value is case-sensitive, so make sure it matches the capitalization of the scraped text exactly.
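
The six text options can be modeled as plain Python string checks, which also makes the case sensitivity easy to see. This is a hypothetical sketch, not Octoparse's implementation; in particular, treating whitespace-only values as blank is an assumption.

```python
# The six text options, modeled as plain (case-sensitive) Python string checks.
# Assumption: "is blank" also matches values that contain only whitespace.
TEXT_OPTIONS = {
    "is": lambda value, target: value == target,
    "is not": lambda value, target: value != target,
    "contains": lambda value, target: target in value,
    "does not contain": lambda value, target: target not in value,
    "is blank": lambda value, target: value.strip() == "",
    "is not blank": lambda value, target: value.strip() != "",
}

def text_condition(value, option, target=""):
    """Return True if the field value satisfies the chosen text option."""
    return TEXT_OPTIONS[option](value, target)

print(text_condition("Pleated SKIRT", "contains", "SKIRT"))  # True
print(text_condition("Pleated skirt", "contains", "SKIRT"))  # False: case differs
print(text_condition("   ", "is blank"))                      # True
```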

b. For numbers

There are four options for numbers: greater than, less than, greater than or equal to, and less than or equal to.

For example, if you select the data field "Price", choose "greater than", and fill in the value "50", the condition will be: if the data field "Price" is greater than "50".

Note: Make sure the field contains only a numeric value. If it also contains text, use the Clean Data feature to remove it. For example, if the price is "$100", remove the currency symbol "$" before setting up the trigger.
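
Why cleaning matters is visible in Python, where `float("$100")` raises a `ValueError`. The sketch below assumes the four number options are greater than, less than, greater than or equal to, and less than or equal to, and uses a hypothetical `clean_price` helper as a stand-in for a Clean Data step.

```python
import operator

# The four number options, modeled with Python's comparison operators.
NUMBER_OPTIONS = {
    "greater than": operator.gt,
    "less than": operator.lt,
    "greater than or equal to": operator.ge,
    "less than or equal to": operator.le,
}

def clean_price(raw):
    # Hypothetical stand-in for Clean Data: drop "$" and thousands separators.
    return float(raw.replace("$", "").replace(",", "").strip())

def number_condition(raw, option, target):
    """Clean the scraped text, then compare it numerically with the target."""
    return NUMBER_OPTIONS[option](clean_price(raw), target)

print(number_condition("$100", "greater than", 50))  # True
print(number_condition("$45", "greater than", 50))   # False
```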

c. For time and date

There are four options available for time and date (after, before, on or after, on or before).

For example, for the data field "Current_time", if you select "after" and "12 am of the extraction day" and set the action to "Dump this line of data", the condition reads: if the value of "Current_time" is after 12 am (00:00) of the extraction day, dump the line of data. As a result, only posts published at or before 00:00 of the extraction day are kept.

You can also customize the time or date range.

Note: Use Clean Data to reformat the time field to yyyy-MM-dd HH:mm:ss, the format the trigger recognizes.
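
The yyyy-MM-dd HH:mm:ss format corresponds to `"%Y-%m-%d %H:%M:%S"` in Python's `datetime.strptime`. The sketch below models the "after 12 am of the extraction day" example; the extraction day, the sample timestamps, and the `reformat` helper (a stand-in for a Clean Data step) are assumptions for illustration.

```python
import operator
from datetime import datetime

TRIGGER_FORMAT = "%Y-%m-%d %H:%M:%S"  # yyyy-MM-dd HH:mm:ss in strptime notation

# The four time options, modeled with comparison operators on datetime objects.
TIME_OPTIONS = {
    "after": operator.gt,
    "before": operator.lt,
    "on or after": operator.ge,
    "on or before": operator.le,
}

def reformat(raw, source_format):
    # Hypothetical stand-in for Clean Data: convert a timestamp to yyyy-MM-dd HH:mm:ss.
    return datetime.strptime(raw, source_format).strftime(TRIGGER_FORMAT)

def time_condition(value, option, target):
    """Parse a yyyy-MM-dd HH:mm:ss value and compare it with the target time."""
    return TIME_OPTIONS[option](datetime.strptime(value, TRIGGER_FORMAT), target)

midnight = datetime(2020, 1, 1)  # 12 am of an assumed extraction day
posts = [
    "2019-12-31 23:15:00",
    "2020-01-01 00:00:00",
    reformat("01/01/2020 08:30", "%m/%d/%Y %H:%M"),  # becomes "2020-01-01 08:30:00"
]
# Dump every post whose time is after midnight; keep the rest.
kept = [p for p in posts if not time_condition(p, "after", midnight)]
print(kept)  # ['2019-12-31 23:15:00', '2020-01-01 00:00:00']
```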


STEP 4. Add more conditions by using [AND] or [OR]

Multiple conditions can be added to the same trigger. Use [AND] or [OR] to define how the conditions relate to each other.

If you click "Add [AND] condition" and add a condition, the action is executed only when the data line meets both conditions.

If you click "Add [OR] condition" and add a condition, the action is executed when the data line meets either of the two conditions.
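
A trigger with several conditions can be modeled as a list of checks joined by Python's `all()` for [AND] or `any()` for [OR]. The `Trigger` class, the field names, and the sample row are hypothetical; the sketch only illustrates the joining logic and does not mirror Octoparse internals. Mixing [AND] and [OR] in one trigger is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]

@dataclass
class Trigger:
    """Hypothetical model of one trigger: several conditions joined by AND or OR."""
    conditions: List[Callable[[Row], bool]]
    join: str = "AND"

    def __post_init__(self) -> None:
        if self.join not in ("AND", "OR"):
            raise ValueError("join must be 'AND' or 'OR'")

    def fires(self, row: Row) -> bool:
        results = (condition(row) for condition in self.conditions)
        # [AND]: every condition must hold; [OR]: any single condition is enough.
        return all(results) if self.join == "AND" else any(results)

skirt_rule = [
    lambda row: "SKIRT" in row["title"],
    lambda row: float(row["price"]) > 50,
]
row = {"title": "Pleated SKIRT", "price": "35"}
print(Trigger(skirt_rule, "AND").fires(row))  # False: only the title condition holds
print(Trigger(skirt_rule, "OR").fires(row))   # True: one condition is enough
```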


STEP 5. Choose an action from "Do" and click "Confirm" to save

When the trigger's conditions are met, Octoparse executes one of the following actions.

a. Dump this line of data

If "Dump this line of data" is selected, Octoparse discards the entire data line from the Extract Data step, not just the field that matched the condition.

b. End the loop

If "End the loop" is selected, you'll need to choose which Loop Item to end.

c. Stop the entire extraction

If "Stop the entire extraction" is selected, the extraction will be terminated once the corresponding condition is satisfied.
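
The difference between the three actions can be sketched with two nested Python loops, for example an outer pagination loop and an inner Loop Item over the items on each page. This is a hypothetical model: `run_extraction` and the sample prices are invented, and it assumes "End the loop" ends the inner Loop Item while the outer loop continues.

```python
from enum import Enum

class Action(Enum):
    DUMP_LINE = "Dump this line of data"
    END_LOOP = "End the loop"
    STOP_EXTRACTION = "Stop the entire extraction"

def run_extraction(pages, should_fire, action):
    """Hypothetical model of how a trigger's action affects nested loops."""
    extracted = []
    for page in pages:                # outer loop, e.g. pagination
        for item in page:             # inner Loop Item, e.g. items on one page
            if should_fire(item):
                if action is Action.DUMP_LINE:
                    continue          # discard this line, keep looping
                if action is Action.END_LOOP:
                    break             # end the inner loop; the outer loop moves on
                return extracted      # STOP_EXTRACTION: end everything now
            extracted.append(item)
    return extracted

prices = [[10, 120, 30], [40, 50]]
over_100 = lambda price: price >= 100
print(run_extraction(prices, over_100, Action.DUMP_LINE))        # [10, 30, 40, 50]
print(run_extraction(prices, over_100, Action.END_LOOP))         # [10, 40, 50]
print(run_extraction(prices, over_100, Action.STOP_EXTRACTION))  # [10]
```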

Tip: After saving, you can edit, copy, delete, or disable an existing trigger.
