Branch conditions

Step-by-step guide to setting up branch conditions in Octoparse for conditional scraping, handling different page layouts, and customizing task flow with if/else rules.

Updated today

Not all pages are created equal. When your target web pages vary in layout or content, use Branch Conditions to scrape based on a condition.

When should you consider using "Branch Conditions"?

There are two main scenarios in which Branch Conditions are useful:

  1. When you are only interested in getting data from pages that carry a specific tag, such as "New", "Hot selling", "On Sale", etc.

  2. When data on the page is displayed in different patterns, e.g., sometimes as text and other times as images.
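The second scenario can be pictured as a plain if/else. The sketch below is only an illustration in Python, not something Octoparse runs: the page dictionaries and the keys `price_text` and `price_image_alt` are hypothetical stand-ins for "the value appears as text" and "the value appears as an image". Both branches fill the same output field.

```python
from typing import Optional


def extract_price(page: dict) -> Optional[str]:
    """Return the price from whichever representation the page uses."""
    # Branch 1: the price is rendered as plain text.
    if page.get("price_text"):
        return page["price_text"].strip()
    # Branch 2: the price is rendered as an image (hypothetical alt text).
    if page.get("price_image_alt"):
        return page["price_image_alt"].strip()
    # Neither pattern is present, so there is nothing to extract.
    return None


# Hypothetical pages: one per layout, plus one with no price at all.
pages = [
    {"price_text": " $19.99 "},
    {"price_image_alt": "$24.50"},
    {},
]
prices = [extract_price(p) for p in pages]
print(prices)
```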


In this example, we want information about products that are on sale. Looking closely at the item detail page, we can use the "-XX%" icon as the condition to test for: if the element is found on the item page, we’ll go ahead and capture the product information; otherwise, we’ll skip the page/product entirely.

Let’s see how it is done! To follow along, you may want to use this URL:

STEP 1. Build a loop to click each link on the list

  • Click the first two items

  • Select "Loop click each element" on the Tips panel

  • Select Yes and click Infinite scrolling

  • Click Confirm to create a loop

STEP 2. Use "Branch Conditions" to test whether the "-XX%" element is present on the item page

  • Go to the workflow section and click on the add step button to add Branch Conditions inside the loop

  • Select "Branch Conditions_Branch2" on the left-hand side and select "Execute if the current page contains specific element"

  • Paste the XPath //div[@class="a-section a-spacing-none aok-align-center aok-relative"]/span[contains(text(),"%")] into the text box below (this XPath locates the element whose text contains "%")

  • Click Apply

NOTE:

1. You can check this tutorial to learn more about XPath.

2. If you don't know how to write an XPath, you can use the XPath generator button and select the element on the page. Octoparse will then generate an XPath automatically.

  • Click the branch on the right-hand side and select "Always execute the branch"
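To see what the XPath in this step is checking, here is a minimal Python sketch using only the standard library. The two markup snippets are hypothetical, heavily simplified item pages. Because `xml.etree.ElementTree` does not support `contains()`, the sketch selects the `span` children of the `div` by exact class match and does the "%" test in Python, so it approximates the XPath rather than running it verbatim.

```python
import xml.etree.ElementTree as ET

# Class value copied from the XPath in this step.
DIV_CLASS = "a-section a-spacing-none aok-align-center aok-relative"

# Hypothetical, simplified item pages (real pages are much larger).
ON_SALE_PAGE = f"""
<html><body>
  <div class="{DIV_CLASS}">
    <span>-15%</span>
    <span>$84.99</span>
  </div>
</body></html>
"""

FULL_PRICE_PAGE = f"""
<html><body>
  <div class="{DIV_CLASS}">
    <span>$99.99</span>
  </div>
</body></html>
"""


def has_discount_badge(markup: str) -> bool:
    """Approximate: //div[@class="..."]/span[contains(text(),"%")]."""
    root = ET.fromstring(markup.strip())
    # Exact-match attribute predicates are supported; contains() is not,
    # so the "%" check on each span's leading text happens in Python.
    spans = root.findall(f".//div[@class='{DIV_CLASS}']/span")
    return any("%" in (span.text or "") for span in spans)


print(has_discount_badge(ON_SALE_PAGE))
print(has_discount_badge(FULL_PRICE_PAGE))
```

Note that the class predicate is an exact string match, so a page whose `div` carries a different or reordered class list would not satisfy the condition.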


Differences between the five condition options:

  1. Always execute the branch: When this option is selected, Octoparse does not evaluate any condition and executes the actions within the branch immediately. Only select this option for the branch on the right side.

  2. Execute if the page contains specific text: When selected, Octoparse will look for the designated text string within the current page.

  3. Execute if the current page contains a specific element: When selected, Octoparse will look for the designated element (according to the XPath filled in) within the current page.

  4. Execute if the current loop contains specific text: When selected, Octoparse will look for the designated text string within the current loop item.

  5. Execute if the current loop contains a specific element: When selected, Octoparse will look for the designated element (according to the Relative XPath filled in) within the current loop item. Use this option only when you need to distinguish between items of a loop.
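The five options differ mainly in what is checked (a text string or an element) and where it is checked (the whole page or the current loop item). The following Python sketch models that distinction. The `Condition` and `Context` names are hypothetical, and elements are represented by simple string keys instead of XPath matches, so treat it as a mental model and not as Octoparse's implementation.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional, Set


class Condition(Enum):
    ALWAYS = auto()
    PAGE_HAS_TEXT = auto()
    PAGE_HAS_ELEMENT = auto()
    LOOP_ITEM_HAS_TEXT = auto()
    LOOP_ITEM_HAS_ELEMENT = auto()


@dataclass
class Context:
    page_text: str = ""
    page_elements: Set[str] = field(default_factory=set)  # keys stand in for XPath matches
    item_text: str = ""
    item_elements: Set[str] = field(default_factory=set)  # matches inside the loop item


def is_met(condition: Condition, ctx: Context, target: Optional[str] = None) -> bool:
    """Return True when the branch with this condition should run."""
    if condition is Condition.ALWAYS:
        return True  # no check at all
    if target is None:
        raise ValueError("this condition needs a text string or element key")
    if condition is Condition.PAGE_HAS_TEXT:
        return target in ctx.page_text
    if condition is Condition.PAGE_HAS_ELEMENT:
        return target in ctx.page_elements
    if condition is Condition.LOOP_ITEM_HAS_TEXT:
        return target in ctx.item_text
    return target in ctx.item_elements  # LOOP_ITEM_HAS_ELEMENT


# Hypothetical state: the badge is on the page but not inside the loop item.
ctx = Context(
    page_text="Wireless Mouse -15% On Sale",
    page_elements={"discount_badge"},
    item_text="Wireless Mouse",
)
print(is_met(Condition.PAGE_HAS_ELEMENT, ctx, "discount_badge"))
print(is_met(Condition.LOOP_ITEM_HAS_ELEMENT, ctx, "discount_badge"))
```

The same target can satisfy a page-level condition and fail a loop-level one, which is why the loop options are meant for telling loop items apart.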

STEP 3. On the product item page (select one item from the loop that has the element), click your targeted data fields to create an "Extract data" step

  • Rename the fields if needed.

STEP 4. Drag the "Extract Data" action into the left branch

STEP 5. Add a "Back To Previous Page" Action

We have now configured Octoparse to look for the element on each item page. If the element is found, Octoparse captures the desired data; otherwise, it skips the product.
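Put together, the workflow behaves roughly like the loop below. This is a Python sketch of the logic only: the product dictionaries and the field names `name`, `price`, and `discount` are hypothetical sample data, not output from a real task.

```python
from typing import Dict, List


def scrape_on_sale(products: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Mimic the workflow: loop, test the condition, extract or skip."""
    rows = []
    for product in products:  # STEP 1: loop over each item and open its page
        if product.get("discount"):  # STEP 2: left branch, element found
            # STEP 3 and 4: extract data inside the left branch
            rows.append(
                {
                    "name": product["name"],
                    "price": product["price"],
                    "discount": product["discount"],
                }
            )
        # Right branch ("Always execute the branch") is left empty: skip.
        # STEP 5: go back to the list page before the next iteration.
    return rows


# Hypothetical item pages; only the first and third carry a "-XX%" badge.
products = [
    {"name": "Mouse", "price": "$16.99", "discount": "-15%"},
    {"name": "Keyboard", "price": "$49.99"},
    {"name": "Headset", "price": "$31.50", "discount": "-30%"},
]
rows = scrape_on_sale(products)
print([row["name"] for row in rows])
```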

Tips:

1. If you want to add more conditions to classify more variations, you can add them in the workflow.


2. If a condition tests whether an element is found, the XPath must match a unique element on the page; otherwise, the condition may not be evaluated correctly.

3. Octoparse goes through the branches from left to right by default, so always keep the condition you want to test for in the left branch. If the condition for the left branch is "Always execute the branch", Octoparse will never proceed to the branch on the right, because "Always execute the branch" always evaluates to "True".

4. You can leave the branch blank if no data extraction action is needed when the condition is not met.

5. When a data extraction action is added to both branches, the number of data fields and the names of the data fields must be the same in both.

6. You can nest Branch Conditions inside a branch to further refine the test.
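The ordering rule in tip 3 amounts to "the first branch whose condition is true wins". The short Python sketch below illustrates it; `pick_branch` and the branch names are hypothetical and only model the left-to-right behavior described above.

```python
from typing import Callable, List, Optional, Tuple

Branch = Tuple[str, Callable[[dict], bool]]


def always(page: dict) -> bool:
    """Model of "Always execute the branch"."""
    return True


def on_sale(page: dict) -> bool:
    """Model of an element check for the discount badge."""
    return "discount" in page


def pick_branch(branches: List[Branch], page: dict) -> Optional[str]:
    """Return the name of the first branch, left to right, whose condition holds."""
    for name, condition in branches:
        if condition(page):
            return name
    return None


page = {"discount": "-20%"}

# Correct order: the specific test sits on the left, the catch-all on the right.
good_order = [("extract", on_sale), ("skip", always)]
# Wrong order: the catch-all on the left shadows every branch after it.
bad_order = [("skip", always), ("extract", on_sale)]

print(pick_branch(good_order, page))  # extract
print(pick_branch(bad_order, page))   # skip
```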
