Retry actions

Retry is an Octoparse feature for handling page-loading errors. You choose one or more conditions under which Octoparse reloads the current web page. A correctly loaded page is essential for web scraping, because Octoparse can only extract the information you need from a page that has loaded properly.


Why set up Retry?

When a web page does not load properly, Octoparse may fail to fetch the target data or even to proceed to the next action in the workflow. Setting up Retry conditions tells Octoparse when to reload the page before it extracts data.


How to set up Retry?

The Retry option is available only for the two workflow actions that load pages: Go to Webpage and Click Item/Click to Paginate.

  • Click the action to open its settings, then click Retry to expand its options.

  • Under Retry the action when, click Add conditions to define when the page should be reloaded. Octoparse reloads the page if one or more of these conditions are met.


Now, set your retry conditions using the options provided.


When a page fails to load properly, it usually shows an error message such as "errors", "500 Internal Server Error" or "Too many requests". Suppose you want the page reloaded whenever "500 Internal Server Error" appears. The condition should then be: if the current page Text contains "500 Internal Server Error", reload the page. Octoparse will retry loading the page whenever that string is found on the current page.


You can also enter the XPath of an element that is present only when the page has loaded correctly. In this case, select Does not contain: if the designated element is not found on the page, Octoparse reloads the page.

Keep clicking Add conditions to add as many conditions as your project requires, or click the delete button to remove conditions you do not need.
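Octoparse configures these conditions in its settings panel, so no code is required. Purely as an illustration of the logic, the Python sketch below shows how a "Text contains" condition and an XPath "Does not contain" condition could be evaluated, with a reload triggered when at least one is met. The function names, the condition labels `text_contains` and `xpath_missing`, and the sample pages are all hypothetical, and the sketch assumes well-formed markup so that the standard library parser can read it.

```python
import xml.etree.ElementTree as ET


def element_missing(page_source: str, xpath: str) -> bool:
    """Return True when no element matches the XPath ('Does not contain')."""
    # Assumption: the markup is well-formed; real-world HTML often is not.
    root = ET.fromstring(page_source)
    return root.find(xpath) is None


def should_retry(page_source: str, conditions: list) -> bool:
    """Return True if at least one retry condition is met."""
    for kind, value in conditions:
        # 'Text contains' is approximated here by a substring check on the source.
        if kind == "text_contains" and value in page_source:
            return True
        if kind == "xpath_missing" and element_missing(page_source, value):
            return True
    return False


# Hypothetical conditions: an error string, plus an element that only
# exists on a correctly loaded page.
conditions = [
    ("text_contains", "500 Internal Server Error"),
    ("xpath_missing", ".//div[@id='results']"),
]

error_page = "<html><body><h1>500 Internal Server Error</h1></body></html>"
good_page = "<html><body><div id='results'>item</div></body></html>"

print(should_retry(error_page, conditions))  # True
print(should_retry(good_page, conditions))   # False
```

Pairing an error-string condition with a missing-element condition covers both pages that show an explicit error and pages that load without the content you expect.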

  • Set up Retry for and Wait time

After setting up the retry conditions, decide how many times Octoparse should retry loading the web page: once, twice, or more. Setting a maximum number of retries is critical so that Octoparse does not reload the web page endlessly. Once Octoparse reaches the maximum, it stops retrying and proceeds to the next step.
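The effect of a retry limit can be sketched as a bounded loop. This is not Octoparse code: `load_with_retry`, `load_page`, and `needs_retry` are hypothetical names, and the sketch assumes that Wait time is a pause applied before each reload.

```python
import time
from typing import Callable, Tuple


def load_with_retry(
    load_page: Callable[[], str],
    needs_retry: Callable[[str], bool],
    max_retries: int = 3,
    wait_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, int]:
    """Load a page, reloading while a retry condition holds, up to max_retries."""
    page = load_page()
    retries = 0
    # The retry cap guarantees the loop ends even if the page never recovers.
    while needs_retry(page) and retries < max_retries:
        sleep(wait_seconds)  # assumed meaning of "Wait time"
        page = load_page()
        retries += 1
    # Whatever the outcome, return so the workflow can move to the next step.
    return page, retries


# Simulated loader: fails twice, then succeeds.
responses = iter(["500 Internal Server Error", "500 Internal Server Error", "OK"])
page, retries = load_with_retry(
    load_page=lambda: next(responses),
    needs_retry=lambda text: "500 Internal Server Error" in text,
    max_retries=5,
    wait_seconds=0,
)
print(page, retries)  # OK 2
```

If the page never recovers, the loop exits once `retries` equals `max_retries` and returns the last page it loaded, which mirrors proceeding to the next step after the limit is reached.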

  • Set up proxies or user agent

Occasionally, a request may fail because your IP address has been banned. To work around this, select Rotate proxies when page reloads so that Octoparse switches IP addresses on each reload. You can also select Rotate user agent (browser) when page reloads to change the user agent Octoparse uses to load the page.
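As a rough illustration of what rotating on reload means, the sketch below cycles through a pool of proxies and user agents each time a reload happens. It says nothing about how Octoparse implements rotation internally: the `RotatingIdentity` class, the proxy addresses, and the user-agent strings are hypothetical placeholders.

```python
from itertools import cycle


class RotatingIdentity:
    """Holds the proxy and user agent to use for the next page load."""

    def __init__(self, proxies, user_agents):
        self._proxies = cycle(proxies)
        self._user_agents = cycle(user_agents)
        self.proxy = next(self._proxies)
        self.user_agent = next(self._user_agents)

    def on_reload(self, rotate_proxy=True, rotate_user_agent=True):
        """Advance to the next proxy and/or user agent when the page reloads."""
        # The two flags mirror the two independent checkboxes.
        if rotate_proxy:
            self.proxy = next(self._proxies)
        if rotate_user_agent:
            self.user_agent = next(self._user_agents)


# Placeholder values for illustration only.
identity = RotatingIdentity(
    proxies=["http://proxy-a.example:8080", "http://proxy-b.example:8080"],
    user_agents=["ExampleBrowser/1.0", "ExampleBrowser/2.0"],
)

print(identity.proxy)       # http://proxy-a.example:8080
identity.on_reload(rotate_user_agent=False)
print(identity.proxy)       # http://proxy-b.example:8080
print(identity.user_agent)  # ExampleBrowser/1.0
```

Keeping the two rotations behind separate flags reflects that the proxy and user-agent options can be enabled independently.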
