Anti-blocking solutions

Web scraping, if not done responsibly, can put a noticeable load on target websites, so some websites take measures against it. If the website you are going to scrape uses anti-scraping measures such as IP blocking, Octoparse provides settings that can reduce the chance of being blocked.

Anti-blocking options can be found in the Task Settings.


Use IP proxies

You can set up proxies manually in Octoparse if you would like to access the website with external proxies (e.g. from a specific country) or if you prefer to use your own proxies to protect your local IP. For more information on how to set up proxies, please refer to Set up proxies.

Octoparse will automatically switch proxies based on your setup when running specific tasks.
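
For readers curious about what proxy switching means in general, here is a minimal Python sketch of round-robin rotation. It does not show how Octoparse implements the feature. The `PROXIES` list and the `proxy_rotation` helper are hypothetical names, and the addresses come from a range reserved for documentation.

```python
import itertools

# Hypothetical proxy addresses (documentation-only IP range).
# Replace them with proxies you are actually allowed to use.
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]


def proxy_rotation(proxies):
    """Return an iterator that yields the proxies in round-robin order, forever."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    return itertools.cycle(proxies)


rotation = proxy_rotation(PROXIES)
# Each request would take the next proxy; after the last one, the cycle restarts.
first_four = [next(rotation) for _ in range(4)]
```

In this sketch the fourth request reuses the first proxy, which is why a larger pool spreads requests more thinly across IPs.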


Auto-switch browser agents

Your browser sends what’s known as a user agent to every web page you visit. This is a string that tells the target website what kind of browser and device you are accessing the page with. If a website receives a steady stream of requests that all carry the same user agent, it can easily identify them as coming from a scraping bot. Rotating the user agent with this feature can reduce the chance of being blocked.

To set up auto-switching of browser agents:

  • Check the Auto-switch browser agents box

  • Click Configure to select a user agent

  • Confirm the settings

Not every user agent (UA) works with every website, so you might need to experiment a bit. If you want Octoparse to visit the website "via PC", do not select any mobile user agents, such as "Firefox for mobile". If you want Octoparse to visit the website "via mobile", check only the boxes of the mobile agents.

  • Set how often you'd like to rotate the user agents or select Switch IPs concurrently

    Octoparse will automatically switch the user agent every X mins when the task is running locally or in the Cloud.
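
The idea of switching the user agent on a timer can be sketched in a few lines of Python. This is only an illustration of the general technique, not Octoparse's implementation: the `UserAgentRotator` class is hypothetical, and the user agent strings are illustrative examples whose version numbers will go out of date.

```python
import time

# Illustrative desktop user agent strings; real values change with browser versions.
DESKTOP_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:125.0) Gecko/20100101 Firefox/125.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]


class UserAgentRotator:
    """Hypothetical helper: cycles through user agents on a fixed interval."""

    def __init__(self, agents, interval_seconds, clock=time.monotonic):
        if not agents:
            raise ValueError("at least one user agent is required")
        self._agents = list(agents)
        self._interval = interval_seconds
        self._clock = clock  # injectable so the timing can be tested
        self._index = 0
        self._last_switch = clock()

    def current(self):
        """Return the active user agent, advancing if the interval has elapsed."""
        now = self._clock()
        if now - self._last_switch >= self._interval:
            self._index = (self._index + 1) % len(self._agents)
            self._last_switch = now
        return self._agents[self._index]


# Rotate every 5 minutes; the header dict is what an HTTP client would send.
rotator = UserAgentRotator(DESKTOP_AGENTS, interval_seconds=5 * 60)
headers = {"User-Agent": rotator.current()}
```

Keeping the list to desktop-only (or mobile-only) strings mirrors the advice above: mixing the two can cause the site to serve different page layouts partway through a run.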


Auto clear cookies

If a website receives a steady stream of requests that all carry the same cookies, it can easily identify them as scraping bot activity. With this feature, Octoparse clears the cookies at a set interval, so that each subsequent visit looks like a first visit to the website.

  • Check the Auto clear cookies box

  • Set how often you'd like to clear the cookies or select Clear cookies when IPs rotate

  • Click Save

Octoparse will automatically clear the cookies every X seconds when the task is running locally or in the Cloud.
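
As a rough illustration of interval-based cookie clearing, the following Python sketch uses the standard library's `http.cookiejar.CookieJar`. It is not Octoparse's implementation. The `CookieClearer` class and the `make_cookie` helper are hypothetical names introduced for this example, as is the `example.com` domain.

```python
import time
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain):
    """Hypothetical helper: build a simple session cookie for the given domain."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


class CookieClearer:
    """Hypothetical helper: empties a cookie jar once per interval."""

    def __init__(self, jar, interval_seconds, clock=time.monotonic):
        self._jar = jar
        self._interval = interval_seconds
        self._clock = clock  # injectable so the timing can be tested
        self._last_clear = clock()

    def maybe_clear(self):
        """Clear the jar if the interval has elapsed; return True if it was cleared."""
        now = self._clock()
        if now - self._last_clear >= self._interval:
            self._jar.clear()
            self._last_clear = now
            return True
        return False


jar = CookieJar()
jar.set_cookie(make_cookie("session_id", "abc123", "example.com"))
clearer = CookieClearer(jar, interval_seconds=30)
# Call before each request; the jar is emptied only once 30 seconds have passed.
cleared_now = clearer.maybe_clear()
```

Once the jar is emptied, the next request carries no session cookie, so the website has no stored state to link it to earlier requests.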

Note: The anti-blocking settings are not guaranteed to bypass every website's blocking mechanisms. The most reliable approach is to treat the website considerately and limit how fast you access it.
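
Controlling the access speed usually means enforcing a minimum delay between requests. The Python sketch below shows one simple way to do that. The `Throttle` class and the 2-second interval are assumptions chosen for illustration and do not correspond to a specific Octoparse setting.

```python
import time


class Throttle:
    """Hypothetical helper: enforces a minimum interval between requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock  # injectable so the timing can be tested
        self._sleep = sleep
        self._last = None    # time at which the previous request was released

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            # Sleep only for whatever remains of the minimum interval.
            delay = max(0.0, self._min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


throttle = Throttle(min_interval=2.0)
first_delay = throttle.wait()  # the first request is never delayed
```

Calling `throttle.wait()` before every request keeps the request rate at or below one per `min_interval` seconds, regardless of how quickly pages are processed.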
