In some cases, extraction works perfectly locally, but in Cloud runs some fields come back blank. This tutorial covers the causes of this issue and how to solve it.
1. Tasks executed with cloud extraction are splittable and run very fast, so some elements can be skipped.
Tasks with "Fixed List," "List of URLs" and "Text List" loop modes are splittable. The main task is split into subtasks executed by multiple cloud nodes simultaneously. Every step of the task therefore runs very fast, and some pages may not be fully loaded before the task moves on to the next step.
To ensure the web page is loaded completely in the cloud, you can try to:
Increase the timeout for the Go To Web Page step
Set up Wait before action
You can set a waiting time for any step in the workflow. We suggest setting the wait time on the Extract Data action.
Set up an anchor element to find before action
This setting guarantees that extraction starts only after a specified element has been found on the page. You can use the XPath of any of the desired fields as the anchor.
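Outside Octoparse, the "wait for an anchor element before acting" idea boils down to polling the page source until a chosen XPath matches, then extracting. Below is a minimal sketch in Python using only the standard library; the helper names, the sample HTML, and the timings are illustrative, not part of Octoparse.

```python
import time
import xml.etree.ElementTree as ET

def wait_for_element(get_page_source, xpath, timeout=10.0, poll=0.5):
    """Re-parse the page until `xpath` matches, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        root = ET.fromstring(get_page_source())
        found = root.findall(xpath)
        if found:
            return found
        time.sleep(poll)
    raise TimeoutError(f"Anchor element {xpath!r} not found within {timeout}s")

# Simulate a slow page that only finishes rendering on the second poll.
pages = iter([
    "<html><body><div>loading...</div></body></html>",
    "<html><body><div class='price'>$19.99</div></body></html>",
])
current = {"src": next(pages)}

def get_page_source():
    src = current["src"]
    try:
        current["src"] = next(pages)  # advance the simulated page load
    except StopIteration:
        pass
    return src

elements = wait_for_element(get_page_source, ".//div[@class='price']", timeout=5, poll=0.01)
print(elements[0].text)  # -> $19.99
```

Without the wait, the first parse would see only the "loading..." placeholder and the price field would come back blank, which is exactly what happens when a fast cloud node extracts before the page has rendered.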
Tip: How to get the XPath of a certain element on the page?
Click the Extract Data step
Switch to the vertical view, and you will see the XPath for each field
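To sanity-check an XPath you copied from the field list, you can evaluate it against the page HTML yourself. Here is a minimal sketch using Python's standard library; the sample HTML and the per-field XPaths are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up product listing standing in for the real page source.
page = """
<html><body>
  <div class='item'><span class='title'>Widget A</span><span class='price'>$5</span></div>
  <div class='item'><span class='title'>Widget B</span><span class='price'>$8</span></div>
</body></html>
"""

# XPaths as they might appear per field in the vertical view (illustrative).
field_xpaths = {
    "title": ".//div[@class='item']/span[@class='title']",
    "price": ".//div[@class='item']/span[@class='price']",
}

root = ET.fromstring(page)
for field, xpath in field_xpaths.items():
    values = [el.text for el in root.findall(xpath)]
    print(field, values)
# title ['Widget A', 'Widget B']
# price ['$5', '$8']
```

If an XPath returns an empty list against the HTML the cloud node actually received, that field will be blank in the cloud run.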
2. The website you are after is multi-regional
A multi-regional website may serve different page structures to visitors from different countries. When a task is set to run in the cloud, it is executed with our US-based IPs by default. For tasks targeting websites outside America, some data may therefore be skipped because it cannot be found on the version of the website opened in the cloud.
To identify if a website is multi-regional, you can:
Check the Cloud log screenshot to see whether the web page loaded properly and looks the same as it does on your device.
In this case, as the targeted content can only be found when opening the website with your own IP, we suggest running Local Extraction to get the data, switching the Octoparse IP pools, or using proxies to access the website with IPs from a specific country.
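The effect can be simulated: if the page structure differs by region, an XPath written against the non-US version of the page finds nothing in the US version served to the cloud nodes. Everything below (the regional pages and the XPath) is a made-up illustration.

```python
import xml.etree.ElementTree as ET

# Made-up regional variants of the same page: the US version omits the field entirely.
pages_by_region = {
    "US": "<html><body><div class='promo'>US deals</div></body></html>",
    "DE": "<html><body><div class='vat-price'>23,79 EUR</div></body></html>",
}

xpath = ".//div[@class='vat-price']"  # written against the DE version of the page

for region, source in pages_by_region.items():
    matches = ET.fromstring(source).findall(xpath)
    print(region, "->", [el.text for el in matches])
# US -> []                (cloud run with a US IP: the field comes back blank)
# DE -> ['23,79 EUR']     (local run from Germany: the field is found)
```

The blank field is not an extraction bug in this case; the data simply does not exist on the page variant the cloud node received.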
Here is a related tutorial on checking errors in the Cloud: Why does the task get no data in the Cloud but work well when running in the local?