Google Play is a useful source of mobile app reviews. Reviews can help users choose which app to use and can push developers to improve their apps.
In this tutorial, we will scrape the applications' reviews from Google Play.
You can also use our easy-to-use "Task Template" on the home screen of Octoparse. All you need to do is type in a few parameters, and the task is ready to go. For further details, please check it out here: Task Templates
To follow along, you may want to use this URL in the tutorial:
We will scrape data such as reviewer name, post time, and review content from each app details page with Octoparse.
The main steps are shown in the menu on the right, and you can download the sample task file here.
1. Go to Web Page - to open the target web page
Enter the page URL on the home screen and click Start
2. Click See all reviews - to see all the reviews
Click See all reviews from the web page
Choose Click element on the Tips panel
The workflow will be like this:
3. Auto-detect the web page data - to create the workflow
Click Auto-detect webpage data
Untick Click on a "Load More" button
Click Create workflow in the Tips window
Check the data fields in the Data Preview section, and delete or rename fields if needed
4. Modify the XPath of the Scroll Page - to locate the scrolling area precisely
Click on Scroll Page
Select the scroll area as Partial
Enter the XPath //div[@class='fysCi']
Choose to scroll for one screen at a time and enter 1000 as the number of repeats
Tick End loop when there's no more content to be loaded
Click Apply
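The scroll settings above combine two stop conditions: a cap on the number of repeats and an early exit when a scroll brings in nothing new. The sketch below models that logic in plain Python. It is not Octoparse's implementation: `scroll_until_done`, `FakeReviewPanel`, and the numbers used are hypothetical names and values chosen for illustration.

```python
from typing import Callable


def scroll_until_done(scroll_once: Callable[[], int], max_repeats: int = 1000) -> int:
    """Scroll one screen at a time and return how many scrolls were made.

    scroll_once is a hypothetical callback that scrolls one screen and
    returns the number of newly loaded items. The loop stops when
    max_repeats is reached or when a scroll loads nothing new.
    """
    scrolls = 0
    for _ in range(max_repeats):
        new_items = scroll_once()
        scrolls += 1
        if new_items == 0:  # no more content: end the loop early
            break
    return scrolls


class FakeReviewPanel:
    """Stand-in for the scrollable review area (assumed, not a real page)."""

    def __init__(self, total: int, per_screen: int) -> None:
        self.remaining = total
        self.per_screen = per_screen
        self.loaded = 0

    def scroll_once(self) -> int:
        # Load at most one screen's worth of the remaining reviews.
        batch = min(self.per_screen, self.remaining)
        self.remaining -= batch
        self.loaded += batch
        return batch


panel = FakeReviewPanel(total=45, per_screen=20)
scrolls = scroll_until_done(panel.scroll_once, max_repeats=1000)
print(panel.loaded, scrolls)
```

With a high repeat count such as 1000, the early-exit condition is usually what ends the loop, so the cap acts mainly as a safety limit for very long review lists.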
The final workflow should look like this:
Tip: If you want to learn more about XPath, please check the following tutorial: What is XPath and how to use it in Octoparse
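To get a feel for what the XPath `//div[@class='fysCi']` selects, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The markup is a simplified, hypothetical stand-in for the real page, which is far more complex and whose class names may change over time.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; only the class name "fysCi" comes from the tutorial.
SAMPLE = """
<html>
  <body>
    <div class="header">App title</div>
    <div class="fysCi">
      <div class="review">First review</div>
      <div class="review">Second review</div>
    </div>
    <div class="fysCi extra">Different class value</div>
  </body>
</html>
""".strip()

root = ET.fromstring(SAMPLE)

# ElementTree requires a leading "." to search relative to the root element;
# the rest of the expression is the same as the XPath entered in Octoparse.
matches = root.findall(".//div[@class='fysCi']")

# Collect the text of each child element inside the matched container.
reviews = [child.text for child in matches[0]]
print(len(matches), reviews)
```

Note that `@class='fysCi'` is an exact match on the whole attribute value, so the last `div`, whose class is `fysCi extra`, is not selected. That is why the expression pins down one specific scrolling container.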
5. Run extraction - run your task and get data
Click Save
Click Run on the upper right side
Choose a mode under Run on your device to run the task on your computer, or choose a mode under Run in the cloud to run the task in the Cloud (for premium users only)
Here is the sample output.
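Once the data is exported, it can be processed with any tool you like. The sketch below assumes a CSV export and uses Python's standard `csv` module. The column names `reviewer_name`, `post_time`, and `review_content` are assumptions, and the rows are placeholders rather than real reviews; replace them with the field names and file from your own task.

```python
import csv
import io
from collections import Counter

# Placeholder rows standing in for an exported file. Column names and values
# are assumptions for illustration, not actual Google Play data.
EXPORT = io.StringIO(
    "reviewer_name,post_time,review_content\n"
    "Reviewer A,2024-01-05,Placeholder review text\n"
    "Reviewer B,2024-01-05,Another placeholder\n"
    "Reviewer C,2024-01-06,\n"
)

rows = list(csv.DictReader(EXPORT))

# Drop rows whose review text is empty.
non_empty = [row for row in rows if row["review_content"].strip()]

# Count the remaining reviews per post time.
per_day = Counter(row["post_time"] for row in non_empty)
print(len(rows), len(non_empty), dict(per_day))
```

To read a real export, replace the `io.StringIO` object with `open("your_export.csv", newline="", encoding="utf-8")`, where the file name is whatever you chose when exporting.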