You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier, and more robust! Download and upgrade here if you haven't already done so!
GoodFirms is a research and review platform that helps software buyers and service seekers opt for the best software or firm. At the same time, it helps IT companies and software vendors to boost user acquisition stats, market share, and brand awareness.
This tutorial shows you, in four steps, how to scrape company information such as company name, location, and website from GoodFirms.
To follow along, you may want to use the URL below:
The main steps are shown in the menu on the right, and you can download the sample task file here.
1. Create a Go to Web Page step - to open the target website
Enter the page URL on the home screen and click Start to create a new task
2. Auto-detect the webpage - to create a workflow
Choose Auto-detect webpage data and wait for the detection to complete
Uncheck Add a page scroll
Click Create workflow
Check the data fields in Data Preview and delete unwanted fields or rename them if needed (double-click to rename)
3. Modify Pagination settings - to locate the pagination button accurately
Click on the Pagination box
Replace the auto-generated Matching XPath with: //a[@title="Next Page"]
Click Apply to save the change
NOTE: To learn more about XPath in Octoparse, please check: What is XPath and how to use it in Octoparse?
Click on the Click to Paginate box in the workflow
Select the Options tab
Tick Load with AJAX and set the AJAX timeout (7-10 s recommended)
Tip: Why do you need to set up AJAX timeout? Check it out here: Handling AJAX
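To see what the Matching XPath from step 3 selects, here is a minimal Python sketch that runs the same expression against a small, made-up pagination snippet. The markup, the `SAMPLE_HTML` name, and the `find_next_links` helper are assumptions for illustration only; the real GoodFirms page will differ, and this is not how Octoparse evaluates XPath internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified pagination markup (NOT copied from GoodFirms).
SAMPLE_HTML = """
<div class="pagination">
  <a title="Previous Page" href="?page=1">Prev</a>
  <a title="Page 2" href="?page=2">2</a>
  <a title="Next Page" href="?page=3">Next</a>
</div>
""".strip()

# The Matching XPath used in step 3 of this tutorial.
MATCHING_XPATH = '//a[@title="Next Page"]'


def find_next_links(markup, xpath=MATCHING_XPATH):
    """Return every <a> element whose title attribute is exactly 'Next Page'."""
    root = ET.fromstring(markup)
    # ElementTree only accepts relative paths on an element,
    # so the leading "//" is turned into ".//".
    return root.findall("." + xpath)


links = find_next_links(SAMPLE_HTML)
print(len(links), links[0].get("href"))  # prints: 1 ?page=3
```

Only the link whose `title` is exactly `Next Page` matches, so the numbered page links and the Previous Page link are ignored. That is why this expression can locate the pagination button more reliably than a position-based, auto-generated XPath.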
4. Run the task - to get your desired data
Click Save in the upper-right corner to save your task
Click Run next to it and wait for the Run Task window to pop up
Select Run on your device to run the task on your local device
Wait for the task to complete
Here is a sample output from a local run: