Scraping a search engine is a good way to collect information on a single topic. In this tutorial, we will show you how to scrape search result data from Google Search.
To save time, you can go to Popular Template on the Octoparse home screen and start directly with the ready-to-use Google Search Template. With the template, there is no need to configure the scraping task yourself. For details, see Task Templates.
If you want to build your own task with Custom Task, use this tutorial as a reference. We will scrape the title, URL, and description of each result on the search results page with Octoparse.
You may need this link to follow along: https://www.google.com/
The main steps are listed below.
1. Create a Go to Web Page - to open the target website
Enter the URL on the home page and click Start
2. Enter text - to start the search
Click the search box and then choose Enter text on the tips panel
Enter the keywords you need to search for in Textbox 1
An Enter Text action will be created in the workflow.
If you want to search for a list of keywords, choose Enter text in the loop
A Loop Item with an Enter Text action inside it will be created in the workflow.
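Conceptually, a Loop Item over a keyword list runs the same search once per keyword. The minimal Python sketch below only builds the equivalent search URLs to make that one-search-per-keyword idea concrete; it does not fetch anything. It assumes Google's `https://www.google.com/search?q=` query format, and the keyword list and function names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical keyword list (in Octoparse, this is the text list in the loop).
KEYWORDS = ["web scraping", "data extraction", "octoparse tutorial"]


def search_url(keyword: str) -> str:
    """Build a Google search URL for one keyword (assumes the ?q= format)."""
    # urlencode escapes the keyword, turning spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": keyword})


def search_urls(keywords):
    """One search per keyword, mirroring one Loop Item iteration each."""
    return [search_url(k) for k in keywords]


if __name__ == "__main__":
    for url in search_urls(KEYWORDS):
        print(url)
```

Octoparse types each keyword into the search box rather than building URLs, so treat this only as a picture of what the loop repeats.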
If you need to click a button to submit the search, add the click under the Enter text action.
3. Create a Loop Item - scrape data from the result list
Click on the first result title
Keep clicking the Expand selection area button until the whole first result block is selected
Do the same to select the second result
Select Text
Select the fields to scrape
To scrape the URL behind the title, click the title and choose the A tag
Select Link
Double-click the field name to rename it
Delete the fields you don't want
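Behind the point-and-click steps, choosing the A tag and Select Link reads the link's `href` attribute, while Select Text takes an element's visible text. The sketch below shows the same idea with Python's standard `html.parser` on a simplified, hypothetical result block. Google's real markup is different and changes often, so the tags and the `desc` class here are placeholders, not selectors to copy.

```python
from html.parser import HTMLParser

# Simplified, hypothetical markup for one result block (NOT Google's real HTML).
SAMPLE = """
<div class="result">
  <a href="https://example.com/page"><h3>Example title</h3></a>
  <span class="desc">A short description of the page.</span>
</div>
"""


class ResultParser(HTMLParser):
    """Collects the title text, link href, and description from one block."""

    def __init__(self):
        super().__init__()
        self.record = {"title": "", "url": "", "description": ""}
        self._field = None  # field currently receiving text, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            # Like "Select Link" on the A tag: read the href attribute.
            self.record["url"] = attrs["href"]
        elif tag == "h3":
            self._field = "title"
        elif tag == "span" and attrs.get("class") == "desc":
            self._field = "description"

    def handle_endtag(self, tag):
        if tag in ("h3", "span"):
            self._field = None

    def handle_data(self, data):
        # Like "Select Text": collect the element's visible text.
        if self._field:
            self.record[self._field] += data.strip()


def parse_result(html: str) -> dict:
    """Return one record with the renamed fields title, url, description."""
    parser = ResultParser()
    parser.feed(html)
    parser.close()
    return parser.record


if __name__ == "__main__":
    print(parse_result(SAMPLE))
```

The field names mirror the three fields this tutorial keeps after renaming and deleting the rest.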
4. Create a Pagination - scrape from multiple pages
Click on the Next page button
Choose Loop click next page
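Loop click next page repeats a simple cycle: extract the current page, click Next, and stop when there is no Next button. The sketch below simulates that cycle over a hypothetical in-memory list of pages; the page structure and the optional `max_pages` cap are illustrative assumptions, not Octoparse settings.

```python
# Hypothetical pages: each has its results and whether a Next button exists.
PAGES = [
    {"results": ["r1", "r2"], "has_next": True},
    {"results": ["r3", "r4"], "has_next": True},
    {"results": ["r5"], "has_next": False},
]


def scrape_all(pages, max_pages=None):
    """Extract each page, then 'click' Next until none is left or a cap is hit."""
    collected = []
    index = 0
    while True:
        page = pages[index]
        collected.extend(page["results"])  # extract data on the current page
        index += 1                         # "click" the Next page button
        if not page["has_next"]:
            break                          # no Next button: the loop ends
        if max_pages is not None and index >= max_pages:
            break                          # optional safety cap on pages
    return collected


if __name__ == "__main__":
    print(scrape_all(PAGES))
```

A page cap like this is a common safeguard when the number of result pages is large.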
5. Set up wait time - to slow down the scraping speed
Google Search uses anti-scraping measures and may show a reCAPTCHA challenge when requests come too quickly. To reduce this, slow down the scraping by setting a wait time.
Click the Extract Data action
Select Options
Tick Wait before action
Set the wait time to 1s-3s and click Apply to confirm
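Assuming the 1s-3s setting means a random pause somewhere in that range before each extraction, the idea looks like the minimal Python sketch below. The function name and constants are hypothetical; the `sleep` parameter is injectable only so the pause can be skipped when testing.

```python
import random
import time

MIN_WAIT_S = 1.0  # lower bound of the wait, in seconds (assumed from "1s-3s")
MAX_WAIT_S = 3.0  # upper bound of the wait, in seconds


def random_wait(min_s=MIN_WAIT_S, max_s=MAX_WAIT_S, sleep=time.sleep):
    """Pause for a random duration between min_s and max_s; return the delay."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


if __name__ == "__main__":
    for step in range(3):
        waited = random_wait()
        print(f"step {step}: waited {waited:.2f}s, then extract data")
```

Randomizing the delay, rather than using a fixed one, makes the request timing look less mechanical, which is the point of the wait-before-action setting.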
6. Run the task - to get your target data
Click Save
Click Run in the upper-left corner
Choose a running mode: on your device, or in the Cloud (for premium users only)
Here is the sample output.