
Scrape data from Google Search

Updated over 2 months ago

Scraping a search engine is a good way to collect information on a single topic. In this tutorial, we will show you how to scrape search results data from Google Search.

To save time, you can go to Popular Template on the Octoparse home screen and start directly with the ready-to-use Google Search Template. With this template, there is no need to configure a scraping task yourself. For further details, see: Task Templates

If you would rather build your own task as a Custom Task, use this tutorial as a reference. We will use Octoparse to scrape the title, URL, and description of each result on the search results page.

You may need this link to follow through: https://www.google.com/

The main steps are listed below. [Click here to download the demo task file]


1. Create a Go to Web Page - to open the target website

  • Enter the URL on the home page and click Start


2. Enter text - to start the search

  • Click the search box and then choose Enter text on the tips panel

  • Enter the keywords you need to search for in Textbox 1

An Enter Text action will be created in the workflow.

  • If you want to search for a list of keywords, choose Enter text in the loop

A Loop Item with an Enter Text action inside it will be created in the workflow.

To submit the search after the text is entered, configure the Enter text action as follows:

  • Click Options

  • Tick Hit the Enter/Return key when finish entering

  • Click Apply
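For readers who want to see what this step amounts to outside the Octoparse interface, here is a minimal Python sketch. Typing a keyword into the Google search box and pressing Enter loads a results URL that carries the keyword in the `q` parameter, and searching a list of keywords repeats that once per keyword. The function name `build_search_url` and the sample keywords are assumptions made for illustration; this is not how Octoparse works internally.

```python
from urllib.parse import urlencode

# Base address of the Google results page.
GOOGLE_SEARCH = "https://www.google.com/search"


def build_search_url(keyword: str) -> str:
    """Return the results-page URL for one keyword (hypothetical helper)."""
    # urlencode escapes spaces and special characters in the keyword.
    return f"{GOOGLE_SEARCH}?{urlencode({'q': keyword})}"


# Equivalent of "Enter text in the loop": one search per keyword in a list.
keywords = ["web scraping", "octoparse tutorial"]
search_urls = [build_search_url(keyword) for keyword in keywords]

for url in search_urls:
    print(url)
```

The sketch only builds the URLs; it does not send any request to Google.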


3. Create a Loop Item - scrape data from the result list

  • Click on the first result title

  • Keep clicking the Expand selection area button until the whole first result block is selected

  • Do the same to select the second result

  • Select Text

  • Select the fields to scrape

  • To scrape the URL behind a title, click on the title and choose the A tag

  • Select Link

  • Double-click the field name to rename it

  • Delete the fields you don't want
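The Loop Item built in this step can be pictured as a loop over result blocks that pulls the same fields out of each one. The Python sketch below illustrates the idea on a small hand-written page. The markup, the class names `result` and `description`, and the function `extract_rows` are all invented for this example; Google's real markup is different and changes over time.

```python
import xml.etree.ElementTree as ET

# Hand-written sample page (hypothetical markup, not Google's real HTML).
SAMPLE_PAGE = """
<div id="results">
  <div class="result">
    <a href="https://example.com/one"><h3>First result</h3></a>
    <p class="description">Snippet for the first result.</p>
  </div>
  <div class="result">
    <a href="https://example.com/two"><h3>Second result</h3></a>
    <p class="description">Snippet for the second result.</p>
  </div>
</div>
"""


def extract_rows(page: str) -> list:
    """Return one dict per result block with title, url, and description."""
    root = ET.fromstring(page.strip())
    rows = []
    # Each matching <div> plays the role of one item in the Loop Item.
    for block in root.findall(".//div[@class='result']"):
        link = block.find("a")  # the A tag holds both the title and the URL
        rows.append({
            "title": link.find("h3").text,
            "url": link.get("href"),
            "description": block.find("p").text,
        })
    return rows


rows = extract_rows(SAMPLE_PAGE)
for row in rows:
    print(row)
```

Selecting the first and second result blocks in Octoparse serves the same purpose as the `findall` pattern here: it tells the tool which repeated element to loop over.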


4. Create a Pagination - scrape from multiple pages

  • Click on the Next page button

  • Choose Loop click next page
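Conceptually, Loop click next page keeps extracting rows and moving to the next page until there is no next page left. A minimal Python sketch of that loop follows. The names `scrape_all_pages`, `fetch_page`, and `FAKE_SITE` are assumptions for illustration, the pages are held in memory instead of being loaded from Google, and the `max_pages` cap is an added safety limit that is not part of the steps above.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One page = (rows found on the page, key of the next page or None).
Page = Tuple[List[str], Optional[str]]


def scrape_all_pages(
    fetch_page: Callable[[str], Page],
    first_page: str,
    max_pages: int = 10,
) -> List[str]:
    """Collect rows page by page until no next page exists or the cap is hit."""
    rows: List[str] = []
    current: Optional[str] = first_page
    visited = 0
    while current is not None and visited < max_pages:
        page_rows, next_page = fetch_page(current)
        rows.extend(page_rows)  # extract data from the current page
        current = next_page     # "click" the Next page button
        visited += 1
    return rows


# In-memory stand-in for a paginated results list (hypothetical data).
FAKE_SITE: Dict[str, Page] = {
    "page-1": (["result A", "result B"], "page-2"),
    "page-2": (["result C"], "page-3"),
    "page-3": (["result D"], None),
}

all_rows = scrape_all_pages(FAKE_SITE.__getitem__, "page-1")
print(all_rows)
```

Capping the number of pages keeps a run bounded even if the Next page button never disappears.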


5. Set up wait time - to slow down the scraping speed

Google Search uses anti-scraping measures and may show a reCAPTCHA challenge when requests arrive too quickly. To reduce the chance of triggering it, we slow the scraping down by setting a wait time.

  • Click on Extract Data action

  • Select Options

  • Tick Wait before action

  • Select the wait time as 1s-3s and click Apply to confirm
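The 1s-3s setting above can be read as a pause of between 1 and 3 seconds before each extraction. The Python sketch below shows one way to express that, assuming the pause is drawn at random from the range on each run of the action; how Octoparse picks the exact value is not stated here, so treat the random draw as an assumption. The names `pick_wait` and `wait_before_action` are hypothetical.

```python
import random
import time


def pick_wait(min_seconds: float = 1.0, max_seconds: float = 3.0) -> float:
    """Pick a pause length in seconds within the given range."""
    return random.uniform(min_seconds, max_seconds)


def wait_before_action(action, sleep=time.sleep):
    """Pause for 1-3 seconds, then run the action and return its result.

    The sleep function is a parameter so the pause can be replaced
    with a recorder when trying the sketch out.
    """
    delay = pick_wait()
    sleep(delay)
    return action()
```

For example, `wait_before_action(lambda: "extracted row")` pauses and then returns `"extracted row"`. Varying the pause makes the request pattern less regular than a fixed delay would.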


6. Run the task - to get your target data

  • Click Save

  • Click Run on the upper left side

  • Select a run mode: on your device, or in the Cloud (premium users only)

Here is the sample output.

[Image: sample output]