Scrape data from Google Search

You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier, and more robust! Download and upgrade here if you haven't already done so!

Scraping a search engine is a useful way to collect information about a single topic. This tutorial shows how to scrape search result data from Google Search.

You can go to "Task Templates" on the Octoparse home screen and start with the ready-to-use Google Search Template to save time. With this template, there is no need to configure the scraping task yourself. For further details, see: Task Templates

If you would rather build your own Custom Task, use this tutorial as a reference. We will scrape the title, URL, and description of each result from the search results page with Octoparse.

Use this link to follow along: https://www.google.com/



1. Create a Go to Web Page - to open the target website

  • Enter the URL on the home page and click Start


2. Enter text - to start the search

  • Click the search box and then choose Enter text on the tips panel

  • Enter the keywords you need to search for in Textbox 1

An Enter Text action now appears in the workflow.

  • If you want to search for a list of keywords, choose Enter text in the loop

A Loop Item with an Enter Text action inside it will be created in the workflow.

To submit the search once the text has been entered, adjust the options of the Enter text action:

  • Click Options

  • Tick Hit the Enter/Return key when finish entering

  • Click Apply
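
For readers who want to see what the keyword loop amounts to outside Octoparse, here is a minimal Python sketch. It assumes Google's public `https://www.google.com/search?q=...` URL form. The function name `build_search_urls` and the sample keywords are hypothetical and are not part of Octoparse.

```python
from urllib.parse import urlencode

# Assumed endpoint: the public Google search URL, not an Octoparse setting.
GOOGLE_SEARCH_URL = "https://www.google.com/search"


def build_search_urls(keywords):
    """Return one search URL per non-empty keyword, in input order."""
    urls = []
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            continue  # skip blank entries in the keyword list
        # urlencode escapes spaces and special characters in the query
        urls.append(f"{GOOGLE_SEARCH_URL}?{urlencode({'q': keyword})}")
    return urls


if __name__ == "__main__":
    # Hypothetical keyword list, one search per keyword
    for url in build_search_urls(["web scraping", "octoparse tutorial"]):
        print(url)
```

Each generated URL corresponds to one pass of the Loop Item: entering a keyword and pressing Enter.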


3. Auto-detect the web page - to scrape the search result page

  • Choose Auto-detect the page data

  • Untick Add a page scroll and choose Create workflow

  • Double-click to rename the fields or delete the fields you don't want


Tips!

If auto-detect picks up several fields you don't want, switch to the Vertical View, where it is easier to delete them in bulk.


4. Modify element XPaths - to locate the elements accurately

  • Click Loop Item and enter //h1[contains(text(),'Page Navigation')]/following-sibling::a[1] under Matching XPath.

  • Click Loop item1 and enter //H3[@class='LC20lb MBeuO DKV0Md']/../../../../../../.. under Matching XPath. Click Apply in both settings.

  • Click Extract data

  • Change to the Vertical View

  • Enter the XPaths for the fields you need

Here are some examples:

Title: //H3[1]

Title_URL: //div[@class='yuRUbf']//a[1]

Description: /div/div[2]
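
To illustrate how a loop XPath and per-field XPaths work together, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. It is an illustration under assumptions, not how Octoparse evaluates XPath internally: the markup in `SAMPLE` is hypothetical and heavily simplified, `ElementTree` supports only a subset of XPath, and class names such as `yuRUbf` are taken from this tutorial and may change on the live site.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for a search results page.
SAMPLE = """
<div id="results">
  <div class="result">
    <div class="yuRUbf"><a href="https://example.com/a"><h3>First title</h3></a></div>
    <div class="snippet">First description</div>
  </div>
  <div class="result">
    <div class="yuRUbf"><a href="https://example.org/b"><h3>Second title</h3></a></div>
    <div class="snippet">Second description</div>
  </div>
</div>
"""


def extract_results(markup):
    """Return one dict per result block with title, URL, and description."""
    root = ET.fromstring(markup.strip())
    rows = []
    # Loop XPath: selects one node per search result.
    for item in root.findall(".//div[@class='result']"):
        # Field XPaths are evaluated relative to the current loop item.
        title = item.find(".//h3")
        link = item.find(".//div[@class='yuRUbf']//a")
        description = item.find("./div[2]")
        rows.append({
            "Title": title.text,
            "Title_URL": link.get("href"),
            "Description": description.text,
        })
    return rows


if __name__ == "__main__":
    for row in extract_results(SAMPLE):
        print(row)
```

The point of the sketch is the two-level structure: the loop expression picks the repeating container, and each field expression only needs to be correct inside that container.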

Tips!

Check out more details about XPath here: What is XPath and how to use it in Octoparse


5. Add a page scroll manually

The button that loads more results appears only after the page has been scrolled down a little, so a page scroll needs to be added manually.

  • Click + and choose Loop to create a page scroll

  • Click Loop Item 2 and choose Scroll Page in the loop mode

  • Set it to scroll to the bottom of the page and set the repeat count to 5

  • Click Apply


6. Set up wait time - to slow down the scraping speed

Google Search uses anti-scraping measures and may show a reCAPTCHA challenge when requests arrive too quickly. Setting a wait time slows the scraping down and reduces the chance of triggering it.

  • Click the Extract Data action

  • Select Options

  • Tick Wait before action

  • Set the wait time to 1s-3s and click Apply to confirm
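
The same idea can be sketched in Python for readers scripting their own requests. This assumes the 1s-3s setting means a pause of some duration within that range before each extraction. The helper name `wait_before_action` and its parameters are hypothetical and do not correspond to any Octoparse API.

```python
import random
import time


def wait_before_action(min_seconds=1.0, max_seconds=3.0, sleep=time.sleep):
    """Pause for a random duration in [min_seconds, max_seconds]; return it."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    # A varying delay looks less mechanical than a fixed interval.
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)  # injectable so the pause can be skipped in tests
    return delay
```

Calling `wait_before_action()` before each extraction step spaces requests out by one to three seconds. The `sleep` parameter exists only so the delay can be replaced with a no-op when testing.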


7. Run the task - to get your target data

  • Click Save

  • Click Run on the upper left side

  • Select a run mode: on your device or in the Cloud (premium users only)

The output contains the title, URL, and description of each search result.
