Scrape data from Google Advanced Search results

You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier and more robust! Download and upgrade here if you haven't already done so!

Google Advanced Search is a more precise way of finding information on Google. It uses Google search operators, which are special characters and commands (also known as "advanced operators") that go beyond a standard Google search.
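
To show how operators end up in a search URL, here is a minimal Python sketch that joins a few common operators (quotation marks for an exact phrase, site:, and filetype:) into one Google search URL. The function name build_search_url and the sample terms are hypothetical, and the sketch assumes Google's standard https://www.google.com/search?q= URL format; the Advanced Search form itself may add other parameters.

```python
from urllib.parse import urlencode


def build_search_url(terms: str, exact_phrase: str = "", site: str = "", filetype: str = "") -> str:
    """Combine plain search terms with a few common Google operators into one query (hypothetical helper)."""
    parts = [terms]
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')      # quotation marks: match the exact phrase
    if site:
        parts.append(f"site:{site}")           # site: limit results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")   # filetype: limit results to one file type
    # urlencode percent-encodes the query: spaces -> "+", ":" -> "%3A", '"' -> "%22"
    return "https://www.google.com/search?" + urlencode({"q": " ".join(parts)})


print(build_search_url("web scraping", site="example.com", filetype="pdf"))
# -> https://www.google.com/search?q=web+scraping+site%3Aexample.com+filetype%3Apdf
```

A URL built this way can be pasted into the Octoparse search bar like any other target URL.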

An example is shown in the picture below:

[Screenshots: an example of Google Advanced Search]

This tutorial shows you how to scrape data from Google Advanced Search results with Octoparse, using the search results URL from the screenshot above.

To save time, you can also go to "Task Templates" on the main screen of Octoparse and start directly with the ready-to-use Google Advanced Search template. For more details on task templates, see the related guide here.

The main steps are listed below. [Download the task file here]


1. Create a Go to Web Page - to open the target website

  • Enter the target URL into the search bar on the home screen and click Start


2. Set up Scroll Page and create a Pagination - to load more data

  • Click on the Go to Web Page step

  • Go to the Options tab

  • Tick Scroll down the page after it is loaded

  • Set the page to scroll down 6 times

  • Click Apply

  • Click on More Results at the bottom of the webpage

  • Click Loop click

  • Set AJAX timeout: 7-10s recommended
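
Conceptually, the Loop click step means: click More Results, give the new results time to load, and repeat until the button is gone. The Python sketch below simulates that loop against a hypothetical FakeResultsPage object. It is an illustration, not Octoparse's actual implementation: it assumes the wait ends early once new results appear (otherwise after the AJAX timeout), and the max_clicks safety cap is an added assumption.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeResultsPage:
    """Hypothetical stand-in for a results page: each click on "More Results"
    appends the next batch of results until no batches are left."""
    batches: list
    results: list = field(default_factory=list)

    def has_more_results_button(self) -> bool:
        return bool(self.batches)

    def click_more_results(self) -> None:
        self.results.extend(self.batches.pop(0))


def loop_click(page, ajax_timeout_s: float = 7.0, poll_s: float = 0.25, max_clicks: int = 50) -> int:
    """Click "More Results" until it disappears, waiting up to ajax_timeout_s
    after each click for new results. Returns the number of clicks made."""
    clicks = 0
    while page.has_more_results_button() and clicks < max_clicks:
        loaded_before = len(page.results)
        page.click_more_results()
        clicks += 1
        deadline = time.monotonic() + ajax_timeout_s
        # Poll until new results appear or the AJAX timeout expires.
        while len(page.results) == loaded_before and time.monotonic() < deadline:
            time.sleep(poll_s)
    return clicks


page = FakeResultsPage(batches=[["result 1", "result 2"], ["result 3"]])
print(loop_click(page), page.results)  # -> 2 ['result 1', 'result 2', 'result 3']
```

A longer timeout makes the loop more tolerant of slow responses, at the cost of a slower run when a click loads nothing new.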


Note: Google may sometimes show a CAPTCHA as an anti-scraping measure. To solve it manually, turn on Browse mode and follow the on-screen instructions.


3. Create a Loop Item - to locate the data

  • Click on the "+" icon in the Pagination loop and select Loop

  • Click Loop Item and switch the Loop Mode to Variable List

  • Input the XPath //div[@lang="en"] in the Matching XPath box

  • Click Apply
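
The Variable List mode treats every element matched by the XPath as one loop item. As a rough illustration outside Octoparse, this Python sketch applies the same expression to a small, hypothetical, well-formed HTML snippet using the standard library's xml.etree.ElementTree. Its limited XPath support expects a path relative to the element, so //div[@lang="en"] is written as .//div[@lang="en"].

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. Real Google result pages are much larger,
# are not well-formed XML, and change often.
SAMPLE_HTML = """
<html><body>
  <div lang="en"><h3>First result</h3></div>
  <div lang="en"><h3>Second result</h3></div>
  <div lang="de"><h3>Erstes Ergebnis</h3></div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE_HTML)
# Each element matched here corresponds to one loop item.
loop_items = root.findall('.//div[@lang="en"]')
print(len(loop_items))  # -> 2
print([item.findtext("h3") for item in loop_items])  # -> ['First result', 'Second result']
```

For real pages, a script would typically use an HTML parser with full XPath 1.0 support, such as lxml.html, instead of ElementTree.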

4. Create an Extract Data - to extract the search results

  • Click on the Title of the first item on the webpage

  • Click Text on the Tips panel

  • Repeat the two steps above to extract the other data fields

  • Click on the More button next to the data field > Customize XPath

  • Modify the XPath of each data field as follows:

title: //h3

content: //div[@style="-webkit-line-clamp:2"]/span[2]
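
In Octoparse, these field XPaths are evaluated inside each loop item. The Python sketch below mimics that with xml.etree.ElementTree on hypothetical sample markup: for each //div[@lang="en"] item it reads the h3 as the title and the second span inside the line-clamped div as the content. The markup and the extract_rows name are assumptions for illustration only; Google's real markup differs and changes over time.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for two search results.
SAMPLE_HTML = """
<html><body>
  <div lang="en">
    <h3>Result one</h3>
    <div style="-webkit-line-clamp:2"><span>First span</span><span>Snippet for result one.</span></div>
  </div>
  <div lang="en">
    <h3>Result two</h3>
    <div style="-webkit-line-clamp:2"><span>First span</span><span>Snippet for result two.</span></div>
  </div>
</body></html>
""".strip()


def extract_rows(html: str) -> list:
    """Return one {title, content} dict per loop item (hypothetical helper)."""
    root = ET.fromstring(html)
    rows = []
    for item in root.findall('.//div[@lang="en"]'):
        rows.append({
            # Paths are relative to the loop item, like the field XPaths above.
            "title": item.findtext(".//h3", default=""),
            # span[2] selects the second span child of the line-clamped div.
            "content": item.findtext('.//div[@style="-webkit-line-clamp:2"]/span[2]', default=""),
        })
    return rows


for row in extract_rows(SAMPLE_HTML):
    print(row)
```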


5. Set up Wait before action - to make sure data is fully loaded

Wait before action is an option you can set on every action in the workflow. It makes the task pause for the specified time before that action runs.

In this case, it is best to add a Wait before action to both the Loop Item and Extract Data steps in the workflow.

  • Click on each of these steps > Options

  • Set Wait before action: 3s recommended

  • Click Apply
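
Wait before action is a fixed pause that runs before a step executes. The Python sketch below models a workflow in which each step can carry its own delay; the Step class, its wait_before_s field, and run_workflow are hypothetical names used only to illustrate the idea, not part of Octoparse.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    """Hypothetical model of one workflow step with a fixed 'wait before action' delay."""
    name: str
    action: Callable[[], Any]
    wait_before_s: float = 0.0


def run_workflow(steps: List[Step]) -> Dict[str, Any]:
    """Run steps in order, pausing before each one that has a delay set."""
    results: Dict[str, Any] = {}
    for step in steps:
        if step.wait_before_s > 0:
            # Fixed pause before the action, giving dynamically loaded content time to render.
            time.sleep(step.wait_before_s)
        results[step.name] = step.action()
    return results


# Mirrors the recommended setup: 3 s before Loop Item and before Extract Data.
workflow = [
    Step("Loop Item", lambda: ["item 1", "item 2"], wait_before_s=3),
    Step("Extract Data", lambda: {"title": "Result one"}, wait_before_s=3),
]
```

Calling run_workflow(workflow) would pause about 3 seconds before each of the two steps. A fixed delay is simple but always costs the full wait, even when the page loads faster.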


6. Run the task - to get your desired data

  • Click Save on the upper right to save your task

  • Click Run next to it and wait for the Run Task window to pop up

  • Select Run on your device to run the task on your local device

  • Wait for the task to complete

Here is the sample output from a local run:

Tip: Local runs are great for troubleshooting and quick runs. For more complicated tasks, we recommend selecting Run in the Cloud to run the task on Octoparse's cloud-based platform for higher speed. Try out this premium feature by signing up for the 14-day free trial here. You can also schedule your task to run hourly, daily, or weekly and get data delivered to you regularly.
