Scrape information from Groupon
Updated over a week ago

This tutorial is written for the latest version of Octoparse. If you are running an older version, we strongly recommend upgrading, because the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so!

Groupon is a website that lists professional and personal services, including classes, photography, and other local services.

In this tutorial, we are going to show you how to scrape information about photography services from Groupon.com.

DADA.png

To follow along, you may want to use the URL from this tutorial:

The main steps are shown in the menu on the right and you can download the demo task file here.


1. Create a Go to Web Page step - to open the target web page

  • Enter the URL on the home page and click Start


2. Click "x" - to close the ad

  • Click "x" on the upper right corner of the ad

  • Select Click element on the Tips panel

  • Uncheck Open in a new tab in the Options settings


3. Auto-detect web page data - to generate a workflow

  • Click Auto-detect web page data on the Tips panel

  • Wait for the detection to complete

  • Delete unwanted fields in the data preview panel

  • Uncheck Add a page scroll

  • Click Create workflow

  • Choose Select subpage URL

  • Select the Title URL field

  • Click Confirm
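
For readers who want a mental model of what this step sets up, the workflow behaves roughly like a loop over the list items that records each item's title and then follows its Title URL to the subpage. The Python sketch below illustrates that idea only; it is not how Octoparse is implemented. The HTML sample, the `class="title"` markup, the `Title_URL` key, and the `example.com` base URL are all hypothetical and do not reflect Groupon's real page structure.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical listing markup; Groupon's real page structure will differ.
SAMPLE_LISTING = """
<div class="card"><a class="title" href="/deals/photo-studio-a">Studio A Portrait Session</a></div>
<div class="card"><a class="title" href="/deals/photo-studio-b">Studio B Family Shoot</a></div>
"""


class TitleLinkParser(HTMLParser):
    """Collects the text and href of every <a class="title"> element."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per list item
        self._href = None   # href of the title link currently being read
        self._text = []     # text fragments inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "title":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.rows.append({"Title": title, "Title_URL": self._href})
            self._href = None


def list_subpage_urls(html, base_url):
    """Return one row per list item, with the subpage URL made absolute."""
    parser = TitleLinkParser()
    parser.feed(html)
    return [
        {"Title": row["Title"], "Title_URL": urljoin(base_url, row["Title_URL"])}
        for row in parser.rows
    ]


rows = list_subpage_urls(SAMPLE_LISTING, "https://www.example.com")
for row in rows:
    # A real crawler would open each subpage here and extract its fields.
    print(row["Title"], "->", row["Title_URL"])
```

Each row corresponds to one item in the detected list, and the `Title_URL` value is the link the workflow would open to reach the subpage where step 4 extracts data.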


4. Extract Data - to select the data to scrape

  • Click the data you want to extract

  • After the chosen data turns green, click Text on the Tips panel

  • Edit the field name by double-clicking it

mceclip0.png

The final workflow will look like this:

mceclip0.png

5. Run the task - to get the target data

  • Click the Save button first to save all the settings you have made

  • Then click Run and choose whether to run the task locally or in the cloud

  • Wait for the task to complete

Below is sample data from a local run. The data can be exported in Excel, CSV, HTML, and JSON formats.

data.png
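
Once the data is exported, a short script can tidy it before further use. The sketch below assumes a CSV export and shows one possible cleanup: trimming whitespace and dropping exact duplicate rows. The column names (`Title`, `Price`, `Location`) and the sample rows are hypothetical; the actual columns depend on the field names chosen in step 4.

```python
import csv
import io

# Hypothetical export; real column names depend on the fields you kept.
SAMPLE_EXPORT = """Title,Price,Location
Studio A Portrait Session,$49,New York
Studio B Family Shoot,$79,
Studio A Portrait Session,$49,New York
"""


def load_rows(csv_file):
    """Read exported rows, trim whitespace, and drop exact duplicates."""
    seen = set()
    rows = []
    for row in csv.DictReader(csv_file):
        # Missing cells come back as None or "", so normalize both to "".
        cleaned = {key: (value or "").strip() for key, value in row.items()}
        fingerprint = tuple(cleaned.items())
        if fingerprint not in seen:
            seen.add(fingerprint)
            rows.append(cleaned)
    return rows


rows = load_rows(io.StringIO(SAMPLE_EXPORT))
print(len(rows), "unique rows")
```

To run this against a real export, pass an open file instead of the in-memory sample, for example `open("export.csv", newline="", encoding="utf-8")`, where `export.csv` is a placeholder for your own file name.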