Scrape ticket prices from StubHub

You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier and more robust! Download and upgrade here if you haven't already done so!

StubHub is a website where fans buy and sell tickets for events. You can look up an event's ticket price, time, and location before deciding whether to buy.

This tutorial shows you how to scrape ticket prices from StubHub.


To follow through the tutorial, you may want to use the URL below:

The main steps are shown in the menu on the right and you can download the demo task file here.


1. Create a Go to Web Page step - to open the target website

  • Enter the target URL into the search bar on the home screen and click Start


2. Auto-detect the webpage - to create a workflow

Octoparse's Auto-detection function can help you quickly create a workflow according to the target website's design.

  • Click Auto-detect webpage data on the Tips panel and wait for the detection to complete

  • Check the data fields in Data Preview and delete unwanted fields

  • Untick Add a page scroll and click Create workflow


3. Set up pagination - to load more data on the webpage

Once the basic workflow is built, set up pagination so that Octoparse loads and scrapes all the tickets.

  • Click Load more button on the Tips panel

  • Scroll down and click See More on the page, then click Confirm on the Tips panel

  • Double-click on the data header to rename the fields if needed



4. Modify the XPath of Loop Item - to locate the items accurately

The auto-generated XPath of Loop Item needs to be modified; otherwise, Octoparse may fail to correctly locate the loop on different web pages.

  • Click Loop Item to open its settings

  • Enter the following Matching XPath: //div[@class='sc-ksluID kWpYTp']

  • Click Apply to save the change
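
To see what this Matching XPath selects, here is a minimal sketch using Python's standard library. The HTML snippet is a hypothetical, simplified stand-in for the real page, and the function name match_loop_items is made up for illustration. ElementTree supports only a subset of XPath and needs a leading "." for a document-wide search, so the expression is written as .//div[...] here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real StubHub page is far more complex.
SAMPLE = """
<html><body>
  <div class="sc-ksluID kWpYTp"><span>Section 101</span><span>$85</span></div>
  <div class="sc-ksluID kWpYTp"><span>Section 102</span><span>$92</span></div>
  <div class="other">Advertisement</div>
</body></html>
"""

def match_loop_items(markup):
    """Return every <div> whose class attribute equals the tutorial's value."""
    root = ET.fromstring(markup.strip())
    # The predicate compares the whole class attribute string exactly.
    return root.findall(".//div[@class='sc-ksluID kWpYTp']")

items = match_loop_items(SAMPLE)
# One row per matched loop item, one cell per <span> inside it.
rows = [[span.text for span in item.findall("span")] for item in items]
print(rows)
```

Because the predicate compares the full class string, the expression matches nothing if the site changes these class names. In that case, inspect the page and update the Matching XPath accordingly.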


5. Modify the XPath for the remaining tickets field - to get the correct data

The auto-detected XPath for this field may be wrong, so replace it as follows:

  • Click ... next to the "ticket remaining" field

  • Click Customize XPath

  • Enter the correct XPath //span[contains(text(),'tickets remaining')] and click Apply
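
The expression above selects span elements whose text contains the phrase "tickets remaining". The sketch below illustrates that matching logic on hypothetical sample markup. Python's standard ElementTree does not support the contains() function, so the check is emulated with a plain substring test on each span's leading text. The function name find_remaining is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for ticket listings.
SAMPLE = """
<div>
  <div><span>$85</span><span>3 tickets remaining</span></div>
  <div><span>$92</span></div>
  <div><span>$70</span><span>12 tickets remaining</span></div>
</div>
"""

def find_remaining(markup, needle="tickets remaining"):
    """Emulate //span[contains(text(),'tickets remaining')] with a substring test."""
    root = ET.fromstring(markup.strip())
    return [
        span.text.strip()
        for span in root.iter("span")          # every <span> in the document
        if span.text and needle in span.text   # skip empty spans, then match
    ]

remaining = find_remaining(SAMPLE)
print(remaining)
```

Listings without a matching span, such as the second one in the sample, contribute nothing to the result, so expect empty values for that field on some rows.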


6. Run the task - to get your desired data

  • Click Save on the upper right to save your task

  • Click Run next to it and wait for a Run Task window to pop up

  • Select Run on your device to run the task on your local device

  • Wait for the task to complete

Here is the sample output from a local run:
