This tutorial is written for the latest version of Octoparse. If you are running an older version, we strongly recommend upgrading, because the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so.
StubHub is a marketplace where fans buy and sell tickets for events. You can look up an event's ticket prices, time, and location before deciding whether to buy.
This tutorial shows you how to scrape ticket prices from StubHub.
To follow along with the tutorial, you may want to use the URL below:
The main steps are shown in the menu on the right and you can download the demo task file here.
1. Create a Go to Web Page - to open the target website
Enter the target URL into the search bar on the home screen and click Start
2. Auto-detect the webpage - to create a workflow
Octoparse's Auto-detection function can help you quickly create a workflow according to the target website's design.
Click Auto-detect webpage data on the Tips panel and wait for the detection to complete
Check the data fields in Data Preview and delete unwanted fields
Untick Add a page scroll and click Create workflow
3. Create a Pagination - to load more data on the webpage
Once the basic workflow is built, we need to set up pagination so that all the tickets are loaded and scraped.
Click the Load more button option on the Tips panel
Scroll down the page, click See More, and then click Confirm on the Tips panel
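Conceptually, a load-more pagination step keeps clicking the button until it no longer appears. The sketch below illustrates that idea only; it is not how Octoparse is implemented internally. `FakeListingPage`, `has_see_more`, `click_see_more`, and `load_all` are hypothetical names invented for this example, and the item counts are made up.

```python
class FakeListingPage:
    """Hypothetical stand-in for a listing page with a See More button."""

    def __init__(self, total_items, batch_size):
        self.total_items = total_items
        self.batch_size = batch_size
        # The first batch is visible as soon as the page opens.
        self.loaded = min(batch_size, total_items)

    def has_see_more(self):
        # The button is shown only while some items are still hidden.
        return self.loaded < self.total_items

    def click_see_more(self):
        # Each click reveals one more batch, capped at the total.
        self.loaded = min(self.loaded + self.batch_size, self.total_items)


def load_all(page, max_clicks=100):
    """Click See More until it disappears or max_clicks is reached."""
    clicks = 0
    while page.has_see_more() and clicks < max_clicks:
        page.click_see_more()
        clicks += 1
    return clicks


page = FakeListingPage(total_items=25, batch_size=10)
clicks = load_all(page)
print(clicks, page.loaded)
```

The `max_clicks` cap is an assumption added so the loop cannot run forever if the button never goes away.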
4. Modify the XPath of Loop Item - to locate the items accurately
The auto-generated XPath of Loop Item needs to be modified; otherwise, Octoparse may fail to correctly locate the loop on different web pages.
Click Loop Item to open its settings
Input the Matching XPath as: //div[@class='sc-ksluID kWpYTp']
Click Apply to save the change
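To see what this Matching XPath selects, you can try it on a small piece of markup outside Octoparse. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath. The `SAMPLE` markup is a hypothetical, heavily simplified stand-in for the real listing page, and `find_loop_items` is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is far more complex.
SAMPLE = """
<html><body>
  <div class="sc-ksluID kWpYTp"><span>Section 101</span><span>$85</span></div>
  <div class="sc-ksluID kWpYTp"><span>Section 202</span><span>$60</span></div>
  <div class="sc-other"><span>Advertisement</span></div>
</body></html>
"""


def find_loop_items(markup):
    root = ET.fromstring(markup)
    # ElementTree needs a leading "." to search all descendants of the root.
    return root.findall(".//div[@class='sc-ksluID kWpYTp']")


items = find_loop_items(SAMPLE)
# Collect the text of each direct <span> child, one row per matched item.
rows = [[span.text for span in item.findall("span")] for item in items]
print(rows)
```

Note that `@class='...'` compares the whole attribute value exactly, so the third `div` is skipped. Class names of this shape look auto-generated, so if the site changes them the XPath will likely need updating.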
5. Modify the XPath for the remaining tickets field - to get the correct data
The auto-detected XPath for this field may be wrong, so correct it as follows:
Click ... next to the "ticket remaining" field
Click Customize XPath
Enter the correct XPath //span[contains(text(),'tickets remaining')] and click Apply
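The `contains(text(), ...)` test matches a `span` by its text rather than by its position or class, which is why it holds up better when the layout shifts. Python's standard `xml.etree.ElementTree` does not support `contains()`, so the sketch below emulates the same check with a plain substring test. The `SAMPLE` markup and the helper names `remaining_spans` and `parse_remaining` are hypothetical, and the numeric parsing is an optional extra step, not something Octoparse does for you here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for a single ticket listing.
SAMPLE = """
<div>
  <span>Section 101</span>
  <span>3 tickets remaining</span>
  <span>$85</span>
</div>
"""


def remaining_spans(markup):
    root = ET.fromstring(markup)
    # Emulates //span[contains(text(),'tickets remaining')]:
    # keep every <span> whose leading text contains the phrase.
    return [s for s in root.iter("span") if "tickets remaining" in (s.text or "")]


def parse_remaining(text):
    # Pull the leading count out of text such as "3 tickets remaining".
    match = re.search(r"(\d+)\s+tickets remaining", text)
    return int(match.group(1)) if match else None


matches = remaining_spans(SAMPLE)
print([s.text for s in matches])
print(parse_remaining(matches[0].text))
```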
6. Run the task - to get your desired data
Click Save on the upper right to save your task
Click Run next to it and wait for a Run Task window to pop up
Select Run on your device to run the task on your local device
Wait for the task to complete
Here is the sample output from a local run: