Scrape hotel data from Tripadvisor

This tutorial is written for the latest version of Octoparse. If you are running an older version, we strongly recommend upgrading, because the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so.

TripAdvisor is a travel platform that offers online hotel reservations and bookings for transportation, lodging, travel experiences, and restaurants. By comparing hotels and restaurants on the platform, users can find the options that best suit their trip.

In this tutorial, we are going to show you how to scrape hotel data from TripAdvisor.

For TripAdvisor scraping, you can use our ready-to-use Task Template available on the home page or follow this tutorial to build the task from scratch.


To demonstrate, we will use this URL as an example: https://www.tripadvisor.com/Hotels-g186338-London_England-Hotels.html

The main steps are shown in the menu on the right and you can download the demo task file here.


1. Create a Go to Web Page step - to open the target web page

  • Paste the URL and click Start


2. Click See all - to load all hotels

You need to click the See all button first to show all the hotels.

  • Select the See all button

  • Choose Click button on the Tips panel

  • Set the AJAX timeout to 5s

  • Go to the settings of the Click Item

  • Modify the XPath of the Click Item to //span[.='See all']/..

  • Go to Options

  • Set the Wait before action to 4s
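
To see what the XPath in this step matches, here is a minimal Python sketch that uses only the standard library. The HTML snippet and its id values are hypothetical simplifications, not TripAdvisor's actual markup. Note that xml.etree.ElementTree supports only a subset of XPath, so the leading // is written as .// in the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is far more complex.
SAMPLE = """
<div>
  <button id="see-all"><span>See all</span></button>
  <button id="map"><span>View map</span></button>
</div>
"""

root = ET.fromstring(SAMPLE)

# span[.='See all'] finds the span whose text is exactly "See all";
# the trailing /.. steps up to its parent, the clickable element.
matches = root.findall(".//span[.='See all']/..")

for element in matches:
    print(element.tag, element.get("id"))
```

The trailing /.. is the reason for modifying the default XPath: it targets the element that wraps the label rather than the label text itself.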


3. Set up Pagination Loop - to scrape data from multiple listing pages

  • Scroll down to find the next page button (->) and click on it

  • Select Loop click

  • Set the AJAX timeout to 10s

  • Update the Pagination loop XPath to //a[@aria-label='Next page']
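
The pagination loop can be pictured roughly as follows: keep clicking the element matched by the XPath until no such element exists on the page. This is only a sketch of that idea, not Octoparse's implementation. The PAGES dictionary, its URLs, and the visit_all_pages helper are hypothetical, and the leading // is written as .// because xml.etree.ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing pages keyed by URL; the markup is simplified.
PAGES = {
    "/page1": "<div><p>Hotel A</p><a aria-label='Next page' href='/page2'>next</a></div>",
    "/page2": "<div><p>Hotel B</p><a aria-label='Next page' href='/page3'>next</a></div>",
    "/page3": "<div><p>Hotel C</p></div>",  # last page: no Next page link
}


def visit_all_pages(start):
    """Follow the 'Next page' link until a page no longer has one."""
    visited = []
    url = start
    while url is not None:
        visited.append(url)
        page = ET.fromstring(PAGES[url])
        next_link = page.find(".//a[@aria-label='Next page']")
        # Stop when the XPath matches nothing, which ends the loop.
        url = next_link.get("href") if next_link is not None else None
    return visited


visited_pages = visit_all_pages("/page1")
print(visited_pages)
```

Matching on the aria-label attribute rather than on position or visible text is what keeps the loop from clicking other links in the pager.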


4. Create a Loop Item - to click on each hotel to get data

  • Click on 2 random hotel titles

  • Select Loop and click on each element

  • Choose No in the message that appears on the Tips panel

  • Click on any text that needs to be extracted

  • Select Text

  • Repeat until all the data needed is in place

  • Go to Data Preview and double-click on the header to rename the field

  • Change the Loop mode to Variable List and then modify the XPath of the Loop Item to //div[@data-automation="hotel-card-title"]/a
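
As a rough illustration of what the Loop Item XPath selects, the sketch below runs the same expression against a small sample and collects one row per hotel title link. The markup, hotel names, and URLs are hypothetical and simplified, and the leading // is written as .// because xml.etree.ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing markup with two hotel cards and one ad.
SAMPLE = """
<div>
  <div data-automation="hotel-card-title"><a href="/hotel-1">1. Hotel One</a></div>
  <div data-automation="hotel-card-title"><a href="/hotel-2">2. Hotel Two</a></div>
  <div data-automation="ad-banner"><a href="/ad">Sponsored</a></div>
</div>
"""

root = ET.fromstring(SAMPLE)

# Select only links that sit directly inside a hotel-card-title div.
links = root.findall(".//div[@data-automation='hotel-card-title']/a")

# Each matched link becomes one item in the loop.
rows = [{"title": link.text, "url": link.get("href")} for link in links]

for row in rows:
    print(row["title"], row["url"])
```

The link inside the ad-banner div is skipped because the attribute predicate only accepts hotel-card-title containers.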

Below is what the final workflow looks like. If everything is in place, you can continue to run the task.

[Screenshot: final workflow]

5. Run the task - to get your target data

  • Click Run in the top right corner

  • Select Run on your device to run the task locally, or select Run in the cloud to run it on the Cloud (premium users only)


Here is the sample output:
