Scrape hotel data from Booking
Written by Wyatt
Updated over a month ago

In this tutorial, we are going to show you how to scrape hotel information from Booking.com.

Alternatively, open Templates in the Octoparse sidebar and start with the ready-to-use Booking template. Templates require no task configuration, which saves time. For further details, see: Template Task

To build the task from scratch, continue with the tutorial below.

We will scrape data such as hotel names, images, addresses, descriptions, scores, reviews, and star ratings with Octoparse.

The main steps are shown in the menu on the right.

[Download the demo task here]


1. Go to the web page - open the target web page

  • Enter the target URL on the home page

  • Click the Start button


2. Auto-detect the web page - create a workflow

  • Click Auto-detect web page

  • Click Create a workflow

  • Adjust the order of the fields as you want

  • Delete and rename fields

After auto-detection, you can delete all the fields you don't need in one go.

Click the vertical view icon to switch to vertical view, where you can delete and rename fields. To rename a field, double-click its name.


3. Update the XPath for the Loop Item and Pagination

  • Click on the Loop Item

  • Input the XPath //div[@data-testid="property-card"]

  • Click on Pagination and update the XPath to //span[text()='Load more results']

TIP: Check out this tutorial to learn more about XPath: What is XPath and how to use it in Octoparse
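
To get a feel for what the two XPath expressions above select, here is a minimal sketch that runs outside Octoparse, using only the Python standard library. The sample markup is a hypothetical, heavily simplified stand-in for a results page, not Booking's real HTML. Note also that ElementTree supports only a subset of XPath: it requires a leading "." and has no text() function, so [.='...'] is used here as an approximation of [text()='...'].

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a search results page (not real Booking markup).
SAMPLE_PAGE = """
<html><body>
  <div data-testid="property-card"><a href="/hotel/a.html">Hotel A</a></div>
  <div data-testid="property-card"><a href="/hotel/b.html">Hotel B</a></div>
  <div data-testid="banner">Not a hotel</div>
  <button><span>Load more results</span></button>
</body></html>
"""

root = ET.fromstring(SAMPLE_PAGE)

# Same attribute predicate as the Loop Item XPath: one match per hotel card.
cards = root.findall(".//div[@data-testid='property-card']")
card_names = [card.find("a").text for card in cards]

# Stand-in for the Pagination XPath: the span whose text is exactly "Load more results".
load_more = root.findall(".//span[.='Load more results']")

print(card_names)
print(len(load_more))
```

The Loop Item expression matches every hotel card and skips unrelated blocks such as the banner, while the Pagination expression matches only the button label that loads the next batch of results.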

If you want to click on each detail link to get more information, please follow the next steps.


4. Click into each detail link - scrape more information

  • Click on the Extract Data step, then click Enter subpage URL

  • Choose Select subpage URL on the Tips panel

  • Select Click on an extracted data field and select the one you want to click on from the drop-down menu (you can confirm if it's the correct link in the Data Preview)

  • Click on Confirm


5. Extract Data - extract data on the detail pages

If a pop-up appears on the detail page, turn on browse mode, close the pop-up manually, and then turn browse mode off again.

  • Select the data you want to scrape and click Element data

  • Double-click on the field name to rename it if needed


6. Set up wait time - slow down the scraping speed

Booking may block your IP address if you send too many requests in a short time, so we need to slow down the scraping speed.
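
In Octoparse, the wait time is set in the task options, so no code is needed. Purely to illustrate the idea behind it, here is a minimal Python sketch of pausing for a random interval between page visits. The function names, the 2 to 5 second range, and the `fetch` callback are all assumptions made for this example; they are not Octoparse settings or recommended values.

```python
import random
import time


def wait_between_requests(min_seconds=2.0, max_seconds=5.0, sleep=time.sleep):
    """Pause for a random duration in [min_seconds, max_seconds] and return it."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("need 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


def visit_all(urls, fetch, **wait_options):
    """Call fetch(url) for each URL, waiting before every visit except the first."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            wait_between_requests(**wait_options)
        results.append(fetch(url))
    return results


# Demo with a fake fetch and a recording "sleep", so nothing is downloaded or delayed.
recorded_delays = []
pages = visit_all(
    ["page-1", "page-2", "page-3"],   # hypothetical page identifiers
    fetch=lambda url: f"content of {url}",
    sleep=recorded_delays.append,
)
print(pages)
print(len(recorded_delays))
```

A randomized delay avoids a perfectly regular request rhythm, and passing `sleep` as a parameter keeps the sketch easy to try without real waiting.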


The final workflow will be like this:


7. Start extraction - run the task and get data

  • Click Save

  • Click Run on the upper left side

  • Select Run on your device to run the task on your computer, or select Run task in the Cloud to run the task in the Cloud (for premium users only)

Here is the sample output -
