Many product web pages use image carousels (like the one below) to display multiple images as slides that you can usually flip through manually. In this tutorial, I will show you how to extract the image URLs of a carousel in the output format you need.

This tutorial uses the page linked below as an example. The same steps apply to most carousels:

Click to visit the example page

Format 1. One image URL per column

Example output:

Select one of the images, then select Image URL on the Tips Panel. Repeat the same process for each of the other images so that every URL is saved to its own field.

NOTE: On this example page, you need to select the IMG tag at the bottom of the Tips Panel to locate the image URL. Octoparse shows the Image URL option on the Tips Panel only when the IMG tag is selected.
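For readers who want to see the target shape outside of Octoparse, here is a minimal Python sketch of Format 1. It is not what Octoparse runs internally. The sample markup, the example.com URLs, and the image_url_N field names are assumptions made for illustration only.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Only IMG tags carry the image URL, mirroring the NOTE above.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


# Hypothetical carousel markup; a real page will look different.
SAMPLE_HTML = """
<ul class="carousel">
  <li><img src="https://example.com/img/1.jpg"></li>
  <li><img src="https://example.com/img/2.jpg"></li>
  <li><img src="https://example.com/img/3.jpg"></li>
</ul>
"""


def urls_to_columns(urls):
    """Format 1: a single data row with one column per image URL."""
    return {f"image_url_{i}": url for i, url in enumerate(urls, start=1)}


collector = ImgSrcCollector()
collector.feed(SAMPLE_HTML)
row = urls_to_columns(collector.urls)
print(row)
```

Each image ends up in its own named field, which is why this format requires one selection per image.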

Format 2. One image URL per data line

Example output:

It is also possible to scrape images to different lines of the same column using a loop extract action.

Step 1. Click on the first image in the carousel

Step 2. Go to the Tips Panel, select the IMG tag, then choose Select all similar elements

Step 3. Select Image URL
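The result of these steps is a single column with one data line per image. As a rough illustration of that shape only, the Python sketch below turns a list of URLs into such rows and renders them as CSV. The URL list and the image_url column name are assumed for the example and do not come from Octoparse.

```python
import csv
import io


def urls_to_rows(urls, field="image_url"):
    """Format 2: one data line per image URL, all in the same column."""
    return [{field: url} for url in urls]


# Placeholder URLs standing in for the loop-extracted values.
urls = [
    "https://example.com/img/1.jpg",
    "https://example.com/img/2.jpg",
    "https://example.com/img/3.jpg",
]

rows = urls_to_rows(urls)

# Render the rows as CSV text to make the one-URL-per-line layout visible.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["image_url"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```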

Format 3. All image URLs in one data field

Example output:

Option 1. Merge the extracted image URLs into one line

Once you've loop extracted the image URLs into different lines (following the steps in Format 2. One image URL per data line), you can merge those lines into a single line.

1) Click the More icon for the data field, then select Merge field data
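Conceptually, merging collapses the per-line values into one field value. The short Python sketch below shows that idea. The " | " separator is an assumption chosen for readability, and the separator Octoparse actually uses for merged data may differ.

```python
def merge_urls(urls, separator=" | "):
    """Join per-line URLs into one field value, keeping their order.

    The separator is a hypothetical choice for this sketch.
    """
    cleaned = (url.strip() for url in urls)
    return separator.join(url for url in cleaned if url)


# Placeholder values standing in for the loop-extracted data lines.
lines = [
    "https://example.com/img/1.jpg",
    "https://example.com/img/2.jpg",
    "https://example.com/img/3.jpg",
]

merged = merge_urls(lines)
print(merged)
```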

Option 2. Scrape the HTML code of the carousel and extract the image URLs from it with a regular expression

1) Select the entire carousel and select OuterHtml

2) Click the More icon for the field and select Clean data

3) Click Add Step and choose Matching with Regular Expression

4) Inspect the code to find the starting value and ending value of the image URL

5) Click Try the RegEx tool

6) Enter the Start with and End with values to generate a RegEx, then apply the settings

7) Tick Match all and Confirm
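The start-value and end-value idea in the steps above can be sketched in Python as follows. This is an illustration under assumptions: the sample OuterHtml, the start value src=" and the end value " are made up for the example, and the pattern built here is not necessarily the one the Octoparse RegEx tool generates.

```python
import re


def build_pattern(start, end):
    """Build a regex that captures the shortest text between start and end."""
    return re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)


def match_urls(html, start, end, match_all=True):
    """Return every match (like ticking Match all) or only the first one."""
    pattern = build_pattern(start, end)
    if match_all:
        return pattern.findall(html)
    first = pattern.search(html)
    return [first.group(1)] if first else []


# Hypothetical OuterHtml of a carousel; real markup will differ.
OUTER_HTML = (
    '<div class="carousel">'
    '<img src="https://example.com/thumbs/1.jpg" alt="front">'
    '<img src="https://example.com/thumbs/2.jpg" alt="back">'
    "</div>"
)

all_urls = match_urls(OUTER_HTML, 'src="', '"')
first_only = match_urls(OUTER_HTML, 'src="', '"', match_all=False)
print(all_urls)
print(first_only)
```

The non-greedy group keeps each match from running past the first end value, which is what lets several URLs in the same block of HTML be captured separately.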

NOTE: The image URLs scraped this way are thumbnail URLs. If you need the full image URLs, you can add further steps to reformat the field. Please check this tutorial: How to scrape the full image URLs instead of thumbnails?
