Scrape images from a carousel

Updated over 10 months ago

Many product pages use an image carousel (like the one below) to display several images as slides that you can usually flip through manually. This tutorial shows how to extract the images from a carousel in the output format you need.

c1.png

This tutorial uses the link below as an example, and the same approach applies to most carousels:


Format 1. One image URL per column

Example output:

c2.png

Simply select one of the images, and select Image URL on the Tips Panel. Repeat the same process to fetch all the other image URLs.

NOTE: On this example page, you need to select the IMG tag at the bottom of the Tips Panel to locate the image URL. Octoparse shows the Image URL option only when the IMG tag is selected.
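To make the target layout concrete, here is a minimal Python sketch of what "one image URL per column" means as data. It does not reproduce anything Octoparse does internally. The URLs and the `image_url_N` column names are hypothetical placeholders chosen for illustration.

```python
import csv
import io

# Hypothetical image URLs standing in for what the carousel would yield.
image_urls = [
    "https://example.com/img/a.jpg",
    "https://example.com/img/b.jpg",
    "https://example.com/img/c.jpg",
]


def to_wide_row(urls):
    """Return (header, row) with one column per image URL."""
    # Column names are an assumption; any naming scheme would work.
    header = [f"image_url_{i}" for i in range(1, len(urls) + 1)]
    return header, list(urls)


header, row = to_wide_row(image_urls)

# Write the single data line to an in-memory CSV to show the layout.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerow(row)
wide_csv = buffer.getvalue()
print(wide_csv)
```

In this layout every product occupies a single data line, and the number of columns grows with the number of images in the carousel.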


Format 2. One image URL per data line

Example output:

c3.png

It is also possible to scrape images to different lines of the same column using a loop extract action.

Step 1. Click on the first image in the carousel

Step 2. Go to the Tips Panel, select the IMG tag, then choose Select all similar elements

Step 3. Select Image URL
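The three steps above amount to looping over every IMG element and reading its URL into a new data line. The following Python sketch shows the same idea outside Octoparse, using only the standard library. The carousel markup and the `image_url` field name are assumptions made for illustration, and real pages will use different HTML.

```python
from html.parser import HTMLParser

# Hypothetical carousel markup; real pages will differ.
CAROUSEL_HTML = """
<div class="carousel">
  <img src="https://example.com/thumbs/a.jpg" alt="Front">
  <img src="https://example.com/thumbs/b.jpg" alt="Side">
  <img src="https://example.com/thumbs/c.jpg" alt="Back">
</div>
"""


class ImgSrcCollector(HTMLParser):
    """Collect the src of every <img> tag, one data line per image."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.rows.append({"image_url": src})


collector = ImgSrcCollector()
collector.feed(CAROUSEL_HTML)

for data_line in collector.rows:
    print(data_line["image_url"])
```

Each dictionary in `collector.rows` corresponds to one data line in the Format 2 output.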


Format 3. All image URLs in one data field

Example output:

c4.png

Option 1. Merge the extracted image URLs into one line

Once you have loop-extracted the image URLs into separate lines (following the steps under Format 2), you can merge those lines into one single line.

1) Click the More icon for the data field, then select Merge field data
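Conceptually, merging field data joins the values from all data lines into one string. The Python sketch below illustrates that under stated assumptions: the `merge_field` helper, the `image_url` field name, and the `", "` separator are all hypothetical, and the separator Octoparse actually uses may differ.

```python
# Hypothetical data lines, as produced by a loop extraction.
rows = [
    {"image_url": "https://example.com/thumbs/a.jpg"},
    {"image_url": "https://example.com/thumbs/b.jpg"},
    {"image_url": "https://example.com/thumbs/c.jpg"},
]


def merge_field(data_lines, field, separator=", "):
    """Join one field's values across all data lines into a single string."""
    # Skip lines where the field is missing or empty.
    return separator.join(line[field] for line in data_lines if line.get(field))


merged = merge_field(rows, "image_url")
print(merged)
```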

Option 2. Scrape the HTML code of the carousel and match out the image URLs from the code

1) Select the entire carousel and select OuterHtml

2) Click the More icon for the field and select Clean data

1113.png

3) Click Add Step and choose Matching with Regular Expression

1114.png

4) Inspect the code to find the starting value and ending value of the image URL

1115.png

5) Click Try the RegEx tool

116.png

6) Enter the Start with and End with values to generate a RegEx, then apply the settings

11.png

7) Tick Match all and Confirm

21.png
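The start-and-end matching in steps 4 to 7 can be sketched in Python as follows. This is an illustration of the technique, not Octoparse's own implementation: the sample HTML, the `build_pattern` helper, and the lookbehind/lookahead form of the generated expression are assumptions. The difference between `search` and `findall` mirrors leaving Match all unticked versus ticked.

```python
import re

# Hypothetical OuterHtml of a carousel; real markup will differ.
OUTER_HTML = (
    '<div class="carousel">'
    '<img src="https://example.com/thumbs/a.jpg" alt="Front">'
    '<img src="https://example.com/thumbs/b.jpg" alt="Side">'
    "</div>"
)


def build_pattern(start_with, end_with):
    """Build a regex matching the text between a start value and an end value."""
    # The lookbehind and lookahead keep both delimiters out of the match,
    # and the lazy .+? stops at the first end value after each start value.
    return re.compile(
        f"(?<={re.escape(start_with)}).+?(?={re.escape(end_with)})"
    )


pattern = build_pattern('src="', '"')

# Without "Match all": only the first URL is returned.
first_only = pattern.search(OUTER_HTML).group(0)

# With "Match all": every URL in the HTML is returned.
all_urls = pattern.findall(OUTER_HTML)

print(first_only)
print(all_urls)
```

Choosing a start value that appears only before image URLs (here `src="` rather than just `="`) keeps other attributes such as `class` and `alt` out of the results.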

NOTE: The scraped image URLs are thumbnail URLs. To get the full-size image URLs, add further steps to reformat the field. See this tutorial: How to scrape the full image URLs instead of thumbnails?