Capture images from a carousel
Updated over a week ago

Many product web pages use image carousels (like the one below) to display multiple images as slides that you can usually flip through manually. In this tutorial, I will show you how to extract the image URLs of a carousel in the layout you need: one image per column, one image per line, or all images in a single field.

c1.png

c2.png

c3.png

c4.png

You may need this link to follow through:


1. Scrape one image into one column

Simply select one of the images, and select Image URL on the Tips Panel. Repeat the same process to fetch all the other image URLs.

Note: On this example page, we need to select the IMG tag at the bottom of the Tips panel to locate the image URL. Octoparse shows the Image URL option on the Tips panel only when the IMG tag is selected.


2. Scrape images into different lines

It is also possible to scrape images to different lines of the same column using a loop extract action.

1) Select the first image -> Select the IMG tag

2) Choose Select all similar elements

3) Select Image URL
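
For readers who want to see what this loop extraction amounts to outside the Octoparse UI, here is a minimal Python sketch. It is not Octoparse's implementation: the sample markup, the `ImgSrcCollector` class, and the `extract_image_rows` function are hypothetical names made up for illustration, and the sketch assumes each image URL sits in the `src` attribute of an IMG tag.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, one entry per image."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so "IMG" and "img" both match.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


# Hypothetical carousel markup; a real page will look different.
SAMPLE_HTML = """
<ul class="carousel">
  <li><img src="https://example.com/img/a_thumb.jpg" alt="A"></li>
  <li><img src="https://example.com/img/b_thumb.jpg" alt="B"></li>
  <li><img src="https://example.com/img/c_thumb.jpg" alt="C"></li>
</ul>
"""


def extract_image_rows(html):
    """Return one image URL per row, in document order."""
    parser = ImgSrcCollector()
    parser.feed(html)
    return parser.urls


rows = extract_image_rows(SAMPLE_HTML)
for url in rows:
    print(url)  # each URL is printed on its own line
```

Each element of `rows` corresponds to one line of the same column in the extracted data.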


3. Scrape all images into one column

There are two ways to scrape all images into one column.

Option 1. Merge the extracted image URLs into one line

Once you've loop extracted the image URLs into different lines (following the steps in Scrape images into different lines), you can merge those lines into a single line.

1) Click the More icon for the data field, then select Merge field data
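
Conceptually, merging field data joins the separate lines into one value. The Python sketch below illustrates the idea only. The `merge_field_data` function and the sample URLs are hypothetical, and the separator is an assumption, since the separator Octoparse uses is not stated here.

```python
def merge_field_data(rows, separator=" "):
    """Join the values of several lines into one single-line value.

    The separator is an assumption made for this sketch.
    """
    return separator.join(rows)


# Hypothetical URLs, one per extracted line.
rows = [
    "https://example.com/img/a_thumb.jpg",
    "https://example.com/img/b_thumb.jpg",
]

merged = merge_field_data(rows)
print(merged)
```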

Option 2. Scrape the HTML code of the carousel and match out the image URLs from the code

1) Select the entire carousel and select OuterHtml

2) Click the More icon for the field and select Clean data

1113.png

3) Click Add Step and choose Matching with Regular Expression

1114.png

4) Inspect the code to find the starting value and ending value of the image URL

1115.png

5) Click Try the RegEx tool

116.png

6) Enter the Start with and End with values to generate a RegEx, then apply the settings

11.png

7) Tick Match all and Confirm

21.png
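
To make the Start with / End with idea concrete, here is a minimal Python sketch of the same technique applied to a carousel's HTML. It is an illustration under assumptions, not the expression Octoparse generates: the sample markup, the `build_pattern` and `match_all` helpers, and the start value `src="` and end value `"` are all hypothetical.

```python
import re


def build_pattern(start, end):
    """Build a regex that captures the shortest text between start and end.

    re.escape makes the two markers match literally; the lazy group (.*?)
    stops at the first occurrence of the end marker.
    """
    return re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)


def match_all(html, start, end):
    """Return every captured value, similar in spirit to ticking Match all."""
    return build_pattern(start, end).findall(html)


# Hypothetical OuterHtml of a carousel; inspect the real page for its markers.
OUTER_HTML = (
    '<ul class="carousel">'
    '<li><img src="https://example.com/img/a_thumb.jpg" alt="A"></li>'
    '<li><img src="https://example.com/img/b_thumb.jpg" alt="B"></li>'
    "</ul>"
)

image_urls = match_all(OUTER_HTML, start='src="', end='"')
print(image_urls)
```

Choosing markers that appear only around the image URLs matters: a start value as short as `"` would also match other attributes such as `class` or `alt`.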

Note: The image URLs scraped are thumbnail URLs. If you need to get the full image URLs, you can continue to add steps to reformat the field. Please check this tutorial:
