
How to scrape the full image URLs instead of the thumbnails?

Updated over 9 months ago

Sometimes when we scrape image URLs from a website, all we get is the URL of a thumbnail rather than the URL of the full-size picture.

Here is a picture scraped from Amazon. As you can see, the image is too small to show any detail.

mceclip3.png

To get the full-size images, all we need to do is modify the image URLs that we already have, following the steps below.

If you would like to know how to scrape the image URLs in the first place, refer to this tutorial first: Scrape images from a carousel

1. Observe the difference between the full image URL and the thumbnail URL

The URLs for different sizes of the same image usually differ only slightly. We need to find the part that differs between the full image URL and the thumbnail URL.

  • For example, a thumbnail URL on Amazon looks like this:

The full image URL is:

The thumbnail URL contains 'SR38,50', which the full image URL does not. We just need to delete it from the URL.

  • In some cases, the image URL contains a size value such as 85X85 that indicates the dimensions of the image.

You can try replacing "85X85" with "1000X1000" and check whether the URL is still valid.
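The comparison described above can also be done with a short script. The following is a minimal Python sketch, not part of Octoparse: the helper name `differing_segment` and the `img.example.com` URLs are made up for illustration, and it assumes the two URLs differ in one contiguous segment only.

```python
import os


def differing_segment(thumb_url, full_url):
    """Return the parts of the two URLs that differ, as (thumb_part, full_part).

    Assumes the URLs share a common prefix and suffix and differ in a single
    contiguous segment in between.
    """
    # Longest shared beginning of the two URLs.
    prefix = os.path.commonprefix([thumb_url, full_url])
    thumb_rest = thumb_url[len(prefix):]
    full_rest = full_url[len(prefix):]

    # Longest shared ending, found by comparing the reversed remainders.
    suffix_len = len(os.path.commonprefix([thumb_rest[::-1], full_rest[::-1]]))

    thumb_part = thumb_rest[:len(thumb_rest) - suffix_len]
    full_part = full_rest[:len(full_rest) - suffix_len]
    return thumb_part, full_part


# Hypothetical URLs that follow the two patterns described above.
print(differing_segment(
    "https://img.example.com/I/item001._SR38,50_.jpg",
    "https://img.example.com/I/item001.__.jpg",
))  # ('SR38,50', '')

print(differing_segment(
    "https://img.example.com/I/item001_85X85.jpg",
    "https://img.example.com/I/item001_1000X1000.jpg",
))  # ('85X85', '1000X1000')
```

The first value is the text to look for in the thumbnail URL, and the second is what to replace it with. An empty second value means the text only needs to be deleted.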

2. Use the Octoparse Clean Data function to reformat the thumbnail URL into a full URL

  • Click the More (...) button, then click Clean data

  • Add a Replace step

2.png
  • Type the value you want to replace (SR38,50, for example) in the Replace box

  • Type the value you want to replace it with in the With box

3.png

(In the case of the Amazon image URL, you need to delete SR38,50, which means replacing it with nothing. So you just need to leave the With box empty.)

  • Click Confirm to save

  • Click Apply to save the settings

4.png

The final results will then contain the full image URLs you need.

5.png
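If the data has already been exported, the same find-and-replace can be applied afterwards. The sketch below is a rough Python equivalent of the Replace step, not Octoparse's own implementation: the function name `to_full_url` and the sample URLs are hypothetical, and it assumes every thumbnail URL contains the same text to replace.

```python
def to_full_url(thumb_url, find, replace_with=""):
    """Replace `find` in the URL with `replace_with`.

    Leaving `replace_with` empty deletes `find`, like an empty With box.
    """
    return thumb_url.replace(find, replace_with)


# Hypothetical thumbnail URLs, as they might appear in exported results.
thumbnails = [
    "https://img.example.com/I/item001._SR38,50_.jpg",
    "https://img.example.com/I/item002._SR38,50_.jpg",
]

# Delete the size text from every URL.
full_urls = [to_full_url(url, "SR38,50") for url in thumbnails]
print(full_urls)

# Swap one size value for another.
print(to_full_url("https://img.example.com/I/item003_85X85.jpg", "85X85", "1000X1000"))
```

A rewritten URL is not guaranteed to exist on the server, so it is worth opening a few of the results in a browser before relying on them.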
