
Scrape Product Variants

Updated over 10 months ago

When you scrape e-commerce product data, you will often find products that come in several options, and you will want to collect the price, SKU, and other details for each variant. For a hair dye product, for example, you may need the price of each color.

This tutorial shows how to scrape information about different product variants with Octoparse, using this web page URL as an example:

For this product, the color, price, images, page URL, and product ID all change when you switch between options.

The main steps are shown in the menu on the right. You can download the demo task here.


1. Create a Go to Webpage - to open the target website

  • Enter the URL on the home page and click Start


2. Create a Loop Item - to loop through each color option

  • Click the first color option on the list, and then choose Select all similar elements on the Tips panel

  • Choose Loop click each image (or Loop click each element)

  • Choose No

  • Click Apply to save

  • (Optional) Click Loop Item and change the "Loop Mode" from Fixed List to Variable List, then enter the Element XPath: //a[contains(@class,"cover-swatch js-cover-swatch")]. This matters when the task covers several products that have different numbers of colors.

Tip: The XPath above only works for the example web page used in this tutorial. For your own target websites, you will need to write your own XPath. To learn how, see this tutorial: What is XPath and how to use it in Octoparse
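To make the XPath easier to adapt, it may help to see what contains(@class, "...") actually tests: a plain substring match on the class attribute. The sketch below imitates that test with Python's standard html.parser module. The sample HTML, the SwatchFinder class, and the find_swatches helper are all hypothetical and invented for illustration; the real page's markup may differ, and this is not how Octoparse evaluates XPath internally.

```python
from html.parser import HTMLParser

# Hypothetical markup, invented for illustration only.
SAMPLE_HTML = """
<div class="swatches">
  <a class="cover-swatch js-cover-swatch" href="/p?color=black" title="Black"></a>
  <a class="cover-swatch js-cover-swatch is-selected" href="/p?color=brown" title="Brown"></a>
  <a class="size-swatch" href="/p?size=large" title="Large"></a>
</div>
"""


class SwatchFinder(HTMLParser):
    """Collects the title of every <a> tag whose class contains a substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # contains(@class, "...") is a substring test on the raw attribute value.
        if self.needle in (attr_map.get("class") or ""):
            self.matches.append(attr_map.get("title"))


def find_swatches(html, needle="cover-swatch js-cover-swatch"):
    """Return the titles of the links that the substring test would match."""
    finder = SwatchFinder(needle)
    finder.feed(html)
    return finder.matches


print(find_swatches(SAMPLE_HTML))  # ['Black', 'Brown']
```

Because the test is a substring match, an element with extra classes after the two names still matches, but an element that lists the same classes in a different order does not. Keep this in mind when you adapt the XPath to another site.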


3. Extract Data - to extract all product-related data

You can click on the elements on the page to extract the data you need and rename the data fields if needed.
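The logic of steps 2 and 3 can be sketched in a few lines of Python: loop over each option, "click" it, then read the fields that changed. This is only a conceptual model, not Octoparse code. The VariantRow fields, the FAKE_PAGE dictionary, and its placeholder values are hypothetical stand-ins for the real page.

```python
from dataclasses import dataclass, asdict


@dataclass
class VariantRow:
    """One output row per product variant (field names are illustrative)."""
    color: str
    price: str
    sku: str
    page_url: str


# Hypothetical stand-in for the product page: each key is a color option and
# each value is what the page would show after that option is clicked.
FAKE_PAGE = {
    "Color A": {"price": "PRICE_A", "sku": "SKU_A", "page_url": "URL_A"},
    "Color B": {"price": "PRICE_B", "sku": "SKU_B", "page_url": "URL_B"},
}


def scrape_variants(page):
    """Visit every option in turn and collect one row for each."""
    rows = []
    # Loop Item: iterate over the color options.
    for color, state in page.items():
        # Clicking the option updates the page; Extract Data then reads it.
        rows.append(VariantRow(color=color, **state))
    return rows


rows = scrape_variants(FAKE_PAGE)
for row in rows:
    print(asdict(row))
```

The point of the sketch is the shape of the result: one row per variant, with the same set of fields in every row.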

The final workflow looks like this:


4. Run the task - to get your desired data

  • Click Save on the upper right to save your task

  • Click Run next to it and wait for a Run Task window to pop up

  • Select Run on your device to run the task on your local device

  • Wait for the task to complete

Here is the data output sample.
