If you scrape e-commerce data, especially product data, you may run into a situation like this:
A product comes in several options, and you want to collect each variant's price, SKU, and other details. Taking this hair dye product as an example, you may need to scrape its price for each color.
This tutorial shows how to scrape information about different product variants with Octoparse, using this web page URL as an example:
For this product, the color, price, images, page URL, and product ID all change when you switch options.
The main steps are shown in the menu on the right. You can download the demo task here.
1. Create a Go to Webpage - to open the target website
Enter the URL on the home page and click Start
2. Create a Loop Item - to loop through each color option
Click the first color option on the list, and then choose Select all similar elements on the Tips panel
Choose Loop click each image (or Loop click each element)
Choose No
Set up an AJAX timeout (Learn more about Handling AJAX)
Make sure the Open in a new tab option is not selected
Click Apply to save
(Optional) Click Loop Item to change the "Loop Mode" from Fixed List to Variable List. Then, enter the Element XPath: //a[contains(@class,"cover-swatch js-cover-swatch")]. This matters when you scrape several products that have different numbers of colors.
Tip: The XPath above only works for the example web page used in this tutorial. For your own target websites, you will need to write your own XPath. Check out this tutorial to learn how: What is XPath and how to use it in Octoparse
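To see what the contains(@class, ...) test in that XPath does, here is a minimal Python sketch using only the standard library. The HTML snippet, the SwatchFinder class, and the find_swatches helper are all made up for illustration; they are not the real page markup and not part of Octoparse. The sketch only mimics the substring check that contains() performs on the class attribute.

```python
from html.parser import HTMLParser

# Hypothetical markup, loosely modeled on the swatch links the XPath targets.
SAMPLE_HTML = """
<ul>
  <li><a class="cover-swatch js-cover-swatch" href="/p?color=black">Black</a></li>
  <li><a class="cover-swatch js-cover-swatch is-selected" href="/p?color=brown">Brown</a></li>
  <li><a class="js-cover-swatch cover-swatch" href="/p?color=red">Red</a></li>
  <li><a class="size-option" href="/p?size=l">Large</a></li>
</ul>
"""


class SwatchFinder(HTMLParser):
    """Collect href values of <a> tags whose class attribute contains a substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        class_value = attr_map.get("class") or ""
        # Same idea as contains(@class, needle): a plain substring check,
        # so the order of class names inside the attribute matters.
        if self.needle in class_value:
            self.matches.append(attr_map.get("href"))


def find_swatches(html, needle="cover-swatch js-cover-swatch"):
    """Return the hrefs of all links whose class attribute contains `needle`."""
    finder = SwatchFinder(needle)
    finder.feed(html)
    return finder.matches


matched = find_swatches(SAMPLE_HTML)
print(matched)
```

In this made-up snippet, the third link carries the same two class names in reverse order, so the full two-class string does not match it, while a shorter value such as "cover-swatch" does. When you write an XPath for your own site, it is worth checking in the browser which class string the option elements actually carry.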
3. Extract Data - to extract all product-related data
Click the elements on the page to extract the data you need, and rename the data fields if needed.
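Conceptually, steps 1 to 3 build a loop that clicks each option and then reads the fields shown for it, producing one row per variant. The Python sketch below models that logic only. FakePage, run_workflow, and all field values are hypothetical placeholders, not Octoparse internals and not real product data.

```python
from dataclasses import dataclass


@dataclass
class FakePage:
    """Hypothetical stand-in for the product page; not an Octoparse API."""
    variants: dict   # option name -> fields shown when that option is selected
    selected: str    # option currently selected on the page

    def click(self, option):
        # Clicking a swatch updates the same page in place (no new tab).
        self.selected = option

    def extract(self):
        # Read the fields currently displayed for the selected option.
        row = {"color": self.selected}
        row.update(self.variants[self.selected])
        return row


def run_workflow(page):
    """Mimic Loop Item -> Click -> Extract Data: one output row per option."""
    rows = []
    for option in list(page.variants):   # Loop Item: visit every color option
        page.click(option)               # Click: switch to that option
        rows.append(page.extract())      # Extract Data: capture its fields
    return rows


# Placeholder values only, chosen to show that each option yields its own row.
page = FakePage(
    variants={
        "color-A": {"price": "price-A", "product_id": "id-A"},
        "color-B": {"price": "price-B", "product_id": "id-B"},
        "color-C": {"price": "price-C", "product_id": "id-C"},
    },
    selected="color-A",
)
rows = run_workflow(page)
for row in rows:
    print(row)
```

Because extraction happens inside the loop, each row reflects the page state after its own click, which is why the Extract Data step belongs inside the Loop Item in the workflow.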
The final workflow looks like this:
4. Run the task - to get your desired data
Click Save on the upper right to save your task
Click Run next to it and wait for a Run Task window to pop up
Select Run on your device to run the task on your local device
Wait for the task to complete
Here is a sample of the data output.