If you scrape e-commerce data, especially product data, you will often run into products that are sold in several variants.
For products with different options, you may want to collect the price, SKU, and other details of each variant. For a shampoo sold in several sizes, for example, you may need to scrape the price of each size.
In this tutorial, we will show you how to scrape information about different product variants with Octoparse. We will use this web page URL as an example:
For this product, the color, pricing, images, rating, and product description all change when you switch to a different option.
The main steps are shown in the menu on the right. You can download the demo task here.
1. Create a Go to Webpage - to open the target website
2. Create a Loop Item - to loop through each color option
Click the first color option on the list, and then choose Select all similar elements on the Tips panel.
Choose No.
Set up AJAX timeout (Learn more about Handling AJAX).
Make sure the Open in a new tab option is unchecked.
Click Apply to save.
(Optional) Click Loop Item to change the "Loop Mode" from Fixed List to Variable List.
Then, enter the following Element XPath: //li[contains(@class, "product-variant-swatch")]
Click Apply to save.
Tip: The XPath above only works for the example web page used in this tutorial. For your own target website, you will need to write an XPath that matches its variant options. Check out this tutorial to learn how to write it: What is XPath and how to use it in Octoparse
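To see what the XPath above selects, here is a minimal Python sketch that applies the same rule by hand: keep every li element whose class attribute contains the text product-variant-swatch. It is not an XPath engine and not how Octoparse evaluates the expression internally. The sample markup, the SwatchFinder class, and the find_swatches function are made up for illustration.

```python
from html.parser import HTMLParser


class SwatchFinder(HTMLParser):
    """Collects <li> elements whose class attribute contains a given substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []  # class attribute of every matching <li>

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        class_value = dict(attrs).get("class") or ""
        # Same test as contains(@class, "..."): a plain substring check
        if self.needle in class_value:
            self.matches.append(class_value)


# Hypothetical markup, invented for this example only
SAMPLE_HTML = """
<ul>
  <li class="product-variant-swatch selected">Option 1</li>
  <li class="product-variant-swatch">Option 2</li>
  <li class="product-variant-swatch-disabled">Option 3</li>
  <li class="size-option">Option 4</li>
</ul>
"""


def find_swatches(html, needle="product-variant-swatch"):
    finder = SwatchFinder(needle)
    finder.feed(html)
    return finder.matches


matches = find_swatches(SAMPLE_HTML)
print(len(matches))  # 3
```

Note that contains() is a substring match, so the third item (class product-variant-swatch-disabled) is matched as well. If your target page has similarly named classes that you do not want in the loop, you may need a stricter XPath.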
3. Extract Data - to extract all product-related data
Select Click Item to enter the subpage.
Click on your targeted data on the page and click Text to extract the data.
Rename the data fields if needed.
The final workflow looks like this:
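In outline, the workflow is a Go to Webpage step followed by a Loop Item that contains a Click Item and an Extract Data step. As a rough analogy only, the Python sketch below mimics that structure with a stand-in page object. FakeProductPage, run_workflow, and all option names and values are hypothetical placeholders, not Octoparse APIs or real product data.

```python
class FakeProductPage:
    """Stand-in for a product page whose details change with the selected option."""

    def __init__(self, variants):
        self._variants = variants  # option name -> fields shown for that option
        self._current = None

    def options(self):
        return list(self._variants)

    def click(self, option):
        # Selecting an option updates what the page displays
        self._current = option

    def visible_fields(self):
        return {"option": self._current, **self._variants[self._current]}


def run_workflow(page):
    """Loop Item -> Click Item -> Extract Data, producing one row per option."""
    rows = []
    for option in page.options():           # Loop Item
        page.click(option)                  # Click Item
        rows.append(page.visible_fields())  # Extract Data
    return rows


# Placeholder values; a real run would capture whatever the page shows
page = FakeProductPage({
    "Option A": {"price": "PRICE_A", "sku": "SKU_A"},
    "Option B": {"price": "PRICE_B", "sku": "SKU_B"},
})

rows = run_workflow(page)
print(len(rows))  # 2
```

The point of the sketch is that extraction happens inside the loop, after each click, so every variant yields its own row in the output.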
4. Run the task - to get your desired data
Click Save and Run the Task.
Select Run on your device to run the task on your local device.
Wait for the task to complete.
Here is a sample of the data output: