Scrape product info from Shein

You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier and more robust! Download and upgrade here if you haven't already done so!

SHEIN is an online fast-fashion retailer that has had a large impact on the fast-fashion industry and is currently very popular on TikTok. It mainly sells women's wear at much lower prices than most fast-fashion brands.

To follow along, you may want to use the URL in the tutorial:

We will scrape data such as the Product Name, Price, Image URL, SKU, Number of reviews, and Scores.
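If you plan to post-process the exported data in code, it can help to decide on the shape of one record up front. The following is only a sketch: the class name `SheinProduct`, the field names, and the types are assumptions made for illustration and are not defined by Octoparse or by this tutorial.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class SheinProduct:
    """One scraped product row (hypothetical field names and types)."""
    product_name: str
    price: str                          # kept as text, e.g. "$12.00", until cleaned
    image_url: str
    sku: str
    review_count: Optional[int] = None  # may be missing for new products
    score: Optional[float] = None       # average review score, if shown


# Made-up example values, only to show the record shape.
example = SheinProduct(
    product_name="Example Dress",
    price="$12.00",
    image_url="https://example.com/img.jpg",
    sku="sk0000000000",
)
print(asdict(example))
```

Keeping the price as text at this stage avoids guessing the currency format before you have inspected the real output.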

The main steps are shown in the menu on the right, and you can download the sample task file here.


1. Go to Web Page - to open the targeted web page

  • Enter the URL on the home page and click Start


2. Auto-detect web page - to create a workflow

  • Choose Auto-detect web page data

  • Wait for the detection to complete

  • Untick Add a page scroll

  • Click the Create workflow button on the Tips panel


Check the data fields in the Data Preview. You can delete unwanted fields or rename fields if needed:

  • Delete fields

  • Rename fields


3. Select subpage URL - to extract detailed product information

  • Choose Select subpage URL on the Tips panel

  • Select "Title_URL" from the drop-down menu (you can confirm that it is the correct link in the Data Preview)

  • Click Confirm


4. Extract data - to select the data for extraction

  • Click on the data you want to extract from the page

  • Select Text on the Tips panel

  • Repeat these steps until you have selected all the data you want to scrape

  • Edit the name of the data fields if needed

  • Add a wait time before each step, adjusted to the speed of your local network

  • Click the Extract Data step, click the vertical view icon, and change the XPath of the Product_name and Title_URL fields to /div[@class="S-product-item__info"]/div/a
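To see what an XPath of this shape selects, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. The markup in `SAMPLE_ITEM` is a simplified, hypothetical product card, not the real Shein page, and the helper name `extract_name_and_url` is made up for illustration. ElementTree supports only a subset of XPath and needs a leading `.` for a path relative to the current element, so the expression is written as `./div[...]/div/a` here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for one product card (an assumption;
# the real page structure is more complex and may change).
SAMPLE_ITEM = """
<section class="S-product-item">
  <div class="S-product-item__info">
    <div>
      <a href="/example-product-p-123.html">Example Dress</a>
    </div>
  </div>
</section>
"""


def extract_name_and_url(item_xml):
    """Return the link text and href found relative to one list item."""
    item = ET.fromstring(item_xml)
    # Same path as in the tutorial, evaluated relative to the list item.
    link = item.find("./div[@class='S-product-item__info']/div/a")
    if link is None:
        return None  # the item does not match the expected structure
    return {
        "Product_name": (link.text or "").strip(),
        "Title_URL": link.get("href"),
    }


result = extract_name_and_url(SAMPLE_ITEM)
print(result)
```

Both fields come from the same `<a>` element: the link text provides the product name and the `href` attribute provides the subpage URL, which is why the two fields share one XPath.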


5. Run task - get the data you want

  • Click Save, then click Run in the upper-right corner

  • Select Run on your device to run the task on your computer

Here is the sample output:
