You are reading the tutorial for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend upgrading, because the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so!
SHEIN is an online fast-fashion retailer that has had a major impact on the fast-fashion industry and is currently very popular on TikTok. It mainly sells women's wear at low prices.
To follow along, you may want to use the URL in the tutorial:
We will scrape data such as the Product Name, Price, Image URL, SKU, Number of reviews, and Scores.
The main steps are shown in the menu on the right, and you can download the sample task file here.
1. Go to Web Page - to open the targeted web page
Enter the URL on the home page and click Start
2. Auto-detect web page - to create a workflow
Choose Auto-detect web page data
Wait for the detection to complete
Untick Add a page scroll
Click the Create workflow button on the Tips panel
Check the data fields in the Data Preview; delete or rename any fields as needed
3. Select subpage URL - to extract detailed product information
Choose Select subpage URL on the Tips panel
Select "Title_URL" from the drop-down menu (you can confirm that it is the correct link in the Data Preview)
Click Confirm
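For readers who want to see what this step does conceptually: each row of the list page carries a link in Title_URL, and the task then visits that link to reach the product detail page. The Python sketch below is only an illustration of that idea, not Octoparse's actual implementation. The base address, the sample rows, and the helper name subpage_urls are all hypothetical.

```python
from urllib.parse import urljoin

# Assumed address of the listing page (a placeholder, not the tutorial's URL).
LIST_PAGE_URL = "https://www.example.com/women-dresses.html"

# Hypothetical rows, shaped like what the Data Preview might show.
rows = [
    {"Title": "Example Dress A", "Title_URL": "/example-dress-a-p-0001.html"},
    {"Title": "Example Dress B", "Title_URL": "https://www.example.com/example-dress-b-p-0002.html"},
]


def subpage_urls(rows, base_url):
    """Return the absolute detail-page URL for every row that has a Title_URL."""
    return [
        urljoin(base_url, row["Title_URL"])  # relative links are resolved against the list page
        for row in rows
        if row.get("Title_URL")
    ]


urls = subpage_urls(rows, LIST_PAGE_URL)
for url in urls:
    print(url)
```

Checking the printed links against the Data Preview is a quick way to confirm that the right field was chosen as the subpage URL.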
4. Extract data - to select the data for extraction
Click on the data you want to extract from the page
Select Text on the Tips panel
Repeat these steps until you have selected all the data to be scraped
Edit the name of the data fields if needed
Add some wait time before each step, depending on the speed of your local network
Click on Extract Data, click the vertical view icon, and change the XPath for Product_name and Title_URL fields to /div[@class="S-product-item__info"]/div/a
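To get a feel for what an XPath of this shape selects, here is a minimal Python sketch using only the standard library. The markup is a simplified, hypothetical stand-in for one product card (the real page is more complex), and the product name and link in it are placeholders. Note that Python's ElementTree supports only a subset of XPath and searches relative to a starting element, so the expression is written with a leading ".//" here; this is an adaptation for the sketch, not the exact string to enter in Octoparse.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for a single product card.
SAMPLE_ITEM = """
<section class="S-product-item">
  <div class="S-product-item__info">
    <div>
      <a href="/example-product-p-0000.html">Example Product Name</a>
    </div>
  </div>
</section>
""".strip()

item = ET.fromstring(SAMPLE_ITEM)

# Find the <a> inside the first child <div> of the info container.
link = item.find(".//div[@class='S-product-item__info']/div/a")

product_name = link.text.strip()  # the link text feeds the Product_name field
title_url = link.get("href")      # the href attribute feeds the Title_URL field

print(product_name)
print(title_url)
```

Both fields come from the same anchor element: one reads its text and the other reads its href, which is why a single XPath can serve both.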
5. Run task - get the data you want
Click Save, then click Run in the upper right corner
Select Run on your device to run the task on your computer
Here is the sample output: