This tutorial is written for the latest version of Octoparse. If you are running an older version, we strongly recommend upgrading, because the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so!
Amazon is one of the most popular e-commerce websites around the world. Many users try to scrape it to collect product information. In this tutorial, we are going to show you how to scrape product details from Amazon.
You can also go to the Templates section on the main screen of the Octoparse scraping tool and start with the ready-to-use Amazon templates to save time. Octoparse provides several Amazon templates designed for different countries, such as Germany, France, the US, Spain, and India. With a template, there is no need to configure a scraping task yourself. For further details, see: Templates
If you would like to know how to build the task from scratch, continue reading the tutorial below or watch the video below.
To follow along, you may want to use this URL in the tutorial:
The main steps are shown in the menu on the right, and you can download the sample task here.
1. Go to Web Page - to open the targeted web page
Enter the URL on the home page and click Start
2. Auto-detect the web page - to create the workflow
A Pagination action and a Loop Item are generated automatically in the workflow.
Click More, then Delete field, to remove unwanted data
Double-click to rename data fields
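As a rough mental model of the workflow generated in this step, the Pagination action can be pictured as an outer loop over listing pages and the Loop Item as an inner loop over the products on each page. The sketch below is only an illustration under that assumption: the PAGES data, the field names, and the run_workflow function are hypothetical and are not part of Octoparse.

```python
# Hypothetical in-memory stand-in for two paginated listing pages.
PAGES = [
    [{"title": "Item A", "price": "$10"}, {"title": "Item B", "price": "$12"}],
    [{"title": "Item C", "price": "$8"}],
]


def run_workflow(pages):
    """Collect one row per product, visiting pages in order."""
    rows = []
    for page in pages:        # Pagination: move from one listing page to the next
        for item in page:     # Loop Item: visit each product block on the page
            # Extract Data: keep only the wanted fields, under renamed headers
            rows.append({"Title": item["title"], "Price": item["price"]})
    return rows


rows = run_workflow(PAGES)
for row in rows:
    print(row)
```

Deleting a field in the task corresponds to leaving it out of the row, and renaming a field corresponds to changing the key used for it.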
If all the data you need can be scraped from the listing page, you can stop here and jump to Set up AJAX timeout for "Click to Paginate". If you want to open each product detail page to get more information, follow the steps below.
3. Click on each product link - to scrape more information
Click the second item on the page and choose Click element on the Tips panel
This is what the workflow should look like:
Click the Click Item action and paste the new XPath: //a[@class="a-link-normal s-no-outline"]
Click Apply
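To see what this XPath selects, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The markup in SAMPLE is a made-up, simplified fragment and not Amazon's actual HTML, and product_links is a hypothetical helper. ElementTree supports only a subset of XPath and needs a relative path, so the expression is written with a leading dot.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing markup (well-formed so ElementTree can parse it).
SAMPLE = """
<div>
  <div class="s-result-item">
    <a class="a-link-normal s-no-outline" href="/dp/EXAMPLE1"><img alt="Product one"/></a>
    <a class="a-link-normal" href="/dp/EXAMPLE1#reviews">reviews</a>
  </div>
  <div class="s-result-item">
    <a class="a-link-normal s-no-outline" href="/dp/EXAMPLE2"><img alt="Product two"/></a>
  </div>
</div>
"""


def product_links(markup):
    """Return the href of every <a> whose class attribute matches exactly."""
    root = ET.fromstring(markup)
    anchors = root.findall('.//a[@class="a-link-normal s-no-outline"]')
    return [a.get("href") for a in anchors]


links = product_links(SAMPLE)
print(links)
```

Because the predicate compares the whole class attribute, a link whose class is only "a-link-normal" is not selected, which is what keeps the loop to one link per product in this sample.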
4. Extract Data - to extract data from the detail pages
Select information on the web page
Choose Text
Repeat the above steps to extract all the data you need
5. Set up AJAX timeout for "Click to Paginate"
Open the settings of Click to Paginate
Go to Options
Tick Load with AJAX and select 10s as the AJAX timeout
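One way to picture the AJAX timeout is as an upper bound on how long to wait for the page content to change after the pagination click. The sketch below illustrates that idea only; it is an assumption about the general technique, not a description of how Octoparse implements the setting, and wait_for_update and read_state are hypothetical names.

```python
import time


def wait_for_update(read_state, previous, timeout_s=10.0, poll_s=0.1):
    """Poll read_state() until it differs from previous or timeout_s elapses.

    Returns True if the content changed in time, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if read_state() != previous:
            return True
        time.sleep(poll_s)
    return False


# Usage with a fake page whose content changes on the third read.
reads = {"count": 0}


def fake_read():
    reads["count"] += 1
    return "page-2" if reads["count"] >= 3 else "page-1"


changed = wait_for_update(fake_read, "page-1", timeout_s=1.0, poll_s=0.01)
print(changed)
```

Under this model, a timeout that is too short risks moving on before the next page of results has loaded, while a longer one only costs time when the page never updates.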
6. Run extraction - run your task and get data
Click Save
Click Run on the upper left side
Select Run on your device to run the task on your computer, or select Run task in the Cloud to run the task in the Cloud (for premium users only)
Here is the sample output.