Clicking through each link in a list and scraping data from the page it opens is a common web scraping scenario. This tutorial shows you how to click through from a listing page to each detail page and extract the data you need. This is especially useful when extracting data from e-commerce sites (Amazon, eBay, etc.) and business directories (Yellow Pages, etc.).
You may need this link to follow along:
Step 1
Enter the URL in Octoparse and click Start.
Step 2
Click on the first product title that contains the product page URL. The selected title will be highlighted in green while all the other similar product titles will be highlighted in red.
Step 3
Click Select all similar elements on the Tips panel.
Note: If there is no Select all option on the Tips panel after you select the first URL, go on to select the second URL.
Step 4
Select Loop click each element, or Loop click each URL from the Tips panel.
Step 5
When this pop-up appears, click Yes and then the Next page button.
Note: If your target data is displayed on a single page only, you can click No and skip Step 6.
Step 6
Scroll to the bottom of the webpage, click the next page button, and then click Confirm.
The steps will then be auto-generated and added to the workflow.
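For readers who think in code, the generated workflow is roughly a pagination loop with a click-each-item loop nested inside it. The sketch below is only an illustration of that structure, not Octoparse's implementation: the in-memory LISTING_PAGES and DETAIL_PAGES dictionaries, the URLs, and the run_workflow function are all hypothetical stand-ins for a real site.

```python
# Hypothetical in-memory "site" (an assumption for illustration only).
# Each listing page holds item links and an optional next-page link.
LISTING_PAGES = {
    "/list?page=1": {"items": ["/item/1", "/item/2"], "next": "/list?page=2"},
    "/list?page=2": {"items": ["/item/3"], "next": None},
}
DETAIL_PAGES = {
    "/item/1": {"title": "Item one", "price": "10.00"},
    "/item/2": {"title": "Item two", "price": "12.50"},
    "/item/3": {"title": "Item three", "price": "8.75"},
}


def run_workflow(start_url, max_pages=10):
    """Outer loop follows the next-page link; inner loop opens each item."""
    rows = []
    url = start_url
    pages_seen = 0
    while url is not None and pages_seen < max_pages:
        listing = LISTING_PAGES[url]
        for item_url in listing["items"]:       # like "Loop click each element"
            detail = DETAIL_PAGES[item_url]     # like opening the detail page
            rows.append({"url": item_url, **detail})  # like "Extract Data"
        url = listing["next"]                   # like clicking the next page button
        pages_seen += 1
    return rows


if __name__ == "__main__":
    for row in run_workflow("/list?page=1"):
        print(row)
```

The max_pages cap is an assumed safeguard so the loop always ends even if a site keeps offering a next page.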
Note: To loop-click through all the links on the list, it is important that you select the anchor element. Octoparse automatically identifies the tags of selected items, so when you select an item with a URL, the selected tag should be "A", which stands for an anchor that usually links one page to another.
If Octoparse does not locate the A tag, you can click the "A" on the Tips panel.
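To see why the anchor tag matters, here is a minimal sketch in Python that uses only the standard library. It assumes a made-up listing snippet (SAMPLE_LISTING) and hypothetical names (AnchorCollector, collect_links); it shows that only A elements carry an href that leads to a detail page, while a title without an anchor yields nothing to click through.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects absolute URLs from the href of every <a> tag it sees."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":  # only anchors link one page to another
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the listing page URL.
            self.links.append(urljoin(self.base_url, href))


# Made-up listing markup (an assumption, not taken from any real site).
SAMPLE_LISTING = """
<ul>
  <li><a href="/item/1"><span class="title">Item one</span></a></li>
  <li><a href="/item/2"><span class="title">Item two</span></a></li>
  <li><span class="title">No link here</span></li>
</ul>
"""


def collect_links(html, base_url):
    parser = AnchorCollector(base_url)
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    print(collect_links(SAMPLE_LISTING, "https://example.com/list"))
```

The third list item has a title but no anchor, so it produces no link. This is the same situation as selecting a SPAN instead of the A tag in the Tips panel.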
Step 7
Then click on the target data fields, such as title, review, and price, to extract them.
Note: Setting a wait time in Options for steps like "Click Item" or "Extract Data" helps prevent skipped data and makes the crawling process more human-like (2-5 seconds usually works well). Then click Apply to confirm.
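The same idea can be sketched in a hand-written scraper as a pause before each step. The function below is a hypothetical helper, not an Octoparse setting. Picking a random delay within the 2-5 second range is an assumption made here to mimic human pacing; a fixed delay would also work.

```python
import random
import time


def wait_before_step(min_seconds=2.0, max_seconds=5.0, sleep=time.sleep):
    """Pause for a random duration in [min_seconds, max_seconds].

    The sleep parameter is injectable so the helper can be tested
    without actually waiting.
    """
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


if __name__ == "__main__":
    waited = wait_before_step()
    print(f"Waited {waited:.1f} seconds before the next step")
```

Calling wait_before_step() before each page load or extraction gives the page time to finish rendering, which is the usual cause of skipped fields.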
The final workflow should look like this:
Note: If you encounter any issues getting your task to run properly, here are some tutorials to help you troubleshoot.