Many websites use a "Load More" or "Show More" button to load content continuously. It is a common technique for creating a smoother user experience.
Unlike pagination with a "Next" button, a "Load More" button keeps appending content to the same web page, which makes it trickier to scrape. In this article, I will show you how to deal with the "Load More" button in Octoparse.
You may need this example link to follow along:
1. Use Auto-detect to deal with the "Load More" button
Octoparse's Auto-detect web page data feature can handle this type of website with little manual setup.
Click Auto-detect web page data and wait for the process to complete.
You will see a Click on a "Load More" button option in the Tips panel.
Click Check to see if the Load More button has been located correctly. If not, click Edit to choose the right button.
Click Edit to set the number of clicks, which is how many times you want Octoparse to click the Load More button.
Set up the AJAX timeout, which is how long Octoparse waits for the page to load after the button is clicked.
Click Create workflow to generate the settings.
The workflow should look like the picture below:
2. Create a pagination action manually
If auto-detection does not find the Load More button, you can create the pagination step manually.
Select the Load More button on the web page and choose Loop click.
Set up a proper AJAX timeout so that the page has enough time to load new content after each click.
A Pagination step will be created in the workflow, and you can then add other steps to extract data.
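To make the roles of the click count and the AJAX timeout more concrete, here is a minimal Python sketch of what a "Load More" loop does conceptually. It is not Octoparse's implementation: `FakePage`, `load_all`, `max_clicks`, and `ajax_timeout` are hypothetical names made up for illustration, the page is simulated in memory, and the timeout is assumed to be an upper bound on how long to wait for new content after each click.

```python
import time
from typing import List, Optional


class FakePage:
    """Hypothetical stand-in for a web page with a "Load More" button."""

    def __init__(self, total_items: int, batch_size: int) -> None:
        self._total = total_items
        self._batch = batch_size
        # The first batch is visible as soon as the page opens.
        first = min(batch_size, total_items)
        self.items: List[str] = [f"item-{i}" for i in range(first)]

    def has_load_more(self) -> bool:
        # The button disappears once everything has been loaded.
        return len(self.items) < self._total

    def click_load_more(self) -> None:
        # Append the next batch to the same page, as a real site would via AJAX.
        start = len(self.items)
        end = min(start + self._batch, self._total)
        self.items.extend(f"item-{i}" for i in range(start, end))


def load_all(page: FakePage,
             max_clicks: Optional[int] = None,
             ajax_timeout: float = 0.01) -> int:
    """Click "Load More" until the button is gone or max_clicks is reached.

    Returns the number of clicks performed.
    """
    clicks = 0
    while page.has_load_more() and (max_clicks is None or clicks < max_clicks):
        before = len(page.items)
        page.click_load_more()
        clicks += 1
        # Assumption: wait at most ajax_timeout seconds for new content.
        deadline = time.monotonic() + ajax_timeout
        while len(page.items) == before and time.monotonic() < deadline:
            time.sleep(0.001)
        if len(page.items) == before:
            break  # nothing new arrived within the timeout, so stop clicking
    return clicks


page = FakePage(total_items=25, batch_size=10)
clicks = load_all(page, max_clicks=1)
print(clicks, len(page.items))  # 1 20
```

In this sketch, a timeout that is too short for a slow site would end the loop early with content missing, which is why a generous AJAX timeout is the safer choice on slow pages.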
Tips:
1. If you only want to click the "Load More" button X times, click the Pagination box, tick "Repeats," and set Repeats to X.
2. If the task collects many duplicates during scraping, drag the Loop Item out of the Pagination loop so that Octoparse starts scraping only after all the items have loaded.
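The duplicates in the second tip come from the fact that a "Load More" page keeps the earlier items on screen after each click. The short Python simulation below illustrates the difference between extracting inside the loop and extracting once at the end. The function names and the sample `batches` data are hypothetical and only model the behavior; they are not part of Octoparse.

```python
from typing import List


def scrape_inside_loop(batches: List[List[str]]) -> List[str]:
    """Extract every visible item after each click (Loop Item inside Pagination)."""
    visible: List[str] = []
    rows: List[str] = []
    for batch in batches:
        visible.extend(batch)  # the page keeps earlier items and adds new ones
        rows.extend(visible)   # re-reads items that were already extracted
    return rows


def scrape_after_loading(batches: List[List[str]]) -> List[str]:
    """Load everything first, then extract once (Loop Item outside Pagination)."""
    visible: List[str] = []
    for batch in batches:
        visible.extend(batch)
    return list(visible)


batches = [["a", "b"], ["c", "d"], ["e"]]
print(len(scrape_inside_loop(batches)))  # 11 rows, most of them duplicates
print(scrape_after_loading(batches))     # ['a', 'b', 'c', 'd', 'e']
```

With three batches, extracting inside the loop produces 11 rows for only 5 distinct items, while extracting after everything has loaded produces each item exactly once.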