If you are seriously looking into scraping a website, you may need to navigate the website's different pages and extract data from each page. The first step is to identify the pagination you are dealing with and work from there. A few examples are:

Paginate using a "Next" button
Paginate without a "Next" button
Paginate with infinite scroll
Paginate using a "Load more" button

In this tutorial, we will focus on how to create a pagination action when there is no next page button on the page. More specifically, we will look at pages where you have to click numbered links (1, 2, 3 and so on) to turn the page.

Let's explore how you can create a pagination action with no next page button in Octoparse.

1. Create Pagination and update Pagination XPath

The key to solving this is an XPath that always locates the link to the next page number, whichever page you are currently on.

This will be a two-step process:

STEP 1: Write or find the XPath of the page element that takes you to the next page (e.g., if you are on page 1, you want to click page 2; if you are on page 2, you want to click page 3, and so on).

STEP 2: Revise the XPath of the Pagination in the workflow in Octoparse.

Note: XPath knowledge is not mandatory but it is extremely helpful to create a task that does exactly what you need in Octoparse. Check out What is XPath and how to use it in Octoparse to learn more about using XPath to create the perfect web scraper.

Sounds complicated? No worries, let's dive into an example.

To follow through, you may use the link below:

https://www.designrush.com/agency/ui-ux-design/us?ascending=0&orderby=min_budget

Click on the number 2 button
Choose Loop click

A Pagination will be created.

Now we need to write the XPath for the Pagination.

The button we need to click changes from page to page, but the target button always comes right after the current page button. So we first find the XPath for the current page button, then use the following-sibling axis to reach the next page number.

Copy and paste the current page URL (https://www.designrush.com/agency/ui-ux-design/us?ascending=0&orderby=min_budget) into your browser (e.g., Chrome).

Note: You need to install a browser add-on called XPath Helper.

In your browser, click to launch the XPath Helper.

Locate the page numbers on the web page, right-click page 1 and select the Inspect option.

By now, your screen should look like the one below. The highlighted code corresponds to the link for page 1.

We can first write an XPath that locates the current page number, based on the HTML code.

//li[contains(@class,"page-item active")]

If you paste this XPath into XPath Helper, you will see that the page 1 button is found.

Now write the XPath for the next number: //li[contains(@class,"page-item active")]/following-sibling::li[1]/a
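To make the logic of this XPath easier to follow, here is a minimal Python sketch that walks through the same three steps (find the active item, take the next sibling li, take its a tag) on a small sample. The markup, the href values and the function name next_page_href are hypothetical and simplified, not copied from the real page. Python's built-in ElementTree does not support contains() or the following-sibling axis, so the sketch emulates them with plain Python instead of running the XPath itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified pagination markup (an assumption, not the real site's HTML).
SAMPLE = """
<ul class="pagination">
  <li class="page-item"><a href="?page=1">1</a></li>
  <li class="page-item active"><a href="?page=2">2</a></li>
  <li class="page-item"><a href="?page=3">3</a></li>
</ul>
"""


def next_page_href(markup):
    """Return the href of the link right after the active page item, or None."""
    root = ET.fromstring(markup)
    items = root.findall("li")  # direct <li> children, in document order
    for index, item in enumerate(items):
        # Emulates: //li[contains(@class, "page-item active")]
        if "page-item active" in item.get("class", ""):
            # Emulates: /following-sibling::li[1]
            if index + 1 >= len(items):
                return None  # active item is the last one: no next page
            # Emulates the trailing /a, which targets the clickable link
            link = items[index + 1].find("a")
            return link.get("href") if link is not None else None
    return None


print(next_page_href(SAMPLE))
```

When the active item is the last one in the list, the function returns None. In the same way, the XPath matches nothing on the last page, since there is no following sibling left to select.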

Now we just need to update the XPath for the Pagination.

Click on Pagination
Input the XPath //li[contains(@class,"page-item active")]/following-sibling::li[1]/a

Note that the XPath usually needs to end at an a tag (the link itself) to make sure the located element is clickable.

2. Use "Batch Generate" to create URLs for all pages

An alternative but very effective way to approach scraping multiple pages of a website is to first collect the URLs of all the pages you would need to scrape and build a task using the list of URLs collected.

Take a closer look at the web page URLs for the different pages. Do you notice something like this?

If you see a similar pattern to the example above, with only the page number changing in the URLs of the different pages, you can easily batch generate all the page URLs and scrape as many pages as needed. Once you have the links generated, Octoparse will go on to scrape all the pages automatically.
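If you would like to see what such a generated list looks like outside Octoparse, the short Python sketch below builds one. It is only an illustration under assumptions: the example.com URL, the query parameter name page and the function name batch_generate are all hypothetical, so check the real URLs of the site you are scraping to find out which part actually changes.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def batch_generate(base_url, param, start, stop, step=1):
    """Build one URL per page by setting `param` to each number from start to stop."""
    parts = urlparse(base_url)
    # Keep every existing query parameter except the page parameter itself.
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != param]
    urls = []
    for number in range(start, stop + 1, step):
        query = urlencode(kept + [(param, str(number))])
        urls.append(urlunparse(parts._replace(query=query)))
    return urls


# Hypothetical URL pattern: only the value of "page" changes between pages.
for url in batch_generate("https://example.com/agencies?orderby=min_budget&page=1",
                          "page", 1, 3):
    print(url)
```

The resulting list can then be pasted into a task that takes a list of URLs as its input.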
