Scrape product information from Bukalapak

You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier, and more robust! Download and upgrade here if you haven't already done so!

Bukalapak is an Indonesian e-commerce company. It helps small and medium enterprises sell online, and it also supports traditional family-owned businesses.

To save time, you can go to "Template Gallery" on the sidebar of the Octoparse home page and start with a ready-to-use Bukalapak template, which requires no task configuration. For further details, see: Task Templates

This tutorial will show you how to collect product details on bukalapak.com with Octoparse.


To follow through, you may want to use this URL in the tutorial:

The main steps are shown in the menu on the right, and you can download the sample task file here.


1. Create a Go to Web Page step - to open the target website

  • Enter the page URL on the home screen and click Start to create a new task

  • Click on Go to Web Page > Options

  • Tick Scroll down the page after it is loaded

  • Set Scroll to "for one screen" and Repeat times to 12

  • Click Apply to save
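To see what a "for one screen" scroll repeated 12 times amounts to, the arithmetic can be sketched as below. This is only an illustration, not Octoparse's implementation: the function name `scroll_positions` and the viewport and page heights are assumptions made up for the example.

```python
def scroll_positions(viewport_height: int, repeat_times: int, page_height: int) -> list[int]:
    """Return the scroll offset (in pixels) reached after each one-screen scroll.

    Hypothetical helper: it models a browser that moves down by one viewport
    height per scroll and stops at the bottom of the page.
    """
    max_offset = max(page_height - viewport_height, 0)
    positions = []
    offset = 0
    for _ in range(repeat_times):
        # Each scroll advances by one screen, capped at the page bottom.
        offset = min(offset + viewport_height, max_offset)
        positions.append(offset)
    return positions


# Assumed numbers: a 900 px tall window and an 8000 px tall page.
positions = scroll_positions(viewport_height=900, repeat_times=12, page_height=8000)
print(positions)
```

If the final offset is still short of the page bottom on a real page, a higher repeat count would be needed for all products to load.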


2. Auto-detect the webpage - to create a workflow

  • Click Auto-detect web page data and wait for the detection to complete

  • Check the data fields in Data Preview and delete unwanted fields

  • Uncheck Add a page scroll

  • Click Create workflow

  • Rename the data fields after creating the workflow if needed


3. Modify the XPath of Loop Item - to locate the data field(s) more accurately

  • Choose Loop Item in the workflow

  • Input the Matching XPath as: //div[@class="bl-flex-container flex-wrap is-gutter-16"]/div

  • Click Apply to save
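The XPath above selects every direct child div of the product grid container, so each match is one product card. A minimal sketch of that matching, using Python's standard library on a made-up HTML fragment (the markup below is an assumption for illustration, not the real Bukalapak page source):

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: one grid container with three product cards,
# plus an unrelated div that must not be matched.
SAMPLE = """
<body>
  <div class="bl-flex-container flex-wrap is-gutter-16">
    <div><p>Product A</p></div>
    <div><p>Product B</p></div>
    <div><p>Product C</p></div>
  </div>
  <div class="footer"><div>Not a product</div></div>
</body>
"""

root = ET.fromstring(SAMPLE)

# ElementTree needs a relative path, so the leading "//" becomes ".//".
XPATH = './/div[@class="bl-flex-container flex-wrap is-gutter-16"]/div'
items = root.findall(XPATH)

# Each matched element corresponds to one Loop Item.
names = [item.findtext("p") for item in items]
print(names)
```

Note that `@class="..."` compares the whole attribute value, so the expression stops matching if the site adds, removes, or reorders a class name on the container.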


4. Modify the settings of Pagination - to fully load the content on the webpage

  • Select the Click to paginate step in the workflow, then click Options

  • Tick Scroll down the page after it is loaded

  • Set Scroll to "to the bottom of the page"

  • Set Repeat times to 12
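The scroll settings matter because a results page of this kind typically renders more products only as you scroll. The toy model below shows why too few scrolls can leave items unloaded. `LazyPage`, its batch size, and the product names are all hypothetical; the real page's loading behavior may differ.

```python
from dataclasses import dataclass, field


@dataclass
class LazyPage:
    """Hypothetical stand-in for a page that reveals items in batches."""
    all_items: list
    batch_size: int
    visible: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # The first batch is rendered as soon as the page opens.
        self.scroll_to_bottom()

    def scroll_to_bottom(self) -> None:
        # Each scroll to the bottom reveals the next batch, if any remain.
        loaded = len(self.visible)
        self.visible.extend(self.all_items[loaded:loaded + self.batch_size])


def load_page(page: LazyPage, scroll_times: int) -> list:
    """Scroll to the bottom `scroll_times` times and return the visible items."""
    for _ in range(scroll_times):
        page.scroll_to_bottom()
    return list(page.visible)


products = [f"product-{i}" for i in range(50)]
few = load_page(LazyPage(products, batch_size=5), scroll_times=3)
many = load_page(LazyPage(products, batch_size=5), scroll_times=12)
print(len(few), len(many))
```

In this model three scrolls expose only part of the list, while twelve expose all of it; on the live site, check the extracted row count per page to confirm the setting is sufficient.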


5. Run the task - to get your target data

  • Click Save on the upper right to save your task

  • Click Run next to it and wait for a Run Task window to pop up

  • Select Run on your device to run the task on your local device

  • Wait for the task to complete

Here is a sample output from a local run:
