Scrape data from a table
Updated over a week ago

Tables are common on websites about finance, sports, and similar topics. This tutorial shows how to scrape table data.

If you already know how to extract a list of data (see Extract a list), scraping a table works in much the same way. Treat each table row as one element of the list, and each cell in that row as a sub-element of that element.
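
To make the row-and-cell idea concrete, here is a minimal Python sketch that runs outside Octoparse. It is only an illustration of the data model, not how Octoparse works internally. The sample HTML, the class name TableRows, and the variable names are all made up for this example.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Hypothetical helper: collect each <tr> as a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one entry per table row (the list "elements")
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample table, for illustration only
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>Wins</th></tr>
  <tr><td>Lions</td><td>12</td></tr>
  <tr><td>Bears</td><td>9</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE_HTML)

# Each row is an element; each cell is a sub-element keyed by its column header
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

Each dictionary in records corresponds to one table row, and each key/value pair corresponds to one cell, which mirrors the element and sub-element relationship described above.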

The sections below describe two ways to collect table data with Octoparse.


1. Use the Auto-detect function to set up the workflow

Octoparse can auto-detect a table and capture all of its columns. With this feature, you just need to:

  • Copy the URL into Octoparse and click Start to create a new task

  • Click on Auto-detect web page data in the Tips panel to create a workflow

  • Check if all cells have been captured and click Create workflow


Tip: Check out Lesson 1: Start with Auto-detect for details about auto-detect.


2. Set up the workflow manually

What if auto-detect fails or doesn't capture the complete table? In that case, set up the task manually. Here are the steps:

  • Select the first cell in the first row of the table, and then click the Expand the selection button until it selects the whole first row

Tip: You can click Turn OFF Auto-detection or Cancel Auto-detection to stop auto-detect if it starts automatically.

  • Choose Select all child elements on the Tips panel.

All the child elements in the first row are now selected, and Octoparse highlights the similar elements it finds in the other rows in red.

  • Choose Select all similar groups from the Tips panel.

All the child elements in the table are selected and highlighted in green.

  • Click Element data on the Tips panel.

  • Edit data fields (optional)

You now have all the data fields set up for the task. You can refine the data fields in the Data Preview section.

  • Double-click the field name to rename the data fields

  • Click the More button next to the field's name for more actions
