Get page-level data (metadata, URL, title & HTML)

How to use Octoparse to extract page-level data, including the page URL, page title, meta description, meta keywords, and HTML source code.

Besides capturing information from the body of a web page, Octoparse can also extract page-level data, including the page URL, page title, meta description, meta keywords, and HTML source code.

To add these fields, follow the steps below:

STEP 1. Select an Extract Data action in the workflow.

STEP 2. Go to the Data Preview section, then click the Add Custom Fields button.

STEP 3. Under Page-level data, select the field you want to add.

STEP 4 (optional). Rename a data field by double-clicking its name.

Five types of page-level data can be added this way:

  • Page URL: URL of the current page

  • Page title: the title of the current page, a short description of the page that appears at the top of the browser window or tab.

  • Meta description: the content of the current page's meta description tag, which summarizes the page.

  • Meta keyword: the content of the current page's meta keywords tag

  • HTML source code: the complete HTML code of the web page
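For reference, most of these fields map to specific parts of a page's HTML: the page title comes from the `<title>` element, and the meta description and meta keywords come from the `content` attribute of `<meta name="description">` and `<meta name="keywords">`. The Python sketch below, which uses only the standard library's `html.parser`, illustrates that mapping on a small sample page. It is an illustration under assumptions, not how Octoparse extracts these fields internally; the class name `PageLevelDataParser`, the function `page_level_data`, and the sample URL and HTML are hypothetical.

```python
from html.parser import HTMLParser


class PageLevelDataParser(HTMLParser):
    """Hypothetical helper: collects the first <title> text and the
    description/keywords <meta> tags from an HTML document."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so entities like &amp; are decoded
        self.title_parts = []
        self.meta = {}
        self._in_title = False
        self._title_done = False

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag == "title" and not self._title_done:
            self._in_title = True
        elif tag == "meta":
            attr_map = dict(attrs)
            name = (attr_map.get("name") or "").lower()
            # Keep only the first description/keywords tag found.
            if name in ("description", "keywords") and name not in self.meta:
                self.meta[name] = attr_map.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._title_done = True

    def handle_data(self, data):
        # Title text may arrive in several chunks, so collect them all.
        if self._in_title:
            self.title_parts.append(data)


def page_level_data(url, html):
    """Hypothetical function returning the five page-level fields.
    The URL is passed in, because it is not part of the HTML itself."""
    parser = PageLevelDataParser()
    parser.feed(html)
    parser.close()
    return {
        "Page URL": url,
        "Page title": "".join(parser.title_parts).strip(),
        "Meta description": parser.meta.get("description", ""),
        "Meta keyword": parser.meta.get("keywords", ""),
        "HTML source code": html,
    }


# Hypothetical sample page used only for illustration.
SAMPLE_HTML = """<html><head>
<title>Example Shop</title>
<meta name="description" content="Handmade goods.">
<meta name="keywords" content="shop, handmade">
</head><body><p>Hello</p></body></html>"""

if __name__ == "__main__":
    data = page_level_data("https://example.com/", SAMPLE_HTML)
    for field in ("Page URL", "Page title", "Meta description", "Meta keyword"):
        print(f"{field}: {data[field]}")
```

Note that the page URL is passed in separately: it identifies where the HTML was loaded from rather than being part of the HTML, and the HTML source code field is simply the whole document.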