Octoparse captures not only information from the web page body but also page-level data, including the webpage URL, page title, meta description, meta keywords, and HTML source code.
Follow the steps below to add these fields:
STEP 1.
Select an Extract Data action in the workflow.
STEP 2.
Go to the Data Preview section, then click the Add Custom Fields button.
STEP 3.
Select your target data field under Page-level data.
STEP 4 (optional).
Rename the data field by double-clicking on the field name
There are 5 types of data that can be added:
Page URL: the URL of the current page.
Page title: the title of the current page, a short description of the page that appears in the browser's title bar or tab.
Meta description: the meta description tag of the current page, which contains a summary of the page.
Meta keywords: the meta keywords tag of the current page.
HTML source code: the complete HTML code of the web page.
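To show where these five values live in a page's markup, here is a minimal Python sketch using only the standard library's html.parser. It is an illustration, not how Octoparse extracts the fields. The function page_level_fields, the class PageLevelParser, the sample HTML, and the URL are all hypothetical, and the sketch assumes the description and keywords appear as <meta name="..." content="..."> tags.

```python
from html.parser import HTMLParser


class PageLevelParser(HTMLParser):
    """Hypothetical helper: collects the <title> text and the
    description/keywords <meta> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Assumption: fields are declared as <meta name="..." content="...">.
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Accumulate text only while inside <title>.
        if self._in_title:
            self.title += data


def page_level_fields(url, html):
    """Hypothetical function returning the five page-level fields.
    The URL and HTML are passed in; nothing is fetched from the network."""
    parser = PageLevelParser()
    parser.feed(html)
    parser.close()
    return {
        "Page URL": url,
        "Page title": parser.title.strip(),
        "Meta description": parser.meta.get("description", ""),
        "Meta keywords": parser.meta.get("keywords", ""),
        "HTML source code": html,  # simply the input string in this sketch
    }


# Usage example with a hypothetical page.
sample = (
    "<html><head><title> Example Page </title>"
    '<meta name="description" content="A short summary.">'
    '<meta name="keywords" content="scraping, octoparse">'
    "</head><body><p>Hi</p></body></html>"
)
fields = page_level_fields("https://example.com/page", sample)
print(fields["Page title"])
```

Note that a live page may be rendered or modified by JavaScript, so the HTML that a browser-based tool sees can differ from the raw markup passed to a parser like this one.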