If you scrape a list of URLs, you may want to include each original input URL as a field alongside your target data. You can then match the results against your list to see whether any URLs were not scraped.

However, the URL may change once the page opens (e.g., some URL parameters change), or the page may redirect to a completely different URL, so the page URL no longer matches what you entered. The Original input URL option added in Octoparse 8.5 solves this by recording the URL exactly as you entered it. Here is how to use it.

What's the original URL Octoparse adds as a field?

This field holds the URL you entered in Octoparse to start the task, not the URL the page ends up on after loading or redirecting.

Single URL. If you start the task with a single URL, the field returns the URL you entered in the Go to Web Page action.

URL list in a Loop Item. If you extract data from a list of URLs, the field returns, for each row, the URL from the list you entered in Loop URLs.

How to add the original URL?

Let's take this link as an example: https://www.yachtall.com/en/fwd/go-to-builder?id=75&js=1

If you open this link in a browser, you will see that it redirects to a different URL: https://en.azimutyachts.com/

STEP 1. Enter your URL(s) in Octoparse to start a task.

STEP 2. In the Data Preview section, open Add Custom Field and select Original input URL.

A new field named Original_URL is created. Its value is the URL you entered (https://www.yachtall.com/en/fwd/go-to-builder?id=75&js=1), not the redirected URL (https://en.azimutyachts.com/).
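Once your exported data includes Original_URL, checking for URLs that were not scraped comes down to comparing your input list with that column. The Python sketch below assumes the data was exported to CSV with a column named Original_URL; the second input URL and the sample row are hypothetical stand-ins for your own files.

```python
import csv
import io


def find_unscraped(input_urls, rows, field="Original_URL"):
    """Return input URLs that do not appear in the given field of any exported row."""
    # Collect every non-empty original URL present in the exported data.
    scraped = {row[field].strip() for row in rows if row.get(field)}
    # Keep input order and drop duplicates so the report is easy to read.
    seen = set()
    missing = []
    for url in input_urls:
        url = url.strip()
        if url not in scraped and url not in seen:
            missing.append(url)
            seen.add(url)
    return missing


input_urls = [
    "https://www.yachtall.com/en/fwd/go-to-builder?id=75&js=1",
    "https://www.example.com/page-2",  # hypothetical second input URL
]

# Stand-in for an exported CSV file; in practice, open the exported file instead.
exported_csv = io.StringIO(
    "Original_URL,Field_1\n"
    "https://www.yachtall.com/en/fwd/go-to-builder?id=75&js=1,sample value\n"
)
rows = list(csv.DictReader(exported_csv))
print(find_unscraped(input_urls, rows))
```

Matching on the final page URL instead would wrongly report the first input as missing, because that page redirects to https://en.azimutyachts.com/.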

Tip: You can also scrape the URL after redirecting, that is, https://en.azimutyachts.com/ instead of https://www.yachtall.com/en/fwd/go-to-builder?id=75&js=1. See the tutorial Scrape page-level data (metadata, page URL, page title, source code).
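If you collect both the original input URL and the page URL after redirecting, comparing the two columns shows which inputs were redirected or had their parameters changed. The sketch below is a minimal Python illustration: Page_URL is a hypothetical column name for the redirected URL, and treating URLs that differ only by a trailing slash as equal is a simplifying assumption, not Octoparse behavior.

```python
def redirected_rows(rows, original_field="Original_URL", page_field="Page_URL"):
    """Return (original, final) URL pairs for rows where the page URL differs from the input URL."""
    pairs = []
    for row in rows:
        original = (row.get(original_field) or "").strip()
        final = (row.get(page_field) or "").strip()
        # Skip rows missing either value; ignore a trailing slash when comparing.
        if original and final and original.rstrip("/") != final.rstrip("/"):
            pairs.append((original, final))
    return pairs


# Hypothetical exported rows: "Original_URL" comes from the Original input URL field;
# "Page_URL" is an assumed name for a column holding the page URL after redirecting.
rows = [
    {
        "Original_URL": "https://www.yachtall.com/en/fwd/go-to-builder?id=75&js=1",
        "Page_URL": "https://en.azimutyachts.com/",
    },
    {
        "Original_URL": "https://www.example.com/page-2",  # hypothetical, no redirect
        "Page_URL": "https://www.example.com/page-2/",
    },
]
for original, final in redirected_rows(rows):
    print(f"{original} -> {final}")
```

Only the first row is reported, since the second differs from its input only by a trailing slash.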
