Octoparse can download images and document files to local folders during a local run. The jpg, png, gif, doc, pdf, ppt, txt, xls, and zip formats are currently supported.

In this tutorial, we are going to show you how to download files and images with Octoparse.

Note:

File downloads currently work only in local runs. Cloud runs cannot download files.
Octoparse can only download files from the download URLs it scrapes. If the download URL cannot be scraped, the file cannot be downloaded.
Octoparse cannot trigger a download by clicking a download button.
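Because downloads depend entirely on the scraped URLs, it can help to check exported URLs before a run. The following is a minimal sketch, not part of Octoparse: it assumes a URL is a usable download link when its path ends in one of the supported formats listed in the introduction. How Octoparse itself detects file types is not described here, and the function name and example URLs are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats listed as supported in this tutorial.
SUPPORTED_EXTENSIONS = {"jpg", "png", "gif", "doc", "pdf", "ppt", "txt", "xls", "zip"}


def has_supported_extension(url: str) -> bool:
    """Return True if the URL path ends in one of the supported formats.

    Assumption: the file type is judged by the extension in the URL path only;
    query strings such as "?version=2" are ignored.
    """
    path = urlparse(url).path
    extension = PurePosixPath(path).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for candidate in (
        "https://example.com/docs/manual.PDF?version=2",
        "https://example.com/page.html",
    ):
        print(candidate, has_supported_extension(candidate))
```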

1. Download Files

Sample URL for the file download settings:

https://www.rhinorack.com/en-au/products/roof-racks/canopy/flexiglass-canopy-racks/pioneer-tradie-1528mm-x-1236mm-rlt600_jb0879

Click on one of the files - Choose the document you want to download and the selected element will turn green.

Click Document file - To extract the links as well as download the files to local folders

You will see two fields created in the data preview: one shows the download URL, and the other shows the local path where the file will be saved.

Note:

Deleting the field with a folder icon in the name will cancel the download settings.
If you have already set up a field to scrape the download URL, you can click More -> Download files.

Name downloaded files - You can easily rename the downloaded files using the four provided options. These options can be found on the Tips panel after you click Document file.

MD5 Hash Value: Use the MD5 value to name the files
Original File Name: Default original file name
Completion Time: Use the download completion time to name the files
Data Field Value: Use a data field value to name the file
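To make the four naming options concrete, here is a rough Python sketch of what each one could produce. It is illustrative only and does not reflect Octoparse internals: the function names are hypothetical, the MD5 value is assumed to be computed from the file content, and the timestamp format is an assumption.

```python
import hashlib
from datetime import datetime
from pathlib import PurePosixPath
from urllib.parse import urlparse


def name_by_md5(content: bytes, extension: str) -> str:
    """MD5 Hash Value: assumed here to be the hash of the file content."""
    return hashlib.md5(content).hexdigest() + extension


def name_by_original(url: str) -> str:
    """Original File Name: the last segment of the URL path."""
    return PurePosixPath(urlparse(url).path).name


def name_by_completion_time(completed_at: datetime, extension: str) -> str:
    """Completion Time: the timestamp layout is an assumption."""
    return completed_at.strftime("%Y%m%d_%H%M%S") + extension


def name_by_field_value(value: str, extension: str) -> str:
    """Data Field Value: replace characters that are unsafe in file names."""
    safe = "".join(c if c.isalnum() or c in " -_." else "_" for c in value)
    return safe.strip() + extension
```

For example, `name_by_original("https://example.com/files/spec-sheet.pdf?dl=1")` gives `spec-sheet.pdf`, while the Data Field Value option is useful when the original names are not descriptive, such as naming each file after the product title scraped on the same row.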

What to do if the file name already exists - If a file with the same name already exists in the folder, there are three ways to handle it.

Skip the new file: Skip the current downloaded file
Replace the existing file: Replace the existing file with the newly downloaded file
Rename the new file: Rename the new file with a (1) at the end of the file name
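The three conflict options can be sketched as a small decision function. This is a simplified illustration under stated assumptions, not Octoparse's implementation: the policy names are hypothetical, the "(1)" is assumed to go before the file extension, and the counter is assumed to keep increasing if "(1)" is also taken.

```python
from pathlib import PurePath
from typing import Optional, Set


def resolve_name(file_name: str, existing: Set[str], policy: str) -> Optional[str]:
    """Decide which name a newly downloaded file should be saved under.

    Returns the name to write to, or None when the download is skipped.
    `existing` is the set of file names already in the download folder.
    """
    if file_name not in existing:
        return file_name  # no conflict, keep the name as-is
    if policy == "skip":
        return None  # Skip the new file
    if policy == "replace":
        return file_name  # Replace the existing file
    if policy == "rename":
        # Rename the new file: append (1), (2), ... before the extension.
        path = PurePath(file_name)
        counter = 1
        while True:
            candidate = f"{path.stem}({counter}){path.suffix}"
            if candidate not in existing:
                return candidate
            counter += 1
    raise ValueError(f"unknown policy: {policy!r}")
```

With `manual.pdf` already in the folder, the "rename" policy in this sketch yields `manual(1).pdf`, "replace" yields `manual.pdf`, and "skip" yields `None`.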

2. Download Images

Downloading images to local folders shares the same logic as downloading files.

Sample URL for the image download settings:

https://www.ebay.com/itm/335402284583?

Click on one image

Click Image file - To extract the links as well as download the images to local folders

Note: Only complete URLs with "https://" can be downloaded directly with Octoparse. If the URL value scraped is only part of the complete download link, you can use the Add prefix or other data refining features in the Clean Data function to get the valid download links.
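The effect of adding a prefix can be sketched in Python as follows. This only illustrates the idea of turning a partial link into a complete one; the function names, the example.com addresses, and the use of `urljoin` as an alternative are assumptions for illustration, not a description of how the Clean Data function works internally.

```python
from urllib.parse import urljoin


def add_prefix(prefix: str, value: str) -> str:
    """Plain string concatenation, similar in spirit to an "add prefix" step."""
    return prefix + value


def to_complete_url(page_url: str, scraped_value: str) -> str:
    """Resolve a partial link against the page it was scraped from.

    urljoin handles root-relative paths ("/files/a.pdf"), protocol-relative
    links ("//cdn.example.com/a.jpg"), and leaves complete URLs unchanged.
    """
    return urljoin(page_url, scraped_value)


if __name__ == "__main__":
    page = "https://www.example.com/products/item"
    print(add_prefix("https://www.example.com", "/files/manual.pdf"))
    print(to_complete_url(page, "//cdn.example.com/img/a.jpg"))
```

A fixed prefix is enough when every scraped value is missing the same leading part; resolving against the page URL is more forgiving when the scraped values mix several partial forms.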

3. Download Settings

3.1 Download file settings

Click the arrow icon in front of the data field

Here you can rename the downloaded files, set how multiple URLs in one field are separated, and enter URLs whose files should be skipped.
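As a rough illustration of the separator and skip settings, the sketch below splits one field value into individual URLs and drops the ones marked to be skipped. The separator character, the function name, and the exact skip behavior are assumptions made for the example; check the settings panel for what Octoparse actually offers.

```python
from typing import Iterable, List


def urls_to_download(field_value: str, separator: str, skip: Iterable[str] = ()) -> List[str]:
    """Split a field holding several URLs and remove the ones to skip.

    Assumptions: URLs are joined by a single separator string, and a URL is
    skipped only when it matches an entry in `skip` exactly.
    """
    skip_set = set(skip)
    parts = [part.strip() for part in field_value.split(separator)]
    # Drop empty fragments (e.g. from a trailing separator) and skipped URLs.
    return [url for url in parts if url and url not in skip_set]


if __name__ == "__main__":
    value = "https://a.example/1.jpg; https://a.example/2.jpg;"
    print(urls_to_download(value, ";", skip=["https://a.example/2.jpg"]))
```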

3.2 Download Location settings

Click on the Task Settings icon in the upper right corner of the task settings screen

Choose Downloads
Click the Browse button - Choose a local folder for the downloaded files and images
Choose an option for When a local run starts
Click Save - Save all the modifications
