Scraping news from Digital Journal.com

You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier and more robust! Download and upgrade here if you haven't already done so!

Digital Journal is a website that provides world news on topics including, but not limited to, tech & science, social media, and business.

In this tutorial, we are going to show you how to scrape search results from Digital Journal.com, covering both the listing page and the article detail pages.


To follow along, you may want to use the URL in this tutorial:

The main steps are shown in the menu on the right. [Download task file here]


1. Go to Web Page - open the target web page

  • Enter the URL on the home page and click Start
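If you are curious what this step corresponds to outside Octoparse, here is a minimal Python sketch that downloads a listing page with the requests library. The URL and header values are placeholders for illustration only and are not taken from this tutorial.

    import requests

    # Placeholder listing URL -- replace it with the search-results URL you
    # actually want to scrape (the tutorial's own link is not reproduced here).
    URL = "https://www.digitaljournal.com/tech-science"

    # A browser-like User-Agent makes the request look like an ordinary visit.
    HEADERS = {"User-Agent": "Mozilla/5.0"}

    response = requests.get(URL, headers=HEADERS, timeout=30)
    response.raise_for_status()          # stop early if the page did not load
    html = response.text
    print(f"Downloaded {len(html)} characters")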


2. Auto-detect web page - to create a workflow

Octoparse's built-in auto-detect function can generate a workflow automatically, which you can then modify as needed.

  • Click on Auto-detect web page data and wait for the detection to complete

  • Check the data fields in Data Preview and delete unwanted data

  • Untick Add a page scroll

  • Click Create workflow
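Auto-detect is a built-in Octoparse feature, so there is no exact code equivalent; the rough idea of spotting a repeating list pattern can, however, be sketched with BeautifulSoup. Continuing from the snippet in step 1, the following is only an analogy for illustration, not Octoparse's actual detection logic.

    from collections import Counter

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")

    # Count how often each (parent tag, parent class) combination wraps a text
    # link; the most frequent pattern is usually the repeating "list item".
    patterns = Counter()
    for link in soup.find_all("a", href=True):
        parent = link.parent
        if parent is not None and link.get_text(strip=True):
            key = (parent.name, tuple(parent.get("class") or []))
            patterns[key] += 1

    if patterns:
        (tag, classes), count = patterns.most_common(1)[0]
        print(f"Most repeated link container: <{tag} class={list(classes)}> ({count} links)")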


3. Select subpage URL - loop click into each item

  • Click Select subpage URL on the tips panel

  • Choose Title_URL in the drop-down box under Click on an extracted data field

  • Click Confirm
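In plain Python, this step amounts to collecting the article URLs from the listing page and requesting each one. The sketch below continues from the earlier snippets; the "article a[href]" selector is only an assumption, so inspect the real page and adjust it.

    from urllib.parse import urljoin

    # Assumed selector for the title links on the listing page -- verify it
    # against the real page structure before relying on it.
    article_urls = [urljoin(URL, a["href"]) for a in soup.select("article a[href]")]

    # Drop duplicates while preserving order, then fetch each detail page.
    article_urls = list(dict.fromkeys(article_urls))

    detail_pages = {}
    for url in article_urls[:10]:        # limit to 10 pages for a quick test
        page = requests.get(url, headers=HEADERS, timeout=30)
        page.raise_for_status()
        detail_pages[url] = page.text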


4. Extract Data - select the data to scrape

  • Click on the data you want

  • After the information turns green, click Text in the Tips box

  • Repeat the steps above to scrape more data fields
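For comparison, the same extraction can be written out by hand. Continuing from the sketch in step 3, the h1 and article selectors below are assumptions for illustration; check the real detail pages and adjust them.

    rows = []
    for url, page_html in detail_pages.items():
        page = BeautifulSoup(page_html, "html.parser")
        # Assumed selectors: the headline in an <h1>, the body inside <article>.
        title = page.find("h1")
        body = page.find("article")
        rows.append({
            "url": url,
            "title": title.get_text(strip=True) if title else "",
            "text": body.get_text(" ", strip=True) if body else "",
        })

    print(f"Extracted {len(rows)} articles")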

The final workflow will look like this:

[Image: final workflow]

5. Run the task - to get the desired data

  • Click the Save button first to save all the settings you have made

  • Then click Run to run your task either locally or in the cloud

  • Choose one run mode and run the task

  • Wait for the task to complete

Below is sample data from a local run. Excel, CSV, HTML, and JSON formats are available for export.

[Image: sample exported data]
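Octoparse handles the export for you; if you were collecting the rows yourself as in the sketches above, a hypothetical export with pandas might look like this (the Excel export additionally needs the openpyxl package installed).

    import pandas as pd

    df = pd.DataFrame(rows)
    df.to_csv("digital_journal.csv", index=False)
    df.to_json("digital_journal.json", orient="records", force_ascii=False)
    df.to_excel("digital_journal.xlsx", index=False)   # requires openpyxl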

Note: Local runs are good for quick tests and small amounts of data. If you are dealing with more complicated tasks or large amounts of data, running in the Cloud is recommended for higher speed. You are welcome to try this premium feature by signing up for the 14-day free trial here. Cloud tasks can be scheduled hourly, daily, or weekly, with data delivered regularly.
