Scrape questions from Quora

This tutorial is written for the latest version of Octoparse. If you are running an older version, we strongly recommend upgrading, as the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so.

Quora is an American social question-and-answer website. Users share ideas by collaborating on content: they edit questions and comment on answers submitted by other users.

This tutorial will show you how to scrape questions from Quora.


To follow along with the tutorial, you may want to use the URL below:


[Download task file here]


1. Create a Go to Web Page step - to open the target website

  • Enter the target URL into the search bar on the home screen and click Start


2. Set up a Scroll Page loop - to load more data

  • Click the + add step button to add a step in the workflow

  • Click Loop

  • Set the Loop Mode to Scroll Page

  • Set the Scroll way to Scroll to the bottom of the page

  • Set Scroll times to 10, for example (more scrolls load more data)

  • Set the Wait time: 2-3 seconds is recommended

  • Click Apply to save the changes
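To see what these settings do, the Scroll Page loop can be modeled in Python as a fixed number of "scroll to the bottom, then wait" rounds. This is a minimal sketch, not Octoparse's implementation: `scroll_to_bottom` is a hypothetical stand-in for the browser action, and the sleep function is injectable so the demo runs instantly. It also makes the cost explicit: 10 scrolls with a 2.5-second wait add 25 seconds of waiting per page.

```python
import time
from typing import Callable


def scroll_page_loop(
    scroll_to_bottom: Callable[[], None],
    scroll_times: int = 10,
    wait_seconds: float = 2.5,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Model of a Scroll Page loop: scroll, then wait, a fixed number of times.

    `scroll_to_bottom` is a hypothetical stand-in for the browser action.
    Returns the total time spent waiting, in seconds.
    """
    if scroll_times < 0 or wait_seconds < 0:
        raise ValueError("scroll_times and wait_seconds must be non-negative")
    waited = 0.0
    for _ in range(scroll_times):
        scroll_to_bottom()   # ask the page to load its next batch of questions
        sleep(wait_seconds)  # give the newly loaded content time to render
        waited += wait_seconds
    return waited


if __name__ == "__main__":
    scrolls = []
    # A no-op sleep keeps the demo instant.
    total = scroll_page_loop(lambda: scrolls.append(1), sleep=lambda s: None)
    print(len(scrolls), total)  # 10 25.0
```

Raising Scroll times loads more questions but lengthens the run proportionally; raising the Wait time helps on slow connections at the same cost.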


3. Create a Loop - to capture the list of questions from the webpage

  • Click the + add step button to add a step inside the Scroll Page Loop

  • Select Loop

  • Set the Loop Mode to Variable List

  • Enter the following Matching XPath: //div[contains(@class,"q-box qu-borderAll")]/div[2]/div

  • Click Apply to save the changes


Note: To learn more about Loop Item, see the guide here.
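The Matching XPath above reads as: find every div whose class attribute contains the text "q-box qu-borderAll", take its second div child, and return each div inside that child as one loop item. The Python sketch below reproduces that logic on a small, hypothetical, well-formed snippet using the standard library's ElementTree. Because ElementTree's XPath subset has no contains() function, the class test is written in Python; this illustrates the XPath semantics only, not how Octoparse evaluates XPath on the live page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup shaped like the structure the XPath expects.
SAMPLE = """<html><body>
  <div class="q-box qu-borderAll qu-bg--raised">
    <div>header</div>
    <div>
      <div>Question A</div>
      <div>Question B</div>
    </div>
  </div>
  <div class="q-box">
    <div>ignored</div>
    <div><div>Not a match</div></div>
  </div>
</body></html>"""


def variable_list_items(root: ET.Element, class_fragment: str) -> list[ET.Element]:
    """Emulate //div[contains(@class, FRAGMENT)]/div[2]/div with ElementTree.

    Like XPath contains(), the class test is a plain substring check.
    """
    items = []
    for div in root.iter("div"):
        if class_fragment in div.get("class", ""):
            # div[2] is the second <div> child (XPath positions start at 1).
            items.extend(div.findall("./div[2]/div"))
    return items


root = ET.fromstring(SAMPLE)
for item in variable_list_items(root, "q-box qu-borderAll"):
    print(item.text)
# Question A
# Question B
```

Because contains() is a substring test, the class names must appear in the attribute in exactly this order and spacing. For example, searching for "qu-borderAll q-box" in the same snippet returns no items.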


4. Create an Extract Data step - to extract the questions and any related information you need

  • Click the + add step button to add a step inside the Loop Item

  • Select Extract Data

  • Click the target data field on the page and select Element data in the Tips panel

  • Double-click the data field names to rename them if needed


5. Change the XPath of the data field - to locate the data correctly

The auto-generated XPath for the title field does not locate all the data needed, so we need to replace it with a custom XPath.

  • Click "..." > Customize XPath

  • Paste //div[@class="q-inline qu-flexWrap--wrap"] into the Matching XPath box

  • Click Apply
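Unlike contains(), the predicate [@class="..."] in this XPath matches only when the entire class attribute equals the given string. The sketch below demonstrates this with Python's ElementTree, which supports this predicate form directly, on a hypothetical, simplified question item. ElementTree expects a relative path, so the expression is written as .//div[...] rather than the //div[...] used in Octoparse.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified question item; real Quora markup is more complex.
ITEM = """<div>
  <div class="q-inline qu-flexWrap--wrap"><span>How do I learn XPath?</span></div>
  <div class="q-inline qu-flexWrap--wrap extra"><span>Skipped: class differs</span></div>
</div>"""

# ElementTree's find methods take a relative path, hence the leading ".".
TITLE_XPATH = './/div[@class="q-inline qu-flexWrap--wrap"]'


def extract_titles(item: ET.Element) -> list[str]:
    """Return the text of every div whose class attribute equals the target exactly."""
    return ["".join(div.itertext()).strip() for div in item.findall(TITLE_XPATH)]


print(extract_titles(ET.fromstring(ITEM)))  # ['How do I learn XPath?']
```

If Quora adds another class to the title element, an exact-match XPath like this stops matching and may need updating.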


6. Set up a Retry action - to resolve page loading errors

Sometimes the search results on Quora only appear after the page is reloaded. To make sure the page loads successfully, set up a Retry action.

  • Click the Go to Web Page step

  • Go to the Retry tab

  • Select contains

  • Enter "We couldn't find any results"

  • Set the number of retries to 5

  • Click Apply
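The Retry settings amount to a simple rule: after the page loads, if its content contains "We couldn't find any results", reload it, up to 5 times. A minimal Python sketch of that rule follows. `load_page` is a hypothetical callable returning the page text, and returning the last page once all retries are used up is an assumption of this sketch, not documented Octoparse behavior.

```python
from typing import Callable

NO_RESULTS_TEXT = "We couldn't find any results"


def load_with_retry(
    load_page: Callable[[], str],
    bad_text: str = NO_RESULTS_TEXT,
    max_retries: int = 5,
) -> tuple[str, int]:
    """Load a page, reloading while its content contains `bad_text`.

    `load_page` is a hypothetical callable that returns the page's text.
    Returns the last page text and the number of retries used. If every
    attempt still contains `bad_text`, the last page is returned anyway
    (an assumption of this sketch).
    """
    page = load_page()
    retries = 0
    while bad_text in page and retries < max_retries:
        page = load_page()  # reload and check again
        retries += 1
    return page, retries


if __name__ == "__main__":
    pages = iter([NO_RESULTS_TEXT, NO_RESULTS_TEXT, "Question list..."])
    page, retries = load_with_retry(lambda: next(pages))
    print(retries, page)  # 2 Question list...
```

With max_retries set to 5, the page is loaded at most 6 times: the first load plus 5 reloads.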


7. Run the task - to get your target data

  • Save the task using the Save button in the upper right

  • Run the task (the Run button is next to Save)

  • Select Run on your device to run the task on your local device

  • Wait for the task to complete

Here is a sample output from a local run:
