Quora is a place to gain and share knowledge. It's a platform to ask questions and connect with people who contribute unique insights and quality answers. People from different countries and careers share their ideas here.
This tutorial will show you how to scrape answers from Quora with question URLs. If you have no question URLs at hand, you can follow this tutorial first: Scrape questions from Quora
To follow through the tutorial, you may want to use the URL below:
The main steps are shown in the menu on the right and you can download the task file here.
1. Create a Go to Web Page - to open the target website
To start scraping, first give Octoparse the target page.
Enter the question URL into the search box at the center of the home screen, then click Start to create a new task.
2. Set up a Page Scroll - to load more data
Click on "+" under Go to Web Page Step
Click on Loop
Click Loop Item
Choose Loop Mode >> Scroll Page
Tick >> for one screen
Repeats >> 100 times
Click Apply
Note: For more information about Page Scroll settings, please check this article: Set up a page scroll
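As a rough illustration of what these settings mean, the sketch below computes how far down the page the browser ends up after each one-screen scroll. It is not Octoparse code; the function name and the 900-pixel viewport height are assumptions chosen only for the example.

```python
def scroll_positions(viewport_height, repeats=100):
    """Yield the scroll offset (in pixels) reached after each one-screen scroll."""
    for step in range(1, repeats + 1):
        # Each repeat moves the page down by exactly one screen height.
        yield step * viewport_height


# Assumed viewport height of 900 px, repeated 100 times as in the settings above.
positions = list(scroll_positions(viewport_height=900, repeats=100))
print(len(positions), positions[0], positions[-1])
```

With a taller viewport, the same 100 repeats cover more of the page, so the number of answers loaded depends on both values.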
3. Create a Loop - to capture the list of answers from the webpage
Click on "+" to add a step inside the scroll page loop
Click Loop
Switch Loop Mode to Variable List
Input the XPath in the Matching XPath box: //div[not(contains(@class,'question_page_ad')) and(contains(@class,'question_answer_item'))]
Click Apply to apply the settings
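To see what the XPath in this step selects, here is a minimal sketch outside Octoparse. Python's standard ElementTree module does not support contains(), so the predicate is written as a plain Python function instead. The sample markup is hypothetical and far simpler than a real Quora page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup used only for illustration.
SAMPLE_PAGE = """
<html><body>
  <div class="question_answer_item">Answer A</div>
  <div class="question_answer_item question_page_ad">Sponsored</div>
  <div class="sidebar">Related questions</div>
  <div class="question_answer_item highlighted">Answer B</div>
</body></html>
"""


def is_answer_item(element):
    """Mirror the XPath predicate: class contains 'question_answer_item'
    and does not contain 'question_page_ad'."""
    css_class = element.get("class", "")
    return "question_answer_item" in css_class and "question_page_ad" not in css_class


def find_answer_items(root):
    # iter("div") visits every <div> in document order, similar to //div.
    return [div for div in root.iter("div") if is_answer_item(div)]


root = ET.fromstring(SAMPLE_PAGE.strip())
answer_texts = [div.text for div in find_answer_items(root)]
print(answer_texts)
```

The ad block and the sidebar are skipped, so only the two answer items remain in the list that the loop iterates over.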
4. Set up a Branch - to extend the whole content of the answer
Long answers are folded, so we need to click "Continue Reading" on the page to expand the whole answer. Short answers are shown in full and do not need to be expanded. So here we set up a branch to let Octoparse decide whether "Continue Reading" needs to be clicked.
Click on "+" button inside Loop Item to set a Branch in the workflow
Click Branch Conditions
Choose the left branch box
Tick Execute if the current Loop contains a specific element
Enter the XPath in the Matching XPath box: //div[contains(text(),'Continue Reading')]
Click Apply to apply the settings
Click "+" in the left branch to add a Click step inside
Click on the Click Item
Choose Relative XPath to the Loop Item
Set up the XPath for the Click Item as //div[contains(text(),'Continue Reading')]
Click on Options
Set up AJAX Load as 5s
With this setting, the branch runs the Click step only when the current loop item contains a "Continue Reading" button.
Note: For more details on branch settings, please check this article: Branch Conditions
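The branch logic can be sketched in a few lines of Python. This is not how Octoparse implements it; the sample markup, the function names, and the action strings are all assumptions made for the example. It only shows the decision: click and wait when the current loop item contains a "Continue Reading" element, otherwise go straight to extraction.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one short answer and one folded answer.
SAMPLE_ITEMS = """
<div class="page">
  <div class="question_answer_item">
    <div class="preview">Short answer.</div>
  </div>
  <div class="question_answer_item">
    <div class="preview">Long answer, truncated</div>
    <div class="expander">Continue Reading</div>
  </div>
</div>
"""


def find_expand_button(loop_item):
    """Return the first <div> inside the loop item whose leading text
    contains 'Continue Reading', or None if there is none."""
    for div in loop_item.iter("div"):
        if "Continue Reading" in (div.text or ""):
            return div
    return None


def plan_actions(loop_item):
    """List the steps that would run for one loop item."""
    actions = []
    if find_expand_button(loop_item) is not None:
        # Left branch: the condition matched, so expand the answer first.
        actions.append("click Continue Reading")
        actions.append("wait up to 5s for AJAX")
    actions.append("extract data")
    return actions


root = ET.fromstring(SAMPLE_ITEMS.strip())
items = [d for d in root.iter("div") if d.get("class") == "question_answer_item"]
plans = [plan_actions(item) for item in items]
for plan in plans:
    print(plan)
```

The search is limited to the current loop item, which is the role of the Relative XPath to the Loop Item option in the Click step.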
5. Create an Extract Data - to extract the data you need
After the branch has been set up, we need to add an Extract Data step for the final extraction. Make sure the step is placed inside the loop.
Click "+" under the Branch box
Click Extract Data
Click Element data on the Tips
Double-click the data fields to rename them if needed
6. Modify the XPath - to locate data accurately
To locate the data we want accurately, the XPath for the fields needs to be modified.
Switch to Vertical View
Input the XPath below for each field in Field Settings:
User: //div[@class="q-inlineFlex qu-alignItems--center qu-wordBreak--break-word"]
Career: //div[contains(@class,"truncateLines")]/span[1]/span[2]
Date: //a[contains(@class,'answer_timestamp')]
Answer: //div[contains(@class,'test_answer_content')]
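For readers who want to check these XPaths by hand, the sketch below applies the same four lookups to one answer item using only Python's standard library. The markup and the values in it are placeholders, not real Quora data, and the helper names are assumptions. Because ElementTree has no contains(), class fragments are matched in Python, and only the first matching element is used, which is a simplification of the XPath behavior.

```python
import xml.etree.ElementTree as ET

# Hypothetical answer item with placeholder values.
SAMPLE_ITEM = """
<div class="question_answer_item">
  <div class="q-inlineFlex qu-alignItems--center qu-wordBreak--break-word">Example User</div>
  <div class="q-text truncateLines"><span><span>,</span><span>Example job title</span></span></div>
  <a class="answer_timestamp" href="#">Answered Jan 1</a>
  <div class="test_answer_content"><p>First paragraph.</p><p>Second paragraph.</p></div>
</div>
"""

USER_CLASS = "q-inlineFlex qu-alignItems--center qu-wordBreak--break-word"


def first_with_class(scope, tag, fragment):
    """First <tag> under scope whose class attribute contains the fragment."""
    for element in scope.iter(tag):
        if fragment in element.get("class", ""):
            return element
    return None


def text_of(element):
    """All text inside the element, with pieces joined by single spaces."""
    if element is None:
        return ""
    return " ".join(t.strip() for t in element.itertext() if t.strip())


def extract_fields(item):
    # User: exact class match, which ElementTree supports directly.
    user = item.find(".//div[@class='%s']" % USER_CLASS)
    # Career: second <span> inside the first <span> of the truncateLines box.
    career_box = first_with_class(item, "div", "truncateLines")
    career = career_box.find("span[1]/span[2]") if career_box is not None else None
    return {
        "User": text_of(user),
        "Career": text_of(career),
        "Date": text_of(first_with_class(item, "a", "answer_timestamp")),
        "Answer": text_of(first_with_class(item, "div", "test_answer_content")),
    }


record = extract_fields(ET.fromstring(SAMPLE_ITEM.strip()))
print(record)
```

Note that the User lookup depends on an exact class string, so it is the field most likely to stop matching if the page's class names change.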
The final workflow will look as below:
7. Run the task - to get the target data
Click the Save button first to save all the settings you have made
Then click Run to run your task.
Select Run on your device and click Standard Mode to run the task on your local device
Wait for the task to complete
Below is sample data from a local run. The data can be exported in Excel, CSV, HTML, and JSON formats.
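If you post-process the results yourself, the CSV and JSON shapes can be sketched as follows. This is independent of Octoparse's own export feature; the rows are placeholders rather than scraped data, and the column names simply reuse the field names from step 6.

```python
import csv
import io
import json

FIELDS = ["User", "Career", "Date", "Answer"]

# Placeholder rows standing in for extracted answers.
rows = [
    {"User": "Example User", "Career": "Example job title",
     "Date": "Answered Jan 1", "Answer": "Placeholder answer, with a comma."},
    {"User": "Another User", "Career": "",
     "Date": "Answered Feb 2", "Answer": "Second placeholder answer."},
]


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize records to pretty-printed JSON text."""
    return json.dumps(records, indent=2, ensure_ascii=False)


csv_text = to_csv(rows)
json_text = to_json(rows)
print(csv_text)
print(json_text)
```

The csv module quotes any value that contains a comma, so long answer text stays in a single column.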