This tutorial is written for the latest version of Octoparse. If you are running an older version, we strongly recommend upgrading, because the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so!
Quora is an American social question-and-answer website where users share ideas by editing questions and commenting on answers submitted by other users.
This tutorial will show you how to scrape questions from Quora.
To follow along with the tutorial, you may want to use the URL below:
[Download task file here]
1. Create a Go to Web Page - to open the target website
2. Set up a Page scroll - to load more data
Click the + add step button to add a step in the workflow
Click Loop
Set the Loop Mode as Scroll Page
Set Scroll way as Scroll to the bottom of the page
Set Scroll times: for example, 10 (more scrolls load more data)
Set the Wait time: 2-3s recommended
Click Apply to save the change
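To make the effect of these settings concrete, here is a minimal Python sketch of what a scroll loop of this kind does: scroll to the bottom, wait, and repeat a fixed number of times. It is an illustration only, not Octoparse's implementation. The names scroll_page and scroll_to_bottom are hypothetical, and the scrolling itself is passed in as a callback so the sketch runs without a browser.

```python
import time
from typing import Callable


def scroll_page(scroll_to_bottom: Callable[[], None],
                scroll_times: int = 10,
                wait_seconds: float = 2.0,
                sleep: Callable[[float], None] = time.sleep) -> int:
    """Scroll to the bottom `scroll_times` times, pausing after each scroll.

    `scroll_to_bottom` is a hypothetical callback standing in for whatever
    actually scrolls the page; it is not an Octoparse API.
    """
    if scroll_times < 0:
        raise ValueError("scroll_times must be non-negative")
    for _ in range(scroll_times):
        scroll_to_bottom()   # trigger loading of the next batch of content
        sleep(wait_seconds)  # give the new content time to load
    return scroll_times


# Dry run with stand-ins: record each scroll and each wait instead of
# touching a real page or actually sleeping.
scrolls, waits = [], []
scroll_page(lambda: scrolls.append("scrolled"),
            scroll_times=3, wait_seconds=2.5, sleep=waits.append)
print(len(scrolls), waits)
```

The sketch also shows why the wait time matters: a task with 10 scrolls and a 2-3s wait spends at least 20-30 seconds on scrolling alone, so raising Scroll times trades run time for more data.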
3. Create a Loop - to capture the list of questions from the webpage
Click the + add step button to add a step inside the Scroll Page Loop
Select Loop
Set the Loop Mode as Variable List
Input the Matching XPath as: //div[contains(@class,"q-box qu-borderAll")]/div[2]/div
Click Apply to save the changes
Note: To learn more about Loop Item, check here.
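To see what the Matching XPath in this step selects, the following Python sketch emulates it on a small piece of made-up markup. The sample HTML is hypothetical and far simpler than Quora's real page. Python's standard xml.etree.ElementTree module does not support the XPath contains() function, so the sketch performs that part as a plain substring test and uses ElementTree only for the /div[2]/div part.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; Quora's real page structure is more complex.
SAMPLE = """
<html><body>
  <div class="q-box qu-borderAll qu-borderRadius--small">
    <div>header</div>
    <div>
      <div>Question A</div>
      <div>Question B</div>
    </div>
  </div>
  <div class="q-box sidebar">
    <div>ignored</div>
    <div><div>Not a question</div></div>
  </div>
</body></html>
"""


def match_loop_items(root):
    """Emulate //div[contains(@class,"q-box qu-borderAll")]/div[2]/div."""
    items = []
    for div in root.iter("div"):
        # contains(@class, ...) is a substring test on the class attribute.
        if "q-box qu-borderAll" in div.get("class", ""):
            # /div[2]/div: the <div> children of the container's second <div>.
            items.extend(div.findall("./div[2]/div"))
    return items


questions = [item.text for item in match_loop_items(ET.fromstring(SAMPLE))]
print(questions)  # ['Question A', 'Question B']
```

Because contains() is a substring test, the container still matches when it carries extra classes after "q-box qu-borderAll", which is one reason this form tends to survive small markup changes better than an exact class match.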
4. Create an Extract Data step - to extract questions and related information you need
Click the "+" button to add a step in the workflow
Select Extract Data
Click on your target data field and select Element data on the Tips panel
Double-click the data field names to rename them if needed
5. Change the XPath of the data field - to locate the data correctly
The auto-generated XPath of the title cannot locate all the data needed, so we have to update the XPath to make sure it functions properly.
Click "..." > Customize XPath
Paste //div[@class="q-inline qu-flexWrap--wrap"] in the Matching XPath box
Click Apply
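Unlike the contains() test used for the loop in step 3, this XPath compares the whole class attribute, so an element matches only if its class string is exactly "q-inline qu-flexWrap--wrap". The Python sketch below illustrates that behavior with the standard xml.etree.ElementTree module. The markup is hypothetical and only stands in for a single loop item; the real Quora markup will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for one loop item; the real Quora markup will differ.
ITEM = """
<div>
  <div class="q-inline qu-flexWrap--wrap">How do I learn XPath?</div>
  <div class="q-inline qu-flexWrap--wrap qu-bold">Extra class, no match</div>
  <div class="q-inline">Partial class, no match</div>
</div>
"""

item = ET.fromstring(ITEM)

# [@class="..."] compares the entire attribute value, so only the first
# inner <div> qualifies.
XPATH = './/div[@class="q-inline qu-flexWrap--wrap"]'
titles = [el.text for el in item.findall(XPATH)]
print(titles)  # ['How do I learn XPath?']
```

If the extracted field comes back empty after a site update, a changed class string is a likely cause, and the XPath needs to be updated to match the new value.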
6. Set up a Retry action - to resolve page loading errors
Quora search results sometimes appear only after the page is reloaded. To make sure the page loads successfully, we can set up a Retry action.
Click Go to Web Page
Go to the Retry tab
Select contains
Input "We couldn't find any results"
Set Retry to 5 times
Click Apply
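The retry condition above can be sketched in Python as follows: reload the page while it still contains the error text, and give up after 5 reloads. This is an illustration of the logic only, not Octoparse's implementation. The load_page callable is hypothetical and stands in for the Go to Web Page step, and treating the 5 retries as reloads on top of the first load is an assumption.

```python
from typing import Callable

ERROR_TEXT = "We couldn't find any results"


def load_with_retry(load_page: Callable[[], str],
                    error_text: str = ERROR_TEXT,
                    max_retries: int = 5) -> str:
    """Reload while the page contains `error_text`, up to `max_retries` times.

    `load_page` is a hypothetical callable returning the page text; it
    stands in for the Go to Web Page step and is not an Octoparse API.
    """
    page = load_page()  # initial load
    retries = 0
    while error_text in page and retries < max_retries:
        page = load_page()  # reload and check again
        retries += 1
    return page


# Simulated run: the first two loads fail, the third returns results.
responses = iter([ERROR_TEXT, ERROR_TEXT, "1. Question A\n2. Question B"])
page = load_with_retry(lambda: next(responses))
print(ERROR_TEXT in page)  # False
```

If every attempt fails, the function returns the last page it saw, error text included, so the caller can still detect the failure.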
7. Run the task - to get your target data
Save your task from the upper right
Run the task (next to Save)
Select Run on your device to run the task on your local device
Wait for the task to complete
Here is a sample output from a local run: