Scrape the replies of a Tweet

You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier and more robust! Download and upgrade here if you haven't already done so!

Twitter is a free social networking site where users broadcast short posts known as tweets. Users post an average of about 6,000 tweets every second, which adds up to over 500 million tweets per day. Tweets can contain text, videos, photos, or links, and users can interact with each other by replying to tweets.

In this tutorial, we will show you how to scrape the replies to a tweet on Twitter.


To follow through with the tutorial, you may want to use the URL below:

The main steps are shown in the menu on the right, and you can download the sample task file here.


1. Create a Go to Web Page - to open the target website

  • Enter the target URL into the search bar on the home screen and click Start


2. Log in to Twitter - to load the replies

  • Click to turn on Browse mode

  • Click on Log in

  • Input your login credentials to log in to Twitter

  • Click on Go to Web Page > go to the Options tab > tick Use cookie > click Use cookie from the current page > click Apply

3. Auto-detect the webpage - to create a workflow

Octoparse's Auto-detection function can help you quickly create a workflow according to the target website's design.

  • Click Auto-detect web page data in Tips and wait for the detection to complete

  • Click Create workflow


4. Modify the page scroll settings - to scroll down the page and fully load the data

  • Click on Scroll Page

  • Set the Wait time (2-3 seconds is recommended)

  • Click Apply to save the change

Note: Check out here to find out more about extracting data while scrolling the page.
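
To see why the wait time matters, here is a minimal conceptual sketch in Python. It is not Octoparse's implementation: `FakePage`, `load_all`, and the 1.5-second load delay are hypothetical names and values made up for illustration, and time is simulated rather than actually slept. The loop scrolls, waits, and stops as soon as a scroll produces no new items, which is why a wait shorter than the page's load time can end the task before any replies appear.

```python
from typing import List, Optional, Tuple


class FakePage:
    """Hypothetical stand-in for a page that loads replies some time after a scroll."""

    def __init__(self, batches: List[List[str]], load_delay: float) -> None:
        self._pending = list(batches)      # batches not yet requested
        self._load_delay = load_delay      # simulated seconds a batch takes to load
        self._arriving: Optional[Tuple[float, List[str]]] = None
        self._now = 0.0                    # simulated clock
        self.items: List[str] = []         # replies currently visible

    def scroll_to_bottom(self) -> None:
        # A scroll requests the next batch, which becomes visible after load_delay.
        if self._arriving is None and self._pending:
            self._arriving = (self._now + self._load_delay, self._pending.pop(0))

    def wait(self, seconds: float) -> None:
        # Advance the simulated clock and reveal a batch that has finished loading.
        self._now += seconds
        if self._arriving is not None and self._arriving[0] <= self._now:
            self.items.extend(self._arriving[1])
            self._arriving = None


def load_all(page: FakePage, wait_seconds: float, max_scrolls: int = 50) -> List[str]:
    """Scroll and wait until a scroll adds nothing new or max_scrolls is reached."""
    for _ in range(max_scrolls):
        before = len(page.items)
        page.scroll_to_bottom()
        page.wait(wait_seconds)
        if len(page.items) == before:
            break  # nothing new appeared within the wait time
    return list(page.items)


batches = [["reply 1", "reply 2"], ["reply 3"]]
patient = load_all(FakePage(batches, load_delay=1.5), wait_seconds=2.0)
hasty = load_all(FakePage(batches, load_delay=1.5), wait_seconds=0.5)
print(patient)
print(hasty)
```

In this simulation the 2-second wait collects every reply, while the 0.5-second wait gives up before the first batch has loaded. Real pages load at varying speeds, so the recommended 2-3 seconds is a starting point to adjust rather than a guarantee.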


5. Modify the XPath of the loop - to locate the data field(s) more accurately

  • Click on Loop Item in the workflow

  • Input the Matching XPath as: //div[@class="css-1dbjc4n r-18u37iz"]/div[2]

  • Click Apply to save the change

  • Check the data fields in Data Preview and delete unwanted fields by clicking More>Delete field
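
To illustrate what this Matching XPath selects, here is a small sketch using Python's standard-library `xml.etree.ElementTree`, which supports the subset of XPath needed here. The sample markup is invented for illustration and is far simpler than Twitter's real page. The expression finds every `div` whose `class` attribute is exactly `css-1dbjc4n r-18u37iz` and returns its second `div` child.

```python
import xml.etree.ElementTree as ET

# Invented, simplified markup; the real page structure is much more complex.
SAMPLE = """
<html><body>
  <div class="css-1dbjc4n r-18u37iz">
    <div>avatar-1</div>
    <div>First reply text</div>
  </div>
  <div class="css-1dbjc4n r-18u37iz">
    <div>avatar-2</div>
    <div>Second reply text</div>
  </div>
  <div class="css-1dbjc4n other">
    <div>avatar-3</div>
    <div>Not matched: class differs</div>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())

# ElementTree paths are relative, so the leading "//" is written as ".//".
matches = root.findall(".//div[@class='css-1dbjc4n r-18u37iz']/div[2]")
texts = [m.text for m in matches]
print(texts)
```

Because `@class="..."` compares the whole attribute string, the third block is skipped even though it shares one class name. Class names like these appear to be auto-generated, so if the loop returns no items, inspect the page and check whether the class value has changed.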


6. Extract the text - to select the data you want

  • Click on the element you are interested in

  • Choose Text on the Tips panel

After selecting the data, you can go to the Data Preview section and rename the data fields if needed.


7. Run the task - to get your desired data

  • Click Run to run your task either on your device or in the cloud

  • Select Standard Mode under Run on your device section to run the task on your local device

  • Wait for the task to complete


Here is a sample output from a local run:

Tip: Local runs are great for task troubleshooting and quick runs. If you are dealing with more complicated tasks, it is recommended that you select Run in the Cloud to run the task in Octoparse's cloud-based platform for higher speed. Try out this premium feature by signing up for the 14-day free trial here. You can also schedule your tasks to run hourly, daily, or weekly and get data delivered to you regularly.
