With a reported, 211 million daily active users, Twitter (officially known as X now) has proven its worth in social media marketing. Users on Twitter post an average of 6000 tweets every second, making it over 500 million tweets posted daily. All of this chatter and noise is a treasure chest full of valuable information for marketers, brands, researchers, and analysts. Marketers and brands often scrape Twitter data from specific accounts (influencers and competitors) to analyze engagement and plan effective strategies.
Due to popular demand, this tutorial is the first in a series of tutorials that the Octoparse team has prepared for users with a need for Twitter data.
In this post, we are going to teach you how to scrape the followers/following list from any public account.
We will scrape the followers/following list for Nintendo of America. Check out the two sample URLs below:
Note: Although the workflows are extremely similar, you still need to create two separate tasks to scrape the two lists, using different XPaths.
The main steps are listed in the menu on the right, and you can access the sample task here.
1. Create a Go to Web Page - to open the target web page
Every workflow in Octoparse starts by telling Octoparse a web page to start with.
Enter the link of the followers/following list page into the search bar at the top of the home screen and click Start.
2. Log into Twitter in Browse mode - to save cookies for authentication
Twitter forbids direct access to followers/following lists unless you've logged in first.
Toggle on Browse mode and log into Twitter as you do in a normal browser
Click the Go to Web Page action to open its settings panel (located at the bottom right)
Go to the Options tab and tick Use cookies
Click Use cookie from the current page
Click Apply to save the settings
Turn off Browse Mode
We have now successfully saved the login information in the task workflow so that our Twitter account can be logged in when we run the task.
3. Create Extract Data - to scrape the basic info for the public account from the page heading
Click on the display name (e.g., Nintendo of America) and select Text on the Tips panel
Repeat the action to get the username
Click Add custom fields and select Page URL from Page-level data to get the profile URL
Tip: Twitter might change the XPath of the heading area. You might need to rewrite the XPath if the data preview section does not contain the correct information.
For the display name, the XPath is //h2[@dir="ltr" and @aria-level="2"]/span
For the username, the XPath is //h2[@dir="ltr" and @aria-level="2"]/following-sibling::div/span
4. Create a Loop Item to scrape the following/follower info
You will see a Loop Item created in the workflow
Click on the follower title and choose Text to create a field
Do the same to add other fields like follower ID and description
Click on the follower ID and choose A from the Tips to get the URL
Rename the field by double-clicking on the name
Click on the Loop Item box and update the XPath of the Loop Item into //div[@data-testid="cellInnerDiv"]
5. Add Page Scroll step
Twitter utilizes the page scroll to load more followers. You need to add Loop Item as the Page Scroll step.
Add a Loop Item inside the workflow
Set Loop Mode as Scroll Page
Set Scroll for one screen
Tick Capture data as page scrolls dynamically to minimize data loss
Click Apply to save the settings
7. Run the task - to get your desired data
Click Save on the upper right to save your task
Click Run next to it and wait for a Run Task window to pop up
Select Run on your device to run the task on your local device
Wait for the task to complete
Here is the sample output from a local run.
Tip: Local runs are great for task troubleshooting and quick runs. If you are dealing with more complicated tasks, it is recommended that you select Run in the Cloud to run the task in Octoparse's cloud-based platform for higher speed. Try out this premium feature by signing up for the 14-day free trial here. You can also schedule your task to run hourly, daily, or weekly and get data delivered to you regularly.