This tutorial is written for the latest version of Octoparse. If you are running an older version, we strongly recommend upgrading, because the latest version is faster, easier to use and more robust.
In this tutorial, we will show you how to scrape posts from LinkedIn.com. To follow along, you can use this example URL: https://www.linkedin.com/search/results/content/?keywords=google&origin=GLOBAL_SEARCH_HEADER&sid=DIi
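If you want to scrape posts for a different search term, you can rebuild the URL above with another keyword. The snippet below is a minimal sketch using Python's standard library. The helper name `search_url_for` is our own, and we are assuming that changing only the `keywords` parameter is enough. The `origin` and `sid` values are copied unchanged from the example URL and may not be needed.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# The example search URL used in this tutorial.
TUTORIAL_URL = (
    "https://www.linkedin.com/search/results/content/"
    "?keywords=google&origin=GLOBAL_SEARCH_HEADER&sid=DIi"
)


def search_url_for(keyword: str, base_url: str = TUTORIAL_URL) -> str:
    """Return base_url with its `keywords` query parameter replaced.

    Hypothetical helper: it assumes the other parameters can stay as they are.
    """
    parts = urlsplit(base_url)
    query = parse_qs(parts.query)
    query["keywords"] = [keyword]
    # doseq=True writes each one-item list back as a single key=value pair.
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


if __name__ == "__main__":
    print(parse_qs(urlsplit(TUTORIAL_URL).query))
    print(search_url_for("data science"))
```

You can then paste the resulting URL into the Go to Web Page step instead of the example URL.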
The main steps are as follows.
1. Create a Go to Web Page - to open the target website
2. Log in to the website - to access the data
After the login page loads, click on the Email input box, choose Enter text and type your email address
Click on the password input box, choose Enter text and type your password
Set the AJAX timeout to 10s
3. Auto-detect webpage - to create the workflow
Select Auto-detect web page data
Click Create workflow
Click on Scroll Page, set the scroll mode to Scroll for one screen, then set the Repeat and Wait time values
Check the data fields in Data Preview and delete unwanted fields or rename them if needed
4. Modify the XPath of Loop Item - to locate more posts
LinkedIn pages are quite complicated, so the auto-generated XPath does not always match every post. We need to update the XPath manually.
Click on Loop Item and input the XPath:
//ul[contains(@class,"reusable-search__entity-result-list")]/li
Click Apply
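To see what this XPath selects, here is a small sketch that applies the same rule to a made-up HTML fragment: find every `ul` whose `class` attribute contains `reusable-search__entity-result-list`, then take its direct `li` children. Python's built-in `xml.etree.ElementTree` does not support the XPath `contains()` function, so the sketch emulates it with a substring check. The sample markup and the `loop_items` helper are our own assumptions, not LinkedIn's real page or Octoparse's internals.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup standing in for a search results page.
SAMPLE_HTML = """
<div>
  <ul class="nav-list"><li>Home</li></ul>
  <ul class="reusable-search__entity-result-list list-style-none">
    <li>Post A</li>
    <li>Post B</li>
    <li>Post C</li>
  </ul>
</div>
"""


def loop_items(root, class_fragment="reusable-search__entity-result-list"):
    """Emulate //ul[contains(@class, class_fragment)]/li."""
    items = []
    for ul in root.iter("ul"):  # every <ul> in the tree, like //ul
        if class_fragment in ul.get("class", ""):  # like contains(@class, ...)
            items.extend(ul.findall("li"))  # direct <li> children only, like /li
    return items


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_HTML)
    print([li.text for li in loop_items(root)])  # ['Post A', 'Post B', 'Post C']
```

Using `contains()` rather than an exact match matters because a `class` attribute often holds several class names at once, as in the sample list above.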
5. Extract the data - to choose the data you want
The auto-detection feature can help us select most of the data we need, but there may still be some data that we need to manually select.
Click on the element on the page
Choose Text
It takes two more steps to extract the post URLs
6. Modify the XPath of data fields - to locate the data precisely
You may need to modify the XPath of data fields that do not show up in the Data Preview section. Doing so makes the extraction more precise. Here are some prepared XPaths for you.
Post_URL: //div[contains(@class,"description-container")]/div/a
Content: //div[contains(@class,"feed-shared-update-v2__commentary")]
Comments: //li[contains(@class,"social-details-social-counts__comments")]
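The sketch below illustrates how these three XPaths pick out fields inside a single post, assuming each one is evaluated within one Loop Item. It again uses `xml.etree.ElementTree` with a substring check in place of `contains()`. The sample markup, the example.com link and the helper names are all hypothetical, and we are assuming that Post_URL refers to the link's `href` attribute.

```python
import xml.etree.ElementTree as ET

# Made-up markup for one post (one Loop Item); not LinkedIn's real HTML.
SAMPLE_ITEM = """
<li>
  <div class="feed-shared-update-v2__description-container">
    <div><a href="https://example.com/post/1">View post</a></div>
  </div>
  <div class="feed-shared-update-v2__commentary">Hello from a sample post</div>
  <ul>
    <li class="social-details-social-counts__comments">12 comments</li>
  </ul>
</li>
"""


def first_with_class(scope, tag, fragment):
    """Return the first <tag> inside scope whose class contains fragment."""
    for el in scope.iter(tag):
        if fragment in el.get("class", ""):
            return el
    return None


def text_of(el):
    """Return all text inside el, stripped, or '' if el is missing."""
    return "".join(el.itertext()).strip() if el is not None else ""


def extract_fields(item):
    """Extract the three tutorial fields from one item (first match only)."""
    container = first_with_class(item, "div", "description-container")
    # "div/a" means a child <div>, then its child <a>, like /div/a.
    link = container.find("div/a") if container is not None else None
    return {
        "Post_URL": link.get("href", "") if link is not None else "",
        "Content": text_of(
            first_with_class(item, "div", "feed-shared-update-v2__commentary")
        ),
        "Comments": text_of(
            first_with_class(item, "li", "social-details-social-counts__comments")
        ),
    }


if __name__ == "__main__":
    print(extract_fields(ET.fromstring(SAMPLE_ITEM)))
```

If a field comes back empty for a post, the matching element is probably missing from that post, for example a post with no comments yet.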
7. Run task - to get the data
Click Run to run your task either on your device or in the cloud
Select Standard Mode under the Run on your device section to run the task on your local device
Wait for the task to complete
Note: We do not recommend running LinkedIn tasks in the Cloud, because LinkedIn may flag the login as coming from a suspicious IP address.
Here is the sample output data, which can be exported in Excel, CSV, HTML and JSON formats.
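Once you have exported the data, you may want to tidy it up before analysis. The following is a minimal sketch that reads a CSV export and turns the Comments text into a number. The sample rows, the example.com links and the "12 comments" text format are assumptions made for illustration. Your real export will use whatever column names you set in Data Preview.

```python
import csv
import io
import json
import re

# Made-up rows using the field names from this tutorial.
SAMPLE_CSV = """Post_URL,Content,Comments
https://example.com/post/1,Hello from a sample post,12 comments
https://example.com/post/2,Another sample post,
"""


def comment_count(text):
    """Return the first number found in text, or 0 if there is none."""
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else 0


def load_posts(csv_file):
    """Read exported rows and convert the Comments column to an integer."""
    rows = []
    for row in csv.DictReader(csv_file):
        row["Comments"] = comment_count(row["Comments"] or "")
        rows.append(row)
    return rows


if __name__ == "__main__":
    posts = load_posts(io.StringIO(SAMPLE_CSV))
    print(json.dumps(posts, indent=2))
```

To use it on a real export, replace `io.StringIO(SAMPLE_CSV)` with a file opened via `open("your_export.csv", newline="", encoding="utf-8")`, where the file name is a placeholder for your own export.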