Following-sibling function

The following-sibling function is a useful XPath tool for selecting elements that come after a specific element and share the same parent. (Strictly speaking, following-sibling is an XPath axis rather than a function.) It is especially useful when the data you want to extract is not located in the same position across multiple pages.

To use the following-sibling function, first select the element you want to use as a reference point. You can match it with an ordinary XPath expression, for example by its tag name, by an attribute such as class or id, or by its text. Once you have selected the reference element, append "following-sibling::" and a tag name to select the elements that come after it under the same parent.
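To make the axis concrete, here is a minimal sketch using only Python's standard library. Python's xml.etree.ElementTree does not support the following-sibling axis, so the sketch emulates it by hand; the helper name following_siblings and the sample row are hypothetical, not part of Octoparse or any library.

```python
import xml.etree.ElementTree as ET


def following_siblings(parent, ref, tag=None):
    """Return the element children of `parent` that come after `ref`, in document order.

    Mirrors the XPath axis following-sibling::tag (or following-sibling::* when
    tag is None). ElementTree elements have no parent pointer, so the caller
    passes the parent explicitly.
    """
    children = list(parent)
    idx = children.index(ref)  # raises ValueError if ref is not a child of parent
    return [el for el in children[idx + 1:] if tag is None or el.tag == tag]


row = ET.fromstring("<tr><td>a</td><th>label</th><td>b</td><td>c</td></tr>")
ref = row.find("th")
print([el.text for el in following_siblings(row, ref, "td")])  # ['b', 'c']
```

Note that the td containing "a" is not returned: it shares the parent but comes before the reference element.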

Example

Step 1: Identify the Reference Point

In this example, we will use the "Item Weight" as our reference point. This means that no matter where the "Item Weight" data appears on the page, we will be able to extract it.

Step 2: Locate the "Item Weight" Data

Once you have identified the reference point, locate the "Item Weight" data on the product page.

Step 3: Use XPath to Extract the Data

Now that we have located the "Item Weight" data, we can use XPath to extract it. To do this, open the webpage in the Chrome browser, right-click the data, and select Inspect to view its HTML:

<th class="a-color-secondary a-size-base prodDetSectionEntry">
Item Weight
</th>
<td class="a size-base">
10 pounds
</td>

You can see that the "Item Weight" label is contained within a th tag, and its value (10 pounds) is in the td tag that follows it. So we can use the following XPath to extract the value:

//th[contains(text(),'Item Weight')]/following-sibling::td[1]
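To check what this expression returns outside Octoparse, here is a minimal Python sketch that applies the same logic to the th/td snippet above. It assumes the snippet is wrapped in a tr so it parses as XML (real pages are HTML and may not be well-formed), and it emulates contains(text(), ...) and following-sibling::td[1] by hand because ElementTree supports neither; value_after_label is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The th/td snippet from the page, wrapped in a <tr> so it parses as one XML tree.
SNIPPET = """<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Item Weight
</th>
<td class="a size-base">
10 pounds
</td>
</tr>"""


def value_after_label(root, label):
    """Emulate //th[contains(text(), label)]/following-sibling::td[1]."""
    # ElementTree has no parent axis, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for th in root.iter("th"):
        # th.text is the first text node, which is what text() yields in contains().
        if label in (th.text or ""):
            siblings = list(parent_of[th])
            for el in siblings[siblings.index(th) + 1:]:
                if el.tag == "td":
                    # td[1]: the first td after the th, i.e. the nearest one.
                    return (el.text or "").strip()
    return None


print(value_after_label(ET.fromstring(SNIPPET), "Item Weight"))  # 10 pounds
```

The [1] predicate matters when a row holds several td cells: without it, following-sibling::td would match all of them.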

Conclusion

In conclusion, XPath makes it easier to extract "Item Weight" data from different product pages. By anchoring on a reference point such as a label, you can extract the value no matter where it appears on the page. Remember to test and refine your XPath to make sure it works consistently across different product pages.
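One way to test this is to run both a position-based lookup and a label-based lookup against pages whose rows appear in different orders. The sketch below uses two made-up tables and Python's standard library; by_position and by_label are hypothetical helpers standing in for an XPath like //tr[1]/td and for the following-sibling XPath above.

```python
import xml.etree.ElementTree as ET

# Two hypothetical product-detail tables with the same rows in a different order.
PAGE_A = """<table>
<tr><th>Item Weight</th><td>10 pounds</td></tr>
<tr><th>Color</th><td>Black</td></tr>
</table>"""
PAGE_B = """<table>
<tr><th>Color</th><td>Red</td></tr>
<tr><th>Item Weight</th><td>2.5 pounds</td></tr>
</table>"""


def by_position(root):
    # Like //tr[1]/td: tied to where the row sits, so it breaks when rows move.
    return root.findall("tr")[0].find("td").text


def by_label(root, label):
    # Like //th[contains(text(), label)]/following-sibling::td[1]: tied to the label.
    for tr in root.iter("tr"):
        cells = list(tr)
        for i, cell in enumerate(cells):
            if cell.tag == "th" and label in (cell.text or ""):
                for sib in cells[i + 1:]:
                    if sib.tag == "td":
                        return (sib.text or "").strip()
    return None


for page in (PAGE_A, PAGE_B):
    root = ET.fromstring(page)
    print(by_position(root), "|", by_label(root, "Item Weight"))
# 10 pounds | 10 pounds
# Red | 2.5 pounds
```

On the second page the positional lookup silently returns the color, while the label-based lookup still finds the weight.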

Tips: how to modify the XPath

Step 1:

Navigate to the Data Preview section and select the option to "Customize XPath".

Step 2:

Enter the new XPath into the designated text box for Matching XPath, and then click Apply to save the changes.
