Use relative XPath to locate data outside a loop item

Occasionally, we need to extract data that sits outside an existing loop item.

Let's say we want to extract data from the Amazon Best Sellers page. For each product, we need both its product details and the category it belongs to, as shown in the picture below:

categoryandproduct.jpg

If we create a loop just for the products, the "category" data will clearly sit outside the "product" loop. You might try to resolve the issue by creating another loop to get the category data. Try as you might, it will not end well, because Octoparse will complain about two overlapping loops. And if the new loop is completely independent of the existing loop, there is no way to match each category with its products.

It seems that we are stuck in a dilemma. What can we do? The answer is actually quite simple:

Use the XPath for the product loop as an AXIS and write relative XPath to locate the category data

mceclip5.png

If that is still unclear, let's walk through it step by step with the sample website: https://www.amazon.com/gp/bestsellers/?ref_=nav_em_cs_bestsellers_0_1_1_2

Create a loop for all the products

mceclip2.png

Check the XPath for the products in the HTML source code:

mceclip3.png

The XPath for the products will be: //li[@class="a-carousel-card"]

Using this as the axis, we can locate the category data relative to it:

//li[@class="a-carousel-card"]/ancestor::div[@class="a-row a-carousel-controls a-carousel-row a-carousel-has-buttons"]/preceding-sibling::div[@class="a-row a-carousel-header-row a-size-large"]//h2
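
To make the relative XPath easier to follow, here is a minimal Python sketch of what it does for each loop item: climb to the carousel row that contains the product, step back to the header row before it, and read the h2 inside. The markup in SAMPLE is a simplified, made-up stand-in for the real page, and category_for_each_product is a hypothetical helper, not part of Octoparse. Python's standard-library ElementTree supports only a subset of XPath (no ancestor or preceding-sibling axes), so the sketch walks those axes by hand.

```python
import xml.etree.ElementTree as ET

# Simplified, invented markup that only mimics the structure described above.
SAMPLE = """
<div id="page">
  <div class="carousel">
    <div class="a-row a-carousel-header-row a-size-large"><h2>Best Sellers in Books</h2></div>
    <div class="a-row a-carousel-controls a-carousel-row a-carousel-has-buttons">
      <ol>
        <li class="a-carousel-card">Book A</li>
        <li class="a-carousel-card">Book B</li>
      </ol>
    </div>
  </div>
  <div class="carousel">
    <div class="a-row a-carousel-header-row a-size-large"><h2>Best Sellers in Toys</h2></div>
    <div class="a-row a-carousel-controls a-carousel-row a-carousel-has-buttons">
      <ol>
        <li class="a-carousel-card">Toy A</li>
      </ol>
    </div>
  </div>
</div>
"""

HEADER_CLASS = "a-row a-carousel-header-row a-size-large"
CONTROLS_CLASS = "a-row a-carousel-controls a-carousel-row a-carousel-has-buttons"


def category_for_each_product(xml_text):
    """Return (product, category) pairs, resolving the category relative to each product."""
    root = ET.fromstring(xml_text)
    # ElementTree nodes do not know their parent, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    rows = []
    for item in root.iter("li"):  # loop items: //li[@class="a-carousel-card"]
        if item.get("class") != "a-carousel-card":
            continue

        # ancestor::div[@class=CONTROLS_CLASS]: climb until the carousel row is found.
        row = parent_of.get(item)
        while row is not None and not (row.tag == "div" and row.get("class") == CONTROLS_CLASS):
            row = parent_of.get(row)
        container = parent_of.get(row) if row is not None else None
        if container is None:
            continue

        # preceding-sibling::div[@class=HEADER_CLASS]: keep the closest header before the row.
        siblings = list(container)
        header = None
        for sibling in siblings[: siblings.index(row)]:
            if sibling.tag == "div" and sibling.get("class") == HEADER_CLASS:
                header = sibling

        # //h2: the first h2 anywhere inside the header row.
        h2 = header.find(".//h2") if header is not None else None
        category = h2.text.strip() if h2 is not None and h2.text else None
        rows.append(((item.text or "").strip(), category))
    return rows


for product, category in category_for_each_product(SAMPLE):
    print(product, "|", category)
```

Each product ends up paired with the heading of its own carousel, which is the pairing the relative XPath is meant to produce inside the product loop.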

Because the product XPath is already set as the matching XPath for the loop item (the AXIS), leave the XPath for the product data field blank. For the category data field, enter only the part of the full XPath that comes after the AXIS.
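
As a small illustration of "the part after the AXIS", the Python sketch below strips the loop XPath from the front of the full XPath and returns whatever remains. The helper name relative_part is hypothetical, and the sketch assumes the full XPath literally begins with the loop XPath, as it does in this example.

```python
LOOP_XPATH = '//li[@class="a-carousel-card"]'

FULL_CATEGORY_XPATH = (
    '//li[@class="a-carousel-card"]'
    '/ancestor::div[@class="a-row a-carousel-controls a-carousel-row a-carousel-has-buttons"]'
    '/preceding-sibling::div[@class="a-row a-carousel-header-row a-size-large"]'
    '//h2'
)


def relative_part(full_xpath, loop_xpath):
    """Return the portion of full_xpath that follows loop_xpath (the AXIS)."""
    if not full_xpath.startswith(loop_xpath):
        raise ValueError("full XPath does not start with the loop XPath")
    return full_xpath[len(loop_xpath):]


# Product field: the loop item itself, so nothing remains after the AXIS.
product_field_xpath = relative_part(LOOP_XPATH, LOOP_XPATH)

# Category field: everything after the AXIS.
category_field_xpath = relative_part(FULL_CATEGORY_XPATH, LOOP_XPATH)

print(repr(product_field_xpath))
print(category_field_xpath)
```

The product field comes out empty, matching the instruction to leave it blank, and the category field begins at /ancestor::div.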

mceclip6.png

The sample data will look like this:

mceclip7.png