Occasionally, we need to extract data that sits outside an existing loop item.
Let's say we want to extract data from the Amazon Best Sellers page. For each product, we need to get its product details and the category it belongs to at the same time, as shown in the picture below:
If we create a loop just for the products, the "category" data will fall outside the "product" loop. You might try to resolve the issue by creating another loop to get the category data. Try as you might, it will not end well, because Octoparse will yell at you for overlapping two loops. And if the new loop is completely independent of the existing loop, we have no way to establish a relationship between the data in the two loops, so each product cannot be matched to its category.
It seems that we are stuck in a dilemma. What can we do? The answer is actually quite simple:
Use the XPath for the product loop as an AXIS and write a relative XPath to locate the category data.
In case you are still confused, let me walk you through it step by step with the sample website: https://www.amazon.com/gp/bestsellers/?ref_=nav_em_cs_bestsellers_0_1_1_2
Create a loop for all the products
Check the XPath for the products in the HTML source code:
The XPath for the products will be: //li[@class="a-carousel-card"]
Using this as an axis, we can locate the category data relative to it:
//li[@class="a-carousel-card"]/ancestor::div[@class="a-row a-carousel-controls a-carousel-row a-carousel-has-buttons"]/preceding-sibling::div[@class="a-row a-carousel-header-row a-size-large"]//h2
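To see why this works, it may help to trace what the part after the AXIS does for each product: climb up to the carousel row that contains the product, step back to the header row that precedes it, and read the h2 inside. Below is a minimal Python sketch of that walk. It is not how Octoparse evaluates XPath internally; the HTML snippet, the "zg-section" wrapper, and the helper names category_for and extract_rows are all made up for illustration, and only the class names are taken from the XPath above. Python's standard-library ElementTree does not support the ancestor or preceding-sibling axes, so the sketch performs those two steps by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup; the real page is far larger.
CARD = "a-carousel-card"
CONTROLS = "a-row a-carousel-controls a-carousel-row a-carousel-has-buttons"
HEADER = "a-row a-carousel-header-row a-size-large"

HTML = f"""
<div id="page">
  <div class="zg-section">
    <div class="{HEADER}"><h2>Best Sellers in Books</h2></div>
    <div class="{CONTROLS}">
      <ol>
        <li class="{CARD}">Book A</li>
        <li class="{CARD}">Book B</li>
      </ol>
    </div>
  </div>
  <div class="zg-section">
    <div class="{HEADER}"><h2>Best Sellers in Toys</h2></div>
    <div class="{CONTROLS}">
      <ol><li class="{CARD}">Toy A</li></ol>
    </div>
  </div>
</div>
"""


def category_for(card, parents):
    """Mimic the relative XPath for one loop item (the context node).

    Simplification: uses the nearest matching ancestor and the first
    matching header, which is enough for this sample markup.
    """
    # ancestor::div[@class=CONTROLS]: climb until the carousel row is found.
    node = parents.get(card)
    while node is not None and not (
        node.tag == "div" and node.get("class") == CONTROLS
    ):
        node = parents.get(node)
    if node is None:
        return None
    container = parents.get(node)
    if container is None:
        return None
    # preceding-sibling::div[@class=HEADER]: earlier children of the same parent.
    siblings = list(container)
    for sibling in siblings[: siblings.index(node)]:
        if sibling.tag == "div" and sibling.get("class") == HEADER:
            heading = sibling.find(".//h2")  # the trailing //h2 step
            if heading is not None:
                return heading.text
    return None


def extract_rows(html):
    root = ET.fromstring(html)
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    rows = []
    for card in root.iter("li"):  # the loop: //li[@class="a-carousel-card"]
        if card.get("class") == CARD:
            rows.append((card.text.strip(), category_for(card, parents)))
    return rows


rows = extract_rows(HTML)
for product, category in rows:
    print(product, "|", category)
```

Because the category lookup starts from each product rather than from the page root, every row carries the category of its own carousel, which is the link that two independent loops cannot provide.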
Since the product XPath is already set as the matching XPath for the loop item (this is the AXIS), the XPath for the product data field should be left blank, while the XPath for the category data field should contain only the part of the expression that comes after the AXIS, i.e. everything following //li[@class="a-carousel-card"].
The sample data will look like this: