Select the correct HTML tag for web elements
Updated over a week ago

A web page is an HTML document. An HTML tag is a piece of markup that indicates where a web element begins and ends in that document.

To select the correct HTML tag, let's look at the tags you will most often encounter in a task. Knowing what each tag means helps you decide which one to select in a given case.

<a> </a>

defines a hyperlink, which opens another page when clicked

<p> </p>

defines a paragraph of text content

<div> </div>

defines a block (section) used to divide the page into different areas

<li> </li>

defines a list item

<img>
 
defines an image on the page (img is a void element, so it has no closing tag)

<table> </table>

defines an HTML table

<tr> </tr>

defines a row in an HTML table

<td> </td>

defines a standard data cell in an HTML table

<select><option></option></select>

defines a dropdown menu with options
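
To see how these tags nest in practice, here is a minimal sketch that lists the tags found in a small HTML snippet. It uses Python's standard html.parser module; the sample markup and the names TagLister and list_tags are made up for illustration and are not part of Octoparse.

```python
from html.parser import HTMLParser

# Hypothetical sample page that uses the tags described above.
SAMPLE_HTML = """
<div>
  <p>Read the <a href="/docs">documentation</a>.</p>
  <ul><li>First item</li><li>Second item</li></ul>
  <img src="/logo.png" alt="Logo">
  <table><tr><td>Cell</td></tr></table>
  <select><option>One</option></select>
</div>
"""


class TagLister(HTMLParser):
    """Records the name of every opening tag in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def list_tags(html):
    parser = TagLister()
    parser.feed(html)
    return parser.tags


print(list_tags(SAMPLE_HTML))
# ['div', 'p', 'a', 'ul', 'li', 'li', 'img', 'table', 'tr', 'td', 'select', 'option']
```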

Octoparse shows different options on the Tips panel depending on which tag is located. At the bottom of the Tips panel, you can see an HTML path; the last tag in the path is the one currently located.

If the tag currently located is not the one you want, click the tag you want in the path.

If you cannot find the correct tag in the current path, click the “>” to open it and find more tags inside.

The Expand the selection button helps you expand the selected area. If your target area is hard to select directly, select part of it first, then keep clicking Expand the selection until the whole target area is selected.
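
The idea of an HTML path can be sketched in Python as the chain of enclosing tags that leads to an element. This is only an illustration of the concept, not how Octoparse works internally: the names html_path and expand_selection are hypothetical, and the sketch assumes that expanding a selection means moving from an element to the element that encloses it.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PathFinder(HTMLParser):
    """Finds the chain of tags leading to the first element named target_tag."""

    def __init__(self, target_tag):
        super().__init__()
        self.target_tag = target_tag
        self.stack = []   # tags that are currently open
        self.path = None  # filled in when the target is first seen

    def handle_starttag(self, tag, attrs):
        if self.path is None and tag == self.target_tag:
            self.path = self.stack + [tag]
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to (and including) the matching open tag.
            while self.stack.pop() != tag:
                pass


def html_path(html, target_tag):
    finder = PathFinder(target_tag)
    finder.feed(html)
    return finder.path


def expand_selection(path):
    """Assumed behavior: step from the current tag to its enclosing tag."""
    return path[:-1] if len(path) > 1 else path


sample = '<div><ul><li><a href="/item/1"><img src="/thumb/1.jpg"></a></li></ul></div>'
path = html_path(sample, "img")
print(" > ".join(path))                    # div > ul > li > a > img
print(" > ".join(expand_selection(path)))  # div > ul > li > a
```

In this sketch the last entry of the path plays the role of the tag currently located, and each call to expand_selection widens the selection by one level.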

Let's take some elements as examples:


1. Image extraction

To scrape an image URL, you need to locate the img tag, because this tag contains the image URL.

Click on the image. If IMG is the last tag in the path, you have located the correct tag.
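
As a rough illustration of why the img tag is the right one, the sketch below reads the src attribute, where an img tag normally stores its image URL. It uses only the Python standard library. The function name collect_image_urls, the sample markup, and the example.com address are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects the src attribute of every img tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip img tags that carry no src attribute
                self.urls.append(urljoin(self.base_url, src))


def collect_image_urls(html, base_url):
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.urls


page = '<div><img src="/images/a.jpg"><img alt="no source"><img src="b.png"></div>'
print(collect_image_urls(page, "https://example.com/products/"))
```

Selecting the surrounding div instead would not give you a src attribute, which is why the image URL is only available once the img tag itself is located.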


2. Link extraction

To get the link of an element, make sure you locate the element that contains the URL. Usually, the A tag contains the URL you want.

The Link option appears only when you click on the A tag.
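
The same point can be sketched in Python: the URL lives in the href attribute of the a tag, not in the text or span nested inside it. The function name collect_links and the sample markup are hypothetical, and the sketch assumes links are not nested inside one another.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (link text, href) pairs for every a tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the a tag currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def collect_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


listing = ('<ul><li><a href="/item/1"><span>Blue mug</span></a></li>'
           '<li><span>No link here</span></li></ul>')
print(collect_links(listing))  # [('Blue mug', '/item/1')]
```

The second list item contains a span but no a tag, so it yields no URL. This mirrors the case where a nested element is located and you need to pick the A tag from the path instead.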
