Tag Finders

A Tag Finder is used to find a tag on an HTML/XML page. Tag Finders are used in steps, where they define how to find the tag(s) to which the step should be applied. The list of Tag Finders of the current step is located in the "Finders" tab in the Step View. Steps that work on spreadsheet content use Range Finders rather than Tag Finders.

Understanding Tag Paths

To use Tag Finders, you need to understand the concept of a tag path. A tag path is a compact text representation of where a tag is located on a page. Consider this tag path:

html.body.div.a

This tag path refers to an <a>-tag inside a <div>-tag inside a <body>-tag inside an <html>-tag.

A tag path can match more than one tag on the same page. For example, the above tag path will match all of the <a>-tags on this page, except the third one:

<html>
  <body>
    <div>
      <a href="url...">Link 1</a>
      <a href="url...">Link 2</a>
    </div>
    <p>
      <a href="url...">Link 3</a>
    </p>
    <div>
      <a href="url...">Link 4</a>
      <a href="url...">Link 5</a>
      <a href="url...">Link 6</a>
    </div>
  </body>
</html>

You can use indexes to refer to specific tags among tags of the same type at that level. Consider this tag path:

html.body.div[1].a[0]

This tag path refers to the first <a>-tag in the second <div>-tag in a <body>-tag inside an <html>-tag. So, on the page above, this tag path would only match the "Link 4" <a>-tag. Note that indexes start from 0. If no index is specified for a given tag on a tag path, the path matches any tag of that type at that level, as we saw in the first tag path above. If the index is negative, the matching tags are counted backwards, i.e. starting with the last matching tag which corresponds to index -1. Consider this tag path:

html.body.div[-1].a[-2]

This tag path refers to the second-to-last <a>-tag in the last <div>-tag in a <body>-tag inside an <html>-tag. So, on the page above, this tag path would only match the "Link 5" <a>-tag.
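
The index rules above can be illustrated with a minimal Python sketch. It is not the product's implementation: the Node, TreeBuilder, find and link_texts names are hypothetical, the page is assumed to be well-formed with every tag closed, and paths are resolved from the top-level tag only (no asterisks or implicit leading asterisk).

```python
from html.parser import HTMLParser
import re

class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""

class TreeBuilder(HTMLParser):
    """Builds a minimal tag tree; assumes well-formed, fully closed markup."""
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()

def find(root, path):
    """Resolve a dotted tag path such as 'html.body.div[1].a[0]'."""
    matches = [root]
    for step in path.split("."):
        name, index = re.fullmatch(r"(\w+)(?:\[(-?\d+)\])?", step).groups()
        next_matches = []
        for node in matches:
            # Indexes count only tags of the same type at this level.
            same_type = [c for c in node.children if c.tag == name]
            if index is None:
                next_matches.extend(same_type)       # no index: any tag of that type
            else:
                i = int(index)                       # negative: count from the last tag
                if -len(same_type) <= i < len(same_type):
                    next_matches.append(same_type[i])
        matches = next_matches
    return matches

PAGE = """
<html>
  <body>
    <div>
      <a href="url...">Link 1</a>
      <a href="url...">Link 2</a>
    </div>
    <p>
      <a href="url...">Link 3</a>
    </p>
    <div>
      <a href="url...">Link 4</a>
      <a href="url...">Link 5</a>
      <a href="url...">Link 6</a>
    </div>
  </body>
</html>
"""

builder = TreeBuilder()
builder.feed(PAGE)

def link_texts(path):
    return [node.text for node in find(builder.root, path)]

print(link_texts("html.body.div.a"))
print(link_texts("html.body.div[1].a[0]"))
print(link_texts("html.body.div[-1].a[-2]"))
```

Run against the example page, the three calls select the same tags as described above: every link except "Link 3", then "Link 4", then "Link 5".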

You can use an asterisk ('*') to mean any number of tags of any type. For example, the tag path

html.*.table.*.a

refers to an <a>-tag located anywhere inside a <table>-tag, which itself can be located anywhere inside an <html>-tag. There is an implicit asterisk in front of any tag path, so you can simply write "table" instead of "*.table" to refer to any table tag on the page. The only exception is a tag path that starts with a period ('.'). Such a tag path has no implicit asterisk in front of it, so it must match from the first (i.e. top-level) tag of the page.

With asterisks, you can create tag paths that are more robust against changes in the page, since you can leave out insignificant tags that are liable to change over time, such as layout related tags. However, using asterisks also increases the risk of accidentally locating the wrong tag.

You can provide a list of possible tags by separating them with '|', as in this tag path:

html.*.p|div|td.a

This tag path refers to an <a>-tag inside a <p>-, <div>-, or <td>-tag located anywhere inside an <html>-tag.
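
One way to reason about asterisks, the implicit leading asterisk, and '|' alternatives is to translate a tag path into a regular expression over the chain of tag names leading to a candidate tag. The Python sketch below does this under stated assumptions: path_to_regex and matches are hypothetical helper names, indexes are not handled, and an asterisk is assumed to stand for zero or more tags.

```python
import re

def path_to_regex(tag_path):
    """Translate a tag path (without indexes) into a regex over chains
    written as 'html.body.div.a.' (every tag name followed by a period)."""
    if tag_path.startswith("."):
        steps = tag_path[1:].split(".")        # leading period: match from the top-level tag
    else:
        steps = ["*"] + tag_path.split(".")    # implicit asterisk in front of the path
    parts = []
    for step in steps:
        if step == "*":
            parts.append(r"(?:[^.]+\.)*")      # assumption: zero or more tags of any type
        else:
            names = "|".join(re.escape(n) for n in step.split("|"))
            parts.append(rf"(?:{names})\.")    # one tag, chosen from the alternatives
    return re.compile("".join(parts))

def matches(tag_path, ancestor_chain):
    """ancestor_chain lists tag names from the top-level tag down to the candidate tag."""
    chain = "".join(name + "." for name in ancestor_chain)
    return path_to_regex(tag_path).fullmatch(chain) is not None

print(matches("html.*.table.*.a", ["html", "body", "table", "tr", "td", "a"]))
print(matches("table", ["html", "body", "table"]))
print(matches(".table", ["html", "body", "table"]))
print(matches("html.*.p|div|td.a", ["html", "body", "div", "a"]))
```

In this sketch, "table" matches a table anywhere on the page, while ".table" only matches when the table is the top-level tag, which mirrors the rule about the leading period.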

In a tag path, text on a page is referred to just as any other tag, using the keyword "text". Although text is not technically a tag, it is treated and viewed as such in a tag path. For example, consider this HTML:

<html>
  <body>
    <a href="url...">Link 1</a>
    <a href="url...">Link 2</a>
  </body>
</html>

The tag path "html.body.a[1].text" would refer to the text "Link 2".

Tag Finder Properties

A Tag Finder can be configured using the following properties.

Find Where
In this property, you can specify where to find the tag relative to a named tag. The default value is "Anywhere in Page", meaning that named tags are not used to find the tag.
Tag Path
In this property, you can specify the tag path as described in the previous section. The tag path can be specified in several ways using a Value Selector.
Attribute Name
In this property, you can specify that the tag must have a specific attribute, for example "align".
Attribute Value
In this property, you can specify that the tag must have an attribute with a specific value. If the Attribute Name property is set, the attribute value is bound to that specific attribute name.
Tag Pattern
In this property, you can specify a pattern that the tag must match (including all tags inside it), for example ".*.*Stock Quotes.*.*". Use this property with caution, since it can have a considerable impact on the performance of your robot. This is because the "Tag Pattern" may be applied many times throughout a page just to find the one tag that it matches. One way to reduce this cost is to choose "Text Only" for the "Match Against" property.
Match Against
In this property, you can specify that the "Tag Pattern" should match only the text or the entire HTML of the tag. The default is to match only the text because this is normally much faster.
Tag Depth
This property determines which tag to use if matching tags are contained inside each other. The default value is "Any Depth" which accepts all matching tags. If you select "Outermost Tag", only the outermost tags are accepted, and similarly, if you select "Innermost Tag", only the innermost tags are accepted.
Tag Number
This property determines which tag to use if more than one tag matches the tag path and the other criteria. You specify the number of the tag to use, either counting forwards from the first tag or counting backwards from the last tag that matches.
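
The effect of the Tag Depth property on nested matches can be sketched in Python as follows. This is an illustration only: filter_by_depth and the parent_of mapping (each tag's identifier mapped to its parent's identifier, None for the top-level tag) are hypothetical, and the tag identifiers are made up for the example.

```python
def filter_by_depth(matching, parent_of, depth="Any Depth"):
    """Keep all, only the outermost, or only the innermost of the matching tags."""
    matching = list(matching)
    matched = set(matching)

    def ancestors(tag):
        parent = parent_of.get(tag)
        while parent is not None:
            yield parent
            parent = parent_of.get(parent)

    if depth == "Outermost Tag":
        # Drop every match that sits inside another match.
        return [t for t in matching if not any(a in matched for a in ancestors(t))]
    if depth == "Innermost Tag":
        # Drop every match that contains another match.
        containers = {a for t in matching for a in ancestors(t)}
        return [t for t in matching if t not in containers]
    return matching

# Hypothetical page: table_b is nested inside table_a, table_c stands alone.
parent_of = {
    "html": None,
    "body": "html",
    "table_a": "body",
    "td": "table_a",
    "table_b": "td",
    "table_c": "body",
}
tables = ["table_a", "table_b", "table_c"]

print(filter_by_depth(tables, parent_of, "Outermost Tag"))
print(filter_by_depth(tables, parent_of, "Innermost Tag"))
```

A tag that neither contains nor is contained in another match, such as table_c here, is kept by both settings.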

Examples

As an example, if you set the Tag Path property to "table", the Attribute Name property to "align", the Attribute Value property to Fixed Text where the text must be "center", and the Tag Pattern property to ".*Business News.*", then the Tag Finder would locate the first <table>-tag that is center aligned and that contains the text "Business News".
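
The way these criteria narrow down the candidates can be sketched in Python. The sketch is an assumption-laden illustration, not the product's logic: Candidate and find_tag are hypothetical names, the candidates are taken to be the tags already matched by the Tag Path in page order, the Tag Pattern is assumed to be a regular expression that must match the whole text, and tag numbers are assumed to be zero-based with negative numbers counting from the last match.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Candidate:
    """A tag already matched by the Tag Path (hypothetical structure)."""
    name: str
    attributes: dict = field(default_factory=dict)
    text: str = ""

def find_tag(candidates, attribute_name=None, attribute_value=None,
             tag_pattern=None, tag_number=0):
    """Apply the attribute and pattern criteria, then pick one tag by number."""
    remaining = []
    for tag in candidates:
        if attribute_name is not None and attribute_name not in tag.attributes:
            continue
        if attribute_value is not None:
            if attribute_name is not None:
                values = [tag.attributes[attribute_name]]   # value bound to that attribute
            else:
                values = list(tag.attributes.values())      # any attribute may carry it
            if attribute_value not in values:
                continue
        # Assumption: the pattern must match the whole text, hence ".*" on both sides.
        if tag_pattern is not None and not re.fullmatch(tag_pattern, tag.text, re.DOTALL):
            continue
        remaining.append(tag)
    if -len(remaining) <= tag_number < len(remaining):
        return remaining[tag_number]
    return None

tables = [
    Candidate("table", {"align": "left"}, "Business News today"),
    Candidate("table", {"align": "center"}, "Sports"),
    Candidate("table", {"align": "center"}, "Top Business News stories"),
    Candidate("table", {"align": "center"}, "More Business News"),
]

found = find_tag(tables, "align", "center", ".*Business News.*")
print(found.text)
```

With the made-up candidates above, the first table is rejected by the attribute criteria and the second by the pattern, so the third table is the first one that satisfies everything.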