How to Extract Content from HTML

Design Studio has six step actions for extracting content from a tag in an HTML page:

Often you need to reformat (or normalize) the extracted content, and the Extract and Extract Tag Attribute actions allow you to do this by configuring a list of data converters.
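Design Studio configures data converters in its own user interface, so the following is only a conceptual sketch in Python of how a list of converters normalizes an extracted value. The names TagTextExtractor, extract_text and apply_converters, the sample HTML and the three converters are all assumptions made for illustration. They are not part of Design Studio.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collects the text inside the first occurrence of a given tag (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # nesting depth inside the target tag
        self.done = False   # set once the first matching tag has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and not self.done:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def extract_text(html, tag):
    """Return the raw text of the first <tag> element in html."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def apply_converters(value, converters):
    """Apply each converter in order; each one receives the previous output."""
    for convert in converters:
        value = convert(value)
    return value


# Illustrative converter list: the order matters, as in any converter chain.
converters = [
    lambda s: s.replace("$", ""),   # drop the currency symbol
    lambda s: s.replace(",", ""),   # drop thousands separators
    str.strip,                      # trim surrounding whitespace
]

html = "<html><body><span>  $ 1,299.00\n </span></body></html>"
raw = extract_text(html, "span")
price = apply_converters(raw, converters)
print(repr(raw), "->", repr(price))
```

Each converter takes the output of the previous one as its input, so the extracted text passes through the list from top to bottom before it is stored.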

There are also two actions for extracting data from binary formats such as PDF or Flash. They differ from the actions above: instead of extracting a value directly, each one produces an HTML page that holds the data in a structured form your robot can access. These actions are therefore used in an initial step, before the actual data extraction. In the steps that follow, you can loop over the produced HTML and extract text from it.
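The second stage of this workflow, looping over the produced HTML and extracting text, can be sketched in Python as follows. This is an illustration only. The layout of the produced page (one div per page, one p per text block) and the ParagraphCollector class are assumptions; the HTML that Design Studio actually generates may be structured differently.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text of every <p> element, in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.current = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.paragraphs.append("".join(self.current).strip())
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.current.append(data)


# Assumed stand-in for the HTML page produced from a binary document.
produced_html = """
<html><body>
  <div class="page"><p>Invoice 1001</p><p>Total: 250.00</p></div>
  <div class="page"><p>Thank you</p></div>
</body></html>
"""

collector = ParagraphCollector()
collector.feed(produced_html)
collector.close()

# Loop over the extracted text blocks, as a robot would loop over tags.
for text in collector.paragraphs:
    print(text)
```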