Using SSIS Regex Parser Task for Extracting HTML Content


In this post you will learn how to use the FREE SSIS Regex Parser Task along with the REST API Task to extract HTML content in a few clicks.


Assume that you want to search for a certain keyword on Bing or Google and want to know how many pages were found for that keyword. The search URL includes the search word; in this example the search word is regex.

When the page is returned, view its source code and you will find a tag that contains the result count.

What we want is the number 21,00,000, which we will extract with a regular expression pattern search.
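
To make the idea concrete outside SSIS, here is a minimal Python sketch of the same kind of pattern search. The HTML fragment, the `id="count"` attribute and the pattern are all hypothetical and chosen only for illustration; the tag on a real search page will look different, so the pattern would need to be adapted to it.

```python
import re

# Hypothetical HTML fragment (an assumption for illustration only);
# the real tag returned by a search engine will differ.
html = '<div><span id="count">21,00,000 results</span></div>'

# Group 1 captures the run of digits and commas inside the hypothetical tag.
pattern = re.compile(r'<span id="count">([\d,]+) results</span>')

match = pattern.search(html)
count_text = match.group(1) if match else None
print(count_text)
```

The whole match covers the entire tag, while the first capturing group holds only the number, which is why the group is what gets extracted.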

Step-By-Step: Extract HTML Tag Value Using a Regular Expression

  1. Download and install SSIS PowerPack (it includes the FREE SSIS Regex Parser Task).
  2. Create a new SSIS package.
  3. Drag the ZS REST API Task from the SSIS Toolbox onto the Control Flow designer.
  4. Double-click the task to configure it and enter the URL you want to fetch.
  5. Click the Response tab and check the Save response option. Select Save to Variable. If needed, create a new variable.
  6. Click Test (scroll to the bottom to see the HTML content).
  7. Now drag the ZS Regex Parser Task onto the designer and connect it to the REST API Task.
  8. Select the variable that holds the HTML text you want to parse.
  9. Enter the regular expression and map the target to a variable if you want to save the extracted value. The expression used in this example ends with {{0,1}}, which means: take the first match and, within it, the 2nd group (both indexes are 0-based). The 2nd group of the match holds the actual search result count. If you omit {{x,y}} at the end, {{0,0}} is used.
    See the screenshot below.
    SSIS Regex Parser Task - Extract HTML Tag Value using Regular Expression


  10. In the above step you can either select a variable as the input or use a placeholder in a direct string (e.g. {{User::varHtml}}).
  11. You can also connect a ZS Logging Task to show the extracted value.
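
The {{x,y}} suffix from step 9 can be imitated in a few lines of Python, which may help when testing a pattern before pasting it into the task. This is only a sketch of the behaviour described above, not the task's actual implementation. The helper names (`split_suffix`, `extract`, `extract_with_suffix`) and the sample text are made up for illustration, and the sketch assumes that group 0 means the whole match, as in most regex engines.

```python
import re

# Matches a trailing {{x,y}} suffix: x = match index, y = group index.
SUFFIX = re.compile(r"\{\{(\d+),(\d+)\}\}$")

def split_suffix(expression):
    """Split 'pattern{{x,y}}' into (pattern, x, y); default to {{0,0}}."""
    m = SUFFIX.search(expression)
    if not m:
        return expression, 0, 0
    return expression[:m.start()], int(m.group(1)), int(m.group(2))

def extract(pattern, text, match_index=0, group_index=0):
    """Return the chosen group of the chosen match, or None if absent."""
    matches = list(re.finditer(pattern, text))
    if match_index >= len(matches):
        return None
    m = matches[match_index]
    if group_index > m.re.groups:
        return None
    return m.group(group_index)

def extract_with_suffix(expression, text):
    """Hypothetical helper combining the two steps above."""
    pattern, match_index, group_index = split_suffix(expression)
    return extract(pattern, text, match_index, group_index)

# Made-up sample text with three matches and two groups per match.
sample = "a=1; b=22; c=333"
print(extract_with_suffix(r"(\w)=(\d+){{0,1}}", sample))  # first match, group 1
print(extract_with_suffix(r"(\w)=(\d+){{1,2}}", sample))  # second match, group 2
print(extract_with_suffix(r"(\w)=(\d+)", sample))         # no suffix: {{0,0}}
```

With the sample text, the three calls pick out `a`, `22` and `a=1` respectively, and an out-of-range match or group index yields `None` instead of raising an error.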

Here is the final flow.

SSIS Regular expression parsing example


