| Property Name |
Description |
| LoggingMode |
Determines how much information is logged during package execution. Set LoggingMode to Debugging for the most detailed log.
Available Options (use the numeric value listed in brackets if you need to define an expression on this property for dynamic behavior).
| Option |
Description |
| Normal [0] |
Normal |
| Medium [1] |
Medium |
| Detailed [2] |
Detailed |
| Debugging [3] |
Debugging |
|
| PrefixTimestamp |
When enabled, each log message is prefixed with a timestamp. |
| AccessMode |
Specifies where the URL(s) or path(s) are stored. URLs or paths can be provided as direct input, from an SSIS variable, or from a file.
Available Options (use the numeric value listed in brackets if you need to define an expression on this property for dynamic behavior).
| Option |
Description |
| Direct [0] |
Direct |
| Variable [1] |
Variable |
| Connection [2] |
Connection |
|
| DirectValue |
URL(s) or HTML file path(s) from which you want to extract HTML tables. For multiple URLs or paths, enter each one on a new line. Only applicable when AccessMode is Direct. |
| VariableName |
Name of the SSIS variable that holds the URL(s) or HTML file path(s) from which you want to extract data. Only applicable when AccessMode is Variable. |
| TableExtractMethod |
Method used to locate the table to extract.
Available Options (use the numeric value listed in brackets if you need to define an expression on this property for dynamic behavior).
| Option |
Description |
| ByNumber [0] |
Table Number (e.g. 1,2,3...) |
| ByCss [1] |
CSS Class Name (e.g. my-class) |
| ByXPath [2] |
XPath for <table> node |
| ByXPathCustom [3] |
XPath for <div> or custom node types |
|
| MaxGuessRows |
Number of rows to scan to determine the data type of each column |
| TableNumber |
Number of the table within the web page that you want to extract. Numbering starts at 1 |
| TableXPath |
XPath expression to detect table you want to parse (e.g. //table[@class='mydata']) |
| RowXPath |
XPath expression to detect row(s). Leave this blank if you are working with HTML tables based on the <table> tag. If you are working with <div>-based tables, use this to define a custom selector. Example (class contains): ./div[contains(@class,'row')] -OR- Example (exact match by id): ./div[@id='row x-small'] |
| CellXPath |
XPath expression to detect cell(s) within a row. Leave this blank if you are working with HTML tables based on the <table> tag. If you are working with <div>-based tables, use this to define a custom selector. Example (class contains): ./div[contains(@class,'cell')] -OR- Example (exact match by id): ./div[@id='cell-1'] |
| ExtraColumns |
Key/value pairs of extra columns to include along with the main table content. Enter multiple key/value pairs separated by a new line (e.g. \r\n). The key is the output column name; the value is the XPath expression used to extract text from any node. Syntax: {column_name_1}={xpath_expr1} {new-line} {column_name_2}={xpath_expr2} ...
Example (Output two extra columns): Title1=//h1[contains(@class,'main-title')] Title2=//h2[contains(@class,'sub-title')]
|
| TableClass |
CSS class name used on the table. This is helpful if you don't know the table's exact number in the HTML code but you do know the CSS class name used for it. |
| HasHeaderRow |
Set to true if the table has a header row; otherwise set to false. If a header row is found, column names are extracted from it. |
| HeaderRowNumber |
Row number of the header row. Numbering starts at 1 |
| SkipRowsTop |
Number of rows to skip from the top of the table before reading data. By default, data starts after the header row (if HasHeaderRow is set). |
| SkipRowsBottom |
Number of rows to skip from the bottom of the table. |
| OutputLinks |
Set this option to true if you want to extract hyperlinks from each cell. When this option is enabled, a new column with the _Links suffix is added. If a single table cell contains multiple links, the links are separated by a vertical bar. To extract only one link per column, set MaxLinksPerColumn = 1 |
| OutputImages |
Set this option to true if you want to extract images from each cell. When this option is enabled, a new column with the _Images suffix is added. If a single table cell contains multiple images, the images are separated by a vertical bar. To extract only one image per column, set MaxImagesPerColumn = 1 |
| MaxLinksPerColumn |
Maximum number of hyperlinks to extract from each cell. This option is ignored if OutputLinks is set to false. Set to 0 to extract all links from the cell. |
| MaxImagesPerColumn |
Maximum number of images to extract from each cell. This option is ignored if OutputImages is set to false. Set to 0 to extract all images from the cell. |
| TrimWhiteSpace |
Trim whitespace from the beginning and end of each extracted cell value. |
| MaxRows |
Specifies the maximum number of data rows to output (similar to TOP N in a SQL query) |
| EnableGroupDetect |
Specifies whether to enable group detection. Some tables contain grouping rows (using colspan=N); in that case, enable this option to get the group name in the __groupName output column. |
| TreatInputAsHtmlString |
Treat input as raw HTML string rather than URL. |
| CharacterSet |
Character set for text (e.g. windows-1250) |
| DoNotFailWhenExprYieldsNoResult |
Do not throw an error if a table, row, or cell is not found for the specified table number, CSS class, or XPath expression(s), including expressions defined for extra columns. When this option is set, the component tries to process as much as it can gracefully. |
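
The ExtraColumns syntax and the link-joining behavior described above can be illustrated with a short Python sketch. This is not the component's implementation; the function names `parse_extra_columns` and `join_links` are hypothetical and exist only for illustration. It assumes that each ExtraColumns line is split on the first `=` (XPath predicates such as `[@class='x']` contain `=` themselves) and that a maximum of 0 means "keep all links", as stated for MaxLinksPerColumn.

```python
def parse_extra_columns(text):
    """Parse ExtraColumns-style text into {column_name: xpath_expr}.

    Hypothetical helper: one {column_name}={xpath_expr} pair per line.
    """
    columns = {}
    for line in text.splitlines():  # splitlines() handles \r\n and \n
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the FIRST '=' only, because the XPath part may
        # contain '=' inside predicates like [@id='sub'].
        name, sep, xpath = line.partition("=")
        if not sep or not name.strip() or not xpath.strip():
            raise ValueError(f"Expected column_name=xpath_expr, got: {line!r}")
        columns[name.strip()] = xpath.strip()
    return columns


def join_links(links, max_links=0, separator="|"):
    """Join the links found in one cell with a vertical bar.

    Hypothetical helper: max_links=0 keeps every link; any positive
    value keeps only the first max_links entries.
    """
    if max_links < 0:
        raise ValueError("max_links must be 0 or greater")
    selected = links if max_links == 0 else links[:max_links]
    return separator.join(selected)


if __name__ == "__main__":
    extra = parse_extra_columns(
        "Title1=//h1[contains(@class,'main-title')]\r\n"
        "Title2=//h2[@id='sub']"
    )
    print(extra["Title2"])                           # //h2[@id='sub']
    print(join_links(["/a", "/b", "/c"]))            # /a|/b|/c
    print(join_links(["/a", "/b", "/c"], max_links=1))  # /a
```

Splitting on the first `=` keeps the whole XPath expression intact even when it has its own equality tests, which is the main pitfall when writing ExtraColumns entries by hand.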