Property Name |
Description |
LoggingMode |
LoggingMode determines how much information is logged during package execution. Set LoggingMode to Debugging for the most detailed log.
Available Options (use the numeric value listed in brackets when you define an expression on this property for dynamic behavior).
Option |
Description |
Normal [0] |
Normal |
Medium [1] |
Medium |
Detailed [2] |
Detailed |
Debugging [3] |
Debugging |
|
PrefixTimestamp |
When this property is enabled, a timestamp is prefixed to each log message. |
AccessMode |
Specifies where the URL(s)/path(s) are stored. URLs or paths can be provided as direct input, from an SSIS variable, or from a file.
Available Options (use the numeric value listed in brackets when you define an expression on this property for dynamic behavior).
Option |
Description |
Direct [0] |
Direct |
Variable [1] |
Variable |
Connection [2] |
Connection |
|
DirectValue |
URL(s) or HTML file path(s) from which you want to extract HTML tables. To specify multiple URLs or paths, enter each one on a new line. Only applicable when AccessMode is Direct. |
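The one-entry-per-line convention can be illustrated with a minimal Python sketch that splits a multi-line DirectValue into individual inputs. The function name `split_direct_value` is hypothetical, and trimming surrounding spaces and skipping blank lines are assumptions for illustration, not documented behavior of the component.

```python
def split_direct_value(direct_value: str) -> list[str]:
    """Return one URL or file path per non-blank line (hypothetical helper)."""
    entries = []
    # splitlines() handles both Windows (\r\n) and Unix (\n) line endings.
    for line in direct_value.splitlines():
        entry = line.strip()  # assumption: surrounding spaces are ignored
        if entry:             # assumption: blank lines are skipped
            entries.append(entry)
    return entries


if __name__ == "__main__":
    value = "https://example.com/page1.htm\r\n\r\nC:\\data\\report.html\n"
    for item in split_direct_value(value):
        print(item)
```

Each resulting entry would then be read as a separate source; a URL and a local file path can appear in the same list.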
VariableName |
Name of the SSIS variable that holds the URL(s) or HTML file path(s) from which you want to extract data. Only applicable when AccessMode is Variable. |
TableExtractMethod |
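Method used to locate the table to extract: by table number (TableNumber), by CSS class name (TableClass), or by XPath expression (TableXPath).
Available Options (use the numeric value listed in brackets when you define an expression on this property for dynamic behavior).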
Option |
Description |
ByNumber [0] |
ByNumber |
ByCss [1] |
ByCss |
ByXPath [2] |
ByXPath |
|
MaxGuessRows |
Number of rows to scan to determine the data type of each column. |
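A minimal Python sketch of sample-based type guessing shows the trade-off MaxGuessRows controls. The function name, the candidate type names (`int`, `decimal`, `date`, `string`), and the narrowest-first order are assumptions for illustration. The component's actual inference rules are not described here.

```python
from datetime import date


def guess_column_type(values: list[str], max_guess_rows: int) -> str:
    """Guess a column type by scanning at most max_guess_rows values (sketch)."""
    # Only the first max_guess_rows values are inspected; blanks are ignored.
    sample = [v.strip() for v in values[:max_guess_rows] if v.strip()]
    if not sample:
        return "string"  # nothing to inspect, so fall back to text

    def all_parse(parse) -> bool:
        # True if every sampled value can be parsed by the given function.
        try:
            for v in sample:
                parse(v)
            return True
        except ValueError:
            return False

    # Try the narrowest candidate type first, then widen.
    if all_parse(int):
        return "int"
    if all_parse(float):
        return "decimal"
    if all_parse(date.fromisoformat):
        return "date"
    return "string"


if __name__ == "__main__":
    column = ["1", "2", "abc"]
    print(guess_column_type(column, 2))  # scans "1", "2" only
    print(guess_column_type(column, 3))  # also scans "abc"
```

With a small MaxGuessRows, a non-matching value that appears below the scanned range does not affect the guess. Scanning more rows is slower but makes a wrong guess less likely.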
TableNumber |
Number of the table within the web page that you want to extract. Numbering starts at 1. |
TableXPath |
XPath expression that identifies the table you want to parse (e.g. //table[@class='mydata']) |
TableClass |
CSS class name used by the table. This is helpful if you don't know the exact number of the table in the HTML code but do know the CSS class name it uses. |
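The three TableExtractMethod options map naturally onto three lookup strategies. The minimal Python sketch below uses the numeric codes from the option list (0 = ByNumber, 1 = ByCss, 2 = ByXPath). It assumes the page is well-formed XHTML because Python's standard `xml.etree.ElementTree` is an XML parser, not a tolerant HTML parser. The function `find_table` and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical sample page (must be well-formed XML for ElementTree).
PAGE = """<html><body>
  <table class="nav"><tr><td>Home</td></tr></table>
  <table class="mydata"><tr><th>Name</th></tr><tr><td>Alice</td></tr></table>
</body></html>"""


def find_table(root: ET.Element, method: int, *, number: int = 1,
               css_class: str = "", xpath: str = "") -> Optional[ET.Element]:
    """Select one <table> element using TableExtractMethod codes (sketch)."""
    tables = root.findall(".//table")
    if method == 0:  # ByNumber: TableNumber is 1-based
        return tables[number - 1] if 1 <= number <= len(tables) else None
    if method == 1:  # ByCss: match one class token in the class attribute
        for table in tables:
            if css_class in table.get("class", "").split():
                return table
        return None
    if method == 2:  # ByXPath: ElementTree supports only a subset of XPath
        return root.find(xpath)
    raise ValueError(f"unknown TableExtractMethod: {method}")


if __name__ == "__main__":
    root = ET.fromstring(PAGE)
    print(find_table(root, 0, number=2).get("class"))
    print(find_table(root, 1, css_class="mydata").get("class"))
    print(find_table(root, 2, xpath=".//table[@class='mydata']").get("class"))
```

ElementTree rejects absolute paths on an element, so an expression such as `//table[@class='mydata']` has to be written as `.//table[@class='mydata']` in this sketch. A full XPath engine would accept the original form.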
HasHeaderRow |
Set to True if the table has a header row; otherwise set to False. If a header row is found, column names are extracted from it. |
HeaderRowNumber |
Number of the header row, starting from 1. |
SkipRowsTop |
Number of rows to skip from the top of the table before reading data. By default, data starts after the header row (if HasHeaderRow is set). |
SkipRowsBottom |
Number of rows to skip from the bottom of the table. |
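The header and skip settings can be read as a sequence of list slices. The minimal Python sketch below assumes that SkipRowsTop counts from the first row after the header, that MaxRows (described further down) is applied last, and that 0 means "no limit". None of these details is confirmed by the descriptions above. The function name and the default `ColumnN` names used when there is no header are hypothetical.

```python
def slice_table(rows, has_header_row=True, header_row_number=1,
                skip_rows_top=0, skip_rows_bottom=0, max_rows=0):
    """Return (column_names, data_rows) from rows of cell text (sketch)."""
    if has_header_row:
        header = rows[header_row_number - 1]  # HeaderRowNumber is 1-based
        body = rows[header_row_number:]       # data starts after the header row
    else:
        width = max((len(r) for r in rows), default=0)
        header = [f"Column{i}" for i in range(1, width + 1)]  # hypothetical names
        body = rows
    body = body[skip_rows_top:]               # SkipRowsTop (assumed: after header)
    if skip_rows_bottom > 0:                  # guard: body[:-0] would be empty
        body = body[:-skip_rows_bottom]       # SkipRowsBottom
    if max_rows > 0:                          # assumption: 0 means no limit
        body = body[:max_rows]                # MaxRows, like TOP N
    return header, body


if __name__ == "__main__":
    rows = [["Name", "Qty"], ["a", "1"], ["b", "2"], ["c", "3"], ["Total", "6"]]
    # Drop the "Total" footer row from the bottom of the table.
    print(slice_table(rows, skip_rows_bottom=1))
```

A typical use is the one shown: SkipRowsBottom = 1 removes a totals row so that it is not mixed into the data.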
OutputLinks |
Set this option to true to extract hyperlinks from each cell. When this option is enabled, a new column with the _Links suffix is added. If a single table cell contains multiple links, they are separated by a vertical bar. To extract only one link per cell, set MaxLinksPerColumn = 1. |
OutputImages |
Set this option to true to extract images from each cell. When this option is enabled, a new column with the _Images suffix is added. If a single table cell contains multiple images, they are separated by a vertical bar. To extract only one image per cell, set MaxImagesPerColumn = 1. |
MaxLinksPerColumn |
Maximum number of hyperlinks to extract from each cell. This option is ignored if OutputLinks is set to false. Set to 0 to extract all links from the cell. |
MaxImagesPerColumn |
Maximum number of images to extract from each cell. This option is ignored if OutputImages is set to false. Set to 0 to extract all images from the cell. |
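The behavior of OutputLinks/OutputImages together with MaxLinksPerColumn/MaxImagesPerColumn (values joined by a vertical bar, 0 = all) can be sketched in Python with the standard `html.parser` module. The assumptions are that links come from `<a href>` and images from `<img src>`, and that values keep their document order. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class CellLinkCollector(HTMLParser):
    """Collect <a href> and <img src> values found in one cell (sketch)."""

    def __init__(self):
        super().__init__()
        self.links, self.images = [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])


def join_limited(values, max_per_cell):
    """Join values with a vertical bar; max_per_cell = 0 keeps all of them."""
    if max_per_cell > 0:
        values = values[:max_per_cell]
    return "|".join(values)


def cell_links_and_images(cell_html, max_links=0, max_images=0):
    """Return the (_Links, _Images) column values for one cell."""
    parser = CellLinkCollector()
    parser.feed(cell_html)
    parser.close()
    return (join_limited(parser.links, max_links),
            join_limited(parser.images, max_images))


if __name__ == "__main__":
    cell = '<a href="/a">A</a> <a href="/b">B</a><img src="x.png"><img src="y.png"/>'
    print(cell_links_and_images(cell))
    print(cell_links_and_images(cell, max_links=1, max_images=1))
```

Self-closing tags such as `<img ... />` still reach `handle_starttag`, because the parser's default `handle_startendtag` calls it.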
TrimWhiteSpace |
Trims whitespace from the beginning and end of each extracted cell value. |
MaxRows |
Specifies the maximum number of data rows to output (similar to TOP N in a SQL query). |
EnableGroupDetect |
Specifies whether to enable group detection. Some tables contain grouping rows (using colspan=N); enable this option to get the group name in the __groupName column. |
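Group detection can be pictured as a single pass that remembers the last grouping row. The minimal Python sketch below treats any row whose only cell has colspan greater than 1 as a grouping row, and it does not emit that row as data. The component's exact detection rule is not described above, so treat this as an assumption. The `colN` field names and the function name are hypothetical, while `__groupName` comes from the description.

```python
def detect_groups(rows):
    """Attach a __groupName value to each data row (sketch).

    rows: list of rows; each row is a list of (text, colspan) tuples.
    Assumption: a row whose only cell spans more than one column is a group row.
    """
    output, current_group = [], None
    for row in rows:
        if len(row) == 1 and row[0][1] > 1:
            current_group = row[0][0]  # start a new group; no data row emitted
            continue
        record = {f"col{i}": text for i, (text, _span) in enumerate(row, start=1)}
        record["__groupName"] = current_group  # None until the first group row
        output.append(record)
    return output


if __name__ == "__main__":
    table = [
        [("Fruit", 2)],
        [("Apple", 1), ("3", 1)],
        [("Pear", 1), ("5", 1)],
        [("Vegetables", 2)],
        [("Kale", 1), ("1", 1)],
    ]
    for record in detect_groups(table):
        print(record)
```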
TreatInputAsHtmlString |
Treat the input as a raw HTML string rather than a URL. |
CharacterSet |
Character set for text (e.g. windows-1250) |
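The effect of the CharacterSet setting can be shown in Python by decoding the same bytes with the right and the wrong character set. Python's codec registry stands in here for whatever decoding the component performs internally. It resolves the name `windows-1250` to its `cp1250` codec. The function `decode_page` is hypothetical.

```python
def decode_page(raw: bytes, character_set: str) -> str:
    """Decode raw page bytes using a configured character set name (sketch)."""
    # Python accepts aliases such as "windows-1250" for its cp1250 codec.
    return raw.decode(character_set)


if __name__ == "__main__":
    text = "Příliš žluťoučký kůň"          # Czech sample text
    raw = text.encode("windows-1250")      # bytes as a windows-1250 page would send them
    print(decode_page(raw, "windows-1250"))  # correct character set: text is intact
    print(decode_page(raw, "latin-1"))       # wrong character set: accents are garbled
```

A mismatched character set rarely raises an error. It usually produces garbled accented characters silently, which is why the value should match the encoding the page was actually written in.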