SSIS Secure FTP XML Source
PreviousNext

SSIS Secure FTP XML Source can be used to parse / extract XML files stored in Secure FTP storage. You can also read compressed (gzip) files without extracting on local disk.

Download SSIS PowerPack

Content

Video Tutorial

Coming soon...

Step-By-Step

In this tutorial you will learn how to extract data from XML file (In this case its from Secure FTP Server).
  1. Firstly, You need to Download and Install SSIS ZappySys PowerPack.
  2. Download and Install Local FTP/SFTP Rebex Tiny Sftp Server here.
  3. Once you finished above step, Open Visual Studio and Create New SSIS Package Project.
  4. Now, Drag and Drop SSIS Data Flow Task from SSIS Toolbox.
    SSIS Data Flow Task - Drag and Drop
  5. Double click on the Data Flow task to see Data Flow designer surface.
  6. Here, In Visual Studio, drag and drop the ZS Secure FTP XML Source (Web API or File) in the design panel.
    SSIS Secure FTP XML File Source - Drag and Drop
  7. In order to connect Secure SFTP server in SSIS, you will need obtained FTP/SFTP Credential.

How to create SFTP Server.

  1. Right click on Connection Managers Panel to Create New SFTP Connection for Server, and Context Menu will appear, Select New Connection from the Context Menu.
    SFTP Task Operations - Connection
  2. Select ZS-SFTP Connection Manager from the Connection Managers list and Click on Add Button.
    SFTP Task Operations - Connection
  3. Now in Connection Manager UI, Select SFTP - SSH File Transfer Protocol and Enter your Host name and Select Logon Type Username and Password and Enter it.
    SFTP Task Operations - Connection
  4. Click on Test Connection. If test successful then hit OK to save the connection manager configure setting UI.

How to extract data from FTP server XML File using Secure FTP XML Source.

  1. Double click on ZS Secure FTP XML File Source configure it.
  2. Select FTP Connection, check on Indent to get pretty print. Select file and enter array filter.
    SSIS Secure FTP XML File Source - Configure
  3. Click on Preview button to see Data Preview.
  4. In the Advanced Tab you can more settings if you want.
    SSIS Secure FTP XML File Source - Advance Tab
  5. In the Date/Time Handling tab, convert Date and Time.
    SSIS Secure FTP XML File Source - Date and Time
  6. Click on OK button to save Secure FTP XML Source configure setting UI.
  7. Now, Just Drag and Drop Our Free ZS Trash Destination from SSIS Toolbox.
    SSIS Trash Destination - Drag and Drop
  8. Now single click on the Secure Ftp Xml File Source, once you see blue arrow from source ... connect it to Trash Destination.
  9. Lets Double click on ZS Trash Destination to Configure it.
    SSIS Trash Destination - Drag and Drop
  10. Thats all, Just Run or Execute Package.
    SSIS Secure FTP XML File Source - Execute

Properties

Property Name Description
LoggingMode LoggingMode determines how much information is logged during Package Execution. Set Logging mode to Debugging for maximum log.

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option Description
Normal [0] Normal
Medium [1] Medium
Detailed [2] Detailed
Debugging [3] Debugging
PrefixTimestamp When you enable this property it will prefix timestamp before Log messages.
TreatBlankNumberAsNull Treat empty string as NULL for any numeric data types
TreatBlankBoolAsNull Treat empty string as NULL for bool data types
TreatBlankDateAsNull Treat empty string as NULL for any date/time data types
Encoding Encoding of source file

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option Description
Default [0] Default
ASCII [1] ASCII
UTF8 [2] UTF-8
UTF16 [3] UTF-16 LE (i.e. Unicode Little Endian)
UTF32 [4] UTF-32
UTF8WithoutBOM [5] UTF-8 Without BOM
UTF32WithoutBOM [6] UTF-32 Without BOM
UTF7 [7] UTF-7
UTF7WithoutBOM [8] UTF-7 Without BOM
UTF16WithoutBOM [9] UTF-16 Without BOM
BigEndian [10] UTF-16 BE (i.e. Unicode Big Endian)
BigEndianWithoutBOM [11] UTF-16 BE Without BOM
CharacterSet Character set for text (e.g. windows-1250 )
Culture Culture code (e.g. pt-BT). This helps to parse culture specific number formats (e.g. In some culture you may have comma rather than decimal points 0.1 can be 0,1)
MaxRows Maximum XML records to fetch. Set this value to 0 for all records
EnableCustomReplace Enables custom search / replace in the document text after its read from the file/url or direct string. This replace operation happens before its parsed. This option can be useful for custom escape sequence in source document which is causing issue in the parser. You can replace such unwanted characters fore parser starts parsing the text.
SearchFor String you like to search for (Only valid when EnableCustomReplace option is turned on). If you want to enable Regular Expression pattern search then add --regex or --regex-ic (for case-insensitive search)  at the end of your search string (e.g. ORDER-\d+--regex OR ORDER-\d+--regex-ic (case-insensitive search) )
ReplaceWith String you like to replace with (Only valid when EnableCustomReplace option is turned on). If you added --regex or --regex-ic at the end of your SearchFor string then ReplaceWith can use special placeholders (i.e. $1, $2...) based on regular expression groups. For example you SearchFor=(\w+)(@\w+.com) to search for emails then to mask emails you can something like this for ReplaceWith = ****$2 (where $2 is domain part and $1 is before @)
DirectPath XML file path (e.g. c:\data\myfile.xml) or pattern to process multiple files (e.g. c:\data\*.xml)
Recursive Include files from sub folders too.
EnableMultiPathMode Enable this option to treat DirectPath as list of paths / urls (separated by new line or double colon :: ). This option is very useful if you have many URLs / Paths with similar data structure and you want to return response from all URLs in one step (UNION all URLs with single dataset). Examples:  http://someurl1::http://someurl2 --OR-- c:\file1::c:\file2 --OR-- c:\file1::https://someurl
ContinueOnFileNotFoundError By default process stops with error if specified local file is not found. Set this property to true if you wish to continue rather than throwing file not found error.
Filter Enter expression here to filter data.(Example:  $.Users[*].UserName ) This will fetch User names from users records
IncludeParentColumns Use this option to include parent properties (Non array) in the output along with Filtered Rows
IncludeParentColumnsWhenChildMissing By default child and parent information is not included in the output if children not found for specified expression. For example if you want to extract all orders from all customers nodes then you can type $.Customers[*].Orders[*]. This will fetch all orders from all customers. By default customers records without orders wont be included in the output. If you want to include those customers where orders not found then check this option (Output null information for order attributes). This behavior is similar to LEFT OUTER JOIN in SQL (Left side is parent, right side is child). This option is ***resource intensive*** so only check if you really care about this behavior.
IncludeParentColumnsWithArrayType Set this option to true if you want to output parent columns which are array. By default any parent column which is an array is not included in output. See also FilterForParentColumnsWithArrayType property if you set this property
FilterForParentColumnsWithArrayType Filter expression to extract value form parent
ParentColumnPrefix Prefix for parent column name. This option is only valid if you have set IncludeParentColumns=True
ThrowErrorIfPropertyMissing Throw error if property name specified in filter expression is missing. By default it will ignore any missing property errors.
MaxLevelsToScan This property how many nested levels should be scanned to fetch various properties. 0=Scan all child levels.
ExcludedProperties List comma separated property names from XML document which you want to exclude from output. Specify parent property name to exclude all child nodes.
LevelSeparator Property level separator used in generated property name (separator for outer properties - Above selected filter node). Use this if default separator is producing duplicate property name which is conflicting with existing name.
EnableArrayFlattening Enables deep array flattening for selected filtered hierarchy. When you turn on this property it will flatten each property of each array item and expose as column (e.g. If you have Filter set as $.customers[*] and for each customer you have an array of Addresses then you may see output columns like Addresses.1.City, Addresses.1.State, Addresses.2.City, Addresses.2.State .... Addresses.N.City, Addresses.N.State). You can control how many array items you want to flatten by setting MaxArrayItemsToFlatten property.
MaxArrayItemsToFlatten Maximum number of array items to flatten. inner array flattening. Adjust this property to control how many columns being generated. This option is ignored if you set EnableArrayFlattening=false
FileCompressionType Compression format for source file (e.g. gzip, zip)

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option Description
None [0] None
GZip [1] GZip
Zip [2] Zip
TarGZip [3] TarGZip
ArrayTransformationType Array Transformation you want to apply. Useful for case when you have 2-Dimensional arrays with rows/columns in separate arrays.

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option Description
None [0] None
TransformSimpleTwoDimensionalArray [1] Simple 2-dimensional array (e.g. {cols:[..], rows:[[..],[..]]} )
TransformComplexTwoDimensionalArray [2] Complex 2-dimensional array  (e.g. {cols:[{..},{..}], rows:[{f:[..]},{f:[..]}] )
TransformKeyValuePivot [3] Key/Value to Columns
TransformMultipleColumnsExpressions [4] Multiple columns using expressions
TransformColumnslessArray [5] Columnless array (e.g. [[..],[..]] )
TransformJsonLineArray [6] JSON Lines - Single Dimension Array(s) (i.e. [..][..] )
TransformPivotColumnlessArray [7] Pivot - Columnless array (e.g. [..] )
ArrayTransColumnNameFilter Filter expression to use to extract column names for array transformation.
ArrayTransRowValueFilter Filter expression for row values (Not applicable for simple array transformation).
ArrayTransEnableCustomColumns When you have 2D array but don't have column list specified in a separate array then use this option (e.g. { arr: [[10,11],[21,22]] } ). If you selected Column less array or JSON Lines option then this property means Column Names coming from First Line of array.
ArrayTransCustomColumns When you have 2D array but don't have column list specified in a separate array then use specify column names here. Use comma separated list (e.g col1,col2,col3 ). Column name Order must match value order.
EnableRawOutputModeSingleRow Enable Raw Document Output Mode with unstructured data processing option for any format (i.e. XML, Html, Text, Json). Unlike other option EnableRawOutputMode, this option doesn't invoke parser to extract documents by finding row terminator. It will source string as row value in single row/ single column. You can also define RawOutputDataTemplate along with this property (e.g. Template can be {data: [$1] } ). This will wrap response inside template string before sending to parser.
RawOutputDataRowTemplate When you enable EnableRawOutputModeSingleRow you can use this property. Template must be in JSON format (e.g. { data: [$1] } ). [$1] means content extracted using first expression or no expression (i.e. raw data). If RawOutputFilterExpr contains multiple expressions (separated by || ) then you can use multiple placeholders (i.e. [$1], [$2]...[$N]). RawOutputFilterExpr can have JsonPath, XmlPath, RegEx (set RawOutputExtractMode).
DateFormatString Specifies how custom date formatted strings are parsed when reading JSON.
DateParseHandling Specifies how date formatted strings, e.g. Date(1198908717056) and 2012-03-21T05:40Z, are parsed when reading JSON.

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option Description
None [0] Keep date as string
DateTime [1] Convert to DateTime (Timezone lost)
DateTimeOffset [2] Convert to DateTimeOffset (Preserve Time zone)
FloatParseHandling Specifies how decimal values are parsed when reading JSON. Change this setting to Decimal if you like to have large precision / scale.

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option Description
Double [0] Default (Double [~15-17 digits])
Decimal [1] Decimal (High Precision / Scale [~28-29 digits] )
IndentOutput Indent JSON output so its easy to read.
OutputRawDocument Output as raw JSON document rather than parsing individual fields. This option is helpful if you have documents stored in a file and you want to pass them downstream as raw JSON string rather than parsing into columns.
ConvertFormat Output convert raw XML document to JSON (Recommended). This option is ignored if OutputRawDocument=false. Once you do that any further parsing downstream must use JSON Parser rather than XML Parser.
OnErrorOutputResponseBody When you redirect error to error output by default you get additional information in ErrorMessage column. Check this option if you need exact Response Body (Useful if its in JSON/XML format which needs to be parsed for additional information for later step).
ElementsToTreatAsArray Comma separated element names which you want to treat as Array regardless how many times element repeats at the same level. By default only those elements are treated as array if element appears more than once at the same level.
EnablePerformanceMode Enables memory optimized mode. You may lose certain functionality when you turn on this. Only turn on this feature if you getting out of memory error.
OutputFilePath Set this option to true if you want to output FilePath. This option is ignored when you consume DirectValue or data from Url rather than local files. Output column name will be __FilePath
OutputFileName Set this option to true if you want to output FileName. This option is ignored when you consume DirectValue or data from Url rather than local files. Output column name will be __FileName
EnablePivot When this property is true then Column is converted to Row. Pivoted names will appear under  Pivot_Name column and values will appear under Pivot_Value field.
IncludePivotPath When this property is true then one extra column Pivot_Path appears in the output along with Pivot_Name and Pivot_Value. This option is really useful to see parent hierarchy for pivoted value.
EnablePivotPathSearchReplace Enables custom search/replace function on Pivot_Path before final value appears in the output. This option is only valid when IncludePivotPath=true.
PivotPathSearchFor Search string (static string or regex pattern) for search/replace operation on Pivot_Path. You can use --regex suffix to treat search string as Regular Expression (e.g. MyData-(\d+)--regex ). To invoke case in-sensitive regex search use --regex. This option is only valid when EnablePivotPathSearchReplace=true.
PivotPathReplaceWith Replacement string for search/replace operation on Pivot_Path. If you used --regex suffix in PivotPathSearchFor then you can use placeholders like $0, $1, $2... anywhere in this string (e.g. To remove first part of email id and just keep domain part you can do this way. Set PivotPathSearchFor=(\w+)@(\w+.com)--regex, and set current property i.e. PivotPathReplaceWith=***@$2 ). This option is only valid when EnablePivotPathSearchReplace=true.
MetaDataScanMode Metadata scan mode controls how data type and length is determined. By default few records scanned to determine datatype/length. Changing ScanMode affects length/datatype accuracy.

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option Description
Auto [0] Auto
Strict [1] Strict - Exact length
Guess2x [2] Guess2x - 2 times bigger
Guess3x [3] Guess3x - 3 times bigger
Guess4x [4] Guess4x - 4 times bigger
TreatAsUnicodeString [5] Set all columns as string
Guess10x [6] Guess10x - 10 times bigger
TreatStringAsMaxLength [7] Set string columns with MAX Length - i.e. DT_WSTR(4000)
TreatStringAsBlob [8] Set string columns as BLOB - i.e. DT_NTEXT
MetaDataCustomLength Length for all string column. This option is only valid for MetaDataScanMode=Custom
MetaDataTreatStringAsAscii When this option is true, it detects all string values as DT_STR (Ascii) rather than DT_WSTR (Unicode)

Setting UI

SSIS Secure FTP XML File Source - Setting UI
SSIS Secure FTP XML File Source - Setting UI
SSIS Secure FTP XML File Source - Setting UI
SSIS Secure FTP XML File Source - Setting UI

See Also

Articles / Tutorials


Copyrights reserved. ZappySys LLC.