SSIS XML Source (File/SOAP/REST)

SSIS XML Source reads data from XML files or from XML-formatted responses returned by a SOAP web service or REST API. It supports advanced filtering and a flexible way to configure request parameters for web services.

Download SSIS PowerPack

Content

Step-By-Step

In this section you will learn how to use the XML Source connector to extract data from an XML file (in this case, a web URL).
  1. First, download and install the ZappySys SSIS PowerPack.
  2. Once that is done, open Visual Studio and create a new SSIS Package project.
  3. Drag and drop the SSIS Data Flow Task from the SSIS Toolbox.
    SSIS Data Flow Task - Drag and Drop
  4. Double-click the Data Flow Task to open the Data Flow designer surface.
  5. From the SSIS Toolbox, drag and drop XML Source onto the Data Flow designer surface.
    XML Source - Drag and Drop

How to use a direct file path to read data from XML files.

  1. Double-click XML Source to configure it.
  2. You can select a single file, or multiple files by using a wildcard pattern in the path.
    Note: To process multiple files, use a wildcard pattern as shown below. (When the source path contains a wildcard pattern, the system treats the target path as a folder, regardless of whether it ends with a slash.)
    
    c:\source\store_*.xml (all files whose name starts with store_ and ends with .xml)
    c:\source\subfolder\*.xml (all files with the .xml extension located under the folder subfolder)
    
    SSIS XML Source - Read data from XML files (Single or Multiple files) - Use wildcard pattern in path
  3. Drag and drop the free ZS Trash Destination from the SSIS Toolbox.
  4. Click the XML Source once; when the blue arrow appears from the source, connect it to the Trash Destination.
  5. That's all. Click OK to save the settings, then run the package.
    SSIS XML Source - Read data from XML files (Single or Multiple files) - Use wildcard pattern in path
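
To make the wildcard behavior concrete, here is a minimal Python sketch of how such patterns select files. It only illustrates the matching idea and is not the component's implementation; the function name `matching_files` and the sample file list are made up for the example.

```python
from fnmatch import fnmatch
from pathlib import PureWindowsPath


def matching_files(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates that sit in the pattern's folder and match its wildcard."""
    pat = PureWindowsPath(pattern)
    selected = []
    for candidate in candidates:
        path = PureWindowsPath(candidate)
        # Windows paths compare case-insensitively; lower-case the names so the
        # wildcard match behaves the same way on any platform.
        if path.parent == pat.parent and fnmatch(path.name.lower(), pat.name.lower()):
            selected.append(candidate)
    return selected


# Hypothetical folder content, used only for illustration.
files = [
    r"c:\source\store_001.xml",
    r"c:\source\store_002.xml",
    r"c:\source\readme.txt",
    r"c:\source\subfolder\a.xml",
]

store_files = matching_files(r"c:\source\store_*.xml", files)
subfolder_files = matching_files(r"c:\source\subfolder\*.xml", files)
```

In this sketch the first pattern selects the two `store_` files and the second selects only the file under `subfolder`.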

How to extract data from an XML file (in this case, a web URL).

  1. Double-click ZS XML Source (Web API or File) to configure it.
  2. From the Access Mode dropdown, select [File path or web Url] and paste the following URL for this example.
    https://www.w3schools.com/xml/plant_catalog.xml
    Now enter a path expression in the Path textbox to extract only a specific part of the XML file, as shown below. ($.CATALOG.PLANT[*] returns every PLANT element under CATALOG. PLANT is treated as an array because it appears more than once under the same parent node, so we use [*] to indicate that we want all records of that array.)
    $.CATALOG.PLANT[*]
    XML Source - Configure
  3. Click on Preview button to see Data Preview.
  4. One issue with XML parsing is how to determine which element(s) should be treated as an array so that the expression engine can parse [*] expressions correctly. To handle this, enter a comma-separated list of element names under the Array Handling tab. In our case we want to treat the type element as an array, as shown below.
    Extract Array from XML
  5. Click OK to save the XML Source configuration.
  6. To pass credentials to the service (Basic Authorization header), check this article for more information.
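
As a rough illustration of what the $.CATALOG.PLANT[*] expression selects, the Python sketch below parses a small inline sample shaped like the plant catalog and returns one row per PLANT element. The sample values and the helper name `extract_rows` are assumptions made for this example; the component itself uses its own expression engine.

```python
import xml.etree.ElementTree as ET

# Small inline sample shaped like the catalog; the values are illustrative only.
SAMPLE = """<CATALOG>
  <PLANT><COMMON>Bloodroot</COMMON><ZONE>4</ZONE><PRICE>$2.44</PRICE></PLANT>
  <PLANT><COMMON>Columbine</COMMON><ZONE>3</ZONE><PRICE>$9.37</PRICE></PLANT>
</CATALOG>"""


def extract_rows(xml_text: str, record_tag: str) -> list[dict]:
    """Return one dict per repeated child element of the root (one row per record)."""
    root = ET.fromstring(xml_text)  # root is the CATALOG element
    return [
        {child.tag: child.text for child in record}
        for record in root.findall(record_tag)  # every direct PLANT child
    ]


rows = extract_rows(SAMPLE, "PLANT")
```

Each repeated PLANT element becomes one row, and its child elements become the columns of that row.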

How to extract data from a direct value or example.

  1. Double-click ZS XML Source (Web API or File) to configure it.
  2. ZS XML Source (Web API or File) provides three examples for direct values; select one of them.
  3. Now select the array filter.
    $.store.book[*]
    Read XML data in SSIS
  4. Click on Preview Button to see Data Preview.
  5. Click OK to save the XML Source configuration.

How to create a dynamic URL and extract data from a variable path.

  1. Double-click ZS XML Source (Web API or File) to configure it.
  2. Create a variable and store the file name or path in it. We have stored the full path in the example below.
  3. Select the variable that holds the XML file path or web URL.
    SSIS XML Source - Variable Mode - Call Web API or Read from File
  4. Here, you can select/edit columns, edit datatype.
    SSIS XML Source - Configure Columns and DataType
  5. Drag and drop the free ZS Trash Destination from the SSIS Toolbox.
    SSIS Trash Destination - Drag and Drop
  6. Click the XML Source once; when the blue arrow appears from the source, connect it to the Trash Destination.
  7. Double-click ZS Trash Destination to configure it.
    SSIS Trash Destination - Drag and Drop
  8. Execute the package and verify source data in data viewer.
    How to read-extract XML records from file in SSIS

Properties

Property Name Description
LoggingMode LoggingMode determines how much information is logged during Package Execution. Set Logging mode to Debugging for maximum log.

Available Options (use the numeric value listed in brackets if you need to define an expression on this property for dynamic behavior).

Option Description
Normal [0] Normal
Medium [1] Medium
Detailed [2] Detailed
Debugging [3] Debugging
PrefixTimestamp When you enable this property it will prefix timestamp before Log messages.
TreatBlankNumberAsNull Treat empty string as NULL for any numeric data types
TreatBlankBoolAsNull Treat empty string as NULL for bool data types
TreatBlankDateAsNull Treat empty string as NULL for any date/time data types
Encoding Encoding of source file

Available Options (use the numeric value listed in brackets if you need to define an expression on this property for dynamic behavior).

Option Description
Default [0] Default
ASCII [1] ASCII
UTF8 [2] UTF-8
UTF16 [3] UTF-16 LE (i.e. Unicode Little Endian)
UTF32 [4] UTF-32
UTF8WithoutBOM [5] UTF-8 Without BOM
UTF32WithoutBOM [6] UTF-32 Without BOM
UTF7 [7] UTF-7
UTF7WithoutBOM [8] UTF-7 Without BOM
UTF16WithoutBOM [9] UTF-16 Without BOM
BigEndian [10] UTF-16 BE (i.e. Unicode Big Endian)
BigEndianWithoutBOM [11] UTF-16 BE Without BOM
CharacterSet Character set for text (e.g. windows-1250 )
Culture Culture code (e.g. pt-BR). This helps to parse culture-specific number formats (e.g. some cultures use a comma rather than a decimal point, so 0.1 is written as 0,1).
MaxRows Maximum XML records to fetch. Set this value to 0 for all records
EnableCustomReplace Enables custom search / replace in the document text after it is read from the file/URL or direct string. This replace operation happens before the text is parsed. This option can be useful when a custom escape sequence in the source document is causing issues in the parser. You can replace such unwanted characters before the parser starts parsing the text.
SearchFor String to search for (only valid when the EnableCustomReplace option is turned on). To enable regular expression pattern search, add --regex or --regex-ic (for case-insensitive search) at the end of your search string (e.g. ORDER-\d+--regex OR ORDER-\d+--regex-ic (case-insensitive search)).
ReplaceWith String to replace with (only valid when the EnableCustomReplace option is turned on). If you added --regex or --regex-ic at the end of your SearchFor string, then ReplaceWith can use special placeholders (i.e. $1, $2...) based on regular expression groups. For example, if you set SearchFor=(\w+)(@\w+.com) to search for emails, then to mask them you can use something like ReplaceWith=****$2 (where $2 is the domain part and $1 is the part before @).
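
The SearchFor / ReplaceWith behavior described above can be approximated in Python as shown below. This is only a sketch under assumptions: `custom_replace` is a hypothetical helper, and the translation of $1, $2 placeholders into Python's group syntax is this example's own choice, not the component's code.

```python
import re


def custom_replace(text: str, search_for: str, replace_with: str) -> str:
    """Approximate SearchFor/ReplaceWith, including the --regex and --regex-ic suffixes."""
    if search_for.endswith("--regex-ic"):
        pattern, flags = search_for[: -len("--regex-ic")], re.IGNORECASE
    elif search_for.endswith("--regex"):
        pattern, flags = search_for[: -len("--regex")], 0
    else:
        # No suffix: plain substring search and replace.
        return text.replace(search_for, replace_with)
    # Translate $1, $2, ... into Python's \g<1>, \g<2>, ... group references.
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replace_with)
    return re.sub(pattern, py_replacement, text, flags=flags)


masked = custom_replace("Contact john@example.com", r"(\w+)(@\w+.com)--regex", "****$2")
```

With the email pattern from the text, the part before @ is replaced by asterisks while the domain captured in group 2 is kept.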
AccessMode Defines how to read the XML file or direct string

Available Options (use the numeric value listed in brackets if you need to define an expression on this property for dynamic behavior).

Option Description
DirectValue [0] Direct value
ValueFromVariable [1] Direct value from variable
DirectPath [2] File path or web URL
PathFromVariable [3] File path or web URL from variable
DirectValue Defines how to read the XML file or direct string or command line output. If you like to read from command line then simply prefix any valid command line with cmd:>. Here is more information about streaming data from command line output. Syntax: cmd:>[exe / bat folder]<exe-name> [arguments]
First argument is exe name or full path for exe or bat file. Second part is arguments for command line program. You can use double quotes around exe / batch file path if it contains space.
For *.bat or *.cmd file make sure to add [ @echo off ] in the first line (without brackets) else command itself is added in output. To read more please see product help file

====================
Examples:
====================
cmd:>cmd /c dir *.dll /b
cmd:>aws iam list-users --output xml
cmd:>az vm list --output xml
cmd:>py c:\scripts\run-python.py
cmd:>powershell -executionpolicy bypass -File "c:\scrips\run.ps1"
cmd:>powershell -executionpolicy bypass -Command "[System.IO.StreamReader]::new((Invoke-WebRequest -URI https://zappysys.com/downloads/files/test/invoices.xml).RawContentStream).ReadToEnd()"
cmd:>c:\folder\my-batch-file.bat
cmd:>c:\folder\my-batch-file.bat option1 option2
cmd:>curl -k https://httpbin.org/get
cmd:>curl.exe -k https://httpbin.org/get
cmd:>c:\folder\curl.exe -k https://httpbin.org/get
cmd:>"c:\folder with space\curl.exe" -k https://httpbin.org/get
ValueVariable Variable name which holds XML string
PathVariable Variable name which holds data file path or url
DirectPath XML file path (e.g. c:\data\myfile.xml) or pattern to process multiple files (e.g. c:\data\*.xml)
Recursive Include files from sub folders too.
EnableMultiPathMode Enable this option to treat DirectPath as list of paths / urls (separated by new line or double colon :: ). This option is very useful if you have many URLs / Paths with similar data structure and you want to return response from all URLs in one step (UNION all URLs with single dataset). Examples:  http://someurl1::http://someurl2 --OR-- c:\file1::c:\file2 --OR-- c:\file1::https://someurl
ContinueOnFileNotFoundError By default process stops with error if specified local file is not found. Set this property to true if you wish to continue rather than throwing file not found error.
HttpHeaders Set this if you want to set custom Http headers. You may use variable anywhere in the header value using syntax {{User::YourVarName}}. Syntax of Header key value pair is : <request><header><name>x-myheader-1</name><value>AAA</value></header> <header><name>x-myheader-2</name><value>BBB</value></header></request>
HttpRequestData User defined data you wish to send along with your HTTP Request (e.g. Upload file data, Form POST data). Usually you have to set content-type of your data but if you select RequestMethod=POST then system will automatically set content-type=application/x-www-form-urlencoded.
HttpRequestMethod Http Web Request Method (e.g. POST, GET, PUT, LIST, DELETE...). Refer your API documentation if you are not sure which method you have to use.
HttpRequestContentType Specifies content type for data you wish to POST. If you select Default option then system default content type will be used (i.e. application/x-www-form-urlencoded). If you specify Content-Type header along with this option then header value takes precedence.

Available Options (use the numeric value listed in brackets if you need to define an expression on this property for dynamic behavior).

Option Description
Default [0] Default
TextPlain [1] Text (text/plain)
ApplicationJson [2] JSON (application/json)
ApplicationXml [3] XML (application/xml)
TextXml [4] XML (text/xml)
TextXmlUtf8 [5] XML (text/xml;charset=UTF-8)
TextHtml [6] HTML (text/html)
ApplicationFormUrlencoded [7] Form (application/x-www-form-urlencoded)
ApplicationOctetStream [8] Binary (application/octet-stream)
Raw [9] Raw (No content-type)
MultiPartMixed [10] Multipart Mixed (multipart/mixed)
ApplicationGraphql [11] GraphQL (application/graphql)
IsMultiPartUpload Check this option if you want to upload file(s) (i.e. POST RAW file data) or send data using Multi-Part encoding method (i.e. Content-Type: multipart/form-data). Multi-Part request allows you to mix key/value and upload files in same request. On the other hand raw upload allows only single file upload (without any key/value)

==== Raw Upload (Content-Type: application/octet-stream) =====  
To upload single file in raw mode check this option and specify full file path starting with @ sign in the Body (e.g.  @c:\data\myfile.zip )

==== Form-Data / Multipart Upload (Content-Type: multipart/form-data) =====  
To treat your request data as multipart fields, you must specify key/value pairs separated by new lines in the RequestData field (i.e. Body). Each key/value pair is entered on a new line, and key and value are separated by an equal sign (=). Leading and trailing spaces are ignored, and blank lines are also ignored.
If a field value has any special character(s), use an escape sequence (e.g. for new line: \r\n, for tab: \t, for at sign (@): \@). When the value of any field starts with an at sign (@), it is automatically treated as a file you want to upload. By default the file content type is determined based on the extension; however, you can supply the content type manually for any field this way [ YourFileFieldName.Content-Type=some-content-type ]. By default a file upload field always includes Content-Type in the request (non-file fields do not have a content type by default unless you supply one manually). If for some reason you don't want to use the Content-Type header in your request, supply a blank Content-Type to exclude this header altogether [e.g. SomeFieldName.Content-Type= ]. In the example below we have supplied Content-Type for file2 and SomeField1; all other fields use the default content type.
See the example below of uploading multiple files along with additional fields. If an API requires you to pass Content-Type: multipart/mixed rather than multipart/form-data, then manually set Request Header => Content-Type: multipart/mixed (it must start with multipart/ or it will be ignored).

file1=@c:\data\Myfile1.txt
file2=@c:\data\Myfile2.xml
file2.Content-Type=application/xml
SomeField1=aaaaaaa
SomeField1.Content-Type=text/plain
SomeField2=12345
SomeFieldWithNewLineAndTab=This is line1\r\nThis is line2\r\nThis is \ttab \ttab \ttab
SomeFieldStartingWithAtSign=\@MyTwitterHandle
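
To show how the key/value rules above fit together, the following Python sketch parses a body written in that syntax into fields. It is an interpretation of the documented rules, not the component's parser; `Field`, `unescape` and `parse_body` are names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Escape sequences named in the description above.
ESCAPES = {r"\r": "\r", r"\n": "\n", r"\t": "\t", r"\@": "@"}
CT_SUFFIX = ".Content-Type"


@dataclass
class Field:
    name: str
    value: str                           # text value, or file path when is_file is True
    is_file: bool = False
    content_type: Optional[str] = None   # None = default, "" = header omitted


def unescape(value: str) -> str:
    """Replace the documented escape sequences with the characters they stand for."""
    for sequence, char in ESCAPES.items():
        value = value.replace(sequence, char)
    return value


def parse_body(body: str) -> dict:
    """Parse key=value lines; blank lines and surrounding spaces are ignored."""
    fields, overrides = {}, {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, raw = line.partition("=")
        key, raw = key.strip(), raw.strip()
        if key.endswith(CT_SUFFIX):
            overrides[key[: -len(CT_SUFFIX)]] = raw    # content-type override
        elif raw.startswith("@"):
            fields[key] = Field(key, raw[1:], is_file=True)  # file upload field
        else:
            fields[key] = Field(key, unescape(raw))
    for name, content_type in overrides.items():
        if name in fields:
            fields[name].content_type = content_type
    return fields


BODY = r"""
file2=@c:\data\Myfile2.xml
file2.Content-Type=application/xml
SomeField2=12345
SomeFieldWithNewLineAndTab=This is line1\r\nThis is \ttab
SomeFieldStartingWithAtSign=\@MyTwitterHandle
"""
parsed = parse_body(BODY)
```

Note that the check for a leading @ happens before unescaping, which is why a value written as \@MyTwitterHandle stays a plain text field in this sketch.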
JsonFormat Data format coming from HTTP Response. This is useful, for example, when you have an OData service and you want to automatically consume all pages of data using odata.nextLink. Setting JsonFormat=Odata will do this for you automatically. This setting is only applicable if XML is coming from an HTTP web request.

Available Options (use the numeric value listed in brackets if you need to define an expression on this property for dynamic behavior).

Option Description
Notset [0] Notset
Json [1] JSON
Odata [2] OData (v3,v4)
UseProxy Enable custom proxy settings (If this is not set then system default proxy will be used. To disable proxy totally uncheck this option and check DoNotUseDefaultProxy option if available)
ProxyUrl Web URL of Proxy server (including port  if necessary). [e.g. http://myproxyserver:8080/]
UseProxyCreds Enable passing userid and password to proxy server
ProxyUserName Proxy username
ProxyPassword Proxy password
NextUrlAttribute If the service response supports pagination using some sort of next URL attribute, specify the attribute name in the XML response that holds the next URL. If no attribute is found or it is null, the component stops fetching the next result set. Example: $.pagingInfo.nextUrl
PrevUrlAttribute If the service response supports pagination using some sort of prev/next URL attribute, specify the attribute name in the XML response that holds the previous URL.
NextUrlStopIndicator Specifies value for NextUrlAttribute or StopIndicatorAttribute which indicates last page to stop pagination. If you have specified StopIndicatorAttribute then you can use Regular expression rather than static value to indicate last page. To use regular expression value of this property must start with regex= prefix. Example : regex=FALSE|N or use regex= to detect blank value or missing value (assuming you set StopIndicatorAttribute to something like $.hasMore)
StopIndicatorAttribute Attribute name or expression for attribute which can be used as stop indicator (e.g. $.hasMore --OR-- $.pagination.hasMore --OR-- $.data[0].hasMore). If this value is blank then NextUrlAttribute is used
NextUrlSuffix If you want to include certain text (or parameters) at the end of Next url then specify this attribute (e.g. &format=xml). Another common use case of this property is to supply pagination token to next Page URL. You can also use <%nextlink%> or  <%nextlink_encoded%> placeholder (e.g. &cursor=<%nextlink_encoded%> )
NextUrlWait This property indicates total number of milliseconds you want to wait before sending next request. This option allows you to adjust how many API calls can be made within certain timeframe. If your API Service has no limit then set this option to zero
ContinueOnUrlNotFoundError If this option is true then component will continue without exception on 404 error (Url not found). This allows you to consume data gracefully.
ContineOnAnyError Continue when any type of exception occurs during http request
ContineOnErrorForMessage Continue on error when specified substring found in response
ContineOnErrorForStatusCode Continue on error when specified status code returned from web server
ConsumeResponseOnError When an error occurs, no data is returned. Use this option to get the content even though an error occurs. When this option is checked you can't use the [continue on error when specific string found in response] option.
ErrorStatusCodeToMatch Status code to match when error occurs and ContineOnErrorForStatusCode option is true. If Response status code matches to this code then task continues to run
ErrorStatusCodeToMatchRegex Status code(s) to match - separated by vertical bar (e.g. 404|405). When error occurs and ContineOnErrorForStatusCode option is true then if StatusCode matches to this code(s) then task continues to run
ErrorSubstringToMatch Error substring to match when an error occurs and the ContineOnErrorForMessage option is true. If the response contains this substring then the task continues to run.
CookieContainerVariable Cookie Container can be used to maintain state between multiple web requests. Example: You can login to site like wordpress and then extract any private page content by simply passing authentication cookies using this variable.
RequestTimeout Http request Timeout in seconds. Set this to 0 if you want to use system default value (i.e. 100 seconds)
SecurityProtocol Specifies which security protocol is supported for HTTPS communication. Using this option you can enable legacy protocol or enforce to use latest version of security protocol (Note: TLS 1.2 is only supported in SSIS 2014 or Higher).

Available Options (use the numeric value listed in brackets if you need to define an expression on this property for dynamic behavior).

Option Description
Default [0] System Default
Ssl3 [1] SSL v3.0
Ssl3Plus [2] SSL v3.0 or higher
Tls [3] TLS v1.0
TlsPlus [4] TLS v1.0 or higher
Tls11 [5] TLS v1.1
Tls11Plus [6] TLS v1.1 or higher
Tls12 [7] TLS v1.2
Tls12Plus [8] TLS v1.2 or higher
Tls13 [9] TLS v1.3
EnableCompressionSupport Enable support for gzip or deflate compression (for deflate you must turn on the [Tls 1.0 Or Higher] option on Advanced Settings - Security Protocol for HTTPS). When you check this option, a compressed response is automatically decompressed, which saves bandwidth. This option is only valid if the web server supports a compressed response stream. Check your API documentation for more information.
IgnoreCertificateErrors Ignore SSL certificate related errors, for example if you are getting SSL/TLS errors because the certificate has expired, is not from a trusted authority, or is self-signed. With this option checked you will not get SSL/TLS errors.
AllowUnsecureSuite Allow unsecure ciphers/suites and curves for SSL/TLS communication. Use this option to communicate with web servers which need legacy / unsecured cipher support. This option is only triggered when you change the default SSL/TLS version on the advanced settings tab.
UseConnection Use connection to pass credentials for authentication (e.g. Use UserID/Password or Use OAuth Protocol for token based approach)
PagingMode Specifies how you want to loop through multiple pages returned by the REST API.

Available Options (use the numeric value listed in brackets if you need to define an expression on this property for dynamic behavior).

Option Description
ByResponseAttribute [0] Response Attribute Mode - Read next page information from response
ByUrlParameter [1] Url Parameter Mode - Page number / offset passed as URL parameter (starts at 1 OR custom value in URL)
ByUrlPath [2] Url Path Mode - Page number / offset passed as URL path (starts at 0 OR custom value)
ByPostData [3] POST data Mode - Page number is passed inside POST data
ByUrlParameterMulti [4] Url Parameter Mode (Multi) - Pass Start and End Row Number in URL
ByResponseHeaderRfc5988 [5] Response Header contains Next Link - RFC 5988 (Next URL Link found in Standard Header)
ByResponseHeaderCustom [6] Response Header contains Next Link - Custom (Next URL Link found in Custom Header)
ByResponseHeaderContinuationToken [7] Response Header contains Continuation Token
EnablePageTokenForBody If you wish to pass the extracted pagination token or current page number in the body of the next request, set this option to true. You can use the [$pagetoken$] and [$pagenumber$] placeholders anywhere in the Body where you wish to insert the extracted page token. If you must send an encoded token, you can use <%nextlink_encoded%> inside the SuffixForNextUrl property. If you don't use SuffixForNextUrl, the raw nextLink or token will be inserted inside the body. If you don't specify the [$pagetoken$] placeholder in the body, the next-page token will be appended at the end. The next-page token is extracted by the filter expression specified using property NextUrlSuffix
HasDifferentNextPageInfo Set this to true if you wish to specify a different URL, Header, Body or Filter for the first page and the next page (i.e. paginated response). For some APIs (e.g. Amazon MWS, NetSuite, Zuora) you may need to set this to true.
PagePlaceholders When HasDifferentNextPageInfo=true you can set this property to indicate first page and next page values. You can specify a different URL, Header, Body or Filter for the first page and the next page (i.e. paginated response). Use [$tag$] as a placeholder anywhere in the URL, Header, Body or Filter, and at runtime the system will replace it with the correct value (first page or next page value). The syntax to specify placeholders for first page vs next page is like a connection string: url=FirstPageValue|NextPageValue;header=FirstPageValue|NextPageValue;body=FirstPageValue|NextPageValue;filter=FirstPageValue|NextPageValue;method=FirstPageValue|NextPageValue; You can use one or more key/value pairs to make things dynamic (e.g. url, header, body, filter or method). For example, if your API is paginated, the first URL is http://abc.com/api/items/get, and to get more records you have to call http://abc.com/api/items/getNext, then you can use [$tag$] as a placeholder in the URL http://abc.com/api/items/[$tag$] and set this property to url=get|getNext (tags are separated by a vertical bar).
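
The PagePlaceholders syntax can be read as a small key/value mini-language. The Python sketch below parses it and substitutes [$tag$] for the first page and for later pages. The helper names `parse_page_placeholders` and `resolve` are hypothetical, and the sketch only mirrors the documented syntax rather than the component's internals.

```python
def parse_page_placeholders(spec: str) -> dict:
    """Parse 'url=get|getNext;body=a|b;' into {'url': ('get', 'getNext'), ...}."""
    result = {}
    for pair in spec.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, values = pair.partition("=")
        first_page, _, next_page = values.partition("|")
        result[key.strip()] = (first_page, next_page)
    return result


def resolve(template: str, spec: str, part: str, is_first_page: bool) -> str:
    """Replace [$tag$] in the template with the first-page or next-page value."""
    first_page, next_page = parse_page_placeholders(spec)[part]
    return template.replace("[$tag$]", first_page if is_first_page else next_page)


url_template = "http://abc.com/api/items/[$tag$]"
first_url = resolve(url_template, "url=get|getNext", "url", is_first_page=True)
next_url = resolve(url_template, "url=get|getNext", "url", is_first_page=False)
```

Using the URL example from the description, the first request resolves to the get endpoint and every later request resolves to getNext.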
FirstPageBodyPart Use this property to set request body fragment for first page. HasDifferentNextPageInfo must be set to true to use this property.
NextPageBodyPart Use this property to set request body fragment for any request after first request. HasDifferentNextPageInfo must be set to true to use this property.
PagingMaxPagesExpr Expression to extract Maximum pages to loop through. Some APIs don't stop pagination and keep returning last page data when you try to read data after last page. Specify expression (e.g. $.page_count ) to read total pages to loop through using this property.
PagingMaxRowsExpr Expression to extract the maximum number of records to loop through. Some APIs don't stop pagination and keep returning last page data when you try to read data after the last page. Specify an expression (e.g. $.total_rows ) to read the total rows to loop through using this property. This setting is ignored if you set PagingMaxPagesExpr.
PagingMaxRowsDataPathExpr When you enable PagingMaxRowsExpr (end pagination based on MaxRowCount) you need to count the records coming in each response. This expression extracts all rows found under the specified expression (e.g. $.orders[*] if all records are found under the orders node).
PageNumberAttributeNameInUrl e.g. Type page_num if URL looks like this => http://abc.com/?page_num=1&sort=true  (page number via query string)
--or-- Type <%page%> if page number is inside URL path like this => http://abc.com/1/?sort=true  (e.g. replace page number in url with placeholder http://abc.com/<%page%>/?sort=true)
Page number will be incremented by one for next URL until last page is reached or [Max Page Number] is reached. This parameter also controls pagination mode [ByResponseHeaderContinuationToken]. When this mode is used you can enter RESPONSE_HEADER_NAME --OR-- NEXT_QUERY_PARAM=RESPONSE_HEADER_NAME --OR-- NEXT_QUERY_PARAM=RESPONSE_HEADER_NAME(regular_expression). If NEXT_QUERY_PARAM (left side) is omitted then Response Header value is sent to next request in the same Header name. If you like to pass response header value in the next URL then use two parts (e.g. cursor=X-CONTINUE-TOKEN) ... this example will read X-CONTINUE-TOKEN header from response and pass it to next request in the URL like http://myapi.com/?cursor=[value-from-previous-response]. You can also use advanced syntax using Regular expression to extract substring from response header value (e.g. cursor=X-CONTINUE-TOKEN(\d*)) will extract only numeric part from header value. Another example is cursor=X-CONTINUE-TOKEN(^((?!null\b).)*$) ... this will return value if its other than "null" (word). For more information about using regular expression check this link https://zappysys.com/links/?id=10124
MaxPageNumber Maximum page number until which auto increment is allowed. Type zero for no limit. Next URL contains next page number (increment by one) until last page is detected or [Max Page Number] limit is reached.
PagingEndRules Rules to end pagination. You can use XML markup to include multiple rules. Here is an example of XML with multiple rules. This will stop pagination if any of these rule matches (Status Code, Size, Error Message)  <ArrayOfPagingEndRule><PagingEndRule><Mode>DetectBasedOnResponseStatusCode</Mode><StatusCode>401</StatusCode></PagingEndRule><PagingEndRule><Mode>DetectBasedOnRecordCount</Mode></PagingEndRule><PagingEndRule><Mode>DetectBasedOnResponseSize</Mode><MinBytes>3</MinBytes><MaxBytes>200</MaxBytes></PagingEndRule><PagingEndRule><Mode>DetectBasedOnResponseErrorMessage</Mode><ErrString>key not found</ErrString></PagingEndRule></ArrayOfPagingEndRule>
StartPageNumberVariable Variable name which will hold the starting page number. This is ignored if you are using a parameter name from the query string to indicate the page number.
PageNumberIncrement Page counter increment. By default next page is incremented by one if this value is zero. You can also enter negative number if you want to decrease page counter.
PagingEndStrategy Specifies how you want to detect the last page.

Available Options (use the numeric value listed in brackets if you need to define an expression on this property for dynamic behavior).

Option Description
DetectBasedOnResponseSize [0] Detect last page based on response size (in bytes)
DetectBasedOnResponseErrorMessage [1] Detect last page based on error message (sub string)
DetectBasedOnResponseStatusCode [2] Detect last page based on status code (numeric code)
DetectBasedOnRecordCount [3] Detect based on missing row (stop when no more records)
DetectBasedOnMultipleRules [4] Detect based on multiple rules (i.e. mix of status(es), size, error)
LastPageWhenConditionEqualsTo Condition result to compare to detect the last page. Set this property to True if the last page should be detected when the condition is true; otherwise set it to False.
ResponseMinBytes Minimum bytes expected from response.
ResponseMaxBytes Maximum bytes from response.
ResponseErrorString Expected error message sub string from response.
ResponseStatusCode Expected status code from the response when the page number you are trying to access is not found.
Filter Enter an expression here to filter data (example: $.Users[*].UserName ). This will fetch user names from the users records.
IncludeParentColumns Use this option to include parent properties (Non array) in the output along with Filtered Rows
IncludeParentColumnsWhenChildMissing By default child and parent information is not included in the output if no children are found for the specified expression. For example, if you want to extract all orders from all customer nodes you can type $.Customers[*].Orders[*]. This will fetch all orders from all customers. By default customer records without orders won't be included in the output. If you want to include those customers for which no orders are found, check this option (order attributes are output as null). This behavior is similar to a LEFT OUTER JOIN in SQL (left side is parent, right side is child). This option is ***resource intensive***, so only check it if you really need this behavior.
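
The LEFT OUTER JOIN comparison can be sketched in a few lines of Python. The data and the function `flatten_orders` are made up for illustration, and the flag only imitates the documented effect of IncludeParentColumnsWhenChildMissing.

```python
def flatten_orders(customers: list, include_parent_when_child_missing: bool = False) -> list:
    """Emit one row per order; optionally emit a null-order row for customers without orders."""
    rows = []
    for customer in customers:
        orders = customer.get("Orders", [])
        if not orders and include_parent_when_child_missing:
            # Parent kept, child attributes output as null (like LEFT OUTER JOIN).
            rows.append({"CustomerName": customer["Name"], "OrderId": None})
        for order in orders:
            rows.append({"CustomerName": customer["Name"], "OrderId": order["OrderId"]})
    return rows


# Hypothetical sample data: one customer with orders, one without.
customers = [
    {"Name": "Ann", "Orders": [{"OrderId": 1}, {"OrderId": 2}]},
    {"Name": "Bob", "Orders": []},
]

inner_rows = flatten_orders(customers)
outer_rows = flatten_orders(customers, include_parent_when_child_missing=True)
```

Without the flag only Ann's two orders appear; with it, Bob is also output once with a null OrderId.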
IncludeParentColumnsWithArrayType Set this option to true if you want to output parent columns which are array. By default any parent column which is an array is not included in output. See also FilterForParentColumnsWithArrayType property if you set this property
FilterForParentColumnsWithArrayType Filter expression to extract value from parent
ParentColumnPrefix Prefix for parent column name. This option is only valid if you have set IncludeParentColumns=True
ThrowErrorIfPropertyMissing Throw error if property name specified in filter expression is missing. By default it will ignore any missing property errors.
MaxLevelsToScan This property controls how many nested levels should be scanned to fetch various properties. 0=Scan all child levels.
ExcludedProperties List comma separated property names from XML document which you want to exclude from output. Specify parent property name to exclude all child nodes.
LevelSeparator Property level separator used in generated property name (separator for outer properties - Above selected filter node). Use this if default separator is producing duplicate property name which is conflicting with existing name.
EnableArrayFlattening Enables deep array flattening for selected filtered hierarchy. When you turn on this property it will flatten each property of each array item and expose as column (e.g. If you have Filter set as $.customers[*] and for each customer you have an array of Addresses then you may see output columns like Addresses.1.City, Addresses.1.State, Addresses.2.City, Addresses.2.State .... Addresses.N.City, Addresses.N.State). You can control how many array items you want to flatten by setting MaxArrayItemsToFlatten property.
MaxArrayItemsToFlatten Maximum number of array items to flatten when inner array flattening is enabled. Adjust this property to control how many columns are generated. This option is ignored if you set EnableArrayFlattening=false
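
As a rough picture of what EnableArrayFlattening and MaxArrayItemsToFlatten produce, the Python sketch below turns one record with an inner array into a single flat row using the Addresses.N.Property naming shown above. The function `flatten_arrays` and the sample record are assumptions for this example, and the sketch assumes every array item is an object with simple properties.

```python
def flatten_arrays(record: dict, max_items: int) -> dict:
    """Expose each property of the first max_items array items as its own column."""
    row = {}
    for key, value in record.items():
        if isinstance(value, list):
            # Column names follow the ArrayName.Index.Property pattern, index starting at 1.
            for index, item in enumerate(value[:max_items], start=1):
                for prop, prop_value in item.items():
                    row[f"{key}.{index}.{prop}"] = prop_value
        else:
            row[key] = value
    return row


# Hypothetical customer record with three addresses.
customer = {
    "Name": "Ann",
    "Addresses": [
        {"City": "Austin", "State": "TX"},
        {"City": "Boston", "State": "MA"},
        {"City": "Denver", "State": "CO"},
    ],
}

flat_row = flatten_arrays(customer, max_items=2)
```

With a limit of two items, the third address is dropped and the row carries four address columns plus Name.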
FileCompressionType Compression format for source file (e.g. gzip, zip)

Available Options (use the numeric value listed in brackets if you need to define an expression on this property for dynamic behavior).

Option Description
None [0] None
GZip [1] GZip
Zip [2] Zip
TarGZip [3] TarGZip
ArrayTransformationType Array Transformation you want to apply. Useful for case when you have 2-Dimensional arrays with rows/columns in separate arrays.

Available Options (use the numeric value listed in brackets if you need to define an expression on this property for dynamic behavior).

Option Description
None [0] None
TransformSimpleTwoDimensionalArray [1] Simple 2-dimensional array (e.g. {cols:[..], rows:[[..],[..]]} )
TransformComplexTwoDimensionalArray [2] Complex 2-dimensional array  (e.g. {cols:[{..},{..}], rows:[{f:[..]},{f:[..]}]} )
TransformKeyValuePivot [3] Key/Value to Columns
TransformMultipleColumnsExpressions [4] Multiple columns using expressions
TransformColumnslessArray [5] Columnless array (e.g. [[..],[..]] )
TransformJsonLineArray [6] JSON Lines - Single Dimension Array(s) (i.e. [..][..] )
TransformPivotColumnlessArray [7] Pivot - Columnless array (e.g. [..] )
ArrayTransColumnNameFilter Filter expression to use to extract column names for array transformation.
ArrayTransRowValueFilter Filter expression for row values (Not applicable for simple array transformation).
ArrayTransEnableCustomColumns When you have 2D array but don't have column list specified in a separate array then use this option (e.g. { arr: [[10,11],[21,22]] } ). If you selected Column less array or JSON Lines option then this property means Column Names coming from First Line of array.
ArrayTransCustomColumns When you have a 2D array but don't have the column list specified in a separate array, specify the column names here. Use a comma-separated list (e.g. col1,col2,col3 ). Column name order must match value order.
EnableRawOutputModeSingleRow Enable Raw Document Output Mode with unstructured data processing option for any format (i.e. XML, Html, Text, Json). Unlike the other option, EnableRawOutputMode, this option doesn't invoke the parser to extract documents by finding a row terminator. It outputs the source string as the row value in a single row / single column. You can also define RawOutputDataRowTemplate along with this property (e.g. Template can be {data: [$1] } ). This will wrap the response inside the template string before sending it to the parser.
RawOutputDataRowTemplate When you enable EnableRawOutputModeSingleRow you can use this property. Template must be in JSON format (e.g. { data: [$1] } ). [$1] means content extracted using first expression or no expression (i.e. raw data). If RawOutputFilterExpr contains multiple expressions (separated by || ) then you can use multiple placeholders (i.e. [$1], [$2]...[$N]). RawOutputFilterExpr can have JsonPath, XmlPath, RegEx (set RawOutputExtractMode).
DateFormatString Specifies how custom date formatted strings are parsed when reading JSON.
DateParseHandling Specifies how date formatted strings, e.g. Date(1198908717056) and 2012-03-21T05:40Z, are parsed when reading JSON.

Available Options (use the numeric value listed in brackets if you have to define an expression on this property for dynamic behavior).

Option Description
None [0] Keep date as string
DateTime [1] Convert to DateTime (Timezone lost)
DateTimeOffset [2] Convert to DateTimeOffset (Preserve Time zone)
FloatParseHandling Specifies how decimal values are parsed when reading JSON. Change this setting to Decimal if you like to have large precision / scale.

Available Options (use the numeric value listed in brackets if you have to define an expression on this property for dynamic behavior).

Option Description
Double [0] Default (Double [~15-17 digits])
Decimal [1] Decimal (High Precision / Scale [~28-29 digits] )
IndentOutput Indent JSON output so it's easy to read.
OutputRawDocument Output as raw JSON document rather than parsing individual fields. This option is helpful if you have documents stored in a file and you want to pass them downstream as raw JSON string rather than parsing into columns.
ConvertFormat Convert the raw XML document output to JSON (Recommended). This option is ignored if OutputRawDocument=false. Once you do that, any further parsing downstream must use JSON Parser rather than XML Parser.
OnErrorOutputResponseBody When you redirect errors to the error output, by default you get additional information in the ErrorMessage column. Check this option if you need the exact Response Body (useful if it's in JSON/XML format which needs to be parsed for additional information in a later step).
ElementsToTreatAsArray Comma-separated element names which you want to treat as Array regardless of how many times the element repeats at the same level. By default an element is treated as an array only if it appears more than once at the same level.
EnablePerformanceMode Enables memory optimized mode. You may lose certain functionality when you turn this on. Only turn on this feature if you are getting an out of memory error.
OutputFilePath Set this option to true if you want to output FilePath. This option is ignored when you consume DirectValue or data from Url rather than local files. Output column name will be __FilePath
OutputFileName Set this option to true if you want to output FileName. This option is ignored when you consume DirectValue or data from Url rather than local files. Output column name will be __FileName
EnableArchiveFile Set this option to true if you want to move processed file to archive folder.
ArchiveFolderPath Folder path where you want to move processed file.
OverwriteFileInArchiveFolder Set this option to true if you want to overwrite an existing file with the same name in the archive folder.
ArchiveFileNamingConvention File naming convention for archived file. By default it will use same name as original source file processed. But you can control naming format using {%name%} and {%ext%} placeholders. Examples: {%name%}_{%timestamp%}_processed{%ext%} or {%name%}{%ext%}.{{System::ContainerStartTime,yyyyMMdd_HHmmss_fff}}
EnablePivot When this property is true, columns are converted to rows. Pivoted names will appear under the Pivot_Name column and values will appear under the Pivot_Value column.
IncludePivotPath When this property is true then one extra column Pivot_Path appears in the output along with Pivot_Name and Pivot_Value. This option is really useful to see parent hierarchy for pivoted value.
EnablePivotPathSearchReplace Enables custom search/replace function on Pivot_Path before final value appears in the output. This option is only valid when IncludePivotPath=true.
PivotPathSearchFor Search string (static string or regex pattern) for search/replace operation on Pivot_Path. You can use --regex suffix to treat search string as Regular Expression (e.g. MyData-(\d+)--regex ). To invoke case in-sensitive regex search use --regex. This option is only valid when EnablePivotPathSearchReplace=true.
PivotPathReplaceWith Replacement string for search/replace operation on Pivot_Path. If you used --regex suffix in PivotPathSearchFor then you can use placeholders like $0, $1, $2... anywhere in this string (e.g. To remove first part of email id and just keep domain part you can do this way. Set PivotPathSearchFor=(\w+)@(\w+.com)--regex, and set current property i.e. PivotPathReplaceWith=***@$2 ). This option is only valid when EnablePivotPathSearchReplace=true.
MetaDataScanMode Metadata scan mode controls how data type and length are determined. By default a few records are scanned to determine datatype/length. Changing ScanMode affects length/datatype accuracy.

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option Description
Auto [0] Auto
Strict [1] Strict - Exact length
Guess2x [2] Guess2x - 2 times bigger
Guess3x [3] Guess3x - 3 times bigger
Guess4x [4] Guess4x - 4 times bigger
TreatAsUnicodeString [5] Set all columns as string
Guess10x [6] Guess10x - 10 times bigger
TreatStringAsMaxLength [7] Set string columns with MAX Length - i.e. DT_WSTR(4000)
TreatStringAsBlob [8] Set string columns as BLOB - i.e. DT_NTEXT
MetaDataCustomLength Length for all string columns. This option is only valid for MetaDataScanMode=Custom
MetaDataTreatStringAsAscii When this option is true, it detects all string values as DT_STR (Ascii) rather than DT_WSTR (Unicode)
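
As a rough illustration of the column naming described for EnableArrayFlattening and MaxArrayItemsToFlatten, the following Python sketch flattens a nested record into dotted column names. It is not the component's implementation: the function name flatten_record, the sample customer record, and the 1-based index with a "." separator are assumptions taken from the Addresses.1.City pattern in the property description.

```python
def flatten_record(record, max_array_items=2, sep="."):
    """Flatten nested dicts/lists into a flat dict of column name -> value.

    Hypothetical helper: mimics the Addresses.1.City naming pattern only.
    """
    flat = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, path + [str(key)])
        elif isinstance(value, list):
            # Keep only the first max_array_items items; index starts at 1.
            for index, child in enumerate(value[:max_array_items], start=1):
                walk(child, path + [str(index)])
        else:
            flat[sep.join(path)] = value

    walk(record, [])
    return flat


# Made-up sample record with three addresses.
customer = {
    "Name": "Bob",
    "Addresses": [
        {"City": "Atlanta", "State": "GA"},
        {"City": "Austin", "State": "TX"},
        {"City": "Boston", "State": "MA"},
    ],
}

row = flatten_record(customer, max_array_items=2)
for column, value in row.items():
    print(column, "=", value)
```

With max_array_items=2 the third address produces no columns, which is the effect the MaxArrayItemsToFlatten description attributes to limiting the number of flattened items.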

Remarks

This component supports XMLPath Filtering. Let's take the following sample XML as an example.

Things to remember for XMLPath expressions

Sample XML for examples

<?xml version="1.0"?>
<Root Ver="1">
  <DocInfo>
    <Author>Bob</Author>
    <Date>2015-01-01</Date>
    <Location>
      <City>Atlanta</City>
      <State>GA</State>
    </Location>
  </DocInfo>
  <Customer CustId="1">
    <Order OrderId="1000">
      <OrderDate>2005-02-01T10:00:09</OrderDate>
      <Item ProdId="101">
        <ProdName>Tea</ProdName>
      </Item>
      <Item ProdId="102">
        <ProdName>Coffee</ProdName>
      </Item>
    </Order>
    <Order OrderId="1001">
      <OrderDate>2005-04-01T21:59:20</OrderDate>
      <Item ProdId="101">
        <ProdName>Tea</ProdName>
      </Item>
      <Item>
        <ProdId>201</ProdId>
        <ProdName>Soda</ProdName>
      </Item>
    </Order>
  </Customer>
  <Customer CustId="2">
    <Order OrderId="2000">
      <OrderDate>2005-02-01T21:59:20</OrderDate>
      <Item ProdId="301">
        <ProdName>Apple</ProdName>
      </Item>
      <Item ProdId="302">
        <ProdName>Orange</ProdName>
      </Item>
    </Order>
    <Order OrderId="2001">
      <OrderDate>2005-01-01T21:59:20</OrderDate>
      <Item ProdId="201">
        <ProdName>Soda</ProdName>
      </Item>
      <Item ProdId="202">
        <ProdName>Milk</ProdName>
      </Item>
    </Order>
  </Customer>  
</Root>

Example of filter expression

Filter Output
$ --or-- <blank> Get all records

$.Root.Customer[*] Get all customers

$.Root.Customer[*].Order[*] Get all orders for all customers

$.Root.Customer[*].Order[*] (Include Parent Columns=True) Get all orders for all customers, with parent (customer) columns included
$.Root.Customer[*].@CustId Get only Customer Id for each customer record
$.Root.Customer[*].Order[:1] Get first order for each customer
$.Root.Customer[*].Order[:3] Get top 3 orders for each customer
$.Root.Customer[*].Order[-1:] Get last order of each customer
$.Root.Customer[*].Order[0,4] Get orders starting from first row to fifth row (Zero based index). If you want 5th row to 10th row then use [4,9]
$.Root.Customer[*].Order[2] Get 3rd order for each customer
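
To make the filter semantics above more concrete, here is a small Python sketch that mimics a few of them with the standard library's xml.etree.ElementTree on a shortened copy of the sample XML. It only approximates what the expressions select; the XML Source evaluates XMLPath itself, and the variable names and the CustId / OrderId / OrderDate row layout used for the parent-columns case are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample XML (items and DocInfo omitted).
SAMPLE = """<?xml version="1.0"?>
<Root Ver="1">
  <Customer CustId="1">
    <Order OrderId="1000"><OrderDate>2005-02-01T10:00:09</OrderDate></Order>
    <Order OrderId="1001"><OrderDate>2005-04-01T21:59:20</OrderDate></Order>
  </Customer>
  <Customer CustId="2">
    <Order OrderId="2000"><OrderDate>2005-02-01T21:59:20</OrderDate></Order>
    <Order OrderId="2001"><OrderDate>2005-01-01T21:59:20</OrderDate></Order>
  </Customer>
</Root>"""

root = ET.fromstring(SAMPLE)  # root is the <Root> element


def ids(orders):
    """Return the OrderId attribute of each order element."""
    return [order.get("OrderId") for order in orders]


# Roughly $.Root.Customer[*]
customers = root.findall("Customer")

# Roughly $.Root.Customer[*].@CustId
cust_ids = [c.get("CustId") for c in customers]

# Roughly $.Root.Customer[*].Order[*]
all_orders = [o for c in customers for o in c.findall("Order")]

# First order, last order, and 3rd order (index 2) of each customer,
# expressed with Python slices.
first_orders = [o for c in customers for o in c.findall("Order")[:1]]
last_orders = [o for c in customers for o in c.findall("Order")[-1:]]
third_orders = [o for c in customers for o in c.findall("Order")[2:3]]

# Orders with a parent column carried along (assumed row layout).
rows_with_parent = [
    {
        "CustId": c.get("CustId"),
        "OrderId": o.get("OrderId"),
        "OrderDate": o.findtext("OrderDate"),
    }
    for c in customers
    for o in c.findall("Order")
]

print(cust_ids)
print(ids(all_orders))
print(ids(first_orders), ids(last_orders), ids(third_orders))
print(rows_with_parent[0])
```

Because each customer in the sample has only two orders, the index-2 selection comes back empty here, which is a quick way to sanity-check zero-based indexing before configuring the filter in the component.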

Setting UI

SSIS XML Source - Setting UI

See Also

Articles / Tutorials

Click here to see all articles for [SSIS XML Source (File / SOAP)] category
Pivot JSON and XML data using SSIS or ODBC Drivers

Introduction In our previous post we saw various ways to transform JSON arrays. However there will be a time when your JSON / XML file won't have an Array and you need to Pivot JSON Data. Sample JSON data file Here is a sample JSON file which we would like to parse into rows and columns. Notice […]


Create Excel File in SSIS (Read from JSON / XML)

Introduction In this post, we will learn how to Create Excel File in SSIS from source like JSON / XML.  We will use SSIS PowerPack to connect and query a JSON or XML file. This article also covers creating Excel from JSON File. JSON stands for JavaScript Object Notation and it is an Open and Standard format to […]


How to read RSS feed in SSIS and ODBC (with pagination)

Introduction Read RSS feed in SSIS can be challenging. RSS named first RDF Site Summary and later named Rich Site Summary and Really Simple Syndication allows customer applications to be updated with the news of a site. For example, Microsoft RSS feeds, Apple RSS feeds, Samsung RSS feeds, etc.  With RSS the information comes to you directly and you do […]


Read SAP S4 / HANA data in SSIS (OData REST API)

Introduction In our previous post we saw how to call REST API in SSIS. Now let’s learn how to read data from SAP S4 / HANA OData Service (i.e. S/4HANA). SAP HANA provides OData REST API interface to access data in your application using HTTP Protocol. We will use SSIS XML Source component to read SAP […]


Call Oracle UCM Web Service in SSIS (Read XML SOAP API)

Introduction In this post we will learn how to access data from Oracle UCM Web Service (Middle layer for WebLogic) and load into SQL Server or any other target. We will use SSIS XML Source to achieve this result.     About Oracle UCM Web Service If you are not sure what is SOAP Web […]


How to read data from QuickBooks Online in SSIS

Introduction QuickBooks Online is a well-known Cloud-based Accounting Software. In this post, you will learn how to implement QuickBooks Online API Integration with SQL Server or any other RDBMS (e.g. Oracle, MySQL, Postgresql) using SSIS in few clicks. We will use SSIS XML Source Connector to Read data from QuickBooks Online and Load into SQL Server / other targets (Using OAuth Connection). We […]


How to read data from NetSuite in SSIS (SimpleTalk SOAP API)

Introduction In this post we will learn how to read data from NetSuite in SSIS. We will use ZappySys XML Source for SOAP API access.     What is NetSuite CRM? NetSuite is a CRM / ERP product. It gives you scalable cloud CRM / ERP solution targeted at high-growing, mid-sized businesses and large enterprises. […]


How to call SOAP / REST API using Dynamic Token in SSIS

Introduction In this blog, we will learn how to call SOAP / REST API using Dynamic Token in SSIS (i.e. Two-step authentication approach – First call Login API to get token and then call API). In our previous blog post, we saw how to call Web API using some industry-standard approaches, such as […]


How to Convert XML into JSON using SSIS

Introduction These days, JSON is more popular and it is replacing XML because it is faster, easier to use, it is shorter because it does not require tags and uses brackets instead. In this tutorial, we will learn how to convert XML into JSON using SSIS. So let’s get started. Requirements SSDT for business intelligence […]


Load SQL Server data to Workday using SSIS / SOAP API

Introduction In our previous article, we saw step-by-step approach to read data from workday using SSIS. In this article, we will focus on how to load SQL Server data to Workday (e.g. POST, Create, Update). We will use SSIS Web API Destination and the combination of other Transforms such as SSIS Template Transform and SSIS XML Generator […]



Copyrights reserved. ZappySys LLC.