- 1 Introduction
- 2 Import XML File using SSIS XML Source
- 3 Importing data from XML Web Service or REST API
- 4 Parse XML string and split into columns using XML Parser Transform
- 5 Other Settings (XML Source, XML Parser Transform)
- 6 Conclusion
In this post you will learn how to import XML file into target such as SQL Server, CSV, Excel, Oracle. Converting or Importing XML requires flattening XML structure (i.e. hierarchy) sometimes we refer this process as de-normalization. Parsing XML can be tricky depending on how many levels you have and from which level you need data. If you want to export XML rather than Import then check this article.
This article assumes you have installed SSIS PowerPack and watched introduction videos of SSIS XML Source Connector and SSIS XML Parser Transform
Import XML File using SSIS XML Source
ZappySys SSIS XML Source is a very powerful component. It supports not only reading from file but also reading from Direct XML String or REST API / SOAP Web Service.
How ZappySys XML Source parse/output data
XML Source created by ZappySys is very different than what Microsoft provides out the box because ZappySys component outputs one flat view of nested XML document (de-normalized) compared to many outputs what Microsoft XML Source produces which is hard to manage and join based on relation with parent node.
With ZappySys XML component you don’t have to worry about joins because by default parent columns included along with hierarchy you extracted (e.g. If you extract $.Customer.Orders[*].OrderDetails[*] then you get all columns listed under <Order> and <Customer> nodes)
Importing single or multiple XML Files
To import single files specify full XML file path
To import multiple files using XML source use wildcard pattern
Using variable to make path dynamic
You can also use variable placeholders in your path to make it dynamic
Importing data from XML Web Service or REST API
XML Source supports consuming REST API data or calling XML SOAP Web services. This article doesn’t include this feature in depth. But refer to learn more about this feature.
Parse XML string and split into columns using XML Parser Transform
Other Settings (XML Source, XML Parser Transform)
All access modes support preview data like below. Change your settings such as Filter expression, Source etc and click preview.
Filtering data using expression
You can filter XML data by specifying simple expressions (e.g. $.store.book[*]) to navigate various parts of XML where you can extract data (See above screenshot).
Things to remember for filter expressions
- Expression must start with $ (means root from where parsing will start)
- Use dot (.) to separate Hierarchy and use brackets to indicate array 
- Use * to indicate all records
Sample XML for examples
Example of filter expression
|Get all records
|$.Root.Customer[*]||Get all customers
|$.Root.Customer[*].Order[*]||Get all orders for all customers
|$.Root.Customer[*].Order[*](Include Parent Columns=True)||Get all orders for all customers (Include Parent Columns option checked)
|$.Root.Customer[*].@CustId||Get only Customer Id for each customer record|
|$.Root.Customer[*].Order[:1]||Get first order for each customer|
|$.Root.Customer[*].Order[:3]||Get top 3 orders for each customer|
|$.Root.Customer[*].Order[:1]||Get last order of each customer|
|$.Root.Customer[*].Order[0,4]||Get orders starting from first row to fifth row (Zero based index). If you want 5th row to 10th row then use [4,9]|
|$.Root.Customer[*].Order||Get 3rd order for each customer|
Defining array elements
Any hierarchy you extracting with array indicator can be added to array element name list (On Array handling Tab). This helps when you only have one row for node specified as array.
For example if your filter expression is $.Customer.Order[*].OrderItem[*] in this case you indicated that you can have one or more Order nodes under Customer and you may have one or more OrderItem nodes under Order node. If this is the case then set array handling as below
Editing Column Metadata (Length and DataType)
XML Source and XML Parser both detects Datatype/Length by scanning sample data (Default 300 rows) but sometimes this is not accurate and it may produce invalid metadata. If that’s the case you can easily change it in Columns tab. Once you edit Length or DataType make sure you check Lock column (Last column in grid). Once Lock is applied metadata refresh wont affect your changes. You can also add new column manually if its not detected (Lets say your XML is very large and many columns not found in first 300 rows scanned). To add new column type in the last grid row or click new icon in the grid toolbar.
In this post you have seen how to use SSIS XML Source Connector and SSIS XML Parser Transform to extract data from nested XML documents without any coding.
Related to: SSIS Import XML Files | SSIS Parse XML data | SSIS Extract from XML files | Read from XML Files