Reading JSON Arrays from file / REST API

Introduction

REST API is becoming more and more common and with that you will see explosion in use of JSON data format. One of the questions we get a lot is “How to extract or read array from JSON data file” or “How to read multiple arrays from JSON data“. Example : Store JSON document has Customers[…] and Employees[…] arrays and lets say you want to extract both and save to target SQL server database. In this post you will learn how to make this possible and how to make it fast with some performance tips. We will use SSIS JSON Source (File, REST API Connector)

If you want to learn how to parse Multi-Dimensional JSON arrays then check this article.

Video Tutorial – Read from multiple arrays from JSON document

Read array from JSON document

To extract subdocument or array content from JSON document you need to use JSON Path expression. Lets check following example JSON document.

Below is example filter expression to extract common elements

Read single array from JSON data (JSON Path expression)

SSIS PowerPack comes with powerful REST API Connectors for JSON and XML . In below example screenshot you will see how to extract single array section from your JSON File. But if you have multiple array sections (e.g. books and employees from above example) then you have to use Multiple JSON Source Connectors for same JSON File.

Read Json File data from Web Url Example in SSIS

Read multiple arrays from JSON data

If you have scenario to extract multiple arrays from same JSON (e.g. extract Employees and Books from store ) then you have two options. You can pick any approach which suites your need (Method-1 is recommended).

  1. Method-1: Use single JSON Source with [Output As Document] option and feed that to JSON Parser Transform
  2. Method-2: Use multiple JSON Sources with different filter expressions
  3. Method-3: Enable Array Flattening Option (Useful if you have smaller arrays – i.e. less than 10 items) – Version v2.7.6+

Now lets discuss each approach in detail.

Method-1: Use single JSON Source along with JSON Parser Transform

SSIS PowerPack v2.2 released new JSON Parser Transform and XML Parser Transform which can be used to parse any input JSON/XML data coming from upstream in pipeline. It parse input string  column into multiple columns how JSON/XML Source does but the only difference is JSON Parser takes String as input from existing upstream column.

Here are the steps to parse multiple JSON arrays using JSON Parser Transform. For example purpose we will use sample store json listed above.

  1. Drag new data flow in SSIS Control Flow designer. Double click on Data flow
  2. Drag ZS JSON Source from SSIS Toolbox. Configure Source, Filter and other property. Check [Output As Document] option (Found under Access Mode dropdown)
  3. Click preview. From grid copy one sample document from __DOCUMENT__ column. This is raw JSON which we will use later to feed as sample in JSON Parser Transform (If you cant copy JSON then obtain some other way)
  4. Click Columns tab and verify metadata (e.g. datatype, length). if you think you need extra columns or bigger column length then change in the grid and check [Lock] option found in the last grid column.
  5. Click OK to save JSON Source.
  6. Drag Multicast Transform (Native SSIS Transform) from Toolbox
  7. Connect JSON Source to Multicast Transform
  8. Drag first JSON Parser Transform from Toolbox. Rename it to [Parse Employees]. Connect blue arrow coming from Multicast to JSON Parser Transform
  9. Drag second JSON Parser Transform from Toolbox. Rename it to [Parse Books]. Connect blue arrow coming from Multicast to JSON Parser Transform
  10. Double click [Parse Employees]. Select __DOCUMENT__ as upstream column. Enter sample JSON you obtained in step#3. Click on select Filter and highlight Employees node (i.e. $.store.employees[*] ). Click preview to see data.
  11. Very Important:Click Columns tab and verify metadata (e.g. datatype, length). If you think you need extra columns or bigger column length then change in the grid and check [Lock] option found in the last grid column. If column length is smaller than output data then you will get error message when you run package. You can review error message carefully to determine which column needs to be adjusted  (Repeat this until you clear all errors).
  12. Click OK to save [Parse Employees].
  13. Double click [Parse Employees]. Select __DOCUMENT__ as upstream column. Enter sample JSON you obtained in step#3. Click on select Filter and highlight Employees node (i.e. $.store.employees[*] ). Click ok. Click preview to see data.
  14. Perform same steps as #11 for  [Parse Employees] to set correct metadata.
  15. Click OK to save [Parse Employees].
  16. Drag ZS Trash destination from Toolbox. Connect Trash destination to [Parse Employees]
  17. Drag ZS Trash destination from Toolbox. Connect Trash destination to [Parse Books]
  18. Run the package and you will see both 2 outputs from different JSON Parser with different row count.

Here is how it will look like

Extract multiple array output from JSON file / REST API response in (SSIS JSON Source, JSON Parser Transform)

Extract multiple array output from JSON file / REST API response in (SSIS JSON Source, JSON Parser Transform)

 

Method-2: Use multiple JSON Sources with different filter expressions

You can consider using multiple JSON Source (each source with unique filter expression). Since one JSON source can only extract one array or Document.. You have to use below technique to read multiple arrays from same JSON document.

Read array from JSON data - Reading multiple array or single array

Read array from JSON data – Reading multiple array or single array

Method-3: Use Array Flattening Option (Preferred for smaller arrays)

If you have array inside extracted record and you like to flatten it so each items of array becomes column then use newly introduced feature [Enable Array Flattening]

Consider JSON like below. What if you don’t like to store every detail from below JSON in just one table (i.e. rather than splitting in 3 tables Customers, CustomerHobbies , CustomerAddresses ?) . To achieve full denormalization ZappySys offers powerful yet simple one click option.

To enable flattening check below option on Filter

JSON Array Flattening Option in SSIS JSON Source

JSON Array Flattening Option in SSIS JSON Source

 

Column after Array Flattening Option

Column after Array Flattening Option

 

Using JSON REST API Connector to parse JSON data

If you want to access JSON data from REST API then you can use same JSON Source Connector.  All you have to do is rather than local file path c:\data\xyz.json change it to URL as below and you will be able to access and filter JSON data using same technique as above.

Get JSON data from URL (REST API JSON response)

Get JSON data from URL (REST API JSON response)

JSON Path for recursive data extract (Double dot expression)

There will be a time when you dont know how many levels deep you have to go for data extract. See below example. We will to get all employees id and name from hierarchy. For that you can use double dot like below. As you can see there are many nested levels and we want to extract id and name from each level so data can be

Expected Output

Sample Data

Configure JSON Source for Recursive Children Scan using Double Dot JSONPath

Open JSON Source and configure like below.

Set Filter as $..employees[*] (Notice how we used double dots in expression. This does recursive scan for N number of levels)

JSON Path for recursive scan of nested array (many levels deep)

JSON Path for recursive scan of nested array (many levels deep)

Parse Multi-Dimensional Array

If you want to learn more how to parse Multi-Dimensional JSON arrays then check this article full length article.

Sample JSON with Multi Dimensional array

Array Transformation Options for 2D array

Sample JSON with Multi Dimensional array (No columns in document)

Array Transformation Options for 2D array

Performance Tips

In this section we will describe how to make your things faster.

Reduce number of requests to server by changing page size

If you are doing REST API paging to get full resultset then make sure you adjust page size to appropriate number so total requests sent to server is reduced. Each request to server adds overhead. Most of API supports page size parameter. Refer to your API documentation how to adjust page size.

Avoid multiple REST API Round trips by caching response

If you extracting multiple arrays from REST API response using above technique then we recommend you to use REST API Task so save response to File or variable first and then use cached JSON data multiple times. This approach is not as friendly as others but it can certainly reduce your total API requests.

Use GZip compression

If your REST API supports GZip compression then check Enable Gzip on Advanced tab of SSIS JSON Source (File, REST API Connector). This will compress response in Gzip format from server and when it comes to your server it will be automatically decompressed by JSON Source connector. This seeds up thing significantly if you have lots of data to download.

Uncheck include parent columns if not needed

By default JSON Source connect includes parent column (Level above last property in filter) from the Filter path you specify. For Example  if you specify $.store.employees[*] and “Include parent columns” checked then It will extract all employees and then also include parent level columns for store (Tip: Parent column name start with P_ by default) . If you don’t need parent columns then uncheck from output, this reduces many steps during data extract process.

Download sample package

Click here to download SSIS 2012 Package (Will work in SSIS 2014, 2016)

Conclusion

In this post you learned how to extract data from JSON array and how to make data extract fast. Download SSIS JSON Source (File, REST API Connector) to try everything listed above.

Posted in SSIS JSON Source (File/REST) and tagged , , , , , , , .