Parse JSON array in SSIS or ODBC Drivers

Introduction

In our previous post we saw how to parse JSON arrays. Now let’s look at more advanced techniques to parse multi-dimensional JSON array in SSIS (  e.g. 2D – JSON array inside array). We will use SSIS JSON Source to parse complex nested JSON in few clicks. Tips and Tricks mentioned in this article also apply to ODBC PowerPack API Drivers.

2D arrays are used to stuff more data in most compact way. Since you don’t have to repeat column names you save great amount of space. 2D array JSON format is close to CSV format in data size but it gives advantage and structure of JSON without adding extra fat.

Parsing JSON Arrays – Simple hierarchy

If you have simple JSON / XML Structure then you can use Select Filter option to parse array and flatten the hierarchy. You can select top level array or nested array to de-normalize the structure.

For example you can select $.orders[*] to extract all order records or use $.orders[*].items[*] to get each order items for all orders. There is also possibility to use light weight expression (like WHERE clause). Check this article how to use JSONPath but it has its own limitation,

Select Filter

Select Filter

 

Parsing Multi-Dimensional JSON Arrays

Now lets look at how to parse various multi-dimensional JSON array patterns in SSIS using SSIS JSON Source  or ODBC API drivers like JSON Driver

  1. 2D Value Array with schema information (column names in a separate array) – Pattern 1
  2. 2D Value Array without schema information – Pattern 2
  3. Parse JSON Array using Complex Transformation – Pattern 3
  4. Key/Value Transformation – Pattern 4
  5. Multiple columns using JSONPath Expression (Google Geocoding API Usecase) – Pattern 5
  6. Columnless Array – Pattern 6

Pattern 1 – JSON Array using Simple Transformation (Column names found in JSON)

This is the most common pattern of multi-dimensional JSON array where column names are stored in a separate array and values are stored in another 2D array

Pattern 2 – JSON Array using Simple Transformation (Column names not found in JSON)

This is another common pattern of multi-dimensional JSON array where there are no column names. Values are stored in 2D array. Google SpreadSheet API uses this approach. In this pattern you don’t have column names specified so we have to enter manually (This like a CSV file without column names header). On the 2D array Tab check “Specify columns list manually” option. Then enter column names separated by comma. Enter same number of columns in same order as value array.

Pattern 3 – JSON Array using Complex Transformation

Below is more complex version where column names are found in array of column schema documents. Values are also stored in 2D array with complex structure. Google BigQuery uses similar complex structure

 

Pattern 4 – Key/Value Transformation

SSIS PowerPack v2.6.6 and later introduced new transformation type called Key/Value transformation. This option Pivots dynamic Key/Value pairs of some attributes into Columns. Zoho CRM is best use case. Fully described here.

To transform above JSON where fields appear as column (e..g FirstName, LastName and Email) you need to apply following settings on JSON Source UI.

  1. Select Filter $.rows[*]
  2. Goto 2D Array Tab and select following settings
    1. Transform Type = Key/Value
    2. Column Name Filter =  $.fields[*].name
    3. Row Value Filter =  $.fields[*].value

Pattern 5 – Multiple Columns with Expressions (Google Geocoding API Usecase)

(Version 2.7.4 or higher) If you have a use case to include additional columns in the output based on multiple JSONPath expressions then you can use below option. This is useful for API such as Google GeoCoding API

Assume that you have following JSON as input (For demo use Direct Value Mode). You want to extract Address Components into Columns. Perform the following steps to paste such JSON.

 

 

Steps to Parse JSON using Multiple Expressions

  1. Drag and drop ZS JSON Source inside Data Flow Designer surface
  2. Double click to edit component
  3. Select Direct Value Mode and enter sample JSON (See above example)
  4. Select Filter or enter manually  $.results[*]
  5. Go to 2D Array Transformation Tab
  6. Select Transformation Type as Multiple columns using Expressions
  7. Click Configure Columns. You can now enter column name and expression to extract value for that column. Your expression must return only a single value.
    For Example, we entered expressions like below
    Pattern 5- Multiple Columns Using JSONPath Expressions (Google Geocoding API example)

    Pattern 5- Multiple Columns Using JSONPath Expressions (Google Geocoding API example)

  8. Now click Preview to see parsed data. As you see along with other columns we also got additional columns using complex JSONPath expressions.
    Preview Cleaned / Parsed Addresses from Google Geocoding API response

    Preview Cleaned / Parsed Addresses from Google Geocoding API response

Pattern 6 – Columnless array

In this pattern you have 2D array without any property name in JSON like below. For this select Columnless Array option on 2D Transform Tab. You can also define custom columns if you dont like default names.

Parse JSON Array without any column names / property

Parse JSON Array without any column names / property

Pivot JSON / XML data

Now let’s consider little different scenario. If you don’t have JSON Array (See below sample) and your values are part of property name (in below case ProductCode – P001, P002… is actually Property name rather than value).  In this case you need to use Pivot Option. Check this article for full detail.

Pivot JSON / XML Data option for ZappySys API Connectors / Drivers

Pivot JSON / XML Data option for ZappySys API Connectors / Drivers

 

Pivot JSON Data using SSIS JSON Source

Pivot JSON Data using SSIS JSON Source

Conclusion

Using SSIS PowerPack or ODBC PowerPack you can transform complex JSON in a few clicks. Both products support all options listed in this article. Based on your use case you can either use ZappySys Drivers or use SSIS Connectors.

 

 

Posted in SSIS JSON Source (File/REST) and tagged , , , .