Parse JSON array in SSIS or ODBC Drivers

Introduction

In our previous post we saw how to parse JSON arrays. Now let’s look at more advanced techniques to parse multi-dimensional JSON array in SSIS (  e.g. 2D – JSON array inside array). We will use SSIS JSON Source to parse complex nested JSON in few clicks. Tips and Tricks mentioned in this article also apply to ODBC PowerPack API Drivers.

2D arrays are used to stuff more data in most compact way. Since you don’t have to repeat column names you save great amount of space. 2D array JSON format is close to CSV format in data size but it gives advantage and structure of JSON without adding extra fat.

Parsing JSON Arrays – Simple hierarchy

If you have simple JSON / XML Structure then you can use Select Filter option to parse array and flatten the hierarchy. You can select top level array or nested array to de-normalize the structure.

For example you can select $.orders[*] to extract all order records or use $.orders[*].items[*] to get each order items for all orders. There is also possibility to use light weight expression (like WHERE clause). Check this article how to use JSONPath but it has its own limitation,

Select Filter

Select Filter

 

Parsing Multi-Dimensional JSON Arrays

Now lets look at how to parse various multi-dimensional JSON array patterns in SSIS using SSIS JSON Source  or ODBC API drivers like JSON Driver

  1. 2D Value Array with schema information (column names in a separate array) – Pattern 1
  2. 2D Value Array without schema information – Pattern 2
  3. Parse JSON Array using Complex Transformation – Pattern 3
  4. Key/Value Transformation – Pattern 4
  5. Multiple columns using JSONPath Expression (Google Geocoding API Usecase) – Pattern 5
  6. Columnless Array – Pattern 6

Pattern 1 – JSON Array using Simple Transformation (Column names found in JSON)

This is the most common pattern of multi-dimensional JSON array where column names are stored in a separate array and values are stored in another 2D array

Pattern 2 – JSON Array using Simple Transformation (Column names not found in JSON)

This is another common pattern of multi-dimensional JSON array where there are no column names. Values are stored in 2D array. Google SpreadSheet API uses this approach. In this pattern you don’t have column names specified so we have to enter manually (This like a CSV file without column names header). On the 2D array Tab check “Specify columns list manually” option. Then enter column names separated by comma. Enter same number of columns in same order as value array.

Pattern 3 – JSON Array using Complex Transformation

Below is more complex version where column names are found in array of column schema documents. Values are also stored in 2D array with complex structure. Google BigQuery uses similar complex structure

 

Here is another complex variation involving multiple JSON 2D arrays, wherein column names are present within an array of column schema documents. Additionally, values are stored in a 2D array with a sophisticated structure. The Google Analytics Data API also employs a similar intricate structure for data representation.

In the Column name and Values Filter, instead of specifying individual arrays (e.g., dimensionHeaders and metricHeaders), we need to use the ‘*’ wildcard symbol to retrieve data from all arrays in a single output. This allows us to gather all the arrays’ data efficiently and conveniently.

Pattern 4 – Key/Value Transformation

SSIS PowerPack v2.6.6 and later introduced new transformation type called Key/Value transformation. This option Pivots dynamic Key/Value pairs of some attributes into Columns. Zoho CRM is best use case. Fully described here.

To transform above JSON where fields appear as column (e..g FirstName, LastName and Email) you need to apply following settings on JSON Source UI.

  1. Select Filter $.rows[*]
  2. Goto 2D Array Tab and select following settings
    1. Transform Type = Key/Value
    2. Column Name Filter =  $.fields[*].name
    3. Row Value Filter =  $.fields[*].value

Pattern 5 – Multiple Columns with Expressions (Google Geocoding API Usecase)

(Version 2.7.4 or higher) If you have a use case to include additional columns in the output based on multiple JSONPath expressions then you can use below option. This is useful for API such as Google GeoCoding API

Assume that you have following JSON as input (For demo use Direct Value Mode). You want to extract Address Components into Columns. Perform the following steps to paste such JSON.

 

 

Steps to Parse JSON using Multiple Expressions

  1. Drag and drop ZS JSON Source inside Data Flow Designer surface
  2. Double click to edit component
  3. Select Direct Value Mode and enter sample JSON (See above example)
  4. Select Filter or enter manually  $.results[*]
  5. Go to 2D Array Transformation Tab
  6. Select Transformation Type as Multiple columns using Expressions
  7. Click Configure Columns. You can now enter column name and expression to extract value for that column. Your expression must return only a single value.
    For Example, we entered expressions like below
    Pattern 5- Multiple Columns Using JSONPath Expressions (Google Geocoding API example)

    Pattern 5- Multiple Columns Using JSONPath Expressions (Google Geocoding API example)

  8. Now click Preview to see parsed data. As you see along with other columns we also got additional columns using complex JSONPath expressions.
    Preview Cleaned / Parsed Addresses from Google Geocoding API response

    Preview Cleaned / Parsed Addresses from Google Geocoding API response

Pattern 6 – Columnless array

In this pattern you have 2D array without any property name in JSON like below. For this select Columnless Array option on 2D Transform Tab. You can also define custom columns if you dont like default names.

OR API Patterns like Google Sheet API

Manually supply column names

If you want to enter column names manually then use below setting. If you do not enter comma separated list of columns then system will auto generate names based on total columns found.

Parse JSON Array without any column names / property

Parse JSON Array without any column names / property

Auto detect column names from first line

As we mentioned that API like Google Sheets might send you column names as Row data in first line. In such case you can use new option provided in v3.1.2 or later as below. If you 2D array is nested like below screenshot (e.g. under Values node) then you have to supply Array Filter on Filter Options Tab. In this case its $.values[*] . As you can see when we checked First line has column names option it autodetected column names.

JSON With 2D Array with Column names in First Row (Google Sheets API Pattern)

JSON With 2D Array with Column names in First Row (Google Sheets API Pattern)

Pattern 7 – JSON Line format (JSONL) – New line separated

Version 3.1.2 and later you can use option to parse JSON Line format. In JSON Line (also known as JSONL ) record separator is \n or \r\n. There are two versions of JSONL format.

  1. JSON Object Format – Using { }  brackets around record (Also referred as Multi-Content Format). For This format you do not need any special settings in ZappySys. Its automatically detected and parsed correctly.
    Example:
  2. JSON Array Format – Using [ ]  brackets around record. Typically this format is used with First line as Columns and other lines as Data.
    Example:

As we said First Example is automatically Parsed by ZappySys but for 2nd example (Array format) you have to use below Settings.

Parsing JSON Lines / JSONL Data Format (JSON Array Separated by new line)

Parsing JSON Lines / JSONL Data Format (JSON Array Separated by new line)

Pattern 8 – Columnless Value Array with Pivot

If you have data like below without any property name and all in single dimension  array then you can use new option introduced after v3.1.3 “Pivot – Columnless Array” on array transformation tab. This option is not same as Pivot Option found on Pivot Tab (See next section). This option only works when you have no property name for array you like to transform. If you have property name for array then use option as described in the next section.

Output:

Pattern 8 - Pivot Columnless Array

Pattern 8 – Pivot Columnless Array

 

Pivot JSON / XML data

Now let’s consider little different scenario. If you don’t have JSON Array (See below sample) and your values are part of property name (in below case ProductCode – P001, P002… is actually Property name rather than value).  In this case you need to use Pivot Option. Check this article for full detail.

Pivot JSON / XML Data option for ZappySys API Connectors / Drivers

Pivot JSON / XML Data option for ZappySys API Connectors / Drivers

 

Pivot JSON Data using SSIS JSON Source

Pivot JSON Data using SSIS JSON Source

Conclusion

Using SSIS PowerPack or ODBC PowerPack you can transform complex JSON in a few clicks. Both products support all options listed in this article. Based on your use case you can either use ZappySys Drivers or use SSIS Connectors.

 

 

Posted in SSIS JSON Source (File/REST) and tagged , , , .