SSIS Json Source Adapter (Bulk Extract,Read,Copy Json Records)
PreviousNext

JSON Source Adapter can be used to extract and output JSON data stored in JSON local file, Web URL or direct JSON String. Component also supports JSONPath to filter data. This component is optimized to work with very large files.

Download SSIS PowerPack

Content

Video Tutorial


Step-By-Step

In this section you will learn how to use JSON Source Adapter to extract data from JSON file (In this case its Web URL)..
  1. In BIDS/SSDT create new SSIS package
  2. From the SSIS toolbox drag and drop Data Flow Task on the controlflow designer surface.
  3. Double click on the DataFlow task to see DataFlow designer surface.
  4. From the SSIS toolbox drag and drop JSON Source on the dataflow designer surface.
  5. Double click JSON Source to configure it.
  6. From the Access Mode dropdown select [File path or web Url] and paste the following Url for this example .
    http://services.odata.org/V3/OData/OData.svc/Products?$format=json
  7. Now enter JSONPath expression in JSONPath textbox to extract only specific part of JSON file as below ($.value[*] will get content of value attribute from JSON document. Value attribute is array of JSON documents so we have to use [*] to indicate we want all records of that array)
    $.value[*]
  8. Click preview to see our settings are OK.
    Read JSON File data from Web Url Example in SSIS

    ---- OR ----

    You can also extract data from direct JSON (Use our Example links on UI to get some sample JSON quickly)
    Read JSON File data from Web Url Example in SSIS
  9. Click OK to save settings.
  10. From the SSIS toolbox drag and drop Trash Destination on the dataflow designer surface.
  11. Now single click on the JSON Source, once you see blue arrow from source ... connect it to Trash Destination.
  12. (Optional) Right click on the path and Add Data Viewer.
  13. Execute the package and verify source data in data viewer.
    How to read-extract JSON records from file in SSIS

Properties

Property Name Description
LoggingMode LoggingMode determines how much information is logged during Package Execution. Set Logging mode to Debugging for maximum log.

Available Options

Option Description
Normal Normal
Medium Medium
Detailed Detailed
Debugging Debugging
PrefixTimestamp When you enable this property it will prefix timestamp before Log messages.
AccessMode Defines how to read the JSON file or direct string

Available Options

Option Description
DirectValue Direct value
ValueFromVariable Direct value from variable
DirectPath File path or web URL
PathFromVariable File path or web URL from variable
DirectValue Defines how to read the JSON file or direct string
ValueVariable Variable name which holds JSON string
PathVariable Variable name which holds data file path or url
DirectPath JSON file file path (e.g. c:\data\myfile.json) or pattern to process multiple files (e.g. c:\data\*.json)
HttpHeaders Set this if you want to set custom Http headers. You may use variable anywhere in the header value using syntax {{User::YourVarName}}. Syntax of Header key value pair is : <request><header><name>x-myheader-1</name><value>AAA</value></header> <header><name>x-myheader-2</name><value>BBB</value></header></request>
HttpRequestData User defined data you wish to send along with your HTTP Request (e.g. Upload file data, Form POST data). Usually you have to set content-type of your data but if you select RequestMethod=POST then system will automatically set content-type=application/x-www-form-urlencoded.
HttpRequestMethod Http Web Request Method (e.g. POST, GET, PUT, LIST, DELETE...). Refer your API documentation if you are not sure which method you have to use.
HttpRequestContentType Specifies content type for data you wish to POST. If you select Default option then system default content type will be used (i.e. application/x-www-form-urlencoded). If you specify Content-Type header along with this option then header value takes precedence.
MaxRows Maximum JSON records to fetch. Set this value to 0 for all records
JsonFormat Data format coming from HTTP Response. This is useful for example when you have OData service and you want to automatically consume all pages of data using odata.netUrl. Setting JsonFormat=Odata will automatically do it for you. This setting is only applicable if JSON is coming from HTTP WebRequest.

Available Options

Option Description
Notset Notset
Json JSON
Odata OData
UseProxy Enable custom proxy settings (If this is not set then system default proxy will be used. To disable proxy totally uncheck this option and check DoNotUseDefaultProxy option if available)
ProxyUrl Web URL of Proxy server (including port  if necessary). [e.g. http://myproxyserver:8080/]
UseProxyCreds Enable passing userid and password to proxy server
ProxyUserName Proxy username
ProxyPassword Proxy password
NextUrlAttribute If Service response support pagination using some sort of next url attribute then specify which attribute name in JSON Response string which holds next url. If no attribute found or its null then component will stop fetching next resultset
PrevUrlAttribute If Service response support pagination using some sort of prev/next url attribute then specify which previous link attribute name from JSON Response string which holds previous url.
NextUrlStopIndicator Specify value for Next Cursor attribute on which you want to stop processing. If you leave this blank then system will stop fetching next result set when Next Cursor attribute is missing or its blank. If your service sends some numeric value for end of resultset such as -1 then specify it here.
NextUrlSuffix If you want to include certain text (or parameters) at the end of Next url then specify this attribute (e.g. &format=json)
NextUrlWait This property indicates total number of milliseconds you want to wait before sending next request. This option allows you to adjust how many API calls can be made within certain timeframe. If your API Service has no limit then set this option to zero
ContinueOnUrlNotFoundError If this option is true then component will continue without exception on 404 error (Url not found). This allows you to consume data gracefully.
CookieContainerVariable Cookie Container can be used to maintain state between multiple web requests. Example: You can login to site like wordpress and then extract any private page content by simply passing authentication cookies using this variable.
Filter Enter expression here to filter data.(Example:  $.Users[*].UserName ) This will fetch User names from users records
IncludeParentColumns Use this option to include parent properties (Non array) in the output along with Filtered Rows
IncludeParentColumnsWhenChildMissing By default child and parent information is not included in the output if children not found for specified expression. For example if you want to extract all orders from all customers nodes then you can type $.Customers[*].Orders[*]. This will fetch all orders from all customers. By default customers records without orders wont be included in the output. If you want to include those customers where orders not found then check this option (Output null information for order attributes). This behavior is similar to LEFT OUTER JOIN in SQL (Left side is parent, right side is child). This option is ***resource intensive*** so only check if you really care about this behavior.
IncludeParentColumnsWithArrayType Set this option to true if you want to output parent columns which are array. By default any parent column which is an array is not included in output. See also FilterForParentColumnsWithArrayType property if you set this property
FilterForParentColumnsWithArrayType Set this option to true if you want to output parent columns which are array. By default any parent column which is an array is not included in output. See also FilterForParentColumnsWithArrayType property if you set this property
ParentColumnPrefix Prefix for parent column name. This option is only valid if you have set IncludeParentColumns=True
ThrowErrorIfPropertyMissing Throw error if property name specified in filter expression is missing. By default it will ignore any missing property errors.
MaxLevelsToScan This property how many nested levels should be scanned to fetch various properties. 0=Scan all child levels.
ExcludedProperties List comma separated property names from JSON document which you want to exclude from output. Specify parent property name to exclude all child nodes.
RequestTimeout Http request Timeout in seconds. Set this to 0 if you want to use system default value (i.e. 100 seconds)
SecurityProtocol Specifies which security protocol is supported for HTTPS communication. Using this option you can enable legacy protocol or enforce to use latest version of security protocol (Note: TLS 1.2 is only supported in SSIS 2014 or Higher).

Available Options

Option Description
Default System Default
Ssl3 SSL v3.0
Ssl3Plus SSL v3.0 or higher
Tls TLS v1.0
TlsPlus TLS v1.0 or higher
Tls11 TLS v1.1
Tls11Plus TLS v1.1 or higher
Tls12 TLS v1.2
Tls12Plus TLS v1.2 or higher
EnableCompressionSupport Enable support for gzip or deflate compression. When you check this option compressed response automatically de-compressed saving bandwidth. This option is only valid if web server supports compressed response stream. Check your API documentation for more information.
IgnoreCertificateErrors Ignore SSL certificate related errors. Example: if you getting SSL/TLS errors because of certificate expired or certificate is not from trusted authority or certificate is self-signed. By checking this option you will not get SSL/TLS error.
DateFormatString Specifies how custom date formatted strings are parsed when reading JSON.
DateParseHandling Specifies how date formatted strings, e.g. Date(1198908717056) and 2012-03-21T05:40Z, are parsed when reading JSON.

Available Options

Option Description
None Keep date as string
DateTime Convert to DateTime (Timezone lost)
DateTimeOffset Convert to DateTimeOffset (Preserve Time zone)
OutputRawDocument Output as raw JSON document rather than parsing individual fields. This option is helpful if you have documents stored in a file and you want to pass them downstream as raw JSON string rather than parsing into columns.
IndentOutput Indent JSON output so its easy to read.
OutputFilePath Set this option to true if you want to output FilePath. This option is ignored when you consume DirectValue or data from Url rather than local files. Output column name will be __FilePath
OutputFileName Set this option to true if you want to output FileName. This option is ignored when you consume DirectValue or data from Url rather than local files. Output column name will be __FileName
EnableArchiveFile Set this option to true if you want to move processed file to archive folder.
ArchiveFolderPath Folder path where you want to move processed file.
OverwriteFileInArchiveFolder Folder path where you want to move processed file.
ArchiveFileNamingConvention File naming convention for archived file. By default it will same name as original source file processed. But you can control naming format using {%name%} and {%ext%} placeholders. e.g. {%name%}_processed{%ext%} or {%name%}{%ext%}.{{System::ContainerStartTime,yyyyMMdd_HHmmss_fff}}
UseConnection Use connection to pass credentials for authentication (e.g. Use UserID/Password or Use OAuth Protocol for token based approach)
EnablePivot When this property is true then Column is converted to Row. Pivoted names will appear under  Pivot_Name column and values will appear under Pivot_Value field.
PagingMode Specified how you want to loop through multiple pages returned by REST API.

Available Options

Option Description
ByResponseAttribute Response Attribute Mode - Read next page information from response
ByUrlParameter Url Parameter Mode - Page number is passed as query string parameter
ByUrlPath Url Path Mode - Page number is passed as URL path
ByPostData POST data Mode - Page number is passed inside POST data
PageNumberAttributeNameInUrl e.g. Type page_num if URL looks like this => http://abc.com/?page_num=1&sort=true  (page number via query string)
--or-- Type <%page%> if page number is inside URL path like this => http://abc.com/1/?sort=true  (e.g. replace page number in url with placeholder http://abc.com/<%page%>/?sort=true)
Page number will be incremented by one for next URL until last page is reached or [Max Page Number] is reached
MaxPageNumber Maximum page number until which auto increment is allowed. Type zero for no limit. Next URL contains next page number (increment by one) until last page is detected or [Max Page Number] limit is reached.
StartPageNumberVariable Variable name which will hold starting page number. This is ignored if you useing parameter name from query string to indicate page number.
PageNumberIncrement Page counter increment. By default next page is incremented by one if this value is zero. You can also enter negative number if you want to decrease page counter.
PagingEndStrategy Specified how you want detect last page.

Available Options

Option Description
DetectBasedOnResponseSize Detect last page based on response size (in bytes)
DetectBasedOnResponseErrorMessage Detect last page based on error message (sub string)
DetectBasedOnResponseStatusCode Detect last page based on status code (numeric code)
DetectBasedOnRecordCount Detect based on missing row (stop when no more records)
LastPageWhenConditionEqualsTo Condition result to compare to detect last page. Set this property to True if you want detect last page if condition is true else set this to False.
ResponseMinBytes Minimum bytes expected from response.
ResponseMaxBytes Maximum bytes from response.
ResponseErrorString Expected error message sub string from response.
ResponseStatusCode Expected status code from response when page number you trying to access not found.
MetaDataScanMode Metadata scan mode controls how data type and length is determined. By default few records scanned to determine datatype/length. Changing ScanMode affects length/datatype accuracy.
MetaDataCustomLength Length for all string column. This option is only valid for MetaDataScanMode=Custom

Error handling

Remarks

This component supports JSONPath Filtering. Lets take following sample JSON as example.

Things to remember for JSONPath expressions

Sample JSON for examples

[
  {
    "RecID": 1,
    "CustomerID": "C1",
    "CompanyName": "Company 1",
    "Branches": [
      {
        "Name": "BranchA1",
        "City": "City1_1",
        "State": "State1",
        "Features" : [ "f1", "f2","f3","f4", "f5"]
      },
      {
        "Name": "BranchB1",
        "City": "City1_2",
        "State": "State1",
        "Features" : [ "s1", "s2","s3"]
      }
    ]
  },
  {
    "RecID": 2,
    "CustomerID": "C2",
    "CompanyName": "Company 2",
    "Branches": [
      {
        "Name": "BranchA1",
        "City": "City2_2",
        "Features" : [ "f1", "f2","f3","f4", "f5", "f6"]
      },
      {
        "Name": "BranchB1",
        "City": "City2_2",
        "Features" : [ "s1", "s2","s3","s4"]
      }
    ]
  },
  {
    "RecID": 3,
    "CustomerID": "C3",
    "CompanyName": "Company 3",
    "Branches": [
      {
        "Name": "BranchA1",
        "City": "City3_1",
        "Features" : [ "f1", "f2","f3","f4", "f5", "f6"]
      },
      {
        "Name": "BranchB1",
        "City": "City3_2",
        "Features" : [ "s1", "s2","s3","s4"]
      }
    ]
  }
]

Example of JSONPath filter expression

Example Description
$
Get all records

SSIS JSON Source Adpater - JSONPath filter

$.CompanyName Get company name of all records

SSIS JSON Source Adpater - JSONPath filter

$.Branches[*] Get all branches of all records

SSIS JSON Source Adpater - JSONPath filter

$.Branches[:1] Get first branch of every record
$.Branches[1:] Get last branch of every record
$.Branches[*].Features[*] Get all features of all branch

SSIS JSON Source Adpater - Output, Read, Copy JSON Data from Web Url or File

How to pass credentials to service (Basic Authorization Header)

SSIS JSON Source Task - Pass Base64 Encoded HTTP Authorization Header
SSIS JSON Source Task - Pass Base64 Encoded HTTP Authorization Header
Base64 Encoded HTTP Basic Authorization Header View in Fiddler
Base64 Encoded HTTP Basic Authorization Header - View in Fiddler

Settings UI

SSIS JSON Source - Read from String SSIS JSON Source - Read from Variable SSIS JSON Source - Variable Mode - Call Web API or Read from File SSIS JSON Source - Configure Columns and DataType SSIS JSON Source - Direct Mode - Call REST API web service or read from JSON File

JSON Source (REST API Connector) FAQs

What type of JSON data feeds supported by JSOn Source Connector?
JSON Source supports reading JSON data from String, Files or REST API Service
How to read JSON data from multiple files?
JSON Source supports wildcard pattern (c:\data\cust*_???.json) so you dont have to implement custom looping to reach data from multiple files. It also support additional two columns in the output for FileName and FilePath so each record coming from File can be identified from which file its coming from.
My REST API feed returns data in chunk. How to read all pages until last record?
Read this article for more information about various paging scenario.
How to read data using OAuth Authrization?

References

See Also


Copyrights reserved. ZappySys LLC.