Introduction
If you have need to integrate REST API in SSIS (i.e. fetch JSON, XML, CSV via HTTP call) and wondering how to Read API data in SSIS and Load into SQL DB (i.e. SQL Server, MySQL, Oracle) then you are at the right place.
In this article we will cover simple step by step approach how to achieve API integration in few clicks. For detailed introduction please refer to our previous article.
In next few sections we will cover the following SSIS Components to read data from API like a Table (Parse in Rows / Column). Our instructions will be mostly for JSON Format but concepts are same for other formats (i.e. XML / CSV).
JSON Source Connector (Read from REST API, JSON File or OData Service): Use this dataflow component when you have to fetch data from REST API webservice like a table. This component allows you to extract JSON data from webservice and de-normalize nested structure so you can save to Relational database such as SQL Server or any other target (Oracle, FlatFile, Excel, MySQL). This component also supports reading local JSON files or direct JSON string (Wildcard pattern supported too e.g. c:\data\file*.json). | |
XML Source Connector (SOAP, File, REST) : Use this dataflow component when you have to fetch data from XML or SOAP webservice and consume data like a table. This component allows you to extract data from webservice and save to SQL Server or any other target (Oracle, FlatFile, Excel, MySQL). This component also supports reading local XML files or direct XML string. | |
CSV Source Connector (File, REST) : Use this dataflow component when you have to fetch data from File or REST API which returns data in CSV format . This component allows you to extract data from webservice and save to SQL Server or any other target (Oracle, FlatFile, Excel, MySQL). This component also supports reading local CSV files or direct CSV string. |
There are few more API components not discussed in this articles. They are REST API Task (no Parser – Just raw API calls) and Web API Destination (Write / Send Data To API / Delete rows). They both have different use case to check them see it fits in your need.
Prerequisites
Before we perform the steps listed in this article, you will need to make sure the following prerequisites are met:- SSIS designer installed. Sometimes it is referred to as BIDS or SSDT (download it from the Microsoft site).
- Basic knowledge of SSIS package development using Microsoft SQL Server Integration Services.
- Make sure ZappySys SSIS PowerPack is installed (download it, if you haven't already).
- (Optional step). Read this article, if you are planning to deploy packages to a server and schedule their execution later.
Video Tutorial
Step-By-Step – Read API data in SSIS using JSON / XML / CSV Source
In this section you will learn how to use JSON Source Adapter to Read data from API URL.
- Open Visual Studio and Create New SSIS Package Project.
- Now, Drag and Drop SSIS Data Flow Task from SSIS Toolbox.
- Double click on the Data Flow task to see Data Flow designer surface.
- From the SSIS toolbox drag and drop JSON Source on the dataflow designer surface. If your data format is XML then use XML Source, if your data format is CSV then use CSV Source.
- Double click JSON Source to configure it.
- From the Access Mode dropdown select [File path or web Url] and paste the following Url for this example .
1https://services.odata.org/V3/OData/OData.svc/Products?$format=json - Now next step is to select Filter (Click Browse button next to Array Filter) or just type below expression in the text box to define which Hierarchy you like to Flatten (i.e. Array).
- JSON / XML can have many Arrays in a single document (like many Tables inside one Database). So you have to define which branch you like to extract. Enter JSONPath expression in the Filter textbox to extract only specific part of JSON file as below ($.value[*] will get content of value property from JSON document. Value property is an array of JSON documents so we have to use [*] to indicate we want all records of that array)
1$.value[*] - Click preview to see our settings are OK.
- Click OK to save settings.
- How to pass credentials to service (Basic Authorization Header) Check this article for more information
After this section we will see how to connect Source to Target like SQL Server and Load API data into SQL Server Table.
Load API data into SQL Server Table / Other Target
- Inside Data Flow, Drag and drop Upsert Destination Component from SSIS Toolbox
- Connect our Source component to Upsert Destination
- Double click Upsert Destination to configure it
- Select Target Connection or click NEW to create new connection Configure SSIS Upsert Destination Connection - Loading data (REST / SOAP / JSON / XML /CSV) into SQL Server or other target using SSIS
- Select Target Table or click NEW to create new table based on source columns
- Click on Mappings Tab to Auto map columns by name. You can change mappings as you need SSIS Upsert Destination - Columns Mappings
- Click OK to Save Upsert Destination Settings
- That's it, You are now ready to run data flow. NOTE: If you wish to debug data flow and see records when you run, add data viewer by right click on blue arrow > Click Enable Data Viewer
- To execute data flow, Right click anywhere inside Data Flow Surface and click Execute Task
API Authentication
In our step by step example we did not cover Authentication to keep things simple but in real world you have to pass some sort of authentication details to make API calls (e.g. UserID/Password or Secure Token). There are many ways to authenticate with API Services. ZappySys offer mainly 2 Connection Managers to cover most Authentication scenarios.
Using HTTP Connection Manager
If you have use case of using UserID / Password or any other method supported by HTTP Connection Manager then use below option (Below example is for Basic Authentication)
Using OAuth Connection Manager
If your API support OAuth Standard for Authorization then use OAuth Connection Type as below.
API Pagination
Pagination is another important concept to understand when you call API using ZappySys Connector. If you ready many records from your API then by default API might not return all rows so you have to paginate until all records are read (Like a Loop in Programming). ZappySys makes it easy to paginate. Check this article to configure many different Pagination Modes
The most common error you may face when you run an SSIS package is truncation error. During the design time only 300 rows are scanned from a source (a file or a REST API call response) to detect datatypes but at runtime, it is likely you will retrieve far more records. So it is possible that you will get longer strings than initially expected. For detailed instructions on how to fix common metadata related errors read an article "How to handle SSIS errors (truncation, metadata issues)".
Another frequent error you may get is an authentication error, which happens when you deploy/copy a package to another machine and run it there. Check the paragraph below to see why it happens and how to solve this problem.
Things have gone bad: Error handling & debugging
Incidentally, bad things can happen. A remote server may go offline or your server may go out of memory. In any case, you may want to know when that happens and take actions accordingly. For that purpose, you have to redirect bad rows to some other destination. For this example, we will take and use Web API Destination, but basically, you can use any SSIS component:
Handling errors
- Add a Derived Column above Web API Destination with expression "(DT_WSTR,4000)ZS_JSON_OUT" and name it "JsonAsString". This will let you see what JSON you are actually passing.
- Then add a database or file destination or use another Trash Destination for debugging purposes and redirect the bad rows (red arrow) from Web API Destination into it. Don't forget to set Redirect row option for both, Error and Truncation columns:
- Finally, add a Data Viewer for the red path, if you want to debug the flow. You will be able to see URL, JSON and the error message for each record. You may want to copy-paste ErrorMessage to Notepad if you want it to be more readable:
Debugging HTTP requests
A common thing you have to do when working with HTTP requests is to debug those requests; e.g. to check what headers, body or URL was passed. To test how things look behind the scenes we strongly suggest to use Fiddler - a popular web debugging tool.
Inside it, you can double-click the URL entry (Right side) to see Request and Response Panels. The top panel is Request (URL, Headers, Body) and Bottom Panel is Response. For https:// (secure URL) make sure you enable HTTPS option in Fiddler (Tools > Options > HTTPS > Check Decrypt https request):
Conclusion
REST API is becoming more and more popular each day. With that Data integration from RESTful API services going to be challenge. Luckily ZappySys SSIS PowerPack provides a great way to integrate any API in SSIS via simple drag and drop approach without coding. Try SSIS PowerPack for FREE see how much time / money you can save and to integrate virtually any REST API.