Introduction
In this post we will learn how to read / write REST API data in Talend Open Studio. We will create a simple Talend Job that uses the ZappySys JSON Driver to read from a REST API / JSON files and load the data into a target (e.g. File / DB). The techniques in this article can also be used to read from SOAP API / XML files or CSV files / API using the XML Driver / CSV Driver.
These drivers support the familiar SQL query language. Using SQL you can query virtually any API service just like a relational database table. The driver can flatten a nested hierarchy and return the output as rows / columns. Much of the REST API / SOAP API complexity is taken care of automatically (e.g. Authentication, Pagination, Security, Error Handling).
So let’s get started.
Requirements
- Download and install Talend Open Studio (FREE) from here. Skip this step if you already installed.
- Download ZappySys ODBC PowerPack (JSON / XML Drivers)
- Get Microsoft JDBC driver for SQL Server from here (Download sqljdbc_6.0.8112.200_enu.exe which is self extracting file you can run and extract to some folder)
After you extract the JDBC files, go to the sqljdbc_6.0\enu\jre8\ folder and rename sqljdbc42.jar to mssql-jdbc.jar (name must be this). We will load this file in Talend later in this article.
- Basic knowledge about REST API and JSON / XML format.
Configure Data Gateway
Configure Data Gateway User / Port
Now let's look at steps to configure Data Gateway after installation. We will also create a sample data source for ODATA API (i.e. JSON based REST API Service).
- Assuming you have installed ZappySys ODBC PowerPack using default options (which also enables Data Gateway Service)
- Search "Gateway" in your start menu and click ZappySys Data Gateway
- First make sure Gateway Service is running (Verify Start icon is disabled)
- Also verify Port on General Tab
- Now go to Users tab. Click Add icon to add a new user. Check Is admin to give access to all data sources you add in future. If you don't check admin then you have to manually configure user permission for each data source.
Configure Data Source
- After user is added, go to Data Sources tab. Click Add icon to create new data source. Select appropriate driver based on your API / File format. You can choose Generic ODBC option to read data from ODBC DSN or use Native Driver option.
NOTE: Whenever possible use native driver option for better performance / security and ease of use.
- Click on "Edit" under Data source and configure as per your need (e.g. Url, Connection, Request Method, Content Type, Body, Pagination etc.). For this demo we are going to pick simple JSON REST API which doesn't need any authentication. Enter following URL.
https://services.odata.org/V3/Northwind/Northwind.svc/Invoices?$format=json
- You can also view response structure and select default hierarchy (i.e. Filter) like below (Select Array Icon) for data extraction.
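To make the Filter idea concrete, here is a minimal Python sketch of what "pick an array node and flatten it into rows" means. It is not the driver's implementation: the sample document is made up (it only mimics the shape of an OData response), and the dotted column names are an assumption chosen for illustration.

```python
import json

def extract_rows(document, array_key):
    """Return the array stored under array_key, one flat dict per element."""
    def flatten(record, prefix=""):
        flat = {}
        for key, value in record.items():
            name = prefix + key
            if isinstance(value, dict):
                # Nested objects become dotted column names (illustrative choice).
                flat.update(flatten(value, name + "."))
            else:
                flat[name] = value
        return flat
    return [flatten(item) for item in document[array_key]]

# Made-up payload shaped like an OData response; not real Northwind data.
sample = json.loads(
    '{"odata.metadata": "x", "value": ['
    '{"OrderID": 1, "Ship": {"City": "Reims", "Country": "France"}},'
    '{"OrderID": 2, "Ship": {"City": "Graz", "Country": "Austria"}}]}'
)
rows = extract_rows(sample, "value")
```

Selecting the array icon in the UI plays the role of `array_key` here: it tells the driver which node holds the repeating records.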
Test SQL Query / Preview Data
- Now go to Preview Tab. You can click Preview button to execute default query OR Select Table name from dropdown to generate SQL with column names.
- You can also click Query Builder to generate SQL using different options in the WITH clause. Any setting you specify in the WITH clause will override the UI settings we applied in previous steps.
- There is another useful option for code generation. Select your Language and quickly copy code snippet. See below Example of XML Driver Query to call SOAP API.
- Click OK to Close Data Source UI
- Once data source is tested and configured you can click Save button in the Gateway UI toolbar and click Yes for Restart Service.
Register MS SQL JDBC driver in Talend
Now let's register the Microsoft JDBC Driver in Talend. This is a very important step because the MSSQL JDBC driver is used to communicate with the ZappySys Data Gateway we configured in the previous step.
If you missed the steps mentioned in the Requirements section then make sure you first download the JDBC driver using the steps below.
Get Microsoft JDBC driver for SQL Server from here (Download sqljdbc_6.0.8112.200_enu.exe which is a self-extracting file you can run and extract to some folder). After you extract the JDBC files, go to the sqljdbc_6.0\enu\jre8\ folder and rename sqljdbc42.jar to mssql-jdbc.jar (name must be this).
Now let's go through the steps to register the MSSQL JDBC driver in Talend.
- Open Talend Open Studio
- Go to Windows > Click Show View
- When you see Popup selection under Talend > Select Modules
- When Module window is visible click on Little Jar Icon (Bottle icon) in the toolbar.
- Select mssql-jdbc.jar file we renamed earlier and load this file.
- That’s it. Now we are ready to make API calls / read from JSON / XML in the next section.
Setup Talend REST API Connection (JSON / XML / CSV)
Now let’s configure REST API Connection in Talend. To read from JSON / XML Files you can use same steps too. We will use MSSQL JDBC Driver to connect to ZappySys Data Gateway.
- In Talend Go to Metadata > Db Connections (Right click) > Create Connection
- On the connection Wizard specify following attributes.
DB Version : Microsoft
Login : username you setup in zappysys data gateway
Password : password of data gateway user
Server : machine name or IP where zappysys data gateway is running
Port : <default is 5000> Port on which zappysys data gateway is listening
DataBase : Data source name you setup in zappysys data gateway (case-sensitive)
- That’s all we need to do to set up a connection which can be used to read / write REST API data in Talend. In the next section we will see how to create a job to read data from REST API Service using this connection.
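For orientation, the wizard fields above map onto the usual Microsoft JDBC connection URL shape. The sketch below only builds that string; the host, port and data source name (`MyJsonApi`) are hypothetical, and whether Talend produces exactly this URL is an assumption.

```python
def build_gateway_jdbc_url(server, port, data_source):
    """Compose a SQL Server style JDBC URL that points at the Data Gateway."""
    # General shape used by the Microsoft JDBC driver:
    # jdbc:sqlserver://host:port;databaseName=name
    return f"jdbc:sqlserver://{server}:{port};databaseName={data_source}"

# Hypothetical values: a gateway on the local machine, default port 5000.
url = build_gateway_jdbc_url("localhost", 5000, "MyJsonApi")
```

The data source name goes in the database slot, which is why it must match the name in Data Gateway exactly (it is case-sensitive).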
Read from REST API in Talend
Now let’s look at how to read data from REST API source or JSON / XML File using the connection we configured in the previous section.
Configure REST API Source
- Create a Talend JOB and double click to open designer
- Now drag and drop MSSQL Connection we created for ZappySys Data gateway, drop it on the designer surface. It will popup UI like below.
- Select tDBInput (Microsoft SQL Server). Remember that we are using MSSQL JDBC Driver to connect to ZappySys Data Gateway for REST API Call. This gateway uses Microsoft TDS Protocol so MSSQL JDBC driver is used to communicate.
- Now rename Source to something meaningful (e.g. Read from JSON REST API)
- Double click REST Source to configure.
- Enter Query like below (Make sure to enter between double quotes). See below examples to read from URL or File. If you have a double quote in SQL then escape it using \" (e.g. select \"my col\" from $ )
Read From REST API Url
"SELECT * FROM value
WITH (SRC='https://services.odata.org/V3/Northwind/Northwind.svc/Customers?$format=json')"
Read From JSON File
"SELECT * FROM value
WITH (SRC='c:\data\Customers_*.json')"
- Click on Guess Schema button and Click OK to accept detected schema.
- Now we will configure target in the next section.
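The quoting rule for the Query field can be sketched as a small helper. This assumes the field is treated as a Java string expression, so backslashes would need doubling as well as double quotes; if Talend reports an invalid escape sequence for a Windows path, that is the likely cause. The helper name is made up for illustration.

```python
def to_talend_query_literal(sql):
    """Wrap raw SQL in double quotes, escaping characters a Java string would choke on."""
    # Escape backslashes first so the backslashes added for quotes are not doubled again.
    escaped = sql.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

quoted_column = to_talend_query_literal('select "my col" from $')
file_query = to_talend_query_literal(
    r"SELECT * FROM value WITH (SRC='c:\data\Customers_*.json')"
)
print(quoted_column)
print(file_query)
```

Paste the printed text, including the outer double quotes, into the Query field.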
Configure Target (Delimited File)
- Now search for “FileOut” in the toolbox (Hit Enter). You will see tFileOutputDelimited so just select that for now and drag on the surface.
- Double click it to configure.
- Enter correct file path (e.g. “C:/Talend/workspace/rest-api-output.csv” )
- On Advanced Tab you can configure some additional settings (e.g. Throw an error if file already exists)
Connect and Run
- Once you have configured Source and Target it's time to connect them
- Drag Source Port to Target to connect like below.
- Run the job
- That’s it. In a few clicks you loaded data from REST API to File in Talend Open Studio.
Write / Send data to REST API (POST Example)
There will be a time when you want to POST data to REST API service. Let’s check how to write POST query to submit data to REST API.
Just like how we did Read query in previous example, we can set POST Body in the SQL Query to send data. Use query like below and click Guess Schema button. If Blank Filter gives you no data error then make sure you remove Filter on Data Gateway Data source. (Notice we used \” to escape double quote inside query )
"SELECT * FROM _root_
WITH (
 METHOD='POST'
,HEADER='Content-Type: text/plain || first-header: AAA || second-header: BBB'
,SRC='http://httpbin.org/post'
,BODY='{id:1,notes:\"Some notes\"}'
,FILTER=''
)
"
SQL Query Examples
Click on the links below to learn more about writing SQL queries using ZappySys Drivers.
JSON / REST Driver – SQL Query Examples
XML / SOAP Driver – SQL Query Examples
CSV / REST Driver – SQL Query Examples
REST API / XML SOAP Pagination Settings for Talend
Paginate by Response Attribute
This example shows how to paginate API calls where you need to keep reading until the last page is detected. In this example, the next page is indicated by an attribute called nextlink (found in the response). If this attribute is missing or null then the driver stops fetching the next page.
SELECT * FROM $
WITH(
 SRC=@'https://zappysys.com/downloads/files/test/pagination_nextlink_inarray_1.json'
,NextUrlAttributeOrExpr = '$.nextlink' --keep reading until this attribute is missing. If attribute name contains dot then use brackets like this $.['my.attr.name']
)
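Conceptually, next-link pagination is a loop that follows the attribute until it disappears. The sketch below simulates it with in-memory pages; the `items` key and the page names are invented for the demo, and this is not the driver's actual code.

```python
def read_all_pages(fetch, first_url, next_attr="nextlink"):
    """Follow next_attr from page to page until it is missing or null."""
    rows, url = [], first_url
    while url:
        page = fetch(url)
        rows.extend(page.get("items", []))
        url = page.get(next_attr)  # None (missing or null) ends the loop
    return rows

# Fake "API": each key stands in for a URL, each value for its parsed response.
FAKE_PAGES = {
    "page1": {"items": [1, 2], "nextlink": "page2"},
    "page2": {"items": [3], "nextlink": None},
}
rows = read_all_pages(FAKE_PAGES.__getitem__, "page1")
```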
Paginate by URL Parameter (Loop until certain StatusCode)
This example shows how to paginate API calls where you need to pass the page number via URL. The driver keeps incrementing the page number and calls the next URL until the last page is detected (401 error). There are a few ways to indicate the last page (e.g. By status code, By row count, By response size). If you don't specify end detection then it will use the default (i.e. No records found).
SELECT * FROM $
WITH (
 SRC=@'https://zappysys.com/downloads/files/test/page-xml.aspx?page=1&mode=DetectBasedOnResponseStatusCode'
,PagingMode='ByUrlParameter'
,PagingByUrlAttributeName='page'
,PagingByUrlEndStrategy='DetectBasedOnResponseStatusCode'
,PagingByUrlCheckResponseStatusCode=401
,IncrementBy=1
)
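The status-code strategy can be pictured as the loop below. It is a simulation under stated assumptions: `fake_fetch` replaces the real HTTP call, and the `max_pages` guard is an addition of this sketch, not a driver option.

```python
def read_until_status(fetch, stop_status=401, start_page=1, increment_by=1, max_pages=1000):
    """Request page after page until the server answers with stop_status."""
    rows, page = [], start_page
    for _ in range(max_pages):  # safety cap so a misbehaving API cannot loop forever
        status, page_rows = fetch(page)
        if status == stop_status:
            break
        rows.extend(page_rows)
        page += increment_by
    return rows

def fake_fetch(page):
    """Stand-in for an HTTP call: two pages of data, then 401."""
    data = {1: ["a", "b"], 2: ["c"]}
    return (200, data[page]) if page in data else (401, [])

rows = read_until_status(fake_fetch)
```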
Paginate by URL Path (Loop until no record)
This example shows how to paginate API calls where you need to pass the page number via URL Path. The driver keeps incrementing the page number and calls the next URL until the last page is detected. There are a few ways to indicate the last page (e.g. By status code, By row count, By response size). If you don't specify end detection then it will use the default (i.e. No records found).
SELECT * FROM $
WITH (
 SRC=@'https://zappysys.com/downloads/files/test/cust-<%page%>.xml'
,PagingMode='ByUrlPath'
,PagingByUrlAttributeName='<%page%>'
,PagingByUrlEndStrategy='DetectBasedOnRecordCount'
,IncrementBy=1
)
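For the URL path variant the only differences are the placeholder substitution and the stop condition (an empty page). A rough sketch, with made-up file names and rows standing in for real responses:

```python
def read_by_url_path(fetch, url_template, placeholder="<%page%>",
                     start_page=1, increment_by=1, max_pages=1000):
    """Substitute the page number into the URL path; stop at the first empty page."""
    rows, page = [], start_page
    for _ in range(max_pages):  # cap added by this sketch as a safety net
        page_rows = fetch(url_template.replace(placeholder, str(page)))
        if not page_rows:
            break
        rows.extend(page_rows)
        page += increment_by
    return rows

FAKE_FILES = {"cust-1.xml": ["Alice"], "cust-2.xml": ["Bob", "Carol"]}
requested = []

def fake_fetch(url):
    """Stand-in for an HTTP call that records which URLs were requested."""
    requested.append(url)
    return FAKE_FILES.get(url, [])

rows = read_by_url_path(fake_fetch, "cust-<%page%>.xml")
```

Note that one extra request (here `cust-3.xml`) is always needed to discover that there are no more records.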
Paginate by Header Link (RFC 5988)
APIs like GitHub / WordPress return the next page link in response headers (RFC 5988).
SELECT * FROM $
LIMIT 25
WITH(
 Src='https://wordpress.org/news/wp-json/wp/v2/categories?per_page=10'
,PagingMode='ByResponseHeaderRfc5988'
,WaitTimeMs='200' --//wait 200 ms after each request
)
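To see what the driver has to look for in this mode, here is a simplified parser for the Link response header. It is a sketch, not a full RFC 5988 implementation (for example, it assumes parameter values contain no commas), and the example.com URLs are placeholders.

```python
import re

# One link-value: <target> followed by its parameters, up to the next comma.
LINK_VALUE = re.compile(r"<([^>]*)>([^,]*)")

def next_link(link_header):
    """Return the URL whose rel parameter is 'next', or None if there is none."""
    for target, params in LINK_VALUE.findall(link_header):
        if re.search(r'rel\s*=\s*"?next\b', params):
            return target
    return None

header = ('<https://example.com/items?page=2>; rel="next", '
          '<https://example.com/items?page=5>; rel="last"')
following = next_link(header)
```

When `next_link` returns `None`, the last page has been reached and paging stops.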
REST API / SOAP Web Service Connection Settings for Talend
- HTTP
- OAuth
HTTP Connection
- SOAP WSS (when accessing a SOAP WebService)
- Static Token / API Key (when need to pass an API key in HTTP header)
- Dynamic Token (same as Static Token method except that each time you need to log in and retrieve a fresh API key)
- JWT Token (As per RFC 7519)
OAuth
If you are trying to access a REST API resource, there is a good chance you will need to use an OAuth Connection. Read this article to understand how OAuth authentication and authorization work and how to use them (the article was originally written for SSIS PowerPack, but the concepts and UI stay the same): https://zappysys.com/blog/rest-api-authentication-with-oauth-2-0-using-ssis
Other settings for REST API / SOAP XML Call in Talend
API Limit / Throttling
When calling a public API or other external web service, one important thing to check is how many requests your API allows. Especially when you use API pagination options to pull many records, you have to slow down based on the API limits. For example, your API may allow only 5 requests per second. Use the Throttling Tab on the Driver UI to set a delay after each request.
2D Array Transformation
If you are using the JSON or XML API Driver then you may have to transform your data using the 2D array transformation feature. Check this link for more information.
REST API / XML SOAP Performance Tips for Talend
Use Server-side filtering if possible in URL or Body Parameters
Many APIs support filtering your data by URL parameters or via the Body. Whenever possible try to use such features. Here is an example using an OData API. In the queries below, the first query is faster than the second because the first one filters at the server.
-- Fast query - Server-side filtering
SELECT * FROM value
WITH(
 Src='https://services.odata.org/V3/Northwind/Northwind.svc/Customers?$format=json&$filter=Country eq ''USA'''
,DataFormat='Odata'
)
-- Slow query - Client-side filtering
SELECT * FROM value
WHERE Country ='USA'
WITH(
 Src='https://services.odata.org/V3/Northwind/Northwind.svc/Customers?$format=json'
,DataFormat='Odata'
)
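The doubled quotes in the fast query (''USA'') are easy to get wrong by hand. The following sketch builds the filtered URL and then the SQL-safe version of it; the helper name is made up, and percent-encoding the spaces is a choice of this example (the query above leaves them unencoded).

```python
from urllib.parse import quote, urlencode

def build_odata_url(base_url, country):
    """Append $format and a server-side $filter on Country to base_url."""
    # OData string literals use single quotes; embedded quotes are doubled.
    odata_literal = "'" + country.replace("'", "''") + "'"
    params = {"$format": "json", "$filter": f"Country eq {odata_literal}"}
    # Keep '$' and single quotes readable; encode everything else (spaces become %20).
    return base_url + "?" + urlencode(params, quote_via=quote, safe="$'")

url = build_odata_url(
    "https://services.odata.org/V3/Northwind/Northwind.svc/Customers", "USA"
)
# Inside a SQL string literal every single quote must be doubled once more.
src_option = "Src='" + url.replace("'", "''") + "'"
```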
Avoid Special features in SQL Query (e.g. WHERE, Group By, Order By)
The ZappySys API engine triggers client-side processing if special features are used in the query. The following SQL features trigger client-side processing, which is several times slower than server-side processing, so always try to use a simple query (Select col1, col2 .... from mytable ).
- WHERE Clause
- GROUP BY Clause
- HAVING Clause
- ORDER BY
- FUNCTIONS (e.g. Math, String, DateTime, Regex... )
Consider using pre-generated Metadata / Cache File
Use the META option in the WITH clause to use static (pre-generated) metadata. There are two more options to speed up query processing time. Check this article for details.
select * from value WITH( meta='c:\temp\meta.txt' )
--OR--
select * from value WITH( meta='my-meta-name' )
--OR--
select * from value WITH( meta='[ {"Name": "col1", "Type": "String", "Length": 100}, {"Name": "col2", "Type": "Int32"} ...... ]' )
- Enable Data Caching Options (Found on Property Grid > Advanced Mode Only )
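For the inline META form shown above, the JSON can be generated instead of typed by hand. This sketch only uses the three keys that appear in the example (Name, Type, Length); any other metadata attributes the driver may accept are not covered here.

```python
import json

def build_meta(columns):
    """Build the inline META JSON from (name, type, length) tuples; length may be None."""
    meta = []
    for name, type_name, length in columns:
        entry = {"Name": name, "Type": type_name}
        if length is not None:
            entry["Length"] = length
        meta.append(entry)
    return json.dumps(meta)

meta_json = build_meta([("col1", "String", 100), ("col2", "Int32", None)])
query = "select * from value WITH( meta='" + meta_json.replace("'", "''") + "' )"
```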
Consider using Metadata / Data Caching Option
ZappySys API drivers support caching metadata and data rows to speed up query processing. If your data doesn't change often then you can enable this option to speed up processing significantly. Check this article for details on how to enable the data cache / metadata cache feature at the data source level or query level. To define cache options at the query level you can use a query like the one below.
SELECT * FROM $
WITH (
 SRC='https://myhost.com/some-api'
,CachingMode='All' --cache metadata and data rows both
,CacheStorage='File' --or Memory
,CacheFileLocation='c:\temp\myquery.cache'
,CacheEntryTtl=300 --cache for 300 seconds
)
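The effect of CacheEntryTtl can be illustrated with a tiny time-to-live cache. This is a conceptual sketch with an injectable clock so the behavior can be shown without waiting; it says nothing about how the driver stores its cache.

```python
import time

class TtlCache:
    """Keep loaded rows for ttl_seconds, then load them again on the next request."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (time stored, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: no API call
        value = loader()
        self.entries[key] = (now, value)
        return value

# Demo with a fake clock: the loader stands in for the real API call.
fake_now = [0]
api_calls = []

def load_rows():
    api_calls.append(fake_now[0])
    return ["row"]

cache = TtlCache(300, clock=lambda: fake_now[0])
cache.get_or_load("my-query", load_rows)   # first call hits the API
fake_now[0] = 299
cache.get_or_load("my-query", load_rows)   # within 300 seconds: served from cache
fake_now[0] = 300
cache.get_or_load("my-query", load_rows)   # expired: hits the API again
```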
Use --FAST Option to enable Stream Mode
ZappySys JSON / XML drivers support the --FAST suffix for Filter. Adding this suffix after the Filter makes the driver enable Stream Mode. Read this article to understand how this works.
SELECT * FROM $
LIMIT 10 --//add this just to test how fast you can get 10 rows
WITH(
 Filter='$.LargeArray[*]--FAST' --//Adding --FAST option turn on STREAM mode (large files)
,SRC='https://zappysys.com/downloads/files/test/large_file_100k_largearray_prop.json.gz'
--,SRC='c:\data\large_file.json.gz'
,IncludeParentColumns='False' --//This Must be OFF for STREAM mode (read very large files)
,FileCompressionType='GZip' --Zip or None (Zip format only available for Local files)
)
Calling SOAP Web Service in Talend
What is SOAP Web Service?
If you are new to SOAP Web Services (sometimes referred to as XML Web Services) then please read about the SOAP Web Service standard from this link. There are two important aspects of a SOAP Web Service.
- Getting WSDL file or URL
- Knowing exact Web Service URL
What is WSDL
In very simple terms, WSDL (often pronounced as whiz-dull) is a document which describes service metadata (e.g. functions you can call, request parameters, response structure etc). Some services simply give you the WSDL as an XML file you can download to your local machine and then analyze, or sometimes you may get a direct URL (e.g. http://api.mycompany.com/hr-soap-service/?wsdl )
Example SQL Query for SOAP API call using ZappySys XML Driver
Here is an example SQL query you can write to call a SOAP API. If you are not sure about many of the details then check the next few sections on how to use the XML Driver User Interface to build the desired SQL query to POST data to an XML SOAP Web Service without any coding.
SELECT * FROM $
WITH(
 Src='http://www.holidaywebservice.com/HolidayService_v2/HolidayService2.asmx'
,DataConnectionType='HTTP'
,CredentialType='Basic' --OR SoapWss
,SoapWssPasswordType='PasswordText'
,UserName='myuser'
,Password='pass$$w123'
,Filter='$.soap:Envelope.soap:Body.GetHolidaysAvailableResponse.GetHolidaysAvailableResult.HolidayCode[*]'
,ElementsToTreatAsArray='HolidayCode'
,RequestMethod='POST'
,Header='Content-Type: text/xml;charset=UTF-8 || SOAPAction: "http://www.holidaywebservice.com/HolidayService_v2/GetHolidaysAvailable"'
,RequestData='
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:hol="http://www.holidaywebservice.com/HolidayService_v2/">
   <soapenv:Header/>
   <soapenv:Body>
      <hol:GetHolidaysAvailable>
         <!--type: Country - enumeration: [Canada,GreatBritain,IrelandNorthern,IrelandRepublicOf,Scotland,UnitedStates]-->
         <hol:countryCode>UnitedStates</hol:countryCode>
      </hol:GetHolidaysAvailable>
   </soapenv:Body>
</soapenv:Envelope>'
)
Now let's look at the steps to create a SQL query to call a SOAP API. Later we will see how to generate code for your desired programming language (e.g. C# or SQL Server).
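To show what the Filter in the query above selects, here is a Python sketch that walks the same path (Envelope > Body > GetHolidaysAvailableResponse > GetHolidaysAvailableResult > HolidayCode) with the standard library. The sample response is hand-written for this example: the child elements (Code, Description) and their values are assumptions, not captured output from the service.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; element names below HolidayCode are assumed for illustration.
SAMPLE_RESPONSE = """
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetHolidaysAvailableResponse xmlns="http://www.holidaywebservice.com/HolidayService_v2/">
      <GetHolidaysAvailableResult>
        <HolidayCode><Code>NEW-YEARS-DAY</Code><Description>New Year's Day</Description></HolidayCode>
        <HolidayCode><Code>LABOR-DAY</Code><Description>Labor Day</Description></HolidayCode>
      </GetHolidaysAvailableResult>
    </GetHolidaysAvailableResponse>
  </soap:Body>
</soap:Envelope>
"""

NAMESPACES = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "hol": "http://www.holidaywebservice.com/HolidayService_v2/",
}

def holiday_rows(xml_text):
    """Return one dict per HolidayCode element, keyed by child tag without namespace."""
    root = ET.fromstring(xml_text.strip())  # root is the soap:Envelope element
    path = ("soap:Body/hol:GetHolidaysAvailableResponse/"
            "hol:GetHolidaysAvailableResult/hol:HolidayCode")
    rows = []
    for node in root.findall(path, NAMESPACES):
        rows.append({child.tag.split("}")[-1]: child.text for child in node})
    return rows

rows = holiday_rows(SAMPLE_RESPONSE)
```

Each repeated HolidayCode element becomes one row, which is the reason the query lists it under ElementsToTreatAsArray.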
Video Tutorial - Introduction to SOAP Web Service and SoapUI tool
Before we dive into the details of calling a SOAP API using the ZappySys XML Driver, let's first understand what a SOAP API is and how to create SOAP requests using the SoapUI tool. You will learn more about this process in a later section. The video contains a fragment about using SOAP API in SSIS, but just ignore that part because we will be calling the SOAP API using the ZappySys ODBC Driver rather than SSIS Components.
Using SoapUI to test SOAP API call / Create Request Body XML
Assuming you have downloaded and installed SoapUI from here, we are now ready to use the WSDL for your SOAP Web Service calls. If you do not have the WSDL file or URL handy then contact your API provider (sometimes you just have to add ?wsdl at the end of your Service URL to get the WSDL, so try that. Example: http://mycompany/myservice?wsdl ). If you don't know what WSDL is: in short, WSDL is Web Service Description Language (i.e. an XML file which describes your SOAP Service). WSDL helps to craft the SOAP API request Body for the ZappySys XML Driver. So let's get started.
- Open SoapUI and click SOAP button to create new SOAP Project
- Enter WSDL URL or File Path of WSDL. For example, the WSDL for our sample service can be accessed via this URL
http://www.dneonline.com/calculator.asmx?wsdl
Create new SOAP API Project in SoapUI tool for SOAP API Testing
- Once WSDL is loaded you will see possible operations you can call for your SOAP Web Service.
- If your web service requires credentials then you have to configure it. There are two common credential types for public services (SOAP WSS or BASIC )
- To use SOAP WSS Credentials select request node and enter UserId, Password, and WSS-PasswordType (PasswordText or PasswordHash)
Configure SOAP WSS Credentials for SoapUI (SOAP API Testing Tool)
- To use BASIC Auth Credentials select request node and double-click it. At the bottom click on Auth (Basic) and from the Authorization dropdown click Add New and select Basic.
Configure Basic Authorization for SoapUI (SOAP API Testing Tool)
- Now you can test your request. First, double-click on the request node to open the request editor.
- Change necessary parameters, remove optional or unwanted parameters. If you want to regenerate request you can click on Recreate default request toolbar icon.
Create SOAP Request XML (With Optional Parameters)
- Once your SOAP Request XML is ready, click the Play button in the toolbar to execute the SOAP API Request. The Response will appear in the right side panel.
Test SOAP API using SoapUI Tool (Change Default XML Body / Parameters, Execute and See Response)
Create DSN using ZappySys XML Driver to call SOAP API
Once you have tested your SOAP API in the SoapUI tool, we are ready to use the ZappySys XML driver to call the SOAP API in your preferred BI tool or programming language.
- First open ODBC Data Sources (search ODBC in your start menu or go under ZappySys > ODBC PowerPack > ODBC 64 bit)
- Go to System DSN Tab (or User DSN which is not used by Service account)
- Click Add and Select ZappySys XML Driver
ZappySys ODBC Driver for XML / SOAP API
- Configure API URL, Request Method and Request Body as below
ZappySys XML Driver - Calling SOAP API - Configure URL, Method, Body
- (This step is Optional) If your SOAP API requires credentials then Select Connection Type to HTTP and configure as below.
ZappySys XML Driver - Configure SOAP WSS Credentials or Basic Authorization (Userid, Password)
- Configure Request Headers as below (You can get them from the Request > Raw tab in SoapUI after you test the request by clicking the Play button)
Configure SOAP API Request Headers - ZappySys XML Driver
- Once credentials are entered you can select Filter to extract data from the desired node. Make sure to select an array node (see special icon) or select the node which contains all necessary columns if you don't have an array node.
Select Filter - Extract data from nested XML / SOAP API Response (Denormalize Hierarchy)
- If prompted select yes to treat the selected node as Array (This is helpful when you expect one or more records for the selected node)
Treat selected node as XML Array Option for SOAP API Response XML
Preview SOAP API Response / Generate SQL Code for SOAP API Call
Once you configure settings for the XML Driver you can preview data or generate example code for the desired language (e.g. C#, Python, Java, SQL Server). Go to the Preview tab and you will see the default query generated based on the settings you entered in previous sections. Attributes listed in the WITH clause are optional. If you omit an attribute in the WITH clause, the value from the Properties tab is used.
Preview Data
Preview SOAP API Response in ZappySys XML Driver
Generate Code Option
Conclusion
In this article, we used ZappySys Drivers to read data from JSON REST API / File. You can use same technique to consume SOAP / XML API or File. Try ODBC PowerPack for FREE and check out how easy it is to consume virtually any REST / SOAP API or Read from JSON / XML / CSV Files in Talend Open Studio.