How to integrate Cosmos DB using Talend Studio
Learn how to quickly and efficiently connect Cosmos DB with Talend Studio for smooth data access.
Read and write Azure Cosmos DB data effortlessly. Query, integrate, and manage databases, containers, documents, and users — almost no coding required. You can do it all using the high-performance Cosmos DB ODBC Driver for Talend Studio (often referred to as the Cosmos DB Connector). We'll walk you through the entire setup.
Ready to dive in? Download the product to jump right in, or follow the step-by-step guide below to see how it works.
Create data source in ZappySys Data Gateway
In this section we will create a data source for Cosmos DB in the Data Gateway. Let's follow these steps to accomplish that:
- Download and install ODBC PowerPack (if you haven't already).
- Search for gateway in the Windows Start Menu and open ZappySys Data Gateway Configuration:
- Go to the Users tab and follow these steps to add a Data Gateway user:
  - Click the Add button
  - In the Login field enter a username, e.g., john
  - Then enter a Password
  - Check the Is Administrator checkbox
  - Click OK to save
- Now we are ready to add a data source:
  - Click the Add button
  - Give the data source a name, e.g., CosmosDbDSN (have it handy for later)
  - Then select Native - ZappySys API Driver
  - Finally, click OK
- When the Configuration window appears, give your data source a name if you haven't done so already, then select "Cosmos DB" from the list of Popular Connectors. If "Cosmos DB" is not in the list, click "Search Online" and download it, then set the path to the location where you downloaded it. Finally, click Continue >> to proceed with configuring the DSN:
- Now it's time to configure the Connection Manager. Select Authentication Type, e.g. API Key [Http]. Then select API Base URL (in most cases, the default one is the right one). More information is available in the Cosmos DB authentication section below.
Cosmos DB authentication
Connecting to your Azure Cosmos DB data requires you to authenticate your REST API access. Follow the instructions below:
1. Go to your Azure portal homepage: https://portal.azure.com/.
2. In the search bar at the top of the homepage, enter Azure Cosmos DB. In the dropdown that appears, select Azure Cosmos DB.
3. Click on the name of the database account you want to connect to (also copy and paste the name of the database account for later use).
4. On the next page, where you can see all of the database account information, look along the left side and select Keys.
5. On the Keys page, you will have two tabs: Read-write Keys and Read-only Keys. If you are going to write data to your database, you need to remain on the Read-write Keys tab. If you are only going to read data from your database, you should select the Read-only Keys tab.
6. On the Keys page, copy the PRIMARY KEY value and paste it somewhere for later use (the SECONDARY KEY value may also be copied and used).
7. Now go to your SSIS package or ODBC data source and use this PRIMARY KEY in the API Key authentication configuration.
8. Enter the primary or secondary key you recorded in step 6 into the Primary or Secondary Key field.
9. Then enter the database account you recorded in step 3 into the Database Account field.
10. Next, enter or select the default database you want to connect to using the Default Database field.
11. Continue by entering or selecting the default table (i.e. container/collection) you want to connect to using the Default Table (Container/Collection) field.
12. Select the Test Connection button at the bottom of the window to verify proper connectivity with your Azure Cosmos DB account.
13. If the connection test succeeds, select OK.
14. Done! Now you are ready to use the Cosmos DB Connector!
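For readers curious about what the primary or secondary key is used for, here is a minimal Python sketch of the master-key signing scheme that Azure documents for the Cosmos DB REST API. It is only an illustration of that scheme, not the driver's own code (the source does not show how the driver signs requests). The function name, the resource link, and the key are hypothetical placeholders.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from urllib.parse import quote


def build_auth_token(verb, resource_type, resource_link, date, master_key):
    """Build a Cosmos DB REST authorization token from a primary/secondary key."""
    key = base64.b64decode(master_key)
    # String to sign: verb, resource type and date are lowercased;
    # the resource link keeps its original case. It ends with an empty line.
    payload = f"{verb.lower()}\n{resource_type.lower()}\n{resource_link}\n{date.lower()}\n\n"
    digest = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("utf-8")
    # The whole token is URL-encoded before it goes into the header.
    return quote(f"type=master&ver=1.0&sig={signature}", safe="")


# Hypothetical values for illustration only; this is NOT a real key.
fake_key = base64.b64encode(b"not-a-real-key").decode("utf-8")
request_date = formatdate(usegmt=True)  # RFC 1123 date in GMT
token = build_auth_token(
    "GET", "docs", "dbs/MyDatabase/colls/MyContainer", request_date, fake_key
)
headers = {
    "authorization": token,
    "x-ms-date": request_date,
    "x-ms-version": "2018-12-31",
}
```

The key never travels over the wire in this scheme; only the signature derived from it does, which is why the same date string must be sent in the `x-ms-date` header.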
API Connection Manager configuration
Just perform these simple steps to finish authentication configuration:
- Set Authentication Type to API Key [Http]
- Optional step: modify API Base URL if needed (in most cases the default will work).
- Fill in all the required parameters and set optional parameters if needed.
- Finally, hit the OK button:
API Base URL: https://[$Account$].documents.azure.com
Required Parameters:
- Primary or Secondary Key
- Account Name (Case-Sensitive)
- Database Name (keep blank to use default; Case-Sensitive)
- API Version
Optional Parameters:
- Default Table (needed to invoke #DirectSQL)
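As a rough illustration of how these parameters fit together, the following Python sketch models the connection settings and shows how the `[$Account$]` placeholder in the API Base URL could resolve to a concrete endpoint. The class and method names are hypothetical and exist only for this example; the driver's real configuration lives in the DSN dialog.

```python
from dataclasses import dataclass

BASE_URL_TEMPLATE = "https://[$Account$].documents.azure.com"


@dataclass
class CosmosConnectionSettings:
    primary_or_secondary_key: str
    account_name: str          # case-sensitive
    api_version: str
    database_name: str = ""    # blank means "use the default database"
    default_table: str = ""    # optional; needed to invoke #DirectSQL

    def missing_required(self):
        """Return the labels of required parameters that are still empty."""
        required = {
            "Primary or Secondary Key": self.primary_or_secondary_key,
            "Account Name": self.account_name,
            "API Version": self.api_version,
        }
        return [label for label, value in required.items() if not value.strip()]

    def base_url(self):
        """Substitute the account name into the base URL template."""
        return BASE_URL_TEMPLATE.replace("[$Account$]", self.account_name)


# Hypothetical account name and placeholder key, for illustration only.
settings = CosmosConnectionSettings("<your-key>", "myaccount", "2018-12-31")
print(settings.base_url())
print(settings.missing_required())
```

Because the account name becomes part of the host name, a typo in it usually shows up as a connection failure rather than an authentication error.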
- Once the data source connection has been configured, it's time to configure the SQL query. Select the Preview tab and then click the Query Builder button to configure the SQL query:
- Start by selecting the Table or Endpoint you are interested in and then configure the parameters. This will generate a query that we will use in Talend Studio to retrieve data from Cosmos DB. Hit the OK button to use this query in the next step.
  #DirectSQL SELECT * FROM root where root.id !=null order by root._ts desc
  Some parameters configured in this window will be passed to the Cosmos DB API, e.g. filtering parameters. This means the filtering is done on the server side (instead of the client side), so you get only the meaningful data, and much faster.
- Now hit the Preview Data button to preview the data using the generated SQL query. If you are satisfied with the result, use this query in Talend Studio.
  You can also access data quickly from the tables dropdown by selecting <Select table>.
  A WHERE clause or LIMIT keyword will be performed on the client side, meaning that the whole result set will be retrieved from the Cosmos DB API first, and only then will the filtering be applied to the data. If possible, it is recommended to use parameters in Query Builder to filter the data on the server side (in Cosmos DB servers).
- Click OK to finish creating the data source.
- Once done, go to the Network Settings tab and add a firewall rule for inbound traffic:
  - This will initially allow all inbound traffic.
  - Click Edit IP filters to restrict access to specific IP addresses or ranges.
- Crucial Step: After creating or modifying the data source, you must:
  - Click the Save button to persist your changes.
  - Hit Yes when prompted to restart the Data Gateway service.
  This ensures all changes are properly applied. Skipping this step may cause the new settings to fail, preventing you from connecting to the data source.
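To make the idea of restricting inbound traffic to specific IP addresses or ranges more concrete, here is a small Python sketch of what such a filter does conceptually. The source does not describe the filter syntax that the Data Gateway accepts, so the CIDR notation and the addresses below are assumptions chosen for illustration.

```python
import ipaddress


def is_allowed(client_ip, allowed_ranges):
    """Return True if client_ip falls inside any of the allowed addresses or ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(entry, strict=False)
        for entry in allowed_ranges
    )


# Hypothetical allow-list: the local machine plus one office subnet.
allowed = ["127.0.0.1", "192.168.1.0/24"]

print(is_allowed("192.168.1.42", allowed))  # inside the subnet
print(is_allowed("10.0.0.5", allowed))      # outside every entry
```

A single address is treated as a range of one, so the same check covers both individual machines and whole subnets.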
Read Cosmos DB data in Talend Studio
To read Cosmos DB data in Talend Studio, we'll need to complete several steps. Let's get through them all right away!
Create connection for input
- First of all, open Talend Studio
- Create a new connection:
- Select Microsoft SQL Server connection:
- Name your connection:
- Fill in the connection parameters (use the data source name created earlier, e.g. CosmosDbDSN, as the database) and then click Test connection:
- If the "List of modules not installed for this operation" window shows up, then download and install all of them:
  Review and accept all additional module license agreements during the process.
- Finally, you should see a successful connection test result at the end:
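If the connection test fails, it can help to rule out basic network problems first. The sketch below is a minimal Python check that a TCP port is reachable at all. The host and port shown are assumptions (the source does not state which port the Data Gateway listens on), so substitute the values from your own Data Gateway configuration.

```python
import socket


def gateway_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical host and port; replace with your Data Gateway settings.
    print(gateway_reachable("localhost", 5000))
```

A result of `False` points to the service not running, a wrong port, or a firewall rule blocking the traffic, rather than to a problem with the credentials or the query.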
Add input
- Once we have a connection to ZappySys Data Gateway created, we can proceed by creating a job:
- Simply drag and drop the ZappySys Data Gateway connection onto the job:
- Then create an input based on the ZappySys Data Gateway connection:
- Continue by configuring a SQL query and click the Guess schema button:
- Finish by configuring the schema, for example:
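When writing the SQL query for the input, keep in mind the earlier distinction between server-side and client-side filtering. The toy Python model below is not the driver's implementation; it only simulates the difference by counting how many documents cross the wire in each case. The document shape and the `status` field are made up for the example, while `_ts` mirrors the Cosmos DB last-modified timestamp used in the sample query.

```python
# Each dict stands in for a Cosmos DB document.
DOCUMENTS = [
    {"id": str(i), "status": "open" if i % 10 == 0 else "closed", "_ts": 1_700_000_000 + i}
    for i in range(1000)
]


def fetch(server_filter=None):
    """Pretend API call: returns only the documents the 'server' lets through."""
    if server_filter is None:
        return list(DOCUMENTS)
    return [doc for doc in DOCUMENTS if server_filter(doc)]


def is_open(doc):
    return doc["status"] == "open"


# Client-side filtering: transfer everything, then filter locally.
transferred_client = fetch()
result_client = [doc for doc in transferred_client if is_open(doc)]

# Server-side filtering: the filter travels with the request.
transferred_server = fetch(server_filter=is_open)
result_server = transferred_server

print(len(transferred_client), len(transferred_server))  # prints "1000 100"
```

Both approaches return the same 100 documents, but the client-side variant transfers ten times as much data to get there, which is the cost the Query Builder parameters help avoid.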
Add output
We are ready to add an output. From the Palette, drag and drop a tFileOutputDelimited output and connect it to the input:
Run the job
Finally, run the job and integrate your Cosmos DB data:
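Once the job has written its delimited file, you may want to sanity-check the result outside Talend Studio. The Python sketch below reads such a file, assuming a semicolon field separator and a header row (both depend on how the tFileOutputDelimited component is configured). The column names and sample rows are hypothetical, and the in-memory string stands in for the real output file.

```python
import csv
import io
from datetime import datetime, timezone


def read_delimited(stream, delimiter=";"):
    """Parse delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(stream, delimiter=delimiter))


# Stand-in for the job's output file; replace with open("<your file>", newline="").
sample = io.StringIO("id;name;_ts\n1;Alice;1700000000\n2;Bob;1700000060\n")
rows = read_delimited(sample)

for row in rows:
    # _ts is the Cosmos DB last-modified time in epoch seconds.
    modified = datetime.fromtimestamp(int(row["_ts"]), tz=timezone.utc)
    print(row["id"], row["name"], modified.isoformat())
```

Converting `_ts` to a readable timestamp makes it easier to confirm that the rows arrived in the order requested by the sample query.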
Supported Cosmos DB Connector actions
Got a specific use case in mind? We've mapped out exactly how to perform a variety of essential Cosmos DB operations directly in Talend Studio, so you don't have to figure out the setup from scratch. Check out the step-by-step guides below:
- Create a document in the container
- Create Permission Token for a User (One Table)
- Create User for Database
- Delete a Document by Id
- Get All Documents for a Table
- Get All Users for a Database
- Get Database Information by Id or Name
- Get Document by Id
- Get List of Databases
- Get List of Tables
- Get table information by Id or Name
- Get table partition key ranges
- Get User by Id or Name
- Query documents using Cosmos DB SQL query language
- Update Document in the Container
- Upsert a document in the container
- Make Generic REST API Request
- Make Generic REST API Request (Bulk Write)
Conclusion
In this article we showed you how to connect to Cosmos DB in Talend Studio and integrate data without writing complex code — all of this was powered by Cosmos DB ODBC Driver.
Download ODBC PowerPack now, or ping us via chat if you have any questions or are looking for a specific feature (you can also reach out to us by submitting a ticket).