How to integrate Cosmos DB using Talend Studio

Integrate Talend Studio and Cosmos DB

Learn how to quickly and efficiently connect Cosmos DB with Talend Studio for smooth data access.

Read and write Azure Cosmos DB data effortlessly. Query, integrate, and manage databases, containers, documents, and users — almost no coding required. You can do it all using the high-performance Cosmos DB ODBC Driver for Talend Studio (often referred to as the Cosmos DB Connector). We'll walk you through the entire setup.

Ready to dive in? Download the product to jump right in, or follow the step-by-step guide below to see how it works.

Create data source in ZappySys Data Gateway

In this section we will create a data source for Cosmos DB in the Data Gateway. Let's follow these steps to accomplish that:

  1. Download and install ODBC PowerPack (if you haven't already).

  2. Search for gateway in the Windows Start Menu and open ZappySys Data Gateway Configuration:

    Open ZappySys Data Gateway Service Manager
  3. Go to the Users tab and follow these steps to add a Data Gateway user:

    • Click the Add button
    • In the Login field enter a username, e.g., john
    • Then enter a Password
    • Check the Is Administrator checkbox
    • Click OK to save
    Data Gateway - Add User
  4. Now we are ready to add a data source:

    • Click the Add button
    • Give the Data source a name (have it handy for later)
    • Then select Native - ZappySys API Driver
    • Finally, click OK
    Data Gateway - Add data source (example values: name CosmosDbDSN, driver ZappySys API Driver)
  5. When the Configuration window appears give your data source a name if you haven't done that already, then select "Cosmos DB" from the list of Popular Connectors. If "Cosmos DB" is not present in the list, then click "Search Online" and download it. Then set the path to the location where you downloaded it. Finally, click Continue >> to proceed with configuring the DSN:

    ODBC DSN Template Selection (example values: data source CosmosDbDSN, connector Cosmos DB)
  6. Now it's time to configure the Connection Manager. Select the Authentication Type, e.g. API Key [Http]. Then select the API Base URL (in most cases, the default one is the right one). More information is available in the authentication instructions that follow.

    Cosmos DB authentication
    Connecting to your Azure Cosmos DB data requires you to authenticate your REST API access. Follow the instructions below:
    1. Go to your Azure portal homepage: https://portal.azure.com/.
    2. In the search bar at the top of the homepage, enter Azure Cosmos DB. In the dropdown that appears, select Azure Cosmos DB.
    3. Click on the name of the database account you want to connect to (also copy and paste the name of the database account for later use).
    4. On the next page, where you can see all of the database account information, look along the left side and select Keys:
       Use API key to get Cosmos DB data via REST API in Azure
    5. On the Keys page, you will have two tabs: Read-write Keys and Read-only Keys. If you are going to write data to your database, you need to remain on the Read-write Keys tab. If you are only going to read data from your database, you should select the Read-only Keys tab.
    6. On the Keys page, copy the PRIMARY KEY value and paste it somewhere for later use (the SECONDARY KEY value may also be copied and used).
    7. Now go back to the ODBC data source and use this PRIMARY KEY in the API Key authentication configuration.
    8. Enter the primary or secondary key you recorded in step 6 into the Primary or Secondary Key field.
    9. Then enter the database account you recorded in step 3 into the Database Account field.
    10. Next, enter or select the default database you want to connect to using the Default Database field.
    11. Continue by entering or selecting the default table (i.e. container/collection) you want to connect to using the Default Table (Container/Collection) field.
    12. Select the Test Connection button at the bottom of the window to verify proper connectivity with your Azure Cosmos DB account.
    13. If the connection test succeeds, select OK.
    14. Done! Now you are ready to use Cosmos DB Connector!
    API Connection Manager configuration

    Just perform these simple steps to finish authentication configuration:

    1. Set Authentication Type to API Key [Http]
    2. Optional step. Modify API Base URL if needed (in most cases default will work).
    3. Fill in all the required parameters and set optional parameters if needed.
    4. Finally, click the OK button:
    ODBC DSN HTTP Connection Configuration (example values)
    Data source: CosmosDbDSN
    Connector: Cosmos DB
    Authentication Type: API Key [Http]
    API Base URL: https://[$Account$].documents.azure.com
    Required Parameters:
    • Primary or Secondary Key
    • Account Name (Case-Sensitive)
    • Database Name (keep blank to use default) Case-Sensitive
    • API Version
    Optional Parameters:
    • Default Table (needed to invoke #DirectSQL)

  7. Once the data source connection has been configured, it's time to configure the SQL query. Select the Preview tab and then click the Query Builder button:

    Open Query Builder in API ODBC Driver to read and write data to REST API
  8. Start by selecting the Table or Endpoint you are interested in and then configure the parameters. This will generate a query that we will use in Talend Studio to retrieve data from Cosmos DB. Click the OK button to use this query in the next step.

    #DirectSQL SELECT * FROM root where root.id !=null order by root._ts desc
    Configure table/endpoint parameters in ODBC data source based on API Driver
    Some parameters configured in this window will be passed to the Cosmos DB API, e.g. filtering parameters. It means that filtering will be done on the server side (instead of the client side), enabling you to get only the meaningful data much faster.
  9. Now click the Preview Data button to preview the data using the generated SQL query. If you are satisfied with the result, use this query in Talend Studio:

    API ODBC Driver-based data source data preview
    You can also access data quickly from the tables dropdown by selecting <Select table>.
    A WHERE clause or LIMIT keyword is applied on the client side: the whole result set is retrieved from the Cosmos DB API first, and only then is the filtering applied to the data. Where possible, use parameters in the Query Builder to filter the data on the server side (on the Cosmos DB servers).
  10. Click OK to finish creating the data source.

  11. Once done, go to the Network Settings tab and Add a firewall rule for inbound traffic:

    Data Gateway - Add firewall rule for inbound connections
    • This will initially allow all inbound traffic.
    • Click Edit IP filters to restrict access to specific IP addresses or ranges.
  12. Crucial Step: After creating or modifying the data source, you must:

    • Click the Save button to persist your changes.
    • Hit Yes when prompted to restart the Data Gateway service.

    This ensures all changes are properly applied:

    ZappySys Data Gateway - Save Changes
    Skipping this step may cause the new settings to fail, preventing you from connecting to the data source.
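
After saving the changes and restarting the service, it can be useful to confirm that the Data Gateway port is actually reachable from the machine where Talend Studio runs. The following is a minimal sketch in Python, not part of the ZappySys tooling: the function name `gateway_is_reachable`, the host `localhost`, and the port `5000` are assumptions for illustration, so substitute the host and port shown in your own Data Gateway configuration.

```python
import socket


def gateway_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Assumed values: replace with your Data Gateway host and port.
    host, port = "localhost", 5000
    status = "reachable" if gateway_is_reachable(host, port) else "NOT reachable"
    print(f"Data Gateway at {host}:{port} is {status}")
```

A negative result here usually points at the firewall rule, the IP filters, or a service that has not been restarted yet, rather than at the Cosmos DB configuration itself.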

Read Cosmos DB data in Talend Studio

To read Cosmos DB data in Talend Studio, we'll need to complete several steps. Let's get through them all right away!

This article is compatible with Talend Open Studio (the free version, which Qlik has since retired). If you don't have it, you can still purchase Talend Studio from Qlik.

Create connection for input

  1. First of all, open Talend Studio.
  2. Create a new connection:
    Creating a new connection in Talend Studio
  3. Select the Microsoft SQL Server connection type:
    Creating SQL Server connection in Talend Studio
  4. Name your connection:
    Naming a connection in Talend Studio
  5. Fill in the connection parameters and then click Test connection:
    Configuring the ZappySys Data Gateway connection in Talend Studio (example data source: CosmosDbDSN)
  6. If the "List of modules not installed for this operation" window shows up, download and install all of them:
    Configure the connection
    Review and accept all additional module license agreements during the process.
  7. Finally, you should see a successful connection test result:
    Connection test successful
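
The connection form in step 5 boils down to a handful of values. As a rough sketch of how they fit together, the Python snippet below assembles them into a URL in the format used by Microsoft's SQL Server JDBC driver. Talend Studio builds its own connection string from the form fields, so this is for orientation only. The class name `GatewayConnection`, the host `localhost`, the port `5000`, and the user `john` are assumptions, as is the idea that the data source name (`CosmosDbDSN` in the earlier examples) goes into the database field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewayConnection:
    """Values typed into the SQL Server connection form (all illustrative)."""

    host: str
    port: int
    data_source: str  # assumed to be entered in the Database field
    login: str

    def jdbc_url(self) -> str:
        # Microsoft JDBC driver URL shape:
        # jdbc:sqlserver://<host>:<port>;databaseName=<database>
        return (
            f"jdbc:sqlserver://{self.host}:{self.port};"
            f"databaseName={self.data_source}"
        )


# Assumed example values; the password is deliberately left out of the sketch.
conn = GatewayConnection(host="localhost", port=5000,
                         data_source="CosmosDbDSN", login="john")
print(conn.jdbc_url())
```

If the connection test fails, comparing each field in the Talend form against the values configured in the Data Gateway (user, port, and data source name) is a reasonable first check.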

Add input

  1. Once we have a connection to ZappySys Data Gateway created, we can proceed by creating a job:

    Create a job in Talend Studio
  2. Simply drag and drop the ZappySys Data Gateway connection onto the job:

    Creating an input based on ZappySys Data Gateway connection
  3. Then create an input based on the ZappySys Data Gateway connection:

    Creating an input based on ZappySys Data Gateway connection
  4. Continue by configuring a SQL query and clicking the Guess schema button:

    Configuring a SQL query in Talend Studio
  5. Finish by configuring the schema, for example:

    Configuring a schema in Talend Studio
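
When writing the SQL query for the input component, the earlier note about server-side versus client-side filtering matters in practice. The following Python sketch simulates both paths against an in-memory list to show how many documents cross the network in each case. Everything in it is hypothetical: `DOCUMENTS` stands in for a Cosmos DB container, and `fetch_with_server_filter` and `fetch_with_client_filter` are illustrative names, not driver functions.

```python
from dataclasses import dataclass


@dataclass
class FetchResult:
    rows: list
    transferred: int  # number of documents that crossed the (simulated) network


# Hypothetical stand-in for a container holding 1000 documents.
DOCUMENTS = [{"id": str(i), "_ts": 1700000000 + i} for i in range(1000)]


def fetch_with_server_filter(min_ts: int) -> FetchResult:
    """The filter runs on the simulated server, so only matches are sent."""
    matching = [doc for doc in DOCUMENTS if doc["_ts"] >= min_ts]
    return FetchResult(rows=matching, transferred=len(matching))


def fetch_with_client_filter(min_ts: int) -> FetchResult:
    """Every document is downloaded first; the filter runs afterwards."""
    downloaded = list(DOCUMENTS)
    matching = [doc for doc in downloaded if doc["_ts"] >= min_ts]
    return FetchResult(rows=matching, transferred=len(downloaded))


server = fetch_with_server_filter(1700000990)
client = fetch_with_client_filter(1700000990)
print(len(server.rows), server.transferred)  # same rows, fewer transferred
print(len(client.rows), client.transferred)
```

Both calls return the same rows, but the client-side path transfers the whole container first. On a large container that difference tends to dominate the job's run time.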

Add output

We are ready to add an output. From the Palette, drag and drop a tFileOutputDelimited output and connect it to the input:

    Connecting tFileOutputDelimited output in Talend Studio

Run the job

Finally, run the job and integrate your Cosmos DB data:

    Integrating Cosmos DB data in Talend Studio
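
Once the job has run, a quick sanity check of the output file can confirm that rows arrived and that every row has the same number of fields. The snippet below is a minimal Python sketch. The function name `summarize_delimited` and the `;` field separator are assumptions, so adjust the separator to match the settings of your tFileOutputDelimited component.

```python
import csv


def summarize_delimited(path: str, delimiter: str = ";") -> tuple[int, list[int]]:
    """Count non-blank rows and collect the distinct field counts per row."""
    row_count = 0
    field_counts = set()
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if not row:
                continue  # skip blank lines
            row_count += 1
            field_counts.add(len(row))
    return row_count, sorted(field_counts)
```

Calling `summarize_delimited` with the path configured in the output component returns the row count and a list of field counts. A list with more than one entry suggests that some rows were split incorrectly, for example because a value contains the separator character.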

Supported Cosmos DB Connector actions

Got a specific use case in mind? We've mapped out how to perform a variety of essential Cosmos DB operations directly in Talend Studio, so you don't have to figure out the setup from scratch.

Conclusion

In this article we showed you how to connect to Cosmos DB in Talend Studio and integrate data without writing complex code, all powered by the Cosmos DB ODBC Driver.

Download ODBC PowerPack now, or ping us via chat if you have any questions or are looking for a specific feature (you can also reach out to us by submitting a ticket).
