Import Azure Blob files into SQL Server (CSV/JSON/XML Driver)

Introduction

There might be a case when you have many CSV, JSON or XML files in Azure Blob and you want them to be imported straight into a SQL Server table. Here come ZappySys ODBC PowerPack and ZappySys Data Gateway (part of ODBC PowerPack) which will enable you to accomplish that. ZappySys ODBC PowerPack includes powerful Azure Blob CSV, Azure Blob JSON and Azure Blob XML drivers that let you connect to an Azure Storage Explorer and read the contents from the files in the Container. Bringing in ZappySys Data Gateway allows doing that right from a SQL Server. Let’s begin and see how to Import Azure Blob files into SQL Server (CSV/JSON/XML Driver).

These drivers are related to this article:

Prerequisites

  1. Firstly, Download and install Microsoft Azure Storage Emulator and Microsoft Azure Storage Explorer.
  2. SQL Server instance installed (can be a SQL Server Express LocalDB instance).
  3. SQL Server Management Studio (SSMS) installed.
  4. ZappySys ODBC PowerPack installed (must be installed on a Windows machine; can be a different machine than SQL Server is installed on).

NOTE: If you want to use Live account (Azure Storage) then you can skip Step #3

Getting started

In order to start, we will show several examples. ZappySys includes Data Gateway that will help you in reading data of Azure Blob files. Here we are showing you is, how to Import Azure Blob files into SQL Server (CSV/JSON/XML Driver).

You can connect to Azure Storage Service from SSIS, you will need Storage Account Name and Access Key. Ask your SysAdmin or responsible person to provide that information to you. Click here to read more about how to get your Storage Account Name and Access Key. Here are sample Credentials.

If you don’t have Azure Storage account then you can try offline mode on your local machine. You can just download Azure Storage Emulator and start testing.

Setup Azure Storage Explorer

Setup Azure Storage client tools we will start from a simple example of how to load Invoice Data from a CSV that is located in Azure Blob, then we will load JSON contacts data from them and finally, we will proceed by loading contacts from many XML files those are compressed in ZIP.

  1. Once you have downloaded and installed storage emulator You can launch Microsoft Azure Storage Emulator from its Physical location or from the desktop or start menu shortcut.

    Azure Storage Emulator Physical Location

    Azure Storage Emulator Physical Location

  2. If you can see the below-attached Command Prompt screen after Emulator started. Then you can proceed to start Microsoft Azure Storage Explorer as the Azure Storage Emulator is started successfully.

    Command Prompt Screen after Microsoft Azure Storage Emulator Started

    Command Prompt Screen after Microsoft Azure Storage Emulator Started

  3. Now, you have to download and install Microsoft Azure Storage Explorer and then you can launch Microsoft Azure Storage Explorer from its Physical location or from the desktop or start menu shortcut.

    Microsoft Azure Storage Explorer Location

    Microsoft Azure Storage Explorer Location.

  4. For Creating a Blob Container, First of all, you need to go to Microsoft Storage Explorer Window. Then you can go through like this way (Storage Accounts –> (Development) –> Blob Containers).
    Microsoft Azure Storage Explorer : Create Blob Container

    Microsoft Azure Storage Explorer: Create a Blob Container

Open and Configure ZappySys Data Gateway

Now let's look at steps to configure Data Gateway after installation:
  1. Assuming you have installed ZappySys ODBC PowerPack using default options (Which also enables Data Gateway Service)
  2. Search "Gateway" in your start menu and click ZappySys Data Gateway
    Open ZappySys Data Gateway

    Open ZappySys Data Gateway

  3. First make sure Gateway Service is running (Verify Start icon is disabled)
  4. Also verify Port on General Tab
    Port Number setting on ZappySys Data Gateway

    Port Number setting on ZappySys Data Gateway

  5. Now go to Users tab. Click Add icon to add a new user. Check Is admin to give access to all data sources you add in future. If you don't check admin then you have to manually configure user permission for each data source.
    Add Data Gateway User

    Add Data Gateway User

 

Import CSV file located at Azure Blob into SQL Server

Overview

Let’s say you have CSV file that is located in Azure Blob Container, and you want them all data into your SQL Server. We will follow the same steps for that we will do for import JSON and XML file into SQL Server.

Azure Blob Container - CSV File Located

Azure Blob Container – CSV File Located

Create an Azure Blob CSV data source in ZappySys Data Gateway

  1. The first thing you will have to do is to create a data source in ZappySys Data Gateway. In the Data Source tab, Just click on Add button, give the data source a name, e.g. “MyInvoiceCSV“, and then select Native – ZappySys Azure Blob CSV Driver.
    Create Data Source - Azure Blob CSV Driver

    Create Data Source – Azure Blob CSV Driver

  2. Then click on Edit and add the Data Gateway user you created in the Users tab. We will use this user later when adding a Linked Server to the Data Gateway to authenticate:
  3. Now, click on the Edit button to configure “MyInvoiceCSV” Azure Blob CSV Data Source.
    Configure Data Gateway

    Configure Data Gateway

  4. When a window open, click on Click here to Configure the Connection and select Use the Microsoft Azure Storage Emulator. If you have Online Storage Account and Account Key then Select Enter Storage Account Credentials and enter it.
    Create Azure Storage Connection

    Create Azure Storage Connection

  5. Now, Select Azure Blob Container and file from it and In the Data Format / Compression (Zip / GZip) tab set suitable file Compression Format (Zip or GZip).
    ZappySys Azure Blob CSV - Configure Driver

    ZappySys Azure Blob CSV – Configure Driver

  6. Go to the Preview tab and click on the Preview Data button to make sure everything is configured correctly and preview the results.
    ZappySys Azure Blob CSV - Preview Data

    ZappySys Azure Blob CSV – Preview Data

  7. Now, Click on OK button also Save and Restart Service.
    ZappySys Data Gateway Service - Save and Restart

    ZappySys Data Gateway Service – Save and Restart

  8. That’s all, now we are ready to proceed and add a Linked Server to this data source.

Set up a SQL Server Linked Server

Once you configured the data source in Gateway, we can now set up a Linked Server in a SQL Server.
  1. Open SSMS and connect to a SQL Server.
  2. Go to Root > Server Objects > Linked Servers node. Right click and click New Linked Server...
    Add Linked Server in SQL Server

    Adding Linked Server in SQL Server

  3. Now enter the linked server name, select Provider as SQL Native Client.
  4. Enter data source as GatewayServerName,PORT_NUMBER where server name is where ZappySys Gateway is running (can be the same as SQL Server machine or a remote machine). Default PORT_NUMBER is 5000 but confirm that on the Gateway > General tab in case it's different.
  5. Enter Catalog Name. This must match the name from the Gateway Data sources grid > Name column:

    Configuring Linked Server Provider, Catalog, Server, Port for ZappySys Data Gateway Connection

  6. Click on Security Tab and select the last option "Be made using this security context". Enter your gateway user account here.

    Configuring Linked Server credentials

  7. Optional: Under the Server Options Tab, Enable RPC and RPC Out and Disable Promotion of Distributed Transactions(MSDTC).

    RPC and MSDTC Settings

    RPC and MSDTC Settings


    You need to enable RPC Out if you plan to use EXEC(...) AT [MY_LINKED_SERVER_NAME] rather than OPENQUERY.
    If don't enabled it, you will encounter the 'Server "MY_LINKED_SERVER_NAME" is not configured for RPC' error.

    Query Example: EXEC('Select * from Products') AT [MY_LINKED_SERVER_NAME]


    If you plan to use 'INSERT INTO...EXEC(....) AT [MY_LINKED_SERVER_NAME]' in that case you need to Disable Promotion of Distributed Transactions(MSDTC).
    If don't disabled it, you will encounter the 'The operation could not be performed because OLE DB provider "SQLNCLI11" for linked server "MY_LINKED_SERVER_NAME" was unable to begin a distributed transaction.' error.

    Query Example:

    Insert Into dbo.Products 
    EXEC('Select * from Products') AT [MY_LINKED_SERVER_NAME]

  8. Click OK to save the Linked Server.

Execute the SQL query

Once again, if you created the Linked Server with name “GATEWAY”, execute this SQL query in SSMS:

SELECT * INTO MyInvoices FROM OPENQUERY([GATEWAY], 'SELECT * FROM $')
SELECT * FROM MyInvoices

The results

You should see a similar view, once you execute the query:

The results of loading many CSVs from Azure Blob into SQL Server

The results of loading many CSVs from Azure Blob into SQL Server

Import JSON file located at Azure Blob into SQL Server

Overview

Let’s say you have JSON file that is located in Azure Blob Container, and you want them all data into your SQL Server. We will follow the same steps of the import CSV file into SQL Server and we will do for XML file into SQL Server.

Create an Azure Blob JSON data source in ZappySys Data Gateway

Follow the steps as in Create an Azure Blob CSV data source in ZappySys Data Gateway section when adding a new data source, except for these two steps:

  1. The first thing you will have to do is to create a data source in ZappySys Data Gateway. Just click Add button, give the data source a name, e.g. “MyContactsJSON“, and then select Native – ZappySys Azure Blob JSON Driver.
    ZappySys Data Gateway - Create New Data Source

    ZappySys Data Gateway – Create New Data Source

  2. After that, select a JSON file from container you want to load, and then click Select Filter button to choose data you want to be displayed in SQL Server.
    Configuring the JSON data source to load a JSON from Azure Blob container into SQL Server

    Configuring the JSON data source to load a JSON from Azure Blob container into SQL Server

Now you are ready to create a Linked Server and connect to the data source you just created. If you used “MyContactsJSON” as the data source name, make sure to use the same name when creating a Linked Server.

Set up a SQL Server Linked Server

Again, follow the very same steps from the section above where we added the Linked Server to a CSV data source, except that change Catalog property of the Linked Server to match the name of the Azure Blob JSON data source we created. Once you do that, you are ready to load JSON into your SQL Server.

Execute the SQL query

Once you created the Linked Server to ZappySys Data Gateway, you are ready to execute the SQL query and load data into SQL Server. Supposedly, you created the Linked Server with the name “GATEWAY”, then open SSMS and execute the following query:

SELECT * INTO MyContacts FROM openquery([GATEWAY], 'SELECT * FROM $')
SELECT * FROM MyContacts

The results

You should see similar results after you execute the query:

The results of getting JSON data straight from Azure Blob JSON

The results of getting JSON data straight from Azure Blob JSON

Now we are ready to proceed to the next section and import many XML files at once.

Import many XML files located at Azure Blob into SQL Server

Overview

Supposedly, you have many XMLs where each one is zipped and you want to load them all into a SQL Server table.

Zipped XML files located in Azure Blob container to be loaded into SQL Server

Zipped XML files located in Azure Blob container to be loaded into SQL Server

The first thing you will have to do is to create a data source, based on ZappySys Azure Blob XML Driver. Let’s proceed and just do that.

Create an Azure Blob XML data source in ZappySys Data Gateway

Follow the same steps as in Create an Azure Blob JSON data source in ZappySys Data Gateway section when adding a new data source, except for these two steps:

  1. When adding a data source select Native – ZappySys Azure Blob XML Driver as Connector Type.
    Choosing "ZappySys Azure Blob XML Driver" to load XMLs to SQL Server

    Choosing “ZappySys Azure Blob XML Driver” to load XMLs to SQL Server

  2. Then click on Edit and select Azure Blob Container and file from it and In the Data Format / Compression (Zip / GZip) tab set suitable file Compression Format (Zip or GZip).
    Configuring data source based on ZappySys Azure Blob XML Driver to load XMLs into SQL Server

    Configuring data source based on ZappySys Azure Blob XML Driver to load XMLs into SQL Server

  3. Now we are ready to set up a Linked Server to this newly created data source.

Set up a SQL Server Linked Server

Follow the very same steps from the section above where we added the Linked Server to a CSV data source, except that change Catalog property of the Linked Server to match the name of the Azure Blob XML data source you created. Once you do that, you are ready to load XMLs into your SQL Server.

Execute the SQL query

Again, if you created the Linked Server with name “GATEWAY”, execute the following SQL query in SSMS:

SELECT * INTO MyContacts FROM openquery([GATEWAY], 'SELECT * FROM $')
SELECT * FROM MyContacts

The results

You should see a similar view once you execute the query:

The data of many XMLs loaded from Azure Blob Container into SQL Server

The data of many XMLs loaded from Azure Blob Container into SQL Server

Now we are ready to move to the next section and recursively scan CSVs and load them into SQL Server.

Conclusion

We had a goal to import Azure Blob files into SQL Server from SQL Server itself. We learned how to Import Azure Blob files into SQL Server (CSV/JSON/XML Driver), which some of them were zipped and were located at different folders and subfolders. It was possible to accomplish that using ODBC PowerPack, ZappySys Data Gateway, and ODBC PowerPack drivers – Azure Blob JSON Driver, Azure Blob XML Driver, and Azure Blob CSV Driver. At first, we created data sources in the Data Gateway, then we created Linked Servers in SQL Server and connected to ZappySys Data Gateway, which finally made it possible to load data from Azure Blob straight from SQL queries in SSMS.

References

Finally, you can use the following links for more information:

Posted in Azure Blob CSV Driver, Azure Blob JSON Driver, Azure Blob XML Driver, ODBC Drivers, ODBC Gateway, ODBC PowerPack and tagged , , , , , , .