Amazon S3 CSV File Connector for SSIS

Amazon S3 CSV File Connector can be used to read CSV files stored in AWS S3 buckets, making it easy to integrate AWS S3 CSV File data. It supports the latest security standards, is optimized for large data files, and can also read compressed files (e.g. GZip/Zip).

In this article you will learn how to quickly and efficiently integrate Amazon S3 CSV File data in SSIS without coding. We will use the high-performance Amazon S3 CSV File Connector to connect to Amazon S3 CSV files and then access the data inside SSIS.

Let's follow the steps below to see how we can accomplish that!

Prerequisites

Before we begin, make sure the following prerequisites are met:

  1. SSIS designer installed. Sometimes it is referred to as BIDS or SSDT (download it from Microsoft).
  2. Basic knowledge of SSIS package development using Microsoft SQL Server Integration Services.
  3. SSIS PowerPack is installed (if you are new to SSIS PowerPack, then get started!).

Read data from Amazon S3 CSV File in SSIS using Amazon S3 CSV File Source (Export data)

In this section we will learn how to configure and use the Amazon S3 CSV File Connector to extract data from Amazon S3 CSV files using the Amazon S3 CSV File Source component.

  1. Begin by opening Visual Studio and creating a new project.

  2. Select Integration Services Project, set an appropriate name and location for the project in the New Project window, and click OK.

    In the new SSIS project screen you will find the following:

    • SSIS ToolBox on left side bar
    • Solution Explorer and Property Window on right bar
    • Control Flow, Data Flow, Event Handlers, and Package Explorer in tab windows
    • Connection Manager Window in the bottom
    Note: If you don't see ZappySys SSIS PowerPack tasks or components in the SSIS Toolbox, please refer to this help link.
  3. Now, drag and drop a Data Flow Task from the SSIS Toolbox. Double-click the Data Flow Task to open the Data Flow designer.

  4. From the SSIS Toolbox, drag and drop the Amazon S3 CSV File Source onto the Data Flow designer surface.

  5. Double-click the Amazon S3 CSV File Source component to configure it.

  6. Create and configure a connection for the Amazon S3 storage account.

  7. You can select the desired single file by clicking the [...] path button, e.g. mybucket/dbo.tblNames.csv.


    ----------OR----------

    You can also read multiple files stored in Amazon S3 using a wildcard pattern, e.g. dbo.tblNames*.csv.

    Note: To work with multiple files, use a wildcard pattern as shown below.
    (When you use a wildcard pattern in the source path, the system treats the target path as a folder regardless of whether it ends with a slash.)
    
    mybucket/dbo.tblNames.csv (reads only a single .csv file)
    mybucket/dbo.tbl*.csv (all files whose names start with dbo.tbl)
    mybucket/*.csv (all files with the .csv extension located under the folder and its subfolders)
    



    ----------OR----------

    You can also read zip and gzip compressed files without extracting them, using the Amazon S3 CSV File Source (stream mode).
  8. Click the Preview button to view the data, then click OK.

  9. That's it! In a few clicks we configured the source to read Amazon S3 CSV File data using the ZappySys Amazon S3 CSV File Connector.
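The two ways of selecting files described in step 7 (wildcard patterns, and compressed files read in stream mode) can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the connector's implementation: the bucket is a hypothetical in-memory dict mapping object keys to bytes, wildcard matching uses fnmatch (where * also crosses /, so mybucket/*.csv includes subfolders), and only gzip is handled; zip archives would need the zipfile module instead.

```python
import csv
import gzip
import io
from fnmatch import fnmatchcase
from typing import Dict, Iterable, Iterator, List


def make_sample_bucket() -> Dict[str, bytes]:
    """Hypothetical in-memory "bucket": object key -> raw bytes (made-up data)."""
    names = "Id,Name\n1,Alice\n2,Bob\n".encode("utf-8")
    orders = "Id,Total\n10,99.5\n".encode("utf-8")
    return {
        "mybucket/dbo.tblNames.csv": names,
        "mybucket/dbo.tblOrders.csv": orders,
        "mybucket/archive/dbo.tblNames_old.csv": names,
        "mybucket/dbo.tblNames.csv.gz": gzip.compress(names),
        "mybucket/readme.txt": b"not a csv file",
    }


def match_keys(keys: Iterable[str], pattern: str) -> List[str]:
    """Return keys matching a shell-style wildcard, case-sensitively, sorted.

    In fnmatch, '*' also matches '/', so "mybucket/*.csv" includes keys in
    subfolders. This only approximates the connector's own matching rules.
    """
    return sorted(key for key in keys if fnmatchcase(key, pattern))


def read_csv_rows(key: str, data: bytes) -> Iterator[Dict[str, str]]:
    """Yield CSV rows as dicts; '.gz' objects are decompressed on the fly."""
    raw = io.BytesIO(data)
    if key.endswith(".gz"):
        # Stream mode: decompress while reading, no extracted copy on disk.
        text = gzip.open(raw, mode="rt", encoding="utf-8", newline="")
    else:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
    with text:
        # The first line is treated as the header row.
        yield from csv.DictReader(text)


if __name__ == "__main__":
    bucket = make_sample_bucket()
    for pattern in ("mybucket/dbo.tbl*.csv", "mybucket/*.csv", "mybucket/*.csv.gz"):
        print(pattern, "->", match_keys(bucket, pattern))
    for key in match_keys(bucket, "mybucket/*.csv.gz"):
        for row in read_csv_rows(key, bucket[key]):
            print(key, row)
```

In a real pipeline the bytes would come from S3 (for example, the body of a GetObject response) rather than from a dict; the matching and streaming logic stays the same.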

Load Amazon S3 CSV File data into SQL Server using Upsert Destination (Insert or Update)

Once you have configured the data source, you can load Amazon S3 CSV File data into SQL Server using Upsert Destination.

Upsert Destination can merge or synchronize source data with the target table. It supports Microsoft SQL Server, PostgreSQL, and Redshift databases as targets. Upsert Destination also supports very fast bulk upsert operations along with bulk delete.

Upsert operation: a database operation that performs INSERT or UPDATE SQL commands depending on whether a record already exists in the target table. Records are matched by key columns: records without a match in the target table are inserted, and records with a match are updated.

Upsert Destination supports INSERT, UPDATE, and DELETE operations, so it is similar to SQL Server's MERGE command, except that it can be used directly in an SSIS package.
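As an illustration of the upsert semantics described above, here is a minimal in-memory Python sketch. It is not how Upsert Destination works internally (the component runs bulk operations against the target database), and it does not cover DELETE. The names upsert, allow_insert and allow_update are hypothetical; the two flags mirror the Insert and Update options in the Upsert Destination configuration window.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def upsert(target: List[Row], source: Iterable[Row], keys: Tuple[str, ...],
           allow_insert: bool = True, allow_update: bool = True) -> Dict[str, int]:
    """Merge source rows into an in-memory target table, matching on key columns.

    A source row whose key already exists in target updates that row (if
    allow_update); a row with no match is appended (if allow_insert).
    """
    # Index existing target rows by the values of their key columns.
    index = {tuple(row[k] for k in keys): row for row in target}
    counts = {"inserted": 0, "updated": 0}
    for row in source:
        key = tuple(row[k] for k in keys)
        existing = index.get(key)
        if existing is not None:
            if allow_update:
                existing.update(row)          # UPDATE: overwrite mapped columns
                counts["updated"] += 1
        elif allow_insert:
            new_row = dict(row)               # INSERT: copy so the source row stays untouched
            target.append(new_row)
            index[key] = new_row
            counts["inserted"] += 1
    return counts


if __name__ == "__main__":
    table = [{"Id": 1, "Name": "Alice"}, {"Id": 2, "Name": "Bob"}]
    incoming = [{"Id": 2, "Name": "Robert"}, {"Id": 3, "Name": "Carol"}]
    print(upsert(table, incoming, keys=("Id",)))
    print(table)
```

Calling upsert with allow_update=False leaves existing rows untouched and only inserts new keys, which corresponds to unchecking the Update option.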

  1. From the SSIS Toolbox, drag and drop the Upsert Destination component onto the Data Flow designer background.

  2. Connect your SSIS source component to Upsert Destination.

  3. Double-click the Upsert Destination component to open its configuration window.

  4. Start by selecting the Action from the list.

  5. Next, select the desired target connection or create one by clicking <New [provider] Connection> menu item from the Target Connection dropdown.

  6. Then select a table from the Target Table list or click New button to create a new table based on the source columns.

  7. Continue by checking Insert and Update options according to your scenario (e.g. if Update option is unchecked, no updates will be made).

  8. Finally, click the Map All button to map all columns, and then select the key columns used to match records.

  9. Click OK to save the configuration.

  10. Run the package, and Amazon S3 CSV File data will be merged with the target table in SQL Server, PostgreSQL, or Redshift.

  11. Done!

Deploy and schedule SSIS package

After you finish creating the SSIS package, you will most likely want to deploy it to the SQL Server Catalog and run it periodically. Just follow the instructions in this article:

Running SSIS package in Azure Data Factory (ADF)

To use SSIS PowerPack in ADF, you must first prepare an Azure-SSIS Integration Runtime. Follow this link for detailed instructions:

Centralized data access via Data Gateway

In some situations, you may need to provide Amazon S3 CSV File data access to multiple users or services. Configuring the data source on a Data Gateway creates a single, centralized connection point for this purpose.

This configuration provides two primary advantages:

  • Centralized data access
    The data source is configured once on the gateway, eliminating the need to set it up individually on each user's machine or application. This significantly simplifies the management process.
  • Centralized access control
    Since all connections route through the gateway, access can be governed or revoked from a single location for all users.
Comparison of Data Gateway and a local ODBC data source:

                          Data Gateway          Local ODBC data source
Simple configuration
Installation              Single machine        Per machine
Connectivity              Local and remote      Local only
Connections limit         Limited by license    Unlimited
Central data access       Yes                   No
Central access control    Yes                   No
More flexible cost

If you need any of these capabilities, you will have to create a data source in Data Gateway to connect to Amazon S3 CSV File, and then create an ODBC data source that SSIS uses to connect to Data Gateway.

Let's not wait and get going!

Creating Amazon S3 CSV File data source in Gateway

In this section we will create a data source for Amazon S3 CSV File in Data Gateway. Let's follow these steps to accomplish that:

  1. Download and install ODBC PowerPack.

  2. Search for gateway in the Windows Start Menu and open ZappySys Data Gateway Configuration.

  3. Go to Users tab and follow these steps to add a Data Gateway user:

    • Click Add button
    • In Login field enter username, e.g., john
    • Then enter a Password
    • Check Is Administrator checkbox
    • Click OK to save
  4. Now we are ready to add a data source:

    • Click Add button
    • Give the data source a name, e.g. AmazonS3CsvFileDSN (have it handy for later)
    • Then select Native - ZappySys Amazon S3 CSV Driver
    • Finally, click OK
  5. When the ZappySys Amazon S3 CSV Driver configuration window opens, configure the data source with the same Amazon S3 connection settings you used at the beginning of this article.

  6. This step is very important. After creating or modifying the data source, make sure you:

    • Click the Save button to persist your changes.
    • Click Yes when asked whether you want to restart the Data Gateway service.

    This will ensure all changes are properly applied.

    Skipping this step may prevent the new settings from taking effect, in which case you will not be able to connect to the data source.

Creating ODBC data source for Data Gateway

In this part we will create an ODBC data source to connect to Data Gateway from SSIS. To achieve that, let's perform these steps:

  1. Open ODBC Data Sources (x64).

  2. Create a User data source (User DSN) based on ODBC Driver 17 for SQL Server.
    If you don't see ODBC Driver 17 for SQL Server in the list, choose a similar driver version.
  3. Then set the name of the data source (e.g. GatewayDSN) and the address of the Data Gateway (e.g. localhost,5000).

    Make sure you separate the hostname and port with a comma, e.g. localhost,5000.
  4. Proceed with the authentication settings:

    • Select SQL Server authentication
    • In Login ID field enter the user name you used in Data Gateway, e.g., john
    • Set Password to the one you configured in Data Gateway
  5. Then set the default database property to AmazonS3CsvFileDSN (the data source name we used in Data Gateway).

  6. Continue by checking the Trust server certificate option.

  7. Once you do that, test the connection.

  8. If the connection test succeeds, everything is set up correctly.

  9. Done!

We are ready to move to the final step. Let's do it!

Accessing data in SSIS via Data Gateway

Finally, we are ready to read data from Amazon S3 CSV File in SSIS via Data Gateway. Follow these final steps:

  1. Go back to SSIS.

  2. From the SSIS Toolbox, drag and drop the ODBC Source onto the Data Flow designer surface.

  3. Double-click the ODBC Source component to configure it.

  4. Click the New... button to open the Configure ODBC Connection Manager window. Once it opens, click New... again to create a new ODBC connection to the ODBC data source you created for Data Gateway.

  5. Then choose the data source from the list (e.g. GatewayDSN) and click the Test Connection button. If the connection test succeeds, close the window, and then click OK to finish the configuration.

  6. Read the data the same way we discussed at the beginning of this article.

  7. That's it!

Now you can connect to Amazon S3 CSV File data in SSIS via the Data Gateway.

If you are asked for authentication details, use the Database authentication or SQL Authentication option and enter the credentials you used when configuring Data Gateway, e.g. john and your password.

Conclusion

In this article we showed you how to connect to Amazon S3 CSV File in SSIS and integrate data without any coding, saving you time and effort.

We encourage you to download Amazon S3 CSV File Connector for SSIS and see for yourself, or with your team, how easy it is to use.

If you have any questions, feel free to contact the ZappySys support team.

