Read Amazon S3 Storage Files in SSIS (CSV, JSON, XML)

Introduction

In our previous blog, we saw how to load data into Amazon S3. In this blog, we will see how to read Amazon S3 Storage files in SSIS (CSV, JSON and XML format files). To illustrate, we will use ZappySys SSIS PowerPack, which includes several tasks to import/export data from multiple sources to multiple destinations such as flat files, Azure, AWS, databases, Office files and more. It is a coding-free, drag-and-drop, high-performance suite of custom SSIS components and SSIS tasks. If you would like to perform file operations on Amazon S3 files (e.g., download, upload, create, delete), then check these articles.

In a nutshell, this post focuses on how to read Amazon S3 Storage CSV, JSON and XML files using the respective SSIS source components.

 

Prerequisites

  1. First, you will need to have SSIS installed.
  2. Secondly, make sure you have SSDT (SQL Server Data Tools) installed.
  3. You need your Amazon S3 account access key and secret key.
  4. Finally, do not forget to install ZappySys SSIS PowerPack.

What is Amazon S3 Storage

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. This means customers of all sizes and industries can use it to store and protect any amount of data for a range of use cases, such as websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics. Amazon S3 provides easy-to-use management features so you can organize your data and configure finely-tuned access controls to meet your specific business, organizational, and compliance requirements. Amazon S3 is designed for 99.999999999% (11 9’s) of durability, and stores data for millions of applications for companies all around the world.

Getting Started

ZappySys includes SSIS Amazon S3 Source components for CSV, JSON and XML files, which read files stored in Amazon S3 Storage so that their data can flow through an SSIS Data Flow. SSIS PowerPack also supports file operations on Amazon S3 such as download, upload, delete, rename, list, get property, copy, move, create, set permission and many more. In this article, we focus on reading files from Amazon S3 Storage.

You can connect to your Amazon S3 account by entering your access key and secret key.

Read Amazon S3 Storage Files in SSIS (CSV, JSON, XML)

Let's start with an example. In this example, we will use the SSIS Amazon S3 Source for CSV/JSON/XML File to read files from Amazon S3 Storage and load their data into a SQL Server database.
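Conceptually, each of the three sources turns file content into the same kind of tabular rows before they reach the destination. The minimal Python sketch below illustrates that idea with the standard library only; the sample data and the `id`/`name` columns are hypothetical, and this is not how the ZappySys components are implemented.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample content: the same two records in three formats.
CSV_TEXT = "id,name\n1,Alice\n2,Bob\n"
JSON_TEXT = '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]'
XML_TEXT = ("<rows><row><id>1</id><name>Alice</name></row>"
            "<row><id>2</id><name>Bob</name></row></rows>")


def rows_from_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return [dict(r) for r in csv.DictReader(io.StringIO(text))]


def rows_from_json(text):
    """Parse a JSON array of flat objects; values are stringified to match CSV/XML."""
    return [{k: str(v) for k, v in obj.items()} for obj in json.loads(text)]


def rows_from_xml(text, row_tag="row"):
    """Treat each <row_tag> element as a record and its child elements as columns."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in elem} for elem in root.iter(row_tag)]


if __name__ == "__main__":
    for parse, text in [(rows_from_csv, CSV_TEXT),
                        (rows_from_json, JSON_TEXT),
                        (rows_from_xml, XML_TEXT)]:
        print(parse.__name__, parse(text))
```

All three functions yield the same list of row dictionaries, which is the shape a destination such as SQL Server ultimately consumes.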

  1. First of all, drag and drop a Data Flow Task from the SSIS Toolbox and double-click it to edit.
    Drag and Drop SSIS Data Flow Task from SSIS Toolbox

  2. Drag and drop the relevant Amazon S3 Source for CSV/JSON/XML File from the SSIS Toolbox.
    Add Amazon S3 Source Tasks

  3. Create a connection for your Amazon S3 Storage account.
    Create Amazon S3 Storage Connection

  4. In the relevant CSV, JSON or XML source, select the single file to read from Amazon S3 Storage.
    Select File From Amazon S3 Storage

  5. We can also read multiple files stored in Amazon S3 Storage by using a wildcard pattern in the relevant source, e.g., dbo.tblNames*.csv, dbo.tblNames*.json or dbo.tblNames*.xml.
    Use wildcard pattern .* to read data from multiple files

  6. The Amazon S3 Source for CSV/JSON/XML File can also read zip and gzip compressed files directly, without extracting them first.
    Reading zip and gzip compressed files (stream mode)

  7. Finally, we are ready to load the data from these file(s) into SQL Server.
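Steps 5 and 6 can be illustrated outside SSIS with a short Python sketch. It filters object keys with a shell-style wildcard (`fnmatch.fnmatchcase`, because S3 keys are case-sensitive) and reads a gzip-compressed CSV by decompressing it as a stream instead of extracting it to disk. The in-memory `BUCKET` dictionary is a hypothetical stand-in for an S3 bucket; a real package would list and download objects through an S3 connection, and the components' own wildcard rules may differ from `fnmatch`. Only gzip is shown here; zip archives would need the `zipfile` module instead.

```python
import csv
import fnmatch
import gzip
import io

# Hypothetical stand-in for an S3 bucket: object key -> raw object bytes.
BUCKET = {
    "dbo.tblNames1.csv": b"id,name\n1,Alice\n",
    "dbo.tblNames2.csv.gz": gzip.compress(b"id,name\n2,Bob\n"),
    "dbo.tblOrders.csv": b"id,total\n9,100\n",
}


def matching_keys(keys, pattern):
    """Return the keys matching a case-sensitive shell-style wildcard, sorted."""
    return sorted(k for k in keys if fnmatch.fnmatchcase(k, pattern))


def read_csv_rows(key, data):
    """Yield CSV rows as dicts, decompressing .gz objects on the fly (no temp file)."""
    raw = io.BytesIO(data)
    stream = gzip.GzipFile(fileobj=raw) if key.endswith(".gz") else raw
    text = io.TextIOWrapper(stream, encoding="utf-8", newline="")
    for row in csv.DictReader(text):
        yield dict(row)


if __name__ == "__main__":
    for key in matching_keys(BUCKET, "dbo.tblNames*.csv*"):
        for row in read_csv_rows(key, BUCKET[key]):
            print(key, row)
```

The pattern `dbo.tblNames*.csv*` is a small variation on the article's `dbo.tblNames*.csv` so that the compressed `.csv.gz` object is matched as well, while `dbo.tblOrders.csv` is skipped.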

Load Amazon S3 files data into SQL Server

ZappySys SSIS PowerPack makes it easy to load data from various sources such as REST, SOAP, JSON, XML, CSV or other sources into SQL Server, PostgreSQL, Amazon Redshift or other targets. The Upsert Destination component allows you to automatically insert new records and update existing ones based on key columns. Below are the detailed steps to configure it.
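The insert-or-update behavior described above can be sketched in Python with the standard `sqlite3` module. This only illustrates the upsert idea, using a hypothetical `customers` table keyed on `id`; it is not how Upsert Destination works internally. It also returns inserted vs. updated counts, the kind of numbers you might log in Step 6.

```python
import sqlite3


def upsert(conn, rows, key="id"):
    """Insert rows whose key is new and update rows whose key already exists.

    Returns (inserted, updated). The table name 'customers' and the column
    names are hypothetical and assumed to be trusted identifiers.
    """
    inserted = updated = 0
    for row in rows:
        cols = [c for c in row if c != key]
        exists = conn.execute(
            f"SELECT 1 FROM customers WHERE {key} = ?", (row[key],)
        ).fetchone()
        if exists:
            if cols:  # nothing to update if the row holds only the key
                assignments = ", ".join(f"{c} = ?" for c in cols)
                conn.execute(
                    f"UPDATE customers SET {assignments} WHERE {key} = ?",
                    [row[c] for c in cols] + [row[key]],
                )
            updated += 1
        else:
            names = ", ".join(row)
            marks = ", ".join("?" for _ in row)
            conn.execute(
                f"INSERT INTO customers ({names}) VALUES ({marks})",
                list(row.values()),
            )
            inserted += 1
    conn.commit()
    return inserted, updated


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    print(upsert(conn, [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]))
    print(upsert(conn, [{"id": 2, "name": "Robert"}, {"id": 3, "name": "Carol"}]))
    print(conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall())
```

The second call updates the existing key 2 and inserts the new key 3, which is the same decision the primary key column(s) drive in Step 4. On SQL Server itself, a comparable set-based approach could use a `MERGE` statement.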

Step 1: Add Upsert Destination to Data Flow

  1. Drag and drop the Upsert Destination component from the SSIS Toolbox.
  2. Connect your source component (e.g., JSON / REST / Other Source) to the Upsert Destination.

SSIS - Data Flow - Drag and Drop Upsert Destination Component

Step 2: Configure Target Connection

  1. Double-click the Upsert Destination component to open the configuration window.
  2. Under Connection, select an existing target connection or click NEW to create a new connection.
    • Example: SQL Server, or PostgreSQL, or Amazon Redshift.

Step 3: Select or Create Target Table

  1. In the Target Table dropdown, select the table where you want to load data.
  2. Optionally, click NEW to create a new table based on the source columns.

Configure SSIS Upsert Destination Connection - Loading data (REST / SOAP / JSON / XML /CSV) into SQL Server or other target using SSIS

Step 4: Map Columns

  1. Go to the Mappings tab.
  2. Click Auto Map to map source columns to target columns by name.
  3. Ensure you check the Primary key column(s) that will determine whether a record is inserted or updated.
  4. You can manually adjust the mappings if necessary.

SSIS Upsert Destination - Columns Mappings
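Conceptually, Auto Map pairs source and target columns that share a name. The Python sketch below shows one way to do name-based mapping; the case-insensitive comparison and the returned list of unmapped columns are assumptions for illustration and may not match the component's exact rules.

```python
def auto_map(source_cols, target_cols):
    """Pair source and target columns by name, ignoring case (an assumption).

    Returns (mapping, unmapped) where mapping is {source_column: target_column}
    and unmapped lists source columns with no same-named target column.
    """
    by_lower = {t.lower(): t for t in target_cols}
    mapping = {}
    unmapped = []
    for s in source_cols:
        target = by_lower.get(s.lower())
        if target is None:
            unmapped.append(s)
        else:
            mapping[s] = target
    return mapping, unmapped


if __name__ == "__main__":
    mapping, unmapped = auto_map(["ID", "Name", "Email"], ["id", "name", "created_at"])
    print(mapping)
    print(unmapped)
```

Here `ID` and `Name` are mapped automatically, while `Email` has no matching target column; columns like that are the ones you would adjust manually.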

Step 5: Save Settings

  • Click OK to save the Upsert Destination configuration.

Step 6 (Optional): Add Logging or Analysis

  • You may add extra destination components to log the number of inserted vs. updated records for monitoring or auditing purposes.

Step 7: Execute the Package

  • Run your SSIS package and verify that the data is correctly inserted and updated in the target table.

SSIS Upsert Destination Execution

Conclusion

In this blog, we learned how to read Amazon S3 Storage files in SSIS. We used the Amazon S3 Source for CSV File, Amazon S3 Source for JSON File and Amazon S3 Source for XML File to read file(s) from Amazon S3 Storage and load the data into SQL Server. You can download SSIS PowerPack here to try many other scenarios not covered in this blog, along with 70+ other components.
