SSIS Amazon Storage Task (AWS S3)
PreviousNext

Amazon S3 Storage Task can be used to perform various operations with Amazon S3 Storage objects (buckets and files) (e.g. Download, Upload, Copy, Move, Rename, Delete etc). You can also get single property of any S3 file or get list of files as ADO.net Table and which can be easily loop through using ForEachLoop Task. Most of operations in this task support recursive search using wild card pattern (e.g. Myfile*.txt). Amazon S3 is one of the most popular Cloud Storage Services available right now.

Download SSIS PowerPack

Content

Video Tutorial

Step-By-Step

In this section you will learn how to download file from Amazon S3 using Amazon S3 Storage Task. in few clicks.
  1. In order to connect to Amazon S3 from SSIS you will need Access Key and Secret Key. Ask your SysAdmin or responsible person to provide that to you. Your keys will look something like this: (this is just example key which may differ in your case)
    AccessKey: AKIAIOSFODNN7EXAMPLE
    SecretKey: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
                
    Click here to read more how to get your AccessKey and Secret Key

  2. Once you have Access Key and Secret Key you may proceed to create new SSIS package. In BIDS/SSDT create new SSIS package
  3. From toolbox drag BW Amazon Storage Task on the designer surface.
    configure-ssis-amazon-s3-storage-task
  4. Double click on the task to configure it.
  5. From the Storage Action dropdown select DownloadFileFromAmazon (This is default option).
  6. Under Source Connection click [New] to create S3 Connection Manager.
    configure-ssis-amazon-s3-storage-task
  7. On Connection Manager UI enter your Amazon S3 account Access Key and Secret Key and leave all other options as default. Click Test Connection and if test successful then hit OK to close the connection manager dialogbox.
    configure-ssis-amazon-s3-storage-task
  8. Now on Task UI browse source file you want to download. To browse Amazon S3 file click on browse button this will launch S3 File Picker dialogbox. After file is selected click OK.
    Note: If you want to download multiple files then use wild card pattern as below (when you use wild card pattern in source path then system will treat target path as folder regardless you end with slash )
    Examples:
    mybucket/myfile123.txt (single file)
    mybucket/myfile*.txt (all files starting with myfile)
    mybucket/mysubfolder1/myfile*.txt (all files starting with myfile and located under folder mysubfolder1)

    configure-ssis-amazon-s3-storage-task
  9. Once source file is selected browse target path where you want to place downloaded file. You may type folder path or exact file name you want to assign to downloaded file.
    configure-ssis-amazon-s3-storage-task
  10. Once Source and Target path configured you may run the package to test. That's it you downloaded S3 file to local machine without learning any Complex APIs.
    configure-ssis-amazon-s3-storage-task configure-ssis-amazon-s3-storage-task

Properties

Property Name Description
StorageAction This property determines what action needs to be performed on Amazon S3 object(s).

Available Options

  • DownloadFileFromAmazon : Download Amazon S3 file(s) to local machine (Wild card pattern is supported). File Transfer is done using multiple threads which can be configured using MaxParallelThreads property of S3 Connection(From Connection Manager UI)

  • UploadFileToAmazon : Upload local file(s) to Amazon S3 storage (Wild card pattern is supported). File Transfer is done using multiple threads which can be configured using MaxParallelThreads property of S3 Connection(From Connection Manager UI)

  • DeleteAmazonFile : Delete file(s) from Amazon S3 bucket (Wild card pattern is supported)

  • DeleteAmazonBucket : Delete single Amazon S3 bucket

  • CopyAmazonFileToAmazon : Copy Amazon S3 file(s) to different S3 bucket or different subfolder (Wild card pattern is supported).
    Note: In current version source and target S3 Account must be same for Copy/Move operation. However in the future we may add support to copy/move between different S3 Accounts.

  • MoveAmazonFileToAmazon : Move Amazon S3 file(s) to different S3 bucket or different subfolder (Wild card pattern is supported).
    Note: In current version source and target S3 Account must be same for Copy/Move operation. However in the future we may add support to copy/move between different S3 Accounts.

  • RenameAmazonFile : Rename single Amazon S3 file

  • SetFilePermission : Set permission for Amazon S3 file(s) (Wild card pattern is supported). Permission option is located under different tab on Task UI (Look under Storage and Security Tab).

  • CreateNewAmazonFile : Create new Amazon S3 file and write content from direct value or user defined SSIS variable at runtime.

  • CreateAmazonBucket : Create new Amazon S3 bucket. New bucket will be created in the selected region (specified on the S3 connection Manager UI). Throw error if bucket already exists.

  • CreateAmazonBucketIfNotExists : Create new Amazon S3 bucket only if its not found otherwise Task completes without error. New bucket will be created in the selected region (specified on the S3 connection Manager UI).

  • GetAmazonFileSize : Returns file(s) size as Int64 and stores in ResultSet Variable. Use wild card pattern to get total file size of multiple files using recursive option.

  • GetAmazonFileExistsFlag : Returns True if specified file found else return false.

  • GetAmazonFileCount : Returns count of total file(s) matching with wild card pattern.

  • GetAmazonFileLastModifiedDate : Returns date when file was last modified (Date is in Utc so to convert to local time add timezone difference).

  • GetAmazonFileContentType : Returns content type of specified file

  • GetAmazonFileMetadata : returns metadata of specified file in the following format
    <MetaData><User>SamW</User><Host>BW001-DEV</Host></MetaData>

  • GetAmazonFileDataAsString : Returns file content as string and stores to result variable

  • GetAmazonFileDataAsBinary : Returns file content as binary array and stores to result variable (Result variable must be set to Object datatype)

  • GetAmazonBucketExistsFlag : Returns True if specified bucket exists else returns False and result is stored into result variable

  • GetAmazonFileListAsADONETDataTable : Returns DataTable with the following information about one or more file(s). You can loop through this record set using Native ForEachLoop Task (Make sure you select [ADO.net dataset only] option as below)

    ssis-foreachloop-azure-storage-container

    ssis-foreachloop-azure-storage-container

    ssis-foreachloop-amazon-s3-files

    ssis-foreachloop-amazon-s3-files

  • GetAmazonBucketListAsADONETDataTable : Returns DataTable with following information about one or more bucket(s). You can loop through this record set using Native ForEachLoop Task (Make sure you select [ADO.net dataset only] option as below)

    ssis-foreachloop-amazon-s3-bucket

    ssis-foreachloop-amazon-s3-bucket

Permission This property determines what permission applied on new objects. Permissions are only applied if you are performing Upload, Copy, Move, Rename or Create File actions.

Available Options

  • Default : No permission applied. Under default option all S3 files are created/updated as NoACL setting(see next option).

  • NoACL : Owner gets FULL_CONTROL. No one else has access rights (default).

  • Private : Owner gets FULL_CONTROL. No one else has access rights (default).

  • PublicRead : Owner gets FULL_CONTROL and the anonymous principal is granted READ access. If this policy is used on an object, it can be read from a browser with no authentication.

  • PublicReadWrite : Owner gets FULL_CONTROL, the anonymous principal is granted READ and WRITE access. This can be a useful policy to apply to a bucket, but is generally not recommended.

  • AuthenticatedRead : Owner gets FULL_CONTROL, and any principal authenticated as a registered Amazon S3 user is granted READ access.

  • BucketOwnerRead : Object Owner gets FULL_CONTROL, Bucket Owner gets READ This ACL applies only to objects and is equivalent to private when used with PUT Bucket. You use this ACL to let someone other than the bucket owner write content (get full control) in the bucket but still grant the bucket owner read access to the objects.

  • BucketOwnerFullControl : Object Owner gets FULL_CONTROL, Bucket Owner gets FULL_CONTROL. This ACL applies only to objects and is equivalent to private when used with PUT Bucket. You use this ACL to let someone other than the bucket owner write content (get full control) in the bucket but still grant the bucket owner full rights over the objects.

  • LogDeliveryWrite : The LogDelivery group gets WRITE and READ_ACP permissions on the bucket.

AutoSetContentType When this option is True (default is false) it will automatically determines content type of the file during Upload, Create, Copy and Move operation.Default content type of any new file is application/octet-stream
UserDefinedContentTypes If content type is not correctly set for certain file extension then you may provide your own list of content types. It has to be in the following format. Each content type in new line
*.csv=text/plain
*.dtsx=text/plain
*.dtsconfig=text/plain
.....more entries
.....more entries
DefaultContentType By default every new file on Amazon S3 is set to content type application/octet-stream. If this is not what you want as default content type then you may provide UserDefinedContentTypes or just set this property to your own content type. System set content type in the following order

1. Check if UserDefinedContentTypes is set (if extension of file being processed match with user defined content type list then set file content Type)
2. Check if AutoSetContentType is set (if True then Lookup Predefined list and set file content Type)
3. Check if Check DefaultContentType is set (if DefaultContentType is set then use it for file being processed)
ThrowErrorIfSourceNotFound If this option is set to True (default is true) then when source path not found on Amazon S3 exception is thrown. If this option is False then no error is thrown when file not found.
DeleteSourceWhenDone If this option is set to True (default is false) then after successful upload/download operation source file is deleted. This option is applicable only for Upload and Download actions.
MetadataKeyValue For any new file you may specify metadata in Key/Value pair from Metadata Tab of task UI. If you want to enter from expression then use the following syntax for multiple Key/value pairs of metadata. This option is ignored if EnableMetadataUpdate=False (see next option).
<MetaData><User>SamW</User><Host>BW001-DEV</Host></MetaData>
EnableMetadataUpdate If this option is True (default is False) then MetadataKeyValue property is processed (see previous option)
SourceStorageConnection Connection Name for Source S3 Connection (Not applicable for UploadFileToAmazon and CreateNewAmazonFile operations)
SourcePathAccessMode Specifies how source path will be provided

Available Options

  • Direct (default) : Direct value is provided.

  • Variable : Value is read from specified variable at runtime.

  • Connection : Value is read from path specified by file connection. This option is only applicable when Source is local File system.

SourcePathValue Direct static value (only applicable when SourcePathAccessMode=Direct)
SourcePathVariable Variable Name (only applicable when SourcePathAccessMode=Variable)
SourcePathConnection Connection Name or ID (only applicable when SourcePathAccessMode=Connection)
Recursive Specifies whether source path pattern should scan sub folder (default false)
ResultVariable Variable name which will hold result from S3 Action (default false)
TargetStorageConnection Connection Name for target S3 Connection (Only applicable for UploadFileToAmazon, CopyAmazonFileToAmazon, MoveAmazonFileToAmazon and CreateNewAmazonFile operations)
TargetPathAccessMode Specifies how target path will be provided

Available Options

  • Direct (default) : Direct value is provided.

  • Variable : Value is read from specified variable at runtime.

  • Connection : Value is read from path specified by file connection. This option is only applicable when target is local File system.

TargetPathValue Direct static value (only applicable when TargetPathAccessMode=Direct)
TargetPathConnection Connection Name or ID which points to target file path (only applicable when TargetPathAccessMode=Connection)
TargetPathVariable Variable Name (only applicable when TargetPathAccessMode=Variable)
OverWriteOption Specifies whether overwrite action when target file exists

Available Options

  • Direct (default) : Direct value is provided.

  • Variable : Value is read from specified variable at runtime.

  • Connection : Value is read from path specified by file connection. This option is only applicable when target is local File system.

CreateMissingTargetFolder Create target folder if missing (default True). This option is applicable for DownloadAmazonFileToLocal
ExcludeRegXPattern Regular expression pattern to exclude items from selection (if you apply MyFile*.* filter for source path and then use ExcludeRegXPattern=(.msi|.exe) then it will include all files with matching name but exclude *.msi and *.exe
IncludeRegXPattern Regular expression pattern to include items from selection (if you apply MyFile*.* filter for source path and then use IncludeRegXPattern=(.txt|.csv) then it will include only txt and csv files with matching name pattern
SortBy Property by which you want to sort result

Available Options

Option Description
Size File Size in Bytes
LastModifiedDate File Last Modified DateTime
CreationDate File Creation DateTime
AgeInDays File Age in Days
LastEditInDays Last Edit in Days
Content File Content
Exists File Exists Flag
FileCount File Count
FolderPath File Directory Path
FilePath File Path
FileName File Name
FileExtension File Extension
FileEncoding File Encoding
SortDirection Sort order (e.g. Ascending or Descending)

Available Options

Option Description
Asc Asc
Desc Desc
WhereClause Where clause expression to filter items. (e.g. Size>100 and Extension IN ('.txt','.csv') )
EnableSort Sort items by specified attribute
MaxItems Maximum items to return (e.g. TOP)
ContinueOnError This option prevents task failure on error such as transfer of file(s) failed for some reason, delete operation failed for few files.
LoggingMode Determines logging detail level. Available options are Normal, Medium, Detailed, Debugging

Remarks

Encryption

Amazon S3 Task supports both encryption options offered by AWS (Server Side and Client Side encryption). This section will provide information about each encryption type so you can pick what is best suited for your need.
  • Client Side Encryption: In client side encryption data is encrypted using customer generated encryption key before uploading to AWS server. Client side encryption option can be configured on S3 Connection Manager UI (See Encryption Tab). Unlike server-side encryption in this option your key never sent to AWS server along with your Web Request so this option is more secure than Server-Side Encryption becuase in order to get your data hacker has to get AccessKey/SecrentKey and then encryption key. For client side encryption you have two options.
    • AES 256 bit Encryption (Asymmetric Encryption)
    • RSA 1024 bit Encryption (Symmetric Encryption)
  • Server Side Encryption: In server side encryption data is encrypted after it reaches to AWS server and automatically decrypted when data leaves AWS network. Server side encryption option can be configured on Security and Storage tab of S3 Task. In Server side encryption you can have two options as below.
    • Server Side Encryption - Using AWS Generated Encryption Key (AES 256bit): This is the most easiest option because you don't have to worry about Encryption key. Data is automatically encrypted after it reaches to S3 storage and automatically decrypted when data leaves S3 storage. However this option is not as secure as other options because if attacker gets Amazon AccessKey and SecretKey then he can get access to your data. In this option AWS manages encryption keys.
    • Server Side Encryption - Using Customer Generated Encryption Key (AES 256bit): In this option you have to supply encryption key along with your web request. Encryption key travels to S3 storage along with your data to S3. Once data reaches to S3 storage AWS encrypts it using encryption key you supplied. Once data is encrypted key is discarted from AWS server. In this option you have to manage encryption keys.

Settings UI

ui-amazon-s3-task-settings

Security & Storage UI

ui-amazon-s3-task-storage-security
Load Encrypted files to Redshift using Client Side Encyption

Metadata UI

ui-amazon-s3-task-metadata ui-amazon-s3-task-content-type

FAQ

References

See Also


Copyrights reserved. ZappySys LLC.