Introduction
In our previous posts we looked at how to use SSIS Amazon Storage Task (Amazon S3 Task) to automate some Amazon S3 related tasks in SSIS. In this blog post will continue discovering more use cases. Let’s learn how to delete Amazon S3 File, delete Amazon S3 Folder and delete Amazon S3 Bucket.
Deleting S3 Files are straight forward using SSIS Amazon Storage Task (Amazon S3 Task but deleting Bucket or Folders require some additional checks and steps, so let’s look at that in depth.
Requirements
Before we look into Step-By-Step section to call Amazon S3 related operations in SSIS let’s make sure you met the following requirements.
- SSIS designer installed. Sometimes it is referred as BIDS or SSDT (download it from Microsoft site).
- Basic knowledge of SSIS package development using Microsoft SQL Server Integration Services.
- ZappySys SSIS PowerPack installed. Click on the link to download a FREE trial.
- You have obtained Amazon S3 account access key / secret key.
Configure SSIS Amazon S3 Connection
Before we perform any S3 related action, you must configure Amazon Storage Connection. This connection manager support connectivity to not only S3 Storage but also other AWS Storage such as DynamoDB, Redshift
There are two ways you can create Amazon Storage Connection Manager in SSIS designer. You can right click in Connection Manager Panel and create new ZS-AMAZON-STORAGE connection type or use Task UI and click New connection.
For now, we will use easier method which is Create S3 Connection from S3 Storage Task UI.
- Open SSIS Package
- Drag ZS Amazon S3 Storage Task from the toolbox
- Double click it to edit. You can now click on New Connection like below.
- On the connection Manager UI. Select S3 as service type. Enter your Access Key, Secret Key and Select Region. We also support IAM Role for EC2 instances (For this package must run inside EC2 VM). Check this article.
- Click Test connection to confirm its working. Click OK to Save Connection Manager UI.
- Back to Task UI click OK to close save (We will configure task in later sections.
Using SSIS Amazon S3 Task for common actions
Before we deep dive in delete scenarios let’s first look at what is SSIS Amazon Storage Task (Amazon S3 Task). This task is developed to achieve many S3 related operations. See below screenshot to get idea about many actions you can perform using just one task. In this post we will focus only on delete actions (for file, folder and bucket). In other posts we will cover different operations in depth.
Delete Amazon S3 Files
Once you have configured SSIS S3 Connection Manager, we can perform different S3 related operations using many different Amazon S3 related tasks / components. For now let’s just look at how to delete files from S3 bucket.
Delete Multiple S3 files – Using Simple Pattern Search (wildcard)
Here is how to delete amazon s3 file / files using wildcard pattern search. Using * or ? allowed. (Example SalesOrder_*.??? )
Delete Multiple S3 files – Using Advanced Search – Regex / SQL Expression
Sometimes it’s necessary to filter file based on advanced attributes such as file Age (How old), File Size, Multiple File Patterns (e.g. Delete *.zip, *.gz, *.rar in one action)
You can use Advanced Tab to filter further. First supply primary filter on Source Tab (e.g. mybucket/*.* ) and then supply your advanced filter (e.g. you can exclude *.exe and *.msi file from your delete list as below screenshot, also only delete files between 1000 and 50000 bytes in file size).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
////////////////////////////// Examples: ////////////////////////////// AgeInDays < 5 //Get files created in last 5 days LastEditInDays > 3 //Get files modified before 3 days AgeInDays > 10 AND LastEditInDays < 3 //Get files created before 10 days and modified in last 3 days Size < 1024 //Get files where File size is less than 1024 bytes CreationDate BETWEEN #2015-01-01# AND #2015-01-01# Name IN ('File1.txt', 'Download.msi') FullPath LIKE 'mybucket\somefolder\month_*') Extension IN ('.txt','.csv') LastModifiedDate > #2013-01-01T16:00:00# IsReadOnly = True ///////////////////////////////////////// //Supported Columns for expression: ///////////////////////////////////////// Name, FullPath, Extension, FolderPath, AgeInDays, LastEditInDays, Size, CreationDate, CreationDateUtc, LastModifiedDate, LastModifiedDateUtc, IsReadOnly |
Delete Amazon S3 files older than X days
If you like to delete files older than certain days then use below technique.
- First Specify primary search criteria on Source tab
- Go to Advanced Filter Tab and specify Following Pattern in Where Textbox (For example delete Files older than 10 days )
1AgeInDays > 10
Delete Amazon S3 Bucket
Now let’s look at how to delete Amazon S3 Bucket including all its content. There is an option to delete Bucket but it wont delete files and empty folders. So you have to first clear all its content before calling Delete bucket action.
Here is high level steps to delete bucket in SSIS using ZappySys S3 Task.
- First call Delete S3 Files Action using Source Path = mybucket/*. Delete file action won’t delete any real folders (i.e. name ends with slash).
- Once files are deleted from bucket you can get list of empty folders
- Select Action = Get Amazon folder list as ADO.net Table
- In the path enter Path = mybucket/*
- In the Result Variable Create or Select variable (e.g. User::FolderList)
- Use ForEachLoop Task to Delete empty folders (Use Delete File Action because there is no folder concept in S3. File name ends with slash is treated as folder)
- In the ForEachLoop Task select enumerator as Foreach ADO enumerator
- In the variable select a variable which holds list of folders (obtained in previous step)
- On the Variable Mappings Tab. Select Some variable (e.g. User::FolderPath) to map to Index =0
- Drag new S3 Task inside ForEachLoop and Select Delete File Action
- In the File Path enter variable as {{User::FolderPath}} (or whatever name you given which holds iteration FolderPath in previous step)
- Call Delete Bucket Action once bucket is totally empty.
Delete Amazon S3 Folders
There is no real folder concept in Amazon S3 file system. S3 File System is just key / value system. Key is basically file name (e.g. somefolder/subfolder/myfile.txt). You can use slash in name to indicate hierarchy which can be used by some client apps like ZappySys S3 File Picker to Mimic Treeview like structure so user feel like real file system but there are no rules if you have to use slash or not.
Most client tools for S3 treats any file name ends with slash as real folder. See below step to delete amazon s3 folder (real folder which was created explicitly – mentioned in next section).
- Open S3 Task and select action = Delete Amazon File(s)
- In the path enter your Folder Path (e.g. mybucketname/output/sub2/ )
Create Amazon S3 Folder
Some time you may want to create real S3 folder which is not treated as file. In that case use this technique to create s3 folder.
Conclusion
In this post we saw how easy it is to use SSIS Amazon Storage Task (Amazon S3 Task) to delete Amazon S3 file, folder and bucket. You can also perform other common amazon storage related actions (e.g. Upload / Download / Create / List ). Download SSIS PowerPack to try this component and 70+ other tasks / components which are not covered in this article.