SSIS Amazon DynamoDB Source Adapter (Bulk Extract,Read,Copy NoSQL Data)

Help > Source Adapters > Amazon DynamoDB Source >

SSIS-Amazon-DynamoDB-Source-Adpater
Amazon DynamoDB Source can be used to extract large amount of data from Amazon DynamoDB. This component supports SQL like query language to query DynamoDB data without worrying about complex API. Amazon DynamoDB is a cloud based NoSQL data storage service from Amazon AWS

Video Tutorial

Coming soon...

Step-By-Step

In this section you will learn how to use Amazon DynamoDB Source Adapter to extract data(In this case Table and Query Mode).

In order to connect to Amazon DynamoDB from SSIS you will need Access Key and Secret Key. Ask your SysAdmin or responsible person to provide that to you. Your keys will look something like this: (this is just example key which may differ in your case)
```
AccessKey: AKIAIOSFODNN7EXAMPLE
SecretKey: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```
Click here to read more how to get your AccessKey and Secret Key.
Once you have Account Key (its like UserID) and Secret Key (its like password) you may proceed to create new SSIS package. In BIDS/SSDT create new SSIS package.
From the SSIS toolbox drag and drop Data Flow Task on the controlflow designer surface.
Double click on the DataFlow task to see DataFlow designer surface.
From the SSIS toolbox drag and drop Amazon DynamoDB Source on the dataflow designer surface.
Double click Amazon DynamoDB Source to configure it.
Click on [New] button for Select Connection Manager. Select ServiceType from Top Dropdown, Enter AccessKey and SecrentKey and Click Test connection to verify credentials.
Click OK to save connection information.
Select table from the Tables dropdown or change AccessMode to Query and type SQL Statement and click preview. You can use SQL Style query language.

---- OR ----
Click OK to save settings.
From the SSIS toolbox drag and drop Trash Destination on the data flow designer surface.
Now single click on the Amazon DynamoDB Source, once you see blue arrow from source ... connect it to Trash Destination.
Double click on ZS Trash Destination to Configure it.
Click on OK button to save Trash Destination configure setting UI.
Right click on the path and Add Data Viewer.
Execute the package and verify source data in data viewer.

Component Properties

Property Name

Description

LoggingMode

LoggingMode determines how much information is logged during Package Execution. Set Logging mode to Debugging for maximum log.

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option	Description
Normal [0]	Normal
Medium [1]	Medium
Detailed [2]	Detailed
Debugging [3]	Debugging

PrefixTimestamp

When you enable this property it will prefix timestamp before Log messages.

TableName

Table name from where you want to read data

BatchSize

Indicates how many records are process in a single request. Setting this value to 0 will extract all records in single request.

AccessMode

Indicates Access Mode of data (e.g. Table or Sql Query)

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option	Description
Table [0]	Table
Query [1]	Query

SqlCommand

Specifies SQL query to access data. This setting is ignored when access mode is set to Table.

OutputMode

Specifies how you want to output columns. If you select RAW JSON Document mode then only single document will output for each record in DT_NTEXT datatype

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option	Description
Columns [0]	Output Columns - Default Mode
ColumnsWithCleanJson [2]	Output Columns (RegX Clean Mode - SLOW)
RawDocuments [10]	Output Raw JSON documents
RawDocumentsShellMode [11]	Output Raw JSON documents - Keep Native Syntax

IndentOutput

Indent JSON output so its easy to read.

DateFormatString

Specifies how custom date formatted strings are parsed when reading JSON.

DateParseHandling

Specifies how date formatted strings, e.g. Date(1198908717056) and 2012-03-21T05:40Z, are parsed when reading JSON.

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option	Description
None [0]	Keep date as string
DateTime [1]	Convert to DateTime (Timezone lost)
DateTimeOffset [2]	Convert to DateTimeOffset (Preserve Time zone)

EnableJsonPath

This option allows you to extract array content and sub documents of array.

Filter

Enter expression here to filter data.(Example: $.Users[*].UserName ) This will fetch User names from users records

IncludeParentColumns

Use this option to include parent properties (Non array) in the output along with Filtered Rows

IncludeParentColumnsWhenChildMissing

By default child and parent information is not included in the output if children not found for specified expression. For example if you want to extract all orders from all customers nodes then you can type $.Customers[*].Orders[*]. This will fetch all orders from all customers. By default customers records without orders wont be included in the output. If you want to include those customers where orders not found then check this option (Output null information for order attributes). This behavior is similar to LEFT OUTER JOIN in SQL (Left side is parent, right side is child). This option is ***resource intensive*** so only check if you really care about this behavior.

IncludeParentColumnsWithArrayType

Set this option to true if you want to output parent columns which are array. By default any parent column which is an array is not included in output. See also FilterForParentColumnsWithArrayType property if you set this property

FilterForParentColumnsWithArrayType

Filter expression to extract value form parent

ParentColumnPrefix

Prefix for parent column name. This option is only valid if you have set IncludeParentColumns=True

ThrowErrorIfPropertyMissing

Throw error if property name specified in filter expression is missing. By default it will ignore any missing property errors.

MaxLevelsToScan

This property how many nested levels should be scanned to fetch various properties. 0=Scan all child levels.

ExcludedProperties

List comma separated property names from JSON document which you want to exclude from output. Specify parent property name to exclude all child nodes.

LevelSeparator

Property level separator used in generated property name (separator for outer properties - Above selected filter node). Use this if default separator is producing duplicate property name which is conflicting with existing name.

EnableArrayFlattening

Enables deep array flattening for selected filtered hierarchy. When you turn on this property it will flatten each property of each array item and expose as column (e.g. If you have Filter set as $.customers[*] and for each customer you have an array of Addresses then you may see output columns like Addresses.1.City, Addresses.1.State, Addresses.2.City, Addresses.2.State .... Addresses.N.City, Addresses.N.State). You can control how many array items you want to flatten by setting MaxArrayItemsToFlatten property.

MaxArrayItemsToFlatten

Maximum number of array items to flatten. inner array flattening. Adjust this property to control how many columns being generated. This option is ignored if you set EnableArrayFlattening=false

MetaDataScanMode

Metadata scan mode controls how data type and length is determined. By default few records scanned to determine datatype/length. Changing ScanMode affects length/datatype accuracy.

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option	Description
Auto [0]	Auto
Strict [1]	Strict - Exact length
Guess2x [2]	Guess2x - 2 times bigger
Guess3x [3]	Guess3x - 3 times bigger
Guess4x [4]	Guess4x - 4 times bigger
TreatAsUnicodeString [5]	Set all columns as string
Guess10x [6]	Guess10x - 10 times bigger
TreatStringAsMaxLength [7]	Set string columns with MAX Length - i.e. DT_WSTR(4000)
TreatStringAsBlob [8]	Set string columns as BLOB - i.e. DT_NTEXT

MetaDataCustomLength

Length for all string column. This option is only valid for MetaDataScanMode=Custom

MetaDataTreatStringAsAscii

When this option is true, it detects all string values as DT_STR (Ascii) rather than DT_WSTR (Unicode)

Output Column Properties

Error handling

This component supports error output.
Any bad value will cause Error or Truncation (e.g. If your sample value is 5 character long but output column length is set to 3 chars). If you have error output attached with RedirectRow option on Error, Truncation then you can capture bad rows along with exact reason of error.
Use error handling tab to set option for error handling. By default component fails on any data error or truncation issue

Remarks

Amazon DynamoDB API doesn't support SQL Queries like traditional RDBMS so we have implemented light weight parser which takes your SQL Queries and turn into Native DynamoDB query. Here is simple example

SELECT TOP 2 * FROM Customer WHERE (CustomerID='ALFKI' OR CustomerID='BOLID')

SELECT Age, CustomerID FROM Customer WHERE CustomerID IN ('ALFKI','BOLID','C3')

DynamoDB SQL Query Language

If you work with traditional RDBMS and you recently come to NoSQL world you will quickly realize that most of NoSQL database including DynamoDB is missing familiar SQL query language. But no worry we got you covered. We have implemented custom query parser which will do hard work for you so you don't have to spend time learning new query language for DynamoDB. Our query engine will convert SQL query to Native DynamoDB query language.

SQL Grammar for DynamoDB:

<Select Statement> ::= SELECT [{TOP Clause}] {Column List} {From Clause} [{Where Clause}] [{With Clause}]

<TOP Clause>   ::= TOP IntegerLiteral
<Column List>   ::= * | IDENTIFIER [ [,IDENTIFIER] [,IDENTIFIER] ... ]
<From Clause>   ::= FROM IDENTIFIER
<Where Clause>  ::= WHERE <Expression List>
<With Clause>   ::= WITH ( [SCAN|QUERY] [,INDEX=your_index_name] [,LIMIT=some_integer] [,SCAN_INDEX_FORWARD=true|false] [,CONSISTENT_READ] )

! ============================================================================= 
! Expressions
! =============================================================================
! Mixing AND, OR not allowed. Just one type of Logical Operator can be used across multiple conditions in your filter
<Expression List>  ::= 
                | <Expression> [[AND <Expression>] [AND <Expression>] ...]
                | <Expression> [[OR <Expression>] [OR <Expression>] ...]

<Expression> ::= IDENTIFIER = <Value>
<Expression> ::= IDENTIFIER != <Value>
<Expression> ::= IDENTIFIER <> <Value>
<Expression> ::= IDENTIFIER > <Value>
<Expression> ::= IDENTIFIER >= <Value>
<Expression> ::= IDENTIFIER  < <Value>
<Expression> ::= IDENTIFIER <= <Value>
<Expression> ::= IDENTIFIER IN (<Value> [[, <Value>] [, <Value>] ...])
<Expression> ::= IDENTIFIER BETWEEN <Value> AND  <Value> 
<Expression> ::= IDENTIFIER LIKE <Value>
<Expression> ::= IDENTIFIER NOT LIKE <Value>
<Expression> ::= IDENTIFIER CONTAINS <Value>
<Expression> ::= IDENTIFIER IS NULL
<Expression> ::= IDENTIFIER IS NOT NULL

<Value>      ::= IntegerLiteral | StringLiteral

Supported Operators:

AND
OR
LIKE
NOT LIKE
IN
BETWEEN
= (Equal)
!=, <> (Not Equal)
< (Less than)
> (Greater than)
<= (Less than or equal to)
>= (Greater than or equal to)

Supported Options for WITH clause:

Option	Description
SCAN	A `Scan` operation reads every item in a table or a secondary index. By default, a `Scan` operation returns all of the data attributes for every item in the table or index. You can specify column list so that `Scan` only returns some of the attributes, rather than all of them. `Scan` always returns a result set. If no matching items are found, the result set will be empty. A single `Scan` request can retrieve a maximum of 1 MB of data; DynamoDB can optionally apply a filter expression to this data, narrowing the results before they are returned to the user.
QUERY (This is default)	A `Query` operation finds items in a table or a secondary index using only primary key attribute values. You must provide a hash key attribute name and a distinct value to search for. You can optionally provide a range key attribute name and value, and use a comparison operator to refine the search results. By default, a `Query` operation returns all of the data attributes for items with the specified primary key(s); however, you can use the select column list rather than * so that the `Query` operation only returns some of the attributes, rather than all of them. These are the possible ways to Query DynamoDB table: By Hash Column By Hash Column + Range Column By Hash Column + Local Index Column By Global Index Column By Global Index Column + Range Index Read here to learn more about DynamoDB Indexing consideration In a `Query` operation, you use one or more filter conditions to determine the items to be read from the table or index. You must specify the hash key attribute name and value as an equality condition. You can optionally provide a second condition for the range key attribute (if present). The range key condition must use one of the following comparison operators: `a = b` - true if the attribute `a` is equal to the value `b` `a < b` - true if `a` is less than `b` `a <= b` - true if `a` is less than or equal to `b` `a > b` - true if `a` is greater than `b` `a >= b` - true if `a` is greater than or equal to `b` `a BETWEEN b AND c` - true if `a` is greater than or equal to `b`, and less than or equal to `c`. The following function is also supported: `begins_with (a, substr)`- true if the value of attribute `a` begins with a particular substring.
LIMIT	The maximum number of items to evaluate (not necessarily the number of matching items). If DynamoDB processes the number of items up to the limit while processing the results, it stops the operation and returns the matching values up to that point, and a key in LastEvaluatedKey to apply in a subsequent operation, so that you can pick up where you left off. Also, if the processed data set size exceeds 1 MB before DynamoDB reaches this limit, it stops the operation and returns the matching values up to the limit, and a key in LastEvaluatedKey to apply in a subsequent operation to continue the operation. For more information, see Query and Scan in the Amazon DynamoDB Developer Guide.
INDEX	The name of an index to query. This index can be any local secondary index or global secondary index on the table. Note that if you use the IndexName parameter, you must also provide TableName.
SCAN_INDEX_FORWARD [=true\|false]	Applicable when QUERY Mode is used. Determines the read consistency model: If set to true, then the operation uses strongly consistent reads; otherwise, the operation uses eventually consistent reads.Strongly consistent reads are not supported on global secondary indexes. If you query a global secondary index with ConsistentRead set to true, you will receive an error message.
CONSISTENT_READ	Applicable when QUERY Mode is used. Determines the read consistency model: If set to true, then the operation uses strongly consistent reads; otherwise, the operation uses eventually consistent reads.Strongly consistent reads are not supported on global secondary indexes. If you query a global secondary index with ConsistentRead set to true, you will receive an error message.

Understanding Scan vs Query operation:

By default every request will be treated as QUERY operation to reduce the cost associated with READ operation. Amazon provided some guidelines here for Difference between DynamoDB QUERY vs Scan and Cost calculation

Some points to remember for DynamoDB QUERY vs SCAN operation

QUERY operation is faster operation and incurs lower cost compared to SCAN
QUERY operation must include HASH KeyColumn in one or more conditions in your filter, you can optionally also include RANGE KeyColumns or Index Columns
SCAN is slower operation and incurs higher cost compared to QUERY
SCAN operation scans your table row by row to match your condition from filter, so be careful if you have very large table. Minimum data scan in each request is 1MB (unless you have table less than 1MB data)
SCAN operation can include any column in Filter condition unlike QUERY operation where HASH KeyColumn must present
Use query HINTS in WITH clause to indicate SCAN or QUERY operation. If you don't specify it then QUERY operation will be assumed in which you must include HASH KeyColumn in Condition along with Range or Index KeyColumns

Using Date/Time in your DynamoDB Query:

There is no Datetime datatype in DynamoDB but if you are storing your date values in ISO 8601 DateTime Format as string then your query will understand automatically if you specify >, >=, <, <= or BETWEEN Filter Condition

select * from Orders where OrderDate > '2015-12-31T23:59:59.000Z' WITH(SCAN)

Using variable in Date/Time query
If you want to make your query dynamic then you can use SSIS variable placeholder (use quote around it). If your variable DataType is datetime then you can format it using following way so it has ISO date format. If your variable DataType is String then make sure its correctly formatted.

select * from Orders where OrderDate > '{{User::vMyDate,yyyy-MM-ddTHH:mm:ssZ}}' WITH(SCAN)

Click here to see various ISO date formats

Specifying Options in WITH Clause of DynamoDB Query:

Specify SCAN mode using SCAN option

select * from mytable where SomeColumn > 55 WITH(SCAN)

Specify SCAN mode with LIMIT of 1000 rows

select * from mytable where SomeColumn > 55 WITH(SCAN, LIMIT=1000)

Specify QUERY mode using QUERY option

select * from mytable where MyHashKeyColumn = 'XYZ' And MyRangeKey > '1223' WITH(QUERY)

Specify QUERY mode using QUERY option, and use of INDEX

select * from mytable where MyHashKeyColumn = 'XYZ' And MyRangeKey > '1223' and Phone LIKE '111-%' WITH(QUERY, INDEX=[index-phone-number])

Names with space or other special characters:

Use brackets around identifier if it has space or dash or dot.

select * from mytable where MyHashKeyColumn = 'XYZ' And MyRangeKey > '1223' and Phone LIKE '111-%' WITH(QUERY, INDEX=[index-phone-number])

Amazon DynamoDB Query Examples:

-- Selecting all columns

select * from CustomerTest

-- Selecting multiple columns

select CustomerID,CompanyName from CustomerTest

-- Selecting multiple columns, where some attribute name has space in it

select CustomerID,[Contact Title] from CustomerTest

-- Find records where City attribute is missing (Missing is different than actual NULL value {City: null, Id:1} -vs- {Id:1} )

select * from CustomerTest Where City IS NULL

-- Find records where City attribute is NULL (i.e. attribute is found but value is NULL e.g. {City: null, Id:1} )

select * from CustomerTest Where City = NULL

-- Using Where clause with simple condition, select all columns

select * from CustomerTest Where CustomerID='ALC3R'

-- Limiting records returned from query using TOP clause (Similar as LIMIT found in some RDBMS query engine)

select top 7 * from CustomerTest

-- Using OR

select * from CustomerTest Where CustomerID = 'ALFKI' OR CustomerID = 'BOLID'

-- Using AND

select * from CustomerTest Where CustomerID = 'ALFKI' AND Age > 3

-- Using comparison operators

select * from CustomerTest Where CustomerID <> 'ALFKI'

select * from CustomerTest Where CustomerID != 'ALFKI'

select * from CustomerTest Where Age > 5

select * from CustomerTest Where Age >= 5

select * from CustomerTest Where Age < 5

select * from CustomerTest Where Age = 5 AND CustomerID = 'C5'

Setting UI

SSIS Amazon DynamoDB Source - Stiting UI

Amazon AWS DynamoDB Source FAQs

Amazon AWS - Getting Your Access Key ID and Secret Access Key?

Articles / Tutorials

Click here to see all articles for [SSIS Amazon DynamoDB Src] category

How to Read / Write Amazon DynamoDB in SSIS

Introduction In this article we will look at how to Read / Write Amazon DynamoDB in SSIS. ZappySys developed many AWS related components but in this article we will look at 3 Tasks/Components for DynamoDB Integration Scenarios (Read, Write, Update, Bulk Insert, Create / Drop Table etc.).Â We will discuss how to useÂ SSIS DynamoDB Source Connector […]

Content

Video Tutorial

Step-By-Step

---- OR ----

Component Properties

Output Column Properties

Error handling

Remarks

DynamoDB SQL Query Language

SQL Grammar for DynamoDB:

Supported Operators:

Supported Options for WITH clause:

Understanding Scan vs Query operation:

Using Date/Time in your DynamoDB Query:

Specifying Options in WITH Clause of DynamoDB Query:

Names with space or other special characters:

Amazon DynamoDB Query Examples:

Setting UI

Amazon AWS DynamoDB Source FAQs

See Also

Articles / Tutorials

How to Read / Write Amazon DynamoDB in SSIS