SSIS Amazon DynamoDB Source Adapter (Bulk Extract,Read,Copy NoSQL Data)
PreviousNext

Amazon DynamoDB Source can be used to extract large amount of data from Amazon DynamoDB. This component supports SQL like query language to query DynamoDB data without worrying about complex API. Amazon DynamoDB is a cloud based NoSQL data storage service from Amazon AWS

Download SSIS PowerPack

Content

Video Tutorial

Coming soon...

Step-By-Step

In this section you will learn how to use Amazon DynamoDB Source Adapter to extract data(In this case Table and Query Mode).
  1. In order to connect to Amazon DynamoDB from SSIS you will need Access Key and Secret Key. Ask your SysAdmin or responsible person to provide that to you. Your keys will look something like this: (this is just example key which may differ in your case)
    AccessKey: AKIAIOSFODNN7EXAMPLE
    SecretKey: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    
    Click here to read more how to get your AccessKey and Secret Key.

  2. Once you have Account Key (its like UserID) and Secret Key (its like password) you may proceed to create new SSIS package. In BIDS/SSDT create new SSIS package.
  3. From the SSIS toolbox drag and drop Data Flow Task on the controlflow designer surface.
    SSIS Data Flow Task - Drag and Drop
  4. Double click on the DataFlow task to see DataFlow designer surface.
  5. From the SSIS toolbox drag and drop Amazon DynamoDB Source on the dataflow designer surface.
    SSIS Amazon DynamoDB Source - Drag and Drop
  6. Double click Amazon DynamoDB Source to configure it.
  7. Click on [New] button for Select Connection Manager. Select ServiceType from Top Dropdown, Enter AccessKey and SecrentKey and Click Test connection to verify credentials.
    SSIS Amazon DynamoDB Source - Configure Connection
  8. Click OK to save connection information.
  9. Select table from the Tables dropdown or change AccessMode to Query and type SQL Statement and click preview. You can use SQL Style query language.
    SSIS Amazon DynamoDB Source - Configure Table Mode

    ---- OR ----


    SSIS Amazon DynamoDB Source - Configure Query Mode
  10. Click OK to save settings.
  11. From the SSIS toolbox drag and drop Trash Destination on the data flow designer surface.
    SSIS Trash Destination - Drag and Drop
  12. Now single click on the Amazon DynamoDB Source, once you see blue arrow from source ... connect it to Trash Destination.
  13. Double click on ZS Trash Destination to Configure it.
    SSIS Trash Destination - Configure
  14. Click on OK button to save Trash Destination configure setting UI.
  15. Right click on the path and Add Data Viewer.
  16. Execute the package and verify source data in data viewer.
    SSIS Amazon DynamoDB Source - Execute

Component Properties

Property Name Description
LoggingMode LoggingMode determines how much information is logged during Package Execution. Set Logging mode to Debugging for maximum log.

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option Description
Normal [0] Normal
Medium [1] Medium
Detailed [2] Detailed
Debugging [3] Debugging
PrefixTimestamp When you enable this property it will prefix timestamp before Log messages.
TableName Table name from where you want to read data
BatchSize Indicates how many records are process in a single request. Setting this value to 0 will extract all records in single request.
AccessMode Indicates Access Mode of data (e.g. Table or Sql Query)

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option Description
Table [0] Table
Query [1] Query
SqlCommand Specifies SQL query to access data. This setting is ignored when access mode is set to Table.
OutputMode Specifies how you want to output columns. If you select RAW JSON Document mode then only single document will output for each record in DT_NTEXT datatype

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option Description
Columns [0] Output Columns - Default Mode
ColumnsWithCleanJson [2] Output Columns (RegX Clean Mode - SLOW)
RawDocuments [10] Output Raw JSON documents
RawDocumentsShellMode [11] Output Raw JSON documents - Keep Native Syntax
IndentOutput Indent JSON output so its easy to read.
DateFormatString Specifies how custom date formatted strings are parsed when reading JSON.
DateParseHandling Specifies how date formatted strings, e.g. Date(1198908717056) and 2012-03-21T05:40Z, are parsed when reading JSON.

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option Description
None [0] Keep date as string
DateTime [1] Convert to DateTime (Timezone lost)
DateTimeOffset [2] Convert to DateTimeOffset (Preserve Time zone)
EnableJsonPath This option allows you to extract array content and sub documents of array.
Filter Enter expression here to filter data.(Example:  $.Users[*].UserName ) This will fetch User names from users records
IncludeParentColumns Use this option to include parent properties (Non array) in the output along with Filtered Rows
IncludeParentColumnsWhenChildMissing By default child and parent information is not included in the output if children not found for specified expression. For example if you want to extract all orders from all customers nodes then you can type $.Customers[*].Orders[*]. This will fetch all orders from all customers. By default customers records without orders wont be included in the output. If you want to include those customers where orders not found then check this option (Output null information for order attributes). This behavior is similar to LEFT OUTER JOIN in SQL (Left side is parent, right side is child). This option is ***resource intensive*** so only check if you really care about this behavior.
IncludeParentColumnsWithArrayType Set this option to true if you want to output parent columns which are array. By default any parent column which is an array is not included in output. See also FilterForParentColumnsWithArrayType property if you set this property
FilterForParentColumnsWithArrayType Filter expression to extract value form parent
ParentColumnPrefix Prefix for parent column name. This option is only valid if you have set IncludeParentColumns=True
ThrowErrorIfPropertyMissing Throw error if property name specified in filter expression is missing. By default it will ignore any missing property errors.
MaxLevelsToScan This property how many nested levels should be scanned to fetch various properties. 0=Scan all child levels.
ExcludedProperties List comma separated property names from JSON document which you want to exclude from output. Specify parent property name to exclude all child nodes.
LevelSeparator Property level separator used in generated property name (separator for outer properties - Above selected filter node). Use this if default separator is producing duplicate property name which is conflicting with existing name.
EnableArrayFlattening Enables deep array flattening for selected filtered hierarchy. When you turn on this property it will flatten each property of each array item and expose as column (e.g. If you have Filter set as $.customers[*] and for each customer you have an array of Addresses then you may see output columns like Addresses.1.City, Addresses.1.State, Addresses.2.City, Addresses.2.State .... Addresses.N.City, Addresses.N.State). You can control how many array items you want to flatten by setting MaxArrayItemsToFlatten property.
MaxArrayItemsToFlatten Maximum number of array items to flatten. inner array flattening. Adjust this property to control how many columns being generated. This option is ignored if you set EnableArrayFlattening=false
MetaDataScanMode Metadata scan mode controls how data type and length is determined. By default few records scanned to determine datatype/length. Changing ScanMode affects length/datatype accuracy.

Available Options (Use numeric value listed in bracket if you have to define expression on this property (for dynamic behavior).

Option Description
Auto [0] Auto
Strict [1] Strict - Exact length
Guess2x [2] Guess2x - 2 times bigger
Guess3x [3] Guess3x - 3 times bigger
Guess4x [4] Guess4x - 4 times bigger
TreatAsUnicodeString [5] Set all columns as string
Guess10x [6] Guess10x - 10 times bigger
TreatStringAsMaxLength [7] Set string columns with MAX Length - i.e. DT_WSTR(4000)
TreatStringAsBlob [8] Set string columns as BLOB - i.e. DT_NTEXT
MetaDataCustomLength Length for all string column. This option is only valid for MetaDataScanMode=Custom
MetaDataTreatStringAsAscii When this option is true, it detects all string values as DT_STR (Ascii) rather than DT_WSTR (Unicode)

Output Column Properties

Error handling

  • This component supports error output.
  • Any bad value will cause Error or Truncation (e.g. If your sample value is 5 character long but output column length is set to 3 chars). If you have error output attached with RedirectRow option on Error, Truncation then you can capture bad rows along with exact reason of error.
  • Use error handling tab to set option for error handling. By default component fails on any data error or truncation issue

Remarks

Amazon DynamoDB API doesn't support SQL Queries like traditional RDBMS so we have implemented light weight parser which takes your SQL Queries and turn into Native DynamoDB query. Here is simple example
SELECT TOP 2 * FROM Customer WHERE (CustomerID='ALFKI' OR CustomerID='BOLID')
SELECT Age, CustomerID FROM Customer WHERE CustomerID IN ('ALFKI','BOLID','C3')

DynamoDB SQL Query Language

If you work with traditional RDBMS and you recently come to NoSQL world you will quickly realize that most of NoSQL database including DynamoDB is missing familiar SQL query language. But no worry we got you covered. We have implemented custom query parser which will do hard work for you so you don't have to spend time learning new query language for DynamoDB. Our query engine will convert SQL query to Native
DynamoDB query language.

SQL Grammar for DynamoDB:

<Select Statement> ::= SELECT [{TOP Clause}] {Column List} {From Clause} [{Where Clause}] [{With Clause}]

<TOP Clause>   ::= TOP IntegerLiteral
<Column List>   ::= * | IDENTIFIER [ [,IDENTIFIER] [,IDENTIFIER] ... ]
<From Clause>   ::= FROM IDENTIFIER
<Where Clause>  ::= WHERE <Expression List>
<With Clause>   ::= WITH ( [SCAN|QUERY] [,INDEX=your_index_name] [,LIMIT=some_integer] [,SCAN_INDEX_FORWARD=true|false] [,CONSISTENT_READ] )

! ============================================================================= 
! Expressions
! =============================================================================
! Mixing AND, OR not allowed. Just one type of Logical Operator can be used across multiple conditions in your filter
<Expression List>  ::= 
                | <Expression> [[AND <Expression>] [AND <Expression>] ...]
                | <Expression> [[OR <Expression>] [OR <Expression>] ...]

<Expression> ::= IDENTIFIER = <Value>
<Expression> ::= IDENTIFIER != <Value>
<Expression> ::= IDENTIFIER <> <Value>
<Expression> ::= IDENTIFIER > <Value>
<Expression> ::= IDENTIFIER >= <Value>
<Expression> ::= IDENTIFIER  < <Value>
<Expression> ::= IDENTIFIER <= <Value>
<Expression> ::= IDENTIFIER IN (<Value> [[, <Value>] [, <Value>] ...])
<Expression> ::= IDENTIFIER BETWEEN <Value> AND  <Value> 
<Expression> ::= IDENTIFIER LIKE <Value>
<Expression> ::= IDENTIFIER NOT LIKE <Value>
<Expression> ::= IDENTIFIER CONTAINS <Value>
<Expression> ::= IDENTIFIER IS NULL
<Expression> ::= IDENTIFIER IS NOT NULL

<Value>      ::= IntegerLiteral | StringLiteral

Supported Operators:

Supported Options for WITH clause:

Option Description
SCAN

A Scan operation reads every item in a table or a secondary index. By default, a Scan operation returns all of the data attributes for every item in the table or index. You can specify column list so that Scan only returns some of the attributes, rather than all of them.

Scan always returns a result set. If no matching items are found, the result set will be empty.

A single Scan request can retrieve a maximum of 1 MB of data; DynamoDB can optionally apply a filter expression to this data, narrowing the results before they are returned to the user.

QUERY (This is default)

A Query operation finds items in a table or a secondary index using only primary key attribute values. You must provide a hash key attribute name and a distinct value to search for. You can optionally provide a range key attribute name and value, and use a comparison operator to refine the search results. By default, a Query operation returns all of the data attributes for items with the specified primary key(s); however, you can use the select column list rather than * so that the Query operation only returns some of the attributes, rather than all of them.
These are the possible ways to Query DynamoDB table:

  • By Hash Column
  • By Hash Column + Range Column
  • By Hash Column + Local Index Column
  • By Global Index Column
  • By Global Index Column + Range Index
Read here to learn more about DynamoDB Indexing consideration

In a Query operation, you use one or more filter conditions to determine the items to be read from the table or index. You must specify the hash key attribute name and value as an equality condition. You can optionally provide a second condition for the range key attribute (if present). The range key condition must use one of the following comparison operators:

  • a = b - true if the attribute a is equal to the value b

  • a < b - true if a is less than b

  • a <= b - true if a is less than or equal to b

  • a > b - true if a is greater than b

  • a >= b - true if a is greater than or equal to b

  • a BETWEEN b AND c - true if a is greater than or equal to b, and less than or equal to c.

The following function is also supported:

  • begins_with (a, substr)- true if the value of attribute a begins with a particular substring.

LIMIT The maximum number of items to evaluate (not necessarily the number of matching items). If DynamoDB processes the number of items up to the limit while processing the results, it stops the operation and returns the matching values up to that point, and a key in LastEvaluatedKey to apply in a subsequent operation, so that you can pick up where you left off. Also, if the processed data set size exceeds 1 MB before DynamoDB reaches this limit, it stops the operation and returns the matching values up to the limit, and a key in LastEvaluatedKey to apply in a subsequent operation to continue the operation. For more information, see Query and Scan in the Amazon DynamoDB Developer Guide.
INDEX The name of an index to query. This index can be any local secondary index or global secondary index on the table. Note that if you use the IndexName parameter, you must also provide TableName.
SCAN_INDEX_FORWARD [=true|false] Applicable when QUERY Mode is used. Determines the read consistency model: If set to true, then the operation uses strongly consistent reads; otherwise, the operation uses eventually consistent reads.Strongly consistent reads are not supported on global secondary indexes. If you query a global secondary index with ConsistentRead set to true, you will receive an error message.
CONSISTENT_READ Applicable when QUERY Mode is used. Determines the read consistency model: If set to true, then the operation uses strongly consistent reads; otherwise, the operation uses eventually consistent reads.Strongly consistent reads are not supported on global secondary indexes. If you query a global secondary index with ConsistentRead set to true, you will receive an error message.

Understanding Scan vs Query operation:

By default every request will be treated as QUERY operation to reduce the cost associated with READ operation. Amazon provided some guidelines here for Difference between DynamoDB QUERY vs Scan and Cost calculation

Some points to remember for DynamoDB QUERY vs SCAN operation
  • QUERY operation is faster operation and incurs lower cost compared to SCAN
  • QUERY operation must include HASH KeyColumn in one or more conditions in your filter, you can optionally also include RANGE KeyColumns or Index Columns
  • SCAN is slower operation and incurs higher cost compared to QUERY
  • SCAN operation scans your table row by row to match your condition from filter, so be careful if you have very large table. Minimum data scan in each request is 1MB (unless you have table less than 1MB data)
  • SCAN operation can include any column in Filter condition unlike QUERY operation where HASH KeyColumn must present
  • Use query HINTS in WITH clause to indicate SCAN or QUERY operation. If you don't specify it then QUERY operation will be assumed in which you must include HASH KeyColumn in Condition along with Range or Index KeyColumns

Using Date/Time in your DynamoDB Query:

There is no Datetime datatype in DynamoDB but if you are storing your date values in ISO 8601 DateTime Format as string then your query will understand automatically if you specify >, >=, <, <= or BETWEEN Filter Condition
select * from Orders where OrderDate > '2015-12-31T23:59:59.000Z' WITH(SCAN)
Using variable in Date/Time query
If you want to make your query dynamic then you can use SSIS variable placeholder (use quote around it). If your variable DataType is datetime then you can format it using following way so it has ISO date format. If your variable DataType is String then make sure its correctly formatted.
select * from Orders where OrderDate > '{{User::vMyDate,yyyy-MM-ddTHH:mm:ssZ}}' WITH(SCAN)
Click here to see various ISO date formats

Specifying Options in WITH Clause of DynamoDB Query:

Names with space or other special characters:

Use brackets around identifier if it has space or dash or dot.

select * from mytable where MyHashKeyColumn = 'XYZ' And MyRangeKey > '1223' and Phone LIKE '111-%' WITH(QUERY, INDEX=[index-phone-number])

Amazon DynamoDB Query Examples:

-- Selecting all columns

select * from CustomerTest

 

-- Selecting multiple columns

select CustomerID,CompanyName from CustomerTest

 

-- Selecting multiple columns, where some attribute name has space in it

select CustomerID,[Contact Title] from CustomerTest

 

-- Find records where City attribute is missing (Missing is different than actual NULL value {City: null, Id:1} -vs- {Id:1} )

select * from CustomerTest Where City IS NULL

 

-- Find records where City attribute is NULL (i.e. attribute is found but value is NULL e.g. {City: null, Id:1} )

select * from CustomerTest Where City = NULL

 

-- Using Where clause with simple condition, select all columns

select * from CustomerTest Where CustomerID='ALC3R'

 

-- Limiting records returned from query using TOP clause (Similar as LIMIT found in some RDBMS query engine)

select top 7 * from CustomerTest

 

-- Using OR 

select * from CustomerTest Where CustomerID = 'ALFKI' OR CustomerID = 'BOLID'

 

-- Using AND 

select * from CustomerTest Where CustomerID = 'ALFKI' AND Age > 3

 

-- Using comparison operators

select * from CustomerTest Where CustomerID <> 'ALFKI'

select * from CustomerTest Where CustomerID != 'ALFKI'

select * from CustomerTest Where Age > 5

select * from CustomerTest Where Age >= 5

select * from CustomerTest Where Age < 5

select * from CustomerTest Where Age = 5 AND CustomerID = 'C5'

Setting UI

SSIS Amazon DynamoDB Source - Setting UI
SSIS Amazon DynamoDB Source - Stiting UI
SSIS Amazon DynamoDB Source - Setting UI
SSIS Amazon DynamoDB Source - Setting UI

Amazon AWS DynamoDB Source FAQs

See Also

Articles / Tutorials

Click here to see all articles for [SSIS Amazon DynamoDB Src] category
How to Read / Write Amazon DynamoDB in SSIS

How to Read / Write Amazon DynamoDB in SSIS

Introduction In this article we will look at how to Read / Write Amazon DynamoDB in SSIS. ZappySys developed many AWS related components but in this article we will look at 3 Tasks/Components for DynamoDB Integration Scenarios (Read, Write, Update, Bulk Insert, Create / Drop Table etc.). We will discuss how to use SSIS DynamoDB Source Connector […]



Copyrights reserved. ZappySys LLC.