SSIS MongoDB Source Adapter (Bulk Extract,Read,Copy NoSQL Data)
PreviousNext

MongoDB Source can be used to extract large amount of data from MongoDB Database. This component supports SQL like query language to query MongoDB data without worrying about complex API or proprietary query language. MongoDB is an open-source document database, and the most popular NoSQL database out there.

Download SSIS PowerPack

Video Tutorial

Content

Step-By-Step

In this section you will learn how to use MongoDB Source Adapter to extract data from MongoDB Collection (i.e. Table).
  1. From the SSIS toolbox drag and drop Data Flow Task on the control flow designer surface.
  2. Double click on the DataFlow task to see DataFlow designer surface.
  3. From the SSIS toolbox drag and drop MongoDB Source on the dataflow designer surface.
  4. Double click MongoDB Source to configure it.
  5. Click on [New] button for Select Connection Manager. Enter MongoDB server name, userid, password, database and Click Test connection to verify credentials.
    Configure SSIS MongoDB Source Connection Configure SSIS MongoDB Source Connection Advanced Options
  6. Click OK to save connection information.
  7. Select table from the Tables dropdown or change AccessMode to Query and type SQL Statement and click preview. You can use SQL Style query or use native MongoDB Query language.
    Configure SSIS MongoDB Source Connection

    ---- OR ----


    Configure SSIS MongoDB Source Connection

    ---- OR ----


    Configure SSIS MongoDB Source Connection
  8. Click OK to save settings.
  9. From the SSIS toolbox drag and drop Trash Destination on the dataflow designer surface.
  10. Now single click on the MongoDB Source once you see blue arrow from source ... connect it to Trash Destination.
  11. Right click on the path and Add Data Viewer.
  12. Execute the package and verify source data in data viewer.
    Configure SSIS MongoDB Source Connection

Properties

Property Name Description
LoggingMode LoggingMode determines how much information is logged during Package Execution. Set Logging mode to Debugging for maximum log.

Available Options

Option Description
Normal Normal
Medium Medium
Detailed Detailed
Debugging Debugging
PrefixTimestamp When you enable this property it will prefix timestamp before Log messages.
TableName Table name from where you want to read data
BatchSize Indicates how many records are process in a single request. Setting this value to 0 will extract all records in single request.
AccessMode Indicates Access Mode of data (e.g. Table or Sql Query)

Available Options

Option Description
Table Table
Query Query
SqlCommand Specifies SQL query to access data. This setting is ignored when access mode is set to Table.
OutputMode Specifies how you want to output columns. If you select RAW JSON Document mode then only single document will output for each record in DT_NTEXT datatype

Available Options

Option Description
Columns Output Columns - Default Mode
ColumnsWithCleanJson Output Columns (RegX Clean Mode - SLOW)
RawDocuments Output Raw JSON documents
RawDocumentsShellMode Output Raw JSON documents - Keep Native Syntax
IndentOutput Indent JSON output so its easy to read.
DateFormatString Specifies how custom date formatted strings are parsed when reading JSON.
DateParseHandling Specifies how date formatted strings, e.g. Date(1198908717056) and 2012-03-21T05:40Z, are parsed when reading JSON.

Available Options

Option Description
None Keep date as string
DateTime Convert to DateTime (Timezone lost)
DateTimeOffset Convert to DateTimeOffset (Preserve Time zone)
EnableJsonPath This option allows you to extract array content and sub documents of array.
Filter Enter expression here to filter data.(Example:  $.Users[*].UserName ) This will fetch User names from users records
IncludeParentColumns Use this option to include parent properties (Non array) in the output along with Filtered Rows
IncludeParentColumnsWhenChildMissing By default child and parent information is not included in the output if children not found for specified expression. For example if you want to extract all orders from all customers nodes then you can type $.Customers[*].Orders[*]. This will fetch all orders from all customers. By default customers records without orders wont be included in the output. If you want to include those customers where orders not found then check this option (Output null information for order attributes). This behavior is similar to LEFT OUTER JOIN in SQL (Left side is parent, right side is child). This option is ***resource intensive*** so only check if you really care about this behavior.
IncludeParentColumnsWithArrayType Set this option to true if you want to output parent columns which are array. By default any parent column which is an array is not included in output. See also FilterForParentColumnsWithArrayType property if you set this property
FilterForParentColumnsWithArrayType Set this option to true if you want to output parent columns which are array. By default any parent column which is an array is not included in output. See also FilterForParentColumnsWithArrayType property if you set this property
ParentColumnPrefix Prefix for parent column name. This option is only valid if you have set IncludeParentColumns=True
ThrowErrorIfPropertyMissing Throw error if property name specified in filter expression is missing. By default it will ignore any missing property errors.
MaxLevelsToScan This property how many nested levels should be scanned to fetch various properties. 0=Scan all child levels.
ExcludedProperties List comma separated property names from JSON document which you want to exclude from output. Specify parent property name to exclude all child nodes.
MetaDataScanMode Metadata scan mode controls how data type and length is determined. By default few records scanned to determine datatype/length. Changing ScanMode affects length/datatype accuracy.
MetaDataCustomLength Length for all string column. This option is only valid for MetaDataScanMode=Custom

Error handling

  • This component supports error output.
  • Any bad value will cause Error or Truncation (e.g. If your sample value is 5 character long but output column length is set to 3 chars). If you have error output attached with RedirectRow option on Error, Truncation then you can capture bad rows along with exact reason of error.
  • Use error handling tab to set option for error handling. By default component fails on any data error or truncation issue

Filter data / Extract from arrays

MongoDB source comes with JSON Path feature to allow you to extract data from nested array. Limitation of this feature is you can only extract one array only. If you have need to extract data from multiple arrays found inside MongoDB document then check below article.
http://zappysys.com/blog/how-to-extract-multiple-arrays-from-mongodb-using-ssis/
Below video tutorial demonstrates following things

MongoDB Native Query Language

Many features supported by MongoDB query engine is not implemented in our light weight SQL Style query language for MongoDB. In that case you can always use
Native MongoDB query syntax for full capability found in db.collection.find(...). When you use SQL Query for MongoDB Source it generates native query behind the scene. Syntax for using Native query is below. Below syntax has two parts. First part includes Table name, projection field names (optional) and row limit (optional). Second part includes actual native query document ( first argument of db.collection.find(...) method goes here). MongoDB Native query support many query operators (see below).

Syntax for MongoDB native query inside SSIS MongoDB Source:

{Table:YourTableName [, MaxRows: N] [,Columns:"Column1|Column2...ColumnN"] [,Option1...OptionN]}
{
	<MongoDB-Query-Goes-here>
}
        

Example : MongoDB native query (use of Query operators)

See below example of using native query inside MongoDB Source (MaxRows and Columns are optional. If you omit them all rows and all columns returned).
{Table:Customer, MaxRows: 2, Columns:"Age|CustomerID"}
{
  "$and": 
  [
  	 {"$or": 
	  [{"CustomerID": "ALFKI"},
       {"CustomerID": "BOLID"}
      ]},
    {"$or": 
	  [{"Age": 1},
       {"Age": 2},
       {"Age": 3}
      ]}
  ]
}
	
SSIS MongoDB Source - Example of MongoDB Native Query Syntax Above code is same as below SQL query
SELECT TOP 2 Age, CustomerID FROM Customer WHERE (CustomerID='ALFKI' OR CustomerID='BOLID') AND (Age=1 OR Age=2 OR Age=3)

MongoDB Aggregation Framework (Group by)

If you want to write aggregation queries in MongoDB (similar to Group By query in SQL language) then you must use Native query language because custom SQL Query language of MongoDB Source doesn't support Group By construct.To learn more about aggregation framework and pipeline operators check below links.
SQL vs MongoDB Aggregation Comparison
Aggregation framework pipeline operators
db.collection.aggregate()
Aggregation Pipeline Process
Here is syntax of Aggregation query syntax in MongoDB Source
{Table:YourTableName ,Method:Aggregation [,MaxRows: N] [,AllowDiskUse:true] [,AllowCursor:true] [,Timeout:#####] }
{
	 pipeline: [
	  [ <pipeline-stage1>]
	  [,<pipeline-stage2>]
	  ...
	  [,<pipeline-stageN>]
	 ]
}
        
Option Description
Table (Required) Collection name on which you want to perform Aggregation.
Method (Required) Supported methods are Aggregation and Distinct. If you specify Method:Aggregation then you must specify pipeline attribute in query body. If you specify Distinct then you must specify Columns: YourColumnName.
MaxRows (Optional) Default=0 (i.e. no limit). Maximum number of rows you want to output (Similar as select top N from collection)
AllowDiskUse (Optional) Default=True. This option allow to exceed 100MB RAM limitation for pipeline. When set to true, aggregation operations can write data to the _tmp subdirectory in the dbPath directory
AllowCursor (Optional) Default=True. This option enables cursor which fetch data in batch. See BatchSize property of component to control size of batch. If you have large documents then reduce size of Batch so all records together is less than 16MB limit of single batch. You may sometimes get error aggregation result exceeds maximum document size (16MB). In that case reduce Batch size until error goes away.
Timeout (Optional). Timeout for command in seconds. This option specify how long single command can execute before throwing timeout error.

Performance tuning / Handling large documents / Buffers

Many times you may face slow queries. In that case make sure following few things.

Example : MongoDB native query using aggregation framework (Group By)

Here is an example of group by query using MongoDB Source. Notice how we arranged various pipeline stages in specific order under pipeline:[ {..},{..},{..} ] just like how you write SQL query in specific way (SELECT..FROM..WHERE..GROUP BY..HAVING..ORDER BY). MongoDB Aggregation Pipeline has similar concepts (e.g. $match, $group, $projection, $sort). Its not exact same way as SQL but you get an idea how to do aggregation in MongoDB. Notice that $match operator is used for filter (e.g. WHERE and HAVING both) in above example. If $match appears before $group stage then it acts as WHERE clause else HAVING clause (see below). To read more about various pipeline operators check this link
{Table: ZipCodes,Method: aggregate}
{
  pipeline: 
  [
    {$match : { status: "A" } } ,
    {$group : {_id :  "$state", TotalPopulation: { $sum: "$pop" }}},
    {$match: { TotalPopulation: { $gt: 25 } } },
    {$project : { StateAbbr: "$_id" , _id: 0, TotalPopulation:1 } },
    {$sort : {TotalPopulation : -1}}
  ]
}
Above MongoDB query is same as below relational SQL query
SELECT State as StateAbbr,SUM(population) as TotalPopulation 
FROM ZipCodes 
WHERE status='Active' 
HAVING SUM(population) > 25
Order By TotalPopulation DESC
SSIS MongoDB Source - Example of MongoDB Group By Query (Aggregation Framework - Pipeline)

Example : Getting distinct values in MongoDB Source

If you want to fetch distinct values for specific field using MongoDB Source then write following Native query.
{Table: ZipCodes,Method: distinct, Columns:'State'}
{
}
	
Above query is same as below SQL
SELECT DISTINCT state FROM ZipCodes
	
--OR-- Use some filter like below
{Table: ZipCodes,Method: distinct, Columns:'State'}
{
  $and: { status:'A', country:'USA' }
}
	
Above query is same as below SQL
SELECT DISTINCT state FROM ZipCodes WHERE status='A' AND country='USA'
	

Example : MongoDB native query (use of $where expression)

Here is another powerful use of $where query operator. $where support expressions and javascript functions for advanced query. However only use this operator if you cant use other operators such as (comparison operators and logical operators) because when you javascript in $where it may not use index defined on columns in your collection. Use the $where operator to pass either a string containing a JavaScript expression or a full JavaScript function to the query system. The $where provides greater flexibility, but requires that the database processes the JavaScript expression or function for each document in the collection. Reference the document in the JavaScript expression or function using either this or obj .
{Table:Customer}
{ $where: "this.credits == this.debits" } 

MongoDB SQL Query Language

If you work with traditional RDBMS and you recently come to NoSQL world you will quickly realize that most of NoSQL database vendors including MongoDB is missing familiar SQL query language. But no worry we got you covered. We have implemented custom query parser which will do hard work for you so you don't have to spend time learning new query language for MongoDB. Our query engine will convert SQL query to Native
MongoDB query language.

SQL Grammar for MongoDB:

<Select Statement> ::= SELECT [{TOP Clause}] {Column List} {From Clause} [{Where Clause}] [{OrderBy Clause}]

<TOP Clause>    ::= TOP IntegerLiteral
<Column List>   ::= * | IDENTIFIER [ [,IDENTIFIER] [,IDENTIFIER] ... ]
<From Clause>   ::= FROM IDENTIFIER
<Where Clause>  ::= WHERE <Expression List>
<OrderBy Clause>::= ORDER BY IDENTIFIER [ASC|DESC] [[,IDENTIFIER [ASC|DESC] [,IDENTIFIER [ASC|DESC] ... ]

! ============================================================================= 
! Expressions
! =============================================================================
! Nesting AND, OR supported using Multiple conditions
<Expression List>  ::= <Expression>
                           | <Expression> AND <Expression>
                           | <Expression> AND (<Expression List>)
                           | <Expression> OR <Expression>
                           | <Expression> OR (<Expression List>)


<Expression> ::= IDENTIFIER = <Value>
<Expression> ::= IDENTIFIER != <Value>
<Expression> ::= IDENTIFIER <> <Value>
<Expression> ::= IDENTIFIER > <Value>
<Expression> ::= IDENTIFIER >= <Value>
<Expression> ::= IDENTIFIER  < <Value>
<Expression> ::= IDENTIFIER <= <Value>
<Expression> ::= IDENTIFIER IN (<Value> [[, <Value>] [, <Value>] ...])
<Expression> ::= IDENTIFIER BETWEEN <Value> AND  <Value> 
<Expression> ::= IDENTIFIER LIKE <Value>
<Expression> ::= IDENTIFIER NOT LIKE <Value>
<Expression> ::= IDENTIFIER NOT LIKE <Value>
<Expression> ::= IDENTIFIER CONTAINS <Value>
<Expression> ::= IDENTIFIER IS NULL
<Expression> ::= IDENTIFIER IS NOT NULL

<Value>      ::= IntegerLiteral | StringLiteral
	

Supported Operators:


Searching for text patterns (Use of Regular Expressions):

To search text pattern you may use simple wild card (i.e. % ) in your LIKE clause or specify REGX hint. Your text pattern can end with search hints. To specify search hints use vertical bar and then one or more following options
select * from mytable where column LIKE 'some-pattern|[i][r][x][m][s]'
where options are as below Searching for text (case-sensitive mode - this is default):
by default text pattern search is case-sensitive so no option needed.
select * from customers where password LIKE 'P@ss'
Searching for text patterns (case-insensitive mode):
by default text pattern search is case-sensitive. If you want to perform case in-sensitive search then you must provide option [i] at the end after vertical bar.
select * from customers where City LIKE 'New%|i'
Searching for text patterns using Regular Expression:
select * from customers where phone LIKE '\d\d\d|r'
Using multiple options:
select * from customers where notes LIKE '^ABC\d(.*)|rm'

MongoDB SQL query language examples:

-- Selecting all columns

select * from CustomerTest

 

-- Selecting multiple columns

select CustomerID,CompanyName from CustomerTest

 

-- Selecting multiple columns, where some attribute name has space in it

select CustomerID,[Contact Title] from CustomerTest

 

-- Using Where clause with simple condition, select all columns

select * from CustomerTest Where CustomerID='ALC3R'

 

-- Query ISO Date attribute

select * from Orders Where OrderDate='DATE(2015-01-31)'

 

-- Using ORDER BY on single column

select * from CustomerTest Order By CustomerID

 

-- Using muliple columns in ORDER BY

select * from CustomerTest Order By CustomerID ASC,Country DESC

 

-- Using CONTAINS to do substring search

select * from CustomerTest Where CustomerID CONTAINS 'C3'

 

-- Using LIKE to do substring search (Only % allowed as wildcard character)

select * from CustomerTest Where CustomerID LIKE '%C3%'

 

-- Using NOT LIKE: Return all records where CustomerID doesnt contain substring 'C3'

select * from CustomerTest Where CustomerID NOT LIKE '%C%'

 

-- Using LIKE to search for starts with: search all record where ContactName starts with Maria

select * from CustomerTest Where ContactName LIKE 'Maria%'

 

-- Limiting records returned from query using TOP clause (Similar as LIMIT found inn some RDBMS query engine)

select top 7 * from CustomerTest

 

-- Using IN with String Values: select all records where CustomerID is 'ALFKI' or 'BOLID' or 'C3' (String values)

select * from CustomerTest where CustomerID IN ('ALFKI', 'BOLID','C3')

 

-- Using IN with Numeric Values: select all records where Age is 5 or 7 (Numeric values)

select * from CustomerTest where Age IN (5,7)

 

-- Using OR 

select * from CustomerTest Where CustomerID = 'ALFKI' OR CustomerID = 'BOLID'

 

-- Using OR 

select * from CustomerTest Where CustomerID = 'ALFKI' OR CustomerID = 'BOLID'

 

-- Mixing OR / AND

select * from CustomerTest Where (CustomerID = 'ALFKI' OR CustomerID='BOLID') AND (Age =1 OR Age =2 OR Age =3)

 

-- Using IS NOT NULL: Get all records where Age attribute is present and set to some value other than NULL

select * from CustomerTest Where Age IS NOT NULL

 

-- Using IS NULL: Get all records where Age attribute is either MISSING or is set to null

select * from CustomerTest Where Age IS NULL

 

 

-- Using comparision operators

select * from CustomerTest Where CustomerID <> 'ALFKI'

select * from CustomerTest Where CustomerID != 'ALFKI'

select * from CustomerTest Where Age > 5

select * from CustomerTest Where Age >= 5

select * from CustomerTest Where Age < 5

select * from CustomerTest Where Age = 5 AND CustomerID = 'C5'

 

-- Using BETWEEN for range compare

select * from CustomerTest Where Age Between 5 AND 10

 

-- Using NOT BETWEEN for range compare

select * from CustomerTest Where Age Not Between 5 AND 10

 

-- Mixing conditions

select * from CustomerTest Where (Age >= 1 OR Age <= 10) AND (CustomerID ='C7' Or CustomerID ='C9' )

select * from CustomerTest Where (CustomerID = 'ALFKI' OR (CustomerID = 'BOLID' OR (CustomerID = 'C1')))

 

 

Dynamic Query and Use DateTime Filter

MongoDB has
ISODate datatype support. If you want to query ISODate attribute then you can use following syntax.

Query ISODate in SQL Query

You can use DATE function to cast any string to valid ISODate for MongoDB query (This is custom function of ZappySys MongoDB SQL Language). DATE function has to be inside quotes as below.
SELECT * FROM Orders WHERE OrderDate > 'DATE(2015-01-31)'

Using Placeholder to make query dynamic

Placeholders are great way to make your query dynamic. Use placeholders anywhere in your Query and ZappySys MongoDB Source will replace Variable name with correct value at runtime.
SELECT * FROM Orders WHERE OrderDate < 'DATE({{User::LastOrderDate}})'

Using ISODate in MongoDB Native Query

{Table:Orders}
{
	"OrderDate" : { "$lt" : new ISODate("2015-01-02T00:16:15.184Z")}
}
For more information read this blog post

Settings UI

MongoDB Source Connector - Query data using SQL, Extract from array using JSON Path

SSIS MongoDB Source Connector - Query data using SQL, Filter using JSON Path

MongoDB Source Connector - Use native JSON Query format

SSIS MongoDB Source Connector - Query data using SQL, Filter using JSON Path

MongoDB Source Connector - Advanced Filter Options

SSIS MongoDB Source Adapter Advanced Filter

SSIS MongoDB Source Connector - Output as JSON, Extract raw JSON from nested array

SSIS MongoDB Source - Output as JSON, Extract raw JSON from nested Array

MongoDB Source Connector - Select Columns

SSIS MongoDB Source Adapter Select Columns

MongoDB Source Connector - General Settings

SSIS MongoDB Source Adapter General Settings

MongoDB Connection Manager

SSIS MongoDB Connection Manager

SSIS MongoDB Source - Extract Data

SSIS MongoDB Source - Extract Data

MongoDB Source FAQs

References

See Also


Copyrights reserved. ZappySys LLC.