Introduction
In this post you will learn how to write flexible MongoDB Aggregation Queries in SSIS (i.e. Group By Query) using SSIS MongoDB Source Component. MongoDB provides powerful Aggregation Pipeline Engine which is conceptually same as writing Group By queries in traditional SQL world but its not exactly same. SSIS MongoDB Source returns nested MongoDB JSON documents into flat looking table with rows and column (denormalized data).
If you want to learn how to write data to MongoDB (Insert, Update, Delete, Upsert) then click here. If you want learn how to read data from MongoDB then click here.
So let’s explore concepts of MongoDB Aggregation Pipeline and see some MongoDB query examples for SSIS use case.
Prerequisites
Before we perform the steps listed in this article, you will need to make sure the following prerequisites are met:- SSIS designer installed. Sometimes it is referred to as BIDS or SSDT (download it from the Microsoft site).
- Basic knowledge of SSIS package development using Microsoft SQL Server Integration Services.
- Make sure ZappySys SSIS PowerPack is installed (download it, if you haven't already).
- (Optional step). Read this article, if you are planning to deploy packages to a server and schedule their execution later.
SSIS MongoDB Source Introduction
SSIS MongoDB Source supports three modes to read data from MongoDB collection. Each mode also supports JSONPath Expression to extract nested array.
- Table Mode
- SQL Query Mode
- JSON Query Mode (Native)
To read more about supported SQL Syntax read help file here.
Using SSIS MongoDB Source to Query Data (Aggregation)
Now lets look at how to use SSIS MongoDB Source to query MongoDB data using Aggregation Engine.
MongoDB Aggregation Pipeline Syntax in SSIS
SSIS MongoDB Source support all pipeline commands to write aggregation queries but its slighly different syntax than shell command so your SHell Query won’t work directly but you can copy some part of your Shell query. Check below Syntax.
MongoDB Native Query Syntax (Shell Command) for Aggregation:
1 2 3 4 5 6 7 8 9 |
db.YourCollectionName.aggregate( [ [ <pipeline-stage1>] [,<pipeline-stage2>] ... ... [,<pipeline-stageN>] ] ) |
SSIS MongoDB Source Query Syntax for Aggregation:
1 2 3 4 5 6 7 8 9 10 |
{Table:YourTableName ,Method:aggregate [,MaxRows: N] [,AllowDiskUse:true] [,AllowCursor:true] [,Timeout:#####] } { pipeline: [ [ <pipeline-stage1>] [,<pipeline-stage2>] ... ... [,<pipeline-stageN>] ] } |
Example Aggregation Queries (SQL to SSIS Mapping)
Assume that we want to write following SQL Query using SSIS MongoDB Source or using MongoDB Shell
1 2 3 4 5 |
SELECT State as StateAbbr,SUM(population) as TotalPopulation FROM ZipCodes WHERE status='Active' HAVING SUM(population) > 25 Order By TotalPopulation DESC |
Above SQL Query can be translated to below JSON Query which can be used in SSIS MongoDB Source. As we noted before Below syntax is not same as Shell command so it will only run in SSIS MongoDB Source.
1 2 3 4 5 6 7 8 9 10 11 |
{Table: ZipCodes,Method: aggregate} { pipeline: [ {$match : { status: "A" } } , {$group : {_id : "$state", TotalPopulation: { $sum: "$pop" }}}, {$match: { TotalPopulation: { $gt: 25 } } }, {$project : { StateAbbr: "$_id" , _id: 0, TotalPopulation:1 } }, {$sort : {TotalPopulation : -1}} ] } |
So as you saw in above query various clause of SELECT queries are defined as pipeline stages. You can add any possible stages in different order. Based on their order it will be applied so order of stage is very important. For more information about various pipeline stages click here. Below is high level translation of SQL Query to MongoDB Aggregation Query.
SQL to MongoDB Aggregation Mapping
SQL Clause | Mongo Pipeline Stage |
---|---|
SELECT | $project |
WHERE | $match |
GROUP BY | $group |
HAVING | $match |
ORDER BY | $sort |
Now let’s look at few examples how to translate traditional SQL Queries to JSON Query for SSIS MongoDB Source. We will use sample data we inserted in previous section.
Creating Sample Data for MongoDB Query Examples
For this exercise to write sample queries using below data. So let’s first create a collection called Order and insert following records using your favourite MongoDB client tool.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 |
db.Orders.insert( { "OrderId" : 1000, "CustomerId" : "CUS001", "OrderTotal" : 2000.50, "Quantity" : 20, "OrderDate" : ISODate("2010-01-01"), "Country" : "USA", "Items" : [ {ItemId:10, UnitPrice: 100.00,Quantity: 10}, {ItemId:11, UnitPrice: 200.10,Quantity: 5}], "PaymentInfo" : { "Method" : "Card", "Currency" : "USD" } } ) db.Orders.insert( { "OrderId" : 1001, "CustomerId" : "CUS001", "OrderTotal" : 1000.00, "Quantity" : 10, "OrderDate" : ISODate("2011-01-02"), "Country" : "USA", "Items" : [ {ItemId:10, UnitPrice: 100.00,Quantity: 10}], "PaymentInfo" : { "Method" : "Card", "Currency" : "USD" } } ) db.Orders.insert( { "OrderId" : 1002, "CustomerId" : "CUS002", "OrderTotal" : 1000.00, "Quantity" : 5, "OrderDate" : ISODate("2012-01-01"), "Country" : "USA", "Items" : [ {ItemId:13, UnitPrice: 200.00,Quantity: 5}], "PaymentInfo" : { "Method" : "Bank", "Currency" : "USD" } } ) db.Orders.insert( { "OrderId" : 1003, "CustomerId" : "CUS003", "OrderTotal" : 59000.00, "Quantity" : 10, "OrderDate" : ISODate("2005-01-02"), "Country" : "India", "Items" : [ {ItemId:14, UnitPrice: 5900.00,Quantity: 10}], "PaymentInfo" : { "Method" : "Check", "Currency" : "INR" } } ) |
Here is how it would look like (We used RoboMongo i.e. Robo 3T) but you can use mongo.exe command line to issue above commands.
Example 1 – MongoDB Aggregation Query with COUNT(*) – $group, $project
Here is how to write COUNT(*) query in MongoDB. Below query will find order count by country. Each query must output _id field and you can refer Document column using $ prefix (i.e. “$Country” or for nested object use dot separator “$PaymentInfo.Method” ) . To output _id column with different name we have used Country: “$_id”. Because we including copy of _id we can remove _id column by doing _id: 0. To include column in output use YourColumanName: 1 in $project stage.
SQL Query:
1 2 3 |
SELECT Country,Count(*) as OrderCount FROM Orders GROUP BY Country |
Mongo Shell Query:
1 2 3 4 5 6 |
db.Orders.aggregate( [ {$group : {_id : "$Country",OrderCount: { $sum: 1 }}} ,{$project : {Country: "$_id", _id: 0, OrderCount: 1 }} ] ) |
SSIS MongoDB Source Query:
1 2 3 4 5 6 7 |
{Table: Orders,Method: aggregate} { pipeline: [ {$group : {_id : "$Country",OrderCount: { $sum: 1 }}} ,{$project : {Country: "$_id", _id: 0, OrderCount: 1 }} ] } |
Example 2 – MongoDB Aggregation Query with WHERE and ORDER BY- $match, $group, $project, $sort
Here is how to write COUNT(*) query in MongoDB. Below query will find order count by country.
SQL Query:
1 2 3 4 5 |
SELECT Country,Count(*) as OrderCount WHERE Quantity > 1 FROM Orders GROUP BY Country ORDER BY Country |
Mongo Shell Query:
1 2 3 4 5 6 7 8 |
db.Orders.aggregate( [ {$match : {Quantity : { $gt: 1 }}} ,{$group : {_id : "$Country",OrderCount: { $sum: 1 }}} ,{$project : {Country: "$_id", _id: 0, OrderCount: 1 }} ,{$sort : {Country: 1}} ] ) |
SSIS MongoDB Source Query:
1 2 3 4 5 6 7 8 |
{Table: Orders,Method: aggregate} { pipeline: [ {$match : {Quantity : { $gt: 1 }}} ,{$group : {_id : "$Country",OrderCount: { $sum: 1 }}} ,{$project : {Country: "$_id", _id: 0, OrderCount: 1 }} ,{$sort : {Country: 1}} ] } |
Example 3 – MongoDB Aggregation Query with WHERE and HAVING Clause – $match, $group
Here is an example of WHERE and HAVING clause. They both use $match stage but their orders are different. WHERE condition is applied before GROUP BY (i.e. $group) and HAVING condition is applied after GROUP BY.
SQL Query:
1 2 3 4 5 |
SELECT Country,Count(*) as OrderCount FROM Orders WHERE Quantity > 1 GROUP BY Country HAVING Count(*) > 0 |
Mongo Shell Query:
1 2 3 4 5 6 7 8 |
db.Orders.aggregate( [ {$match : {Quantity: { $gt: 1 }}} ,{$group : {_id : "$Country",OrderCount: { $sum: 1 }}} ,{$match : {OrderCount: { $gte: 1 }}} ,{$project : {Country: "$_id", _id: 0, OrderCount: 1 }} ] ) |
SSIS MongoDB Source Query:
1 2 3 4 5 6 7 8 |
{Table: Orders,Method: aggregate} { pipeline: [ { $group : {_id : "$Country",OrderCount: { $sum: 1 }} } ] } |
Example 4 – MongoDB Aggregation Query with $unwind (Flatten nested Array)
Now lets look at an example of $unwind operator. $unwind can be used to perform operations on nested array. Common things would be flatten the hierarchy and then apply group by operation. Lets find out Average Quantity of Items By Order Number
SQL Query:
1 2 3 4 |
SELECT Orders.OrderId as OrderNumber,Avg(Items.Quantity) as AvgItemQty FROM Orders JOIN Items ON Orders.OrderId = Items.OrderId GROUP BY Orders.OrderId |
Mongo Shell Query:
1 2 3 4 5 6 7 |
db.Orders.aggregate( [ {$unwind : "$Items"} ,{$group : {_id : "$OrderId",AvgItemQty: { $avg: "$Items.Quantity" }}} ,{$project : {OrderNumber: "$_id", _id: 0, AvgItemQty: 1 }} ] ) |
SSIS MongoDB Source Query:
1 2 3 4 5 6 7 8 |
{Table: Orders,Method: aggregate} { pipeline: [ {$unwind : "$Items"} ,{$group : {_id : "$OrderId",AvgItemQty: { $avg: "$Items.Quantity" }}} ,{$project : {OrderNumber: "$_id", _id: 0, AvgItemQty: 1 }} ] ) |
MongoDB Query with Dynamic Value / ISO DateTime
There will be a time when you have to supply Dynamic Value from Variable or supply Dynamic Date time inside your query. Dates are little different than other data types. Check this article to learn more about passing dates in MongoDB Query.
For Dynamic Date use SSIS Variables Format Specifiers
Example of DateTime query (Static Value)
1 2 3 4 5 6 7 8 |
{Table: Orders,Method: aggregate} { pipeline: [ {$match : {OrderDate : { $gt: new ISODate("2010-01-01") }}} ,{$group : {_id : "$Country",OrderCount: { $sum: 1 }}} ,{$project : {Country: "$_id", _id: 0, OrderCount: 1 }} ,{$sort : {Country: 1}} ] } |
Example of using Dynamic Value from Variable
To pass date from SSIS Variable you can use Format Specifiers. (e.g. yyyy-MM-dd or for full ISO date yyyy-MM-ddTHH:mm:ssZ )
1 2 3 4 5 6 7 8 |
{Table: Orders,Method: aggregate} { pipeline: [ {$match : {OrderDate : { $gt: new ISODate("{{User::varOrderDate,yyyy-MM-dd}}") }}} ,{$group : {_id : "$Country",OrderCount: { $sum: 1 }}} ,{$project : {Country: "$_id", _id: 0, OrderCount: 1 }} ,{$sort : {Country: 1}} ] } |
Conclusion
MongoDB is one of the most popular NoSQL / Document Storage Engine. SSIS PowerPack make super easy and efficient to read/write data in MongoDB without any coding effort. Download SSIS PowerPack to try out MongoDB features along with 70+ other components .
Search terms:
mongodb group by count
mongodb group by multiple fields
mongodb group by date
mongodb group example
mongodb group sort
mongodb group by month
mongodb group by having
mongodb group distinct
mongodb group by sum
mongodb aggregate
mongodb aggregate sort
mongodb aggregate unwind