How to write MongoDB Aggregation Queries in SSIS (Group By)

Introduction

In this post you will learn how to write flexible MongoDB Aggregation Queries in SSIS (i.e. Group By Query) using SSIS MongoDB Source Component. MongoDB provides powerful Aggregation Pipeline Engine which is conceptually same as writing Group By queries in traditional SQL world but its not exactly same. SSIS MongoDB Source returns nested MongoDB JSON documents into flat looking table with rows and column (denormalized data).

If you want to learn how to write data to MongoDB (Insert, Update, Delete, Upsert) then click here. If you want learn how to read data from MongoDB then click here.

So let’s explore concepts of MongoDB Aggregation Pipeline and see some MongoDB query examples for SSIS use case.

Prerequisites

Before we perform steps listed in this article, you will need to make sure following prerequisites are met:

  1. SSIS designer installed. Sometimes it is referred as BIDS or SSDT (download it from Microsoft site).
  2. Basic knowledge of SSIS package development using Microsoft SQL Server Integration Services.
  3. Make sure ZappySys SSIS PowerPack is installed (download it).
  4. Optional (If you want to Deploy and Schedule ) – Deploy and Schedule SSIS Packages

SSIS MongoDB Source Introduction

SSIS MongoDB Source supports three modes to read data from MongoDB collection. Each mode also supports JSONPath Expression to extract nested array.

  1. Table Mode
  2. SQL Query Mode
  3. JSON Query Mode (Native)

To read more about supported SQL Syntax read help file here.

Using SSIS MongoDB Source to Query Data (Aggregation)

Now lets look at how to use SSIS MongoDB Source to query MongoDB data using Aggregation Engine.

MongoDB Aggregation Pipeline Syntax in SSIS

SSIS MongoDB Source support all pipeline commands to write aggregation queries but its slighly different syntax than shell command so your SHell Query won’t work directly but you can copy some part of your Shell query. Check below Syntax.

MongoDB Native Query Syntax (Shell Command) for Aggregation:
db.YourCollectionName.aggregate(
[
[ <pipeline-stage1>]
[,<pipeline-stage2>]
...
...
[,<pipeline-stageN>]
] )

SSIS MongoDB Source Query Syntax for Aggregation:
{Table:YourTableName ,Method:aggregate [,MaxRows: N] [,AllowDiskUse:true] [,AllowCursor:true] [,Timeout:#####] }
{
pipeline: [
[ <pipeline-stage1>]
[,<pipeline-stage2>]
...
...
[,<pipeline-stageN>]
] }

Example Aggregation Queries (SQL to SSIS Mapping)

Assume that we want to write following SQL Query using SSIS MongoDB Source or using MongoDB Shell

Above SQL Query can be translated to below JSON Query which can be used in SSIS MongoDB Source. As we noted before Below syntax is not same as Shell command so it will only run in SSIS MongoDB Source.

So as you saw in above query various clause of SELECT queries are defined as pipeline stages. You can add any possible stages in different order. Based on their order it will be applied so order of stage is very important.  For more information about various pipeline stages click here. Below is high level translation of SQL Query to MongoDB Aggregation Query.

SSIS MongoDB Source - Example of MongoDB Group By Query (Aggregation Framework - Pipeline)

SSIS MongoDB Source – Example of MongoDB Group By Query (Aggregation Framework – Pipeline)

SQL to MongoDB Aggregation Mapping

SQL Clause Mongo Pipeline Stage
SELECT $project
WHERE $match
GROUP BY $group
HAVING $match
ORDER BY $sort

Now let’s look at few examples how to translate traditional SQL  Queries to JSON Query for SSIS MongoDB Source. We will use sample data we inserted in previous section.

Creating Sample Data for MongoDB Query Examples

For this exercise to write sample queries using below data. So let’s first create a collection called Order and insert following records using your favourite MongoDB client tool.

Here is how it would look like (We used RoboMongo i.e. Robo 3T) but you can use mongo.exe command line to issue above commands.

Inserting Example data for MongoDB Aggregation Queries (RoboMongo / Robo 3T)

Inserting Example data for MongoDB Aggregation Queries (RoboMongo / Robo 3T)

Example 1 – MongoDB Aggregation Query with COUNT(*) – $group, $project

Here is how to write COUNT(*) query in MongoDB. Below query will find order count by country. Each query must output _id field and you can refer Document column using $ prefix (i.e. “$Country” or for nested object use dot separator “$PaymentInfo.Method” ) . To output _id column with different name we have used Country: “$_id”. Because we including copy of _id we can remove _id column by doing  _id: 0.  To include column in output use YourColumanName: 1 in $project stage.

SQL Query:

Mongo Shell Query:

SSIS MongoDB Source Query:

Example 2 – MongoDB Aggregation Query with WHERE and ORDER BY- $match, $group, $project, $sort

Here is how to write COUNT(*) query in MongoDB. Below query will find order count by country.

SQL Query:

Mongo Shell Query:

SSIS MongoDB Source Query:

Example 3 – MongoDB Aggregation Query with WHERE and HAVING Clause – $match, $group

Here is an example of WHERE and HAVING clause. They both use $match stage but their orders are different. WHERE condition is applied before GROUP BY (i.e. $group) and HAVING condition is applied after GROUP BY.

SQL Query:

Mongo Shell Query:

SSIS MongoDB Source Query:

Example 4 – MongoDB Aggregation Query with $unwind (Flatten nested Array)

Now lets look at an example of $unwind operator. $unwind can be used to perform operations on nested array. Common things would be flatten the hierarchy and then apply group by operation. Lets find out Average Quantity of Items By Order Number

SQL Query:

Mongo Shell Query:

SSIS MongoDB Source Query:

MongoDB Query with Dynamic Value / ISO DateTime

There will be a time when you have to supply Dynamic Value from Variable or supply Dynamic Date time inside your query. Dates are little different than other data types. Check this article to learn more about passing dates in MongoDB Query.

For Dynamic Date use SSIS Variables Format Specifiers

Example of DateTime query (Static Value)

Example of using Dynamic Value from Variable

To pass date from SSIS Variable you can use Format Specifiers. (e.g. yyyy-MM-dd  or for full ISO date yyyy-MM-ddTHH:mm:ssZ )

Conclusion

MongoDB is one of the most popular NoSQL / Document Storage Engine. SSIS PowerPack make super easy and efficient to read/write data in MongoDB without any coding effort. Download SSIS PowerPack to try out MongoDB features along with 70+ other components .

Search terms:
mongodb group by count
mongodb group by multiple fields
mongodb group by date
mongodb group example
mongodb group sort
mongodb group by month
mongodb group by having
mongodb group distinct
mongodb group by sum
mongodb aggregate
mongodb aggregate sort
mongodb aggregate unwind

 

Posted in SSIS MongoDB Source and tagged .