Introduction
In our previous blog post we saw how to perform Read and Write operations in MongoDB using SSIS (i.e. Bulk Update, Delete, Upsert, Insert). In this post we specifically focus on how to update MongoDB Array items / elements using SSIS.
To make things simple to follow we have used JSON Source to produce sample JSON documents for MongoDB update (NOTE: Output as Document option is checked for JSON Source). However in real world you may have to use SSIS JSON Generator Transform to produce input documents for MongoDB. To learn more about generating JSON documents for MongoDB Load process check this blog post or watch video found on this page.
Video Tutorial – Example of insert/update documents inside MongoDB Array (Use $set, $push operator)
Basic Concepts: Performing MongoDB CRUD operations using SSIS
If you never heard term called CRUD stands for Create, Read, Update, Delete. These are the most common operations you have to do with any data source. SSIS PowerPack comes with many MongoDB Components to help you with CRUD operations. You can use one or more following components to achieve drag and drop MongoDB CRUD operations in SSIS. When you use SSIS PowerPack for MongoDB all operations will be Bulk Operations by default which means it will provide very high throughput.
Download Example SSIS Package
Here is the link to download Sample SSIS Package
Update MongoDB Array Items ($set operator)
In next couple of sections we will see how to perform Update, Insert and Delete operations with MongoDB Array Items using various update operators and array operators. Consider the following Sample JSON documents for MongoDB. First we will see how to use MongoDB Shell commands to perform CRUD operations on MongoDB Array items.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
{ "_id" : ObjectId("5883ebe669e0f22ba890b49b"), "CustomerID" : "BOLID", "Name" : "Some name", "Orders" : [ { "OrderID" : 200, "ShipCountry" : "USA", "City" : "Atlanta" }, { "OrderID" : 201, "ShipCountry" : "USA", "City" : "New York" } ] } |
Update Array element using Shell command
Assume you want to update ShipCountry attribute of one of the Orders (OrderID=200) and for CustomerID=”BOLID” in above sample document.
Shell Command for Update Array Item
Use below shell command in your favorite MongoDB Client Tool (e.g. RoboMongo). Notice two things in below command because we will apply same concepts when we take this approach to SSIS. In below command Actual JSON document is wrapped inside. This is MongoDB Update operator. Click here to learn more about $set. Also notice that rather than nested Array syntax we used Orders.$.ShipCountry to update ShipCountry which is part of Array element. MongoDB provides special way to navigate array elements using $ Operator (MongoDB Positional Update)
1 2 3 4 5 |
db.MyCollection.update( { "CustomerID" : "BOLID", "Orders.OrderID" : 200 }, { $set : { "Orders.$.ShipCountry" : "Test1"} } , { upsert: false, multi: false} ) |
After Update (MongoDB Document)
After you run above shell command notice all other attributes remain there in Array document and only ShipCountry is changed. This is because we used $ Operator (MongoDB Positional Update) rather than supplying Raw JSON for Array document (i.e. Orders : [ { … } ] ).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
{ "_id" : ObjectId("5883ebe669e0f22ba890b49b"), "CustomerID" : "BOLID", "Name" : "Some name", "Orders" : [ { "OrderID" : 200, "ShipCountry" : "Test1", "City" : "Atlanta" }, { "OrderID" : 201, "ShipCountry" : "USA", "City" : "New York" } ] } |
The issue with this approach its not Bulk operation. Assume that you have 100,000 records to update in various documents and its nested array items then row by row operation is not effective. To solve this issue we will use SSIS MongoDB Destination which performs Bulk operations (e.g. Bulk Update, Delete, Insert and Upsert).
Update MongoDB Array Items using SSIS
Now lets implement above scenario using SSIS. Assume that you have sample MongoDB document (as per above example – Before Update). We want to update ShipCountry for OrderID=200 for Customer BOLID. To do that perform following steps in SSIS. If you are updating Second level array (i.e. Root >> Orders >> OrderItems) then you have follow slight different process. Check next section in this article about updating second level array (Insert items)
- Download and Install SSIS PowerPack
- Create new SSIS Project with one data flow task
- Drag ZS JSON Source and check Output as Document option (as below sceenshot). You can use JSON Source is your JSON is stored inside file or Variable or coming from Some API call which eventually go into MongoDB. You can also use JSON Source for quick testing where you Hardcode JSON to feed to MongoDB.
- Enter the following JSON in the text area. Click OK to save.
1234{"CustomerID":"BOLID", "Orders.$.OrderID":201, "Orders.$.ShipCountry": "Test1"}{"CustomerID":"BOLID", "Orders.$.OrderID":202, "Orders.$.ShipCountry": "Test2"}/**Below is extra record not found in destination - will be ignored in Update**/{"CustomerID":"BOLID", "Orders.$.OrderID":203, "Orders.$.ShipCountry": "Test3"} - Right click in Connections area and Create new ZS-MONGODB connection. Specify credentials and click Test before you hit OK to save.
- Now drag ZS MongoDB Destination from SSIS Toolbox.
- Select Runtime connection.
- On the properties tab edit following properties
- Set ColumnsForLookup property as below (Notice how we use 2 columns for JOIN. Also used AS keywords for 2nd Column so we have <column-from-input-doc> AS <column-for-server-side-find>
CustomerID,Orders.$.OrderID AS Orders.OrderID - Set LoadOptions as below (Op is Update Operator you like to use. In our case its $set, Multi=True means if multiple match found then all matching documents will be updated). If you want to use multiple update operators then you have to supply that from Input document and use op=none option rather op=$set.
op=$set;multi=true - Set Operation property to Update
- Set ColumnsForLookup property as below (Notice how we use 2 columns for JOIN. Also used AS keywords for 2nd Column so we have <column-from-input-doc> AS <column-for-server-side-find>
- On the Mappings tab Map __DOCUMENT__ from upstream (In our case you will see same name column from JSON Source if you checked Output as Document option)
- Click OK to save MongoDB Destination UI
- Run package. You will notice after execution two orders of CustomerID=BOLID will be updated.
Insert MongoDB Array Item ($push / $addToSet Operator)
Now lets look at how to insert new item into existing Array. Lets say we already have two orders for a customer and now we want to add new order. For this you can either use $push operator or $addToSet operator. $push operator doesn’t check for duplicate item so its little faster. If you wish to skip append if item already exists (by matching all attributes) then use $addToSet operator.
Using Shell Command : Insert MongoDB Array Item (Append to array)
Here is the shell command which will push new item to the array
1 2 3 4 5 |
db.MyCollection.update( {CustomerID:"BOLID"}, {$push:{Orders: {"OrderID":202,"ShipCountry":"USA","ShipCity":"Atlanta" } }}, {upsert: false,multi: false} ) |
Using Shell Command : Insert MongoDB Array Item at second level
Here is the shell command which will push new item to the array which is stored at second level e.g. { Orders : [ { Items : [ insert-here ] } ] }
1 2 3 4 5 |
db.MyCollection.update( {CustomerID:"BOLID", "Orders.OrderID" : 202 }, {$push:{ "Orders.$.Items" : {"ProductID":1001,"Quantity":3} } }, {upsert: false,multi: false} ) |
Using SSIS : Insert MongoDB Array Item at Second Level
Now lets look at some example how to Insert Items inside Orders array. Since it requires two JOIN columns to perform second level array insert you may need to add metadata for join.
First step is create documents for update. There are atleast 4 ways you can create input documents for MongoDB Destination.
- You can use JSON generator Transform to create documents for load process
- You can use Export JSON Task to generate input documents file and then use JSON Source to Read Documents (Check Output as Document option on JSON Source to read as RAW JSON)
- You can use JSON Source to feed direct JSON (This approach is demonstrated in the below example sceenshot)
- You can also use SQL query to output JSON Strings along with columns used for JOIN condition
1select CustomerID, OrderID as 'Orders.$.OrderID', DOC = '{Orders.$.Items : {ItemID: 1, Qty: 5} }'
To make it simple lets use #3 approach from above list. We will create few documents and insert New orders Items
Sample Document for MongoDB Collection
For testing purpose you can create MongoDB collection called test with following one document. As you see Items array is empty for both orders but once you run sample each will have 2 items.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
{ "_id" : ObjectId("58c01bee0610fea8ddda04b9"), "CustomerID" : "ALFKI", "Orders" : [ { "OrderID" : 100, "Items" : [ ] }, { "OrderID" : 101, "Items" : [ ] } ] } |
Now lets look at how to configure SSIS package
Step-1 : Create JSON Documents for Array Insert
Lets use JSON Source like below. Use following Sample JSON
1 2 3 4 5 6 7 |
/*Insert new items into order#100*/ {CustomerID:"ALFKI","Orders.$.OrderID" : 100, Doc: "{ \"Orders.$.Items\" : {ItemID: 1, Qty: 10} }" } {CustomerID:"ALFKI","Orders.$.OrderID" : 100, Doc: "{ \"Orders.$.Items\" : {ItemID: 2, Qty: 11} }" } /*Insert new items into order#101*/ {CustomerID:"ALFKI","Orders.$.OrderID" : 101, Doc: "{ \"Orders.$.Items\" : {ItemID: 1, Qty: 20} }" } {CustomerID:"ALFKI","Orders.$.OrderID" : 101, Doc: "{ \"Orders.$.Items\" : {ItemID: 2, Qty: 21} }" } |
Step-2 : Configure MongoDB Destination for $push operation on Second level Array
Now lets connect source to Destination and configure MongoDB Destination. Select connection on first tab. Then On second tab change properties like below.
Step-3 : Add missing JOIN column for Mapping
When you click on Mappings tab you will notice only root level columns are listed there for mapping but if you are JOINing based on Nested column found inside array then you have to add missing mapping column by hand like below screenshot. In our case we Join by CustomerID and OrderID where OrderID is found in Orders array which is missing on mappings tab. So lets add it. Just make sure name new column using $ positional operator… e.g. Orders.$.OrderID this will tell MongoDB that our OrderID column must be lookup under Array called Orders: [ ]
Step-4 : Map JOIN columns and Input DOCUMENT
Now finally map all columns
Using SSIS : Insert MongoDB Array Item (Append to array)
Now lets look at how to perform same operation to insert array item in SSIS using $push operator. In this case we will do little different than Update Item example ($set operator). First we have not included CustomerID as part of Input columns and we mapped it on Mapping Screen rather than supplying columns names in ColumnsForLookup property (On Component Properties Tab of Destination UI). If you are using JSON Generator Transform to generate your JSON then you have option to Hide/Include certain columns part of your flow. Check Include column to downstream option if you want to use for JOIN column.
Set following settings
- Set LoadOptions as below
op=$push;multi=true - Set Operation=Update
- Leave ColumnsForLookup as empty
- On the mappings tab map __DOCUMENT__ and CustomerID in the Target. If you don’t see CustomerID column in the mapping create atleast one sample doc in MongoDB Collection so metadata is detected.
Download Sample for more detail.
Delete MongoDB Array Item ($pull operator)
If you wish to remove array item then you can use $pull operator. Only difference is rather than specifying two conditions in first argument… you just specify JOIN columns without array column (i.e. OrderID). To specify Array item condition you have to set inside $pull : { … }
Using Shell Command for MongoDB Delete Array Item (Remove by condition)
Below example shows how to remove OrderID=201 from Orders array for CustomerID=BOLID.
1 2 3 4 5 |
db.MyCollection.update( {CustomerID:"BOLID"}, {$pull:{Orders: {"OrderID":201} }}, {upsert: false,multi: false} ) |
Using SSIS for MongoDB Delete Array Item
Now lets look at how to implement Delete logic for MongoDB Array Item in SSIS. As you see in below screenshots You have to modify following things
- Set ColumnsForLookup property to CustomerID (no need to add OrderID here because its part of input document)
- Set LoadOptions property to op=none;multi=false. op=none means we are supplying Update operator part of input document.
Assume you have doc like below in MongoDB and you like to remove Order#101 from array
1 2 3 4 5 6 7 8 9 |
{ "_id" : ObjectId("58c01bee0610fea8ddda04b9"), "ParentID" : 1, "Orders" : [ 100, 101, 102 ] } |
For this supply 2 columns from upstream
col1 = ParentID
col2 = DOCUMENT
Document column should look like below
{Orders: 101}
- On MongoDB destination change LoadOptions property to op=$pull;multi=true
- Change Operation to Update
- Map ParentID and DOCUMENT columns on Mapping screen
- Run the package … It will remove Order# 101 from array
Array Upsert – Update / Insert using Custom JOIN condition (such as $ne )
If you like to perform Upsert Operation on MongoDB Array Items then refer to below blog post.
Conclusion
MongoDB integration can be challenging if you are new to NoSQL world. If you are using SSIS as your primary ETL tool then not to worry because SSIS PowerPack can give super power needed to complete your project on time with drag and drop high performance connectors.
Keywords: mongodb update array item | mongodb update array elements | mongodb update array documents | MongoDB $ operator | MongoDB update sub document items | MongoDB CRUD operations | MongoDB Bulk Updates | MongoDB bulk updates to array items