SSIS – Loading data into MongoDB (Upsert, Delete, Update)

Introduction

MongoDB is one of the most poplar NoSQL database out there. In this article you will learn how to use ZappySys MongoDB Destination Connector to perform various bulk operations on MongoDB such as Insert, Update, Delete and Upsert. If you wish to learn how to Update MongoDB Array items then check this blog post instread.

 

MongoDB data read/write Example SSIS Package

Before we talk more about loading data into MongoDB lets look at screenshot and steps involved in our sample package for this tutorial. Below sample SSIS package uses 4 Custom SSIS Tasks

  1. SSIS MongoDB Source Connector (Used to read data from MongoDB Collection)
  2. SSIS MongoDB Destination Connector (Used to write data into MongoDB Collection also you can do other bulk operations such as MongoDB update, MongoDB upsert and MongoDB delete)
  3. SSIS MongoDB ExecuteSQL Task (Used to call any MongoDB Shell commands including MongoDB DDL or DML type operations such as CREATE/DROP, Insert, Update, Delete, Read, you can also call server side JavaScript)
  4. Dummy Data Source (Used to generate sample JSON Documents which can be loaded in MongoDB)

MongoDB Examples

Here is the list of various MongoDB examples included in attached sample package (see at the end for download links)

  • MongoDB example – How to read data from MongoDB collection using SQL query language
  • MongoDB example – How to load raw JSON document into MongoDB Collection (Bulk Insert with Reload)
  • MongoDB example – How to do MongoDB Upsert (Bulk Update or Insert) for specific fields or entire document
  • MongoDB example – How to perform MongoDB Delete ( Bulk Delete based on Matching Rows)
  • MongoDB example – How to create new MongoDB collection (Only if not exist – Use safe option)
  • MongoDB example – How to fetch MongoDB collection record count and save to SSIS variable
  • MongoDB example – How to fetch MongoDB collection size and save to SSIS variable
  • MongoDB example – How to get collection names and loop through them using ForEach Loop Task
  • MongoDB example – How to save query output into Recordset variable and loop though records using ForEach Loop Task

Video Tutorial – Update/Delete/Write/Bulk Load data into MongoDB from SQL Server/ORACLE

In below video tutorial you will see how easy it is to load data into MongoDB from any Datasource such as SQL Server, Oracle, MySQL, Flatfile etc. You will also learn how to product nested JSON from multiple tables using JSON Generator Transform. You can also use JSON Source with Output as Document Mode to load raw JSON data.

Video Tutorial – Read data from MongoDB

In the following video tutorial you will learn how to consume data from MongoDB Collection using SSIS MongoDB Source Connector. You will notice that it uses SQL query syntax rather than MongoDB specific native query syntax (JSON query). This makes it super easy to query data without any learning curve. Data stored in MongoDB is in JSON document format but Data coming out from MongoDB Source Connector is Flat table structure (It de-normalize nested nodes). You can also Query inner Hierarchy using JSON Path expression (e.g. query Orders from Customer Document using $.Customer.Orders[*] ) .

SSIS Example - Loading data into MongoDB, Read from MongoDB, Upsert, Upsert, Delete Insert JSON Documents, Execute Shell Commands

SSIS Example – Loading data into MongoDB, Read from MongoDB, Upsert, Upsert, Delete Insert JSON Documents, Execute Shell Commands

MongoDB Insert – Bulk Loading data into MongoDB

Using MongoDB Destination Connector you can easily Insert JSON documents in Bulk. There are two input modes for inserting records into MongoDB.

Simple Mode – Loading data in simple mode (array not allowed)

In simple loading mode you can map source columns to target column in MongoDB destination. If you Insert data using this mode then you cant load complex documents with array.

Document Mode – Loading JSON documents into MongoDB

In this mode you can insert/update/delete documents by mapping inbuilt __DOCUMENT__ input column which appears on mapping tab (Target columns). When you perform Insert you have two options (Insert (i.e. Append) and Reload (Truncate + Insert). When you select Operation=Reload in that case Collection is first Truncated and then new records are loaded.

In JSON document load mode if you don’t supply _id column part of your input JSON then new _id is automatically generated by MongoDB for newly inserted document.

Loading JSON Documents into MongoDB Collection (Map Raw Document Column)

Loading JSON Documents into MongoDB Collection (Map Raw Document Column)

Loading JSON files into MongoDB

Another scenario could be loading JSON files into MongoDB. You can load JSON Files into MongoDB two ways.

  • Use Flat File Source with just one column with DT_NTEXT datatype (use this method when you each JSON documents in one line and they are separated by new line as below screenshot)
  • Use JSON Source Connector with Output as Document option checked. You can map __DOCUMENT__ source column to Target MongoDB collection as below. Advantage of this method is you extract JSON from array too and specify many other advanced options.
    SSIS Extract JSON Documents or Sub Documents from JSON file or JSON Array

    SSIS Extract JSON Documents or Sub Documents from JSON file or JSON Array

Specifying LoadOptions

MongoDB Destination Connector supports following LoadOptions property (see Component Properties Tab on UI) which controls how target data gets modified. LoadOptions property is ignored for Reload and Insert operations.

  • op : this option specify update operators (Valid for Update, Replace and Upsert operation (i.e. UpdateOrInsert) ).  See this link for more information on various Update operators available in MongoDB. Most common operators listed below. Refer official MongoDB help for more information.
    • op=none : Replace entire document
    • op=$set  : Update only those fields specified in mapping. If Specified mapping field not available in target then its added to target document.
    • op=$push : Adds new item to destination array. See following examples (How to insert single or multiple documents into MongoDB Array by supplying JSON into  __DOCUMENT__ column
  • multi : This option controls if you want to Update or Delete multiple records for matching condition for each upstream input record.  For example if you joining by Country and City columns for update and target system finds 5 records then by default they all will be updated if multi=true is set in LoadOptions property. If you don’t want to update multiple records

MongoDB Update

MongoDB Destination Connector supports Batch Update and Batch Upsert Operations. There are few things to remember how Update operation works with MongoDB Destination Connector.

  • To perform Update operation you have to specify Join Criteria for record lookup on target (see ColumnsForLookup property).
  • Specify JOIN columns in ColumnsForLookup property. Use comma to specify multiple columns (e.g. CustomerID,OrderID). If you dont specify columns in this property then any columns mapped other than __DOCUMENT__ will be considered as JOIN column.
  • With MongoDB destination you can only specify simple Update condition by mapping JOIN input columns. Behind the scene it will construct search query for update. For example if you Map 3 input columns (__DOCUMENT__ , Country, State) and you specify LoadOptions : op=$set;multi=true then its similar as below SQL Statement.
  • Join columns can be individually mapped or can be supplied as part of JSON document mapped to __DOCUMENT__ column.
  • If __DOCUMENT__ column is not mapped then you must specify at least one column name in ColumnsForLookup property to identify search column.
  • If __DOCUMENT__ column is mapped and column name is not specified in ColumnsForLookup property then you must map at least one more input column which can treated as JOIN column

Update specific fields  ($set operator)

To update specific fields in target MongoDB collection you can map JOIN column(s) and then map fields you want to update or map __DOCUMENT__ if you suppling JSON document with fields you want to update.

See below screenshot where its updating Customers record with new coordinates  based on City and Country information.

SSIS MongoDB Update Bulk - $set update operator - Update specific fields - Update single or multiple documents

SSIS MongoDB Update Bulk – $set update operator – Update specific fields – Update single or multiple documents

Update single or multiple target records for single input row

If you want to update all matching records in target then set LoadOptions property to op=$set,multi=true
If you want to update only one record per matching criteria in target then set LoadOptions property to op=$set,multi=false

Update entire document (i.e. Overwrite/Replace documents)

If you like to replace entire target document then you can do two ways. Few things you have to change for MongoDB Destination.

Approach#1 

  1. Select Operation=Replace (Rather than Update or Upsert)
  2. Specify Join column in ColumnsForLookup
  3. On the mapping tab Map __DOCUMENT__ (Assuming you have Raw JSON coming from Upstream)

Approach#2

Another approach is use $set operator in LoadOptions property then it will only update specified fields in mappings. But if you want to overwrite entire document (Replace document) then set LoadOptions property to op=none,multi=true or op=none,multi=false (If you want to update one record for matching condition)

Insert single item into MongoDB Array using $push operator

If you want to insert new items into array then set LoadOptions property to op=$push,multi=true or op=$push,multi=false (If you want to update one record for matching condition). You have to map __DOCUMENT__ column on target mappings tab to supply item(s) or document(s) you want to insert into target array.

Insert multiple items into MongoDB Array using $push operator along with $each

Inserting multiple items into array requires use of $each modifier as below. Use above instructions except input document (e.g. data passed to __DOCUMENT__ column) will look like below (see how $each used).

Other MongoDB Update Operators

So far we have seen $set and $push update operators. To learn more about other operators see this link

MongoDB Update Array Items ($pull, $push, $addToSet)

If you wish to update items found inside nested array of document then Check this full length article

MongoDB Upsert – Bulk Update or Insert JSON documents

If you wish to perform Bulk Upsert using MongoDB Destination connector then select Upsert Action from dropdown (Action=UpdateOrInsert). In Upset if document is not found for matching criteria then new document is inserted in MongoDB Collection. If document is found then Update operation occurs. To set specific fields use op=$set in LoadOptions   and if you wish to overwrite entire document then use $op=none in LoadOptions

SSIS MongoDB Upsert - Bulk Update or Insert JSON documents

SSIS MongoDB Upsert – Bulk Update or Insert JSON documents

MongoDB Delete – Bulk delete MongoDB JSON documents

To delete documents from collection you can set operation=Delete on Component properties page. For delete you can simply map _id column on target or just supply columns based on you want to search records you want to delete.

To delete maximum one record for each input row set LoadOptions property to op=none,multi=false or set op=none,multi=false if you want to delete more than one record for matching condition.

MongoDB Collection Reload – Remove All (Truncate collection)

To reload collection set operation to Reload on Component properties page. Reload operation first clears target collection and perform bulk Insert operation

Generate JSON for MongoDB data load

Now lets talk about different ways to produce simple or complex JSON using ZappySys components.  There are 3 ways you can produce JSON which can be loaded into MongoDB Collection. Following section will describe each approach and you can select which one best suited for you.

Generate JSON – Using Export JSON Task

Export JSON Task can export JSON to File or variable which you can consume via JSON Source Connector (Check Output as Document option) and supply to MongoDB Destination Connector (map to __DOCUMEN__).

Check this link for more info :  Export Json from sql server using ssis

Generate JSON – Using JSON Generator Transform

JSON generator transform is useful when you want to Produce JSON inside pipeline from single or multiple input.

Check this link for more info :  JSON generator Transform (Watch Video on this page)

JSON Generator Transform - Test JSON Path Settings

JSON Generator Transform – Generate JSON documents and load into MongoDB

Generate JSON – Using Dummy Data Source (Fake JSON for Testing)

If you need to produce JSON for testing purpose then use Dummy Data Source (Select JsonDocuments Template from dropdown and click ok)

Query/Read data from MongoDB

Visit following links to learn more about reading data from MongoDB.

SSIS MongoDB Source Homepage
MongoDB SQL Query Reference
MongoDB data extract – from nested sub documents and array

Execute MongoDB Shell commands in SSIS (e.g. Create/Drop collection)

MongoDB SOurce and Destination Connectors are great for read/write data. But what if you want to issue ad-hoc MongoDB shell command such as create collection/drop collection ? No worry.. SSIS PowerPack v2.0 released new Task called SSIS MongoDB ExecuteSQL Task This task comes with many examples listed on the UI itself (Look for Examples dropdown menu on UI). You can use SSIS MongoDB ExecuteSQL Task to perform many operations such as

  • Create/Drop MongoDB collections
  • Get collection list as ADO.net recordset so you can loop through using ForEachLoop task
  • Call serverside JavaScript
  • Fetch data as ADO.net recordset and store in SSIS variable so you can loop using ForEachLoop task
  • Drop database
  • Call many shell commands

Download MongoDB Example Files used in this tutorial

To test this package you first need to download SSIS PowerPack
Once SSIS PowerPack installed you can download following SSIS sample project file(s).

Download sample files – SSIS 2008
Download sample files – SSIS 2012
Download sample files – SSIS 2014
Download sample files – SSIS 2016

Conclusion

MongoDB makes it super easy to store JSON documents and gives you whole new way of manipulating documents with ease of use. ZappySys SSIS Connectors and Tasks for MongoDB makes it possible to implement various data integration scenarios with MongoDB. Click here to Download SSIS PowerPack and try it your self.

Keywords:
Loading data into MongoDB | mongodb upsert array | mongodb upsert example | mongodb insert or update | mongo insert or update | mongo upsert | mongo insert JSON | mongo upsert JSON | update data into MongoDB

Posted in SSIS MongoDB Destination and tagged , , , , , , , , , , , , , .