Contents
Introduction
In our previous post, we saw how to read salesforce data using SSIS Salesforce Source (Which uses SOAP API under the hood). This may be slow if you have millions of rows because Salesforce Source paginates every 2000 rows making so its not good option to read millions of rows to iterate on a large salesforce dataset (Table or Query). To increase the read speed you can use BULK API. This article explains how to read salesforce data using BULK API and increase read speed.
Prerequisite
Prerequisites
Before we perform steps listed in this article, you will need to make sure following prerequisites are met:- SSIS designer installed. Sometimes it is referred as BIDS or SSDT (download it from Microsoft site).
- Basic knowledge of SSIS package development using Microsoft SQL Server Integration Services.
- Make sure ZappySys SSIS PowerPack is installed (download it).
- Optional (If you want to Deploy and Schedule ) - Deploy and Schedule SSIS Packages
Step-By-Step
In this section we will see how to query using Salesforce Bulk API in SSIS. It involves 3 simple steps like below.
- Call BULK API – Submit SOQL Query and receive a JobId back (Use SSIS Salesforce API Task)
1https://{your-instance}.salesforce.com/services/data/v50.0/jobs/query - Keep checking Job Status Until Status is JobComplete (Use SSIS Salesforce API Task)
1https://{your-instance}.salesforce.com/services/data/v50.0/jobs/query/{{User::JobId}} - Read Query Result (Use SSIS CSV Source)
1https://{your-instance}.salesforce.com/services/data/v50.0/jobs/query/{{User::JobId}}/results
Now lets look at everything in detail.
Obtain Salesforce Credentials
First step to call Salesforce API is obtain credentials (security token or OAuth ClientId /Secret)
Method#1 – Check this article to use UserId/ Password / SecretToken
Method#2 – Check this article to use OAuth Connection Option
Create Salesforce Connection
Once we have Salesforce Credentials we can create a new Salesforce Connection (You can do from Task UI or below way)
- Open Visual Studio ( Assuming you already have SSIS installed along with ZappySys PowerPack )
- Create New SSIS Package Project.
- Right click in the Connection Manager panel and click on New Connection.
- Select ZS-SALESFORCE Connection Manager from the Connection Managers list and Click on Add Button.
- Now in Connection Manager UI, Enter your SalesForce Credentials (OAuth Client Id/. Secret –or— User Name, Password and Security Token), and leave all other properties as it is.
Call Create Job – Retrieve / Save JobId
Once connection is created we can create our first step. Basically we will call this API to create Job (Bulk Read)
- Here, In Visual Studio, drag and drop the ZS Salesforce API Task in the design surface.
- Double click the Task and configure Request Settings Tab like below
– Select Salesforce Connection (Created in the previous section)
– Check Use direct URL
– Enter URL like below (Replace {your-instance} with your own instance (i.e. na9) . For more information on how to findout my
salesforce instance id check this article
1https://{your-instance}.salesforce.com/services/data/v50.0/jobs/query
– Enter Body like below (You can use your own SOQL query). Keep in mind SQOL must be encoded for JSON format. Means entire SQL must be in one line (replace new lines with \r\n , quotes with \” and tabls with \t)
For example to execute SELECT Id, Name FROM Accounts you can write something like below in BODY.
1234{"operation": "query","query": "SELECT Id, Name\r\nFROM Account"}
12SELECT Id, NameFROM Account - Now Click Test Request (Copy the JobId which we will use in the Variable for Designtime Testing)
- On Response Tab configure like below.
– Select Response content Type = Json
– Enter $.id as expression
– Check Save response option
– Select Save to Variable and click New to create variable. Name it JobId and enter a sample JobId value (copied from previous step)
Do Status Check – Look for JobComplete
Once we create a job and store JobId in the variable we have to keep checking Job status every X seconds. Don’t worry you don’t have to use Loop or timer control for that. ZappySys made it so simple to check such thing without Loop /Timer.
- Drag and drop the ZS Salesforce API Task in the design surface.
- Double click and configure like below. URL will be different this time. Use variable as below and replace your instance ID
1https://{your-instance}.salesforce.com/services/data/v50.0/jobs/query/{{User::JobId}} - Now go to Status Check Tab and configure like below. Basically it will keep looking for below words in the response to decide Success or Failure. If those words not found then it will keep checking every 5 seconds (Maximum 1200 secs).
– Check Enable Status Check option
– For Success Value – Enter JobComplete
-Check Fail task option
– Enter Cancel|Failed|Abort in the text box
Read data (Compressed CSV Format)
Last step is read the result from completed Salesforce Job. Its in CSV format so we will use Advanced SSIS CSV Source which supports API connection to Salesforce. For that we will use same Salesforce API Task used before.
- Drag and Drop SSIS Data Flow Task from SSIS Toolbox.
- Double click on the DataFlow task to see DataFlow designer surface.
- Here, In Visual Studio, drag and drop the ZS CSV Source (Web API or File) in the design panel
- Double click on ZS CSV Source (Web API or File) Configure it like below.
– Enter URL to read result like this (You can supply OPTIONAL maxRecords parameters to control how much data comes in a single response)
123https://{your-instance}.salesforce.com/services/data/v50.0/jobs/query/{{User::JobId}}/results--OR--https://{your-instance}.salesforce.com/services/data/v50.0/jobs/query/{{User::JobId}}/results?maxRecords=50000
– Click Preview (assuming JobId variable has a valid completed JobId) - Now Setup Pagination as below to read all records using automated loop. You will need VERSION Released after Jun 20, 2022 to use Regular Expression feature in Header Value extractionBy default, Salesforce doesn’t send all records in response so we need to setup pagination. In below example we are reading Sforce-Locator attribute from Response and sending it to the next URL as locator=<continue-token> value
– Choose Response Header Contains Continue Token
– Enter below expression in Page Num Indicator
1locator=Sforce-Locator(^((?!null\b).)*$) - Click OK to save and now run entire package like below.
Conclusion
In this article we saw how to use various ZappySys Components / Task to achieve complex Salesforce API calls to read Query data using Bulk API. Download SSIS PowerPack and try yourself and also explore 70+ other tasks /components not discussed in this article.