How to query salesforce data using Bulk API in SSIS

Introduction

In our previous post, we saw how to read salesforce data using SSIS Salesforce Source (Which uses SOAP API under the hood). This may be slow if you have millions of rows because Salesforce Source paginates every 2000 rows making so its not good option to read millions of rows to iterate on a large salesforce dataset (Table or Query). To increase the read speed you can use BULK API. This article explains how to read salesforce data using BULK API and increase read speed.

Prerequisite

Prerequisites

Before we perform the steps listed in this article, you will need to make sure the following prerequisites are met:
  1. SSIS designer installed. Sometimes it is referred to as BIDS or SSDT (download it from the Microsoft site).
  2. Basic knowledge of SSIS package development using Microsoft SQL Server Integration Services.
  3. Make sure ZappySys SSIS PowerPack is installed (download it, if you haven't already).
  4. (Optional step). Read this article, if you are planning to deploy packages to a server and schedule their execution later.

Step-By-Step

In this section we will see how to query using Salesforce Bulk API in SSIS. It involves 3 simple steps like below.

  1. Call BULK API – Submit SOQL Query and receive a JobId back (Use SSIS Salesforce API Task)
  2. Keep checking Job Status Until Status is JobComplete (Use SSIS Salesforce API Task)
  3. Read Query Result  (Use SSIS CSV Source)
Query Salesforce Data using Bulk API in SSIS (read using SOQL query in Bulk mode / large dataset)

Query Salesforce Data using Bulk API in SSIS (read using SOQL query in Bulk mode / large dataset)

Now lets look at everything in detail.

Obtain Salesforce Credentials

First step to call Salesforce API is obtain credentials (security token or OAuth ClientId /Secret)

Method#1 – Check this article to use UserId/ Password / SecretToken
Method#2 – Check this article to use OAuth Connection Option

Create Salesforce Connection

Once we have Salesforce Credentials we can create a new Salesforce Connection (You can do from Task UI or below way)

  1. Open Visual Studio ( Assuming you already have SSIS installed along with ZappySys PowerPack )
  2. Create New SSIS Package Project.
  3. Right click in the Connection Manager panel and click on New Connection.
    SSIS Create New Connection
  4. Select ZS-SALESFORCE Connection Manager from the Connection Managers list and Click on Add Button.
    SSIS Salesforce Connection Manager - ADD
  5. Now in Connection Manager UI, Enter your SalesForce Credentials (OAuth Client Id/. Secret –or— User Name, Password and Security Token), and leave all other properties as it is.
    SSIS Salesforce Connection Manager UI

Call Create Job – Retrieve / Save JobId

Once connection is created we can create our first step. Basically we will call this API to create Job (Bulk Read)

  1. Here, In Visual Studio, drag and drop the ZS Salesforce API Task in the design surface.SSIS Salesforce API Task - Drag and Drop
  2. Double click the Task and configure Request Settings Tab like below
    – Select Salesforce Connection (Created in the previous section)
    – Check Use direct URL
    – Enter URL like below (Replace {your-instance} with your own instance (i.e. na9) . For more information on how to findout my
    salesforce instance id check this article
    – Request Method POST
    – Enter Body like below (You can use your own SOQL query). Keep in mind SQOL must be encoded for JSON format. Means entire SQL must be in one line (replace new lines with \r\n , quotes with \” and tabls with \t)
    For example to execute SELECT Id, Name FROM Accounts you can write something like below in BODY.
    Above is for below (with new line)
    Create Job - Call Bulk API to Read / Query using SOQL

    Create Job – Call Bulk API to Read / Query using SOQL

  3. Now Click Test Request (Copy the JobId which we will use in the Variable for Designtime Testing)
  4. On Response Tab configure like below.
    – Select Response content Type = Json
    – Enter $.id as expression
    – Check Save response option
    – Select Save to Variable and click New to create variable. Name it JobId and enter a sample JobId value (copied from previous step)
    Extract Salesforce JobId save to SSIS Variable

    Extract Salesforce JobId save to SSIS Variable

Do Status Check – Look for JobComplete

Once we create a job and store JobId in the variable we have to keep checking Job status every X seconds. Don’t worry you don’t have to use Loop or timer control for that. ZappySys made it so simple to check such thing without Loop /Timer.

 

  1. Drag and drop the ZS Salesforce API Task in the design surface.SSIS Salesforce API Task - Drag and Drop
  2. Double click and configure like below. URL will be different this time. Use variable as below and replace your instance ID
    Check Salesforce Job Status Periodically (Without Loop + Timer Controls)

    Check Salesforce Job Status Periodically (Without Loop + Timer Controls)

  3. Now go to Status Check Tab and configure like below. Basically it will keep looking for below words in the response to decide Success or Failure. If those words not found then it will keep checking every 5 seconds (Maximum 1200 secs).
    – Check Enable Status Check option
    – For Success Value – Enter JobComplete
    -Check Fail task option
    – Enter Cancel|Failed|Abort in the text box
    Configure Job Status Check Loop using Salesforce API Task

    Configure Job Status Check Loop using Salesforce API Task

Read data (Compressed CSV Format)

Last step is read the result from completed Salesforce Job. Its in CSV format so we will use Advanced SSIS CSV Source which supports API connection to Salesforce. For that we will use same Salesforce API Task used before.

  1. Drag and Drop SSIS Data Flow Task from SSIS Toolbox.
    SSIS Data Flow Task - Drag and Drop
  2. Double click on the DataFlow task to see DataFlow designer surface.
  3. Here, In Visual Studio, drag and drop the ZS CSV Source (Web API or File) in the design panel
    SSIS CSV Source (Web API or File) - Drag and Drop
  4. Double click on ZS CSV Source (Web API or File) Configure it like below.
    – Enter URL to read result like this (You can supply OPTIONAL maxRecords parameters to control how much data comes in a single response)
    – Check Use Credentials and select Salesforce Connection
    – Click Preview (assuming JobId variable has a valid completed JobId)
    Read Salesforce Bulk API Job Result - Using SSIS CSV / API Source

    Read Salesforce Bulk API Job Result – Using SSIS CSV / API Source

  5. Now Setup Pagination as below to read all records using automated loop. You will need VERSION Released after Jun 20, 2022 to use Regular Expression feature in Header Value extractionBy default, Salesforce doesn’t send all records in response so we need to setup pagination. In below example we are reading Sforce-Locator attribute from Response and sending it to the next URL as locator=<continue-token> value
    – Choose Response Header Contains Continue Token 
    – Enter below expression in Page Num Indicator
    Basically above expression will keep looping until we find Sforce-Locator : null in the response header.
    Pagination for Salesforce Bulk Query API (Get Header Value from Sforce-Locator)

    Pagination for Salesforce Bulk Query API (Get Header Value from Sforce-Locator)

  6. Click OK to save and now run entire package like below.
Query Salesforce Data using Bulk API in SSIS (read using SOQL query in Bulk mode / large dataset)

Query Salesforce Data using Bulk API in SSIS (read using SOQL query in Bulk mode / large dataset)

 

Conclusion

In this article we saw how to use various ZappySys Components / Task to achieve complex Salesforce API calls to read Query data using Bulk API. Download SSIS PowerPack and try yourself and also explore 70+ other tasks /components not discussed in this article.

 

 

Posted in SSIS CSV Source, SSIS Salesforce API Task, SSIS Salesforce Connection and tagged , , , , .