Understanding REST API Pagination in SSIS / ODBC Drivers

Contents

Introduction

In this blog post, you will learn concepts and patterns about REST API Pagination (JSON / XML / SOAP or CSV API). We will describe methods and steps to implement REST API Pagination in SSIS PowerPack or  ODBC PowerPack API Drivers without coding. Both Products share similar user interface with minor difference. For example purpose we will show screenshots from SSIS PowerPack UI but concepts should remain same. You will also learn some techniques about how to implement an infinite loop in SSIS. Sometimes your REST API requires you to loop through all pages until no more data found then you can use pattern described in this article (sample package attached at the end).

In this article, we will use SSIS REST API Task and JSON Source Connector to process data from RESTful web service. If you have to process XML (or SOAP) data from REST API then simply use XML Source Connector for REST API, SOAP or File rather than JSON Source.

Debugging REST API Requests

Before we look at various aspects of REST API Pagination, we highly recommend you install FREE Web Debugging Tool Fiddler. Look at this article to understand how to debug REST API requests.

Method-1: REST API Pagination via URL Parameter

If you have paging requirement by specifying page number in URL then you can loop until no more pages found. Your REST API can indicate no more page found by any of the following ways.

Example URL:  http://myserver.com/v1/api/getcustomers/?page=1

If no more data found for specified page number then

  1. server may return an empty response
  2. server may return the response with specific error and response status code

JSON Source Connector comes with flexible paging options to handle this scenario. See below screenshot how to configure this.

REST API Looping/Pagination via URL Page Number Parameter (Loop until last page detected)

REST API Looping/Pagination via URL Page Number Parameter (Loop until last page detected)

URL with offset and limit

Sometimes you may have URL with offset + limit parameters. as below. In this case you can use same technique as above screenshot but just specify Page Num Indicator as offset and Increment by same as your limit parameter (e.g. 1000 in this case). At runtime your offset parameter  will increment and generate URLs like below to fetch next page.

If you don’t specify offset parameter in the initial URL then offset=1 will be assumed as default start. If your offset starts at 0 zen check next section for work around (i.e. Use URL Path Mode with StartAt variable = 0).

Example URLs: 
http://myserver.com/v1/api/getcustomers/?offset=1&limit=1000
http://myserver.com/v1/api/getcustomers/?offset=1001&limit=1000
http://myserver.com/v1/api/getcustomers/?offset=2001&limit=1000

Method-2: REST API Pagination via URL Path

Sometimes your URL may contain page or other identifier using path format.

Example URLs: 
http://myserver.com/v1/api/getcustomers/1/
http://myserver.com/v1/api/getcustomers/2/
http://myserver.com/data/file1.json
http://myserver.com/data/file2.json

If no more data found for specified page number then

  1. server may return 404 error (page not found)
  2. server may return response with specific error and response status code

JSON Source Connector comes with flexible paging options to handle this scenario. See below screenshot how to configure this.

REST API Looping/Pagination via URL Path (Loop until last page detected)

REST API Looping/Pagination via URL Path (Loop until last page detected)

URL with Skip Parameter (Offset starts at zero)

Now let’s look at another common pattern just like offset we discussed earlier but in Skip pattern it starts at zero rather than offset=1. In the previous mode we didn’t have option to supply initial value to start counter but if you use Path mode then you can start counter at some custom value. See below screen show how we can setup pagination with skip method with custom increment.

Example URLs

REST API Pagination with Custom Skip / Increment Settings (Offset Starts from 0)

REST API Pagination with Custom Skip / Increment Settings (Offset Starts from 0)

 

Method-3 : REST API Pagination via Response Attribute (Next Link or Cursor)

This method is not discussed in this article but click here to read more

There are mainly 3 ways you can get next link or cursor.

  1. Full Link (e.g. nextPage : “https://mysite.com/api/v1/1/?filter=AAAA” )
  2. Partial Link  (e.g. nextPage : “/api/v1/getcustomer/1/?filter=AAAA” )
  3. Cursor Continuation Token (e.g.   nextCursor : “xxxxAAAbbbbCccc112233==” )

Next Link Method (Partial or Full URL)

Example response with next URL would be

In this method you don’t specify page number in URL. If response include many records then you will get partial response along with indicator of next link.

For more information about this REST API Pagination please check below links. It explains in depth how to paginate REST API response with next page link found in response.

OData Paging using SSIS – REST API Paging Example

Read Twitter data in SSIS using REST API Task and JSON Source – OAuth2 Protocol

SSIS API Pagination - Next Link from Response Attribute

SSIS API Pagination – Next Link from Response Attribute

Cursor Method

Many API returns cursor rather than link. Cursor method usually used to support a large number of records. Cursor is opened at server side with initial query information you submit and if cursor has more records you may get next cursor indicator in your response. In cursor method you have to supply Suffix for Next URL so after First URL your next URL will contain extra attribute which will include pagination information.

Suffix for next URL

Suffix for next URL can include special placeholders. There are two placeholders allowed. If you wish to append Next URL value in raw text then use <%nextlink%> but if your API needs value in encoded format e.g. A%20B rather than A+B then you have to use <%nextlink_encoded%>

Example URLs for Cursor:

Below is few sample URLs in the log when pagination occurs via Cursor method. As you see the first URL has Cursor=* means its first call, after first request server may send next cursor in the response body (see example)

First URL: http://api.crossref.org/members/311/works?filter=type:journal-article&rows=100&cursor=*
Second URL: http://api.crossref.org/members/311/works?filter=type:journal-article&rows=100&cursor=Aooo/MxxxxxxxxxxxtdA==
Third URL: http://api.crossref.org/members/311/works?filter=type:journal-article&rows=100&cursor=Booo/MxxxxxxxxxxxtdA==
…..
…..

Here is an example of Twitter API configured in Cursor Mode

Twitter REST API - Paging Example -Loop through resultset using cursor

Twitter REST API – Paging Example -Loop through resultset using the cursor

Method-4: Pass Page Number in Body Attribute

Some API allows you to pass your PageSize and Page number inside Request Body (i.e. POST data) in that case you may configure your options like below (We used Google AdWords Api as an example)

REST API Pagination - Passing PageNumber inside POST Body (Example: Google AdWords SOAP Api)

REST API Pagination – Passing PageNumber inside POST Body (Example: Google AdWords SOAP API)

 

Configure REST API Pagination - using Page Number via POST Body Method

Configure REST API Pagination – using Page Number via POST Body Method

Method-5: REST API Pagination via multiple URL parameters (e.g. Start and End)

In the previous section, you saw how to pass page number via URL (Single Parameter: http://abc.com/?page=1). However, sometimes you have pagination requirement where you have to specify multiple parameters which needs increment. See below example URL.

Pagination via Multiple URL Parameters

Pagination via Multiple URL Parameters

Method-6: REST API Pagination via Header (Next Link in Header)

Another popular pagination method is paginate via Header. In this method, each API response includes special Header which indicates next link for your response. In header method there are two approaches. [1] Custom Method [2] RFC 5988 Method. Lets look at both.

Next Link in Custom Header

In this method you have to specify which response header contains next URL. When no more NEXT url found in response reader stops.

Custom Header Pagination Method

Custom Header Pagination Method

Next Link in Header as per RFC 5988

In this method, HTTP response contains a special header (Link) which contains next link for our response. If no link found to read more data then reader stops. This method is described well here in RFC 5988. Check this GitHub API Example to understand more on this pagination method.

Link Header Pagination Method (RFC 5988)

Link Header Pagination Method (RFC 5988)

Method-7: REST API Pagination using Cursor / Continuation Token from Response Header

There will be a time when your API sends some sort of Continuation Token in Response Header and you have to use that token to call next request. Some times these token is referred as “Cursor” too so we will use these both terms in the following section (They both mean same thing).

There are two ways these APIs pass Cursor to the next request

  1. Pass Cursor via Request Header
  2. Pass Cursor via URL Query String Parameter (introduced in v3.0 or later)

Lets see both approaches.

Pass Token via Request Header

Some API like Azure Cosmos DB uses Continuation Token via Header approach. In that case you can enter single header name or multiple headers like below (For multiple headers use comma)

REST API Pagination using Continuation Token via Header (Azure CosmosDB / DocumentDB API Example )

REST API Pagination using Continuation Token via Header (Azure CosmosDB / DocumentDB API Example )

You can use use tools like Fiddler to see paginated requests as below.

First Request

View Paginated Requests in Fiddler (Azure CosmosDB API Example)

View Paginated Requests in Fiddler (Azure CosmosDB API Example)

Second Request

Pass Next Page Token via Header (View Paginated Requests in Fiddler - Azure CosmosDB API Example)

Pass Next Page Token via Header (View Paginated Requests in Fiddler – Azure CosmosDB API Example)

Passed Token via URL Parameter (Query String)

In the new version we added a feature where you can actually read Continuation Token (i.e. Cursor) from Response Header and then pass to next request in the URL rather than Request Header. Here is the example API which uses such method.

First Request

Second Request 

Here is how to configure above pagination pattern

Pass Response Header value to URL (Cursor)

Pass Response Header value to URL (Cursor)

 

Method-8: Pass Pagination Token in Request Body

In previous section (Method-4: Pass Page Number in Body Attribute ) we saw how to pass simple numeric page counter in Body. This is fine for some API but if your API has base64 style text token then counter based method doesn’t work. Also if you have requirement to hide page attribute node in first request and send page counter node in second request body (Changing Request Body Pattern for First vs Next)  then check next two sections.

Next two sections covers two API Pagination patterns to fetch 100 rows (1000 in each page) . Both requires to send token in Body but second section is more complex need (Different Body Attributes in first vs next requests )

Passing Page Token in Body (Blank Token in first request allowed)

Lets look at simple pattern where first request has blank token in Body but all other requests until we find last page we keep sending non-empty token. This approach works as long as API Provider doesn’t mind having Blank Token in first request. If you get error then check next section to handle more complex way.

Here is the example of API which needs token in body

First API call

Second API call

….. many more calls …

…. many more calls ….

Last API call (no more nextToken in response so STOP pagination)

So if you have such pattern then you can set few properties

For such issue you can use Changing few properties like below. Use Property Grid if you cant find on UI

  • Set EnablePageTokenForBody  = True
  • Set PagePlaceholders = header=|;filter=|;
  • Set HttpRequestData =  { searchFor: "Account", nextToken:"[$pagetoken$]" }

Passing Page Token in Body (Blank Token in first request)

Now let’s look at more complex pattern. Here our first request body is different than other requests.

 

First API call

Second API call

….. many more calls …

…. many more calls ….

Last API call (no more nextToken)

So notice few things

  • First Request Body / URL is different than second / third and Nth API call
  • For first request we do not have token attribute but after first request we do have token attribute

We have this article and this article which describes similar pattern.

Now lets look at how to solve both problems

Passing PageToken or Counter in Next API Call Body

For such issue you can use Changing few properties like below. Use Property Grid if you cant find on UI

  • Set EnablePageTokenForBody  = True
  • Set PagePlaceholders = header=|;filter=|;
  • Set HttpRequestData =  [$tag$]
  • Set HasDifferentNextPageInfo =  True
  • Set FirstPageBodyPart =  { searchFor: "Account"}
  • Set NextPageBodyPart =  { token:"[$pagetoken$]" }

 
So notice how we defined Body in two different properies as template. And in HttpRequestData we used [$tag$]

Output Page Number as column

Sometimes you have use case where you want to Start Page Number from the last page loaded successfully. By default, you will see one additional column __FileName in the output. This column contains PageNumber  from URL for each row. You can save this into database so when next time you run your SSIS you can start at MAX(PageNumber) + 1.

Outputting PageNumber option is not visible by default so Use Property Grid and look for OutputFileName property (Default=true)

Last Page Detection

Every API may have different way of telling you that you reached last page or no more data found so you should stop looping. SSIS PowerPack comes with many options so you can implement desired logic for last page detection.

Following approaches supported by last page detection. Below options are not applicable if you are using Method#3 (Response contains Next Link / Cursor)

  1. Detect the last page by row count
  2. Detect the last page by response status code
  3. Detect the last page by error message (substring search)
  4. Detect the last page by response size
  5. Detect the last page based on response value or regular expression pattern

Now lets look at each setting. You should choose correct one based on your API behavior

Detect the last page by row count

If your API returns blank document or blank Array (if its JSON) then you can use this method to detect the last page. So when no content found in Response document it will stop pagination.

See below example of JSON API .. It shows sample first page and Sample last page (Assuming you entered Filter as $.Records[*] to extract data from Records array.

First Page

Last Page

SSIS API Pagination - Detect last page by row count (Blank document or array)

SSIS API Pagination – Detect the last page by row count (Blank document or array)

Detect the last page by response status code

Sometimes you get error with status code when the last page is reached (Most common is 404). So rather than failure, you can treat that status code as last page indicator using this option (see below).

SSIS API Pagination - Detect last page by response status code

SSIS API Pagination – Detect last page by response status code

Detect the last page by the error message

If you cant use above two methods and want to detect the last page based on certain error string in the response then use below approach.

SSIS API Pagination - Detect last page by error / response string search

SSIS API Pagination – Detect the last page by error/response string search

Detect the last page by response size

Sometimes you can detect the last page by response size. For example, your response contains “–” only if no more data found then you cannot use any of the above approaches and you have to detect based on response size (e.g. less than 3 bytes). See below

 

SSIS API Pagination - Detect last page by response size

SSIS API Pagination – Detect last page by response size

Detect the last page based on the response attribute value

There will be a time when you want to detect the last page based on some indicator value from your response. In that case, you can use following settings to detect the last page.

  • Select Paging mode =Response Attribute Mode
  • Select Next Link Attribute (Example: $.next_page )
  • Select Stop Indicator Attribute (Example: $.hasMorePages )
  • Enter a value for Stop Indicator (Example: false )

If you don’t want to use Hardcoded value for stop indicator and want to use expression then see next section.

Detect the last page based on regular expression pattern

There will be a time when you want to detect the last page based on some indicator value from your response. In that case, you can use following settings to detect the last page.

  • Select Paging mode =Response Attribute Mode
  • Select Next Link Attribute (Example: $.next_page )
  • Select Stop Indicator Attribute (Example: $.count )
  • Enter regex pattern (regular expression) for Stop Indicator (Example: regex=^\d{1,3}$ ). Stop when count attribute is less than 1000 in other word less than 4 digits.

Here is an example from Zendesk API (Stop pagination when count attribute from the response is less than 1000)

Pagination for Zendesk Incremental API - Next Link and Last Page detection using Stop Indicator Regular Expression

Pagination for Zendesk Incremental API – Next Link and Last Page detection using Stop Indicator Regular Expression

 

SSIS Infinite Loop

So now you know how to paginate automatically in JSON Source Connector but what if you want to use REST API Task for pagination? Some custom scenarios like web scraping require you to download all HTML pages until no more results found? In that case, you most likely need infinite looping pattern (Loop until the last page detected). To loop through pages until the last page is reached you can perform the following logic.

The last page can be detected based on any of the following ways
1. Check Response StatusCode
2. Check Error Message
3. Check Response Size (If less than X bytes then consider as the last page)

REST API Task support #1 and #2 scenarios. If you want to check bytes then use the expression in Expression Task or Script Task
(e.g. @exitloop = LEN(@responseData) <= 3 )

URL for Rest API Task (On Request Settings Tab)
When you configure REST API Task enter URL as below so its dynamic. You can use PageCounter variable anywhere in the URL.

SSIS REST API Looping - Pass Page number in URL, SSIS Infinite Loop

SSIS REST API Looping – Pass Page number in URL, SSIS Infinite Loop

 

Configure ForEachLoop Task for Infinite Looping (Loop until last page found)

Configure For Loop Task for Infinite Looping (Loop until last page found)

Download Sample SSIS Package

SSIS-Loop-Rest-API-2012.dtsx (Zip file)

Conclusion

It can be time consuming and steep learning curve to write C#/JAVA code to handle your rest api integration but using some Tasks and Connectors listed below can help you to solve big headache and provide clean drag and drop REST API integration in SSIS for virtually any REST API Source (e.g. Twitter, Facebook, Salesforce, office 365). Download SSIS PowerPack and try it out by yourself.

SSIS Components for REST API/JSON/XML/SOAP

Posted in REST Connector, SSIS Components, SSIS PowerPack and tagged , , , , , , , , .

One Comment

  1. Pingback: Calling SOAP Web Service in SSIS (XML Source) | ZappySys Blog

Comments are closed.