How to do REST API Pagination in SSIS / ODBC Drivers

Contents hide

Introduction

In our previous blog, we saw how to call REST in SSIS including concepts of Authentication / Error Handling. Now in this post, we will cover API Pagination. You will learn concepts and patterns about REST API Pagination (for JSON / XML / SOAP or CSV API).

We will describe methods and steps to implement REST API Pagination in SSIS PowerPack or  ODBC PowerPack API Drivers (for apps like Excel, Power BI, Informatica…) without coding. Both Products share a similar user interface with minor differences. For example purpose, we will show screenshots from SSIS PowerPack UI but concepts should remain same. You will also learn some techniques about how to implement an infinite loop in SSIS. Sometimes your REST API requires you to loop through all pages until no more data found then you can use pattern described in this article (sample package attached at the end).

In this article, we will use SSIS REST API Task and JSON Source Connector to process data from RESTful web service. If you have to process XML (or SOAP) data from REST API then simply use XML Source Connector for REST API, SOAP or File rather than JSON Source.

Debugging REST API Requests

Before we look at various aspects of REST API Pagination, we highly recommend you install FREE Web Debugging Tool Fiddler. Look at this article to understand how to debug REST API requests.

Method-1: REST API Pagination via URL Parameter

If you have paging requirement by specifying page number in URL then you can loop until no more pages found. Your REST API can indicate no more page found by any of the following ways.

Example URL:  http://myserver.com/v1/api/getcustomers/?page=1

If no more data found for specified page number then

  1. server may return an empty response
  2. server may return the response with specific error and response status code

JSON Source Connector comes with flexible paging options to handle this scenario. See below screenshot how to configure this.

Pagination Method 1 URL parameter mode

REST API Looping/Pagination via URL Page Number Parameter (Loop until last page detected)

URL with offset and limit

Sometimes you may have URLs with offset + limit parameters. as below. In this case, you can use the same technique as the above screenshot but just specify Page Num Indicator as offset and Increment by the same as your limit parameter (e.g. 1000 in this case). At runtime, your offset parameter will increment and generate URLs like below to fetch the next page.

If you don’t specify the offset parameter in the initial URL then offset=1 will be assumed as the default start. If your offset starts at 0 then refer to the next section for a workaround (i.e. Use URL Path Mode with StartAt variable = 0). Also if your increment for offset is other than the default page size then you must include the limit / pagesize parameter in the URL as below (e.g. limit=100 which is rows per page)

Example URLs: 
http://myserver.com/v1/api/getcustomers/?offset=1&limit=1000
http://myserver.com/v1/api/getcustomers/?offset=1001&limit=1000
http://myserver.com/v1/api/getcustomers/?offset=2001&limit=1000

Method-2: REST API Pagination via URL Path

Sometimes your URL may contain a page or other identifier using path format.

Example URLs: 
http://myserver.com/v1/api/getcustomers/1/
http://myserver.com/v1/api/getcustomers/2/
http://myserver.com/data/file1.json
http://myserver.com/data/file2.json

If no more data is found for the specified page number then

  1. the server may return a 404 error (page not found)
  2. server may return a response with a specific error and response status code

JSON Source Connector comes with flexible paging options to handle this scenario. See the below screenshot of how to configure this.

Pagination Method 2

REST API Looping/Pagination via URL Path (Loop until last page detected)

URL with Skip Parameter (Offset starts at zero)

Now let’s look at another common pattern just like the offset we discussed earlier but in the Skip pattern, it starts at zero rather than offset=1. In the previous mode, we didn’t have the option to supply the initial value to start the counter but if you use Path mode then you can start the counter at some custom value. See the below screen to show how we can setup pagination with the skip method with custom increment.

If your increment for offset is other than the default page size then you must include page size parameter in the URL as below (e.g. pagesize=100 which is rows per page. Some APIs name this parameter in different way e.g. count=1000 or limit=100. Refer to the API documentation to know exact names)

Example URLs

Pagination method 2 URL with Skip Parameter

REST API Pagination with Custom Skip / Increment Settings (Offset Starts from 0)

 

Method-3 : REST API Pagination via Response Attribute (Next Link or Cursor)

This method is not discussed in this article but click here to read more

There are mainly 3 ways you can get next link or cursor.

  1. Full Link (e.g. nextPage : “https://mysite.com/api/v1/1/?filter=AAAA” )
  2. Partial Link  (e.g. nextPage : “/api/v1/getcustomer/1/?filter=AAAA” )
  3. Cursor Continuation Token (e.g.   nextCursor : “xxxxAAAbbbbCccc112233==” )

Next Link Method (Partial or Full URL)

Example response with next URL would be

In this method you don’t specify page number in URL. If response include many records then you will get partial response along with indicator of next link.

For more information about this REST API Pagination please check below links. It explains in depth how to paginate REST API response with next page link found in response.

OData Paging using SSIS – REST API Paging Example

Read Twitter data in SSIS using REST API Task and JSON Source – OAuth2 Protocol

Method-3: Next Link Method (Partial or Full URL)

SSIS API Pagination – Next Link from Response Attribute

Next Link – Detect the last page based on empty rows (Regular Expression)

Now let’s look at a little bit different scenario where the Next Link is never empty even on the last page. This can cause an infinite loop if we configure it like the last section. So in such a case (see below example) when the last page still has the next link URL but the data row array is empty then use settings like the screenshot below.

First Page Example

Last Page Example

Example: Last Page

Basically, configure the following options

  • Set Stop indicator attribute to the same node which contains data rows (without array brackets). E.g. $.items
  • Set Stop indicator value / Regular Expression to something like []  assuming the last page has the same way. Check Fiddler Trace for exact text on the last page.
Method-3 Detect the last page based on empty rows

Last Page with non empty next link and empty data array (Detect Last Page based on zero row count)

Cursor Method

Many API returns cursor rather than link. Cursor method usually used to support a large number of records. Cursor is opened at server side with initial query information you submit and if cursor has more records you may get next cursor indicator in your response. In cursor method you have to supply Suffix for Next URL so after First URL your next URL will contain extra attribute which will include pagination information.

Suffix for next URL

Suffix for next URL can include special placeholders. There are two placeholders allowed. If you wish to append Next URL value in raw text then use <%nextlink%> but if your API needs value in encoded format e.g. A%20B rather than A+B then you have to use <%nextlink_encoded%>

Example URLs for Cursor:

Below is few sample URLs in the log when pagination occurs via Cursor method. As you see the first URL has Cursor=* means its first call, after first request server may send next cursor in the response body (see example)

First URL: http://api.crossref.org/members/311/works?filter=type:journal-article&rows=100&cursor=*
Second URL: http://api.crossref.org/members/311/works?filter=type:journal-article&rows=100&cursor=Aooo/MxxxxxxxxxxxtdA==
Third URL: http://api.crossref.org/members/311/works?filter=type:journal-article&rows=100&cursor=Booo/MxxxxxxxxxxxtdA==
…..
…..

Method-4: Pass Page Number in Body Attribute

Some API allows you to pass your PageSize and Page number inside Request Body (i.e. POST data) in that case you may configure your options like below (We used Google AdWords Api as an example)

Method-4 Pass Page Number in Body Attribute

Configure REST API Pagination – using Page Number via POST Body Method

Method-4: Pass Page Number in Body Attribute example

REST API Pagination – Passing PageNumber inside POST Body (Example: Google AdWords SOAP API)

Method-5: REST API Pagination via multiple URL parameters (e.g. Start and End)

In the previous section, you saw how to pass page numbers via URL (Single Parameter: http://abc.com/?page=1). However, sometimes you have a pagination requirement where you have to specify multiple parameters that need increment. See below example URL.

Method-5: REST API Pagination via multiple URL parameters (e.g. Start and End)

Pagination via Multiple URL Parameters

Method-6: REST API Pagination via Header (Next Link in Header)

Another popular pagination method is paginate via Header. In this method, each API response includes special Header which indicates next link for your response. In header method there are two approaches. [1] Custom Method [2] RFC 5988 Method. Lets look at both.

Next Link in Custom Header

In this method you have to specify which response header contains next URL. When no more NEXT url found in response reader stops.

Method 6 REST API Pagination via Headers

Custom Header Pagination Method

Next Link in Header as per RFC 5988

In this method, HTTP response contains a special header (Link) which contains next link for our response. If no link found to read more data then reader stops. This method is described well here in RFC 5988. Check this GitHub API Example to understand more on this pagination method.

Method 6 Next Link in Header as per RFC 5988

Link Header Pagination Method (RFC 5988)

Method-7: REST API Pagination using Cursor / Continuation Token from Response Header

There will be a time when your API sends some sort of Continuation Token in Response Header and you have to use that token to call next request. Some API needs this token passed as query string in Next URL and in some API you have to pass as request header. We will cover both cases in the following section.

Your API might call these tokens as “Cursor” too so we will use these both terms in the following section (They both mean same thing).

There are few ways these APIs pass Cursor to the next request

  1. Pass Cursor via Request Header with same header name
  2. Pass Cursor via Request Header with different header name (introduced in v4.2.3)
  3. Pass multiple Response Headers via Request Headers (Azure DocumentDB style usecase)
  4. Pass Cursor via URL Query String Parameter (introduced in v3.0 or later)

Lets see all approaches.

Pass Token via Request Header (Same header name)

Consider the API pagination use case in which you get some sort of continuation token in the response header and you take it and pass in the next request via request header (same header name).

Example:

First Request / Response:

Second Request / Response:

……

……

Last Request / Response:

For above pagination pattern  that just enter the response header name in Page Num Indicator field as below

Method 7 Pass Token via Request Header

Pagination using response header continuation token – Pass as Request Header

All possible formats for Page Num Indicator is below

x-page-token
x-token1, x-token2
RequestUrlQueryName=ResponseHeaderName
header::RequestHeaderName=ResponseHeaderName
RequestUrlQueryName=ResponseHeaderName(regular-expression)

Pass Token via Request Header (Different header name)

Consider the API pagination use case just like previous one but only different is request header name is not same as response header name. For example rather than x-token in request we need to call it x-next-token.

Example:

First Request / Response:

Second Request / Response:

……

……

Last Request / Response:

For above pagination pattern  that just enter the response header name in Page Num Indicator field as below

Method 7 Pass Token via Request Header

Pagination using response header continuation token – Pass as Request Header (Different name)

Pass multiple Response Headers via Request Headers (Azure DocumentDB style usecase)

Some API like Azure Cosmos DB uses Continuation Token via Header approach. In that case you can enter single header name or multiple headers like below (For multiple headers use comma)

Method 7 Pass multiple Response Headers via Request Headers

REST API Pagination using Continuation Token via Header (Azure CosmosDB / DocumentDB API Example )

You can use use tools like Fiddler to see paginated requests as below.

First Request

Method 7 fiddler example 1

View Paginated Requests in Fiddler (Azure CosmosDB API Example)

Second Request

Method 7 fiddler reuqest 2

Pass Next Page Token via Header (View Paginated Requests in Fiddler – Azure CosmosDB API Example)

Pass Token via URL Parameter (Query String)

In the new version we added a feature where you can actually read Continuation Token (i.e. Cursor) from Response Header and then pass to next request in the URL rather than Request Header. Here is the example API which uses such method.

First Request

Second Request 

Here is how to configure above pagination pattern

Method 7 Pass Token via URL Parameter (Query String)

Pass Response Header value to URL (Cursor)

Pass Token via URL Parameter (Query String) – with custom Regular expression (Usecase: Salesforce BULK query API)

Now consider previous scenario but with slightly different requirement. By default previous pagination pattern stops when specified response header is missing or response header value is blank. However is some cases header value contains custom string to indicate last page. For example x-token: NULL. In such case we need to use Page Num Indicator like below (must use v4.2.3 or higher in SSIS PowerPack / 1.5.2 or higher in ODBC PowerPack)

For example to stop pagination in below case you have to enter Page Num Indicator as below. It says stop pagination when  “NULL” word is detected

 

First Request

Second Request 

……

……

Last Request 

Here is how to configure above pagination pattern

Method 7 with custom Regular expression

Pass Response Header value to URL (Cursor)

Method-8: Pass Pagination Token in Request Body

In previous section (Method-4: Pass Page Number in Body Attribute ) we saw how to pass simple numeric page counter in Body. This is fine for some API but if your API has base64 style text token then counter based method doesn’t work. Also if you have requirement to hide page attribute node in first request and send page counter node in second request body (Changing Request Body Pattern for First vs Next)  then check next two sections.

Next two sections covers two API Pagination patterns to fetch 100 rows (1000 in each page) . Both requires to send token in Body but second section is more complex need (Different Body Attributes in first vs next requests )

Passing Page Token in Body (Blank Token in first request allowed)

Lets look at simple pattern where first request has blank token in Body but all other requests until we find last page we keep sending non-empty token. This approach works as long as API Provider doesn’t mind having Blank Token in first request. If you get error then check next section to handle more complex way.

Here is the example of API which needs token in body

First API call

Second API call

….. many more calls …

…. many more calls ….

Last API call (no more nextToken in response so STOP pagination)

So if you have such pattern then you can set few properties

For such issue you can use Changing few properties like below. Use Property Grid if you cant find on UI

  • Set EnablePageTokenForBody  = True
  • Set PagePlaceholders = header=|;filter=|;
  • Set HttpRequestData =  { searchFor: "Account", continueToken:"[$pagetoken$]" }
Method 8 Pass Passing Page Token in Body 1

Setting Body with [$pagetoken$] to implement dynamic pagination

Method 8 advanced pagination tab

Setting Page Placeholders in Advanced Pagination tab.

Passing Page Token in Body (Blank Token in first request)

Now let’s look at more complex pattern. Here our first request body is different than other requests.

First API call

Second API call

….. many more calls …

…. many more calls ….

Last API call (no more nextToken)

So notice few things

  • First Request Body / URL is different than second / third and Nth API call
  • For first request we do not have token attribute but after first request we do have token attribute

We have this article and this article which describes similar pattern.

Now lets look at how to solve both problems

Passing PageToken or Counter in Next API Call Body

For such issue you can use Changing few properties like below. Use Property Grid if you cant find on UI

  • Set EnablePageTokenForBody  = True
  • Set PagePlaceholders = header=|;filter=|;
  • Set HttpRequestData =  [$tag$]
  • Set HasDifferentNextPageInfo =  True
  • Set FirstPageBodyPart =  { searchFor: "Account"}
  • Set NextPageBodyPart =  { token:"[$pagetoken$]" }

So notice how we defined Body in two different properies as template. And in HttpRequestData we used [$tag$]

Output Page Number as column

Sometimes you have use case where you want to Start Page Number from the last page loaded successfully. By default, you will see one additional column __FileName in the output. This column contains PageNumber  from URL for each row. You can save this into database so when next time you run your SSIS you can start at MAX(PageNumber) + 1.

Outputting PageNumber option is not visible by default so Use Property Grid and look for OutputFileName property (Default=true)

Last Page Detection

Every API may have different way of telling you that you reached last page or no more data found so you should stop looping. SSIS PowerPack comes with many options so you can implement desired logic for last page detection.

Following approaches supported by last page detection. Below options are not applicable if you are using Method#3 (Response contains Next Link / Cursor)

  1. Detect the last page by row count
  2. Detect the last page by response status code
  3. Detect the last page by error message (substring search)
  4. Detect the last page by response size
  5. Detect the last page based on response value or regular expression pattern

Now lets look at each setting. You should choose correct one based on your API behavior

Detect the last page by row count

If your API returns blank document or blank Array (if its JSON) then you can use this method to detect the last page. So when no content found in Response document it will stop pagination.

See below example of JSON API .. It shows sample first page and Sample last page (Assuming you entered Filter as $.Records[*] to extract data from Records array.

First Page

Last Page

Method 8 Detect the last page by row count

SSIS API Pagination – Detect the last page by row count (Blank document or array)

Detect the last page by response status code

Sometimes you get error with status code when the last page is reached (Most common is 404). So rather than failure, you can treat that status code as last page indicator using this option (see below).

Method 8 Detect the last page by response status code

SSIS API Pagination – Detect last page by response status code

Detect the last page by the error message

If you cant use above two methods and want to detect the last page based on a certain error string in the response then use below approach.

Method 8 Detect the last page by the error message

SSIS API Pagination – Detect the last page by error/response string search

Detect the last page by response size

Sometimes you can detect the last page by response size. For example, your response contains “–” only if no more data is found then you cannot use any of the above approaches and you have to detect based on response size (e.g. less than 3 bytes). See below

Method 8 Detect the last page by response size

SSIS API Pagination – Detect last page by response size

Detect the last page based on the response attribute value

There will be a time when you want to detect the last page based on some indicator value from your response. In that case, you can use the following settings to detect the last page.

  • Select Paging mode =Response Attribute Mode
  • Select Next Link Attribute (Example: $.next_page )
  • Select Stop Indicator Attribute (Example: $.hasMorePages )
  • Enter a value for Stop Indicator (Example: false )

If you don’t want to use Hardcoded value for stop indicator and want to use expression then see next section.

Detect the last page based on regular expression pattern

There will be a time when you want to detect the last page based on some indicator value from your response. In that case, you can use the following settings to detect the last page.

  • Select Paging mode =Response Attribute Mode
  • Select Next Link Attribute (Example: $.next_page )
  • Select Stop Indicator Attribute (Example: $.count )
  • Enter regex pattern (regular expression) for Stop Indicator (Example: regex=^\d{1,3}$ ). Stop when count attribute is less than 1000 in other word less than 4 digits.

Here is an example from Zendesk API (Stop pagination when count attribute from the response is less than 1000)

Method 8 Detect the last page based on regular expression pattern

Pagination for Zendesk Incremental API – Next Link and Last Page detection using Stop Indicator Regular Expression

Handling API Rate Limit - Retry on too many API requests

Many API restricts you from calling too many API calls too soon (just to save their servers from being overwhelmed) - Example. Let's think this way if you fetching a million rows via paginated response and the very last page fails then you won't be happy because now you have to fetch everything again. To solve this ZappySys provides retry options on HTTP / OAuth and REST API connection managers.

Generally, when you exceed the rate limit, the server sends you error code 429 but it can be something else too or multiple codes are returned. In the below example, we use 503. Check your API documentation to know the exact code you should retry.

To set the retry options go to connection settings (e.g. HTTP / OAuth / REST connection). The below example is for OAuth connection.

To retry on multiple codes use Pipe as a delimiter. Example: 429|503

Retry Settings for API Rate Limit Exceeded Error (Status code 429)

Retry Settings for API Rate Limit Exceeded Error (Status code 429 or other)

Performance Tips

You can improve performance of your API calls by doing a few changes (again not every API may support this)

  • Use the maximum page size possible (e.g. if you doing pagination you can set max possible page size – refer to your API documentation whats the default size and what is the max page size you can supply)
    e.g.  /myapi?pagesize=100  (parameter name can be something else like limit=100)

 

SSIS Infinite Loop

So now you know how to paginate automatically in JSON Source Connector but what if you want to use REST API Task for pagination? Some custom scenarios like web scraping require you to download all HTML pages until no more results found? In that case, you most likely need infinite looping pattern (Loop until the last page detected). To loop through pages until the last page is reached you can perform the following logic.

The last page can be detected based on any of the following ways
1. Check Response StatusCode
2. Check Error Message
3. Check Response Size (If less than X bytes then consider as the last page)

REST API Task support #1 and #2 scenarios. If you want to check bytes then use the expression in Expression Task or Script Task
(e.g. @exitloop = LEN(@responseData) <= 3 )

URL for Rest API Task (On Request Settings Tab)
When you configure REST API Task enter URL as below so its dynamic. You can use PageCounter variable anywhere in the URL.

SSIS Infinite Loop part 1

SSIS REST API Looping – Pass Page number in URL, SSIS Infinite Loop

SSIS Infinite Loop part 2

Configure For Loop Task for Infinite Looping (Loop until last page found)

Download Sample SSIS Package

SSIS-Loop-Rest-API-2012.dtsx (Zip file)

Video Tutorial – Reading data from API in SSIS

In case you missed our previous article about calling REST API is SSIS then check below video.

 

Conclusion

It can be time consuming and steep learning curve to write C#/JAVA code to handle your rest api integration but using some Tasks and Connectors listed below can help you to solve big headache and provide clean drag and drop REST API integration in SSIS for virtually any REST API Source (e.g. Twitter, Facebook, Salesforce, office 365). Download SSIS PowerPack and try it out by yourself.

SSIS Components for REST API/JSON/XML/SOAP

Posted in REST Connector, SSIS Components, SSIS PowerPack and tagged , , , , , , , , .

One Comment

  1. Pingback: Calling SOAP Web Service in SSIS (XML Source) | ZappySys Blog

Comments are closed.