How to read / write Google SpreadSheet using SSIS

Introduction

UPDATE: ZappySys has released a brand new API Connector for Google Sheets Online which makes it much simpler to Read/Write Google Sheets Data in SSIS compared to the steps listed in this article. You can still use steps from this article but if you are new to API or want to avoid learning curve with API then use newer approach.

Please visit this page to see all Pre-Configured ready to use API connectors which you can use in SSIS API Source / SSIS API Destination OR API ODBC Driver (for non-SSIS Apps such as Excel, Power BI, Informatica).

In this post you will learn how to get data from Google SpreadSheet (REST API) using SSIS. We will use drag and drop REST API connectors from SSIS PowerPack. No need to download any SDK or learn programming language (e.g. JAVA, C#, Ruby, Python) when you use SSIS PowerPack Connectors. We will use Google Drive API and Google Sheets API to get file list and export SpreadSheet as CSV file in few clicks (Using OAuth 2.0 connection in SSIS).

In this tutorial we will use REST API Task to call some ad-hoc API (e.g. get File List from Google Drive) and save output into Variable or File. We will use OAuth connection along with ZappySys JSON Source connector to read data from Google SpreadSheet (Use Drive API) and load into SQL Server (Export Google SpreadSheet to CSV).

Create Google API Project

First step to access any Google API is create an API Project in Google Console. If you don’t want to go through this then Skip this Step-1 and in the next section select Default OAuth App option on OAth Connection Manager (This is the most easiest option for now unless you want to use your own OAuth App).

Check this article for step-by-step instructions. When you follow these instructions make sure you enable Google Drive API (In the article it shows how to enable YouTube API as an example but you will need to enable Drive API for this article).

Create OAuth Connection Manager in SSIS

Once you create Google API project and obtained Client ID and Client Secret your next step is to create OAuth Connection Manager in SSIS. ZappySys OAuth connection manager comes with many predefined OAuth Providers (e.g. Facebook, Twitter, Google etc) but you can also define custom OAuth settings for any OAuth enabled API.

To create SSIS OAuth 2.0 Connection for Google API perform following steps.

  1. Download and Install SSIS PowerPack
  2. Create new SSIS Package
  3. Right click in Connection Manager Area and Click “New Connection”
  4. When prompted select ZS-OAUTH connection type
  5. On the OAuth Connection Manager Select Provider=Google Sheets + Drive.
  6. You can keep “Use Default App” selected or choose Custom App. Click here to learn – how to register Custom Google App
  7. If you choose the Custom App option then enter the below scopes (or click Select Scopes). If you choose the default app then it’s not required.

    This will allow read / write access to Drive Files or Sheet (E.g. Reading file content / Export file to CSV). You can use Scope browser to see many other available permissions.

    Click Generate Token button.

    SSIS OAuth Connection Manager - Access Google Drive API using OAuth 2.0

    SSIS OAuth Connection Manager – Access Google Drive API using OAuth 2.0

  8. You will see UI as below… Click Accept (You may have to scroll to see that button sometimes)
  9. Click Test to see connection is working.

NOTE: If you don’t want to use Default OAuth App provided by ZappySys then select “Use Custom OAuth App” option (App created in previous section) specify your ClientID, ClientSecret.

Find Google Sheet ID for API call

Before we can read or write Sheet data in SSIS we need to know Google Driver File ID. We will use this ID in next few sections.

Easiest way to find File ID is look at the URL like below. If you need to know ID at runtime dynamically then refer last section.

 

Get Google Drive File ID for API call (Sheet ID)

Get Google Drive File ID for API call (Sheet ID)

Reading Google SpreadSheet Data in SSIS

Basically there are two ways you can export / read Google SpreadSheet data.

  • Method-1 : Read Google SpreadSheet using SSIS JSON / REST API Source
  • Method-2 : Read Google SpreadSheet using Flat File Source (as CSV)
    • Another approach is export SpreadSheet as CSV format. Then use Flat File Source to read that data and load into Target (E.g. SQL Server)

Method-1 : Read Google SpreadSheet using SSIS JSON / REST API Source

Now lets look at real steps to read Google SpreadSheet. To get values from specific tab and specific cell-row range you have to use Google SpreadSheet.Values API

In this example we will load below Google Spreadsheet into SQL Server

Step-By-Step : Loading Google SpreadSheet into SQL Server using SSIS

  1. Drag and drop data flow on the control flow designer surface
  2. Double click data flow and you will see SSIS Toolbox refreshed with new components
  3. Drag ZS JSON Source from SSIS Toolbox onto Data flow designer surface
  4. Double click JSON SOurce and configure below setting
    1. Enter your Spreadsheet URL in the following format
      For example purpose we will use this public SpreadSheet provided by Google. If spreadsheet fileid is 1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms (see previous section how to get fileid) and your tab name is Class Data and you like to extract data from A to F columns starting from 2nd row then your Cell range in the URL can be ‘Class Data’!A2:F
      Here is the actual URL we will use for demo
      — OR —
      You can also enter like below (If you omit Tab name then first table is used)
      You can also enter like below, specify first row and last row
    2. Check Use credentials and select OAuth connection we created in previous section
    3. Click on Select Filter and select values node and click OK  or you can directly enter following filter.
      Configure SSIS JSON/REST API Source - Read data from Google SpreadSheet API

      Configure SSIS JSON/REST API Source – Read data from Google SpreadSheet API

    4. Now click on array Transformation tab. Now we will show you two different ways to configure column names.
    5. Method-1: Auto Detect Column Names (added in version 3.1.2)
      In recent version we added column less array transformation where we have option to detect first record as column name.

      1. Select Transform Type as Column less array
      2. Check option First line has column names like below example (Ignore direct Sample JSON – We provided for demo only. In your case it will be URL)
        JSON With 2D Array Pattern - Column names in First Row (Google Sheets API Pattern)

        JSON With 2D Array Pattern – Column names in First Row (Google Sheets API Pattern)

    6. Method-2: Manually Enter Column Names (For old version)
      1. Select Transform simple 2D array option from dropdown
      2. Check Specify columns list manually
      3. Enter column names (Must match the order and count based on range you selected)
        Configure SSIS JSON/REST API Source 2D Array Transformation for Google SpreadSheet API

        Configure SSIS JSON/REST API Source 2D Array Transformation for Google SpreadSheet API

    7. Now click Preview to see the data
      Configure SSIS JSON/REST API Source - Preview Google SpreadSheet Data

      Configure SSIS JSON/REST API Source –
      Preview Google SpreadSheet Data

    8. Click OK to save UI
    9. Attach your JSON Source to target (e.g. OLEDB Destination for SQL Server Table).
    10. Configure Destination
      Configure SQL Server destination for Google SpreadSheet to SQL Data load

      Configure SQL Server destination for Google SpreadSheet to SQL Data load

    11. Execute SSIS Package to load Google SpreadSheet data int SQL Server
      SSIS Example -Loading data from Google SpreadSheet into SQL Server Table

      SSIS Example -Loading data from Google SpreadSheet into SQL Server Table

Method-2 : Read Google SpreadSheet using Flat File Source (as CSV)

Once we have Google Drive fileId we can perform following actions to export data into CSV file format. After that you can easily read CSV using SSIS FlatFile Source and load into SQL Server or other target.

Export Google SpreadSheet as CSV file using SSIS REST API Task

Here are the steps to export Google Drive SpreadSheet to CSV file

  1. Just like previous section configure REST API Task
  2. Enter following URL (See we used FileId from SSIS variable. You can hardcode too.
  3. Goto response tab. Check Save response and select “Save to file option”.
  4. Enter file path.
  5. Response character set select utf-8 (This will allow Unicode characters in response)
  6. Click Test to see preview
    Export Google SpreadSheet to CSV file format using SSIS - Call Google Drive API to Export

    Export Google SpreadSheet to CSV file format using SSIS – Call Google Drive API to Export

  7. Click OK to save
NOTE: New version now includes CSV Source for API. Use that if you want to consume Google Spread Sheet data in Data Flow. However we still recommend JSON Source (As per previous section to read data. Use CSV export method only if JSON API is not working for you for some reason.

Load Google SpreadSheet data into SQL Server using SSIS (REST API Call)

Once you export SpreadSheet to CSV file you can easily consume it to load into SQL Server or any Target (e.g. Oracle, MySQL) using Native SSIS FlatFile Source and OLEDB/ADO.net Destination.

NOTE: Using FlatFile Source is not covered in Article. You can find many articles online which shows how to use SSIS FlatFile Source.

Write data to Google Spread Sheet

Now lets look at how to write to Google SpreadSheet. To write multiple rows in a single request you may use batchUpdate API. Also check this link to learn more.

Below is sample request to write data to columns and some data to rows (set majorDimension = COLUMNS to write vertical and use ROWS to write horizontally. You can mix multiple ways just like below example)

Write data to Google SpreadSheet using REST API Task (OAuth 2.0 Connection)

Write data to Google SpreadSheet using REST API Task (OAuth 2.0 Connection)

 

Update Google Sheet Cells – Single Range (Set Values for multiple cells)

Now let’s look at the below example of updating multiple cells in google Sheet. For more information, you can see the example here. For parameter detail review this API documentation.

You can use SSIS REST API Task or Web API Destination to achieve Spread Sheet Update scenario. For simple example lets use REST API Task.

  1. Drag and Drop ZS REST API Task on the control flow designer
  2. Edit Task and select URL from Connection Method
  3. Select OAuth Connection from the dropdown (Same connection we created in the previous section)
  4. Enter the URL as below. Change Sheet ID, Range designator and API URL Parameters as per your need.
    https://sheets.googleapis.com/v4/spreadsheets/[Sheet-ID-Here]/values/[Range-Here]?[Parameters-Here] Example URL (some part masked with xxxxxxxx):  
  5. Change Request Method to PUT
  6. Change Request Content type to application/json
  7. Enter Request Body as below (Assuming we need to update 4 cells)
  8. Click Test Request see it works.
    Calling Google Sheet API to Update Multiple Cell Values

    Calling Google Sheet API to Update Multiple Cell Values

Update Google Sheet Cells – Multiple Ranges

If you like to update values which needs to define multiple ranges then you need to use batchUpdate API. Import changes are you can’t include Range in the URL like previous example of Single Range Update. Also you have to use POST method and body with multiple ranges.

Example of sheet update with multiple range

 

Write SQL Server data to Google Sheet

Now let’s look at real world example. How to write data from SQL Server table or other source and send to Google Sheet.

  1. Drag and Drop SSIS Data Flow Task from SSIS Toolbox
    SSIS Data Flow Task - Drag and Drop
  2. Drag source (e.g. OLEDB Source) and configure it to read from source like SQL Server Table
    OLE DB Source - Drag and Drop

    OLE DB Source – Drag and Drop

  3. Drag ZS JSON Generator.
    1. Connect Source to JSON Generator (blue arrow)
    2. Double click it to configure
    3. Select Output Mode as Single Dataset Array option for Mode. Enter Some Batch value (e.g. 500) this will make sure we send 500 rows at a time to Google API call. This way if we have many rows its not rejected by API call.
    4. Click Add element (Select Static Element) . Name: majorDimension and Value: ROWS
    5. Now click Add Document Array icon, name as values
    6. check Treat as 2D array option and click OK to save
    7. Now under values node click Add elements > Select Multiple Columns and Add columns you like to add.
    8. Click OK to save JSON GeneratorBelow is just an example How you can use Batch Settings to create JSON with N rows in each JSON document.
      Using SSIS JSON Generator Transform with Batch Option (Multiple Records in a single document)
  4. Now drag ZS Web API destination
    1. Connect JSON Generator to Web API destination (blue arrow)
    2. Double click it to configure
    3. Select same OAuth connection manager created in earlier section (Assuming you had all scopes with write permission)
    4. Now select Input column for Body (JSON Generator Output)
    5. Enter API call URL as below (Replace ID with your own id)
    6. Select Request Method as PUT
    7. Select content type as application/json
    8. You can enter sample Body As below.
    9. Click Test Request / Response button. Check your Sheet in browser after it see it updates?
    10. If you see 200 OK response means we are good to go.
    11. Run package and see your real data updated in Sheet.

Format Cells using Google Sheet API

Now let’s look at an example to format cell using Google Sheet API (e.g. Change Font, Color, Border, Background). For more information review this API documentation.

To format Sheet you first need to get SheetID  (keep in mind this is not same as spreadsheetId which you used in URL). SheetId is basically numeric ID of your Tab in WorkSheet.

Get SheetID

Request

Response

Notice that in above response we see “sheetId”: 1896331083. This ID we will use in the Update API to format Cells.

Update Borders Example

Here is the example of Google Sheet Formatting API updateBorders Request

Response

 

Update Cell Font Style (Bold, Style, Color, Background)

Now let’s look at how to use repeateCellRequest to apply certain type of formatting to multiple cells. This may be color, font style, background color.

In above example we have applied textFormat and backgroundColor but you can use any of below sections.

{
“numberFormat”: {object(NumberFormat) },
“backgroundColor”: { object(Color) },
“borders”: { object(Borders) },
“padding”: { object(Padding) },
“horizontalAlignment”: enum(HorizontalAlign),
“verticalAlignment”: enum(VerticalAlign),
“wrapStrategy”: enum(WrapStrategy),
“textDirection”: enum(TextDirection),
“textFormat”: { object(TextFormat) },
“hyperlinkDisplayType”: enum(HyperlinkDisplayType),
“textRotation”: { object(TextRotation) }
}

Set Currency Formatting

Here is an example request to set currency formatting

Set Cell Validation

You can also set cell validation by calling following API call. This will restrict user’s ability to enter wrong data in certain cells.

 

Dynamically obtain File ID for Google Drive REST API call

Once OAuth connection is defined you can now call any valid API from Google based on Permission you requested in OAuth Connection. OAuth connection takes care of renewing you Token and passing Authorization as per OAuth standard. All these protocol complexity is hidden from you.

First step to get data from Google SpreadSheet using REST API call is get fileId of the Google Drive File you want to read. The easiest way to know file ID is call following API. Below API call returns name, ID and other information about file(s). You can get only fileID and use that in Next Section to export Google SpreadSheet to CSV format.

OR

NOTE: For more information about searching files in Google drive / getting list of files using search expression language check this link. Make sure expression is URL encoded (e.g. name=’Customers’ should be passed as name%3D’Customers’)

Above API returns response as below. Notice the name and fileId. We will use fileId when calling export API in next section.

If you want to automate retrieval of fileId from document name then perform the following steps.

  1. In the SSIS package Control flow. Drag and drop ZS REST API Task from SSIS toolbox
  2. In the Request Access mode, select “URL from Connection” option
  3. Select OAuth connection manager (Created in Section-2)
  4. Enter URL as below (It will return any file name equals to Customers (%3D is encoded value for = sign)
  5. Click Test (You will see file information)
    Get Google Drive File List in SSIS - Call Google Drive API. Search file by name or expression

    Get Google Drive File List in SSIS – Call Google Drive API to search file by name or expression

  6. If you want to extract only fileID from full JSON then go to response tab
  7. Select Format=JSON, Enter following JSONPath expression to extract only fileId from first record
  8. Check save response content, select SSIS variable where you want to save fileId as below. Now click Test to see preview of response. You will notice now only fileId will be returned from full JSON response.
    Get Google Drive fileid from file name using api call in SSIS

    Get Google Drive fileid from file name using API call in SSIS

  9. Click OK to save task

Conclusion

Google Drive API and Google Sheets API provides a great way to automate file related functionality. However, to call Google API  you have to use SDK / coding approach (e.g. C#, Java, Python, Ruby). Luckily ZappySys SSIS PowerPack provides a great way to integrate any Google API call via simple drag and drop approach without coding. Try out yourself see how long it takes to call virtually any REST API.

Posted in Google API, REST API Integration, SSIS REST API Task and tagged , , , , , , , , , .