SSIS Geocoding with Google Maps API

Introduction

In this tutorial, we will cover the topics of how to perform geocoding on the addresses and reverse geocoding on the location coordinates using SSIS and Google Maps API. So what is geocoding, exactly? Geocoding is the process of translating an address (e.g. a street address) or a place to coordinates on the Earth’s surface. Simply put, geocoding takes a description of a location and gives back latitude and longitude values as a result. For example,  1600 Amphitheatre Parkway, Mountain View, CA 94043, USA  gets translated to  (37.4224764, -122.0842499). Reverse geocoding, on the other hand, is the exact opposite process. In it, latitude and longitude values pair gets translated into a human-readable form. To take the same example above,   (37.4224764, -122.0842499) would translate back to  1600 Amphitheatre Parkway, Mountain View, CA 94043, USA. So, if you need to perform any of the lookups, you can use Google Maps Geocoding API. Actually, Google Maps Geocoding API offers a bit more – you can use it for address validation, cleansing, and formatting.

NOTE: If it happens that you need to use other geocoding provider and perform the lookups in SSIS, you may still benefit from this article (the examples of usage of Google Maps Geocoding API can be replaced by another provider).

To accomplish our goal we will use an add-on for SSIS ZappySys SSIS PowerPack. It contains many additional SSIS Tasks and Data Flow Components that lets you accomplish what you cannot do with standard SSIS connectors. We will use these ZappySys SSIS PowerPack connectors:

Let’s start!

Prerequisites

Before we perform the steps listed in this article, you will need to make sure the following prerequisites are met:
  1. SSIS designer installed. Sometimes it is referred to as BIDS or SSDT (download it from the Microsoft site).
  2. Basic knowledge of SSIS package development using Microsoft SQL Server Integration Services.
  3. Make sure ZappySys SSIS PowerPack is installed (download it, if you haven't already).
  4. (Optional step). Read this article, if you are planning to deploy packages to a server and schedule their execution later.

Step-by-Step – SSIS Geocoding with Google Maps API

Before we start, let’s consider the use-case for our tutorial. Suppose, you have a CRM system filled with user-entered addresses of your leads. Since they are user-entered, it means that they can be incomplete, malformed, or just plain rubbish. Also, you have a marketing campaign and you want to send letters with offers to your leads so that they become your clients. Before sending them any letters, you have to make sure the addresses are valid and well-formed. You are also interested in the geographical location of your leads that you could do some business intelligence analysis on the coordinates of your leads and maybe future clientele. Here is where Geocoding API kicks in.

Thus in this example, we will take various street addresses – sometimes invalid, incomplete or unformatted – and submit them to Google Geocoding API for processing. As a result, we will get nicely formatted, cleansed, full addresses and their coordinates. We will take a look on how to determine if lookup worked successfully and to what extent you should trust Geocoding API’s response. Finally, we will save retrieved addresses and their individual components (e.g. street number, street name, city, country, etc.) to a SQL Server database table.

Create a Google project

First, to use a Google API, you need to have a Google project. Create one, if you don't have any or want to use a new project:
  1. Go to Google Console and click on the projects list:
  2. Once a window opens, click "New Project":
  3. Give a name for the project and click "Create":
  4. Wait a couple of minutes for the project to create.
  5. Again, click on the projects drop-down list:
  6. Finally, select the project you've just created:

Enable billing for the project

Not all Google APIs are free. So you'll have to add billing for the project you have just created:
NOTE: As of September 21, 2018, Google offers 300 USD credit for free (but you will still need to create a billing account).
  1. Visit Billing on Google Console page.
  2. Once it's opened, select the project:
  3. Once a list of projects appears, select the one you created.
  4. Then you will be prompted to create a billing account. Proceed to create one:

Enable Google API for your project

To use any Google API you must enable it for your project:
  1. Visit Google APIs Dashboard page.
  2. Make sure your project is selected and then click "Enable APIs and services":
  3. Then in the box which appears, search for API you are interested in, e.g. "geocoding":
  4. Finally, click on the Google API (e.g. "Geocoding API") tile:
  5. Finally finally, enable Google API:

Create Geocoding API Key

Once you have Google API enabled, you are ready to create a Google API Key, which you will use in all HTTP requests:
  1. Go to Google APIs Dashboard.
  2. Make sure you have your project selected, then select "Credentials" menu item and create a new API Key:
  3. Copy the newly created API Key to a safe place; we will use it later.

Load addresses from a SQL Server database table

We prepared the Google Geocoding API part, now we are ready to get to the place where the rubber meets the road – we will create an SSIS package and load addresses from a database table:

  1. Create a new SSIS package.
  2. Drag and drop Data Flow from SSIS Toolbox onto the Control Flow.

    Dragging and dropping Data Flow Task into Control Flow

  3. Drag and drop OLE DB Source onto the Data Flow.
  4. Configure OLE DB Source to retrieve addresses from your table. We will use a table from Microsoft’s Northwind’s modified database “Marketing” (download it):
Loading user-entered addresses to be looked up in Google Geocoding API using SSIS.

Loading user-entered addresses to be looked up in Google Geocoding API using SSIS.

Construct lookup URL

  1. Drag and drop Derived Column component onto the Data Flow.
  2. Add a new column and construct Geocoding lookup URL in this format (just like it says in Geocoding API documentation):
Adding Google Geocoding API lookup URL as Derived Column to be used for geocoding in SSIS.

Adding Google Geocoding API lookup URL as Derived Column to be used for geocoding in SSIS.

Configure the Geocoding API request

  1. Drag and drop Web API Destination onto the Data Flow and connect it with Derived Column. We will use Web API Destination to make HTTP requests to Geocoding API.
  2. Then double-click on Web API Destination and create a new HTTP connection:
    Setting up HTTP Connection Manager for Web API Destination to make HTTP requests to Google Geocoding API using SSIS.

    Setting up HTTP Connection Manager for Web API Destination to make HTTP requests to Google Geocoding API using SSIS.

  3. Finally, configure Input Column for URL, HTTP Request Method and input into URL field a sample lookup URL:
    Configuring Web API Destination to make geocoding lookups in Google Geocoding API using SSIS.

    Configuring Web API Destination to make geocoding lookups in Google Geocoding API using SSIS.

    Once you are done, click Test Request/Response button. This will actually make a geocoding lookup for Microsoft headquarters:

  4. In the Response (Raw) tab you see the actual response from Geocoding API. Within it –  cleansed, formatted and full Microsoft headquarters address together with geographical coordinates (even we looked it up using just the part of it). Now copy-paste the response body somewhere, so that we can use it in the next step while parsing:
    Checking Google Geocoding API HTTP response in Web API Destination using SSIS.

    Checking Google Geocoding API HTTP response in Web API Destination using SSIS.

NOTE: When Google Geocoding API retrieves only a partial result, it adds a new property in JSON named partial_match, as described in Geocoding API documentation. This node is not present in the above sample response.

Parsing Geocoding API JSON response into columns

  1. Drag and drop JSON Parser Transform onto the Data Flow and connect it to Web API Destination.
  2. Then open it and configure Input JSON Column.
  3. After that, set the Filter to $.results[*].
  4. Continue by setting Sample JSON string to the one you saved in the previous step. Actually, before doing that let’s modify it and add partial_match node next to place_id so that exact and partial matches could be told apart (otherwise this property won’t be parsed):
    Configuring JSON Parser Transform to parse looked up addresses in Google Geocoding API HTTP JSON response using SSIS.

    Configuring JSON Parser Transform to parse looked up addresses in Google Geocoding API HTTP JSON response using SSIS.

  5. To proceed and parse individual address components from JSON response (e.g. number, street, city, country, etc.), you will need to specify columns for each address component manually, just like portrayed in Parse multi-dimensional JSON array in SSIS or Drivers article. Just open 2D Array Transform tab and select Multiple columns using expressions as Transform Type and configure them:
    Configuring JSON Parser Transform to parse individual address components from Google Geocoding API HTTP JSON response using SSIS.

    Configuring JSON Parser Transform to parse individual address components from Google Geocoding API HTTP JSON response using SSIS.

    We specify these columns in our example:

  6. Finally, be sure to set Include all upstream columns to downstream (quite important), so that CustomerID is passed downstream and which we can save into a destination table. Later it can be a useful column to join the source table and the destination table on. In that way, you will be able to map user-entered addresses with looked up, fully formatted addresses (won’t be demonstrated in this tutorial).

Save geocoding results into a SQL Server database table

  1. We are now ready to add OLE DB Destination, execute the package, make the lookups, and save the results into the database:
    Creating a database table for the Google Geocoding API looked up addresses in SSIS.

    Creating a database table for the Google Geocoding API looked up addresses in SSIS.

  2. Execute the package!
    SSIS geocoding package execution and its results.

    SSIS geocoding package execution and its results.

The results of SSIS geocoding

Viewing the results of SSIS geocoding in SQL Server destination table.

Viewing the results of SSIS geocoding in SQL Server destination table.

Once you query the table of looked up results, you will notice several things:

  1. Perhaps there are more rows in the results table than in the source table
  2. partial_match column is either 1 or NULL
  3. geometry.location_type column has several values, e.g. ROOFTOP, RANGE_INTERPOLATED, APPROXIMATE, etc.

So to what extent should you trust the results? The answer is to take a look at the documentation and see what these values mean.

Basically, when partial_match is equal to 1, according to Google:

  • Partial matches most often occur for street addresses that do not exist within the locality you pass in the request.
  • Partial matches may also be returned when a request matches two or more locations in the same locality. For example, “21 Henr St, Bristol, UK” will return a partial match for both Henry Street and Henrietta Street. Note that if a request includes a misspelled address component, the geocoding service may suggest an alternative address. Suggestions triggered in this way will also be marked as a partial match.

While geometry.location_type column tells how accurate the lookup was. Again, according to Google:

  • “ROOFTOP” indicates that the returned result is a precise geocode for which we have location information accurate down to street address precision.
  • “RANGE_INTERPOLATED” indicates that the returned result reflects an approximation (usually on a road) interpolated between two precise points (such as intersections). Interpolated results are generally returned when rooftop geocodes are unavailable for a street address.
  • “GEOMETRIC_CENTER” indicates that the returned result is the geometric center of a result such as a polyline (for example, a street) or polygon (region).
  • “APPROXIMATE” indicates that the returned result is approximate.

Simply put, “ROOFTOP” is the most accurate lookup, while “APPROXIMATE” is the least accurate.

Another valuable piece of information is how many individual address components are filled. The more NULLs are there, the less accurate is the address. So if you have NULL in the Number column and in the Street column, most likely this is not a very accurate lookup:

Individual address components of looked up addresses in Google Geocoding API using SSIS.

Individual address components of looked up addresses in Google Geocoding API using SSIS.

Concluding, in the figure above, you can see how user-entered address can be validated, cleansed and formatted.

SSIS reverse geocoding with Google Maps API

The concepts presented above about “forward” geocoding will help you to make reverse geocoding requests too. The only difference is the request URL:

For reference and help visit official Google Geocoding API documentation.

Things have gone bad: Error handling & debugging

Incidentally, bad things can happen. A remote server may go offline or your server may go out of memory. In any case, you may want to know when that happens and take actions accordingly. For that purpose, you have to redirect bad rows to some other destination. For this example, we will take and use Web API Destination, but basically, you can use any SSIS component:

Handling errors

  1. Add a Derived Column above Web API Destination with expression "(DT_WSTR,4000)ZS_JSON_OUT" and name it "JsonAsString". This will let you see what JSON you are actually passing.
  2. Then add a database or file destination or use another Trash Destination for debugging purposes and redirect the bad rows (red arrow) from Web API Destination into it. Don't forget to set Redirect row option for both, Error and Truncation columns:
    Redirect bad rows from <em>Web API Destination</em> to <em>Trash Destination</em> when load from SQL Server to Elasticsearch is failing. Add derived column JsonAsString to be able to read JSON you are using.

    Redirected failed requests from Web API Destination to a desired destination when loading from SQL Server to REST API Service is failing. Derived Column JsonAsString added to be able to read JSON which was passed to Elasticsearch

  3. Finally, add a Data Viewer for the red path, if you want to debug the flow. You will be able to see URL, JSON and the error message for each record. You may want to copy-paste ErrorMessage to Notepad if you want it to be more readable:
    Use Data Viewer to view HTTP requests that failed to be fulfilled in Elasticsearch

    Use Data Viewer to view HTTP requests that failed to be fulfilled.

NOTE: You can read more about redirecting rows in SSIS Error Handling (Redirect bad rows) article.

Debugging HTTP requests

A common thing you have to do when working with HTTP requests is to debug those requests; e.g. to check what headers, body or URL was passed. To test how things look behind the scenes we strongly suggest to use Fiddler - a popular web debugging tool.

Inside it, you can double-click the URL entry (Right side) to see Request and Response Panels. The top panel is Request (URL, Headers, Body) and Bottom Panel is Response. For https:// (secure URL) make sure you enable HTTPS option in Fiddler (Tools > Options > HTTPS > Check Decrypt https request):

Debugging Web API call using Fiddler in SSIS

Debugging Web API call using Fiddler in SSIS

Common errors

Truncation related error

The most common error you may face when you run an SSIS package is truncation error. During the design time only 300 rows are scanned from a source (a file or a REST API call response) to detect datatypes but at runtime, it is likely you will retrieve far more records. So it is possible that you will get longer strings than initially expected. For detailed instructions on how to fix common metadata related errors read an article "How to handle SSIS errors (truncation, metadata issues)".

Authentication related error

Another frequent error you may get is an authentication error, which happens when you deploy/copy a package to another machine and run it there. Check the paragraph below to see why it happens and how to solve this problem.

Parameterization and deployment to production 

In SSIS package sensitive data such as tokens and passwords are by default encrypted by SSIS with your Windows account which you use to create a package. So SSIS will fail to decrypt tokens/passwords when you run it from another machine using another Windows account. To circumvent this when you are creating an SSIS package which uses authentication components (e.g. an OAuth Connection Manager or an HTTP Connection Manager with credentials, etc.), consider using parameters/variables to pass tokens/passwords. In this way, you won’t face authentication related errors when a package is deployed to a production server.

Check our article on how to configure packages with sensitive data on your production or development server.

Download a sample package

Be sure to download a sample SQL Server 2012 SSIS package, in case you want to try it right away (you can upgrade it to a higher version). If you want to use the database that was used in this tutorial, don’t forget to create the Marketing database from the script (a modified Microsoft’s Northwind database).

Conclusion

In this tutorial, we step-by-step covered how to make geocoding lookups using Geocoding API, part of Google Maps API. We learned how to cleanse, validate and format address just by submitting it to Geocoding API and parsing the results. We moved forward and looked at how to interpret the results and what geocoding results to trust. Then we touched the point that the same process could be applied to reverse geocoding. Concluding, we took a look at how to handle common errors and how to parameterize an SSIS package and deploy it to production.

Posted in Google API, REST API Integration, SSIS Components, SSIS JSON Parser Transform, SSIS PowerPack, SSIS WEB API Destination and tagged , , , , , .