Ascertaining Geographic Data within an SSIS Import

So, You probably had it before… A client asks you to plot their addressing data on a map of some sort. Its not an issue you tell them, make sure that within the the data you have the appropriate latitude/longitude data for the address and we will do the rest.. And thats where the awkward silence starts, they haven’t got this data, they NEVER have and data costs!

There are of course ways around this with Google maps of course being top of the list of possible non paying solutions. I thought I’d share my experiences of this with you…. We needed to implement an address look up on an initial import of data using our ETL tool (SSIS). Whilst not being major surgery this is slightly more than ten minutes work…..

Our client has a spreadsheet with a list of companies that we were tasked with bringing into a SQL server database, its the standard kind of thing a Company name with a city and country (in this instance we have no physical address but this would present no further problems). Heres what we did to arrive at Geo Locations for all imported records:-

1. Performed initial read of the data using the Excel Driver Component

2. Using a ‘Derived Column’ Component we created two new columns on  the dataset named simply ‘Latitude’ and ‘Longitude’. These were formatted as decimal (38,10) and primed with null

3.  In order to NOT execute the same searches for the same town more than once I aggregated my data using the ‘Aggregate Component’ grouping on Country,City Latitude and Longitude.

4. This aggregated dataset was then fed into a Script Component (transformative) with the four columns exposed, only the  Latituide and Longitude were marked as writeable.

5. The following code files were added to the script component:-

  1. IGeographicLocation.cs – Interface file
  2. GeographicLocation.cs – implementation of IGeographicLocation.cs
  3. GeoCodeTranslator.cs – Code used to control the translation process

Please find the code snippets below for each file:-

IGeographicLocation

using System;
using System.Collections.Generic;
using System.Text;

namespace DSCallards{

    public interface IGeographicLocation
    {
        double Latitude { get; set; }
        double Longitude { get; set; }
        string AddressParam { get; set; }
    }
}

GeographicLocation

using System;
using System.Collections.Generic;
using System.Text;
using System.Data;
using System.Net;
using System.Web;
using System.Xml;

namespace DSCallards
{

    public struct GeographicLocation : IGeographicLocation
    {
        private double _Latitude;
        private double _Longitude;
        private string _AddressParam;

        public GeographicLocation(double latitude, double longitude, string addressParam)
        {
            _Latitude = latitude;
            _Longitude = longitude;
            _AddressParam = addressParam;
        }

        public double Latitude
        {
            get { return _Latitude; }
            set { _Latitude = value; }
        }

        public double Longitude
        {
            get { return _Longitude; }
            set { _Longitude = value; }
        }

        public string AddressParam
        {
            get { return _AddressParam; }
            set { _AddressParam = value; }
        }
    }

}

GeoCodeTranslator

using System;
using System.Collections.Generic;
using System.Text;
using System.Data;
using System.Net;
using System.Web;
using System.Xml;
using System.Globalization;
namespace DSCallards
{
    public class GeoCodeTranslator:IDisposable
    {
        private const string googleApi = "http://maps.googleapis.com/maps/api/geocode/xml?address={0}&sensor=false";
        private const string NODE_STATUS = "status";
        private const string NODE_RESULT = "result";
        private const string NODE_LOCATION = "geometry/location";
        private const string NODE_LATITUDE = NODE_LOCATION + "/lat";
        private const string NODE_LONGITUDE = NODE_LOCATION + "/lat";

        private const string OK = "OK";
        private Uri GetApiURL(string addressToQuery)
        {
            return new Uri(String.Format(googleApi, HttpUtility.UrlEncode(addressToQuery)));
        }

        public GeographicLocation translatePhysicalAddressToGeographicLocation(string addressToQuery)
        {
            double latitude = 0.0;
            double longitude = 0.0;
            /* Get the Web Client */

            using (WebClient wc = new WebClient())
            {

            /* Now build  the google query according to the address that we are trying to locate */

                Uri uri = GetApiURL(addressToQuery);

                try
                {

                    /* Get the Data into an XML document */

                    XmlDocument xmlDoc = new XmlDocument();
                    xmlDoc.LoadXml(wc.DownloadString(uri));
                    XmlElement xmlElement = xmlDoc.DocumentElement;

                    /* Now check the Status before proceeding.....*/

                    string status = xmlElement.SelectSingleNode(NODE_STATUS).InnerText;
                    if (status.ToLower() != OK.ToLower()){
                        return new GeographicLocation(0.0, 0.0, addressToQuery);
                    }

                    /* OK. All looks good so.... Get the Result Node*/

                    XmlNodeList nodeCol = xmlElement.SelectNodes(NODE_RESULT);

                    /* Now Get the Latitude and Longitude into variables ready for use... */

                    foreach (XmlNode node in nodeCol){
                        latitude = Convert.ToDouble(node.SelectSingleNode(NODE_LATITUDE).InnerText, CultureInfo.InvariantCulture);
                        longitude = Convert.ToDouble(node.SelectSingleNode(NODE_LONGITUDE).InnerText, CultureInfo.InvariantCulture);
                    }

                    /* Return ..... */

                    return new GeographicLocation(latitude, longitude, addressToQuery);
                }
                catch (Exception e){
                    return new GeographicLocation(0.0, 0.0, addressToQuery);
                }
            }

        }

        #region IDisposable Members

        public void Dispose(){
        }

        #endregion
    }
}

6. Make sure you add a project reference to System.Web

7. This just then leaves the small job of wiring in the main class to these GeoTransalation classes, the code for this is deascribed below:

/* Microsoft SQL Server Integration Services Script Component
*  Write scripts using Microsoft Visual C# 2008.
*  ScriptMain is the entry point class of the script.*/

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Threading;
using DSCallards;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{

    public override void PreExecute()
    {
        base.PreExecute();
        /*
          Add your code here for preprocessing or remove if not needed
        */
    }

    public override void PostExecute()
    {
        base.PostExecute();
        /*
          Add your code here for postprocessing or remove if not needed
          You can set read/write variables here, for example:
          Variables.MyIntVar = 100
        */
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {

        /* Pause before execution begins.... Google doesn't like requests to come in 
         *  too fast......*/

        Thread.Sleep(1000);

        using (GeoCodeTranslator translator = new GeoCodeTranslator()){
            GeographicLocation geolocation = translator.translatePhysicalAddressToGeographicLocation(Row.Location + "," + Row.Country);
            Row.Latitude = Convert.ToDecimal(geolocation.Latitude);
            Row.Longitude = Convert.ToDecimal(geolocation.Longitude);
        }
    }

}

8. We now can bring our aggregated data back together using a merge join dataset and voila…. This data can be saved to our database and all of our location data can now be plotted on a map.

The nice thing about this is that these three classes can be taken and dropped into a standard .NET library too if you ever feel the need to use them from within an application.

Of course, there is always a caveat…. Google does place certain limitations on the use of their api… As you can see from the code too many requests in a row from the same IP and they will no doubt think that a DOS attack is underway so they just drop your request. In addition unless you have a ‘Google Maps API for Business’ account you are limited to a very small amount of requests per 24hr period, currently 2,500. Subscribing as a business user give you 100,000 requests per 24 hour period. Please read the more extensive documentation here.

I hope this helps…