Using Geocoding to Find the 'Closest City'

Besides fetching elevation information for each point, I decided to add a new feature to the upcoming waterfall refactor. Listing the county and exact latitude/longitude is helpful for explaining where a waterfall is located (as is showing it on a map, of course), but listing the 'closest city' would really help a reader narrow things down geographically. One of Google's web services is a geocoding API that also supports reverse geocoding, which seemed like a great way to find nearby city information.

What is geocoding? It translates human-readable addresses (street/city/state/country) into an exact latitude/longitude coordinate. That sounds simple, but it is actually a very tricky process, considering that addresses are not standardized across the globe (or even across the United States). Populated areas may reference the metropolitan area (like Detroit) or the local city (like Dearborn). Clustered areas may have odd numbering conventions to define a street address. Wisconsin uses 'town' to designate places that are 'unincorporated', while Michigan uses 'civil township'. Addresses are tricky.

Side note: I just found out that Port Hope, where I grew up, is actually a 'village'. I thought it was a proper town. Michigan doesn't even have towns. Childhood = ruined.

So, if geocoding is turning a street address into a coordinate, reverse geocoding is the opposite: it turns a latitude/longitude into an address. Or, at least, it tries to. Not only is this a tricky step because of the non-standard formatting of addresses, but what happens when the coordinate is in the middle of the forest? Or, in my case, along a river that may be hundreds of yards from a road? Well, there was one way to find out.
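
For reference, both directions go through the same endpoint of Google's Geocoding web service; only the query parameter changes ('address' for geocoding, 'latlng' for reverse geocoding). Here is a minimal sketch of building those request URLs. The helper names and the YOUR_API_KEY placeholder are my own inventions for illustration and are not part of the library discussed below, and the coordinate is just an arbitrary example point.

```php
<?php

// Base endpoint for Google's Geocoding web service (JSON output).
const GEOCODE_ENDPOINT = 'https://maps.googleapis.com/maps/api/geocode/json';

// Hypothetical helper: address in, request URL out (geocoding).
function buildGeocodeUrl(string $address, string $apiKey): string
{
    return GEOCODE_ENDPOINT . '?' . http_build_query([
        'address' => $address,
        'key'     => $apiKey,
    ]);
}

// Hypothetical helper: coordinate in, request URL out (reverse geocoding).
function buildReverseGeocodeUrl(float $latitude, float $longitude, string $apiKey): string
{
    return GEOCODE_ENDPOINT . '?' . http_build_query([
        'latlng' => $latitude . ',' . $longitude, // "lat,lng" with no spaces
        'key'    => $apiKey,
    ]);
}

// Arbitrary example coordinate; nothing is sent over the network here.
echo buildReverseGeocodeUrl(46.5, -87.4, 'YOUR_API_KEY'), PHP_EOL;
```

Fetching that URL and decoding the JSON body is all a reverse geocode request really is, which is why wrapping it in a small reusable class made sense.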

The first step was to build a script to reach out to Google's web service and perform reverse geocoding. After reviewing the documentation it became obvious that Google had done a very good job of keeping their API calls consistent, so building a brand new script just to perform geocoding requests would have been silly. Instead, I abstracted out some logic from my last GitHub repository (the elevation call) and added a new service. Thanks to this approach it should be painless over the next week or so to add the remaining Google map services (distance matrix, directions, and time zones) to the new library. The enhanced GitHub repository can be found here: Google map services.

After building the new class for the service I could loop through my data, perform a reverse geocode request for each row, and see what came back. My code to do this was fairly primitive, as this would be a limited run (only about 160 individual calls).

  1. include_once 'google-map-services/src/service/GeocodeService.php';
  2. $mysqli = new mysqli(SERVER, USER, PASSWORD, DATABASE);
  3. $query = "
  4.     SELECT
  5.         waterfall.id,
  6.         waterfall.name,
  7.         coordinate.latitude,
  8.         coordinate.longitude
  9.     FROM
  10.         waterfall
  11.         INNER JOIN coordinate ON
  12.             waterfall.location = coordinate.id
  13.     WHERE
  14.         waterfall.nearest_town = ''"; // only waterfalls without a town yet
  15. $result = $mysqli->query($query);
  16. while ($row = $result->fetch_object()) {
  17.     $request = new GoogleMapAPI\Service\GeocodeService();
  18.     $request->setCoordinate($row->latitude, $row->longitude);
  19.     $response = $request->fetchJSON();
  20.     $json = json_decode($response);
  21.     if ($json->status != 'OK') {
  22.         exit("We have a problem! {$json->status} Stopped at {$row->id}.");
  23.     }
  24.     $town = '';
  25.     foreach ($json->results[0]->address_components as $component) {
  26.         if (in_array('locality', $component->types)) {
  27.             $town = $component->long_name;
  28.             break;
  29.         } else if (in_array('political', $component->types)) {
  30.             $town = $component->long_name;
  31.             break;
  32.         }
  33.     }
  34.     if ($town == '') {
  35.         exit("We have a problem! Could not identify a town for {$row->id}.");
  36.     }
  37.     echo $row->name . ' -- ' . $town . PHP_EOL; // one waterfall per line
  38.     $query = 'UPDATE waterfall SET nearest_town = ? WHERE id = ?';
  39.     $statement = $mysqli->prepare($query);
  40.     $statement->bind_param('si', $town, $row->id); // s = string, i = integer
  41.     $statement->execute();
  42. }

Note - storing the town in the waterfall table as plain text is temporary. I have every intention of pulling that out into a reference table in the near future.
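
The loop above bails out on any status other than 'OK', which is fine for a one-off run of about 160 calls. For a longer run it might be worth telling the statuses apart. The sketch below maps the status strings the Geocoding API documents to a next step. The function name and the action names ('use', 'skip', 'retry', 'stop') are invented for this example, and which statuses deserve a retry is my assumption rather than anything the script above does.

```php
<?php

// Map a Geocoding API status string to what a batch script might do next.
// The action names are invented for this sketch.
function actionForStatus(string $status): string
{
    switch ($status) {
        case 'OK':
            return 'use';
        case 'ZERO_RESULTS':     // valid request, but no address found there
            return 'skip';
        case 'OVER_QUERY_LIMIT': // assumption: wait a bit, then try again
        case 'UNKNOWN_ERROR':    // server-side problem; a later attempt may work
            return 'retry';
        default:                 // REQUEST_DENIED, INVALID_REQUEST, anything unexpected
            return 'stop';
    }
}

foreach (['OK', 'ZERO_RESULTS', 'UNKNOWN_ERROR', 'REQUEST_DENIED'] as $status) {
    echo $status, ' -> ', actionForStatus($status), PHP_EOL;
}
```

A 'ZERO_RESULTS' for a point deep in the woods is not really a failure, so skipping that row and moving on would let the rest of the run finish.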

One weird thing that you may notice in lines 25 - 33 is a lookup against the types of each entry in the JSON 'address_components' field: 'locality' and 'political'. While Google is nice enough to return the address in a parsed manner (breaking out the different components instead of returning one long string), there is still the problem of varied address standards. Google actually supports over twenty different address component types, all of which are subject to change over time. After a few experiments I realized that the components are loosely ordered by specificity and that the first instance of 'locality' would normally hold the village/city information that I wanted. If the address was too remote for that, though, then the first 'political' field was the next best chance.
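
As written, the loop in the script stops at the first component tagged either 'locality' or 'political', so a more specific political component listed ahead of the locality would win. Here is a sketch of a two-pass variant that scans every component for a 'locality' before falling back to the first 'political' one. It is not what the script above does; the function names are hypothetical, and the sample data is invented to show the shape of 'address_components' rather than copied from a real response.

```php
<?php

// Return the long_name of the first component carrying the given type, or null.
function firstComponentOfType(array $components, string $type): ?string
{
    foreach ($components as $component) {
        if (in_array($type, $component->types, true)) {
            return $component->long_name;
        }
    }
    return null;
}

// Prefer a 'locality'; otherwise fall back to the first 'political' component.
function pickTown(array $components): ?string
{
    return firstComponentOfType($components, 'locality')
        ?? firstComponentOfType($components, 'political');
}

// Invented sample shaped like one result's address_components (not real API output).
$sample = json_decode('[
    {"long_name": "Example Road", "short_name": "Example Rd", "types": ["route"]},
    {"long_name": "Example Neighborhood", "short_name": "Example", "types": ["neighborhood", "political"]},
    {"long_name": "Example Village", "short_name": "Example Village", "types": ["locality", "political"]},
    {"long_name": "Michigan", "short_name": "MI", "types": ["administrative_area_level_1", "political"]}
]');

echo pickTown($sample), PHP_EOL; // prints "Example Village"
```

On this sample the single-pass loop would have picked "Example Neighborhood" instead. Returning null when nothing matches also leaves it to the caller to decide whether a missing town is worth stopping the run for.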

After a few attempts to figure out this special case I was able to run through all 160 points and locate the closest city in a few short seconds. Well, with less than 50% accuracy. Thanks to odd political boundaries and far too many local names missing from Google's index (Dodge City, Huron Bay, and even Eagle Harbor, to name a few) I had to go through and correct over a hundred entries. As quick and convenient as reverse geocoding can be, it still can't beat local knowledge and manual checks. At least it helped point me in the right direction, as well as giving me a reason to expand the repository's functionality.