Notes: Setting Up OSRM & Using OSM Filter for Machine Learning

To figure out the distance between 2 points on the globe measured as the crow flies, use trig.

import math
def deg_2_rad(deg):
    ''' input = degrees, output = radians '''

    return (deg * math.pi) / 180

def distance_between_nodes(start_coords, end_coords):
    ''' node inputs are stings of longitude / latitude geo coords
        example node1 = '-71.0727793,42.4436791'
        example node2 = '-71.0756129,42.4446184' '''

    lat1 = float(start_coords.split(',')[1])
    lat2 = float(end_coords.split(',')[1])
    lon1 = float(start_coords.split(',')[0])
    lon2 = float(end_coords.split(',')[0])
    
    inner = math.cos(deg_2_rad(90-lat1)) * math.cos(deg_2_rad(90-lat2)) + math.sin(deg_2_rad(90-lat1)) * math.sin(deg_2_rad(90-lat2)) * math.cos(deg_2_rad(lon1-lon2))

    return math.acos(inner) * 3958.786

print(distance_between_nodes('-71.0727793,42.4436791','-71.0756129,42.4446184'))
0.158383615684 # this is miles between those node

Typically, however, you can’t run or drive — which is usually how we are consuming geospatial data — as the crow flies… so we need data and a service that will route a path between points A and B. Microsoft routing service? Google Maps API? I’m using Open Source Routing Machine that uses map data from the OpenStreetMap project.

What am I doing here? I am going to set up this routing service* and run it locally so I can pull the route and the distance between geographic coordinates at a pretty rapid rate (versus hitting the Google API 1000+ times per day when I’m using it to train a reinforcement learning algorithm). I want to route and plot a path as if I’m running (which would route me through trails of a park rather than around the park on roads) so we need to make a parameter adjustment in the setup.

https://github.com/Project-OSRM/osrm-backend

It takes a few steps to get OSRM running. A) get the OSM file for the area in which I want to route, B) install a container service where I can run the routing service, C) build the service and D) run the service so I can make API calls. So, clone the repo above and do this…

  • cd into that directory and get the map where you want to call routes. For example, a osm data file for the state of Massachusetts in the United States paste this in the terminal: wget http://download.geofabrik.de/north-america/us/massachusetts-latest.osm.pbf. You can obtain other maps from https://www.geofabrik.de/.
  • Install Docker. https://www.docker.com/. You can install this anywhere on your machine you please.
  • Build the routing system. Make sure you are in the OSRM directory and use the following commands (note that we use the /opt/foot.lua option so that the routes we build are walking based – otherwise it would route us around the trails rather than through the trails).
    • docker run -t -v $(pwd):/data osrm/osrm-backend osrm-extract -p /opt/foot.lua /data/massachusetts-latest.osm.pbf
    • docker run -t -v $(pwd):/data osrm/osrm-backend osrm-contract /data/massachusetts-latest.osrm
  • Use this command to run the routing service locally — when we see the message in the second bullet we are ready to roll:
    • docker run -t -i -p 5000:5000 -v $(pwd):/data osrm/osrm-backend osrm-routed /data/massachusetts-latest.osrm
      [info] running and waiting for requests

We now have this thing running on our machine. We can send some information about the locations for which we want to route and based on that information it will send us back data about that route. What do those inputs look like… and what does the data we get back look like?

# define start and end locations, make request, and parse the JSON object
start ='-71.0727793,42.4436791'
end = '-71.0756129,42.4446184'
# the address is our local maching port 5000, we want walking directions, and annotations=nodes will give us all the node id's along the route (so we can get the lat/lon for those nodes to plot)
url = 'http://127.0.0.1:5000/route/v1/walking/{};{}?annotations=nodes'.format(start,end)
# use python requests package to call our API
import requests
r = requests.get(url)
# parse the data with python json package
import json
data = r.json()
# view the returned data object
print(data)
# returned JSON object looks like this:
{u'code': u'Ok',
 u'routes': [{u'distance': 296.8,
   u'duration': 213.6,
   u'geometry': u'_x`bGzkxpLORFLB^IxDLd@?h@_@`BK^u@|@iAl@OC',
   u'legs': [{u'annotation': {u'nodes': [66566291,
       1915819599,
       1915819596,
       1915819595,
       1915819593,
       1915819592,
       66559505,
       66578251,
       66576355,
       603331797,
       616527794,
       616527795,
       616527801,
       616527803,
       2288086575,
       616527806,
       616527807,
       616527761,
       616527765,
       616527770,
       616527773,
       616527776,
       616527782,
       616527786,
       616527810,
       616527812,
       616527814,
       616527815,
       616527818]},
     u'distance': 296.8,
     u'duration': 213.6,
     u'steps': [],
     u'summary': u'',
     u'weight': 213.6}],
   u'weight': 213.6,
   u'weight_name': u'duration'}],
 u'waypoints': [{u'hint': u'N3QFgE2SEYAAAAAAVgAAAAAAAAAAAAAAAAAAAFYAAAAAAAAAAAAAAAEAAAD1g8P7n6OHAvWDw_ufo4cCAAAPAOzgMiA=',
   u'location': [-71.072779, 42.443679],
   u'name': u'Washington Street'},
  {u'hint': u'EGUPgBJlD4AAAAAARAAAAOYEAAB-AQAAAAAAAEQAAADmBAAAfgEAAAEAAADjeMP7SqeHAuN4w_tKp4cCEQAPAOzgMiA=',
   u'location': [-71.075613, 42.444618],
   u'name': u'Cascade Trail'}]}

The first piece of data we want to capture is the distance from line 3 in the output above. By default, it returns distance in meters (296.8) so if we want miles… multiply by 0.000621371. The distance as the crow flies for the example above is 0.158 miles and via this route, it is 296.8 x 0.00062 = 0.184 miles. We also have all the nodes this route passes through in starting on line 6. The first few nodes are 66566291 to 1915819599 to 1915819596 and so on. Now, to plot the route we need geocoords, not node ids. Get that information from the osm file — since these files contain lots of data we don’t need such as the nodes that represent buildings and structure boundaries we can use osm filter (a command line tool) to take only roads and trails which will speed up the process of getting geocoords. Here’s how to scale it down…

# in terminal install osm filter...
wget -O - http://m.m.i24.cc/osmfilter.c |cc -x c - -O3 -o osmfilter

# test to make sure install worked...
./osmfilter --help

# be in the same directory as the osm file you want to filter and...
./osmfilter map --keep="highway= and name=" --drop="highway=service">streets_to_run.osm

Now parse that file and grab the geocoords.

# <a href="https://docs.python.org/2/library/xml.etree.elementtree.html">xml.etree.cElementTree lets us traverse our osm file</a>
import xml.etree.cElementTree as ET # to parse OSM file

# Define file name
filename = 'streets_to_run.osm'

# Parse the smaller OSM file
tree = ET.parse(filename)
root = tree.getroot()

# function will take osm node id and return the geocoords
def get_geocoords(nodeid):
	''' parses osm file and returns lat lon coords
		example input = '66566291'
		example output = '-71.0727793,42.4436791' '''

	for node in root.findall(".//node[@id='"+nodeid+"']"):
		return node.get('lon') + ',' + node.get('lat')

# send geocoords to final_path
the_nodes = data['routes'][0]['legs'][0]['annotation']['nodes']
final_path = []
for node in the_nodes:
    temp_coords = get_geocoords(str(node))
    final_path.append(temp_coords)
    
# write final_path to csv so we can visualize in r # I know, I know.. I should do this in python...
import csv
with open ('temp_route.csv', 'wb') as csvfile:
	routewriter = csv.writer(csvfile)
	routewriter.writerow(["LAT", "LON"])
	for el in final_path:
		temp = el.split(',')
		routewriter.writerow([temp[1], temp[0]])

Note that you should figure out how to do this all in python: https://blog.alookanalytics.com/2017/02/05/how-to-plot-your-own-bikejogging-route-using-python-and-google-maps-api/. For now, here’s the visual produced in RStudio.

library(ggmap)
library(dplyr)

setwd("/path_to_route.csv")

data <- read.csv("temp_route.csv")

fells_entrance <- c(-71.07397, 42.44374)
MelroseMap <- qmap(location = fells_entrance, zoom = 17)#, color = 'bw')
showme <- seq(0, nrow(data), 5)
MelroseMap +
  geom_path(aes(x = LON, y = LAT),
            colour="#1E2B6A", data=data, alpha=0.6, size=1) +
  geom_label(data=data[c(showme),],
            aes(x=LON, y=LAT, label=rownames(data[c(showme),])),
            size=1.5
            #position=position_jitter()
            ) +
  geom_label(data=data[nrow(data),],
             aes(x=LON, y=LAT,
                 label=rownames(data[nrow(data),])),
                 size=2,
                 color="red") +
  geom_label(data=data[1,],
             aes(x=LON, y=LAT,
                 label=rownames(data[1,])),
             size=2,
             color="blue")

In the map below, the route starts at the blue label and ends at the red label. Each 5th point gets a label. Not terribly exciting, but this is the bare minimum for plotting all kinds of routes.

* If you want to learn about APIs, this guy offers a good intermediate intro: https://www.udemy.com/rest-api-flask-and-python/learn/v4/t/lecture/5960174?start=0

Leave a Reply

Your email address will not be published. Required fields are marked *