Easily Mapping Customer Locations

Occasionally, I need a map. I’m not talking about a beautiful, polished work of art. I just need something that gives me some spatial context and that will be acceptable in a customer presentation. Most frequently I need to see points on a map… sometimes lots of them.

If you Google ‘plot points on a map’ you will have dozens of options: http://multiplottr.com, https://batchgeo.com, and https://www.mapcustomizer.com are the first few I see pop up. These will work, especially if you only need to plot a few points. However, when I experiment with these I find myself getting frustrated. Instead, I use R. FYI, there is ArcGIS if you need to do some heavy-duty mapping work…

I have a list of zip codes and a corresponding number of orders shipped to that zip code over the past year. The top of the file looks like this:

id zip   orders
1  10001 2
2  10002 344
3  10003 4
4  10004 5
5  10005 98
6  10006 5

Before I map anything, I need to do two things: fix zip codes that read into R with fewer than five digits because their leading zeros get dropped (for instance Holtsville, NY with its zip code 00501), and add the lat / lon coordinates of each zip code to each row.

library(dplyr) # helps us do sql-type joins (will also use to filter data later)
library(zipcode) # gives us lat / lon and city / state names for each zipcode
data(zipcode) # loads the lat / lon data

# read in your data
orig <- read.csv("order.csv")

# fix zip codes that lost their leading zeros when read in as numbers
# sprintf pads every zip to 5 characters in one pass, however many zeros were dropped
# (this also turns zip into a character string, which the join below needs)
orig$zip <- sprintf("%05d", as.integer(orig$zip))

# combine your zip code data with lat / lon and city / state name data
data <- left_join(orig, zipcode, by="zip")
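
As a quick sanity check on the zero-padding step, here is a minimal sketch on a made-up vector. The names `raw_zip`, `padded_zip`, and `lookup` are illustrative and not part of the order file, and the sketch assumes the zips were read in as whole numbers. It also shows one way to spot zips that have no match in a lookup table, since those rows would come out of a left join with missing lat / lon.

```r
# hypothetical zips as read.csv would see them: leading zeros already lost
raw_zip <- c(501L, 2134L, 10001L)

# pad to five characters with leading zeros in a single pass
padded_zip <- sprintf("%05d", raw_zip)
print(padded_zip)

# toy lookup table standing in for the zipcode data (made-up subset)
lookup <- data.frame(zip = c("00501", "10001"),
                     state = c("NY", "NY"),
                     stringsAsFactors = FALSE)

# zips missing from the lookup would get NA lat / lon after a left join
unmatched <- setdiff(padded_zip, lookup$zip)
print(unmatched)
```

Running the same `setdiff()` against the real zip code table before plotting is a cheap way to find out how many orders will silently fall off the map.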

Great. Now the top of our data looks like this…

id zip   orders city     state latitude longitude
1  10001 62     New York NY    40.75074 -73.99653
2  10002 34     New York NY    40.71704 -73.98700
3  10003 42     New York NY    40.73251 -73.98935
4  10004 5      New York NJ    40.69923 -74.04118
5  10005 18     New York NY    40.70602 -74.00858
6  10006 5      New York NY    40.70790 -74.01342

Well, that’s strange… New York, NJ? I had to detour and look at Google Maps. The coordinates land on Ellis Island, which sits in the Hudson River between New York and New Jersey. In fact, on the map, it says both New York and New Jersey (dual ownership). I still feel good about the data. Now, I always start with a basic, basic map leveraging R’s maps package…

library(maps) # basic R map package (basic is a misleading word because it's still pretty amazing)

# plot a map of US states (no dots yet), with gray lines, and don't fill the map with any color
map(database="state", col=gray(0.5), fill=FALSE)

# now plot a single dark green point at each latitude / longitude location 
points(data$longitude, data$latitude, pch=20, col="dark green", cex=0.1)

[Figure: gray outline map of US states with a small dark green point at each zip code location]

And that’s not a terrible start. LA, San Francisco, Portland, and Seattle on the west coast have decent order volume… and the Northeast is saturated. Often, that’s good enough for what I do… but I can never resist getting into the ggplot2 package. Next, I’ll make it look just a little bit prettier and vary the size of the dot by the relative order volume for each zip code…

library(ggplot2) # helps us plot beautiful visualizations

# set up the basic ggplot US state map with grid lines and all
states <- map_data("state")
p <- ggplot() + coord_fixed(1.3) + xlab("") + ylab("")
base_state_map <- p + geom_polygon(data=states, aes(x=long, y=lat, group=group), 
                                   colour="black", fill="steelblue")

# create a theme to strip out background shading and gridlines
cleanup <- theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), 
                 panel.background = element_rect(fill = 'white', colour = 'white'), 
                 axis.line = element_line(colour = "white"), legend.position="none",
                 axis.ticks=element_blank(), axis.text.x=element_blank(),
                 axis.text.y=element_blank())

states_map <- base_state_map + cleanup

[Figure: US state map with steelblue fill and black borders, no axes or gridlines]

Pretty neat. Let’s add the orders. I’m going to ignore Alaska and Hawaii for now because they push the continental US into the bottom right-hand corner of the window…

# filter out orders to the state of AK or HI
data <- filter(data, state != "AK")
data <- filter(data, state != "HI")

# plot dots on states_map created above. The alpha parameter makes the dots somewhat transparent
# dot size is mapped to the orders column
map_with_orders <- states_map + 
                   geom_point(data=data, 
                              aes(x=longitude, y=latitude, size=orders), 
                              colour="yellow", pch=20, alpha=I(0.5)) +
                   scale_size_continuous(range = c(0.01, 3.0))

[Figure: US state map with yellow, semi-transparent points sized by order volume]

Works for me. As you can imagine, from here it is pretty easy to zoom in and create a map for a particular state or county.
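
To give a rough idea of what zooming in on one state might look like, here is a small sketch. The region "new york" and the two points in `pts` are made up for illustration; with the real data you would presumably filter the joined data frame to the state you want instead. It relies on `map_data()` accepting a `region` argument, which needs the maps package installed.

```r
library(ggplot2)  # map_data() also needs the maps package installed

# outline of a single state; "new york" is just an example region
ny <- map_data("state", region = "new york")

# made-up points purely for illustration (not real order data)
pts <- data.frame(longitude = c(-73.99, -78.88),
                  latitude  = c(40.75, 42.89),
                  orders    = c(62, 5))

# same recipe as the national map, restricted to one state's polygons
ny_map <- ggplot() +
  geom_polygon(data = ny, aes(x = long, y = lat, group = group),
               colour = "black", fill = "steelblue") +
  geom_point(data = pts, aes(x = longitude, y = latitude, size = orders),
             colour = "yellow", alpha = 0.5) +
  coord_fixed(1.3) + xlab("") + ylab("")
```

Another option would be to keep the full national map and pass `xlim` and `ylim` to `coord_fixed()` to crop the view, which keeps neighboring states visible for context.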

One of my favorite things about working with ggplot2 is its faceting feature, which allows us to create visual comparisons. For demo purposes, I took my zip code data and replicated it over 4 years so that the top of my raw data resembled this…

id zip   2012 2013 2014 2015
1  10001 2    20   25   25
2  10002 344  300  350  400
3  10003 4    4    14   14
4  10004 5    15   50   45
5  10005 98   150  102  90
6  10006 5    5    5    5

Before creating four plots, each representing a calendar year, we need to use the reshape package to put the data in a usable format.


# assuming zip code data already loaded...
# read in your revised data, fix the zip codes, rename the columns
annual_data <- read.csv("annual_orders.csv")
# pad zips back to 5 characters (restores dropped leading zeros, converts to character)
annual_data$zip <- sprintf("%05d", as.integer(annual_data$zip))
names(annual_data) <- c("zip", "2012", "2013", "2014", "2015")

# melting the data will create one row per zipcode for each year
# then I marry it up with the zip code library data
library(reshape)
temp <- melt(annual_data, id=c("zip"))
ann_map_data <- left_join(temp, zipcode, by="zip")
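
If melting is unfamiliar, a tiny made-up example may help show what the reshape step does. The data frame `wide` below is invented for illustration and only has two zips and two years. The point is that each year column becomes its own set of rows, with the year in a column named `variable` and the count in a column named `value`.

```r
library(reshape)

# made-up wide data: one row per zip, one column per year
wide <- data.frame(zip = c("10001", "10002"),
                   "2012" = c(2, 344),
                   "2013" = c(20, 300),
                   check.names = FALSE,
                   stringsAsFactors = FALSE)

# one row per zip per year; year lands in 'variable', count in 'value'
long <- melt(wide, id = c("zip"))
print(long)
```

Two zips times two years gives four rows, which is the shape `facet_wrap(~variable)` needs to draw one panel per year.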

Now the top of our data looks like this. Notice the new column called ‘variable’ which represents the year for the data.

id zip   variable value city     state latitude longitude
1  10001 2012     62    New York NY    40.75074 -73.99653
2  10002 2012     34    New York NY    40.71704 -73.98700
3  10003 2012     42    New York NY    40.73251 -73.98935
4  10004 2012     5     New York NJ    40.69923 -74.04118
5  10005 2012     18    New York NY    40.70602 -74.00858
6  10006 2012     5     New York NY    40.70790 -74.01342

Finally, we remove AK and HI again and plot using ggplot2’s facet_wrap to split the data on our ‘variable’ variable.

# filter out orders to the state of AK or HI
ann_map_data <- filter(ann_map_data, state != "AK")
ann_map_data <- filter(ann_map_data, state != "HI")

# plot one map per year. Smaller maps require smaller dots... so reduced top end of scale_size_continuous
new_map_with_orders <- states_map + geom_point(data=ann_map_data, 
                                               aes(x=longitude, y=latitude, size=value), 
                                               colour = "yellow", pch=20, alpha=I(0.5)) +
                                               scale_size_continuous(range = c(0.01, 1.0)) +
                                               facet_wrap(~variable)

[Figure: four small US maps, one per year from 2012 to 2015, with yellow points sized by order volume]

And now it’s easy for me to see how growth occurred between 2012 and 2015. More orders going to the same places (oversimplified). I manipulated this data on purpose to make the year-over-year growth very noticeable in the four maps above, but I find all sorts of uses in real life for the faceting feature.

If you occasionally need a quick map, to either plot customer locations or visualize comparisons (R is also great with choropleths), R is a great option. If you want to try some of the stuff above but can’t figure it out, I’d love to help. franciscorrigan3 AT g mail.
