The motivation to want to draw on map tiles directly within R is twofold.
While the R environment boasts a long history of spatial analysis tools and libraries, the plots have traditionally been created without any spatial context that a map would provide.
The meuse data set is the “Hello world” equivalent of the sp package. I quote:
It gives locations and topsoil heavy metal concentrations, along with a number of soil and landscape variables at the observation locations, collected in a flood plain of the river Meuse, near the village of Stein (NL). Heavy metal concentrations are from composite samples of an area of approximately 15 m x 15m.
Note the “strange” coordinate system
#require(sp)
data(meuse)
head(meuse[,1:8])
## x y cadmium copper lead zinc elev dist
## 1 181072 333611 11.7 85 299 1022 7.909 0.00135803
## 2 181025 333558 8.6 81 277 1141 6.983 0.01222430
## 3 181165 333537 6.5 68 199 640 7.800 0.10302900
## 4 181298 333484 2.6 81 116 257 7.655 0.19009400
## 5 181307 333330 2.8 48 117 269 7.480 0.27709000
## 6 181390 333260 3.0 61 137 281 7.791 0.36406700
coordinates(meuse) <- c("x", "y")
#par( mar=c(2,2,2,2), cex.main=1.5,cex.lab=2)
#bubble(meuse, "zinc", main = "zinc concentrations (ppm)",#maxsize=5,
# key.entries = 100+ 100 * 2^(0:4),key.space = "inside")
LevCols = rev(heat.colors(5))[cut(meuse$lead,breaks=c(0,100,200,300,400,Inf),labels=FALSE)]
plot(meuse, bg=LevCols, col="grey",main="Lead Distribution", pch=21, axes=TRUE)
There are times when such a plot will be the preferred choice: a clean and simple graph of the locations of interest with no clutter and no distractions. The analyst can focus on the pattern itself and the marker attributes.
However, a lot of the modern massive data sets gathered such as location information from mobile devices, surveys, crime data, vehicle tracks, demographic data, etc. require a map based spatial context for even the most basic data explorations.
In those settings, the somewhat narrow or ``blind’’ exploration of spatial data on a blank background can be rather limiting and often leads to less insight than would be possible had the data been graphed on a map canvas.
Anecdotally, we dare to speculate that the British phyisican John Snow might have been slower to identify contaminated drinking water as the cause of the cholera epidemic in 1854, had he not used a dot map to illustrate the cluster of cholera cases around one of the pumps in London.
This tutorial uses two crime data set as examples for spatiotemporal data:
A data set that contains all of the police incidents (except homicide and manslaughter) that were reported in San Francisco in 2012. The full data set is freely available for download ( https://data.sfgov.org/ ) and contains the date, time of day, and day of week that each incident was reported, along with the incident category, a brief description, and location (as police district, lat-long coordinates, and address to the nearest 100 block).
A lightly cleaned Houston crime data set from January 2010 to August 2010 geocoded with Google Maps
Let us first load and inspect the San Francisco data set:
load("data/SFincidents2012.rda")#incidents
head(incidents)
## IncidntNum Category Descript DayOfWeek
## 1 120000499 ROBBERY ROBBERY, BODILY FORCE Sunday
## 2 120000938 NON-CRIMINAL DEATH REPORT, CAUSE UNKNOWN Sunday
## 3 120001936 SUICIDE SUICIDE BY JUMPING Sunday
## 4 120002235 VEHICLE THEFT STOLEN AUTOMOBILE Sunday
## 5 120003186 NON-CRIMINAL DEATH REPORT, CAUSE UNKNOWN Monday
## 6 120000041 ASSAULT BATTERY Sunday
## Date Time PdDistrict Resolution Location
## 1 2012-01-01 01:50 CENTRAL JUVENILE BOOKED VALLEJO ST / POWELL ST
## 2 2012-01-01 06:40 NORTHERN NONE 800 Block of OFARRELL ST
## 3 2012-01-01 14:45 NORTHERN NONE 1200 Block of GOUGH ST
## 4 2012-01-01 17:00 MISSION NONE 1600 Block of BRYANT ST
## 5 2012-01-02 08:30 CENTRAL NONE 1200 Block of STOCKTON ST
## 6 2012-01-01 00:25 SOUTHERN NONE 200 Block of MARKET ST
## X Y violent censusBlock
## 1 -122.4105 37.79837 TRUE 06075010700
## 2 -122.4183 37.78516 FALSE 06075012202
## 3 -122.4244 37.78436 FALSE 06075016000
## 4 -122.4105 37.76562 FALSE 06075017700
## 5 -122.4084 37.79679 FALSE 06075010700
## 6 -122.3974 37.79244 TRUE 06075011700
We defined the following crimes as violent: as assault, robbery, rape, kidnapping, and purse snatching.
Let us take a look at the data using the lat/lon coordinates.
It is easy enough to directly create a 2D scatterplot of the data color coding violence:
#for speed let us plot just a sample:
i = sample(nrow(incidents),10000)
ViolCol=c("green","red")[as.numeric(incidents[i,"violent"])]
plot(Y ~ X, data = incidents[i,], col = ViolCol,
cex = 0.5, pch=20,xlab="lon", ylab="lat")
If you are familiar with San Francisco you can almost discern the city boundaries as well as the major street grid from this plot !
If you go back to the meuse data and check out the example code you find snippets as follows:
crs = CRS("+init=epsg:28992") # set original projection
data("meuse")
coordinates(meuse) <- ~x+y
proj4string(meuse) <- crs
merc = CRS("+init=epsg:3857")
WGS84 = CRS("+init=epsg:4326")
#this funcion requires rgdal to be installed!
meuse2 = spTransform(meuse, WGS84)
cbind(head(coordinates(meuse)),head(coordinates(meuse2)))
## x y x y
## 1 181072 333611 5.758536 50.99156
## 2 181025 333558 5.757863 50.99109
## 3 181165 333537 5.759855 50.99089
## 4 181298 333484 5.761746 50.99041
## 5 181307 333330 5.761863 50.98903
## 6 181390 333260 5.763040 50.98839
plot(meuse2,
bg=rev(heat.colors(5))[cut(meuse2$lead,breaks=c(0,100,200,300,400,Inf),labels=FALSE)],
col="grey",main="Lead Distribution", pch=21, axes=TRUE)
Gauss’s Theorema Egregium proved that a sphere’s surface cannot be represented on a plane without distortion. The same applies to other reference surfaces used as models for the Earth. Since any map projection is a representation of one of those surfaces on a plane, all map projections distort. Every distinct map projection distorts in a distinct way. The many choices of map projections is the result of different people/problems/contexts wanting to preserve different metrics such as
The Mercator projection exaggerates areas far from the equator. For example:
The Mercator projection portrays Greenland as larger than Australia; in actuality, Australia is more than three and a half times larger than Greenland.
The classic way of showing the distortion inherent in a projection is to use Tissot’s indicatrix.
LA <- c(-118.40, 33.95)
NY <- c(-73.78, 40.63)
data(wrld)
plot(wrld, type='l')
gc <- greatCircle(LA, NY)
lines(gc, lwd=2, col='blue')
How can we quickly find the lat/lon coordinates of cities?
Web interfaces, such as http://www.gpsvisualizer.com/geocode
Or the built-in function getGeoCode() from the RgoogleMaps package.
How do we compute the distance between two points on the surface of the earth?
#LA-NY example from above (in km):
cat("The geodesic distance between LA and NY is ", distHaversine(LA,NY)/1000,"km \n")
## The geodesic distance between LA and NY is 3977.614 km
Note that the order is (lon,lat) for the points passed to distHaversine!
#first geocode the two cities:
The origin is at (lon=0,lat=80)
#destPoint(LA, b=65, d=100000)
circle=destPoint(c(0,80), b=1:365, d=1000000)
circle2=destPoint(c(0,80), b=1:365, d=800000)
circle3=destPoint(c(0,80), b=1:365, d=100000)
plot(circle, type='l')
polygon(circle, col='blue', border='black', lwd=2)
polygon(circle2, col='red', lwd=2, border='red')
polygon(circle3, col='white', lwd=2, border='black')
rect(circle2[61,"lon"],circle2[61,"lat"],circle2[61,"lon"]+4,circle2[61,"lat"]+4,
border="orange",lwd=2.5)
Hint: The lower left corner of the rectangle is lon=44.04516, lat=80.975, and the latitude somewhere in the middle of it is lat=82.77
lat=80
pnt=c(0,lat)
circle2=destPoint(pnt, b=1:365, d=800000)
#find the touching tangent:
line <- rbind(circle2[61,], circle2[61,]+c(0,4))
d = dist2Line(pnt, line)
#plot( makeLine(line), type='l')
circle1=destPoint(pnt, b=1:365, d=d[,1])
plot(circle, type='n');grid()
# points(pnt, col='purple', pch='x',cex=2)
# plot(c(-5,5),c(79.25,80.75), type='n');grid()
# points(matrix(c(0,80),nrow=1), col='purple', pch='x',cex=2)
points(0,lat, col='purple', pch='x',cex=1.25)
xy.coords(c(0,80))
## $x
## [1] 1 2
##
## $y
## [1] 0 80
##
## $xlab
## [1] "Index"
##
## $ylab
## NULL
#points(0,80, col='purple', pch='x',cex=1.25)
polygon(circle2, col=rgb(1,0,0,0.1), lwd=2, border='red')
polygon(circle1, col=rgb(0,0,1,0.1), border='black', lwd=2)
rect(circle2[61,"lon"],circle2[61,"lat"],circle2[61,"lon"]+4,circle2[61,"lat"]+4,
border="orange",lwd=2.5)
points(line)
points(d[,2], d[,3], col='green', pch='x')
gc1 <- greatCircle(pnt, d[,2:3])
lines(gc1, lwd=1, col='green')
gc2 <- greatCircle(pnt, circle2[61,])
lines(gc2, lwd=1, col='blue')
#match(T, circle1[,"lat"]<= circle2[61,"lat"])
Although the Mercator projection significantly distorts scale and area (particularly near the poles), it has two important properties that outweigh the scale distortion:
It’s a conformal projection, which means that it preserves the shape of relatively small objects. This is especially important when showing aerial imagery, because we want to avoid distorting the shape of buildings. Square buildings should appear square, not rectangular.
It’s a cylindrical projection, which means that north and south are always straight up and down, and west and east are always straight left and right.
At the lowest level of detail (Level 1), the map is 512 x 512 pixels. At each successive level of detail, the map width and height grow by a factor of 2: Level 2 is 1024 x 1024 pixels, Level 3 is 2048 x 2048 pixels, Level 4 is 4096 x 4096 pixels, and so on.
For example, at level 3, the pixel coordinates range from (0, 0) to (2047, 2047), like this:
Pixel coordinates at level 3
To optimize the performance of map retrieval and display, the rendered map is cut into tiles of \(256 x 256\) pixels each. As the number of pixels differs at each level of detail, so does the number of tiles:
map width = map height = \(2^{level}\) tiles
Each tile is given XY coordinates ranging from (0, 0) in the upper left to \((2^{level}-1, 2^{level}-1)\) in the lower right.
Quadtree structure
If you want to read more about the efficient use of Quadkeys:
https://msdn.microsoft.com/en-us/library/bb259689.aspx
Now that we have a basic understanding of issues surrounding maps, let us move to the 2nd part: