Temporal Data in R

Let us first load and inspect the San Francisco data set:

load("data/SFincidents2012.rda")  # creates the data frame incidents
head(incidents)
##   IncidntNum      Category                    Descript DayOfWeek
## 1  120000499       ROBBERY       ROBBERY, BODILY FORCE    Sunday
## 2  120000938  NON-CRIMINAL DEATH REPORT, CAUSE UNKNOWN    Sunday
## 3  120001936       SUICIDE          SUICIDE BY JUMPING    Sunday
## 4  120002235 VEHICLE THEFT           STOLEN AUTOMOBILE    Sunday
## 5  120003186  NON-CRIMINAL DEATH REPORT, CAUSE UNKNOWN    Monday
## 6  120000041       ASSAULT                     BATTERY    Sunday
##         Date  Time PdDistrict      Resolution                  Location
## 1 2012-01-01 01:50    CENTRAL JUVENILE BOOKED    VALLEJO ST / POWELL ST
## 2 2012-01-01 06:40   NORTHERN            NONE  800 Block of OFARRELL ST
## 3 2012-01-01 14:45   NORTHERN            NONE    1200 Block of GOUGH ST
## 4 2012-01-01 17:00    MISSION            NONE   1600 Block of BRYANT ST
## 5 2012-01-02 08:30    CENTRAL            NONE 1200 Block of STOCKTON ST
## 6 2012-01-01 00:25   SOUTHERN            NONE    200 Block of MARKET ST
##           X        Y violent censusBlock
## 1 -122.4105 37.79837    TRUE 06075010700
## 2 -122.4183 37.78516   FALSE 06075012202
## 3 -122.4244 37.78436   FALSE 06075016000
## 4 -122.4105 37.76562   FALSE 06075017700
## 5 -122.4084 37.79679   FALSE 06075010700
## 6 -122.3974 37.79244    TRUE 06075011700

We defined the following crimes as violent: assault, robbery, rape, kidnapping, and purse snatching.

1. Exercises to temporally explore the data set:

  1. How would you compute the “HourOfWeek” from the timestamp? You may use the columns Time and DayOfWeek.
    What ambiguity must first be agreed upon so that you can compare your results to the data given?
  1. Compute and graph the integer weekhour pattern of violent crime rate as a barplot. The function table will be useful. An example is shown below.

  2. Load and explore the library mgcv. Try to recreate a graph similar to the one below. (Hint: the bam function is much faster than gam.)

  3. Using the commands table() and as.POSIXct(), build an hourly time series of counts of (all) crimes. (Hint: as.POSIXct("2012-01-02 08", format="%Y-%m-%d %H"))

  4. Plot that time series.
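
The week-hour and hourly-count exercises above can be approached along the following lines. This is only a minimal base-R sketch on a tiny made-up data frame (`toy`, a hypothetical stand-in that mimics the columns Date, Time, DayOfWeek and violent of `incidents`). It assumes that the week starts on Monday and that hours are numbered 0 to 167, which is exactly the kind of convention that has to be agreed upon first. It also reads "violent crime rate" as the share of violent incidents per week hour, and uses the time zone "UTC" as a simplifying assumption to sidestep daylight-saving gaps.

```r
# Toy stand-in for `incidents`; only the columns used here are mimicked.
toy <- data.frame(
  Date      = c("2012-01-01", "2012-01-01", "2012-01-02", "2012-01-02"),
  Time      = c("01:50", "01:10", "08:30", "08:05"),
  DayOfWeek = c("Sunday", "Sunday", "Monday", "Monday"),
  violent   = c(TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Assumed convention: the week starts on Monday, hours are numbered 0..167.
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
hour <- as.integer(substr(toy$Time, 1, 2))   # "01:50" -> 1
toy$HourOfWeek <- (match(toy$DayOfWeek, days) - 1L) * 24L + hour

# Share of violent incidents per week hour
# (rows: week hour, columns: "FALSE" / "TRUE").
tab <- table(toy$HourOfWeek, toy$violent)
violentRate <- tab[, "TRUE"] / rowSums(tab)
barplot(violentRate, xlab = "Hour of week",
        ylab = "Share of violent incidents")

# Hourly counts of all incidents: truncate each timestamp to the hour.
stamp <- paste(toy$Date, substr(toy$Time, 1, 2))   # e.g. "2012-01-02 08"
h <- table(stamp)
hours <- as.POSIXct(names(h), format = "%Y-%m-%d %H", tz = "UTC")
plot(hours, as.numeric(h), type = "h",
     xlab = "Time", ylab = "Incidents per hour")
```

Note that table() only returns hours in which at least one incident occurred; hours without incidents are missing from `h` rather than counted as zero, which matters when the result is treated as a regular time series.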

Extra Credit: Interactive Chart

  1. Use the package googleVis and look up the help file for gvisAnnotatedTimeLine and execute the first example. Then create an interactive chart for our hourly counts.
#cht = gvisAnnotatedTimeLine(CrimeCounts, "date", "NumCrimes")  
  1. Use the package dygraphs and look up the documentation at http://rstudio.github.io/dygraphs/
    Try to understand the pipe operator %>% and create another interactive time series chart with a range selector!
#CrimeCounts = xts(cbind(NumCrimes = as.numeric(h)), order.by = as.POSIXct(names(h)))
#dygraph(CrimeCounts)
  1. The %>% command is a so-called “pipe operator”. It passes the output of one command on to the next, simply left to right. In base R we would normally achieve this by nesting functions; e.g. in class we often nest commands such as
    round(mean(x),2), which in pipe notation would be written as mean(x) %>% round(2)

Try to rewrite the nested command we used above to create the Hour variable using pipes:
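
As a point of comparison, here is a small sketch of nested versus piped calls. It assumes the magrittr package (which provides %>%) is installed. The Hour extraction shown is hypothetical: it assumes Hour was obtained by taking the first two characters of Time and converting them to an integer, which may differ from the command actually used in class.

```r
library(magrittr)  # provides the %>% operator

x <- c(1.234, 2.345, 3.456)

# Nested call versus the same steps written left to right.
nested <- round(mean(x), 2)
piped  <- x %>% mean() %>% round(2)
identical(nested, piped)

# Hypothetical Hour extraction (an assumption, not necessarily the
# command from class): first two characters of Time, as an integer.
Time <- c("01:50", "06:40", "14:45")
HourNested <- as.integer(substr(Time, 1, 2))
HourPiped  <- Time %>% substr(1, 2) %>% as.integer()
identical(HourNested, HourPiped)
```

With %>%, the left-hand side is inserted as the first argument of the function on the right, so further arguments such as the 2 in round(2) are simply listed after it.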