Has Vision Zero Made NYC Safer?
Editor’s note: Opinions expressed in this post do not necessarily reflect the views of #ODSC.
Oh, glorious, New York City. Concrete jungle and busy people. Although every New Yorker can give you a different version of “Why New York City is the Best City in the World,” there is one thing that is very unlikely to appear on that long list:
Driving in New York City is an absolute nightmare. Non-stop construction, countless one-ways, bus drivers,
professional racers taxi drivers, the no-turn sign and camping cop combo, and headlights rendering cyclists and pedestrians blind…… OH my.
Just another busy day in NYC.
The real danger caused by these bad traffic conditions is large number of traffic accidents. In the year 2013 alone there were 297 people killed in traffic accidents within New York City. Therefore, in order to make the roads safer in this city, the city adopted a public safety program, Vision Zero, aimed to reduce traffic fatalities. Now, in July 2016, two and half years after the kick-off of this program, let’s use data science to find out whether Vision Zero has made New Yorkers safer on the roads.
About Vision Zero
Vision Zero originally started in Sweden in October 1997. Since then, countries around the world adopted the model.
Mayor Bill de Blasio launched NYC’s Vision Zero Plan in January 2014, two weeks after his succeeding Michael Bloomberg as the Mayor of New York City. Following the announcement, there was a long list of initiatives released by New York City agencies; for example, reducing the citywide street speed limit from 30 MPH to 25 MPH, installing more red light cameras, and increasing NYPD’s law enforcement against moving violations. Also as a part of this plan, a dataset has been released to the public and has been regularly updated by New York Police Department, which contains the records of each motor vehicle collision that happened within all five boroughs of NYC. This dataset has made it possible to quantify and review how Vision Zero has affected traffic safety in New York City.
Available at NYC OpenData Website, the motor vehicle collision records within this dataset were surprisingly well structured and relatively clean. As of June 30 2016, there were more than 830,000 records within the dataset, along with 29 variables that describe different aspects of each accident – date, time, location, vehicle type, contributing factors toward accidents, etc. However, before we start to work on numbers, there are some little issues with this dataset that we need solve.
First of all, let’s load data and R libraries for this analysis:
#### Load libraries for the project #### library(data.table) # Data Loading library(dplyr) # Data Manupulation library(caTools) # Data Manupulation library(ggplot2) # Visualization library(ggmap) # Visualization library(maps) # Visualization library(RColorBrewer) # Visualization
#### Load Data Set #### # You need to download this data by yourself. # See https://data.cityofnewyork.us/Public-Safety/NYPD-Motor-Vehicle-Collisions/h9gi-nx95 collision <- fread('NYPD_Motor_Vehicle_Collisions.csv', stringsAsFactors = F) # Transform Date & Time (rounded to hour) work_dt <- collision %>% mutate(FULLDATE = as.POSIXct(paste(DATE, TIME), format = '%m/%d/%Y %H:%M', tz = 'EST') ) %>% mutate(YEAR = year(FULLDATE), MONTH = month(FULLDATE), DAY = mday(FULLDATE), HOUR = hour(round(FULLDATE, 'hour')) ) %>% as.data.table
Records in this dataset trace back to July 1st, 2012, which means in order to do a yearly comparison based on calendar years we’ll have to exclude 2012 and 2016. Those two years don’t have same time scale as 2013 – 2015 (See Figure 1).
# Let's visualize original time range g <- ggplot(data = sample_set(work_dt), aes(x = factor(month(FULLDATE)), y = YEAR)) g + geom_tile(aes(fill = factor(YEAR)), color = 'white') + # ggtitle('Time Range of NYC Motor Vehicle Collision Data') + labs(x = 'Month')+ theme(legend.position = 'none', axis.text=element_text(size=12), axis.title=element_text(size=14,face="bold") )
Figure 1. Time Range of NYC Motor Vehicle Collision Dataset
Fortunately, there are 48 consecutive months of data in this dataset; therefore we can break the records into four 12-month periods, where each period contains same time range and still covers a full cycle of twelve consecutive calendar months (See Figure 2).
# Data group for yearly comparison (Each period July - June) work_dt[MONTH < 7, PERIOD := paste(YEAR - 1, YEAR, sep = '-')] work_dt[MONTH >= 7, PERIOD := paste(YEAR, YEAR + 1, sep = '-')] # Remove 2016.7, since it falls into period 2016-2017 work_dt <- work_dt[PERIOD != '2016-2017'] # Group by 12-month periods for analysis g + geom_tile(aes(fill = PERIOD), color = 'white') + labs(x = 'Month') + # ggtitle('12 Month Periods For Yearly Comparison') + theme(legend.position = 'top', legend.title = element_text(face="bold"), axis.text=element_text(size=12), axis.title=element_text(size=14,face="bold"))
Figure 2. Customized 12-Month Periods For Yearly Comparison
Since the target of this analysis is to evaluate the efficiency of Vision Zero, not all 29 variables within the dataset are useful for this purpose. Therefore, number of accidents, fatalities, and injuries were selected as key factors to be used in this analysis.
#### Build subsets for each topic #### # Temp data table for table construction right_dt <- work_dt %>% select(-starts_with('CONTRIBUTING FACTOR'), -starts_with('VEHICLE TYPE CODE'), -matches('KILLED|INJURED'), ID = `UNIQUE KEY`) # Injuries injur_dt <- melt_count(work_dt, 'ends', 'INJURED', 'stat') %>% merge(right_dt, by = 'ID') # Deaths kill_dt <- melt_count(work_dt, 'matches', 'KILLED', 'stat') %>% merge(right_dt, by = 'ID')
Cold Numbers And Colorful Charts
Time to work on statistics!
Total Number of Accidents
Let’s look at the total number of accidents first. Figure 3 below shows the total number of accidents in NYC by each period:
# Total count by period accident_n_dt <- work_dt %>% group_by(PERIOD) %>% summarise(N = n()) ggplot(data = accident_n_dt) + geom_bar(aes(x = PERIOD, y = N, fill = PERIOD), stat = 'identity') + labs(y = NULL, title = 'Number of Motor Vehicle Collisions for Each 12-Month Period') + theme(legend.position = 'none', axis.text=element_text(size=12), axis.title=element_text(size=14,face="bold")) + scale_y_continuous(labels = scales::comma) + geom_text(aes(x = PERIOD, y = N - 6000, label = N), size = 5)
Figure 3. Total Number of Accidents Each Period
Hmm…not very informative. It shows an increase of total numbers in each period, but given the large absolute numbers in each period, the changes are not very clear. Let’s try to pivot this chart into growth rates for a better view:
# Accident count percentage change accident_p_vec <- c(0, tail(accident_n_dt$N, nrow(accident_n_dt) - 1) / head(accident_n_dt$N, nrow(accident_n_dt) - 1) - 1) # Number of registered vehicles in NYC # https://dmv.ny.gov/about-dmv/statistical-summaries reg_nyc <- data.table(Year = seq(2012, 2015), Reg_num = c(1978392, 2016158, 2057433, 2107321)) # Registration percentage change reg_p_vec <- c(0, tail(reg_nyc$Reg_num, nrow(reg_nyc)-1) / head(reg_nyc$Reg_num, nrow(reg_nyc)-1) - 1) temp <- data.frame(PERIOD = rep(sort(unique(work_dt$PERIOD)), 2), TYPE = rep(c('Motor Vehicle Collision', 'Vehicle Registration in NYC'), each = 4), VALUE = c(accident_p_vec, reg_p_vec)) # Visualize changes ggplot(data = temp, aes(x = PERIOD, y = VALUE, color = TYPE, group = TYPE)) + geom_point(size = 3) + geom_line(size = 2, alpha = .8) + scale_y_continuous(labels = scales::percent) + theme(legend.position = 'none') + labs(title = 'Yearly Growth Rate', y = '% Growth Since Last Year') + facet_wrap( ~ TYPE, nrow = 2)
Figure 4. Change Rates of Car Accidents and Vehicle Registration in NYC by Each Period
Figure 4 looks much better! Now look at the upper part of this chart, the red line illustrates the growth rate of the total number of car accidents in each period (growth in period 2012-2013 was 0% because it has no prior numbers to be compared with). It seems the number has been increasing in each period. Only period 2014-2015, which covers the time point of Vision Zero’s announcement, shows a slower growth.
Now, does this suggest Vision Zero has been failing? Not necessarily. Look at the lower part of this chart, the blue line shows the percentage growth of vehicle registration in New York City (growth rates calculated based on data from NY DMV), and it has been growing at a similar rate as accident growth! Maybe that explains why accidents have been increasing consistently?
Let’s explore more about other aspects of accidents.
# Injury count by type injur_n_dt <- injur_dt %>% group_by(PERIOD, INJURED) %>% summarise(Injury = sum(INJURED_V)) %>% filter(!grepl('PERSONS', INJURED)) %>% # Num. of persons is the total of # other three stats arrange(desc(Injury)) %>% mutate(INJURED = INJURED %>% gsub('NUMBER OF ', '', .) %>% gsub(' INJURED', '', .) %>% factor(levels = c('CYCLIST', 'PEDESTRIANS', 'MOTORIST'))) # Injury total by period injur_n_label_dt <- injur_n_dt %>% group_by(PERIOD) %>% summarise(N = sum(Injury)) ## Visualization g <- ggplot(data = injur_n_dt) + geom_bar(aes(x = PERIOD, y = Injury, fill = reorder(INJURED, Injury)), stat = 'identity') + scale_fill_manual(values = c("#468966", "#77C4D3", "#FFB03B")) + guides(fill = guide_legend(ncol = 3, title = 'Injury Type')) + theme(legend.position = 'top', axis.text=element_text(size=12), axis.title=element_text(size=14,face="bold")) # Injury per Period - Total g + geom_text(data = injur_n_label_dt, aes(x = PERIOD, y = N + 2000, label = N), size = 5) # Break down g + facet_wrap(~ INJURED)
Figure 5. Total Number of Injuries in NYC Per Period
Figure 6. Total Number of Injuries in NYC Per Period by Types
# Yearly Average Injury per Accident temp <- injur_n_label_dt %>% mutate(injur_rate = N / accident_n_dt$N) ggplot(data = temp, aes(x = PERIOD, y = injur_rate)) + geom_line(group = 1, size = 2, alpha = .5, color = 'red') + geom_label(aes(label = round(injur_rate, 3))) + labs(x = NULL, y = 'Injury Per Accident') + theme(legend.position = 'none', axis.text=element_text(size=12), axis.title=element_text(size=14,face="bold")) + coord_cartesian(ylim = c(0, .5))
Figure 5 – 7 summarized total injuries caused by traffic accidents in each period. Interestingly… although total accidents are growing, injuries are relatively stable — except in 2014-2015 period, which had a significant drop in injury count.
What about each injury type?
Injuries broken down by category show a mixed pattern: Cyclist injuries, although small in absolute numbers, has been consistently climbing over the years; Pedestrian injuries stayed low after the big drop in 2014-2015; Motorist injuries, as the most common injury type, followed the exact same drop-bounce pattern as overall numbers.
Increasing accidents and fluctuating injuries. Things are not looking good well so far. Before making a conclusion to this analysis, the last but most important statistical figure needs to be checked – traffic fatalities.
# Data death_n_dt <- kill_dt %>% group_by(PERIOD, KILLED) %>% summarise(Death = sum(KILLED_V)) %>% filter(!grepl('PERSONS', KILLED)) %>% # Num. of persons is the total of # other three stats arrange(desc(Death)) death_n_dt$KILLED <- death_n_dt$KILLED %>% gsub('NUMBER OF ', '', .) %>% gsub(' KILLED', '', .) death_n_label_dt <- death_n_dt %>% group_by(PERIOD) %>% summarise(N = sum(Death)) # Plotting g <- ggplot(data = death_n_dt) + geom_bar(aes(x = PERIOD, y = Death, fill = reorder(KILLED, Death)), stat = 'identity') + scale_fill_manual(values = c("#468966", "#77C4D3", "#FFB03B")) + guides(fill = guide_legend(ncol = 3, title = 'Killed Type')) + theme(legend.position = 'top', axis.text=element_text(size=12), axis.title=element_text(size=14,face="bold")) g + geom_text(data = death_n_label_dt, aes(x = PERIOD, y = N + 20, label = N), size = 5) # Break down g + facet_wrap(~ KILLED)