Friday, November 14, 2014

A first step in accident analysis

Predicting Santiam Pass Accidents from Twitter

Summary

The Oregon Department of Transportation regularly publishes live road reports via Twitter as a public service. Using these data we can reconstruct both the times and locations of accidents on sections of road and correlate them with road conditions.

The analysis focuses on a specific location: US Highway 20 at Santiam Pass (milepost 79), a 4800-foot (about 1460-meter) mountain pass in the Cascade Range. As the main route from Bend, Oregon to the Willamette Valley cities of Portland, Eugene, and Salem, it carries heavy traffic year round and is the site of frequent accidents.

This analysis establishes the accident frequency and density for an 11-mile stretch of road and examines the correlation of snow and ice with traffic accidents on the pass. I show there is a high rate of accidents on the pass (about one every two weeks). I use Bayesian analysis to show a high correlation with snowy road conditions.

The data analyzed cover the dates from 2013-04-08 to 2014-11-11. There are 900 tweets during this period.

Filtering on the criteria "Santiam Pass Summit" and "crash", and taking complete cases, reduces the number of crash data points to 50. During the same period there were 86 days with snow.
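The filtering step can be sketched roughly as follows. This is only an illustration, not the code behind the numbers above: the record layout (`date`, `text`, `miles_from_summit`), the case-insensitive matching, and the sample records are all hypothetical, since the post does not show the raw tweet format.

```python
# Criteria named in the post; matching them case-insensitively is an assumption.
REQUIRED_TERMS = ("Santiam Pass Summit", "crash")

# Hypothetical field names; a "complete case" here means none of them is missing.
REQUIRED_FIELDS = ("date", "text", "miles_from_summit")


def is_complete(record):
    """Return True when every required field is present and not None."""
    return all(record.get(field) is not None for field in REQUIRED_FIELDS)


def filter_crashes(records):
    """Keep complete records whose text mentions every required term."""
    kept = []
    for record in records:
        if not is_complete(record):
            continue
        text = record["text"].lower()
        if all(term.lower() in text for term in REQUIRED_TERMS):
            kept.append(record)
    return kept


# Made-up records for illustration only; these are not real ODOT tweets.
sample = [
    {"date": "2014-01-10", "text": "US 20 at Santiam Pass Summit: crash",
     "miles_from_summit": 2.0},
    {"date": "2014-01-11", "text": "US 20 at Santiam Pass Summit: snow",
     "miles_from_summit": 0.0},
    {"date": "2014-01-12", "text": "US 20 at Santiam Pass Summit: crash",
     "miles_from_summit": None},
]

crashes = filter_crashes(sample)
print(len(crashes))
```

Only the first sample record survives: the second does not mention a crash, and the third is dropped as an incomplete case.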

Summary statistics

There were 0.6055363 accidents per week. Since the distance over which the accidents occurred is 11 miles, the accident density is 2.8625354 accidents per year per mile.
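As a rough check, the arithmetic behind these two figures can be reproduced from the counts given earlier. The sketch below is a reconstruction, not the original code. It assumes a 52-week year, and it uses a 578-day span because that is the value that reproduces the quoted weekly rate; the full window from 2013-04-08 to 2014-11-11 is 582 days, and the post does not say why a shorter span was used.

```python
from datetime import date

N_CRASHES = 50        # filtered crash reports, from the text
SEGMENT_MILES = 11    # length of the road segment, from the text
WEEKS_PER_YEAR = 52   # assumption: reproduces the quoted density


def weekly_rate(n_events, n_days):
    """Events per week over a span of n_days."""
    return n_events / (n_days / 7)


def density_per_year_mile(rate_per_week, miles, weeks_per_year=WEEKS_PER_YEAR):
    """Events per year per mile of road."""
    return rate_per_week * weeks_per_year / miles


# Length of the full data window stated in the post.
full_window_days = (date(2014, 11, 11) - date(2013, 4, 8)).days

# Inferred span that matches the quoted rate (an assumption, see above).
inferred_span_days = 578

rate_full = weekly_rate(N_CRASHES, full_window_days)
rate_quoted = weekly_rate(N_CRASHES, inferred_span_days)
density = density_per_year_mile(rate_quoted, SEGMENT_MILES)

print(full_window_days, round(rate_full, 4))
print(round(rate_quoted, 7), round(density, 7))
```

Using the full 582-day window instead gives a slightly lower rate of about 0.60 accidents per week, so the conclusion does not depend on which span is chosen.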

Timeline of accidents

The graph below shows a timeline of accidents, with the location of the accident (measured in distance from the summit) represented by the y-axis. Red data points represent crash data, while blue data points represent days when snow was reported in the feed.

Histogram of distance from Santiam Pass Summit

This plot shows the distribution of accidents as measured by the distance from Santiam Pass Summit. In this case, accidents on the west side outnumber those on the east, and the distribution appears strongly skewed if one takes the mode as its center.

The median distance of accidents is 2 miles west of the summit. The mean distance is 1.96 miles, with a standard deviation of 2.7771664 miles. The data show a skewed distribution.

Probabilities

We can calculate rough probabilities by noting that there were 0.0859107 crashes per day and that the probability of snow on a given day was 0.1477663 during the same period.
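These two numbers follow directly from the counts given earlier (50 crashes and 86 snow days over the 582 days from 2013-04-08 to 2014-11-11). The sketch below reproduces them and adds two things that are not in the original analysis: an independence baseline, and a small helper for Bayes' rule. Treating crashes per day as the probability of a crash day assumes at most one crash per day, and the post does not report the share of crashes that fell on snow days, so `p_snow_given_crash` is left as an input rather than filled in.

```python
N_DAYS = 582        # days from 2013-04-08 to 2014-11-11
N_CRASHES = 50      # filtered crash reports, from the text
N_SNOW_DAYS = 86    # days with snow reported, from the text

# Rough daily probabilities (assumes at most one crash per day).
p_crash = N_CRASHES / N_DAYS
p_snow = N_SNOW_DAYS / N_DAYS

# Baseline: days with both snow and a crash if the two were independent.
expected_joint_days = p_crash * p_snow * N_DAYS


def p_crash_given_snow(p_snow_given_crash, p_crash, p_snow):
    """Bayes' rule: P(crash | snow) = P(snow | crash) * P(crash) / P(snow)."""
    return p_snow_given_crash * p_crash / p_snow


print(round(p_crash, 7), round(p_snow, 7))
print(round(expected_joint_days, 2))
```

Under independence one would expect about 7 days with both snow and a crash. An observed count well above that baseline would indicate that crashes are more likely on snowy days.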

Conclusion

This is a first analysis of these data. A next step is to compare the results with other road segments.
