Analyzing Minnesota Interstate-94 Traffic Volume and Weather

Author: Claudia Otero

Introduction

On the westbound side of Minnesota Interstate-94, how does the measured traffic volume change with the observed weather? Using inference on means, I found that some weather conditions were associated with differences in the observed traffic volume, while others showed no significant effect. Specifically, I compared snowy with clear weather, and partly sunny with cloudy conditions, by performing Welch’s t-tests on the traffic volume measured under each condition. Surprisingly, I found no statistically significant difference between the mean traffic volume observed under snowy and clear conditions. Conversely, I found a significant difference in the mean traffic volume between partly sunny and cloudy conditions.

Background

Context

The typical American driver loses approximately 51 hours per year to traffic, with some drivers losing as much as 155 hours per year, according to INRIX, a private transportation analytics company. Aside from lost time, traffic can be costly. For instance, INRIX estimates that Chicago drivers pay $2,618 annually in residual gas expenses and common vehicle maintenance due to traffic. With urban areas expected to see large population increases in the coming decades, traffic and roadway congestion are likely to keep increasing. This underscores the importance of measuring, characterizing, and understanding traffic metrics, which can be used to make improvements to our critical infrastructure and interstate systems.

Description of Data

To look at the effect that weather has on traffic volume, I used the ‘Metro Interstate Traffic Volume’ dataset, which contains 48,204 observations and 8 distinct variables. Five quantitative variables measure the ambient air temperature, snowfall (mm), rainfall (mm), cloud coverage (%), and observed hourly traffic volume (vehicles per hour). Three categorical variables give the date of observation and short and long descriptions of the observed weather conditions. The data were collected on the westbound side of Interstate-94 at MnDOT ATR Station 301, a site located between Minneapolis and St. Paul, Minnesota. The station used subsurface sensor technology to track passing traffic continuously between October 2, 2012 and September 30, 2018.

In order to keep my analysis accurate and concise, I first created a vector corresponding to the unique weather types of the dataset. I then extracted the mean traffic volume for each weather type by grouping across this vector. Subsequently, I performed my statistical analyses using this more organized data.
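The grouping step described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the code used for the analysis. The column name `traffic_volume` and the sample rows are assumptions made for the example; only `weather_main` appears in this report.

```python
from collections import defaultdict
from statistics import mean


def mean_volume_by_weather(rows):
    """Return {weather type: mean traffic volume} for an iterable of row dicts."""
    volumes = defaultdict(list)
    for row in rows:
        # "traffic_volume" is an assumed column name; "weather_main" is from the report.
        volumes[row["weather_main"]].append(float(row["traffic_volume"]))
    return {weather: mean(values) for weather, values in volumes.items()}


# Made-up illustrative rows, not values taken from the dataset.
sample_rows = [
    {"weather_main": "Clear", "traffic_volume": 3000},
    {"weather_main": "Clear", "traffic_volume": 3200},
    {"weather_main": "Snow", "traffic_volume": 2900},
]
group_means = mean_volume_by_weather(sample_rows)
print(group_means)
```

With the real data, the rows could be supplied by `csv.DictReader` reading the dataset file, since each row it yields is already a dictionary keyed by column name.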

I acknowledge there may be other effects on the dataset, for instance, variations in weekday versus weekend traffic patterns, holidays, local events, construction projects, etc. These factors could have drastically altered traffic flow regardless of weather conditions, potentially skewing the data related to normal traffic patterns.

Hypothesis

My project analyzed the effect of weather on traffic volume along the westbound side of Minnesota Interstate-94. I hypothesized that the mean traffic volume measured during snowy conditions would differ significantly from the mean traffic volume recorded during clear conditions, given that snow is conducive to disruptive driving conditions. Additionally, I hypothesized that there would be no significant difference between the mean traffic volume observed on partly sunny days and on cloudy days, as road quality is typically unaffected by changes in cloud coverage.

A Look at the Types of Weather in the Dataset

weather_main       n
Clear          13391
Clouds         15164
Drizzle         1821
Fog              912
Haze            1360
Mist            5950
Rain            5672
Smoke             20
Snow            2876
Squall             4
Thunderstorm    1034

The table lists each weather type and its number of occurrences. The types “Clouds” and “Clear” occurred most often, while “Squall” and “Smoke” occurred least often.
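To put these counts in proportion, the short Python sketch below converts the table into percentage shares. The counts are copied from the table; the variable names are illustrative. The counts sum to 48,204, matching the number of observations in the dataset, and “Clouds” and “Clear” together account for roughly 59% of them.

```python
# Counts copied from the weather_main table above.
weather_counts = {
    "Clear": 13391, "Clouds": 15164, "Drizzle": 1821, "Fog": 912,
    "Haze": 1360, "Mist": 5950, "Rain": 5672, "Smoke": 20,
    "Snow": 2876, "Squall": 4, "Thunderstorm": 1034,
}

total = sum(weather_counts.values())
# Percentage share of all observations for each weather type.
shares = {weather: 100 * n / total for weather, n in weather_counts.items()}

# Print the types from most to least frequent.
for weather, n in sorted(weather_counts.items(), key=lambda item: item[1], reverse=True):
    print(f"{weather:<13}{n:>6}{shares[weather]:>7.1f}%")
```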

Analysis

Looking at these histograms, I can see some variation in the distributions corresponding to each weather type. Some weather types, such as squalls and smoke, were observed only a few times, while others were observed thousands of times. Among the weather types that occurred in greater numbers, the distributions look broadly alike. For example, types “clouds”, “clear”, “rain”, “drizzle”, and “haze” all have a visually similar shape. This suggests that the distribution of traffic volume changes little across these weather conditions.

Traffic Volume Distribution by Weather Type Using a Boxplot

Evaluating the boxplots, I made several key observations. The whiskers indicate a wide range in traffic volume (vehicles per hour) for every weather type except “Squall” and “Smoke”, suggesting great variability within each category. “Squall” and “Smoke” are the rarest weather types, with only 4 and 20 observations respectively, which might explain their compact ranges. “Fog” shows the widest range, which could suggest that heavy fog sometimes coincides with sharply reduced traffic volume. Finally, “Clouds” and “Haze” have the highest median traffic volumes, indicating that typical hourly traffic was higher under these conditions than under the others.

Snowy Weather & Traffic Volume Vector - Deriving the Vectors

Evaluating the side-by-side boxplot, there is no discernible difference in the average traffic volume (vehicles per hour) observed on clear and snowy days, as indicated by the white dots on the distributions. Similarly, the whiskers of the two boxplots are close in length, indicating a comparable range in traffic volume for both clear and snowy weather types.

Statistical Model

\[ \begin{aligned} H_0 &: \mu_{\text{clear}} = \mu_{\text{snow}} \\ H_a &: \mu_{\text{clear}} \neq \mu_{\text{snow}} \end{aligned} \]
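For reference, the standard Welch test statistic for these hypotheses, with its approximate degrees of freedom (the Welch-Satterthwaite approximation), is given below. The symbols \(\bar{x}\), \(s^2\), and \(n\) are introduced here for illustration and denote the sample mean, sample variance, and sample size of each group.

\[ t = \frac{\bar{x}_{\text{clear}} - \bar{x}_{\text{snow}}}{\sqrt{\dfrac{s_{\text{clear}}^2}{n_{\text{clear}}} + \dfrac{s_{\text{snow}}^2}{n_{\text{snow}}}}}, \qquad \nu \approx \frac{\left( \dfrac{s_{\text{clear}}^2}{n_{\text{clear}}} + \dfrac{s_{\text{snow}}^2}{n_{\text{snow}}} \right)^2}{\dfrac{\left( s_{\text{clear}}^2 / n_{\text{clear}} \right)^2}{n_{\text{clear}} - 1} + \dfrac{\left( s_{\text{snow}}^2 / n_{\text{snow}} \right)^2}{n_{\text{snow}} - 1}} \]

Unlike the pooled two-sample t-test, this form does not assume that the two groups share a common variance, which is why the degrees of freedom are generally not a whole number.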

Conducting the Test

## 
##  Welch Two Sample t-test
## 
## data:  clear_traffic_volume and snow_traffic_volume
## t = 0.99214, df = 4332.8, p-value = 0.3212
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -38.12902 116.25820
## sample estimates:
## mean of x mean of y 
##  3055.909  3016.844
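The calculation behind a Welch two-sample t-test can be sketched in Python with only the standard library. This is an illustrative sketch, not the code that produced the output above. The function name `welch_t_test` and the toy samples are assumptions. The p-value line uses a normal approximation to the t distribution, which is reasonable here only because the reported degrees of freedom (4332.8) are large.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_test(x, y):
    """Return (t, df) for Welch's two-sample t-test on samples x and y."""
    nx, ny = len(x), len(y)
    # Squared standard error of each sample mean (sample variance / n).
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Toy samples for illustration only; these are not traffic data.
t_toy, df_toy = welch_t_test([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])

# Two-sided p-value for the reported t = 0.99214, using the normal
# approximation (an assumption that holds well at df = 4332.8).
reported_t = 0.99214
approx_p = 2 * (1 - NormalDist().cdf(abs(reported_t)))
print(round(t_toy, 4), round(df_toy, 4), round(approx_p, 4))
```

The approximate p-value agrees with the reported 0.3212 to about three decimal places. For small samples, such as the toy example, the normal approximation should not be used and the t distribution itself is needed.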

Interpretation of Statistical Test

The evidence is consistent with no significant difference in mean traffic volume between snowy and clear weather (two-sided Welch’s t-test, t = 0.99214, df = 4332.8, p-value = 0.3212).

Traffic Volume Based on Cloud Coverage

Evaluating the side-by-side boxplot, there is a noticeable difference in the average traffic volume observed on partly sunny and cloudy days, as indicated by the white dots on the distribution. The whiskers of the boxplots are nearly equivalent in length, indicating a similar range in traffic volume (vehicles per hour) for both partly sunny and cloudy weather types.

Statistical Model

\[ \begin{aligned} H_0 &: \mu_{\text{cloudy}} = \mu_{\text{partly sunny}} \\ H_a &: \mu_{\text{cloudy}} \neq \mu_{\text{partly sunny}} \end{aligned} \]

Conducting the Test

## 
##  Welch Two Sample t-test
## 
## data:  x and y
## t = -9.2734, df = 47633, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -203.5359 -132.5097
## sample estimates:
## mean of x mean of y 
##  3172.461  3340.484

Interpretation of Statistical Test

The evidence indicates a significant difference in mean traffic volume between partly sunny and cloudy weather (two-sided Welch’s t-test, t = -9.2734, df = 47,633, p-value < 2.2e-16).
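A quick arithmetic check helps put this result in context. The Python sketch below starts from the reported sample means and t statistic, recovers the implied standard error, rebuilds the 95% confidence interval, and expresses the difference relative to the larger mean. It uses the normal critical value in place of the t critical value, an assumption that is harmless at df = 47,633. The variable names are illustrative.

```python
from statistics import NormalDist

# Values copied from the reported test output.
mean_x, mean_y = 3172.461, 3340.484
t_stat = -9.2734

diff = mean_x - mean_y            # difference in sample means (x minus y)
std_err = diff / t_stat           # standard error implied by t = diff / SE
z = NormalDist().inv_cdf(0.975)   # normal critical value, about 1.96
ci_low, ci_high = diff - z * std_err, diff + z * std_err
relative_diff = diff / mean_y     # difference as a fraction of the mean of y

print(round(diff, 3), round(std_err, 3))
print(round(ci_low, 2), round(ci_high, 2))
print(f"{relative_diff:.1%}")
```

The rebuilt interval matches the reported one to within rounding, and the difference of about 168 vehicles per hour is roughly 5% of the larger mean. With samples this large, even a modest difference yields a very small p-value, so the size of the difference deserves attention alongside its significance.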

Discussion

My analysis of traffic on Minnesota Interstate-94 revealed unexpected results. I hypothesized that mean traffic volume would differ significantly between snowy and clear conditions, but the Welch’s two-sample t-test showed no statistically significant difference in the average traffic volume between clear and snowy days (p-value = 0.3212). A potential explanation for this maintained traffic volume might be Minnesota drivers’ familiarity with snowy conditions and the effectiveness of road services and maintenance.

With respect to cloud coverage, there was a statistically significant difference in the average traffic volume between partly sunny and cloudy conditions (p-value < 2.2e-16). This result contradicts my hypothesis that cloud cover would not have a significant impact on traffic volume. The data show that cloudy conditions were associated with reduced traffic volumes (the two sample means, 3,172 and 3,340 vehicles per hour, differ by about 168), and the statistical significance of this difference suggests that it is unlikely to be due to random chance alone. Driver behavior may therefore be influenced by factors other than physical road conditions. Some potential explanations are that heavier cloud coverage reduces sunlight and lowers road visibility, that it accompanies other weather events like rain (even if not captured in the immediate dataset), or that it coincides with social patterns like rush hours, when traffic patterns are different. I believe there are multifaceted reasons for this traffic volume pattern that should be investigated further.

In the future, it would be interesting to look at more granular trends in the data, such as differences in traffic volume by hour of day. For instance, looking at the average cloud coverage by hour could show whether the percentage of cloudiness is associated with a certain time of day, like rush hour. Another point of interest could be analyzing the relationship between specific holidays and their impact on travel. This was a variable I chose to exclude from my analysis, given its irrelevance to weather conditions.
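The hourly breakdown suggested above could be sketched as follows in Python, using only the standard library. The column names `date_time` and `clouds_all`, the timestamp format, and the demo rows are all assumptions made for illustration; they are not taken from this report.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_by_hour(rows, value_key, time_key="date_time"):
    """Return {hour of day: mean of value_key} for an iterable of row dicts."""
    by_hour = defaultdict(list)
    for row in rows:
        # Assumed timestamp format: "YYYY-MM-DD HH:MM:SS".
        hour = datetime.strptime(row[time_key], "%Y-%m-%d %H:%M:%S").hour
        by_hour[hour].append(float(row[value_key]))
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


# Made-up illustrative rows, not values taken from the dataset.
demo_rows = [
    {"date_time": "2012-10-02 09:00:00", "clouds_all": "40"},
    {"date_time": "2012-10-03 09:00:00", "clouds_all": "80"},
    {"date_time": "2012-10-02 17:00:00", "clouds_all": "90"},
]
hourly_clouds = mean_by_hour(demo_rows, "clouds_all")
print(hourly_clouds)
```

Calling the same function with the traffic volume column instead would give the hourly traffic profile, and comparing the two profiles would show whether cloudier hours tend to coincide with busier or quieter times of day.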