3 Results

3.1 Regression Analysis

To test our variables’ effects on life expectancy, we created five different models. First, we created an Ordinary Least Squares (OLS) model in which we do not control for year, county, or state. We use the OLS model as a baseline, as the other four models involve various levels of control for year, county, and state. In our County Fixed Effects Model, we control for the unexplained variability of each county. In doing so, the model creates a dummy variable for every county, which allows it to control for every county’s unexplained variability but at the cost of estimating a very large number of additional parameters and risking overfitting the data. This offers an explanation for the low \(R^2\) and negative adjusted \(R^2\) values the model generates. On the other hand, in the County Fixed Effects Model this high level of control allows the demographic variables to explain life expectancy, whereas in the other models they contribute far less. For example, holding all other variables constant and controlling for the unexplained variability in each county, we predict an increase of 0.141 years of life expectancy for every additional percentage point in the white population of a county, on average. This is statistically significant at the 1 percent level. In every other model, however, the coefficient on the percentage of a county’s population that is white is negative.
In the County Fixed Effects Model, the percentage of a county’s population that is female has a coefficient of -0.218, which is also statistically significant at the 1 percent level. That coefficient is by far the largest in magnitude across the models; in the other four models it ranges from -0.058 to 0.004. One possible explanation is that demographics play a pivotal role in shaping socioeconomic outcomes at the county level, potentially influenced by varying economic activities, differing access to social policies and support systems, and diverse cultural attitudes and norms across counties.
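The sign change on pctWhite between the baseline and the County Fixed Effects Model can look surprising, so the following minimal sketch illustrates the mechanism. It is not the estimation code behind the table: it uses an invented two-county toy panel and a single hypothetical regressor, and it relies on the standard result that including a dummy for every county gives the same slope as subtracting each county's mean from the data (the within transformation).

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a one-regressor least-squares fit with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


# Synthetic toy panel (NOT the study's data): two hypothetical counties, three years each.
county = ["A", "A", "A", "B", "B", "B"]
x = [10.0, 11.0, 12.0, 30.0, 31.0, 32.0]  # hypothetical regressor
y = [80.0, 80.5, 81.0, 75.0, 75.5, 76.0]  # hypothetical life expectancy

# Pooled fit ignores county membership; within fit uses only variation inside each county.
pooled = ols_slope(x, y)
within = ols_slope(demean_by_group(x, county), demean_by_group(y, county))

print(f"pooled OLS slope:         {pooled:+.3f}")
print(f"county-FE (within) slope: {within:+.3f}")
```

In this toy panel the pooled slope is negative because county B has a higher regressor level and a lower outcome level, while the within-county slope is positive because the outcome rises with the regressor inside each county. The same kind of between-county versus within-county difference is one way a coefficient can change sign once county effects are controlled for.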

Regression Models
Dependent variable: lifeExp

Variable      OLS             County FE       Year FE         County Year FE  State Year FE
Model type    OLS             panel linear    panel linear    panel linear    panel linear

pctWhite      -0.014***       0.141***        -0.009***       -0.062***       -0.013***
              (p = 0.000)     (p = 0.000)     (p = 0.000)     (p = 0.001)     (p = 0.000)
pctFemale     -0.015          -0.218***       -0.026**        0.004           -0.058***
              (p = 0.179)     (p = 0.0002)    (p = 0.018)     (p = 0.937)     (p = 0.00000)
medHincome    0.00004***      -0.00002***     0.00004***      0.00001***      0.00004***
              (p = 0.000)     (p = 0.000)     (p = 0.000)     (p = 0.0005)    (p = 0.000)
adltSmoking   -0.290***       0.016***        -0.328***       -0.011*         -0.440***
              (p = 0.000)     (p = 0.002)     (p = 0.000)     (p = 0.077)     (p = 0.000)
uninsured     -0.012**        0.018*          -0.007          0.006           0.114***
              (p = 0.016)     (p = 0.085)     (p = 0.116)     (p = 0.541)     (p = 0.000)
drvAlone      -0.040***       0.025***        -0.043***       -0.025***       -0.033***
              (p = 0.000)     (p = 0.0003)    (p = 0.000)     (p = 0.00005)   (p = 0.000)
excDrinking   0.139***        -0.007          0.123***        0.0003          0.101***
              (p = 0.000)     (p = 0.245)     (p = 0.000)     (p = 0.958)     (p = 0.000)
homicides     -0.170***       -0.161***       -0.152***       -0.101***       -0.128***
              (p = 0.000)     (p = 0.000)     (p = 0.000)     (p = 0.000)     (p = 0.000)
Constant      84.206***
              (p = 0.000)

Observations  5,180           5,180           5,180           5,180           5,180
R2            0.786           0.210           0.805           0.068           0.755
Adjusted R2   0.786           -0.097          0.805           -0.294          0.744

Note: p-values reported in parentheses; *p<0.1; **p<0.05; ***p<0.01
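The negative adjusted R2 values in the two county-level columns follow from the degrees-of-freedom penalty. The sketch below uses the textbook formula, 1 - (1 - R2)(n - 1)/(n - k - 1), which is an assumption about how the software computed the reported values. The number of county dummies is not reported in the table, so the figure of 1,400 used here is purely hypothetical and only shows the direction of the effect.

```python
def adjusted_r2(r2, n_obs, n_params):
    """Textbook adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    where k is the number of estimated slope parameters."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_params - 1)


N_OBS = 5180        # observations, from the table
N_REGRESSORS = 8    # pctWhite through homicides

# OLS column: with only 8 regressors the penalty is negligible.
ols_adj = adjusted_r2(0.786, N_OBS, N_REGRESSORS)

# County FE column: R^2 = 0.210 from the table. The dummy count below is
# HYPOTHETICAL (not reported in the source) and is for illustration only.
HYPOTHETICAL_COUNTY_DUMMIES = 1400
fe_adj = adjusted_r2(0.210, N_OBS, N_REGRESSORS + HYPOTHETICAL_COUNTY_DUMMIES)

print(f"OLS adjusted R^2:       {ols_adj:.3f}")
print(f"County FE adjusted R^2: {fe_adj:.3f}  (hypothetical dummy count)")
```

With 8 regressors and 5,180 observations, the formula leaves the OLS value at 0.786 after rounding, matching the table. Once more than a thousand dummies are added to a model whose R2 is only 0.210, the penalty outweighs the explained variation and the adjusted value drops below zero.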

As the name suggests, the Year Fixed Effects Model controls for the unexplained variability of each year. Even though this might account for the upheaval of 2020, controlling for year alone did not reveal anything worth discussing. However, when we controlled for both year and county (County and Year Fixed Effects), some of our variables became so statistically insignificant that it is worth noting: the percentage of females in a county’s population and excessive drinking each have p-values greater than 0.9, and the percentage of the population that is uninsured has a p-value of 0.541. These high p-values could be explained by the sheer number of dummy variables created in this model, but we think it is more likely that these variables are simply not useful once we control for year and county. By the same reasoning, the variables that remain statistically significant in this model should be noted, because they stay important for predicting life expectancy even under these controls. However, this model should not be used for predictions, as it is too sensitive to county differences. When we add year controls to the county controls, demographics seem to play a less important role than in the County Fixed Effects Model, but still a larger one than in the remaining models. This could be explained by the minimal year-to-year change in demographics and the overwhelming number of dummy variables used in the County Year Fixed Effects Model.

In our final model, we control for the unexplained variability of each state as well as for the year. Based on our model analysis, we feel the best way to explain which factors influence life expectancy is through the State Year Fixed Effects Model. As seen in the model output, every variable is statistically significant at the 1 percent level. The percentage of the population that is uninsured has a coefficient of 0.114, by far its largest across the five models. This might be because controlling for state captures differing state laws, state economic conditions, and other unexplained variability that differs from state to state.

Four of the variables are statistically significant at the 1 percent level in all models: pctWhite, medHincome, drvAlone, and homicides. We have already discussed pctWhite, but to elaborate on its significance: given that non-Hispanic whites make up more than half of the population, it makes sense that it is statistically significant in all our models. The coefficient on median household income is the same, 0.00004, in three of the five models (OLS, Year Fixed Effects, and State Year Fixed Effects). These models predict that for every 1 dollar increase in median household income, life expectancy increases by 0.00004 years on average. This might seem negligible, but that is not the case. Holding all other variables constant and controlling for state and year, if median household income increases by 15,000 dollars, which is about 1 standard deviation, then we predict that life expectancy will increase by 0.6 years, on average. This coefficient is statistically significant at the 1 percent level in all three models. It also makes sense for the percentage of the workforce that drives alone to work to be statistically significant at the 1 percent level. The stress of driving, a potential lack of physical activity, and the negative effects of loneliness and isolation from social interaction could be reasons why driving alone might negatively affect life expectancy, as its coefficient indicates in every model except the County Fixed Effects Model. The number of homicides in each county has the most consistently negative impact on life expectancy: across all five models its coefficient is negative, ranging between -0.170 and -0.101. These results make sense; areas with higher numbers of homicides should anticipate lower life expectancy. In addition to the existing hazards of life, residents must also contend with the possibility of random acts of deadly violent crime.
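The income calculation above is a simple rescaling of the per-dollar coefficient, sketched here for convenience. The coefficient and the 15,000 dollar change are taken from the text; the function name `predicted_change` is a hypothetical helper introduced only for this illustration.

```python
import math

COEF_MED_H_INCOME = 0.00004  # years of life expectancy per dollar (State Year FE column)
INCOME_CHANGE = 15_000       # dollars; roughly one standard deviation, per the text


def predicted_change(coef, delta_x):
    """Predicted change in the outcome for a change delta_x in one regressor,
    holding all other regressors fixed (linear model)."""
    return coef * delta_x


effect = predicted_change(COEF_MED_H_INCOME, INCOME_CHANGE)
print(f"Predicted change in life expectancy: {effect:.1f} years")
```

Reporting the effect per standard deviation rather than per dollar makes the magnitude easier to compare with coefficients measured in percentage points, such as adltSmoking or drvAlone.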

The percentage of adults who smoke is not particularly useful when we hold county fixed, but in the other three models (OLS, Year Fixed Effects, and State Year Fixed Effects), every additional percentage point substantially decreases life expectancy. The largest coefficient in magnitude, -0.440, comes from the model in which we hold state fixed, and it is statistically significant at the 1 percent level. It is well known that smoking is detrimental to one’s health, and one would expect the percentage of a county’s population that drinks excessively to be negatively correlated with life expectancy as well. However, in the three models where excDrinking is statistically significant at the 1 percent level, it surprisingly has a positive coefficient greater than 0.1. To try to explain this, consider how County Health Rankings collects information on excessive drinking. According to its website, heavy drinking is defined as “a woman drinking more than one drink on average per day or a man drinking more than two drinks on average per day.” This definition shifts the perspective from adults with drinking problems alone to a broader group that includes party towns, college environments, and individuals who have two glasses of wine at dinner every evening.

In conclusion, we found that when we controlled for state and year, all of the coefficients were statistically significant at the 1 percent level, suggesting that controlling for state gives us the best insight into the relationship between life expectancy and our explanatory variables. We also found that when we controlled for county, the demographic variables were more important than in the other models. Regardless of what we controlled for, the number of homicides always has a negative effect on average life expectancy.