5 Appendix

5.1 Additional Table:

This table shows the three OLS models we estimated when exploring whether to include homicides as a regressor; a sketch of how models like these can be fit in R follows the table.

Regression Models

                                Dependent variable: lifeExp
                    ---------------------------------------------------
                    OLS 1              OLS 2              OLS 3
------------------------------------------------------------------------
pctWhite            0.020***           -0.014***          0.019***
                    (p = 0.000)        (p = 0.000)        (p = 0.000)
pctFemale           -0.144***          -0.015             -0.151***
                    (p = 0.000)        (p = 0.179)        (p = 0.000)
medHincome          0.00005***         0.00004***         0.0001***
                    (p = 0.000)        (p = 0.000)        (p = 0.000)
adltSmoking         -0.331***          -0.290***          -0.309***
                    (p = 0.000)        (p = 0.000)        (p = 0.000)
uninsured           0.016***           -0.012**           0.006
                    (p = 0.0001)       (p = 0.016)        (p = 0.344)
drvAlone            -0.070***          -0.040***          -0.083***
                    (p = 0.000)        (p = 0.000)        (p = 0.000)
excDrinking         0.165***           0.139***           0.156***
                    (p = 0.000)        (p = 0.000)        (p = 0.000)
homicides                              -0.170***
                                       (p = 0.000)
Constant            89.247***          84.206***          90.374***
                    (p = 0.000)        (p = 0.000)        (p = 0.000)
------------------------------------------------------------------------
Observations        12,285             5,180              7,105
R2                  0.605              0.786              0.519
Adjusted R2         0.605              0.786              0.518
------------------------------------------------------------------------
Note: p-values reported in parentheses; *p<0.1; **p<0.05; ***p<0.01
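
For reference, here is a minimal sketch of how models of this form can be fit and tabulated in R with lm() and stargazer. The variable names follow the table above and the transcript in 5.2; which subsample each model uses is inferred from the observation counts, not copied from our estimation script.

library(stargazer)

# OLS 1: full sample, homicides excluded (assumed specification)
ols1 <- lm(lifeExp ~ pctWhite + pctFemale + medHincome + adltSmoking +
             uninsured + drvAlone + excDrinking,
           data = panel_data)

# OLS 2: counties with homicide data, homicides included (assumed)
ols2 <- lm(lifeExp ~ pctWhite + pctFemale + medHincome + adltSmoking +
             uninsured + drvAlone + excDrinking + homicides,
           data = subset(panel_data, !is.na(homicides)))

# OLS 3: counties without homicide data (assumed)
ols3 <- lm(lifeExp ~ pctWhite + pctFemale + medHincome + adltSmoking +
             uninsured + drvAlone + excDrinking,
           data = subset(panel_data, is.na(homicides)))

# Text table in the format shown above (coefficients, stars, p-values)
stargazer(ols1, ols2, ols3, type = "text", report = "vc*p")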

5.2 Use of Artificial Intelligence:

For some parts of this paper, we used ChatGPT to assist us; the relevant exchanges are reproduced below.

ME:

How do I only get 3 significant figures in R? This is what I have:

merged_data <- merged_data %>%
  mutate(loss = Totpop_total - Totpop_loss,
         percent_loss = (loss / Totpop_total) * 100) %>%
  options(scipen = 3)
merged_data

ChatGPT:

merged_data <- merged_data %>%
  mutate(loss = Totpop_total - Totpop_loss,
         percent_loss = round((loss / Totpop_total) * 100, 3))

merged_data
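
One caveat about this exchange: round(x, 3) keeps three decimal places, not three significant figures. If significant figures were truly the goal, base R's signif() would be the closer fit. A quick illustration (the number is made up):

x <- 0.0123456

round(x, 3)   # 0.012   (three decimal places)
signif(x, 3)  # 0.0123  (three significant figures)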

ME:

how do I do this without creating new variables?

panel_data <- panel_data %>% 
  mutate(
    (v009_rawvalue) * 100,  
    (v049_rawvalue) * 100, 
    (v085_rawvalue) * 100, 
    (v004_rawvalue) * 100,
    (v147_rawvalue) * 100, 
    (v063_rawvalue) * 100, 
    (v015_rawvalue) * 100, 
    (v023_rawvalue) * 100, 
    (v069_rawvalue) * 100, 
    (v124_rawvalue) * 100, 
    (v067_rawvalue) * 100,
    (v051_rawvalue) * 100,
    (v081_rawvalue) * 100, 
    (v056_rawvalue) * 100, 
    (v126_rawvalue) * 100, 
    (v054_rawvalue) * 100,
    (v057_rawvalue) * 100
  )

ChatGPT:

To avoid creating new variables and perform the desired operation in the same data frame mutation, you can use mutate_at() or mutate(across()) functions in dplyr. Here’s how you can rewrite your code using mutate(across()):

library(dplyr)

panel_data <- panel_data %>%
  mutate(across(starts_with("v"), ~ . * 100))

In this code:

starts_with("v") selects all columns that start with "v".
~ . * 100 applies the transformation (multiplying each selected column by 100) using a lambda function.

This way, you avoid creating new variables and perform the operation directly within the mutate() function.
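
For context, here is a small self-contained example of the across() pattern suggested above (the data frame and values are made up, not our actual panel):

library(dplyr)

# Toy data: an id column plus two "v" columns stored as proportions
toy <- data.frame(fips = c("01001", "01003"),
                  v009_rawvalue = c(0.21, 0.18),
                  v049_rawvalue = c(0.05, 0.07))

# Rescale every column starting with "v" to percentages, in place
toy <- toy %>%
  mutate(across(starts_with("v"), ~ .x * 100))

toy   # v009_rawvalue and v049_rawvalue are now 21, 18 and 5, 7; fips is untouched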

ME:

how do I skip a var that starts with v?

panel_data <- panel_data %>%
  mutate(across(starts_with("v"), ~ . * 100))

ChatGPT:

If you want to skip variables that start with “v” while using mutate(across()), you can use the - operator in combination with starts_with() to negate the selection. Here’s how you can do it:

panel_data <- panel_data %>%
  mutate(across(.cols = -starts_with("v"), .fns = ~ . * 100))

In this code:

.cols = -starts_with("v") specifies that you want to apply the function to all columns except those that start with "v".
~ . * 100 applies the transformation (multiplying each selected column by 100) using a lambda function.

This way, the transformation will be applied to all columns except those starting with "v". Adjust it according to your specific requirements and column names.
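
It is worth noting that -starts_with("v") transforms every column that does not start with "v", which is not the same as skipping one particular v* variable. If the intent was to rescale all v* columns except a single one, a selection like the following would do it (the excluded column name is only an example):

panel_data <- panel_data %>%
  mutate(across(starts_with("v") & !any_of("v057_rawvalue"), ~ .x * 100))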

ME:

how do I get a vector of the NAs in each column of the data?

ChatGPT:

To get a vector of the number of missing values (NAs) in each column of your data, you can use various programming languages such as Python or R. Here’s how you can do it in both languages:

Python (using Pandas):

import pandas as pd

# Assuming your data is stored in a DataFrame called df
missing_values_per_column = df.isnull().sum()
print(missing_values_per_column)

This will give you a Pandas Series where the index represents the column names and the values represent the number of missing values in each column.

R:

# Assuming your data is stored in a data frame called df
missing_values_per_column <- colSums(is.na(df))
print(missing_values_per_column)

This will give you a numeric vector where each element represents the number of missing values in the corresponding column of your data frame.

Choose the code snippet according to the programming language you’re using.
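
Since the rest of our cleaning uses dplyr, an equivalent tidyverse version of the R snippet above (same counts, returned as a one-row data frame) would be:

library(dplyr)

# Count NAs in every column of df
df %>% summarise(across(everything(), ~ sum(is.na(.x))))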

ME:

I want to subtract the totals from these two codes and see the total for each year

PopLossByHomi <- panel_data %>%
  filter(!is.na(homicides)) %>%
  group_by(year) %>%
  summarize(Totpop = sum(totPop, na.rm = TRUE))
PopLossByHomi

Totalpop <- panel_data %>%
  group_by(year) %>%
  summarize(Totpop = sum(totPop, na.rm = TRUE))
Totalpop

ChatGPT:

To subtract the total population obtained from the first code block (PopLossByHomi) from the total population obtained from the second code block (Totalpop) for each year, you can merge these two data frames by the ‘year’ column and then calculate the difference. Here’s how you can do it:

library(dplyr)

# Calculate total population without NA homicides
PopLossByHomi <- panel_data %>%
  filter(!is.na(homicides)) %>%
  group_by(year) %>%
  summarize(Totpop = sum(totPop, na.rm = TRUE))

# Calculate total population including NA homicides
Totalpop <- panel_data %>%
  group_by(year) %>%
  summarize(Totpop = sum(totPop, na.rm = TRUE))

# Merge the two data frames by the 'year' column
merged_data <- merge(PopLossByHomi, Totalpop, by = "year", suffixes = c("_loss", "_total"))

# Calculate the difference between the total population and the population without NA homicides
merged_data <- merged_data %>%
  mutate(diff = Totpop_total - Totpop_loss)

# View the resulting data frame
print(merged_data)

This code will give you a data frame containing the year, the total population without NA homicides, the total population including NA homicides, and the difference between the two for each year.
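
As an aside, the same merge can be written with dplyr alone; the sketch below also adds the loss and percent_loss columns that appear in the first exchange of this appendix (column names as assumed there):

library(dplyr)

merged_data <- Totalpop %>%
  rename(Totpop_total = Totpop) %>%
  left_join(rename(PopLossByHomi, Totpop_loss = Totpop), by = "year") %>%
  mutate(loss = Totpop_total - Totpop_loss,
         percent_loss = round((loss / Totpop_total) * 100, 3))

merged_data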