The World Happiness Report 2020 is the eighth World Happiness Report, an annual publication that interprets a wide variety of data, primarily from the Gallup World Poll, about self-reported happiness and social, economic, and environmental factors in 156 countries (Helliwell, Layard, Sachs, and De Neve, 2020). In this paper, I will explore some of the raw data used in the World Happiness Report. The main variables I will focus on are:
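Outcome/Dependent Variable
Ladder score: The national average response to the Gallup World Poll's Cantril ladder question, which asks respondents to rate their current life on a ladder from 0 (the worst possible life) to 10 (the best possible life).
(Helliwell, Layard, Sachs, and De Neve, 2020, p. 19)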
Predictor/Independent Variables
Logged GDP per capita: The natural log of GDP per capita in terms of Purchasing Power Parity (PPP) adjusted to constant 2011 international dollars. Since GDP data for 2019 was not available at the time of the report, country-specific forecasts of GDP growth were used after adjusting for population growth.
Social support: The national average of binary responses (0 = no, 1 = yes) to the Gallup World Poll question, “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”
Healthy life expectancy: The national average expected number of years of life spent in good health from birth.
Freedom to make life choices: The national average of binary responses to the Gallup World Poll question, “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”
Perceptions of corruption: The national average of binary answers to two Gallup World Poll questions, “Is corruption widespread throughout the government or not?” and “Is corruption widespread within businesses or not?” Where data for government corruption are missing, the perception of business corruption is used as the overall corruption-perception measure instead.
(Helliwell, Layard, Sachs, and De Neve, 2020, p. 22)
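Because social support, freedom to make life choices, and perceptions of corruption are averages of 0/1 answers, each score can be read as the proportion of respondents whose answer was coded 1. The sketch below illustrates this with a made-up response vector (hypothetical values, not Gallup data) and ignores any survey weighting.

```r
# Hypothetical binary answers from eight respondents (1 = yes, 0 = no)
responses <- c(1, 1, 0, 1, 1, 0, 1, 1)

# The mean of a 0/1 vector equals the share of answers coded 1
score <- mean(responses)
print(score)  # 0.75
```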
I will fit a multiple linear regression model and test the significance of each predictor to find out which of these variables, if any, are significantly associated with the ladder score.
library(tidyverse)  # read_csv(), ggplot2, and as_tibble()
library(fBasics)    # basicStats()
library(pander)     # pander()
happydata <- read_csv("2020.csv")
variables_only_data_frame <- data.frame(Ladder_score = happydata$Ladder_score, Logged_GDP_per_capita = happydata$Logged_GDP_per_capita, Social_support = happydata$Social_support, Healthy_life_expectancy = happydata$Healthy_life_expectancy, Freedom_to_make_choices = happydata$Freedom_to_make_life_choices, Perceptions_of_corruption = happydata$Perceptions_of_corruption)
summary_stats <- data.frame(t(basicStats(variables_only_data_frame)[c("Mean", "Stdev", "Minimum", "Median", "Maximum", "nobs"),]))
pander(summary_stats)
Variable | Mean | Stdev | Minimum | Median | Maximum | nobs |
---|---|---|---|---|---|---|
Ladder_score | 5.473 | 1.112 | 2.567 | 5.515 | 7.809 | 153 |
Logged_GDP_per_capita | 9.296 | 1.202 | 6.493 | 9.456 | 11.45 | 153 |
Social_support | 0.8087 | 0.1215 | 0.3195 | 0.8292 | 0.9747 | 153 |
Healthy_life_expectancy | 64.45 | 7.058 | 45.2 | 66.31 | 76.8 | 153 |
Freedom_to_make_choices | 0.7834 | 0.1178 | 0.3966 | 0.7998 | 0.975 | 153 |
Perceptions_of_corruption | 0.7331 | 0.1752 | 0.1098 | 0.7831 | 0.9356 | 153 |
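A table of this shape can also be produced with base R alone. The following is a minimal sketch rather than the code used for the table above: the `demo` data frame and the `summarise_column()` helper are hypothetical and exist only for illustration.

```r
# Hypothetical stand-in for the report data (values are made up)
demo <- data.frame(
  Ladder_score   = c(7.8, 6.4, 5.5, 4.3, 2.6),
  Social_support = c(0.95, 0.90, 0.83, 0.70, 0.47)
)

# Compute the six summary statistics for one numeric column
summarise_column <- function(x) {
  c(Mean = mean(x), Stdev = sd(x), Minimum = min(x),
    Median = median(x), Maximum = max(x), nobs = length(x))
}

# sapply() returns statistics in rows; t() puts one variable per row
summary_demo <- t(sapply(demo, summarise_column))
print(round(summary_demo, 3))
```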
ggplot(data = happydata, mapping = aes(x = Ladder_score)) +
  geom_histogram(bins = 40, color = "black", fill = "lightgray") +
  xlab("Ladder Score") +
  scale_x_continuous(breaks = seq(0, 10, by = 1)) +
  theme_bw()
The mean (standard deviation) ladder score is 5.47 (1.11). The median ladder score is 5.52.
ggplot(data = happydata, mapping = aes(x = Logged_GDP_per_capita)) +
  geom_histogram(bins = 25, color = "black", fill = "lightgray") +
  xlab("Logged GDP per Capita") +
  theme_bw()
The mean (standard deviation) logged GDP per capita is 9.30 (1.20). The median logged GDP per capita is 9.46.
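Logged values are hard to read as money, so it can help to convert them back to dollars. The sketch below applies `exp()` to the summary values from the table of summary statistics. Note that the exponentiated mean of the logs is a geometric mean across countries, not the arithmetic mean of GDP per capita.

```r
# Summary values of logged GDP per capita, copied from the summary table
logged_gdp <- c(Minimum = 6.493, Median = 9.456, Mean = 9.296, Maximum = 11.45)

# exp() undoes the natural log; results are in constant 2011 international dollars
gdp_dollars <- exp(logged_gdp)
print(round(gdp_dollars))
```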
ggplot(data = happydata, mapping = aes(x = Healthy_life_expectancy)) +
  geom_histogram(bins = 25, color = "black", fill = "lightgray") +
  xlab("Healthy Life Expectancy (in years)") +
  theme_bw()
The mean (standard deviation) healthy life expectancy is 64.45 (7.06) years. The median healthy life expectancy is 66.31 years.
ggplot(data = happydata, mapping = aes(x = Freedom_to_make_life_choices)) +
  geom_histogram(bins = 25, color = "black", fill = "lightgray") +
  xlab("Freedom to Make Life Choices Score") +
  theme_bw()
The mean (standard deviation) freedom to make life choices score is 0.78 (0.12). The median freedom to make life choices score is 0.80.
ggplot(data = happydata, mapping = aes(x = Perceptions_of_corruption)) +
  geom_histogram(bins = 25, color = "black", fill = "lightgray") +
  xlab("Perceptions of Corruption Score") +
  theme_bw()
The mean (standard deviation) perceptions of corruption score is 0.73 (0.18). The median perceptions of corruption score is 0.78.
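# Fit the multiple linear regression of ladder score on the five predictors
happy_model <- lm(Ladder_score ~ Logged_GDP_per_capita + Social_support + Healthy_life_expectancy + Freedom_to_make_life_choices + Perceptions_of_corruption, data = happydata)
happy_coef <- coefficients(happy_model)  # estimated intercept and slopes
happy_anova <- anova(happy_model)  # sequential ANOVA table
happy_summary <- summary(happy_model)  # overall F test, R-squared, coefficient table
happy_t <- as_tibble(happy_summary$coefficients)  # estimates, standard errors, t statistics, p-values
happy_ci <- as_tibble(confint(happy_model, level = 0.95))  # 95% confidence intervals for the coefficients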
The regression model is: \[ \hat{y} = -1.94 + 0.21\,(\mbox{Logged GDP per capita}) + 2.74\,(\mbox{Social support}) + 0.03\,(\mbox{Healthy life expectancy}) + 1.92\,(\mbox{Freedom to make life choices}) - 0.73\,(\mbox{Perceptions of corruption}) \] where \(\hat{y}\) is the predicted ladder score.
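For reference, the fitted equation estimates the population model below. The numbering of the slopes is an assumption inferred from the order of the terms in the model formula. It agrees with the subscripts used in the hypothesis tests that follow.

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon \]

Here \(x_1\) is logged GDP per capita, \(x_2\) is social support, \(x_3\) is healthy life expectancy, \(x_4\) is freedom to make life choices, \(x_5\) is perceptions of corruption, and \(\varepsilon\) is the random error term.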
Hypotheses
\(H_0: \ \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0\)
\(H_1: \ \mbox{at least one } \beta_i \ne 0\)
Test Statistic
\(F_0 = 86.25\).
p-value
\(p < 0.0001\).
Rejection Region
Reject if \(p < \alpha\), where \(\alpha=0.05\).
Conclusion and Interpretation
Reject \(H_0\). There is sufficient evidence to suggest that at least one slope is nonzero; that is, the regression model is significant.
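The p-value for the overall test can be checked from the reported F statistic alone. The sketch below assumes 5 model degrees of freedom and 153 - 5 - 1 = 147 error degrees of freedom, which holds only if all 153 countries were used in the fit.

```r
# Reported overall F statistic
f0 <- 86.25

# Assumed degrees of freedom: 5 predictors, and 153 - 5 - 1 = 147 for error
df_model <- 5
df_error <- 153 - df_model - 1

# Upper-tail probability of the F distribution
p_value <- pf(f0, df_model, df_error, lower.tail = FALSE)
print(p_value < 0.0001)  # TRUE
```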
Predictor | Estimate of \(\beta\) | 95% CI for \(\beta\) | p-value |
---|---|---|---|
Logged GDP per Capita | 0.21 | (0.05, 0.37) | p = 0.0094 |
Social Support | 2.74 | (1.43, 4.05) | p = 0.0001 |
Healthy Life Expectancy | 0.03 | (0.01, 0.06) | p = 0.0084 |
Freedom to Make Life Choices | 1.92 | (0.97, 2.88) | p = 0.0001 |
Perceptions of Corruption | -0.73 | (-1.33, -0.13) | p = 0.0182 |
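A table like this one can be assembled directly from a fitted `lm` object. The sketch below uses the built-in `mtcars` data as a stand-in model, so its numbers are unrelated to the happiness data. The column names of `coef_table` are illustrative choices.

```r
# Stand-in model on the built-in mtcars data (not the happiness data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

coef_matrix <- summary(fit)$coefficients  # Estimate, Std. Error, t value, Pr(>|t|)
ci <- confint(fit, level = 0.95)          # lower and upper 95% limits

# Combine estimates, interval limits, and formatted p-values in one table
coef_table <- data.frame(
  Predictor = rownames(coef_matrix),
  Estimate  = round(coef_matrix[, "Estimate"], 2),
  CI_lower  = round(ci[, 1], 2),
  CI_upper  = round(ci[, 2], 2),
  p_value   = format.pval(coef_matrix[, "Pr(>|t|)"], digits = 2, eps = 0.0001),
  row.names = NULL
)
print(coef_table)
```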
GDP_p <- ggplot(happydata, aes(x = Logged_GDP_per_capita, y = Ladder_score)) +
  geom_point(alpha = 0.5) +
  xlab("Logged GDP per Capita") +
  ylab("Ladder Score") +
  theme_bw()
GDP_p
Hypotheses
\(H_0: \ \beta_1 = 0\)
\(H_1: \ \beta_1 \ne 0\)
Test Statistic
\(t_0 = 2.63\).
p-value
\(p = 0.0094\).
Rejection Region
Reject if \(p < \alpha\), where \(\alpha=0.05\).
Conclusion and Interpretation
Reject \(H_0\). There is sufficient evidence to suggest that logged GDP per capita is a significant predictor of ladder score after adjusting for the other predictors in the model.
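The two-sided p-value for a slope can be checked from its t statistic. This is a minimal sketch that assumes 147 error degrees of freedom (153 countries, 5 predictors, and an intercept). Because the reported t statistic is rounded, the result should be close to, but not necessarily identical to, the reported p-value. The same call applies to the other slopes.

```r
# Reported t statistic for logged GDP per capita
t0 <- 2.63

# Assumed error degrees of freedom: 153 - 5 - 1 = 147
df_error <- 153 - 5 - 1

# Two-sided p-value: probability of a |t| at least this large under H0
p_value <- 2 * pt(abs(t0), df = df_error, lower.tail = FALSE)
print(round(p_value, 4))
```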
healthy_p <- ggplot(happydata, aes(x = Healthy_life_expectancy, y = Ladder_score)) +
  geom_point(alpha = 0.5) +
  xlab("Healthy Life Expectancy") +
  ylab("Ladder Score") +
  theme_bw()
healthy_p
Hypotheses
\(H_0: \ \beta_3 = 0\)
\(H_1: \ \beta_3 \ne 0\)
Test Statistic
\(t_0 = 2.67\).
p-value
\(p = 0.0084\).
Rejection Region
Reject if \(p < \alpha\), where \(\alpha=0.05\).
Conclusion and Interpretation
Reject \(H_0\). There is sufficient evidence to suggest that healthy life expectancy is a significant predictor of ladder score after adjusting for the other predictors in the model.
freedom_p <- ggplot(happydata, aes(x = Freedom_to_make_life_choices, y = Ladder_score)) +
  geom_point(alpha = 0.5) +
  xlab("Freedom to Make Life Choices Score") +
  ylab("Ladder Score") +
  theme_bw()
freedom_p
Hypotheses
\(H_0: \ \beta_4 = 0\)
\(H_1: \ \beta_4 \ne 0\)
Test Statistic
\(t_0 = 3.97\).
p-value
\(p = 0.0001\).
Rejection Region
Reject if \(p < \alpha\), where \(\alpha=0.05\).
Conclusion and Interpretation
Reject \(H_0\). There is sufficient evidence to suggest that freedom to make life choices is a significant predictor of ladder score after adjusting for the other predictors in the model.
corruption_p <- ggplot(happydata, aes(x = Perceptions_of_corruption, y = Ladder_score)) +
  geom_point(alpha = 0.5) +
  xlab("Perceptions of Corruption") +
  ylab("Ladder Score") +
  theme_bw()
corruption_p
Hypotheses
\(H_0: \ \beta_5 = 0\)
\(H_1: \ \beta_5 \ne 0\)
Test Statistic
\(t_0 = -2.39\).
p-value
\(p = 0.0182\).
Rejection Region
Reject if \(p < \alpha\), where \(\alpha=0.05\).
Conclusion and Interpretation
Reject \(H_0\). There is sufficient evidence to suggest that perceptions of corruption is a significant predictor of ladder score after adjusting for the other predictors in the model.
The 95% confidence interval for the slope of logged GDP per capita is (0.05, 0.37).
The 95% confidence interval for the slope of social support is (1.43, 4.05).
The 95% confidence interval for the slope of healthy life expectancy is (0.01, 0.06).
The 95% confidence interval for the slope of freedom to make life choices is (0.97, 2.88).
The 95% confidence interval for the slope of perceptions of corruption is (-1.33, -0.13).
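Each interval has the form estimate ± critical value × standard error. The sketch below rebuilds the interval for logged GDP per capita from the reported estimate and t statistic. The standard error is not reported, so it is backed out from those two rounded values. The 147 error degrees of freedom are an assumption (153 countries, 5 predictors, and an intercept). The result should agree with the reported interval only up to rounding.

```r
# Reported estimate and t statistic for logged GDP per capita
estimate <- 0.21
t0 <- 2.63

# The standard error is not reported; back it out from t0 = estimate / SE
std_error <- estimate / t0

# Critical value for a 95% interval, assuming 147 error degrees of freedom
t_crit <- qt(0.975, df = 147)

# Lower and upper limits: estimate minus and plus the margin of error
ci <- estimate + c(-1, 1) * t_crit * std_error
print(round(ci, 2))
```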
\(R^2_{\mbox{adj}}=0.75\); that is, approximately 75% of the variance in ladder score is explained by the current model (logged GDP per capita, social support, healthy life expectancy, freedom to make life choices, and perceptions of corruption).
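For reference, the standard definition of the adjusted coefficient of determination is given below, where \(n\) is the number of observations and \(p\) is the number of predictors. Taking \(n = 153\) and \(p = 5\) for this model is an assumption based on the summary table and the model formula.

\[ R^2_{\mbox{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1} \]

The adjustment penalizes each added predictor, so \(R^2_{\mbox{adj}}\) is never larger than \(R^2\).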
almost_sas(happy_model)
In the top left graph, the residuals show no clear pattern around a curved but approximately horizontal line, which is consistent with the equal variance assumption. In the top right, the points on the Q-Q plot fall approximately along the 45-degree line, which is consistent with the normality assumption. The histogram in the bottom right corner is roughly normal in shape but slightly skewed to the left. The density plot in the bottom left is also roughly normal in shape.
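`almost_sas()` is not part of base R, and its definition does not appear in this paper. As a rough stand-in, the sketch below draws a similar four-panel display with base R graphics. The panel layout is an assumption based on the description above, and the model is fit to the built-in `mtcars` data rather than the happiness data.

```r
# Stand-in model on the built-in mtcars data (not the happiness data)
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- residuals(fit)

# Arrange four panels in a 2 x 2 grid, filled row by row
old_par <- par(mfrow = c(2, 2))

# Top left: residuals against fitted values, to check equal variance
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)

# Top right: normal Q-Q plot of the residuals, to check normality
qqnorm(res)
qqline(res)

# Bottom left: kernel density estimate of the residuals
plot(density(res), main = "Density of residuals")

# Bottom right: histogram of the residuals
hist(res, xlab = "Residuals", main = "Histogram of residuals")

# Restore the previous graphics settings
par(old_par)
```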
The hypothesis test for significance of the regression suggests that the overall model is significant, and the hypothesis tests for the individual predictors suggest that each predictor is also significant. Higher logged GDP per capita, social support, healthy life expectancy, and freedom to make life choices, and lower perceptions of corruption, are each associated with a higher ladder score. Together, these predictors explain approximately 75% of the variance in ladder score. These data suggest that money, family and friends, good health, freedom, and trustworthy leaders are all important correlates of happiness at the country level.
Helliwell, John F., Richard Layard, Jeffrey Sachs, and Jan-Emmanuel De Neve, eds. 2020. World Happiness Report 2020. New York: Sustainable Development Solutions Network.
Social Support
The mean (standard deviation) social support score is 0.81 (0.12). The median social support score is 0.83.