Introduction

   The World Happiness Report 2020 is the eighth World Happiness Report, an annual publication that interprets a wide variety of data, primarily from the Gallup World Poll, about self-reported happiness and social, economic, and environmental factors in 156 countries (Helliwell, Layard, Sachs, and De Neve, 2020). In this paper, I will explore some of the raw data used in the World Happiness Report. The main variables I will focus on are:

Outcome/Dependent Variable

  • Ladder (happiness) score: Survey participants were asked to imagine their current position on a ladder with steps numbered from 0 to 10, with the best possible life for themselves represented at the top (step 10) and the worst possible life for themselves represented at the bottom (step 0). The national average of the responses is used for each country.

   (Helliwell, Layard, Sachs, and De Neve, 2020, p. 19)

Predictor/Independent Variables

  • Logged GDP per capita: The natural log of GDP per capita in terms of Purchasing Power Parity (PPP) adjusted to constant 2011 international dollars. Since GDP data for 2019 was not available at the time of the report, country-specific forecasts of GDP growth were used after adjusting for population growth.

  • Social support: The national average of binary responses (0 = no, 1 = yes) to the Gallup World Poll question, “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”

  • Healthy life expectancy: The national average expected number of years of life spent in good health from birth.

  • Freedom to make life choices: The national average of binary responses to the Gallup World Poll question, “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”

  • Perceptions of corruption: The national average of binary answers to two Gallup World Poll questions, “Is corruption widespread throughout the government or not?” and “Is corruption widespread within businesses or not?” Where data for government corruption are missing, the perception of business corruption is used as the overall corruption-perception measure instead.

   (Helliwell, Layard, Sachs, and De Neve, 2020, p. 22)
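   Several of the predictors above are national averages of binary responses. As a minimal sketch of what that means, the R snippet below averages a made-up vector of 0/1 answers; the vector `responses` is purely illustrative and does not come from the Gallup World Poll.

```r
# Made-up answers from five hypothetical respondents in one country
# (1 = yes, 0 = no); these are NOT values from the Gallup World Poll.
responses <- c(1, 1, 0, 1, 1)

# The mean of 0/1 responses equals the proportion answering "yes".
national_average <- mean(responses)
national_average  # 0.8
```

   A score such as 0.81 for social support can therefore be read as roughly 81% of respondents answering "yes".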

   I will fit a multiple linear regression model and test each predictor for significance to find out which of these variables, if any, are significantly associated with the ladder score.


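library(tidyverse)  # read_csv(), ggplot(), as_tibble()
library(fBasics)    # basicStats()
library(pander)     # pander()
happydata <- read_csv("2020.csv")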

variables_only_data_frame <- data.frame(Ladder_score = happydata$Ladder_score, Logged_GDP_per_capita = happydata$Logged_GDP_per_capita, Social_support = happydata$Social_support, Healthy_life_expectancy = happydata$Healthy_life_expectancy, Freedom_to_make_choices = happydata$Freedom_to_make_life_choices,  Perceptions_of_corruption = happydata$Perceptions_of_corruption) 


Summary Statistics

summary_stats <- data.frame(t(basicStats(variables_only_data_frame)[c("Mean", "Stdev", "Minimum", "Median", "Maximum", "nobs"),]))

pander(summary_stats)
                            Mean    Stdev   Minimum  Median  Maximum  nobs
Ladder_score                5.473   1.112   2.567    5.515   7.809    153
Logged_GDP_per_capita       9.296   1.202   6.493    9.456   11.45    153
Social_support              0.8087  0.1215  0.3195   0.8292  0.9747   153
Healthy_life_expectancy     64.45   7.058   45.2     66.31   76.8     153
Freedom_to_make_choices     0.7834  0.1178  0.3966   0.7998  0.975    153
Perceptions_of_corruption   0.7331  0.1752  0.1098   0.7831  0.9356   153
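   For readers without the fBasics package, the same six statistics can be computed in base R. The sketch below uses a small made-up data frame (`toy`, with columns `a` and `b`) rather than the happiness data, so the numbers it prints are illustrative only.

```r
# Made-up data frame standing in for variables_only_data_frame.
toy <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))

# One row per variable, one column per statistic.
stats <- t(sapply(toy, function(x) {
  c(Mean    = mean(x, na.rm = TRUE),
    Stdev   = sd(x, na.rm = TRUE),
    Minimum = min(x, na.rm = TRUE),
    Median  = median(x, na.rm = TRUE),
    Maximum = max(x, na.rm = TRUE),
    nobs    = sum(!is.na(x)))   # count of non-missing values
}))
stats
```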

Ladder Score

ggplot(data = happydata, mapping = aes(x = Ladder_score)) +
  geom_histogram(bins = 40, color = "black", fill = "lightgray") +
  xlab("Ladder Score") +
  scale_x_continuous(breaks = seq(0, 10, by = 1)) +
  theme_bw()

   The mean (standard deviation) ladder score is 5.47 (1.11). The median ladder score is 5.52.

Logged GDP per Capita

ggplot(data = happydata, mapping = aes(x = Logged_GDP_per_capita)) +
  geom_histogram(bins = 25, color = "black", fill = "lightgray") +
  xlab("Logged GDP per Capita") +
  theme_bw() 

   The mean (standard deviation) logged GDP per capita is 9.30 (1.20). The median logged GDP per capita is 9.46.

Social Support

ggplot(data = happydata, mapping = aes(x = Social_support)) +
  geom_histogram(bins = 25, color = "black", fill = "lightgray") +
  xlab("Social Support Score") +
  theme_bw() 

   The mean (standard deviation) social support score is 0.81 (0.12). The median social support score is 0.83.

Healthy Life Expectancy

ggplot(data = happydata, mapping = aes(x = Healthy_life_expectancy)) +
  geom_histogram(bins = 25, color = "black", fill = "lightgray") +
  xlab("Healthy Life Expectancy (in years)") +
  theme_bw() 

   The mean (standard deviation) healthy life expectancy is 64.45 (7.06) years. The median healthy life expectancy is 66.31 years.

Freedom to Make Life Choices

ggplot(data = happydata, mapping = aes(x = Freedom_to_make_life_choices)) +
  geom_histogram(bins = 25, color = "black", fill = "lightgray") +
  xlab("Freedom to Make Life Choices Score") +
  theme_bw() 

   The mean (standard deviation) freedom to make life choices score is 0.78 (0.12). The median freedom to make life choices score is 0.80.

Perceptions of Corruption

ggplot(data = happydata, mapping = aes(x = Perceptions_of_corruption)) +
  geom_histogram(bins = 25, color = "black", fill = "lightgray") +
  xlab("Perceptions of Corruption Score") +
  theme_bw() 

   The mean (standard deviation) perceptions of corruption score is 0.73 (0.18). The median perceptions of corruption score is 0.78.


Multiple Linear Regression Line

happy_model <- lm(Ladder_score ~ Logged_GDP_per_capita + Social_support + Healthy_life_expectancy + Freedom_to_make_life_choices + Perceptions_of_corruption, data=happydata)
happy_coef <- coefficients(happy_model)
happy_anova <- anova(happy_model)
happy_summary <- summary(happy_model)
happy_t <- as_tibble(happy_summary[[4]])
happy_ci <- as_tibble(confint(happy_model, level=0.95))


The regression model is: \[ \hat{y} = -1.94 + 0.21 \cdot \mbox{Logged GDP per capita} + 2.74 \cdot \mbox{Social support} + 0.03 \cdot \mbox{Healthy life expectancy} + 1.92 \cdot \mbox{Freedom to make life choices} - 0.73 \cdot \mbox{Perceptions of corruption}\] where \(\hat{y}\) is the predicted ladder score.
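   As a worked example of using the fitted equation, the sketch below plugs the rounded sample means from the summary table into the rounded coefficients. The vector names `coefs` and `x_means` are illustrative choices, and the inputs are the two-decimal values printed in this report, not the full-precision model output.

```r
# Rounded coefficients as printed in the regression equation.
coefs <- c(intercept = -1.94, gdp = 0.21, social = 2.74,
           health = 0.03, freedom = 1.92, corruption = -0.73)

# Rounded sample means of the predictors (from the summary statistics).
x_means <- c(gdp = 9.30, social = 0.81, health = 64.45,
             freedom = 0.78, corruption = 0.73)

# Predicted ladder score = intercept + sum of coefficient * value.
predicted <- unname(coefs["intercept"] + sum(coefs[names(x_means)] * x_means))
round(predicted, 2)  # 5.13
```

   A least-squares fit evaluated at the predictor means returns the mean of the outcome (5.47 here), so the gap between 5.13 and 5.47 reflects rounding of the printed coefficients, most notably the healthy life expectancy coefficient. For real predictions, `predict(happy_model, newdata = ...)` with the unrounded model is the safer route.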


Hypothesis Test for Significance of Regression Line

Hypotheses

   \(H_0: \ \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0\)
   \(H_1: \ \mbox{at least one } \beta_i \ne 0\)

Test Statistic

   \(F_0 = 86.25\).

p-value

   \(p < 0.0001\).

Rejection Region

   Reject if \(p < \alpha\), where \(\alpha=0.05\).

Conclusion and Interpretation

   Reject \(H_0\). There is sufficient evidence to suggest that the regression model is significant; that is, at least one of the five predictors has a nonzero slope.
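   The p-value for the overall test can be reproduced from the F statistic alone. The sketch below assumes all 153 countries were used in the fit (no rows dropped for missing values), which gives 5 numerator and 147 denominator degrees of freedom; the variable names are illustrative.

```r
f_stat <- 86.25   # test statistic reported above
n      <- 153     # observations (assumes no missing values)
k      <- 5       # number of predictors

df1 <- k          # numerator degrees of freedom
df2 <- n - k - 1  # denominator degrees of freedom

# Upper-tail probability of the F distribution.
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)
p_value < 0.0001  # TRUE
```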


Hypothesis Tests for Significance of Individual Predictors


Overview

Predictor                      Estimate of \(\beta\)   95% CI for \(\beta\)   p-value
Logged GDP per Capita           0.21                   (0.05, 0.37)           p = 0.0094
Social Support                  2.74                   (1.43, 4.05)           p = 0.0001
Healthy Life Expectancy         0.03                   (0.01, 0.06)           p = 0.0084
Freedom to Make Life Choices    1.92                   (0.97, 2.88)           p = 0.0001
Perceptions of Corruption      -0.73                   (-1.33, -0.13)         p = 0.0182


Logged GDP per Capita

GDP_p <- ggplot(happydata, aes(x = Logged_GDP_per_capita, y = Ladder_score)) +
        geom_point(alpha = 0.5) + 
        xlab("Logged GDP per Capita") + 
        ylab("Ladder Score") +
        theme_bw()
GDP_p

Hypotheses

   \(H_0: \ \beta_1 = 0\)
   \(H_1: \ \beta_1 \ne 0\)

Test Statistic

   \(t_0 = 2.63\).

p-value

   \(p = 0.0094\).

Rejection Region

   Reject if \(p < \alpha\), where \(\alpha=0.05\).

Conclusion and Interpretation

   Reject \(H_0\). There is sufficient evidence to suggest that logged GDP per capita is a significant predictor of ladder score.
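   Each individual test uses a two-sided p-value from the t distribution with \(n - k - 1\) degrees of freedom. The sketch below recomputes it for logged GDP per capita from the rounded test statistic, assuming 153 observations and 5 predictors (147 degrees of freedom); the same lines apply to the other predictors with their own \(t_0\).

```r
t_stat <- 2.63        # rounded test statistic for logged GDP per capita
df     <- 153 - 5 - 1 # residual degrees of freedom (assumes no missing rows)

# Two-sided p-value: probability of a |t| at least this large under H0.
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
p_value
p_value < 0.05  # TRUE
```

   The result is roughly 0.009, in line with the reported 0.0094; any small difference comes from using the rounded \(t_0\).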


Social Support

social_p <- ggplot(happydata, aes(x = Social_support, y = Ladder_score)) +
        geom_point(alpha = 0.5) + 
        xlab("Social Support Score") + 
        ylab("Ladder Score") +
        theme_bw()
social_p

Hypotheses

   \(H_0: \ \beta_2 = 0\)
   \(H_1: \ \beta_2 \ne 0\)

Test Statistic

   \(t_0 = 4.14\).

p-value

   \(p = 0.0001\).

Rejection Region

   Reject if \(p < \alpha\), where \(\alpha=0.05\).

Conclusion and Interpretation

   Reject \(H_0\). There is sufficient evidence to suggest that social support is a significant predictor of ladder score.


Healthy Life Expectancy

healthy_p <- ggplot(happydata, aes(x = Healthy_life_expectancy, y = Ladder_score)) +
        geom_point(alpha = 0.5) + 
        xlab("Healthy Life Expectancy") + 
        ylab("Ladder Score") +
        theme_bw()
healthy_p

Hypotheses

   \(H_0: \ \beta_3 = 0\)
   \(H_1: \ \beta_3 \ne 0\)

Test Statistic

   \(t_0 = 2.67\).

p-value

   \(p = 0.0084\).

Rejection Region

   Reject if \(p < \alpha\), where \(\alpha=0.05\).

Conclusion and Interpretation

   Reject \(H_0\). There is sufficient evidence to suggest that healthy life expectancy is a significant predictor of ladder score.


Freedom to Make Life Choices

freedom_p <- ggplot(happydata, aes(x = Freedom_to_make_life_choices, y = Ladder_score)) +
        geom_point(alpha = 0.5) + 
        xlab("Freedom to Make Life Choices Score") + 
        ylab("Ladder Score") +
        theme_bw()
freedom_p

Hypotheses

   \(H_0: \ \beta_4 = 0\)
   \(H_1: \ \beta_4 \ne 0\)

Test Statistic

   \(t_0 = 3.97\).

p-value

   \(p = 0.0001\).

Rejection Region

   Reject if \(p < \alpha\), where \(\alpha=0.05\).

Conclusion and Interpretation

   Reject \(H_0\). There is sufficient evidence to suggest that freedom to make life choices is a significant predictor of ladder score.


Perceptions of Corruption

corruption_p <- ggplot(happydata, aes(x = Perceptions_of_corruption, y = Ladder_score)) +
        geom_point(alpha = 0.5) + 
        xlab("Perceptions of Corruption") + 
        ylab("Ladder Score") +
        theme_bw()
corruption_p

Hypotheses

   \(H_0: \ \beta_5 = 0\)
   \(H_1: \ \beta_5 \ne 0\)

Test Statistic

   \(t_0 = -2.39\).

p-value

   \(p = 0.0182\).

Rejection Region

   Reject if \(p < \alpha\), where \(\alpha=0.05\).

Conclusion and Interpretation

   Reject \(H_0\). There is sufficient evidence to suggest that perceptions of corruption is a significant predictor of ladder score.


95% Confidence interval for \(\beta_i\)

   The confidence interval for logged GDP per capita is (0.05, 0.37).

   The confidence interval for social support is (1.43, 4.05).

   The confidence interval for healthy life expectancy is (0.01, 0.06).

   The confidence interval for freedom to make life choices is (0.97, 2.88).

   The confidence interval for perceptions of corruption is (-1.33, -0.13).
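   These intervals have the form estimate ± critical value × standard error. The report does not print the standard errors, so the sketch below approximates the one for logged GDP per capita as estimate / \(t_0\) using the rounded values above; this back-calculation, and the assumption of 147 residual degrees of freedom, are mine rather than part of the original analysis.

```r
estimate <- 0.21         # rounded coefficient for logged GDP per capita
t_stat   <- 2.63         # rounded test statistic
df       <- 153 - 5 - 1  # residual degrees of freedom (assumed)

std_error <- estimate / t_stat  # approximate standard error
t_crit    <- qt(0.975, df)      # critical value for a 95% interval

ci <- estimate + c(-1, 1) * t_crit * std_error
round(ci, 2)  # 0.05 0.37
```

   The result agrees with the interval returned by `confint(happy_model, level = 0.95)` to two decimal places.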


Adjusted R-Squared

   \(R^2_\mbox{adj}=0.75\); that is, approximately 75% of the variance in ladder score is explained by the current model (logged GDP per capita, social support, healthy life expectancy, freedom to make life choices, and perceptions of corruption).
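   Adjusted R-squared penalizes the ordinary R-squared for the number of predictors: \(R^2_\mbox{adj} = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}\). The helper below, `adjusted_r2`, is a hypothetical function written for illustration, and the input of 0.76 is a made-up unadjusted value, not one reported in this paper.

```r
# Adjusted R-squared from unadjusted R-squared, sample size n,
# and number of predictors k.
adjusted_r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

# Hypothetical input: an unadjusted R-squared of 0.76 with n = 153, k = 5.
adj <- adjusted_r2(r2 = 0.76, n = 153, k = 5)
round(adj, 2)  # 0.75
```

   In the fitted model, the same quantity is available as `summary(happy_model)$adj.r.squared`.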


Graphical Assessment of Assumptions

almost_sas(happy_model)

   The top left graph shows no clear pattern, with a curved but approximately horizontal line, which is consistent with the equal variance assumption. In the top right, the Q-Q plot shows that the residuals follow an approximately 45-degree line, which is consistent with the normality assumption. The histogram, in the bottom right corner, has a roughly normal shape but is slightly skewed to the left. The bottom left graph shows that the density also has a roughly normal shape.
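   For readers who do not have `almost_sas()` available, a similar four-panel display can be built with base R graphics. The sketch below is an approximation of that layout, not the function's actual implementation, and it uses the built-in `mtcars` data with an arbitrary model so that it runs on its own.

```r
# Arbitrary example model on a built-in data set (not the happiness data).
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- resid(fit)

old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid of plots

# Top left: residuals vs fitted values (checks equal variance).
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Top right: normal Q-Q plot of residuals (checks normality).
qqnorm(res)
qqline(res)

# Bottom left: kernel density of residuals.
plot(density(res), main = "Density of residuals")

# Bottom right: histogram of residuals.
hist(res, main = "Histogram of residuals", xlab = "Residuals")

par(old_par)  # restore the previous plotting layout
```

   Replacing `fit` with `happy_model` would apply the same checks to the regression in this paper.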


Conclusion

   The hypothesis test for significance of the regression line suggests that the overall model is significant, and the hypothesis tests for individual predictors suggest that each predictor is also significant. Higher logged GDP per capita, social support, healthy life expectancy, and freedom to make life choices, and lower perceptions of corruption, are each associated with a higher ladder score. Together, these predictors explain approximately 75% of the variance in ladder score. These data suggest that money, family and friends, good health, freedom, and trustworthy leaders are all associated with happiness, although a regression on observational data cannot by itself establish that they cause it.


References

Helliwell, John F., Richard Layard, Jeffrey Sachs, and Jan-Emmanuel De Neve, eds. 2020. World Happiness Report 2020. New York: Sustainable Development Solutions Network.