Introduction

   The World Happiness Report 2020 is the eighth World Happiness Report, an annual publication which interprets a wide variety of data, primarily from the Gallup World Poll, about self-reported happiness and social, economic, and environmental factors in 156 countries (Helliwell, Layard, Sachs, and De Neve, 2020). In this paper, I will explore some of the raw data used in the World Happiness Report. The main variables I will focus on are:

Outcome/Dependent Variable

  • Ladder (happiness) score: Survey participants were asked to imagine their current position on a ladder with steps numbered from 0 to 10, with the best possible life for themselves represented at the top (step 10) and the worst possible life for themselves represented at the bottom (step 0). The national average of the responses is used for each country.

   (Helliwell, Layard, Sachs, and De Neve, 2020, p. 19)
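As a minimal illustration of what "national average" means here, the sketch below averages made-up individual ladder responses within each country. The data frame responses and the country labels are hypothetical. The plain unweighted mean ignores any survey weighting the Gallup data may use.

```r
# Hypothetical individual responses on the 0-10 ladder for two made-up countries
responses <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  ladder  = c(7, 8, 6, 4, 5)
)

# Country-level ladder score: unweighted mean of the responses in each country
ladder_by_country <- aggregate(ladder ~ country, data = responses, FUN = mean)
ladder_by_country
```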

Predictor/Independent Variables

  • Logged GDP per capita: The natural log of GDP per capita in terms of Purchasing Power Parity (PPP) adjusted to constant 2011 international dollars. Since GDP data for 2019 was not available at the time of the report, country-specific forecasts of GDP growth were used after adjusting for population growth.

  • Social support: The national average of binary responses (0 = no, 1 = yes) to the Gallup World Poll question, “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”

  • Healthy life expectancy: The national average expected number of years of life spent in good health from birth.

  • Freedom to make life choices: The national average of binary responses to the Gallup World Poll question, “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”

  • Perceptions of corruption: The national average of binary answers to two Gallup World Poll questions, “Is corruption widespread throughout the government or not?” and “Is corruption widespread within businesses or not?” Where data for government corruption are missing, the perception of business corruption is used as the overall corruption-perception measure instead.

   (Helliwell, Layard, Sachs, and De Neve, 2020, p. 22)
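The derived predictors above can be illustrated with a few lines of R. This is a sketch under stated assumptions:

- All country figures are made up.
- The per-capita adjustment, which multiplies by (1 + GDP growth) / (1 + population growth), is one plausible reading of "adjusting for population growth". It is not necessarily the report's exact procedure.
- The corruption score is assumed to be the simple average of the two national shares, with the business share substituted when the government share is missing.
- corruption_score() is a hypothetical helper.

```r
# Hypothetical figures for one made-up country (not taken from the report)
gdp_2018        <- 5.0e11  # total GDP, PPP, constant 2011 international dollars
population_2018 <- 2.0e7
gdp_growth_2019 <- 0.03    # assumed forecast growth of total GDP
pop_growth_2019 <- 0.01    # assumed population growth

# Per-capita GDP, rolled forward one year: total growth net of population growth
gdp_pc_2018 <- gdp_2018 / population_2018
gdp_pc_2019 <- gdp_pc_2018 * (1 + gdp_growth_2019) / (1 + pop_growth_2019)
logged_gdp_pc_2019 <- log(gdp_pc_2019)  # log() is the natural log in R

# Binary (0 = no, 1 = yes) items: the national average is the share answering yes
social_support <- mean(c(1, 1, 0, 1))  # 0.75

# Corruption: average of the two national shares, falling back to the
# business share when the government share is missing (NA)
corruption_score <- function(government_share, business_share) {
  ifelse(is.na(government_share), business_share,
         (government_share + business_share) / 2)
}
corruption_score(c(0.75, NA), c(0.25, 0.5))  # 0.5 0.5
```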

   I will fit a multiple linear regression model with these five predictors and test each predictor for significance, to find out which of these variables, if any, are significantly associated with the ladder score.
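For reference, the model can be written as follows. This assumes the usual linear-model conditions: independent errors with mean zero and constant variance, normally distributed for the exact t and F tests. Here \(x_1, \dots, x_5\) denote the five predictors in the order listed above (logged GDP per capita, social support, healthy life expectancy, freedom to make life choices, perceptions of corruption):

\[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \beta_5 x_{i5} + \varepsilon_i, \qquad i = 1, \dots, n. \]

Each predictor is then tested with \(H_0: \beta_j = 0\) using \(t_j = \hat{\beta}_j / \widehat{\mathrm{SE}}(\hat{\beta}_j)\). Under \(H_0\), this statistic follows a \(t\) distribution with \(n - 6\) degrees of freedom.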


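# tidyverse: read_csv(), ggplot(), as_tibble(); fBasics: basicStats(); pander: pander()
library(tidyverse); library(fBasics); library(pander)
happydata <- read_csv("2020.csv")

# Outcome plus the five predictors (freedom variable renamed to a shorter label)
variables_only_data_frame <- data.frame(Ladder_score = happydata$Ladder_score, Logged_GDP_per_capita = happydata$Logged_GDP_per_capita, Social_support = happydata$Social_support, Healthy_life_expectancy = happydata$Healthy_life_expectancy, Freedom_to_make_choices = happydata$Freedom_to_make_life_choices, Perceptions_of_corruption = happydata$Perceptions_of_corruption)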


Summary Statistics

summary_stats <- data.frame(t(basicStats(variables_only_data_frame)[c("Mean", "Stdev", "Minimum", "Median", "Maximum", "nobs"),]))

pander(summary_stats)
Variable                   Mean    Stdev   Minimum  Median  Maximum  nobs
-------------------------  ------  ------  -------  ------  -------  ----
Ladder_score               5.473   1.112   2.567    5.515   7.809    153
Logged_GDP_per_capita      9.296   1.202   6.493    9.456   11.45    153
Social_support             0.8087  0.1215  0.3195   0.8292  0.9747   153
Healthy_life_expectancy    64.45   7.058   45.2     66.31   76.8     153
Freedom_to_make_choices    0.7834  0.1178  0.3966   0.7998  0.975    153
Perceptions_of_corruption  0.7331  0.1752  0.1098   0.7831  0.9356   153
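If the fBasics package is unavailable, a base-R function along these lines should produce comparable columns. summary_table() is a hypothetical helper. In it, sd() is the sample standard deviation (denominator n - 1), and nobs simply counts rows, including any missing values.

```r
# Hypothetical base-R stand-in for the basicStats() summary used above
summary_table <- function(df) {
  t(sapply(df, function(x) c(
    Mean    = mean(x, na.rm = TRUE),
    Stdev   = sd(x, na.rm = TRUE),
    Minimum = min(x, na.rm = TRUE),
    Median  = median(x, na.rm = TRUE),
    Maximum = max(x, na.rm = TRUE),
    nobs    = length(x)
  )))
}

# Usage with the real data: summary_table(variables_only_data_frame)
toy <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))
summary_table(toy)
```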

Ladder Score

ggplot(data = happydata, mapping = aes(x = Ladder_score)) +
  geom_histogram(bins = 40, color = "black", fill = "lightgray") +
  xlab("Ladder Score") +
  scale_x_continuous(breaks = 0:10) +  # one tick per ladder step
  theme_bw() 

   The mean (standard deviation) ladder score is 5.47 (1.11). The median ladder score is 5.52.

Logged GDP per Capita

ggplot(data = happydata, mapping = aes(x = Logged_GDP_per_capita)) +
  geom_histogram(bins = 25, color = "black", fill = "lightgray") +
  xlab("Logged GDP per Capita") +
  theme_bw() 

   The mean (standard deviation) logged GDP per capita is 9.30 (1.20). The median logged GDP per capita is 9.46.

Social Support

ggplot(data = happydata, mapping = aes(x = Social_support)) +
  geom_histogram(bins = 25, color = "black", fill = "lightgray") +
  xlab("Social Support Score") +
  theme_bw() 

   The mean (standard deviation) social support score is 0.81 (0.12). The median social support score is 0.83.

Healthy Life Expectancy

ggplot(data = happydata, mapping = aes(x = Healthy_life_expectancy)) +
  geom_histogram(bins = 25, color = "black", fill = "lightgray") +
  xlab("Healthy Life Expectancy (in years)") +
  theme_bw() 

   The mean (standard deviation) healthy life expectancy is 64.45 (7.06) years. The median healthy life expectancy is 66.31 years.

Freedom to Make Life Choices

ggplot(data = happydata, mapping = aes(x = Freedom_to_make_life_choices)) +
  geom_histogram(bins = 25, color = "black", fill = "lightgray") +
  xlab("Freedom to Make Life Choices Score") +
  theme_bw() 

   The mean (standard deviation) freedom to make life choices score is 0.78 (0.12). The median freedom to make life choices score is 0.80.

Perceptions of Corruption

ggplot(data = happydata, mapping = aes(x = Perceptions_of_corruption)) +
  geom_histogram(bins = 25, color = "black", fill = "lightgray") +
  xlab("Perceptions of Corruption Score") +
  theme_bw() 

   The mean (standard deviation) perceptions of corruption score is 0.73 (0.18). The median perceptions of corruption score is 0.78.
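The six histogram chunks above differ only in the column plotted and the axis label, so they could be generated by a small function. The sketch below assumes ggplot2 3.0 or later, which provides the .data pronoun inside aes(). plot_histogram() is a hypothetical helper name. Its default of 25 bins mirrors the chunks above; the ladder-score plot used 40.

```r
library(ggplot2)

# Hypothetical helper: histogram of one column, chosen by name as a string
plot_histogram <- function(data, var, label, bins = 25) {
  ggplot(data, aes(x = .data[[var]])) +
    geom_histogram(bins = bins, color = "black", fill = "lightgray") +
    xlab(label) +
    theme_bw()
}

# Example usage, assuming happydata has been read in as above:
# plot_histogram(happydata, "Social_support", "Social Support Score")
# plot_histogram(happydata, "Ladder_score", "Ladder Score", bins = 40)
```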


Multiple Linear Regression Line

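# Fit the five-predictor model to all countries
happy_model <- lm(Ladder_score ~ Logged_GDP_per_capita + Social_support + Healthy_life_expectancy + Freedom_to_make_life_choices + Perceptions_of_corruption, data = happydata)
happy_coef <- coefficients(happy_model)                    # estimated intercept and slopes
happy_anova <- anova(happy_model)                          # sequential (type I) ANOVA table
happy_summary <- summary(happy_model)                      # t tests, R-squared, overall F test
happy_t <- as_tibble(happy_summary$coefficients)           # one row per coefficient, intercept first
happy_ci <- as_tibble(confint(happy_model, level = 0.95))  # 95% confidence intervals for each beta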


The fitted regression model is: \[ \hat{y} = -1.94 + 0.21x_1 + 2.74x_2 + 0.03x_3 + 1.92x_4 - 0.73x_5, \] where \(\hat{y}\) is the predicted ladder score, \(x_1\) is logged GDP per capita, \(x_2\) is social support, \(x_3\) is healthy life expectancy, \(x_4\) is freedom to make life choices, and \(x_5\) is perceptions of corruption.
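To see how the fitted equation is used, the sketch below plugs a hypothetical country profile into the rounded coefficients. The profile simply takes each predictor's sample median from the summary table, and the vector names are illustrative. Because the coefficients are rounded to two decimals, this hand calculation is only approximate. The life-expectancy coefficient, for example, multiplies values near 65, so rounding it matters. predict(happy_model, newdata = ...) uses the unrounded estimates, given a data frame whose column names match the model's variables.

```r
# Rounded coefficients from the fitted model (intercept first)
b <- c(intercept = -1.94, gdp = 0.21, social = 2.74, health = 0.03,
       freedom = 1.92, corruption = -0.73)

# Hypothetical country profile: each predictor set to its sample median
x <- c(intercept = 1, gdp = 9.456, social = 0.8292, health = 66.31,
       freedom = 0.7998, corruption = 0.7831)

# Predicted ladder score: sum of coefficient times predictor value
y_hat <- sum(b * x)
round(y_hat, 2)  # 5.27
```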


Hypothesis Test for Significance of Regression Line

Hypotheses

   \(H_0: \ \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0\)
   \(H_1: \ \mbox{at least one } \beta_i \ne 0\)

Test Statistic

   \(F_0 = 86.25\).

p-value

   \(p < 0.0001\).

Rejection Region

   Reject if \(p < \alpha\), where \(\alpha=0.05\).

Conclusion and Interpretation

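   Reject \(H_0\). At \(\alpha = 0.05\), there is sufficient evidence that at least one of the five predictors is linearly associated with the ladder score, so the regression model as a whole is significant.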
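The p-value can be checked directly from the F distribution. This sketch assumes all 153 countries in the summary table enter the model, with no rows dropped for missing values. The test then has 5 and 153 - 5 - 1 = 147 degrees of freedom. The sketch also inverts the usual identity \(F_0 = \frac{R^2/k}{(1 - R^2)/(n - k - 1)}\) to recover the \(R^2\) implied by \(F_0\). That value should agree with the Multiple R-squared in summary(happy_model) up to the rounding of \(F_0\).

```r
F0 <- 86.25  # overall F statistic reported above
k  <- 5      # number of predictors
n  <- 153    # countries, assuming none are dropped for missing values
df1 <- k
df2 <- n - k - 1  # residual degrees of freedom: 147

# Upper-tail probability of F(df1, df2) at F0
p_value <- pf(F0, df1, df2, lower.tail = FALSE)
p_value < 0.0001  # TRUE

# R^2 implied by F0, from F0 = (R^2 / k) / ((1 - R^2) / (n - k - 1))
r_squared <- (df1 * F0) / (df1 * F0 + df2)
round(r_squared, 3)  # 0.746
```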


Hypothesis Tests for Significance of Individual Predictors


Overview

Predictor                     Estimate of \(\beta\)  95% CI for \(\beta\)  p-value
----------------------------  ---------------------  --------------------  ----------
Logged GDP per Capita         0.21                   (0.05, 0.37)          p = 0.0094
Social Support                2.74                   (1.43, 4.05)          p = 0.0001
Healthy Life Expectancy       0.03                   (0.01, 0.06)          p = 0.0084
Freedom to Make Life Choices  1.92                   (0.97, 2.88)          p = 0.0001
Perceptions of Corruption     -0.73                  (-1.33, -0.13)        p = 0.0182
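Each row of the table follows from the coefficient's estimate and standard error. The interval is \(\hat{\beta}_j \pm t_{0.975,\,147}\,\widehat{\mathrm{SE}}(\hat{\beta}_j)\), which is what confint() computes for an lm fit. The p-value is the two-sided tail area of \(t = \hat{\beta}_j / \widehat{\mathrm{SE}}(\hat{\beta}_j)\).

The standard errors are not reported here, so the sketch backs out an approximate one from the rounded GDP interval. coef_inference() is a hypothetical helper. The 147 residual degrees of freedom assume all 153 countries are used. Because the inputs are rounded, the resulting p-value only roughly matches the reported 0.0094.

```r
# Hypothetical helper: t statistic, confidence interval, and two-sided p-value
coef_inference <- function(estimate, se, df, level = 0.95) {
  t_stat <- estimate / se
  t_crit <- qt(1 - (1 - level) / 2, df)
  c(t     = t_stat,
    lower = estimate - t_crit * se,
    upper = estimate + t_crit * se,
    p     = 2 * pt(abs(t_stat), df, lower.tail = FALSE))
}

# Approximate SE for logged GDP per capita, backed out of the rounded interval
se_gdp <- (0.37 - 0.05) / (2 * qt(0.975, 147))
coef_inference(0.21, se_gdp, 147)
```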


Logged GDP per Capita

GDP_p <- ggplot(happydata, aes(x = Logged_GDP_per_capita, y = Ladder_score)) + 
        geom_point(alpha = 0.5) + 
        xlab("Logged GDP per Capita") + 
        ylab("Ladder Score") +
        theme_bw()
GDP_p
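A similar scatter plot can be drawn for each predictor, optionally with the least-squares line from a one-predictor fit. The helper name plot_scatter_lm() is hypothetical and assumes ggplot2 3.0 or later. The slope of this line is the simple-regression slope. It generally differs from the multiple-regression estimate in the table, because the other four predictors are not held constant.

```r
library(ggplot2)

# Hypothetical helper: ladder score against one predictor, with a simple LS line
plot_scatter_lm <- function(data, var, label) {
  ggplot(data, aes(x = .data[[var]], y = Ladder_score)) +
    geom_point(alpha = 0.5) +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    xlab(label) +
    ylab("Ladder Score") +
    theme_bw()
}

# Example usage, assuming happydata has been read in as above:
# plot_scatter_lm(happydata, "Logged_GDP_per_capita", "Logged GDP per Capita")
```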