The World Happiness Report 2020 is the eighth World Happiness Report, an annual publication that interprets a wide variety of data, primarily from the Gallup World Poll, about self-reported happiness and social, economic, and environmental factors in 156 countries (Helliwell, Layard, Sachs, and De Neve, 2020). In this paper, I will explore some of the raw data used in the World Happiness Report. The main variables I will focus on are:

**Outcome/Dependent Variable**

- Ladder (happiness) score: Survey participants were asked to imagine their current position on a ladder with steps numbered from 0 to 10, with the best possible life for themselves represented at the top (step 10) and the worst possible life for themselves represented at the bottom (step 0). The national average of the responses is used for each country.

(Helliwell, Layard, Sachs, and De Neve, 2020, p. 19)

**Predictor/Independent Variables**

- Logged GDP per capita: The natural log of GDP per capita in terms of Purchasing Power Parity (PPP) adjusted to constant 2011 international dollars. Since GDP data for 2019 was not available at the time of the report, country-specific forecasts of GDP growth were used after adjusting for population growth.

- Social support: The national average of binary responses (0 = no, 1 = yes) to the Gallup World Poll question, “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”

- Healthy life expectancy: The national average expected number of years of life spent in good health from birth.

- Freedom to make life choices: The national average of binary responses to the Gallup World Poll question, “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”

- Perceptions of corruption: The national average of binary answers to two Gallup World Poll questions, “Is corruption widespread throughout the government or not?” and “Is corruption widespread within businesses or not?” Where data for government corruption are missing, the perception of business corruption is used as the overall corruption-perception measure instead.

(Helliwell, Layard, Sachs, and De Neve, 2020, p. 22)
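To make these definitions concrete, the following is a minimal sketch in R of how such country-level values could be built from respondent-level answers. All of the vectors, the GDP figure, and the helper `corruption_score()` are hypothetical and for illustration only; they are not Gallup data, and the report's exact aggregation procedure may differ.

```r
# Hypothetical respondent-level answers for one country (illustrative only).
support_answers <- c(1, 1, 0, 1, 1, 1, 0, 1)  # 1 = yes, 0 = no
gov_corruption  <- c(1, 1, 1, 0, 1, 1, 0, 1)  # 1 = widespread, 0 = not
biz_corruption  <- c(1, 0, 1, 0, 1, 0, 1, 1)

# The national average of a 0/1 variable is the share answering "yes".
social_support <- mean(support_answers)       # 6 / 8 = 0.75

# Corruption perception: average the two items; if the government item is
# entirely missing, fall back to the business item alone (assumed helper).
corruption_score <- function(gov, biz) {
  if (all(is.na(gov))) mean(biz) else mean(c(mean(gov), mean(biz)))
}
perceptions_of_corruption <- corruption_score(gov_corruption, biz_corruption)

# Logged GDP per capita is the natural log of the PPP-adjusted figure.
gdp_per_capita <- 20000                       # hypothetical value
logged_gdp <- log(gdp_per_capita)
```

Because the predictor is on the natural-log scale, a one-unit increase in logged GDP per capita corresponds to multiplying GDP per capita by about 2.72.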

I will fit a multiple linear regression model and test each predictor for significance to determine which of these variables, if any, are significantly associated with the ladder score.
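Written out, a model of this kind has the form below. The notation is introduced here for illustration: \(i\) indexes countries, \(y_i\) is the ladder score, \(x_{i1}, \dots, x_{i5}\) are logged GDP per capita, social support, healthy life expectancy, freedom to make life choices, and perceptions of corruption, and \(\varepsilon_i\) is the error term.

\[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \beta_5 x_{i5} + \varepsilon_i \]

Each slope \(\beta_j\) is the expected change in the ladder score for a one-unit increase in predictor \(j\), holding the other four predictors fixed.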

```
library(tidyverse)  # read_csv(), ggplot2, as_tibble()
library(fBasics)    # basicStats()
library(pander)     # pander()

happydata <- read_csv("2020.csv")
# Keep only the outcome and the five predictors used below.
variables_only_data_frame <- data.frame(
  Ladder_score = happydata$Ladder_score,
  Logged_GDP_per_capita = happydata$Logged_GDP_per_capita,
  Social_support = happydata$Social_support,
  Healthy_life_expectancy = happydata$Healthy_life_expectancy,
  Freedom_to_make_choices = happydata$Freedom_to_make_life_choices,
  Perceptions_of_corruption = happydata$Perceptions_of_corruption
)
```

```
summary_stats <- data.frame(t(basicStats(variables_only_data_frame)[c("Mean", "Stdev", "Minimum", "Median", "Maximum", "nobs"),]))
pander(summary_stats)
```

Variable | Mean | Stdev | Minimum | Median | Maximum | nobs |
---|---|---|---|---|---|---|
Ladder_score | 5.473 | 1.112 | 2.567 | 5.515 | 7.809 | 153 |
Logged_GDP_per_capita | 9.296 | 1.202 | 6.493 | 9.456 | 11.45 | 153 |
Social_support | 0.8087 | 0.1215 | 0.3195 | 0.8292 | 0.9747 | 153 |
Healthy_life_expectancy | 64.45 | 7.058 | 45.2 | 66.31 | 76.8 | 153 |
Freedom_to_make_choices | 0.7834 | 0.1178 | 0.3966 | 0.7998 | 0.975 | 153 |
Perceptions_of_corruption | 0.7331 | 0.1752 | 0.1098 | 0.7831 | 0.9356 | 153 |

```
ggplot(data = happydata, mapping = aes(x = Ladder_score)) +
geom_histogram(bins = 40, color = "black", fill = "lightgray") +
xlab("Ladder Score") +
scale_x_continuous(breaks = seq(0, 10, by = 1)) +
theme_bw()
```

The mean (standard deviation) ladder score is 5.47 (1.11). The median ladder score is 5.52.

```
ggplot(data = happydata, mapping = aes(x = Logged_GDP_per_capita)) +
geom_histogram(bins = 25, color = "black", fill = "lightgray") +
xlab("Logged GDP per Capita") +
theme_bw()
```

The mean (standard deviation) logged GDP per capita is 9.30 (1.20). The median logged GDP per capita is 9.46.

```
ggplot(data = happydata, mapping = aes(x = Healthy_life_expectancy)) +
geom_histogram(bins = 25, color = "black", fill = "lightgray") +
xlab("Healthy Life Expectancy (in years)") +
theme_bw()
```

The mean (standard deviation) healthy life expectancy is 64.45 (7.06) years. The median healthy life expectancy is 66.31 years.

```
ggplot(data = happydata, mapping = aes(x = Freedom_to_make_life_choices)) +
geom_histogram(bins = 25, color = "black", fill = "lightgray") +
xlab("Freedom to Make Life Choices Score") +
theme_bw()
```

The mean (standard deviation) freedom to make life choices score is 0.78 (0.12). The median freedom to make life choices score is 0.80.

```
ggplot(data = happydata, mapping = aes(x = Perceptions_of_corruption)) +
geom_histogram(bins = 25, color = "black", fill = "lightgray") +
xlab("Perceptions of Corruption Score") +
theme_bw()
```

The mean (standard deviation) perceptions of corruption score is 0.73 (0.18). The median perceptions of corruption score is 0.78.

```
# Fit the multiple linear regression of ladder score on the five predictors.
happy_model <- lm(Ladder_score ~ Logged_GDP_per_capita + Social_support + Healthy_life_expectancy + Freedom_to_make_life_choices + Perceptions_of_corruption, data = happydata)
happy_coef <- coefficients(happy_model)
happy_anova <- anova(happy_model)
happy_summary <- summary(happy_model)
# Coefficient table (estimate, standard error, t value, p-value) and 95% confidence intervals.
happy_t <- as_tibble(happy_summary$coefficients)
happy_ci <- as_tibble(confint(happy_model, level = 0.95))
```

The regression model is: \[ \hat{y} = -1.94 + 0.21 \, (\mbox{Logged GDP per capita}) + 2.74 \, (\mbox{Social support}) + 0.03 \, (\mbox{Healthy life expectancy}) + 1.92 \, (\mbox{Freedom to make life choices}) - 0.73 \, (\mbox{Perceptions of corruption}) \]
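As a rough check on how to read this equation, the sketch below plugs the sample means from the summary statistics table into the rounded coefficients. The vector names `coefs` and `x_mean` are introduced here for illustration only.

```r
# Rounded coefficients as reported in the regression equation.
coefs <- c(intercept = -1.94, gdp = 0.21, support = 2.74,
           life_exp = 0.03, freedom = 1.92, corruption = -0.73)

# Sample means from the summary statistics table,
# with a leading 1 to multiply the intercept.
x_mean <- c(1, 9.296, 0.8087, 64.45, 0.7834, 0.7331)

# Predicted ladder score for a country at the mean of every predictor.
y_hat <- sum(coefs * x_mean)
round(y_hat, 2)  # 5.13
```

A least-squares fit with an intercept passes through the sample means, so the unrounded coefficients would reproduce the mean ladder score of 5.47. The gap here comes from rounding the coefficients to two decimals, most likely the healthy life expectancy slope, since it multiplies values around 64. For actual predictions, calling `predict()` on the fitted model avoids this rounding error.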

**Hypotheses**

\(H_0: \ \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0\)

\(H_1: \ \mbox{at least one } \beta_i \ne 0\)

**Test Statistic**

\(F_0 = 86.25\).

***p*-value**

\(p < 0.0001\).
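For reference, the overall F statistic and its p-value can be read off a fitted `lm` object as sketched below. The example uses simulated data with two made-up predictors (`x1`, `x2`) so that it runs on its own; it is not the happiness model, and the numbers it produces are unrelated to those reported here.

```r
# Simulated data (illustrative only): y depends on both predictors.
set.seed(1)
n <- 50
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - sim$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = sim)

# summary()$fstatistic holds the F value and its two degrees of freedom.
f <- summary(fit)$fstatistic

# Upper-tail probability of the F distribution gives the p-value.
p_value <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
p_value
```

The numerator degrees of freedom equal the number of predictors, and the denominator degrees of freedom equal the number of observations minus the number of estimated coefficients.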

**Rejection Region**

Reject if \(p < \alpha\), where \(\alpha=0.05\).

**Conclusion and Interpretation**

Reject \(H_0\). There is sufficient evidence to suggest that at least one slope is nonzero, so the regression model as a whole is significant.

Predictor | Estimate of \(\beta\) | 95% CI for \(\beta\) | p-value |
---|---|---|---|
Logged GDP per Capita | 0.21 | (0.05, 0.37) | p = 0.0094 |
Social Support | 2.74 | (1.43, 4.05) | p = 0.0001 |
Healthy Life Expectancy | 0.03 | (0.01, 0.06) | p = 0.0084 |
Freedom to Make Life Choices | 1.92 | (0.97, 2.88) | p = 0.0001 |
Perceptions of Corruption | -0.73 | (-1.33, -0.13) | p = 0.0182 |
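A table of this shape can be assembled from the model summary and `confint()`. The sketch below shows one possible way, again on simulated data with made-up predictors `x1` and `x2` so that it is self-contained; the column names of `coef_table` are illustrative choices, not the code used to produce the table above.

```r
# Simulated data (illustrative only).
set.seed(1)
n <- 50
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - sim$x2 + rnorm(n)
fit <- lm(y ~ x1 + x2, data = sim)

est <- coef(summary(fit))         # Estimate, Std. Error, t value, Pr(>|t|)
ci  <- confint(fit, level = 0.95) # lower and upper 95% bounds

# Drop the intercept row and combine estimates, intervals, and p-values.
coef_table <- data.frame(
  predictor = rownames(est)[-1],
  estimate  = round(est[-1, "Estimate"], 2),
  ci_lower  = round(ci[-1, 1], 2),
  ci_upper  = round(ci[-1, 2], 2),
  p_value   = format.pval(est[-1, "Pr(>|t|)"], digits = 2, eps = 1e-4),
  row.names = NULL
)
coef_table
```

With `eps = 1e-4`, `format.pval()` prints any p-value below 0.0001 as a "less than" threshold rather than as a rounded number.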

```
GDP_p <- ggplot(happydata, aes(x = Logged_GDP_per_capita, y = Ladder_score)) +
geom_point(alpha = 0.5) +
xlab("Logged GDP per Capita") +
ylab("Ladder Score") +
theme_bw()
GDP_p
```

## Social Support

The mean (standard deviation) social support score is 0.81 (0.12). The median social support score is 0.83.