The Geography of Happiness: A Global Analysis

Python
EDA
Visualization
Exploring world happiness scores using Python and Plotly. Part 1.
Author

Hakki

Published

January 30, 2026

The World Happiness Report surveys more than 150 countries and measures factors such as GDP, health, social support, freedom, trust, and generosity. Using the 2015 data, I examine how these factors relate to happiness scores, looking at both expected patterns and surprising deviations across regions.
Data Source: This analysis uses the World Happiness Report dataset, provided by the Sustainable Development Solutions Network and curated by Abigail Larion on Kaggle. Licensed under CC0.

Code
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "notebook"
Code
df_2015 = pd.read_csv("2015.csv")
df_2015.head()
       Country          Region  Happiness Rank  Happiness Score  Standard Error  Economy (GDP per Capita)   Family  Health (Life Expectancy)  Freedom  Trust (Government Corruption)  Generosity  Dystopia Residual
0  Switzerland  Western Europe               1            7.587         0.03411                   1.39651  1.34951                   0.94143  0.66557                        0.41978     0.29678            2.51738
1      Iceland  Western Europe               2            7.561         0.04884                   1.30232  1.40223                   0.94784  0.62877                        0.14145     0.43630            2.70201
2      Denmark  Western Europe               3            7.527         0.03328                   1.32548  1.36058                   0.87464  0.64938                        0.48357     0.34139            2.49204
3       Norway  Western Europe               4            7.522         0.03880                   1.45900  1.33095                   0.88521  0.66973                        0.36503     0.34699            2.46531
4       Canada   North America               5            7.427         0.03553                   1.32629  1.32261                   0.90563  0.63297                        0.32957     0.45811            2.45176

I explored the dataset to better understand its structure and content.

Code
df_2015.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 158 entries, 0 to 157
Data columns (total 12 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Country                        158 non-null    object 
 1   Region                         158 non-null    object 
 2   Happiness Rank                 158 non-null    int64  
 3   Happiness Score                158 non-null    float64
 4   Standard Error                 158 non-null    float64
 5   Economy (GDP per Capita)       158 non-null    float64
 6   Family                         158 non-null    float64
 7   Health (Life Expectancy)       158 non-null    float64
 8   Freedom                        158 non-null    float64
 9   Trust (Government Corruption)  158 non-null    float64
 10  Generosity                     158 non-null    float64
 11  Dystopia Residual              158 non-null    float64
dtypes: float64(9), int64(1), object(2)
memory usage: 14.9+ KB

I checked the dataset for missing values and verified the data types: all 12 columns have 158 non-null entries, so there are no missing values, and every column except Country and Region is numeric.
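`df.info()` reports non-null counts column by column. The same checks can also be written as explicit expressions, which is convenient when the notebook is re-run on other years' files. Below is a minimal sketch using a few columns of the five rows printed by `df_2015.head()`. On the real data, replace `sample` with `df_2015`.

```python
import pandas as pd

# Five rows copied from the df_2015.head() output above (subset of columns)
sample = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark", "Norway", "Canada"],
    "Region": ["Western Europe"] * 4 + ["North America"],
    "Happiness Score": [7.587, 7.561, 7.527, 7.522, 7.427],
    "Economy (GDP per Capita)": [1.39651, 1.30232, 1.32548, 1.45900, 1.32629],
})

# Count missing values per column; every count should be zero
missing_per_column = sample.isna().sum()
print(missing_per_column)

# A single True/False answer for the whole frame
has_missing = sample.isna().any().any()
print("Any missing values:", has_missing)

# Confirm the numeric columns were parsed as floats, not strings
print(sample.dtypes)
```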

Code
df_2015 = df_2015.rename(columns={  # Renaming columns for easier access
    "Happiness Rank": "Rank", "Happiness Score": "Score", "Standard Error": "SE",
    "Economy (GDP per Capita)": "GDP", "Health (Life Expectancy)": "Health",
    "Trust (Government Corruption)": "Trust", "Dystopia Residual": "DR"})
df_2015["Year"] = 2015
df_2015.head(10)
       Country                     Region  Rank  Score       SE      GDP   Family   Health  Freedom    Trust  Generosity       DR  Year
0  Switzerland             Western Europe     1  7.587  0.03411  1.39651  1.34951  0.94143  0.66557  0.41978     0.29678  2.51738  2015
1      Iceland             Western Europe     2  7.561  0.04884  1.30232  1.40223  0.94784  0.62877  0.14145     0.43630  2.70201  2015
2      Denmark             Western Europe     3  7.527  0.03328  1.32548  1.36058  0.87464  0.64938  0.48357     0.34139  2.49204  2015
3       Norway             Western Europe     4  7.522  0.03880  1.45900  1.33095  0.88521  0.66973  0.36503     0.34699  2.46531  2015
4       Canada              North America     5  7.427  0.03553  1.32629  1.32261  0.90563  0.63297  0.32957     0.45811  2.45176  2015
5      Finland             Western Europe     6  7.406  0.03140  1.29025  1.31826  0.88911  0.64169  0.41372     0.23351  2.61955  2015
6  Netherlands             Western Europe     7  7.378  0.02799  1.32944  1.28017  0.89284  0.61576  0.31814     0.47610  2.46570  2015
7       Sweden             Western Europe     8  7.364  0.03157  1.33171  1.28907  0.91087  0.65980  0.43844     0.36262  2.37119  2015
8  New Zealand  Australia and New Zealand     9  7.286  0.03371  1.25018  1.31967  0.90837  0.63938  0.42922     0.47501  2.26425  2015
9    Australia  Australia and New Zealand    10  7.284  0.04083  1.33358  1.30923  0.93156  0.65124  0.35637     0.43562  2.26646  2015

I shortened the column names for easier access and added a Year column for future comparisons with other years.

Code
print(df_2015.groupby('Region')['Score'].mean().sort_values(ascending=False)) # Average happiness score by region
Region
Australia and New Zealand          7.285000
North America                      7.273000
Western Europe                     6.689619
Latin America and Caribbean        6.144682
Eastern Asia                       5.626167
Middle East and Northern Africa    5.406900
Central and Eastern Europe         5.332931
Southeastern Asia                  5.317444
Southern Asia                      4.580857
Sub-Saharan Africa                 4.202800
Name: Score, dtype: float64

Although Western Europe dominates the Top 10 list with seven countries, North America and Australia/New Zealand have higher regional averages.
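The seven-of-ten count can be confirmed by selecting the ten best-ranked countries and tallying their regions. The sketch below runs on the ten rows from the `head(10)` output above. On the full data, `top10` would simply be `df_2015`.

```python
import pandas as pd

# Top-10 rows copied from the df_2015.head(10) output above
top10 = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark", "Norway", "Canada",
                "Finland", "Netherlands", "Sweden", "New Zealand", "Australia"],
    "Region": ["Western Europe"] * 4 + ["North America"]
              + ["Western Europe"] * 3 + ["Australia and New Zealand"] * 2,
    "Rank": list(range(1, 11)),
})

# Select the ten lowest ranks (best countries) and count how many fall in each region
region_counts = top10.nsmallest(10, "Rank")["Region"].value_counts()
print(region_counts)
```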

Code
region_summary = df_2015.groupby('Region')['Score'].agg(['count', 'mean', 'median']).sort_values(by='median')
print(region_summary)
                                 count      mean  median
Region                                                  
Sub-Saharan Africa                  40  4.202800   4.272
Southern Asia                        7  4.580857   4.565
Middle East and Northern Africa     20  5.406900   5.262
Central and Eastern Europe          29  5.332931   5.286
Southeastern Asia                    9  5.317444   5.360
Eastern Asia                         6  5.626167   5.729
Latin America and Caribbean         22  6.144682   6.149
Western Europe                      21  6.689619   6.937
North America                        2  7.273000   7.273
Australia and New Zealand            2  7.285000   7.285

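The regional summary shows that North America and Australia/New Zealand each include only two countries. Their high averages therefore rest on very small samples and should be compared with the larger regions cautiously. With only two countries, the mean and median are identical by definition.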
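One way to act on this caveat is to rank only regions with enough countries behind them. The sketch below rebuilds `region_summary` from the printed output and applies a minimum-count filter. The cutoff of five countries (`MIN_COUNTRIES`) is an arbitrary, hypothetical choice, not a standard.

```python
import pandas as pd

# Regional summary values copied from the region_summary output above
region_summary = pd.DataFrame(
    {
        "count": [40, 7, 20, 29, 9, 6, 22, 21, 2, 2],
        "mean": [4.202800, 4.580857, 5.406900, 5.332931, 5.317444,
                 5.626167, 6.144682, 6.689619, 7.273000, 7.285000],
        "median": [4.272, 4.565, 5.262, 5.286, 5.360,
                   5.729, 6.149, 6.937, 7.273, 7.285],
    },
    index=["Sub-Saharan Africa", "Southern Asia", "Middle East and Northern Africa",
           "Central and Eastern Europe", "Southeastern Asia", "Eastern Asia",
           "Latin America and Caribbean", "Western Europe", "North America",
           "Australia and New Zealand"],
)

# With two countries, the median is the average of the two scores, i.e. the mean
two_country = region_summary[region_summary["count"] == 2]
print(two_country)

# Hypothetical cutoff: only rank regions that contain at least this many countries
MIN_COUNTRIES = 5
robust_ranking = (region_summary[region_summary["count"] >= MIN_COUNTRIES]
                  .sort_values("mean", ascending=False))
print(robust_ranking)
```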


Next, I use a correlation matrix to explore how each variable relates to the happiness score.

Code
numeric_data_2015 = df_2015.select_dtypes(include=['number']).drop(columns='Year') # Exclude 'Year' for correlation analysis
corr_matrix = numeric_data_2015.corr()
corr_matrix
                 Rank      Score         SE        GDP     Family     Health    Freedom      Trust  Generosity         DR
Rank         1.000000  -0.992105   0.158516  -0.785267  -0.733644  -0.735613  -0.556886  -0.372315   -0.160142  -0.521999
Score       -0.992105   1.000000  -0.177254   0.780966   0.740605   0.724200   0.568211   0.395199    0.180319   0.530474
SE           0.158516  -0.177254   1.000000  -0.217651  -0.120728  -0.310287  -0.129773  -0.178325   -0.088439   0.083981
GDP         -0.785267   0.780966  -0.217651   1.000000   0.645299   0.816478   0.370300   0.307885   -0.010465   0.040059
Family      -0.733644   0.740605  -0.120728   0.645299   1.000000   0.531104   0.441518   0.205605    0.087513   0.148117
Health      -0.735613   0.724200  -0.310287   0.816478   0.531104   1.000000   0.360477   0.248335    0.108335   0.018979
Freedom     -0.556886   0.568211  -0.129773   0.370300   0.441518   0.360477   1.000000   0.493524    0.373916   0.062783
Trust       -0.372315   0.395199  -0.178325   0.307885   0.205605   0.248335   0.493524   1.000000    0.276123  -0.033105
Generosity  -0.160142   0.180319  -0.088439  -0.010465   0.087513   0.108335   0.373916   0.276123    1.000000  -0.101301
DR          -0.521999   0.530474   0.083981   0.040059   0.148117   0.018979   0.062783  -0.033105   -0.101301   1.000000
Code
fig = px.imshow(corr_matrix, text_auto=True, width=800, height=800)
fig.show()

The correlation analysis shows that GDP (r ≈ 0.78), family support (0.74), and health (0.72) have the strongest positive relationships with happiness scores.
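Rather than reading the heatmap cell by cell, you can pull out the Score column of the matrix and sort it. The sketch below uses the values printed above; on the real data this column is `corr_matrix['Score']`. `Rank` is dropped because it is just the ordering of the score, so its near -1 correlation carries no information.

```python
import pandas as pd

# Correlations with Score, copied from the corr_matrix output above
score_corr = pd.Series({
    "Rank": -0.992105, "SE": -0.177254, "GDP": 0.780966, "Family": 0.740605,
    "Health": 0.724200, "Freedom": 0.568211, "Trust": 0.395199,
    "Generosity": 0.180319, "DR": 0.530474,
})

# Drop the rank (derived from Score) and sort from strongest positive to most negative
ranked = score_corr.drop("Rank").sort_values(ascending=False)
print(ranked)
```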

Code
fig = px.scatter(df_2015, x='GDP', y='Score', color='Region', hover_name='Country',
                 labels={'Score': 'Happiness Score'},  # keys must match the column names
                 trendline='ols', trendline_scope='overall')  # one OLS line across all regions (requires statsmodels)
fig.show()

This scatter plot shows a clear positive association between GDP and happiness scores, consistent with the correlation of 0.78 in the matrix above.
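For a straight-line OLS fit like the `trendline='ols'` option, the R² of the line equals the square of Pearson's r. The overall trendline should therefore explain roughly 0.78² ≈ 0.61 of the variance in scores. The sketch below checks this identity with NumPy on the ten rows from the `head(10)` output. That subset is far too narrow to describe the full relationship; it only demonstrates the arithmetic.

```python
import numpy as np

# GDP and Score for the ten top-ranked countries (copied from the head(10) output above)
gdp = np.array([1.39651, 1.30232, 1.32548, 1.45900, 1.32629,
                1.29025, 1.32944, 1.33171, 1.25018, 1.33358])
score = np.array([7.587, 7.561, 7.527, 7.522, 7.427,
                  7.406, 7.378, 7.364, 7.286, 7.284])

# Fit score = slope * gdp + intercept by ordinary least squares
slope, intercept = np.polyfit(gdp, score, deg=1)
predicted = slope * gdp + intercept

# R^2 of the fitted line: 1 - residual sum of squares / total sum of squares
ss_res = np.sum((score - predicted) ** 2)
ss_tot = np.sum((score - score.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Pearson correlation between the two variables
r = np.corrcoef(gdp, score)[0, 1]
print(f"R^2 = {r_squared:.4f}, r^2 = {r**2:.4f}")

# Applied to the full-sample correlation from the matrix (r = 0.780966)
print(f"Implied R^2 for the full 2015 data: {0.780966 ** 2:.2f}")
```

In Plotly itself, `px.get_trendline_results(fig)` returns the fitted statsmodels results, so the full-data R² can be read from there directly.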

Code
fig = px.box(df_2015,y='Region', x='Score', color='Region', points='all', hover_data='Country')
fig.update_layout(
    yaxis={
        'showticklabels': False, # Hide the y-axis labels
    },
    showlegend=True
)
fig.show()

We cannot draw meaningful conclusions for North America and Australia/New Zealand because each contains only two countries. However, Western Europe and the Middle East and Northern Africa (MENA) show more interesting boxplot patterns. In MENA there appear to be two distinct groups: higher-scoring countries such as Israel, the UAE, and Oman, and lower-scoring countries such as Syria, Yemen, and Egypt. The median (5.262) lies below the mean (5.407), so the distribution is right-skewed: a few high scorers pull the mean up, while the median sits closer to the lower group.
Western Europe shows a similar split, with the Nordic countries contrasting with countries such as Greece and Portugal. Here the pattern reverses: the median (6.937) lies above the mean (6.690), because the lower-scoring group pulls the mean down.
Taken together with the GDP scatter plot, this shows that Latin American countries have GDP levels similar to those of Central and Eastern Europe and MENA. Yet their happiness scores (regional mean 6.14) are closer to Western Europe's (6.69) than to those regions' (about 5.3 to 5.4). This suggests that factors other than GDP influence happiness, so it is worth examining the other variables.
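The direction of skew in the boxplots can be read off as the gap between mean and median. The gap is positive when high scorers pull the mean up and negative when low scorers pull it down. This is only a rough indicator. The sketch below uses the regional figures from `region_summary`.

```python
import pandas as pd

# Mean and median scores copied from the region_summary output above
stats = pd.DataFrame(
    {"mean": [5.406900, 6.689619], "median": [5.262, 6.937]},
    index=["Middle East and Northern Africa", "Western Europe"],
)

# Positive gap: high scorers pull the mean above the median (right skew).
# Negative gap: low scorers pull the mean below the median (left skew).
stats["mean_minus_median"] = stats["mean"] - stats["median"]
stats["skew_direction"] = stats["mean_minus_median"].map(
    lambda gap: "right (high scorers)" if gap > 0 else "left (low scorers)"
)
print(stats)
```

On the full data, `df_2015.groupby('Region')['Score'].skew()` gives a sample skewness per region directly.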
***

Code
fig = px.box(df_2015,y='Region', x='DR', color='Region', points='all', hover_data='Country')
fig.update_layout(
    yaxis={
        'showticklabels': False, # Hide the y-axis labels
    },
    showlegend=True
)
fig.show()

When I examined the remaining variables, I noticed that Latin American countries tend to have higher Dystopia Residual values than other regions. After researching this further, I found that this phenomenon is discussed in the literature as the “Latin America Happiness Paradox.”

More information on this concept can be found below.

https://www.happinessandwellbeing.org/rojas

https://www.mappmagazine.com/articles/the-well-being-paradox
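The Dystopia Residual is easier to interpret once you see that, in this dataset, the six factor columns plus `DR` add up to the happiness score, up to rounding. According to the World Happiness Report's methodology, `DR` combines a fixed "Dystopia" baseline with each country's unexplained residual. It is therefore the part of the score that the six measured factors leave over. The sketch below checks the sum on the five rows from `df_2015.head()`.

```python
import pandas as pd

# Rows copied from the df_2015.head() output above (renamed columns)
rows = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark", "Norway", "Canada"],
    "Score":      [7.587, 7.561, 7.527, 7.522, 7.427],
    "GDP":        [1.39651, 1.30232, 1.32548, 1.45900, 1.32629],
    "Family":     [1.34951, 1.40223, 1.36058, 1.33095, 1.32261],
    "Health":     [0.94143, 0.94784, 0.87464, 0.88521, 0.90563],
    "Freedom":    [0.66557, 0.62877, 0.64938, 0.66973, 0.63297],
    "Trust":      [0.41978, 0.14145, 0.48357, 0.36503, 0.32957],
    "Generosity": [0.29678, 0.43630, 0.34139, 0.34699, 0.45811],
    "DR":         [2.51738, 2.70201, 2.49204, 2.46531, 2.45176],
})

factors = ["GDP", "Family", "Health", "Freedom", "Trust", "Generosity"]

# Contribution of the six measured factors, then add DR back to rebuild the score
rows["explained"] = rows[factors].sum(axis=1)
rows["reconstructed"] = rows["explained"] + rows["DR"]
rows["gap"] = (rows["Score"] - rows["reconstructed"]).abs()
print(rows[["Country", "Score", "explained", "DR", "reconstructed", "gap"]])
```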

Code
top_scorers = df_2015[df_2015['Score'] > df_2015['Score'].mean()]
top_scorers.sort_values(by='DR').head() # Sorting to see countries with lowest Dystopia Residual among top scorers
         Country                           Region  Rank  Score       SE      GDP   Family   Health  Freedom    Trust  Generosity       DR  Year
71     Hong Kong                     Eastern Asia    72  5.474  0.05051  1.38604  1.05818  1.01328  0.59608  0.37124     0.39478  0.65429  2015
27         Qatar  Middle East and Northern Africa    28  6.611  0.06257  1.69042  1.07860  0.79733  0.64040  0.52208     0.32573  1.55674  2015
72       Estonia       Central and Eastern Europe    73  5.429  0.04013  1.15174  1.22791  0.77361  0.44888  0.15184     0.08680  1.58782  2015
65  North Cyprus                   Western Europe    66  5.695  0.05635  1.20806  1.07008  0.92356  0.49027  0.14280     0.26169  1.59888  2015
54      Slovenia       Central and Eastern Europe    55  5.848  0.04251  1.18498  1.27385  0.87337  0.60855  0.03787     0.25328  1.61583  2015

Does a low residual mean these countries are less happy than their measured factors would predict?

Relative to other countries, yes: a lower Dystopia Residual means that less of the score is left unexplained by the six factors. The happiness of these above-average countries is mostly explained by measurable factors such as GDP and health. Latin American countries, by contrast, have high residuals, which suggests that other factors, not captured by these six variables, contribute to their happiness.
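One rough way to put "mostly explained" in numbers is to compute the share of each score that is not in `DR`, that is, 1 - DR / Score. For the five countries in the table above, this share runs from about 71% (Estonia) to 88% (Hong Kong). Because `DR` also contains the common Dystopia baseline, treat this as an approximate gauge rather than a model-based decomposition.

```python
import pandas as pd

# Above-average scorers with the lowest DR, copied from the sorted output above
low_dr = pd.DataFrame({
    "Country": ["Hong Kong", "Qatar", "Estonia", "North Cyprus", "Slovenia"],
    "Score": [5.474, 6.611, 5.429, 5.695, 5.848],
    "DR": [0.65429, 1.55674, 1.58782, 1.59888, 1.61583],
})

# Share of each score carried by the six measured factors (everything except DR)
low_dr["factor_share"] = 1 - low_dr["DR"] / low_dr["Score"]
print(low_dr.round(3))
```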

Next, I focus on identifying countries with high happiness scores despite low levels of trust and freedom.

Code
fig = px.scatter(df_2015, x='Freedom', y='Trust', size='Score', color='Region', hover_name='Country', trendline='ols', trendline_scope='overall')
fig.show()

This is interesting: the bubbles for Central and Eastern Europe and Sub-Saharan Africa overlap in freedom and trust. Within that overlap, bubble sizes (happiness scores) still vary, so the other variables presumably account for the difference. Rwanda, in particular, stands out among the Sub-Saharan African countries as a small bubble in the upper-right corner: high freedom and trust, yet a low happiness score. This raises questions about potential data quality or country-specific measurement effects.
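The bubble chart relies on eyeballing. An explicit filter can list the countries that score high despite below-median freedom and trust. The sketch below uses ten rows copied from earlier outputs in this post, with median cutoffs as an illustrative, assumed choice. On this small, top-heavy sample only Iceland qualifies; on the full `df_2015`, the thresholds and matches would differ.

```python
import pandas as pd

# Ten rows copied from earlier outputs: the top five plus the five low-DR countries
sample = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark", "Norway", "Canada",
                "Hong Kong", "Qatar", "Estonia", "North Cyprus", "Slovenia"],
    "Score":   [7.587, 7.561, 7.527, 7.522, 7.427, 5.474, 6.611, 5.429, 5.695, 5.848],
    "Freedom": [0.66557, 0.62877, 0.64938, 0.66973, 0.63297,
                0.59608, 0.64040, 0.44888, 0.49027, 0.60855],
    "Trust":   [0.41978, 0.14145, 0.48357, 0.36503, 0.32957,
                0.37124, 0.52208, 0.15184, 0.14280, 0.03787],
})

# Median cutoffs (an assumption; quantiles or region-relative cutoffs would also work)
score_cut = sample["Score"].median()
freedom_cut = sample["Freedom"].median()
trust_cut = sample["Trust"].median()

# High happiness despite below-median freedom and trust
mask = ((sample["Score"] > score_cut)
        & (sample["Freedom"] < freedom_cut)
        & (sample["Trust"] < trust_cut))
print(sample.loc[mask, "Country"].tolist())
```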