The Geography of Happiness: A Global Analysis

Python

EDA

Visualization

Exploring world happiness scores using Python and Plotly. Part 1

Author

Hakki

Published

January 30, 2026

The World Happiness Report surveys over 150 countries, measuring factors like GDP, health, social support, freedom, trust, and generosity. Using the 2015 data, I examine how these factors relate to happiness scores and look at both expected patterns and surprising deviations across regions.
Data Source: This analysis uses the World Happiness Report dataset, provided by the Sustainable Development Solutions Network and curated by Abigail Larion on Kaggle. Licensed under CC0.

Code

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "notebook"

Code

df_2015 = pd.read_csv("2015.csv")
df_2015.head()

	Country	Region	Happiness Rank	Happiness Score	Standard Error	Economy (GDP per Capita)	Family	Health (Life Expectancy)	Freedom	Trust (Government Corruption)	Generosity	Dystopia Residual
0	Switzerland	Western Europe	1	7.587	0.03411	1.39651	1.34951	0.94143	0.66557	0.41978	0.29678	2.51738
1	Iceland	Western Europe	2	7.561	0.04884	1.30232	1.40223	0.94784	0.62877	0.14145	0.43630	2.70201
2	Denmark	Western Europe	3	7.527	0.03328	1.32548	1.36058	0.87464	0.64938	0.48357	0.34139	2.49204
3	Norway	Western Europe	4	7.522	0.03880	1.45900	1.33095	0.88521	0.66973	0.36503	0.34699	2.46531
4	Canada	North America	5	7.427	0.03553	1.32629	1.32261	0.90563	0.63297	0.32957	0.45811	2.45176

I explored the dataset to better understand its structure and content.

Code

df_2015.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 158 entries, 0 to 157
Data columns (total 12 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Country                        158 non-null    object 
 1   Region                         158 non-null    object 
 2   Happiness Rank                 158 non-null    int64  
 3   Happiness Score                158 non-null    float64
 4   Standard Error                 158 non-null    float64
 5   Economy (GDP per Capita)       158 non-null    float64
 6   Family                         158 non-null    float64
 7   Health (Life Expectancy)       158 non-null    float64
 8   Freedom                        158 non-null    float64
 9   Trust (Government Corruption)  158 non-null    float64
 10  Generosity                     158 non-null    float64
 11  Dystopia Residual              158 non-null    float64
dtypes: float64(9), int64(1), object(2)
memory usage: 14.9+ KB

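I checked the dataset for missing values and verified the data types. All 158 rows are complete in every one of the 12 columns, so there are no null values to handle. The types are also as expected: two text columns (Country and Region), one integer column (Happiness Rank), and nine float columns.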
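The same check can be made explicit rather than read off the `info()` output. The snippet below is a minimal sketch: to stay self-contained it rebuilds three rows from the table above with a reduced set of columns, and `sample` is a hypothetical stand-in for `df_2015`.

```python
import pandas as pd

# Three rows copied from the head() output above (columns reduced for brevity).
# `sample` is a stand-in for df_2015, used only so the snippet runs on its own.
sample = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark"],
    "Region": ["Western Europe"] * 3,
    "Happiness Score": [7.587, 7.561, 7.527],
})

# Number of null values in each column
missing_per_column = sample.isna().sum()

# Number of country names that appear more than once
duplicate_countries = sample["Country"].duplicated().sum()

print(missing_per_column)
print("Duplicate countries:", duplicate_countries)
```

On the real data, replacing `sample` with `df_2015` should report zero missing values for every column, matching the non-null counts shown above.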

Code

# Rename columns for easier access
df_2015 = df_2015.rename(columns={
    "Happiness Rank": "Rank",
    "Happiness Score": "Score",
    "Standard Error": "SE",
    "Economy (GDP per Capita)": "GDP",
    "Health (Life Expectancy)": "Health",
    "Trust (Government Corruption)": "Trust",
    "Dystopia Residual": "DR",
})
df_2015["Year"] = 2015
df_2015.head(10)

	Country	Region	Rank	Score	SE	GDP	Family	Health	Freedom	Trust	Generosity	DR	Year
0	Switzerland	Western Europe	1	7.587	0.03411	1.39651	1.34951	0.94143	0.66557	0.41978	0.29678	2.51738	2015
1	Iceland	Western Europe	2	7.561	0.04884	1.30232	1.40223	0.94784	0.62877	0.14145	0.43630	2.70201	2015
2	Denmark	Western Europe	3	7.527	0.03328	1.32548	1.36058	0.87464	0.64938	0.48357	0.34139	2.49204	2015
3	Norway	Western Europe	4	7.522	0.03880	1.45900	1.33095	0.88521	0.66973	0.36503	0.34699	2.46531	2015
4	Canada	North America	5	7.427	0.03553	1.32629	1.32261	0.90563	0.63297	0.32957	0.45811	2.45176	2015
5	Finland	Western Europe	6	7.406	0.03140	1.29025	1.31826	0.88911	0.64169	0.41372	0.23351	2.61955	2015
6	Netherlands	Western Europe	7	7.378	0.02799	1.32944	1.28017	0.89284	0.61576	0.31814	0.47610	2.46570	2015
7	Sweden	Western Europe	8	7.364	0.03157	1.33171	1.28907	0.91087	0.65980	0.43844	0.36262	2.37119	2015
8	New Zealand	Australia and New Zealand	9	7.286	0.03371	1.25018	1.31967	0.90837	0.63938	0.42922	0.47501	2.26425	2015
9	Australia	Australia and New Zealand	10	7.284	0.04083	1.33358	1.30923	0.93156	0.65124	0.35637	0.43562	2.26646	2015

I shortened the column names for readability and added a Year column so that the 2015 data can later be compared with other years.

Code

print(df_2015.groupby('Region')['Score'].mean().sort_values(ascending=False)) # Average happiness score by region

Region
Australia and New Zealand          7.285000
North America                      7.273000
Western Europe                     6.689619
Latin America and Caribbean        6.144682
Eastern Asia                       5.626167
Middle East and Northern Africa    5.406900
Central and Eastern Europe         5.332931
Southeastern Asia                  5.317444
Southern Asia                      4.580857
Sub-Saharan Africa                 4.202800
Name: Score, dtype: float64

Although Western Europe dominates the Top 10 list with seven countries, North America and Australia/New Zealand have higher regional averages.
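The Top 10 count can be verified directly. The sketch below copies the ten countries and their regions from the `head(10)` output above into a small frame (`top10` is a hypothetical name used only here) and counts the regions.

```python
import pandas as pd

# Top 10 countries and their regions, copied from the head(10) output above.
top10 = pd.DataFrame({
    "Country": [
        "Switzerland", "Iceland", "Denmark", "Norway", "Canada",
        "Finland", "Netherlands", "Sweden", "New Zealand", "Australia",
    ],
    "Region": (
        ["Western Europe"] * 4
        + ["North America"]
        + ["Western Europe"] * 3
        + ["Australia and New Zealand"] * 2
    ),
})

# How many of the Top 10 countries fall in each region
top10_by_region = top10["Region"].value_counts()
print(top10_by_region)
```

On the full frame, something like `df_2015.nsmallest(10, "Rank")["Region"].value_counts()` should give the same counts.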

Code

region_summary = df_2015.groupby('Region')['Score'].agg(['count', 'mean', 'median']).sort_values(by='median')
print(region_summary)

                                 count      mean  median
Region                                                  
Sub-Saharan Africa                  40  4.202800   4.272
Southern Asia                        7  4.580857   4.565
Middle East and Northern Africa     20  5.406900   5.262
Central and Eastern Europe          29  5.332931   5.286
Southeastern Asia                    9  5.317444   5.360
Eastern Asia                         6  5.626167   5.729
Latin America and Caribbean         22  6.144682   6.149
Western Europe                      21  6.689619   6.937
North America                        2  7.273000   7.273
Australia and New Zealand            2  7.285000   7.285

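The regional summary shows that North America and Australia/New Zealand each include only two countries. Their high averages and medians therefore rest on very few observations and are not directly comparable with those of larger regions such as Sub-Saharan Africa (40 countries) or Western Europe (21).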
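One way to keep tiny regions from dominating the ranking is to compare only regions above a minimum size. The sketch below rebuilds the summary from the printed table; the cutoff `MIN_COUNTRIES = 5` is an arbitrary assumption for illustration, not a value from the report.

```python
import pandas as pd

# Regional summary copied from the printed table above.
region_summary = pd.DataFrame(
    {
        "count": [40, 7, 20, 29, 9, 6, 22, 21, 2, 2],
        "mean": [4.202800, 4.580857, 5.406900, 5.332931, 5.317444,
                 5.626167, 6.144682, 6.689619, 7.273000, 7.285000],
    },
    index=[
        "Sub-Saharan Africa", "Southern Asia",
        "Middle East and Northern Africa", "Central and Eastern Europe",
        "Southeastern Asia", "Eastern Asia", "Latin America and Caribbean",
        "Western Europe", "North America", "Australia and New Zealand",
    ],
)

MIN_COUNTRIES = 5  # hypothetical cutoff; choose to suit the analysis

# Keep only regions with enough countries, then rank by mean score
comparable = (
    region_summary[region_summary["count"] >= MIN_COUNTRIES]
    .sort_values("mean", ascending=False)
)
print(comparable)
```

With this cutoff, Western Europe leads the remaining eight regions, followed by Latin America and Caribbean.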

Next, I use a correlation matrix to explore how each variable relates to the happiness score.

Code

numeric_data_2015 = df_2015.select_dtypes(include=['number']).drop(columns='Year') # Exclude 'Year' for correlation analysis
corr_matrix = numeric_data_2015.corr()
corr_matrix

	Rank	Score	SE	GDP	Family	Health	Freedom	Trust	Generosity	DR
Rank	1.000000	-0.992105	0.158516	-0.785267	-0.733644	-0.735613	-0.556886	-0.372315	-0.160142	-0.521999
Score	-0.992105	1.000000	-0.177254	0.780966	0.740605	0.724200	0.568211	0.395199	0.180319	0.530474
SE	0.158516	-0.177254	1.000000	-0.217651	-0.120728	-0.310287	-0.129773	-0.178325	-0.088439	0.083981
GDP	-0.785267	0.780966	-0.217651	1.000000	0.645299	0.816478	0.370300	0.307885	-0.010465	0.040059
Family	-0.733644	0.740605	-0.120728	0.645299	1.000000	0.531104	0.441518	0.205605	0.087513	0.148117
Health	-0.735613	0.724200	-0.310287	0.816478	0.531104	1.000000	0.360477	0.248335	0.108335	0.018979
Freedom	-0.556886	0.568211	-0.129773	0.370300	0.441518	0.360477	1.000000	0.493524	0.373916	0.062783
Trust	-0.372315	0.395199	-0.178325	0.307885	0.205605	0.248335	0.493524	1.000000	0.276123	-0.033105
Generosity	-0.160142	0.180319	-0.088439	-0.010465	0.087513	0.108335	0.373916	0.276123	1.000000	-0.101301
DR	-0.521999	0.530474	0.083981	0.040059	0.148117	0.018979	0.062783	-0.033105	-0.101301	1.000000

Code

fig = px.imshow(corr_matrix, text_auto=True, width=800, height=800)
fig.show()

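The correlation analysis shows that GDP (r = 0.78), family support (0.74), and health (0.72) have the strongest positive relationships with happiness scores, while generosity (0.18) has the weakest. The near-perfect negative correlation between Rank and Score (-0.99) is expected, since the rank is derived from the score.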
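Reading a ranking off a heatmap is error-prone, so it can help to sort the correlations with Score directly. This sketch copies the Score row of the matrix above into a Series (`score_corr` is a hypothetical name) and orders it by absolute strength.

```python
import pandas as pd

# Correlations with Score, copied from the Score row of the matrix above.
# Rank is left out because it is derived directly from Score.
score_corr = pd.Series({
    "SE": -0.177254,
    "GDP": 0.780966,
    "Family": 0.740605,
    "Health": 0.724200,
    "Freedom": 0.568211,
    "Trust": 0.395199,
    "Generosity": 0.180319,
    "DR": 0.530474,
})

# Order by absolute value so strong negative correlations are not buried
order = score_corr.abs().sort_values(ascending=False).index
ranked = score_corr.reindex(order)
print(ranked)
```

On the full matrix, the starting point would be `corr_matrix["Score"].drop(["Score", "Rank"])`.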

Code

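# Pass the DataFrame and column names so axis, legend, and hover labels come from the data
fig = px.scatter(
    df_2015, x='GDP', y='Score', color='Region', hover_name='Country',
    labels={'Score': 'Happiness Score'},
    trendline='ols', trendline_scope='overall',  # one OLS line across all regions (requires statsmodels)
)
fig.show()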

This scatter plot illustrates the relationship between GDP and happiness scores, showing a clear positive association.
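The `trendline='ols'` option draws an ordinary least squares line. As a rough sketch of what that fit involves, the snippet below does the same kind of single-variable fit with NumPy on the ten rows shown in the `head(10)` table. Because those are only the ten highest-ranked countries, the resulting numbers are illustrative and say nothing about the full dataset.

```python
import numpy as np

# GDP and Score for the ten countries in the head(10) table above.
gdp = np.array([1.39651, 1.30232, 1.32548, 1.45900, 1.32629,
                1.29025, 1.32944, 1.33171, 1.25018, 1.33358])
score = np.array([7.587, 7.561, 7.527, 7.522, 7.427,
                  7.406, 7.378, 7.364, 7.286, 7.284])

# Fit score = slope * gdp + intercept by least squares
slope, intercept = np.polyfit(gdp, score, deg=1)
predicted = slope * gdp + intercept

# R squared: share of the variance in score explained by the line
ss_res = np.sum((score - predicted) ** 2)
ss_tot = np.sum((score - score.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# For a single predictor, R squared equals the squared Pearson correlation
r = np.corrcoef(gdp, score)[0, 1]
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}, r^2={r**2:.3f}")
```

The last identity also applies to the full data: the GDP and Score correlation of 0.781 in the matrix above implies an R² of roughly 0.61 for a single-variable linear fit.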

Code

fig = px.box(df_2015, y='Region', x='Score', color='Region', points='all', hover_data=['Country'])  # hover_data passed as a list of column names
fig.update_layout(
    yaxis={
        'showticklabels': False, # Hide the y-axis labels
    },
    showlegend=True
)
fig.show()

We cannot draw meaningful conclusions for North America and Australia/New Zealand because each region contains only two countries. Western Europe and MENA, however, show more interesting boxplot patterns. In the MENA region, there appear to be two distinct groups: countries such as Israel, the UAE, and Oman, and countries such as Syria, Yemen, and Egypt. The median (5.26) sits toward the lower end of the range, below the regional mean (5.41), which likely reflects the lower-scoring group that includes Yemen and Syria.
Similar differences can be observed in Western Europe, where the Nordic countries contrast with countries such as Greece and Portugal. Here the median (6.94) sits toward the higher end, above the regional mean (6.69).
When we consider this plot together with the previous one, we can see that Latin American countries have GDP levels similar to those of Central and Eastern Europe and the MENA region, yet their happiness scores are comparable to those of Western European countries. This suggests that factors other than GDP may be influencing happiness, so it is worth examining other correlations.

Code

fig = px.box(df_2015, y='Region', x='DR', color='Region', points='all', hover_data=['Country'])  # hover_data passed as a list of column names
fig.update_layout(
    yaxis={
        'showticklabels': False, # Hide the y-axis labels
    },
    showlegend=True
)
fig.show()

When I examined the other correlations, I observed that Latin American countries have higher scores in the Dystopia Residual compared to other regions. After researching this further, I found that this phenomenon is discussed in the literature as the “Latin America Happiness Paradox.”

More information on this concept can be found below.

https://www.happinessandwellbeing.org/rojas

https://www.mappmagazine.com/articles/the-well-being-paradox

Code

top_scorers = df_2015[df_2015['Score'] > df_2015['Score'].mean()]
top_scorers.sort_values(by='DR').head() # Sorting to see countries with lowest Dystopia Residual among top scorers

	Country	Region	Rank	Score	SE	GDP	Family	Health	Freedom	Trust	Generosity	DR	Year
71	Hong Kong	Eastern Asia	72	5.474	0.05051	1.38604	1.05818	1.01328	0.59608	0.37124	0.39478	0.65429	2015
27	Qatar	Middle East and Northern Africa	28	6.611	0.06257	1.69042	1.07860	0.79733	0.64040	0.52208	0.32573	1.55674	2015
72	Estonia	Central and Eastern Europe	73	5.429	0.04013	1.15174	1.22791	0.77361	0.44888	0.15184	0.08680	1.58782	2015
65	North Cyprus	Western Europe	66	5.695	0.05635	1.20806	1.07008	0.92356	0.49027	0.14280	0.26169	1.59888	2015
54	Slovenia	Central and Eastern Europe	55	5.848	0.04251	1.18498	1.27385	0.87337	0.60855	0.03787	0.25328	1.61583	2015

The table raises a question: are these countries less happy than their measured factors would predict?

For these countries, happiness is mostly explained by measurable factors like GDP and health, which leaves a low residual. Latin American countries, on the other hand, have high residual scores. This suggests that factors not captured by these six variables contribute to their happiness.
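In this dataset the six factor columns and the Dystopia Residual add up to the happiness score, which is what makes DR readable as the part of the score the six factors leave unaccounted for. The sketch below checks that on three rows copied from the tables above. The column names `Explained`, `Reconstructed`, and `Gap` are hypothetical helpers introduced for this illustration.

```python
import pandas as pd

# Three rows copied from the tables above (renamed columns).
rows = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Hong Kong"],
    "Score": [7.587, 7.561, 5.474],
    "GDP": [1.39651, 1.30232, 1.38604],
    "Family": [1.34951, 1.40223, 1.05818],
    "Health": [0.94143, 0.94784, 1.01328],
    "Freedom": [0.66557, 0.62877, 0.59608],
    "Trust": [0.41978, 0.14145, 0.37124],
    "Generosity": [0.29678, 0.43630, 0.39478],
    "DR": [2.51738, 2.70201, 0.65429],
})

factors = ["GDP", "Family", "Health", "Freedom", "Trust", "Generosity"]

# Part of the score attributed to the six measured factors
rows["Explained"] = rows[factors].sum(axis=1)

# Adding the Dystopia Residual should recover the published score
rows["Reconstructed"] = rows["Explained"] + rows["DR"]
rows["Gap"] = (rows["Score"] - rows["Reconstructed"]).abs()

print(rows[["Country", "Score", "Explained", "DR", "Gap"]])
```

For these rows the gap stays below 0.001, which is rounding noise. Hong Kong illustrates the low-residual case: its six factors sum to about 4.82 of a 5.474 score, leaving a DR of only 0.65.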

Next, I focus on identifying countries with high happiness scores despite low levels of trust and freedom.

Code

fig = px.scatter(df_2015, x='Freedom', y='Trust', size='Score', color='Region', hover_name='Country', trendline='ols', trendline_scope='overall')
fig.show()

This is interesting: the bubbles for Central and Eastern Europe and Sub-Saharan Africa appear intertwined. Bubble size encodes the happiness score, so countries with similar freedom and trust values can still differ in happiness because of other factors. Rwanda, in particular, stands out among the Sub-Saharan African countries as a small bubble in the upper-right corner: high freedom and trust, but a low happiness score. This raises questions about potential data quality or country-specific measurement effects.
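The goal of finding countries with high happiness scores despite low trust and freedom can also be written as a direct filter. The sketch below uses ten rows copied from the tables in this post and, as an assumption made purely for illustration, treats "high" and "low" as above or below the median of that small sample. On the full data the thresholds would need to be chosen deliberately.

```python
import pandas as pd

# Ten rows copied from the tables in this post (reduced columns).
sample = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark", "Norway", "Canada",
                "Hong Kong", "Qatar", "Estonia", "North Cyprus", "Slovenia"],
    "Score": [7.587, 7.561, 7.527, 7.522, 7.427,
              5.474, 6.611, 5.429, 5.695, 5.848],
    "Freedom": [0.66557, 0.62877, 0.64938, 0.66973, 0.63297,
                0.59608, 0.64040, 0.44888, 0.49027, 0.60855],
    "Trust": [0.41978, 0.14145, 0.48357, 0.36503, 0.32957,
              0.37124, 0.52208, 0.15184, 0.14280, 0.03787],
})

# Hypothetical thresholds: the medians of this ten-row sample
high_score = sample["Score"] > sample["Score"].median()
low_trust = sample["Trust"] < sample["Trust"].median()
low_freedom = sample["Freedom"] < sample["Freedom"].median()

# Countries that score high while sitting low on both trust and freedom
result = sample[high_score & low_trust & low_freedom]
print(result)
```

Within this sample only Iceland passes the filter, mainly because of its low trust value (0.14) next to a score of 7.561. Replacing `sample` with `df_2015` applies the same logic to all 158 countries.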