Panda's pivot table function

July 30, 2024

So I was working on a single dataframe with data from a large number of countries covering various years. In a simplified form it looked like this:

year	country	some column	another column
2018	Austria	some value	another value
2018	Belgium	etc	etc
2018	next country	etc	etc
2018	another country	etc	etc
2019	Austria	etc	etc

This dataset turned out to be too complex to easily understand the situation for individual countries per year. If only the data could be regrouped in a meaningful way! The way to do this, is by using a pivot table in Pandas.

What does the pivot table do? It regroups the data and can perform an aggregation of the values of the selected column and presents it as a table that summarises the data in a meaningful way. Simplified, it could be something like this:

year	Austria	Belgium	next country	another country
2018	7	8	10	3
2019	12	18	20	6
2020	13	21	22	9
2021	15	28	23	12
2022	18	28	31	15

*the values are just examples

In Pandas the pivot table function works as follows:
- specify the dataframe and the names for the indexes, columns and values.
- include the type of calculation to be performed, such as count, sum, etc.

For example:

EU_total_missions_year = EU_df.pivot_table(index='year', columns='cname_send', values='ccode_send', aggfunc='count')

Here the dataframe is EU_df, the index is set to year, the countries (cname_send) become the columns and the count function aggregates the number of occurrences of the column ccode_send.

For some more examples, see my notebook here