Bas Ernst

This is an image of the blog post Panda's pivot table function

Panda's pivot table function

July 30, 2024

So I was working on a single dataframe with data from a large number of countries covering various years. In a simplified form it looked like this:

year country some column another column
2018 Austria some value another value
2018 Belgium etc etc
2018 next country etc etc
2018 another country etc etc
2019 Austria etc etc

This dataset turned out to be too complex to easily understand the situation for individual countries per year. If only the data could be regrouped in a meaningful way! The way to do this, is by using a pivot table in Pandas.

What does the pivot table do? It regroups the data and can perform an aggregation of the values of the selected column and presents it as a table that summarises the data in a meaningful way. Simplified, it could be something like this:

year Austria Belgium next country another country
2018 7 8 10 3
2019 12 18 20 6
2020 13 21 22 9
2021 15 28 23 12
2022 18 28 31 15

*the values are just examples

In Pandas the pivot table function works as follows:
- specify the dataframe and the names for the indexes, columns and values.
- include the type of calculation to be performed, such as count, sum, etc.

For example:

EU_total_missions_year = EU_df.pivot_table(index='year', columns='cname_send', values='ccode_send', aggfunc='count')

Here the dataframe is EU_df, the index is set to year, the countries (cname_send) become the columns and the count function aggregates the number of occurrences of the column ccode_send.

For some more examples, see my notebook here