Random Python tips #2: sorting lists into buckets with defaultdict

defaultdict is your friend

You’ve probably already found yourself in the following situation: you have a list of objects and you need to sort those objects into different buckets based on some criterion. Example:

countries_list = [
 "Albania",
 "Australia",
 "Belgium"
]

We would like to sort those strings by their first letter into a dict, i.e. to obtain this result:

sorted_countries = {
  "A": [
    "Albania",
    "Australia"
  ],
  "B": [
    "Belgium"
  ]
}

The most obvious way to do it would be:

sorted_countries = {}
for country in countries_list:
  c = country[0]  # bucket key: the first letter
  # Create the bucket the first time this letter is seen
  if c not in sorted_countries:
    sorted_countries[c] = []
  sorted_countries[c].append(country)

This works, but it is verbose and hard to read, especially if you are already a few indentation levels deep. Luckily, Python's collections module provides defaultdict, a dict that creates a default value the first time a missing key is accessed, which lets us be more concise:

from collections import defaultdict

sorted_countries = defaultdict(list)
for country in countries_list:
  c = country[0]
  sorted_countries[c].append(country)
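
To check that this gives the result we wanted, here is a minimal, self-contained sketch using the sample list from above. The names result, before and after are only there for illustration. It also shows one behaviour worth knowing: merely reading a missing key on a defaultdict inserts it.

```python
from collections import defaultdict

countries_list = ["Albania", "Australia", "Belgium"]

sorted_countries = defaultdict(list)
for country in countries_list:
  sorted_countries[country[0]].append(country)

# Convert to a plain dict, e.g. for printing or returning to a caller
result = dict(sorted_countries)
print(result)
# {'A': ['Albania', 'Australia'], 'B': ['Belgium']}

# Caveat: a plain lookup of a missing key creates an empty bucket
before = "Z" in sorted_countries  # False
sorted_countries["Z"]
after = "Z" in sorted_countries   # True
```

If that side effect is unwanted, converting to a plain dict once the buckets are filled (as done with result above) restores the usual KeyError on missing keys.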

Going deeper

What if you need to sort on multiple levels, e.g. by first letter first and by word length second?

sorted_countries = {
  "A": {
    7: [
      "Albania"
    ],
    9: [
      "Australia"
    ]
  },
  "B": {
    7: [
      "Belgium"
    ]
  }
}

Well, with defaultdict that’s easy: just provide a defaultdict as the default.

sorted_countries = defaultdict(lambda: defaultdict(list))
for country in countries_list:
  c = country[0]
  length = len(country)  # second-level key: the word length
  sorted_countries[c][length].append(country)

We have to use a lambda here because the constructor expects a factory: a callable that takes no arguments and returns the default object to insert.
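
As a small sketch of what "factory" means here: any zero-argument callable works, so a named function can stand in for the lambda. The name make_inner is hypothetical, chosen just for this example. Passing an already-built defaultdict instead of a callable is rejected with a TypeError.

```python
from collections import defaultdict

def make_inner():
  # Hypothetical helper: returns a fresh inner dict of lists on each call
  return defaultdict(list)

sorted_countries = defaultdict(make_inner)
sorted_countries["A"][7].append("Albania")

# An instance is not a factory: the constructor refuses it
error = None
try:
  defaultdict(defaultdict(list))
except TypeError as exc:
  error = exc
```

The lambda and the named function behave the same way; which one reads better is mostly a matter of taste.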

Needless to say, this operation would have been much more verbose with the original method.
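
For comparison, here is roughly what the two-level version could look like with a plain dict. This is only a sketch of the "original method" extended by one level, not code from a real project:

```python
countries_list = ["Albania", "Australia", "Belgium"]

sorted_countries = {}
for country in countries_list:
  c = country[0]
  length = len(country)
  # Each level needs its own existence check
  if c not in sorted_countries:
    sorted_countries[c] = {}
  if length not in sorted_countries[c]:
    sorted_countries[c][length] = []
  sorted_countries[c][length].append(country)

print(sorted_countries)
# {'A': {7: ['Albania'], 9: ['Australia']}, 'B': {7: ['Belgium']}}
```

Every extra level adds another check, whereas the defaultdict version only needs another level of nesting in the factory.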

This post is licensed under CC BY 4.0 by the author.