defaultdict is your friend
You’ve probably already found yourself in the following situation: you have a list of objects and you need to sort those objects into different buckets based on some criterion. Example:
countries_list = [
    "Albania",
    "Australia",
    "Belgium",
]
We would like to sort those strings into a dict keyed by their first letter, i.e. obtain this result:
sorted_countries = {
    "A": [
        "Albania",
        "Australia"
    ],
    "B": [
        "Belgium"
    ]
}
The most obvious way to do it would be:
sorted_countries = {}
for country in countries_list:
    c = country[0]
    # Create the bucket the first time we meet this letter
    if c not in sorted_countries:
        sorted_countries[c] = []
    sorted_countries[c].append(country)
This works, but it is verbose and hard to read, especially if you are already a few indentation levels deep. Luckily, Python's collections module provides defaultdict, which lets us be more concise:
from collections import defaultdict

# Missing keys are created on first access with an empty list
sorted_countries = defaultdict(list)
for country in countries_list:
    c = country[0]
    sorted_countries[c].append(country)
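One thing worth knowing about the snippet above: a defaultdict creates the entry as soon as a missing key is looked up with square brackets, not only when you append to it. The following is a small self-contained sketch of that behaviour; the keys "Z" and "Q" and the name plain are made up for illustration.

```python
from collections import defaultdict

countries_list = ["Albania", "Australia", "Belgium"]

sorted_countries = defaultdict(list)
for country in countries_list:
    sorted_countries[country[0]].append(country)

# Reading a missing key with [] calls list() and stores the result.
missing = sorted_countries["Z"]
print(missing)                    # []
print("Z" in sorted_countries)    # True: the lookup inserted the key

# .get() and `in` do not call the factory, so nothing is inserted.
print(sorted_countries.get("Q"))  # None
print("Q" in sorted_countries)    # False

# Copy into a regular dict once grouping is done, so that later
# lookups of missing keys raise KeyError instead of adding entries.
plain = dict(sorted_countries)
```

If the silent insertion is unwanted, converting to a plain dict at the end, as in the last line, is one simple way to turn it off.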
Going deeper
What if you need to sort on multiple levels, e.g. by first letter first and by word length second?
sorted_countries = {
    "A": {
        7: [
            "Albania"
        ],
        9: [
            "Australia"
        ]
    },
    "B": {
        7: [
            "Belgium"
        ]
    }
}
Well, with defaultdict that’s easy. Just provide a defaultdict as the default:
sorted_countries = defaultdict(lambda: defaultdict(list))
for country in countries_list:
    c = country[0]
    length = len(country)
    sorted_countries[c][length].append(country)
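We have to use a lambda here because the constructor expects a factory: a callable that takes no arguments and returns the default object to insert. list is already such a callable, but defaultdict(list) is an instance rather than a callable, so we wrap it in a lambda.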
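To make the factory requirement concrete, here is a minimal sketch that swaps the lambda for a named function and shows what happens when an instance is passed instead of a callable. The helper name letter_bucket is hypothetical, chosen only for this example.

```python
from collections import defaultdict


def letter_bucket():
    """Hypothetical helper: return a fresh inner defaultdict on every call."""
    return defaultdict(list)


# A named function is a valid factory, just like the lambda.
nested = defaultdict(letter_bucket)
for country in ["Albania", "Australia", "Belgium"]:
    nested[country[0]][len(country)].append(country)

print(nested["A"][9])  # ['Australia']

# Passing an instance instead of a callable is rejected by the constructor.
try:
    defaultdict(defaultdict(list))
    rejected = False
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
```

A named function does the same job as the lambda. It can be the more convenient choice when the nested structure needs to be pickled, since a lambda factory cannot be pickled.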
Needless to say, this operation would have been much more verbose with the original method.
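For comparison, this is roughly what the two-level version looks like with plain dicts and explicit membership checks. It is a sketch of the "original method" extended by one level, not code from the article.

```python
countries_list = ["Albania", "Australia", "Belgium"]

sorted_countries = {}
for country in countries_list:
    c = country[0]
    length = len(country)
    # First level: one dict per first letter
    if c not in sorted_countries:
        sorted_countries[c] = {}
    # Second level: one list per word length
    if length not in sorted_countries[c]:
        sorted_countries[c][length] = []
    sorted_countries[c][length].append(country)

print(sorted_countries)
```

Every extra level adds another check-and-create step, whereas the defaultdict version only changes the factory.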