How to delude your fellow developers and have Python dictionaries with non-unique keys!
Understand how Python objects behave as hashed dictionary keys.
You can think of Python dictionaries as key/value pairs, where keys are required to unique.
If keys have to be unique, how could I create the following dictionary?
{'Apple': 1, 'Apple': 2, 'Apple': 3}
This is the topic of this post, but first let’s understand how dictionaries and their keys work.
Keys have to be hashable
Dictionaries are meant to speed up the lookup of a value based on its key. That’s why they use hashes, and consequently keys have to be hashable.
Let’s create our own class, that implements a __hash__()
method. Thus, it is hashable, and we can use its objects as dictionary keys.
class Fruit:
def __hash__(self):
return 1
d = {
Fruit(): 1,
Fruit(): 2
}
# Printing d gives us the following
# {<__main__.Fruit at 0x10873eb10>: 1, <__main__.Fruit at 0x10873d9d0>: 2}
The __hash__()
method returns a constant value, 1. I.e. both instances of Fruit have the same hash. Nevertheless, the end up being two separate keys. Why?
You probably know the following about hash functions:
Same values have the same output, but having the same output doesn’t guarantee that the inputs are the same.
The case where different values have the same hash is known as collision. Typically, a good hash function should minimize collisions. Nevertheless, collisions are still possible anyway.
That’s why Python dictionaries don’t stop at checking the hash of the objects used as keys. They probably check if the objects are equal as well. And in case of collisions, two unequal objects with equal hashes will be seen as separate keys.
Ideas for what to try next?
Keys have to unique
Let’s override the __eq__()
method as well, and make sure our fruits are equal.
class Fruit:
def __hash__(self):
return 1
def __eq__(self, other):
return True
d = {
Fruit(): 1,
Fruit(): 2
}
# This time, printing d gives us the following
# {<__main__.Fruit at 0x10873ed50>: 2}
Tada! We finally have equal hashes and equal keys, and both keys are considered the same now. I.e. the second one overrides the first.
Of course, we can decide programmatically whether we want things to be equal or not.
class Fruit:
def __init__(self, name):
self.name = name
def __hash__(self):
return hash(self.name)
def __eq__(self, other):
return self.name == other.name
d = {
Fruit("Apple"): 1,
Fruit("Orange"): 2,
Fruit("Apple"): 3
}
# This time, printing d gives us the following
# {<__main__.Fruit at 0x10873cb90>: 3, <__main__.Fruit at 0x108247ad0>: 2}
Dressing the keys up
But keys look ugly, we don’t like this <__main__.Fruit .. blah blah
thing. Can we change that?
Of course, via the__repr__()
method.
class Fruit:
def __init__(self, name):
self.name = name
def __hash__(self):
return hash(self.name)
def __eq__(self, other):
return self.name == other.name
def __repr__(self):
return f"'{self.name}'"
d = {
Fruit("Apple"): 1,
Fruit("Orange"): 2,
Fruit("Apple"): 3
}
# And now, printing d gives us the following
# {'Apple': 3, 'Orange': 2}
Finally, we are about to decipher the riddle I had at the top.
All we need is a class, where objects have different hashes, yet their representations are the same. Something like this:
class Fruit:
def __init__(self, name):
self.name = name
def __hash__(self):
return hash(self.name)
def __eq__(self, other):
return self.name == other.name
def __repr__(self):
return "'Apple'"
d = {
Fruit("Apple"): 1,
Fruit("Orange"): 2,
Fruit("Banana"): 3
}
# And now, printing d gives us the following
# {'Apple': 1, 'Apple': 2, 'Apple': 3}
Actually, what I’ve just done now is effing evil.
One would think that the dictionary d
has string values as keys, given how it looks like.
And they will get an error if they try something like this:
d['Apple']
Add to that that though the 3 keys look the same, they aren’t the same.
That’s why this post is for educational purposes only.
By the way, if a class doesn’t implement __hash__()
and __eq__()
methods, Python provides a default implementation, where objects aren’t equal.
“User-defined classes have
__eq__()
and__hash__()
methods by default; with them, all objects compare unequal (except with themselves) andx.__hash__()
returns an appropriate value such thatx == y
implies both thatx is y
andhash(x) == hash(y)”
. [Python docs]
And there is a reason why Python want’s dictionary keys to be immutable. Or else one can do the following:
class Fruit:
def __init__(self, name):
self.name = name
def __hash__(self):
return hash(self.name)
def __eq__(self, other):
return self.name == other.name
def __repr__(self):
return self.name
apple = Fruit("Apple")
orange = Fruit("Orange")
d = {
apple: 1,
orange: 2,
}
Now if you mutate apple
you will run into problems.
apple.name = "banana"
d[apple]
# This will complain raising a KeyError
# KeyError: banana
As I said, this post is for educational purposes only.
I just like to see how things work when you bend them. I guess, you also find it fun to bend and break things to see how they actually work.
And before you go, here is one more post about hacking things