How to delude your fellow developers and have Python dictionaries with non-unique keys!

Understand how Python objects behave as hashed dictionary keys.

Tarek Amr

--

Photo by Cody Nottingham on Unsplash

You can think of Python dictionaries as key/value pairs, where keys are required to unique.

If keys have to be unique, how could I create the following dictionary?

{'Apple': 1, 'Apple': 2, 'Apple': 3}

This is the topic of this post, but first let’s understand how dictionaries and their keys work.

Keys have to be hashable

Dictionaries are meant to speed up the lookup of a value based on its key. That’s why they use hashes, and consequently keys have to be hashable.

Let’s create our own class, that implements a __hash__() method. Thus, it is hashable, and we can use its objects as dictionary keys.

class Fruit:

def __hash__(self):
return 1

d = {
Fruit(): 1,
Fruit(): 2
}

# Printing d gives us the following
# {<__main__.Fruit at 0x10873eb10>: 1, <__main__.Fruit at 0x10873d9d0>: 2}

The __hash__() method returns a constant value, 1. I.e. both instances of Fruit have the same hash. Nevertheless, the end up being two separate keys. Why?

You probably know the following about hash functions:

Same values have the same output, but having the same output doesn’t guarantee that the inputs are the same.

The case where different values have the same hash is known as collision. Typically, a good hash function should minimize collisions. Nevertheless, collisions are still possible anyway.

That’s why Python dictionaries don’t stop at checking the hash of the objects used as keys. They probably check if the objects are equal as well. And in case of collisions, two unequal objects with equal hashes will be seen as separate keys.

Ideas for what to try next?

Keys have to unique

Let’s override the __eq__() method as well, and make sure our fruits are equal.

class Fruit:

def __hash__(self):
return 1

def __eq__(self, other):
return True

d = {
Fruit(): 1,
Fruit(): 2
}

# This time, printing d gives us the following
# {<__main__.Fruit at 0x10873ed50>: 2}

Tada! We finally have equal hashes and equal keys, and both keys are considered the same now. I.e. the second one overrides the first.

Of course, we can decide programmatically whether we want things to be equal or not.

class Fruit:

def __init__(self, name):
self.name = name

def __hash__(self):
return hash(self.name)

def __eq__(self, other):
return self.name == other.name

d = {
Fruit("Apple"): 1,
Fruit("Orange"): 2,
Fruit("Apple"): 3
}

# This time, printing d gives us the following
# {<__main__.Fruit at 0x10873cb90>: 3, <__main__.Fruit at 0x108247ad0>: 2}

Dressing the keys up

But keys look ugly, we don’t like this <__main__.Fruit .. blah blah thing. Can we change that?

Of course, via the__repr__() method.

class Fruit:

def __init__(self, name):
self.name = name

def __hash__(self):
return hash(self.name)

def __eq__(self, other):
return self.name == other.name

def __repr__(self):
return f"'{self.name}'"

d = {
Fruit("Apple"): 1,
Fruit("Orange"): 2,
Fruit("Apple"): 3
}

# And now, printing d gives us the following
# {'Apple': 3, 'Orange': 2}

Finally, we are about to decipher the riddle I had at the top.

All we need is a class, where objects have different hashes, yet their representations are the same. Something like this:

class Fruit:

def __init__(self, name):
self.name = name

def __hash__(self):
return hash(self.name)

def __eq__(self, other):
return self.name == other.name

def __repr__(self):
return "'Apple'"

d = {
Fruit("Apple"): 1,
Fruit("Orange"): 2,
Fruit("Banana"): 3
}

# And now, printing d gives us the following
# {'Apple': 1, 'Apple': 2, 'Apple': 3}

Actually, what I’ve just done now is effing evil.

One would think that the dictionary d has string values as keys, given how it looks like.

And they will get an error if they try something like this:

d['Apple']

Add to that that though the 3 keys look the same, they aren’t the same.

That’s why this post is for educational purposes only.

By the way, if a class doesn’t implement __hash__() and __eq__()methods, Python provides a default implementation, where objects aren’t equal.

“User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y)”. [Python docs]

And there is a reason why Python want’s dictionary keys to be immutable. Or else one can do the following:

class Fruit:

def __init__(self, name):
self.name = name

def __hash__(self):
return hash(self.name)

def __eq__(self, other):
return self.name == other.name

def __repr__(self):
return self.name

apple = Fruit("Apple")
orange = Fruit("Orange")

d = {
apple: 1,
orange: 2,
}

Now if you mutate apple you will run into problems.

apple.name = "banana"
d[apple]

# This will complain raising a KeyError
# KeyError: banana

As I said, this post is for educational purposes only.

I just like to see how things work when you bend them. I guess, you also find it fun to bend and break things to see how they actually work.

--

--

Tarek Amr

I write about what machines can learn from data, what humans can learn from machines, and what businesses can learn from all three.