Hash Code – Problems

Hash code is the crucial thing in hash-based algorithms like those used in hash maps and all problems come from that simple fact. Its efficiency is as important as the efficiency of the hashing algorithm itself. Let’s talk about those problems and how to solve them.

Why hash code can cause a problem?

The main problems are:

  1. Implementation determines the collision probability. This is an important factor and people tend to forget or underestimate it.
  2. hashCode() should be implemented together with equals(Object) method as they are used jointly. The perfect implementation of those methods should correspond to each other e.g. use the same fields/methods and the same precedence.
  3. You can easily forget to modify those two methods when you add a new field or modify it partially. This can sometimes break the correctness of the program. Of course, beside the case when you don’t want to use the new fields for objects comparison and hashing.

Simple hash collision examples

Here is my list of simple hash collisions (based on OpenJDK 11):

1. Hash code of null is 0 so it collides with integer and float 0:

assert 0 == Objects.hashCode(null);
assert 0 == Integer.hashCode(0);
assert 0 == Float.hashCode(0.0f);

Your object can return hash 0 and then it collides with null too.

2. It also happens in plain arrays and lists:

Integer[] ints = {null, 7};
Integer[] ints2 = {0, 7};
assert Arrays.hashCode(ints) == Arrays.hashCode(ints2);

3. Long uses xor of high-order bits and low-order bits as hash code which has interesting effect:

assert 0 == Long.hashCode(7L<<32 | 7);
long x = (long) anyInt();
assert 0 == Long.hashCode(x<<32 | x);
long y = (long) anyInt();
assert Long.hashCode(x<<32 | y) == Long.hashCode(y<<32 | x);

4. Array hash codes can collide easily when there is an element with negative hash:

Integer[] ints = {-31, 7};
Integer[] ints2 = {-31, 0, 0, 0, 7};
assert Arrays.hashCode(ints) == Arrays.hashCode(ints2);

int x = anyInt();
Integer[] ints3 = {-31, x, 0, -31*31*x, 7};
assert Arrays.hashCode(ints) == Arrays.hashCode(ints3);

If you don’t get why -31 “hacks” hash codes checkout previous article about Java collections.

5. Empty HashMap and HashSet have hashes equal to 0.

HashSet<Integer> set = new HashSet<>();
assert 0 == set.hashCode();
set.add(0);
assert 0 == set.hashCode();

6. Moreover HashMap uses xor key hash code and value hash code hence:

HashMap<Integer, Integer> map = new HashMap<>();
assert 0 == map.hashCode();
map.put(0, 0);
assert 0 == map.hashCode();
map.put(1, 1);
assert 0 == map.hashCode();
int x = anyInt();
map.put(x, x);
assert 0 == map.hashCode();

And even more tricky:

HashMap<Integer, Integer> map = new HashMap<>();
map.put(1, -2);
map.put(-2, 1);

assert 0 == set.hashCode();

xor is fun. Isn’t it? 🙂 check out my article Unique Integer for more fun with low-level algorithms optimization.

All of those examples are only the basic ones and can be easily combined together to generate collision.

How to do not forget to modify hashCode() and equals(Object) methods?

That’s actually a tricky question. From the one side, we could just add a rule to our code analyzer like SonarQube to detect that. From the other side sometimes we intentionally want to omit a new field in the equals method. Looks like the best advice here is to be aware of it and have good code reviewers in our teams.

One thing that could help (and is one of my best tools in Java) is Lombok. You could read how Lombok saved my ass in one of the previous articles. The true power of @Data and @Value annotations comes from generating equals and hashCode (besides hiding plain Java boilerplate). You can also use Kotlin’s data classes or Java’s records (preview feature in JDK 14).

In the next episode

It is 3rd episode but definitely not the last one. We barely touch the tip of the iceberg in the hashing world but it is an important introduction to understand the pros, cons and complexity of the hashing algorithm.

Leave a Reply