Welcome to the first article of the Hash Code miniserie where you can read how the hash codes (non-cryptographic hashing algorithms) and hash collections work in different programming languages. Every software engineer uses hash collections like Python’s dictionary, Java’s hash map or C++’s unordered map. You get them to know pretty early in your learning path as they are key data structures to solve many problems effectively but are you sure that you know them well? In this miniserie, I will shed some light on this subject.

## What is the hash code

As mention above a hash code is the result of a non-cryptographic hashing algorithm so it cannot be used to provide security. It maps a value (let it be object or primitive value) to the fixed-length binary value in a deterministic way. In Java, it is a primitive 32-bit integer provided by every object via `hashCode()`

method.

## How does the hash codes work with Java primitives?

Java does not treat primitives the same way as objects. You cannot call `hashCode`

method directly on the value but you can get it via wrapper class e.g. `Integer.hashCode(fooInt)`

. For `int`

it is just the value of this integer. Types shorter than 4 bytes are casted to `int`

so there is no precision lost. Long is a 64-bit number so it has to somehow “compress” itself:

```
public static int hashCode(long value) {
return (int)(value ^ (value >>> 32));
}
```

*Note that values that could be represented by 32-bit integer have the same hashcodes as plain integers*.

Floating points numbers are a little more tricky. Java uses `floatToIntBits`

(it makes sense since floats are 4 bytes) but there special cases for `NaN`

value:

```
public static int hashCode(float value) {
return floatToIntBits(value);
}
public static int floatToIntBits(float value) {
if (!isNaN(value)) {
return floatToRawIntBits(value);
}
return 0x7fc00000;
}
public static boolean isNaN(float v) {
return (v != v);
}
public static native int floatToRawIntBits(float var0);
```

*Note how isNaN is implemented.*

Quite interesting is magic number used for `NaN`

. The quick experiment shows us that `NaN`

is greater than positive infinity and `Float.MAX_VALUE`

. Table of selected `float`

values:

Code | Hex value | Binary value | Decimal value |
---|---|---|---|

Float.floatToRawIntBits(POSITIVE_INFINITY) | 0x7f800000 | 11111111-00000000-00000000-00000000 | 2139095040 |

Float.floatToRawIntBits(Float.MAX_VALUE) | 0x7f7fffff | 11111110-11111111-11111111-11111111 | 2139095039 |

Float.floatToRawIntBits(Float.NaN) | 0x7fc00000 | 11111111-10000000-00000000-00000000 | 2143289344 |

Float.hashcode(Float.NaN) | 0x7fc00000 | 11111111-10000000-00000000-00000000 | 2143289344 |

Integer.MAX_VALUE | 0x7fffffff | 11111111-11111111-11111111-11111111 | 2147483647 |

*Note that bit representations of float’s Nan, MAX_VALUE, POSITIVE_INFINITY differ significantly and do not reach a maximum 32-bit integer value. All as per IEEE 754 standard.*

The distinguished number is used for `NaN`

to hash it always to the same value, no matter how we get it. Remember that various unpermitted operations can produce `NaN`

but with different bit representation.

`double`

uses the same algorithm as `float`

but stores bits in `long`

and then does the same XOR trick as `long`

.

```
public static int hashCode(boolean value) {
return value ? 1231 : 1237;
}
```

Why exactly those two magic numbers has been chosen? They are just two significantly large prime numbers. You will understand why it is important after reading the next article 🙂

## What about objects?

The default `hashCode()`

implementation is inherited from `Object`

class and it is a native method. A lot of people used to say that this value is related to the address of the object in the memory. The truth is in documentation:

The hashCode may or may not be implemented as some function of an object’s memory address at some point in time.

If you override it in your class hierarchy and for some reason need to get hash code value from the original object implementation then you can use `System.identityHashCode(Object)`

.

It is up to the programmer how he implements hash code in own classes. The default implementation is inherited from `Object`

and should be overwritten. I will explain the reason later.

## What about enums?

`enum`

is a special class in Java. It does not provide any new hash code implementation – it just uses `Object`

one. This is sufficient and there is no need to change that as enums are singletons and this is forced by Java itself.

## It’s not the end

Stay tuned for the next article when I will write how hash codes are generated for Java collections

*Note that all code examples come from OpenJDK 11.*

There is a small typo in code for hashCode implementation for Long, brackets are missing – it should be: `return (int)(value ^ (value >>> 32));`

Good eye! JDK11 source code uses these brackets: https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/java/lang/Long.java#L1414

Code without brackets `return (int)(value ^ value >>> 32);` still works correctly as bit shift has higher precedence than XOR.

Generally, you are right as good practices say that we should express precedence with brackets for simplicity.