Hash Code – Introduction

Welcome to the first article of the Hash Code miniserie where you can read how the hash codes (non-cryptographic hashing algorithms) and hash collections work in different programming languages. Every software engineer uses hash collections like Python’s dictionary, Java’s hash map or C++’s unordered map. You get them to know pretty early in your learning path as they are key data structures to solve many problems effectively but are you sure that you know them well? In this miniserie, I will shed some light on this subject.

What is the hash code

As mention above a hash code is the result of a non-cryptographic hashing algorithm so it cannot be used to provide security. It maps a value (let it be object or primitive value) to the fixed-length binary value in a deterministic way. In Java, it is a primitive 32-bit integer provided by every object via hashCode() method.

How does the hash codes work with Java primitives?

Java does not treat primitives the same way as objects. You cannot call hashCode method directly on the value but you can get it via wrapper class e.g. Integer.hashCode(fooInt). For int it is just the value of this integer. Types shorter than 4 bytes are casted to int so there is no precision lost. Long is a 64-bit number so it has to somehow “compress” itself:

public static int hashCode(long value) {
    return (int)(value ^ (value >>> 32));
}

Note that values that could be represented by 32-bit integer have the same hashcodes as plain integers.

Floating points numbers are a little more tricky. Java uses floatToIntBits (it makes sense since floats are 4 bytes) but there special cases for NaN value:

public static int hashCode(float value) {
    return floatToIntBits(value);
}

public static int floatToIntBits(float value) {
    if (!isNaN(value)) {
        return floatToRawIntBits(value);
    }
    return 0x7fc00000;
}

public static boolean isNaN(float v) {
    return (v != v);
}

public static native int floatToRawIntBits(float var0);

Note how isNaN is implemented.

Quite interesting is magic number used for NaN. The quick experiment shows us that NaN is greater than positive infinity and Float.MAX_VALUE. Table of selected float values:

CodeHex valueBinary valueDecimal value
Float.floatToRawIntBits(POSITIVE_INFINITY)0x7f80000011111111-00000000-00000000-000000002139095040
Float.floatToRawIntBits(Float.MAX_VALUE)0x7f7fffff11111110-11111111-11111111-111111112139095039
Float.floatToRawIntBits(Float.NaN)0x7fc0000011111111-10000000-00000000-000000002143289344
Float.hashcode(Float.NaN)0x7fc0000011111111-10000000-00000000-000000002143289344
Integer.MAX_VALUE0x7fffffff11111111-11111111-11111111-111111112147483647

Note that bit representations of float’s Nan, MAX_VALUE, POSITIVE_INFINITY differ significantly and do not reach a maximum 32-bit integer value. All as per IEEE 754 standard.

The distinguished number is used for NaN to hash it always to the same value, no matter how we get it. Remember that various unpermitted operations can produce NaN but with different bit representation.

double uses the same algorithm as float but stores bits in long and then does the same XOR trick as long.

public static int hashCode(boolean value) {
    return value ? 1231 : 1237;
}

Why exactly those two magic numbers has been chosen? They are just two significantly large prime numbers. You will understand why it is important after reading the next article 🙂

What about objects?

The default hashCode() implementation is inherited from Object class and it is a native method. A lot of people used to say that this value is related to the address of the object in the memory. The truth is in documentation:

The hashCode may or may not be implemented as some function of an object’s memory address at some point in time.

If you override it in your class hierarchy and for some reason need to get hash code value from the original object implementation then you can use System.identityHashCode(Object).

It is up to the programmer how he implements hash code in own classes. The default implementation is inherited from Object and should be overwritten. I will explain the reason later.

What about enums?

enum is a special class in Java. It does not provide any new hash code implementation – it just uses Object one. This is sufficient and there is no need to change that as enums are singletons and this is forced by Java itself.

It’s not the end

Stay tuned for the next article when I will write how hash codes are generated for Java collections

Note that all code examples come from OpenJDK 11.

2 thoughts on “Hash Code – Introduction”

  1. There is a small typo in code for hashCode implementation for Long, brackets are missing – it should be: `return (int)(value ^ (value >>> 32));`

Leave a Reply