How to Implement Java's hashCode Correctly

At SitePoint we’re always looking to expand the range of topics we cover. Lately, we’ve set our sights on exploring the world of Java. If you’re a strong Java developer who wants to contribute to our coverage, get in touch with a few ideas for articles you’d like to write.

So you’ve decided that identity isn’t enough for you and wrote a nice equals implementation?
Great! But now you have to implement hashCode as well.

Let’s see why and how to do it correctly.

Key Takeaways

Equal objects should have the same hash code in Java, hence if equals is overridden, a matching hashCode implementation must be created to ensure consistency and accuracy in storing and retrieving objects in hash-based collections.
When implementing hashCode, the same fields that are used in equals should be utilized. Mutable fields and collections should be better avoided due to potential performance issues.
Hash codes are about performance optimization, so it’s important not to invest too much energy in hashing unless profiling indicates a need for improvement.
Hash collisions, where two different objects have the same hash code, can be reduced by improving the hashing algorithm and using a larger prime number as a multiplier. This helps distribute the hash codes more evenly across the collection, reducing the likelihood of hash collisions and ensuring faster data retrieval.

Equality and Hash Code

While equality makes sense from a general perspective, hash codes are much more technical. If we were being a little hard on them, we could say that they are just an implementation detail to improve performance.

Most data structures use equals to check whether they contain an element. For example:

List<String> list = Arrays.asList("a", "b", "c");
boolean contains = list.contains("b");

The variable contains is true because, while instances of "b" are not identical (again, ignoring String interning), they are equal.

Comparing every element with the instance given to contains is wasteful, though, and a whole class of data structures uses a more performant approach. Instead of comparing the requested instance with each element they contain, they use a shortcut that reduces the number of potentially equal instances and then only compare those.

This shortcut is the hash code, which can be seen as an object’s equality boiled down to an integer value. Instances with the same hash code are not necessarily equal but equal instances have the same hash code. (Or should have, we will discuss this shortly.) Such data structures are often named after this technique, recognizable by the Hash in their name, with HashMap the most notable representative.

This is how they generally work:

When an element is added, its hash code is used to compute the index in an internal array (called a bucket).
If other, non-equal elements have the same hash code, they end up in the same bucket and must be bundled together, e.g. by adding them to a list.
When an instance is given to contains, its hash code is used to compute the bucket. Only elements therein are compared to the instance.

This way, very few, ideally no equals comparisons are required to implement contains.

As equals, hashCode is defined on Object.

Thoughts on Hashing

If hashCode is used as a shortcut to determine equality, then there is really only one thing we should care about: Equal objects should have the same hash code.

This is also why, if we override equals, we must create a matching hashCode implementation! Otherwise things that are equal according to our implementation would likely not have the same hash code because they use Object‘s implementation.

The `hashCode` Contract

Quoting the source:

The general contract of hashCode is:

Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

The first bullet mirrors the consistency property of equals and the second is the requirement we came up with above. The third states an important detail that we discuss will in a moment.

math

Implementing `hashCode`

A very easy implementation of Person.hashCode is the following:

@Override
public int hashCode() {
    return Objects.hash(firstName, lastName);
}

The person’s hash code is computed by computing the hash codes for the relevant fields and combining them. Both is left to Objects‘ utility function hash.

Selecting Fields

But which fields are relevant? The requirements help answer this: If equal objects must have the same hash code, then hash code computation should not include any field that is not used for equality checks. (Otherwise two objects that only differ in those fields would be equal but have different hash codes.)

So the set of fields used for hashing should be a subset of the fields used for equality. By default both will use the same fields but there are a couple of details to consider.

Consistency

For one, there is the consistency requirement. It should be interpreted rather strictly. While it allows the hash code to change if some fields change (which is often unavoidable with mutable classes), hashing data structures are not prepared for this scenario.

As we have seen above the hash code is used to determine an element’s bucket. But if the hash-relevant fields change, the hash is not recomputed and the internal array is not updated.

This means that a later query with an equal object or even with the very same instance fails! The data structure computes the current hash code, different from the one used to store the instance, and goes looking in the wrong bucket.

Conclusion: Better not use mutable fields for hash code computation!

Performance

Hash codes might end up being computed about as often as equals is called. This can very well happen in performance critical parts of the code so it makes sense to think about performance. And unlike equals there is a little more wiggle room to optimize it.

Unless sophisticated algorithms are used or many, many fields are involved, the arithmetic cost of combining their hash codes is as negligible as it is unavoidable. But it should be considered whether all fields need to be included in the computation! Particularly collections should be viewed with suspicion. Lists and sets, for example, will compute the hash for each of their elements. Whether calling them is necessary should be considered on a case-by-case basis.

If performance is critical, using Objects.hash might not be the best choice either because it requires the creation of an array for its varargs.

But the general rule about optimization holds: Don’t do it prematurely! Use a common hash code algorithm, maybe forego including the collections, and only optimize after profiling showed potential for improvement.

Collisions

Going all-in on performance, what about this implementation?

@Override
public int hashCode() {
    return 0;
}

It’s fast, that’s for sure. And equal objects will have the same hash code so we’re good on that, too. As a bonus, no mutable fields are involved!

But remember what we said about buckets? This way all instances will end up in the same! This will typically result in a linked list holding all the elements, which is terrible for performance. Each contains, for example, triggers a linear scan of the list.

So what we want is as few items in the same bucket as possible! An algorithm that returns wildly varying hash codes, even for very similar objects, is a good start.

How to get there partly depends on the selected fields. The more details we include in the computation, the more likely it is for the hash codes to differ. Note how this is completely opposite to our thoughts about performance. So, interestingly enough, using too many or too few fields can result in bad performance.

The other part to preventing collisions is the algorithm that is used to actually compute the hash.

Computing The Hash

The easiest way to compute a field’s hash code is to just call `hashCode` on it. Combining them could be done manually. A common algorithm is to start with some arbitrary number and to repeatedly multiply it with another (often a small prime) before adding a field’s hash:

int prime = 31;
int result = 1;
result = prime * result + ((firstName == null) ? 0 : firstName.hashCode());
result = prime * result + ((lastName == null) ? 0 : lastName.hashCode());
return result;

This might result in overflows, which is not particularly problematic because they cause no exceptions in Java.

Note that even great hashing algorithms might result in uncharacteristically frequent collisions if the input data has specific patterns. As a simple example assume we would compute the hash of points by adding their x and y-coordinates. May not sound too bad until we realize that we often deal with points on the line f(x) = -x, which means x + y == 0 for all of them. Collisions, galore!

But again: Use a common algorithm and don’t worry until profiling shows that something isn’t right.

Summary

We have seen that computing hash codes is something like compressing equality to an integer value: Equal objects must have the same hash code and for performance reasons it is best if as few non-equal objects as possible share the same hash.

This means that hashCode must always be overridden if equals is.

When implementing hashCode:

Use a the same fields that are used in equals (or a subset thereof).
Better not include mutable fields.
Consider not calling hashCode on collections.
Use a common algorithm unless patterns in input data counteract them.

Remember that hashCode is about performance, so don’t waste too much energy unless profiling indicates necessity.

Frequently Asked Questions (FAQs) about Implementing Java’s Hashcode Correctly

What is the significance of the hashCode() method in Java?

The hashCode() method in Java is a built-in function that returns an integer value. It is used primarily in hash-based collections like HashMap, HashSet, and HashTable to store and retrieve objects more efficiently. The hashCode() method works in tandem with the equals() method to ensure that each object has a unique identifier. This helps in the quick retrieval of data, especially in large collections, thereby improving the performance of your Java application.

How does the hashCode() method work in Java?

The hashCode() method in Java works by generating an integer value that represents the memory address of the object. This value is used as the index number for the object in the hash-based collection. When you call the hashCode() method on an object, it uses a hashing algorithm to generate this unique integer. However, it’s important to note that two different objects can have the same hashCode, which is known as a hash collision.

What is the contract between equals() and hashCode() methods in Java?

The contract between equals() and hashCode() methods in Java is a set of rules that govern their interaction. The contract states that if two objects are equal according to the equals() method, then calling the hashCode() method on each of the two objects must produce the same integer result. This ensures consistency and accuracy when storing and retrieving objects in hash-based collections.

How can I override the hashCode() method in Java?

Overriding the hashCode() method in Java involves providing your own implementation that returns a unique integer for each object. This can be done by using the object’s instance variables and a prime number multiplier. The prime number helps in distributing the hash codes evenly across the collection, reducing the likelihood of hash collisions.

What is a hash collision and how can it be avoided?

A hash collision occurs when the hashCode() method generates the same integer for two different objects. This can lead to data loss if not handled correctly. To avoid hash collisions, you can improve your hashing algorithm to generate more unique integers. Also, using a larger prime number as a multiplier can help distribute the hash codes more evenly across the collection.

Why should I override the hashCode() method?

Overriding the hashCode() method can improve the performance of your Java application, especially when dealing with large collections. By providing your own implementation, you can generate more unique and evenly distributed hash codes, reducing the likelihood of hash collisions and ensuring faster data retrieval.

Can two unequal objects have the same hashCode in Java?

Yes, two unequal objects can have the same hashCode in Java. This is known as a hash collision. However, the likelihood of this happening can be reduced by improving your hashing algorithm and using a larger prime number as a multiplier.

What happens if I don’t override the hashCode() method?

If you don’t override the hashCode() method, Java will use its default implementation, which may not provide a unique hash code for each object. This can lead to hash collisions and slower data retrieval in hash-based collections.

How does the hashCode() method improve the performance of my Java application?

The hashCode() method improves the performance of your Java application by providing a unique identifier for each object. This allows for faster data retrieval in hash-based collections, as the object can be located directly using its hash code, rather than having to search through the entire collection.

Can I use the hashCode() method for objects in a non-hash-based collection?

While the hashCode() method is primarily used in hash-based collections, it can also be used in non-hash-based collections. However, its benefits may not be as pronounced, as non-hash-based collections do not rely on hash codes for data storage and retrieval.

How to Implement Java’s hashCode Correctly