How do Java collections work? A data structures quick guide

A quick guide exploring how Java collections (such as lists, sets, and maps) work to enhance app development with efficient data storage and retrieval methods.

By Josh Cummings

Nov 6, 2024 • 8 Minute Read

Please set an alt value for this image...

Subscribe to the newsletter

In this article, we’ll talk about some common data structures that Java and many other languages implement that make application development faster, more flexible, and more robust. We’ll look at lists, sets, and maps, as well as a few of their close cousins. By the end, you’ll be able to list a handful of data structures and how they work.

Bookmarks

It always takes me a while to think of bookmarks, and yet they always turn out to be so helpful. And the end of my first sitdown with a book, I will note the page number, close the book, set it down, and rely on an overly-developed confidence in my own memory to know where I should start when I pick up the book again.

When I come back, inevitably I’m faced with the disappointment that my memory didn’t live up to my expectations and I do a little thumbing through to find where I was.

Sometimes the vicious cycle stops there; sometimes it happens a few more times. Either way, with a flair of discovering an idea for the first time, I think “ah, bookmarks!” and things get a little bit better for me.

Data structures are a bit like clever bookmarks in that help an application store memory in ways that make retrieval simple and efficient later on.

Java has several of these data structures in what it calls its Collections API. Let’s take a quick look at the most common ones.

Lists

Lists are a kind of collection that can be indexed. Or, in other words, I can ask for the nth element. This also means I can remove the nth element or insert something at index n instead of just at the beginning or end.

In Java, the interface for this is called, not surprisingly, List. It has two common implementations: ArrayList and LinkedList.

ArrayList gets its name from the fact that it operates like an array. The difference is that ArrayList is a bit of a magician and makes it appear that the array’s size is dynamic. Like arrays, I can look up a value by an index in constant time, meaning it takes me the same amount of time to find element n as it does for me to find element 100 * n.

To prove all this, create a list and array side-by-side and see where they behave the same:

          String[] dissertationTitlesArray = { “On Defending Oneself with 
a Pineapple”, “The Plight of Chimpanzee Stenographers”, … };
List<String>  dissertationTitlesList = new ArrayList<>
(Arrays.asList(dissertationTitlesArray));
dissertationTitlesArray[0] = “On Defending Oneself with a 
Coconut”;
dissertationTitlesList.set(0, “On Defending Oneself with a 
Coconut”);
String acceptedPaper = dissertationTitlesArray[3];
acceptedPaper = dissertationTitlesList.get(3);

    

Lists are different in that you can add to them:

          dissertationTitlesList.add(“The Rise of Mediterranian-Canadian 
Fusion Restaurants”);

The way a list achieves this is by using a backing array of some initial size, by default 10. And when it gets filled up, it creates a new array double that size, copies everything into the new array, and adds the new element to the now large enough array. If we are doing a lot more reads than writes, this is a great setup because we only rarely have to pay for an array expansion.

If we are doing a lot more writes than reads, though, a LinkedList may be better. You create this in the same way:

          List<String> dissertationTitlesLinkedList = new LinkedList<>
(dissertationTitlesList);

In this case, the data is not indexed in an array. Instead, a block of memory is allocated anew for each new value and the previous block of memory is edited to point to the newly added block of memory. While this may sound complicated, it makes adding to the array very quick since an expansion event never happens. On the other hand, it means that looking up index n is proportionally faster to lookup than index 100 * n.

There are a few other interfaces and implementations that are good to be aware of. Cousins to List are Queue and Deque. These are good for describing the reading and writing semantics of the list. A Queue means that you will always add to the back and remove from the front, like a line or queue at the grocery store. A Deque is often used when you intend to always add from the top and remove from the top, like a stack of oranges at the store (unless you are my children).

There are also thread-safe implements like CopyOnWriteArrayList. These are lists that can be shared between threads in an application.

Sets

A set is like a list where all the values are unique. Given that, any object you add to a set likely needs to override the equals and hashCode methods so that you have control over saying which values are equal to each other. Or you can alternatively use Java records, which have a reasonable default for these two methods.

For example, consider a Set of Cards. You could declare a Card like this:

          public record Card(Suit suit, Integer value) {
    public enum Suit { HEARTS, SPADES, CLUBS, DIAMONDS };
}

    

Then I can safely add instances of card into a Set and Java will correctly ensure that it has no duplicates:

          Set<Card> cards = new HashSet<>();
cards.add(new Card(Suit.HEARTS, 8)); // will add
cards.add(new Card(Suit.SPADES, 9)); // will add
cards.add(new Card(Suit.HEARTS, 8)); // will not add

    

As you can see, one of the implementations is HashSet. It’s called this because the objects hash determines where it is going to live in memory. Having a good hash guarantees nearly instantaneous uniqueness checking so that inserts are quick. If you want Java to remember the order in which they were inserted, you can use LinkedHashSet.

Set also comes with TreeSet, which is helpful when, instead of a hash, the values have a natural ordering. For example, you could use a TreeSet to place in order any Cards given out of order:

          public record Card(Suit suit, Integer number) implements 
Comparable<Card> {
    // …

    @Override 
    public int compareTo(Card that) {
        if (this.equals(that)) {
            return 0;
        }
        if (this.suit.ordinal() == that.suit.ordinal()) {
            return this.number - that.number;
        }
        return this.suit.ordinal() - that.suit.ordinal();
    }
}

// …
Set<Card> ordered = new TreeSet<>();
ordered.add(new Card(Suit.DIAMONDS, 10));
ordered.add(new Card(Suit.DIAMONDS, 3));
ordered.add(new Card(Suit.HEARTS, 9));
System.out.println(ordered); // will show 9 of Hearts, then 3 of 
Diamonds, then 10 of Diamonds

    

This is because a TreeSet stores the cards in a tree structure of nodes and branches where the right branch has everything higher than a given node and the left branch has everything lower than a given node.

Again, that may sound pretty advanced. The takeaway is that HashSet and LinkedHashSet have quick insertion. LinkedHashSet and TreeSet guarantee ordering; LinkedHashSet remembers order of insertion while TreeSet remembers natural ordering.

Set also has thread-safe implementations like ConcurrentSkipListSet.

Maps

A map is an important data structure that’s like a list and a set put together; the entries can be looked up by key and they are unique by key.

You saw that with a list you need to provide a numeric index, but what if your index is more complicated than that?

For example, let’s say that you have a set of toaster models that you need to keep track of:

          public record ToasterInventory(String sku, int stock) {
}

Naturally, there’s an advantage in being able to look up the toaster inventory by sku. But, if you use a list, that’s not going to help:

          List<ToasterInventory> toasters = List.of(new ToasterInventory
(“toaster-max-3000”, 123), …);
// someone wants to order 3 toaster-max-5000s
toasters.get(???)

    

Maps to the rescue! You can save yourself by storing the inventory by key like so:

          Map<ToasterInventory> inventory = new HashMap<>();
inventory.put(“toaster-max-3000”, new ToasterInventory
(“toaster-max-3000”, 123”));
// … someone wants to order 3 toaster-max-5000s
ToasterInventory ordered = toasters.get(“toaster-max-5000”)

    

Like HashSet, this is called HashMap because it relies on the key’s hash to determine the location in memory. You can use LinkedHashSet to also remember the insertion order. Or TreeMap to have it order the values naturally by key.

Set also has thread-safe implementations like ConcurrentHashMap.

Review

That was a lot of APIs to cover! Here is a quick overview:

Type	Description
List	the interface for all list types. Can look up, add, and remove by index
Queue	the interface for collections like a line at a grocery store. You always add to one side and remove from the other
Deque	the interface for collections like a stack of oranges. You always add to one side and remove from the same side
ArrayList	a common list that is backed by an array. Lookups are instant; insertions are fast except when the array needs expanding. Good for general-purpose usage
LinkedList	a common list where each element points to the next one. Lookups are linear; insertions are instant. Good for when there are many more writes than reads or when you are always writing and reading from the ends
Set	the interface for all set types. Ensures each entry is unique
HashSet	determines memory location by the element’s hash ensuring instant access in most cases. Good for general-purpose usage
LinkedHashSet	same as HashSet; also records the order of insertion
TreeSet	determines memory location by natural ordering. Ensures quick, if not instant, access. Re-orders values as they are added
Map	the interface for all map types. Ensures each entry is unique by a key
HashMap	determines memory location by the key’s hash, ensuring instance access in most cases. Good for general-purpose usage
LinkedHashMap	same as HashMap; also records the order of insertion
TreeMap	determines memory by the key’s natural ordering. Ensures quick, if not instant, access. Re-orders values as they are added

Conclusion

You’ve just gotten exposure to some of Java’s “clever bookmarks” or data structures that you can use to hold onto and retrieve information. Lists are good for sequences of data that you either just need to loop over or can reference by index. Sets are good for data where elements need to be unique. Maps are a good combination of both; they are unique by key, they can be looked up by key, and they can be iterated over.

Josh C.

Like many software craftsmen, Josh eats, sleeps, and dreams in code. He codes for fun, and his kids code for fun! Right now, Josh works as a full-time committer on Spring Security and loves every minute. Application Security holds a special place in his heart, a place diametrically opposed to and cosmically distant from his unending hatred for checked exceptions.

More about this author