Collections Framework

Wrong collection choice shows up as latency spikes, duplicate events in idempotent pipelines, and ConcurrentModificationException in a foreach loop. This chapter maps the hierarchy to complexity, internals, and what interviewers actually ask— especially HashMap and thread-safe alternatives.

beginner mid senior

Collections hierarchy

The framework lives in java.util (and concurrent peers in java.util.concurrent). Program to interfaces; choose a concrete class for complexity and ordering guarantees.

InterfaceQuestion it answers
IterableCan I traverse elements?
CollectionBag of elements—add, remove, bulk ops
ListDo I need index + order + duplicates?
SetDo I need uniqueness?
Queue / DequeDo I process in FIFO/LIFO or both ends?
MapDo I look up by key?
Java
List<String> names = new ArrayList<>();
Set<UUID> seen = new HashSet<>();
Map<String, Integer> counts = new HashMap<>();
Deque<Task> work = new ArrayDeque<>();

// Accept widest type in APIs
void process(Collection<? extends Order> batch) { ... }

Real-world analogy: A List is a numbered deli line; a Set is a guest list (one entry per person); a Map is a phone book keyed by name.

List

Ordered sequences with positional access. Default for almost all application lists: ArrayList—unless you have a specific reason for linked nodes or copy-on-write semantics.

ArrayList — dynamic array

Backed by a growable Object[] (or typed array for some factories). Logical size size may be less than elementData.length (capacity).

  • Append at end — amortized O(1): write at size, increment; occasionally resize
  • get(i) / set(i)O(1) index arithmetic + array access
  • add/remove at middleO(n)System.arraycopy shifts elements
  • Resize — new array ~1.5× capacity (JDK growth policy), copy all references
Java
List<String> list = new ArrayList<>(10_000);  // presize — one resize avoided
list.add("a");           // amortized O(1) at end
list.get(0);             // O(1)
list.add(0, "front");    // O(n) — shift right

LinkedList — doubly linked nodes

Each element is a Node with prev, next, item. Implements List and Deque.

  • Add/remove at endsO(1) with pointer updates
  • get(i)O(n) — walk from nearer end
  • Cache locality — nodes scattered on heap; each hop misses CPU cache lines
  • Memory — extra overhead per element (two pointers + node object)

When to use which — benchmark reality

OperationArrayListLinkedList
get(i) / set(i)O(1)O(n)
add at endO(1)* amortizedO(1)
add at index iO(n)O(n) to find + O(1) link
remove at index iO(n)O(n) to find + O(1) unlink
iterate allFast (sequential array)Slower (pointer chasing)

Micro-benchmarks (JMH) on modern JVMs: ArrayList wins for typical CRUD, bulk load, and iteration. LinkedList only competes when you already hold a ListIterator at a position and do many inserts there—or you need Deque and mistakenly pick LinkedList (prefer ArrayDeque).

CopyOnWriteArrayList — concurrent reads

Thread-safe: every mutating operation (add, set, remove) copies the entire backing array. Iterators snapshot the array at creation time—no CME, weakly consistent with concurrent writes.

  • Good: listener lists, read-mostly config, event subscribers
  • Bad: frequent writes, large lists (copy cost O(n) per write)
Java
List<Runnable> listeners = new CopyOnWriteArrayList<>();
listeners.add(this::onChange);  // copies array — OK if rare

for (Runnable r : listeners) {
    r.run();  // safe even if another thread adds during loop
}
💡 Pro Tip

Pre-size new ArrayList<>(expectedSize) on bulk JDBC/stream loads. For thread-safe lists with mixed read/write, consider Collections.synchronizedList only with external sync on iteration—or concurrent queues/deques instead.

Set

Models mathematical sets: no two elements where e1.equals(e2) (unless implementation uses identity). Pick implementation by ordering, performance, and concurrency.

HashSet

Backed by a HashMap—each element is a key mapped to a constant PRESENT sentinel object. Average O(1) add, contains, remove with good hash distribution.

No iteration order guarantee—order changes when capacity resizes.

LinkedHashSet

Hash table + doubly linked list through entries → insertion-order iteration (or access-order if constructed from access-order LinkedHashMap pattern). Slightly more memory, still O(1) average ops.

TreeSet — NavigableSet

Red-black tree (balanced BST) backing TreeMap. O(log n) add/contains/remove. Iteration in sorted order—natural ordering (Comparable) or Comparator.

Extra API: ceiling, floor, subSet, headSet, tailSet—useful for range queries on ordered keys.

EnumSet — bit vector

Keys are enum constants from one enum type. Internal representation is a bit vector (or long[]) over ordinals—O(1) add/contains/remove, extremely compact and fast. Iteration order matches enum declaration order.

ImplementationOrderAvg complexityNull elements
HashSetNoneO(1)One null allowed (Java 8+)
LinkedHashSetInsertionO(1)One null
TreeSetSortedO(log n)No null (NPE on compare)
EnumSetEnum declarationO(1)No null
Java
Set<String> tags = new HashSet<>();
tags.add("java");
tags.add("java");  // false — duplicate ignored

Set<DayOfWeek> weekend = EnumSet.range(DayOfWeek.SATURDAY, DayOfWeek.SUNDAY);

NavigableSet<Integer> scores = new TreeSet<>();
scores.add(90);
scores.floor(85);  // null — none ≤ 85
⚠️ Pitfall

Mutable objects in HashSet/HashMap: if a field used in hashCode() changes after insert, the bucket index is wrong—the object becomes “lost.” Use immutable keys (String, UUID, records) or never mutate key fields.

Map — overview

Associates unique keys with values. Not a Collection, but values(), keySet(), and entrySet() are collection views backed by the map.

Interview focus: HashMap internals, resize/load factor, treeification, and ConcurrentHashMap vs synchronized wrappers. Production focus: key equality, ordering, and eviction (LRU).

HashMap — deep dive

The workhorse map: hash table of buckets, each bucket a short chain or red-black tree. Understanding this structure explains performance cliffs and thread-safety limits.

Put flow (simplified)

  1. Compute hash = key.hashCode() ^ (hash >>> 16) — spreads high bits into index
  2. Bucket index: (n - 1) & hash where n is table length (power of 2)
  3. If bucket empty → new node; else walk chain comparing hash and equals
  4. If key found → replace value; else append node
  5. If size > capacity * loadFactor (default 0.75) → resize (double n, rehash all entries)

Linked list → tree Java 8+

When a bucket chain length reaches 8 (and table length ≥ 64), chain converts to TreeNode (red-black tree) — worst-case per bucket becomes O(log n) instead of O(n) under hash collisions (e.g. malicious or poor keys).

When tree size drops to 6, untreeify back to linked list.

Load factor & resize

ConceptDefaultEffect
Load factor0.75Balance space vs collisions—higher → denser table, longer chains
Initial capacity16Power of 2; specify expected size to avoid repeated resize copies
Resize2× capacityRehash every entry into new table—O(n) pause
Java
Map<String, User> byEmail = new HashMap<>(16_384);  // ~12k entries @ 0.75
byEmail.put("[email protected]", user);
User u = byEmail.get("[email protected]");

byEmail.putIfAbsent("[email protected]", new User());
byEmail.computeIfPresent("[email protected]", (k, v) -> v.withLastLogin(Instant.now()));
byEmail.merge("[email protected]", 1, Integer::sum);  // if values are counts
🔬 Under the Hood

Keys must implement consistent hashCode and equals. String caches hash. Identity hash codes for default Object keys scatter well but identity semantics only fit niche maps (IdentityHashMap).

🎯 Interview Tip

Why is HashMap not thread-safe? Concurrent put can lose updates; historical JDKs had infinite loops during concurrent resize. Contrast Collections.synchronizedMap (one lock, whole map) vs ConcurrentHashMap (per-bin locking/CAS).

Other map implementations

LinkedHashMap — LRU cache pattern

Maintains doubly linked list through entries. Constructor flag accessOrder = true moves entry to tail on get—foundation for LRU caches.

Java
Map<String, byte[]> cache = new LinkedHashMap<>(16, 0.75f, true) {
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        return size() > 10_000;
    }
};

Production caches prefer Caffeine/Guava with TTL, weight, and stats—but the pattern is the same: bounded map + eviction hook.

TreeMap — NavigableMap

Sorted by key (Comparable or Comparator). O(log n) operations. NavigableMap: lowerKey, floorEntry, subMap, descendingMap.

ConcurrentHashMap

JDKStrategy
Java 7Segment arrays — 16 segments, each with own lock; concurrency level tuned segments
Java 8+Node array like HashMap; CAS on empty bins; synchronized on first node of bin for updates; tree bins under collision

Reads generally lock-free. Aggregate size is approximate under contention. No null keys or values.

Java
ConcurrentMap<String, Long> counters = new ConcurrentHashMap<>();
counters.merge("requests", 1L, Long::sum);

// Atomic replace if present
counters.compute("key", (k, v) -> v == null ? 1L : v + 1);

Niche maps

ClassSemanticsUse case
WeakHashMapKeys weakly referenced—entries removed when key only weakly reachableListener registries, metadata caches keyed by object identity without leaking
IdentityHashMap== not equalsGraph traversal, serialization identity, JVM-level object tracking
EnumMapArray indexed by enum ordinalFastest map when keys are one enum type—dense, ordered iteration
Java
Map<DayOfWeek, Integer> traffic = new EnumMap<>(DayOfWeek.class);
traffic.put(DayOfWeek.MONDAY, 1200);

Map<Object, String> metadata = new WeakHashMap<>();
📦 Real World

Spring bean caches and rate limiters use ConcurrentHashMap. Kafka consumer offset storage and in-flight request maps need correct key equality. Hibernate second-level caches care about key serializability and identity vs equality.

Queue & Deque

Process work in order: FIFO task queues, priority scheduling, or handoff between threads. Deque adds operations at both ends—stacks, BFS, sliding windows.

ArrayDeque vs LinkedList as Queue

AspectArrayDequeLinkedList
Backing storeCircular array, resizesLinked nodes
Null elementsNot allowedAllowed
PerformanceFaster, better localitySlower for most queue workloads
RoleDefault non-blocking Deque/QueueRare—legacy choice
Java
Deque<String> stack = new ArrayDeque<>();
stack.push("a");
stack.pop();

Queue<String> fifo = new ArrayDeque<>();
fifo.offer("first");
fifo.poll();

PriorityQueue — binary heap

Min-heap by default (Comparator natural order). offer / poll are O(log n); peek is O(1). Not thread-safe. Iterator does not return sorted order—only poll drains in priority order.

Java
// Max-heap by reversing comparison
Queue<Task> urgent = new PriorityQueue<>(
    Comparator.comparing(Task::deadline).reversed());

Task next = urgent.poll();

BlockingQueue — thread pools

ImplementationCapacityTypical use
LinkedBlockingQueueOptional bound (unbounded default)ThreadPoolExecutor work queue—producer/consumer backpressure
ArrayBlockingQueueFixed arrayBounded pool queues; optional fairness (slower)
SynchronousQueueZero internal storageHandoff—producer waits for consumer (newCachedThreadPool)
DelayQueueUnboundedScheduled tasks by Delayed.getDelay
Java
BlockingQueue<Runnable> work = new LinkedBlockingQueue<>(1000);
ExecutorService pool = new ThreadPoolExecutor(
    4, 8, 60, TimeUnit.SECONDS,
    work,
    new ThreadPoolExecutor.CallerRunsPolicy());

Utility classes

Collections and Arrays provide algorithms and wrappers; Java 9+ factory methods on interfaces create compact immutable collections.

Collections static methods

CategoryExamples
Sort / searchsort(list), binarySearch (sorted list)
Shuffle / rotateshuffle, rotate, reverse
Immutable viewsunmodifiableList, unmodifiableMap — throw on mutation
Synchronized wrapperssynchronizedList — lock per method, not compound ops
Singleton / emptyemptyList(), singletonList(x)
Min / maxmin, max with Comparator

Arrays utility

Arrays.sort, binarySearch, copyOf, fill, equals, compare (Java 9+), stream, parallelPrefix—primitives and objects (objects require Comparable or Comparator for sort).

Immutable factories Java 9+

Java
List<String> colors = List.of("red", "green", "blue");
Set<Integer> ids = Set.of(1, 2, 3);
Map<String, Integer> map = Map.of("x", 1, "y", 2);
Map<String, Integer> big = Map.ofEntries(
    Map.entry("a", 1),
    Map.entry("b", 2));

// Null elements forbidden — NPE
// List.of(1, 2, 1);  // OK for List (duplicates allowed)
// Set.of(1, 1);      // IllegalArgumentException duplicate

List<String> mutable = new ArrayList<>(List.of("a"));
Collections.sort(mutable);
List<String> frozen = Collections.unmodifiableList(mutable);
⚠️ Pitfall

Collections.synchronizedList does not make compound operations atomic. Always synchronize on the list when iterating: synchronized (list) { for (...) }. Prefer concurrent collections for shared mutable structures.

Sorting & comparison

Comparable embeds the natural order of a type. Comparator defines external orders—multiple sort keys per class without changing the type.

Design decision

Use Comparable whenUse Comparator when
One obvious default order (String, Integer)Sort same type different ways (by name vs salary)
Type “owns” its ordering contractYou cannot modify the class (third-party type)
Used in TreeSet/TreeMap natural orderingUI-driven sort columns, stream sorted(comparator)
Java
public record Employee(String name, int salary, LocalDate hired)
    implements Comparable<Employee> {
    @Override
    public int compareTo(Employee o) {
        return Integer.compare(this.salary, o.salary);
    }
}

Comparator<Employee> byTenure = Comparator
    .comparing(Employee::hired)
    .thenComparing(Employee::name);

Comparator<Employee> bySalaryDesc = Comparator
    .comparingInt(Employee::salary)
    .reversed()
    .thenComparing(Employee::name, Comparator.nullsLast(String::compareTo));

employees.sort(bySalaryDesc);
Collections.sort(legacyList, byTenure);

List.sort uses TimSort — O(n log n) worst case, stable (preserves equal elements’ order). Stability matters when sorting by one key then another with thenComparing.

💡 Pro Tip

Keep compareTo consistent with equals when using sorted sets/maps—or document divergence. If compareTo == 0 but objects not equal, TreeSet holds “duplicates.” Use nullsFirst / nullsLast explicitly.

Iterators

Uniform traversal without exposing internal structure. Know the three iterator types and when structural modification during iteration triggers ConcurrentModificationException.

TypeDirectionCapabilities
IteratorForwardhasNext, next, optional remove
ListIteratorForward & backwardIndex, set, add at cursor—List only
SpliteratorSplit for parallelStreams call trySplit for fork/join; CHARACTERISTICS flags
Java
// Safe removal during iteration
for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
    String s = it.next();
    if (s.isBlank()) it.remove();
}

// ListIterator — edit while scanning
ListIterator<String> li = list.listIterator();
while (li.hasNext()) {
    if (li.next().equals("obsolete")) {
        li.set("replacement");
    }
}

// Idiomatic Java 8+
list.removeIf(String::isBlank);

// UNSAFE — fail-fast CME
for (String s : list) {
    if (s.isBlank()) list.remove(s);
}

ConcurrentModificationException — why & how to avoid

Most non-concurrent collections track modCount (structural change counter). Iterator remembers expected modCount at creation; mismatch → fail-fast CME (best-effort, not guaranteed in all threads).

  • Avoid: mutate collection directly during enhanced-for; use iterator remove, removeIf, or copy-then-replace
  • Concurrent collections: weakly consistent iterators—no CME, may or may not see concurrent adds
  • Streams: not fail-fast on all sources; still avoid concurrent modification of backing collection during pipeline unless concurrent collection
Java
// ConcurrentHashMap — Iterator on entrySet is weakly consistent
ConcurrentHashMap<String, Long> map = new ConcurrentHashMap<>();
map.put("a", 1L);
for (var entry : map.entrySet()) {
    map.merge("b", 1L, Long::sum);  // no CME
}
🎯 Interview Tip

Explain fail-fast vs fail-safe (legacy CopyOnWrite, concurrent weak consistency). Know removeIf and why for-each + list.remove fails. Spliterator: ORDERED, SIZED, DISTINCT flags affect stream performance.