Collections Framework
Wrong collection choice shows up as latency spikes, duplicate events in idempotent pipelines, and ConcurrentModificationException in a foreach loop. This chapter maps the hierarchy to complexity, internals, and what interviewers actually ask— especially HashMap and thread-safe alternatives.
Collections hierarchy
The framework lives in java.util (and concurrent peers in java.util.concurrent). Program to interfaces; choose a concrete class for complexity and ordering guarantees.
Iterable<T> iterator(), for-each
└── Collection<E> size(), add(), remove(), contains()
├── List<E> ordered, duplicates OK, index access
│ ├── ArrayList
│ ├── LinkedList (also Deque)
│ └── CopyOnWriteArrayList (concurrent)
├── Set<E> no duplicate equals() elements
│ ├── HashSet / LinkedHashSet
│ ├── TreeSet (NavigableSet)
│ └── EnumSet
└── Queue<E> / Deque<E> FIFO, LIFO, offer/poll
├── ArrayDeque
├── PriorityQueue
└── BlockingQueue* (java.util.concurrent)
Map<K,V> (NOT a Collection) key → value, unique keys
├── HashMap / LinkedHashMap
├── TreeMap (NavigableMap)
├── ConcurrentHashMap
├── WeakHashMap, IdentityHashMap, EnumMap
* BlockingDeque, DelayQueue, etc. extend concurrent package
| Interface | Question it answers |
|---|---|
Iterable | Can I traverse elements? |
Collection | Bag of elements—add, remove, bulk ops |
List | Do I need index + order + duplicates? |
Set | Do I need uniqueness? |
Queue / Deque | Do I process in FIFO/LIFO or both ends? |
Map | Do I look up by key? |
List<String> names = new ArrayList<>();
Set<UUID> seen = new HashSet<>();
Map<String, Integer> counts = new HashMap<>();
Deque<Task> work = new ArrayDeque<>();
// Accept widest type in APIs
void process(Collection<? extends Order> batch) { ... }
Real-world analogy: A List is a numbered deli line; a Set is a guest list (one entry per person); a Map is a phone book keyed by name.
List
Ordered sequences with positional access. Default for almost all application lists: ArrayList—unless you have a specific reason for linked nodes or copy-on-write semantics.
ArrayList — dynamic array
Backed by a growable Object[] (or typed array for some factories). Logical size size may be less than elementData.length (capacity).
- Append at end — amortized O(1): write at
size, increment; occasionally resize - get(i) / set(i) — O(1) index arithmetic + array access
- add/remove at middle — O(n) — System.arraycopy shifts elements
- Resize — new array ~1.5× capacity (JDK growth policy), copy all references
List<String> list = new ArrayList<>(10_000); // presize — one resize avoided
list.add("a"); // amortized O(1) at end
list.get(0); // O(1)
list.add(0, "front"); // O(n) — shift right
LinkedList — doubly linked nodes
Each element is a Node with prev, next, item. Implements List and Deque.
- Add/remove at ends — O(1) with pointer updates
- get(i) — O(n) — walk from nearer end
- Cache locality — nodes scattered on heap; each hop misses CPU cache lines
- Memory — extra overhead per element (two pointers + node object)
When to use which — benchmark reality
| Operation | ArrayList | LinkedList |
|---|---|---|
| get(i) / set(i) | O(1) | O(n) |
| add at end | O(1)* amortized | O(1) |
| add at index i | O(n) | O(n) to find + O(1) link |
| remove at index i | O(n) | O(n) to find + O(1) unlink |
| iterate all | Fast (sequential array) | Slower (pointer chasing) |
Micro-benchmarks (JMH) on modern JVMs: ArrayList wins for typical CRUD, bulk load, and iteration. LinkedList only competes when you already hold a ListIterator at a position and do many inserts there—or you need Deque and mistakenly pick LinkedList (prefer ArrayDeque).
CopyOnWriteArrayList — concurrent reads
Thread-safe: every mutating operation (add, set, remove) copies the entire backing array. Iterators snapshot the array at creation time—no CME, weakly consistent with concurrent writes.
- Good: listener lists, read-mostly config, event subscribers
- Bad: frequent writes, large lists (copy cost O(n) per write)
List<Runnable> listeners = new CopyOnWriteArrayList<>();
listeners.add(this::onChange); // copies array — OK if rare
for (Runnable r : listeners) {
r.run(); // safe even if another thread adds during loop
}
Pre-size new ArrayList<>(expectedSize) on bulk JDBC/stream loads. For thread-safe lists with mixed read/write, consider Collections.synchronizedList only with external sync on iteration—or concurrent queues/deques instead.
Set
Models mathematical sets: no two elements where e1.equals(e2) (unless implementation uses identity).
Pick implementation by ordering, performance, and concurrency.
HashSet
Backed by a HashMap—each element is a key mapped to a constant PRESENT sentinel object. Average O(1) add, contains, remove with good hash distribution.
No iteration order guarantee—order changes when capacity resizes.
LinkedHashSet
Hash table + doubly linked list through entries → insertion-order iteration (or access-order if constructed from access-order LinkedHashMap pattern). Slightly more memory, still O(1) average ops.
TreeSet — NavigableSet
Red-black tree (balanced BST) backing TreeMap. O(log n) add/contains/remove. Iteration in sorted order—natural ordering (Comparable) or Comparator.
Extra API: ceiling, floor, subSet, headSet, tailSet—useful for range queries on ordered keys.
EnumSet — bit vector
Keys are enum constants from one enum type. Internal representation is a bit vector (or long[]) over ordinals—O(1) add/contains/remove, extremely compact and fast. Iteration order matches enum declaration order.
| Implementation | Order | Avg complexity | Null elements |
|---|---|---|---|
| HashSet | None | O(1) | One null allowed (Java 8+) |
| LinkedHashSet | Insertion | O(1) | One null |
| TreeSet | Sorted | O(log n) | No null (NPE on compare) |
| EnumSet | Enum declaration | O(1) | No null |
Set<String> tags = new HashSet<>();
tags.add("java");
tags.add("java"); // false — duplicate ignored
Set<DayOfWeek> weekend = EnumSet.range(DayOfWeek.SATURDAY, DayOfWeek.SUNDAY);
NavigableSet<Integer> scores = new TreeSet<>();
scores.add(90);
scores.floor(85); // null — none ≤ 85
Mutable objects in HashSet/HashMap: if a field used in hashCode() changes after insert, the bucket index is wrong—the object becomes “lost.” Use immutable keys (String, UUID, records) or never mutate key fields.
Map — overview
Associates unique keys with values. Not a Collection, but values(), keySet(), and entrySet() are collection views backed by the map.
Interview focus: HashMap internals, resize/load factor, treeification, and ConcurrentHashMap vs synchronized wrappers. Production focus: key equality, ordering, and eviction (LRU).
HashMap — deep dive
The workhorse map: hash table of buckets, each bucket a short chain or red-black tree. Understanding this structure explains performance cliffs and thread-safety limits.
Put flow (simplified)
- Compute
hash = key.hashCode() ^ (hash >>> 16)— spreads high bits into index - Bucket index:
(n - 1) & hashwherenis table length (power of 2) - If bucket empty → new node; else walk chain comparing
hashandequals - If key found → replace value; else append node
- If size >
capacity * loadFactor(default 0.75) → resize (double n, rehash all entries)
table[] (length n = power of 2) [0] → Node(k1,v1) → Node(k2,v2) chain length 2 [1] null [2] → TreeNode (red-black) treeified: chain was ≥ 8 [3] → Node(k3,v3) ... threshold = (int)(n * loadFactor) // resize when size > threshold
Linked list → tree Java 8+
When a bucket chain length reaches 8 (and table length ≥ 64), chain converts to TreeNode (red-black tree) — worst-case per bucket becomes O(log n) instead of O(n) under hash collisions (e.g. malicious or poor keys).
When tree size drops to 6, untreeify back to linked list.
Load factor & resize
| Concept | Default | Effect |
|---|---|---|
| Load factor | 0.75 | Balance space vs collisions—higher → denser table, longer chains |
| Initial capacity | 16 | Power of 2; specify expected size to avoid repeated resize copies |
| Resize | 2× capacity | Rehash every entry into new table—O(n) pause |
Map<String, User> byEmail = new HashMap<>(16_384); // ~12k entries @ 0.75
byEmail.put("[email protected]", user);
User u = byEmail.get("[email protected]");
byEmail.putIfAbsent("[email protected]", new User());
byEmail.computeIfPresent("[email protected]", (k, v) -> v.withLastLogin(Instant.now()));
byEmail.merge("[email protected]", 1, Integer::sum); // if values are counts
Keys must implement consistent hashCode and equals. String caches hash. Identity hash codes for default Object keys scatter well but identity semantics only fit niche maps (IdentityHashMap).
Why is HashMap not thread-safe? Concurrent put can lose updates; historical JDKs had infinite loops during concurrent resize. Contrast Collections.synchronizedMap (one lock, whole map) vs ConcurrentHashMap (per-bin locking/CAS).
Other map implementations
LinkedHashMap — LRU cache pattern
Maintains doubly linked list through entries. Constructor flag accessOrder = true moves entry to tail on get—foundation for LRU caches.
Map<String, byte[]> cache = new LinkedHashMap<>(16, 0.75f, true) {
@Override
protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
return size() > 10_000;
}
};
Production caches prefer Caffeine/Guava with TTL, weight, and stats—but the pattern is the same: bounded map + eviction hook.
TreeMap — NavigableMap
Sorted by key (Comparable or Comparator). O(log n) operations. NavigableMap: lowerKey, floorEntry, subMap, descendingMap.
ConcurrentHashMap
| JDK | Strategy |
|---|---|
| Java 7 | Segment arrays — 16 segments, each with own lock; concurrency level tuned segments |
| Java 8+ | Node array like HashMap; CAS on empty bins; synchronized on first node of bin for updates; tree bins under collision |
Reads generally lock-free. Aggregate size is approximate under contention. No null keys or values.
ConcurrentMap<String, Long> counters = new ConcurrentHashMap<>();
counters.merge("requests", 1L, Long::sum);
// Atomic replace if present
counters.compute("key", (k, v) -> v == null ? 1L : v + 1);
Niche maps
| Class | Semantics | Use case |
|---|---|---|
| WeakHashMap | Keys weakly referenced—entries removed when key only weakly reachable | Listener registries, metadata caches keyed by object identity without leaking |
| IdentityHashMap | == not equals | Graph traversal, serialization identity, JVM-level object tracking |
| EnumMap | Array indexed by enum ordinal | Fastest map when keys are one enum type—dense, ordered iteration |
Map<DayOfWeek, Integer> traffic = new EnumMap<>(DayOfWeek.class);
traffic.put(DayOfWeek.MONDAY, 1200);
Map<Object, String> metadata = new WeakHashMap<>();
Spring bean caches and rate limiters use ConcurrentHashMap. Kafka consumer offset storage and in-flight request maps need correct key equality. Hibernate second-level caches care about key serializability and identity vs equality.
Queue & Deque
Process work in order: FIFO task queues, priority scheduling, or handoff between threads.
Deque adds operations at both ends—stacks, BFS, sliding windows.
ArrayDeque vs LinkedList as Queue
| Aspect | ArrayDeque | LinkedList |
|---|---|---|
| Backing store | Circular array, resizes | Linked nodes |
| Null elements | Not allowed | Allowed |
| Performance | Faster, better locality | Slower for most queue workloads |
| Role | Default non-blocking Deque/Queue | Rare—legacy choice |
Deque<String> stack = new ArrayDeque<>();
stack.push("a");
stack.pop();
Queue<String> fifo = new ArrayDeque<>();
fifo.offer("first");
fifo.poll();
PriorityQueue — binary heap
Min-heap by default (Comparator natural order). offer / poll are O(log n); peek is O(1). Not thread-safe. Iterator does not return sorted order—only poll drains in priority order.
// Max-heap by reversing comparison
Queue<Task> urgent = new PriorityQueue<>(
Comparator.comparing(Task::deadline).reversed());
Task next = urgent.poll();
BlockingQueue — thread pools
| Implementation | Capacity | Typical use |
|---|---|---|
| LinkedBlockingQueue | Optional bound (unbounded default) | ThreadPoolExecutor work queue—producer/consumer backpressure |
| ArrayBlockingQueue | Fixed array | Bounded pool queues; optional fairness (slower) |
| SynchronousQueue | Zero internal storage | Handoff—producer waits for consumer (newCachedThreadPool) |
| DelayQueue | Unbounded | Scheduled tasks by Delayed.getDelay |
BlockingQueue<Runnable> work = new LinkedBlockingQueue<>(1000);
ExecutorService pool = new ThreadPoolExecutor(
4, 8, 60, TimeUnit.SECONDS,
work,
new ThreadPoolExecutor.CallerRunsPolicy());
Utility classes
Collections and Arrays provide algorithms and wrappers;
Java 9+ factory methods on interfaces create compact immutable collections.
Collections static methods
| Category | Examples |
|---|---|
| Sort / search | sort(list), binarySearch (sorted list) |
| Shuffle / rotate | shuffle, rotate, reverse |
| Immutable views | unmodifiableList, unmodifiableMap — throw on mutation |
| Synchronized wrappers | synchronizedList — lock per method, not compound ops |
| Singleton / empty | emptyList(), singletonList(x) |
| Min / max | min, max with Comparator |
Arrays utility
Arrays.sort, binarySearch, copyOf, fill, equals, compare (Java 9+), stream, parallelPrefix—primitives and objects (objects require Comparable or Comparator for sort).
Immutable factories Java 9+
List<String> colors = List.of("red", "green", "blue");
Set<Integer> ids = Set.of(1, 2, 3);
Map<String, Integer> map = Map.of("x", 1, "y", 2);
Map<String, Integer> big = Map.ofEntries(
Map.entry("a", 1),
Map.entry("b", 2));
// Null elements forbidden — NPE
// List.of(1, 2, 1); // OK for List (duplicates allowed)
// Set.of(1, 1); // IllegalArgumentException duplicate
List<String> mutable = new ArrayList<>(List.of("a"));
Collections.sort(mutable);
List<String> frozen = Collections.unmodifiableList(mutable);
Collections.synchronizedList does not make compound operations atomic. Always synchronize on the list when iterating: synchronized (list) { for (...) }. Prefer concurrent collections for shared mutable structures.
Sorting & comparison
Comparable embeds the natural order of a type.
Comparator defines external orders—multiple sort keys per class without changing the type.
Design decision
| Use Comparable when | Use Comparator when |
|---|---|
| One obvious default order (String, Integer) | Sort same type different ways (by name vs salary) |
| Type “owns” its ordering contract | You cannot modify the class (third-party type) |
| Used in TreeSet/TreeMap natural ordering | UI-driven sort columns, stream sorted(comparator) |
public record Employee(String name, int salary, LocalDate hired)
implements Comparable<Employee> {
@Override
public int compareTo(Employee o) {
return Integer.compare(this.salary, o.salary);
}
}
Comparator<Employee> byTenure = Comparator
.comparing(Employee::hired)
.thenComparing(Employee::name);
Comparator<Employee> bySalaryDesc = Comparator
.comparingInt(Employee::salary)
.reversed()
.thenComparing(Employee::name, Comparator.nullsLast(String::compareTo));
employees.sort(bySalaryDesc);
Collections.sort(legacyList, byTenure);
List.sort uses TimSort — O(n log n) worst case, stable (preserves equal elements’ order). Stability matters when sorting by one key then another with thenComparing.
Keep compareTo consistent with equals when using sorted sets/maps—or document divergence. If compareTo == 0 but objects not equal, TreeSet holds “duplicates.” Use nullsFirst / nullsLast explicitly.
Iterators
Uniform traversal without exposing internal structure. Know the three iterator types and
when structural modification during iteration triggers ConcurrentModificationException.
| Type | Direction | Capabilities |
|---|---|---|
Iterator | Forward | hasNext, next, optional remove |
ListIterator | Forward & backward | Index, set, add at cursor—List only |
Spliterator | Split for parallel | Streams call trySplit for fork/join; CHARACTERISTICS flags |
// Safe removal during iteration
for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
String s = it.next();
if (s.isBlank()) it.remove();
}
// ListIterator — edit while scanning
ListIterator<String> li = list.listIterator();
while (li.hasNext()) {
if (li.next().equals("obsolete")) {
li.set("replacement");
}
}
// Idiomatic Java 8+
list.removeIf(String::isBlank);
// UNSAFE — fail-fast CME
for (String s : list) {
if (s.isBlank()) list.remove(s);
}
ConcurrentModificationException — why & how to avoid
Most non-concurrent collections track modCount (structural change counter). Iterator remembers expected modCount at creation; mismatch → fail-fast CME (best-effort, not guaranteed in all threads).
- Avoid: mutate collection directly during enhanced-for; use iterator
remove,removeIf, or copy-then-replace - Concurrent collections: weakly consistent iterators—no CME, may or may not see concurrent adds
- Streams: not fail-fast on all sources; still avoid concurrent modification of backing collection during pipeline unless concurrent collection
// ConcurrentHashMap — Iterator on entrySet is weakly consistent
ConcurrentHashMap<String, Long> map = new ConcurrentHashMap<>();
map.put("a", 1L);
for (var entry : map.entrySet()) {
map.merge("b", 1L, Long::sum); // no CME
}
Explain fail-fast vs fail-safe (legacy CopyOnWrite, concurrent weak consistency). Know removeIf and why for-each + list.remove fails. Spliterator: ORDERED, SIZED, DISTINCT flags affect stream performance.