Java Streams Side Effects (Interview Questions)
Sharing 4 Interesting Java Interview Questions
Consider becoming a free or paid subscriber and support the work :)
I am running a Black Friday discount of 40% on our annual subscription, which brings the Java newsletter's paid subscription to $1.6/mo ($30 annually) from $50. If you would like to access premium content, grab this discount before 6th December.
1. ParallelStream returns missing/duplicated elements
Why does the code below produce missing or duplicated integers?
How would you fix it without removing parallelism?
public static void main(String[] args) {
    List<Integer> out = new ArrayList<>();
    IntStream.rangeClosed(1, 100)
        .parallel()
        .forEach(out::add);
}

Explanation:
ArrayList is not thread-safe. Parallel threads race on:
internal resize,
index increments,
writes to the same backing array.
This causes:
lost writes,
duplicated writes,
ArrayIndexOutOfBoundsException under heavy load.
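The failure is easier to reproduce with a larger range. The sketch below (range size is my choice, not from the question) typically shows lost writes, and may also trigger the corruption exception:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class RacyForEachDemo {
    public static void main(String[] args) {
        List<Integer> out = new ArrayList<>();
        try {
            // Unsafe: many ForkJoin threads call add() on one ArrayList.
            IntStream.rangeClosed(1, 100_000).parallel().forEach(out::add);
        } catch (RuntimeException e) {
            // Racing resizes can corrupt the backing array and throw
            // ArrayIndexOutOfBoundsException.
        }
        // Usually prints false: lost writes leave fewer than 100_000 elements.
        System.out.println(out.size() == 100_000);
    }
}
```

The result is nondeterministic by nature, which is exactly why this bug is dangerous: a small range like 1–100 may pass in tests and fail only under production load.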
Fix:
Option 1:
This option is preferred because it has:
no shared mutable state,
no locks,
no contention,
and it uses parallel streams exactly as they are designed to be used.
// use a parallel stream with the toList() collector
var out1 = IntStream.rangeClosed(1, 100)
    .parallel()
    .boxed()
    .toList();
System.out.println(out1.size() == 100); // true

Option 2:
This option has drawbacks:
Shared mutable state across all threads.
Every add() requires acquiring a monitor lock, which causes severe contention.
Scaling collapses as the thread count increases.
// use a synchronizedList
List<Integer> out2 = Collections.synchronizedList(new ArrayList<>());
IntStream.rangeClosed(1, 100)
    .parallel()
    .forEach(out2::add);
System.out.println(out2.size() == 100); // true

2. Counter modified inside forEach gives incorrect totals
The code below works with a sequential stream, but becomes subtly dangerous if someone changes an earlier part of the pipeline to a parallel stream. Why?
AtomicLong total = new AtomicLong();
invoices.stream().forEach(i -> total.addAndGet(i.amount()));
invoices.parallelStream()...

Explanation:
Streams guarantee correct behavior only if the mapping/reduction steps are side-effect-free.
We have:
Shared mutable state (AtomicLong)
Parallel execution
Even with AtomicLong, this can produce incorrect totals in production billing systems.
When we mutate the shared state inside forEach, the stream framework:
Cannot optimize
Cannot reorder safely
Cannot parallelize correctly
May drop operations when cancellation happens (e.g., findFirst, limit)
So we get missed increments or double increments in rare edge cases.
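The short-circuiting case is deterministic enough to demonstrate. In this sketch (the Invoice record and sample amounts are assumptions for illustration), a later edit adds anyMatch, and the shared counter silently becomes a partial sum:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class SideEffectTotalDemo {
    record Invoice(long amount) {}

    public static void main(String[] args) {
        List<Invoice> invoices = List.of(
                new Invoice(10), new Invoice(20), new Invoice(30), new Invoice(40));
        AtomicLong total = new AtomicLong();

        // Someone later adds a short-circuiting terminal operation:
        // the side effect now runs only for elements pulled before the match.
        invoices.stream()
                .peek(i -> total.addAndGet(i.amount()))
                .anyMatch(i -> i.amount() >= 20);

        System.out.println(total.get()); // 30, not 100: traversal stopped early
    }
}
```

The increment looks safe because AtomicLong is thread-safe, but atomicity does not protect against the stream simply not visiting every element.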
If parallel is desired, we can do the following:
long total = invoices.parallelStream()
    .mapToLong(Invoice::amount)
    .sum();

3. Updating the database inside map() causes inconsistent writes
Question:
What subtle bug can occur here related to lazy evaluation or short-circuiting?
Where should DB calls appear in a stream pipeline?
orders.stream()
    .map(o -> { repo.update(o.id); return o; })
    .map(this::process)
    .forEach(repo::save);

Explanation:
map() is a lazy intermediate operation. It runs only when downstream operations demand its value.
This means:
If someone adds .limit(10), then only the first 10 updates run.
If someone later uses .findFirst() or .anyMatch(), updates for the rest of the elements never execute.
If exceptions occur later in the pipeline, some updates have already happened → partial writes.
If the stream ever becomes parallel, map() may be:
executed out of order
executed more than once (task retries)
skipped due to cancellation
All of these lead to an incomplete or inconsistent database state. In short, a database write hidden inside map() does not have the execution guarantees you expect.
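The limit case can be shown deterministically. In this sketch, an AtomicInteger stands in for the repo.update calls (the sample data is an assumption); adding limit(2) downstream means the side effect in map() runs for only the elements actually pulled through the pipeline:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyMapDemo {
    public static void main(String[] args) {
        List<String> orders = List.of("a", "b", "c", "d", "e");
        AtomicInteger updates = new AtomicInteger(); // stand-in for repo.update

        // map() is lazy: with limit(2) downstream, only two elements
        // are pulled through it, so only two "updates" ever execute.
        List<String> processed = orders.stream()
                .map(o -> { updates.incrementAndGet(); return o; })
                .limit(2)
                .toList();

        System.out.println(updates.get()); // 2, not 5: three "updates" never ran
    }
}
```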
Where should DB calls appear in a stream pipeline?
In the terminal operation, not inside map(), filter(), flatMap(), etc.
orders.stream()
    .map(this::process)
    .toList() // materialize results
    .forEach(o -> repo.update(o.id));

4. Accidental memory leak from capturing a large object in a lambda
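The original snippet for this question is not reproduced here; a minimal reconstruction of the pattern being discussed might look like the sketch below. The ReportData record, the debug helper, and the asyncBuffer queue are all stand-ins I am assuming to model an async logging framework:

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import java.util.function.Supplier;
import java.util.stream.IntStream;

public class LambdaCaptureDemo {
    record ReportData(int id) {}

    // Stand-in for an async appender's ring buffer: deferred messages are
    // queued here and formatted later, long after the stream finishes.
    static final Queue<Supplier<String>> asyncBuffer = new ArrayDeque<>();

    static void debug(Supplier<String> msg) { asyncBuffer.add(msg); }

    public static void main(String[] args) {
        List<ReportData> report = IntStream.range(0, 100_000)
                .mapToObj(ReportData::new)
                .toList();

        // The lambda d -> debug(...) references report.size(), so it
        // captures `report` itself, not just the int it needs.
        report.stream()
                .limit(1)
                .forEach(d -> debug(() -> "row " + d.id() + " of " + report.size()));

        // While the buffer holds the message lambda, the whole
        // 100_000-element list stays reachable and cannot be collected.
        System.out.println(asyncBuffer.poll().get());
    }
}
```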
Inside the lambda d -> log.debug(...):
The lambda captures the outer variable report because it references report.size().
This captured reference becomes part of a synthetic class instance created by the JVM to represent the lambda.
That lambda instance gets stored inside:
the Stream pipeline,
the forEach consumer,
and most importantly the logging framework’s async buffer.
Why report stays in memory too long
If your logger (Log4j2, Logback, SLF4J) uses:
asynchronous appenders
ring buffers
message formatting queues
deferred string evaluation
Then the lambda or the formatted message object is queued and held long after the stream finishes.
Because the lambda contains a reference to report, the entire list—sometimes hundreds of thousands of objects—cannot be garbage collected.
So a simple debug log line ends up keeping:
Lambda -> reference to report -> entire List<ReportData> -> thousands of ReportData rows

This results in:
high memory spikes
long GC pauses
out-of-memory on larger batch sizes
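One hedged fix, continuing the stand-in names from above, is to copy out the primitive before building the lambda, so the lambda's synthetic class holds a single int field instead of a reference to the whole list:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Supplier;
import java.util.List;
import java.util.stream.IntStream;

public class LambdaCaptureFix {
    record ReportData(int id) {}

    // Same stand-in for an async appender's buffer as before.
    static final Queue<Supplier<String>> asyncBuffer = new ArrayDeque<>();

    public static void main(String[] args) {
        List<ReportData> report = IntStream.range(0, 100_000)
                .mapToObj(ReportData::new)
                .toList();

        // Capture only the int, not the list: the deferred message no
        // longer pins the 100_000 ReportData objects.
        int size = report.size();
        asyncBuffer.add(() -> "processed batch of " + size + " rows");

        report = null; // the list is now eligible for GC even while the
                       // message waits in the async buffer
        System.out.println(asyncBuffer.poll().get());
    }
}
```

The same idea applies to any captured field: pre-compute the small value the log line needs, and let the large object go out of scope.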