Java Streams Side Effects (Interview Questions)
Sharing 4 Interesting Java Interview Questions
Consider becoming a free or paid subscriber and support the work :)
I am running a Black Friday discount of 40% on our annual subscription, which brings the Java newsletter's paid subscription to $1.6/mo ($30 annually) from $50. If you would like to access premium content, grab this discount before 6th December.
1. ParallelStream returns missing/duplicated elements
Why does the code below produce missing or duplicated integers?
How would you fix it without removing parallelism?
public static void main(String[] args) {
    List<Integer> out = new ArrayList<>();
    IntStream.rangeClosed(1, 100)
        .parallel()
        .forEach(out::add);
}

Explanation:
ArrayList is not thread-safe. Parallel threads race on:
internal resize,
index increments,
writes to the same backing array.
This causes:
lost writes,
duplicated writes,
ArrayIndexOutOfBoundsException under heavy load.
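The failure is easier to reproduce with a larger range. The sketch below (range size is my choice, not from the question) typically shows lost writes, and may also trigger the corruption exception:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class RacyForEachDemo {
    public static void main(String[] args) {
        List<Integer> out = new ArrayList<>();
        try {
            // Unsafe: many ForkJoin threads call add() on one ArrayList.
            IntStream.rangeClosed(1, 100_000).parallel().forEach(out::add);
        } catch (RuntimeException e) {
            // Racing resizes can corrupt the backing array and throw
            // ArrayIndexOutOfBoundsException.
        }
        // Usually prints false: lost writes leave fewer than 100_000 elements.
        System.out.println(out.size() == 100_000);
    }
}
```

The result is nondeterministic by nature, which is exactly why this bug is dangerous: a small range like 1–100 may pass in tests and fail only under production load.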
Fix:
Option 1:
This option is preferred because it has:
no shared mutable state,
no locks,
no contention,
and it uses parallel streams exactly as they are designed to be used.
// use a parallel stream with the toList() collector
var out1 = IntStream.rangeClosed(1, 100)
    .parallel()
    .boxed()
    .toList();
System.out.println(out1.size() == 100); // true

Option 2:
This option has drawbacks:
Shared mutable state across all threads.
Every add() requires acquiring a monitor lock, which causes severe contention.
Scaling collapses as the thread count increases.
// use a synchronizedList
List<Integer> out2 = Collections.synchronizedList(new ArrayList<>());
IntStream.rangeClosed(1, 100)
    .parallel()
    .forEach(out2::add);
System.out.println(out2.size() == 100); // true

2. Counter modified inside forEach gives incorrect totals
The code below works with a sequential stream, but becomes subtly dangerous if someone changes an earlier part of the pipeline to a parallel stream. Why?
AtomicLong total = new AtomicLong();
invoices.stream().forEach(i -> total.addAndGet(i.amount()));
invoices.parallelStream()...

Explanation:
Streams guarantee correct behavior only if the mapping/reduction steps are side-effect-free.
We have:
Shared mutable state (AtomicLong)
Parallel execution
Even with AtomicLong, this can produce incorrect totals in production billing systems.
When we mutate the shared state inside forEach, the stream framework:
Cannot optimize
Cannot reorder safely
Cannot parallelize correctly
May drop operations when cancellation happens (e.g., findFirst, limit)
So we get missed increments or double increments in rare edge cases.
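The short-circuiting case is deterministic enough to demonstrate. In this sketch (the Invoice record and sample amounts are assumptions for illustration), a later edit adds anyMatch, and the shared counter silently becomes a partial sum:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class SideEffectTotalDemo {
    record Invoice(long amount) {}

    public static void main(String[] args) {
        List<Invoice> invoices = List.of(
                new Invoice(10), new Invoice(20), new Invoice(30), new Invoice(40));
        AtomicLong total = new AtomicLong();

        // Someone later adds a short-circuiting terminal operation:
        // the side effect now runs only for elements pulled before the match.
        invoices.stream()
                .peek(i -> total.addAndGet(i.amount()))
                .anyMatch(i -> i.amount() >= 20);

        System.out.println(total.get()); // 30, not 100: traversal stopped early
    }
}
```

The increment looks safe because AtomicLong is thread-safe, but atomicity does not protect against the stream simply not visiting every element.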
If parallel is desired, we can do the following:
long total = invoices.parallelStream()
    .mapToLong(Invoice::amount)
    .sum();

3. Updating the database inside map() causes inconsistent writes
Question:
What subtle bug can occur here related to lazy evaluation or short-circuiting?
Where should DB calls appear in a stream pipeline?
orders.stream()
    .map(o -> { repo.update(o.id); return o; })
    .map(this::process)
    .forEach(repo::save);

Explanation:
map() is a lazy intermediate operation. It runs only when downstream operations demand its value.
This means:
If someone adds .limit(10), then only the first 10 updates run.
If someone later uses .findFirst() or .anyMatch(), updates for the rest of the elements never execute.
If exceptions occur later in the pipeline, some updates have already happened → partial writes.
If the stream ever becomes parallel, map() may be:
executed out of order
executed more than once (task retries)
skipped due to cancellation
All of these lead to an incomplete or inconsistent database state. In short, a database write hidden inside map() does not have the execution guarantees you expect.
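The limit case can be shown deterministically. In this sketch, an AtomicInteger stands in for the repo.update calls (the sample data is an assumption); adding limit(2) downstream means the side effect in map() runs for only the elements actually pulled through the pipeline:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyMapDemo {
    public static void main(String[] args) {
        List<String> orders = List.of("a", "b", "c", "d", "e");
        AtomicInteger updates = new AtomicInteger(); // stand-in for repo.update

        // map() is lazy: with limit(2) downstream, only two elements
        // are pulled through it, so only two "updates" ever execute.
        List<String> processed = orders.stream()
                .map(o -> { updates.incrementAndGet(); return o; })
                .limit(2)
                .toList();

        System.out.println(updates.get()); // 2, not 5: three "updates" never ran
    }
}
```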
Where should DB calls appear in a stream pipeline?
In the terminal operation, not inside map(), filter(), flatMap(), etc.
orders.stream()
    .map(this::process)
    .toList() // materialize results
    .forEach(o -> repo.update(o.id));

4. Accidental memory leak from capturing a large object in a lambda
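The original snippet for this question is not reproduced here; a minimal reconstruction of the pattern being discussed might look like the sketch below. The ReportData record, the debug helper, and the asyncBuffer queue are all stand-ins I am assuming to model an async logging framework:

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import java.util.function.Supplier;
import java.util.stream.IntStream;

public class LambdaCaptureDemo {
    record ReportData(int id) {}

    // Stand-in for an async appender's ring buffer: deferred messages are
    // queued here and formatted later, long after the stream finishes.
    static final Queue<Supplier<String>> asyncBuffer = new ArrayDeque<>();

    static void debug(Supplier<String> msg) { asyncBuffer.add(msg); }

    public static void main(String[] args) {
        List<ReportData> report = IntStream.range(0, 100_000)
                .mapToObj(ReportData::new)
                .toList();

        // The lambda d -> debug(...) references report.size(), so it
        // captures `report` itself, not just the int it needs.
        report.stream()
                .limit(1)
                .forEach(d -> debug(() -> "row " + d.id() + " of " + report.size()));

        // While the buffer holds the message lambda, the whole
        // 100_000-element list stays reachable and cannot be collected.
        System.out.println(asyncBuffer.poll().get());
    }
}
```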
Inside the lambda d -> log.debug(...):
The lambda captures the outer variable report because it references report.size().
This captured reference becomes part of a synthetic class instance created by the JVM to represent the lambda.
That lambda instance gets stored inside:
the Stream pipeline,
the forEach consumer,
and most importantly the logging framework’s async buffer.
Why report stays in memory too long
If your logger (Log4j2, Logback, SLF4J) uses:
asynchronous appenders
ring buffers
message formatting queues
deferred string evaluation
Then the lambda or the formatted message object is queued and held long after the stream finishes.
Because the lambda contains a reference to report, the entire list—sometimes hundreds of thousands of objects—cannot be garbage collected.
So a simple debug log line ends up keeping:
Lambda -> reference to report -> entire List<ReportData> -> thousands of ReportData rows

This results in:
high memory spikes
long GC pauses
out-of-memory on larger batch sizes
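One hedged fix, continuing the stand-in names from above, is to copy out the primitive before building the lambda, so the lambda's synthetic class holds a single int field instead of a reference to the whole list:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Supplier;
import java.util.List;
import java.util.stream.IntStream;

public class LambdaCaptureFix {
    record ReportData(int id) {}

    // Same stand-in for an async appender's buffer as before.
    static final Queue<Supplier<String>> asyncBuffer = new ArrayDeque<>();

    public static void main(String[] args) {
        List<ReportData> report = IntStream.range(0, 100_000)
                .mapToObj(ReportData::new)
                .toList();

        // Capture only the int, not the list: the deferred message no
        // longer pins the 100_000 ReportData objects.
        int size = report.size();
        asyncBuffer.add(() -> "processed batch of " + size + " rows");

        report = null; // the list is now eligible for GC even while the
                       // message waits in the async buffer
        System.out.println(asyncBuffer.poll().get());
    }
}
```

The same idea applies to any captured field: pre-compute the small value the log line needs, and let the large object go out of scope.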