Spring Data Interview Question : Efficient Keyset Pagination with Spring Data WindowIterator

Pagination, Optimization, Concept, Code Examples and more ...

Suraj Mishra

Mar 11, 2026

Scenario

You’re building a Spring Boot API to fetch comments for a post.

The table has tens of millions of rows, and users can scroll deeply into the history.

Initial Implementation:

Problem Observed in Production

As users scroll deeper:

Queries slow down dramatically
DB CPU rises
Response times become inconsistent

Upon investigation, we found that the actual executed query is somewhat like below:

Why offset pagination is inefficient

Offset pagination becomes inefficient because the database must scan and discard rows before returning the requested page.

What OFFSET actually does is:

SELECT * 
FROM post_comment
ORDER BY created_on DESC
OFFSET 100000 LIMIT 20;

The database does not jump to row 100000.

Instead it:

Reads rows in sorted order
Counts up to 100000
Discards them
Returns the next 20

So it still processes 100020 rows to serve that request.

Why does performance worsen with deeper pages?


page 1 → 20 rows scanned
page 100 → 2000 rows scanned
page 10000 → 200000 rows scanned
page 100000 → 1M rows scanned

This leads to:

More disk reads
More CPU sorting work
Larger temporary buffers
Higher query latency

So latency increases linearly with page depth.

📢 Get actionable Java/Spring Boot insights every week — from practical code tips to real-world use-case based interview questions.

Join 6200+ subscribers and level up your Spring & backend skills with hand-crafted content — no fluff.

🎯 Early Supporter Offer

The first 100 paid subscribers get the annual membership at $50/year.
👉 70 already joined — only 30 spots left.

So far we have covered 48+ real world based interview questions and will add up to 100 by end of this year.

Not convinced? Check out the details of the past work

So If offset does not scale well, What do you think we should use to improve the performance at scale?

Standard production solution is keyset pagination (a.k.a. seek pagination).

Instead of saying:

“Give me page 500”

We say:

“Give me rows after this last seen record”

This lets the database jump directly to the next rows using the index, instead of scanning and discarding earlier ones.

Example:

SELECT *
FROM post_comment
ORDER BY created_on DESC, id DESC
OFFSET 100000 LIMIT 20;

We do this:

SELECT *
FROM post_comment
WHERE (created_on, id) < (:lastCreatedOn, :lastId)
ORDER BY created_on DESC, id DESC
LIMIT 20;

Now the DB:

Uses the index to seek to the position
Reads only the next 20 rows
No scanning or discarding needed

Performance becomes O(page_size) instead of O(offset).

Why this scales

Keyset pagination:

Uses index seeks instead of scans
Has constant query cost regardless of page depth
Works well with millions of rows
Avoids duplicate/shifted results during inserts

This is why large systems use it for:

Social feeds
Comments
Activity timelines
Messaging history

How will you refactor the code to use KeySet based pagination ?

@Service
public class CommentService {

    private final PostCommentRepository repository;

    public CommentService(PostCommentRepository repository) {
        this.repository = repository;
    }

    public List<PostComment> streamAllComments(int batchSize) {

        WindowIterator<PostComment> iterator =
                WindowIterator.of(position ->
                        repository.findAllBy(
                                PageRequest.of(
                                        0,
                                        batchSize,
                                        Sort.by(
                                                Sort.Order.desc("createdOn"),
                                                Sort.Order.desc("id")
                                        )
                                ),
                                position
                        )
                ).startingAt(ScrollPosition.keyset());

        List<PostComment> results = new ArrayList<>();

        iterator.forEachRemaining(results::add);

        return results;
    }
}

What this does:

Fetches rows in windows (batches)
Uses keyset pagination automatically
Keeps memory usage small
No OFFSET anywhere
No manual cursor management

Endpoint Example

@RestController
@RequestMapping("/comments")
public class CommentController {

    private final CommentService service;

    public CommentController(CommentService service) {
        this.service = service;
    }

    @GetMapping("/stream")
    public List<PostComment> streamAll() {
        return service.streamAllComments(100);
    }
}

This endpoint:

Iterates over millions of rows safely
Loads only small windows at a time
Uses indexed seeks under the hood

Seems like we rely on sorting of the column on which we do filter. What if don’t have index on that column?

Keyset pagination relies on seeking directly to a position instead of scanning rows.

The query uses WHERE (created_on, id) < (:lastCreatedOn, :lastId)
The database can only efficiently jump to this position if there’s an index on the columns used in the WHERE + ORDER BY clause.

Without it:

The database has to scan all rows to evaluate the condition
Performance degrades to the same problem as OFFSET, especially on large tables

What if we need to access random page ? Will Keyset pagination help us ?

keyset pagination (also called cursor-based pagination) does not support random page access, and this is one of its fundamental trade-offs.

Keyset pagination (seek pagination):

Fetches the next page based on the last row’s sort values (created_on, id)
Works extremely efficiently even with millions of rows
Cannot jump to an arbitrary page number directly
Designed for infinite scroll or streaming

Problem with random pages:

Client asks for page 1000
Keyset needs a “cursor” pointing to the start of that page
Without storing all cursors, the database has no shortcut → cannot jump
OFFSET could work here, but at scale it’s slow

if Random Page Access is Required

We can use hybrid approach:
- Use keyset for scrolling / most users
- Allow OFFSET only for rare “jump to page” requests
Precompute cursors
- Store mapping of page numbers → keyset values
- Works for reporting dashboards, but adds storage/complexity

📚 Helpful Resources (5)

Before you leave, don’t forget to unlock all the real world use case based interview questions , just at $4/mo (with annual subscription).

Subscribe | Sponsor us | LinkedIn | Twitter

Happy Coding 🚀
Suraj

Engineering With Java

Discussion about this post

Ready for more?