Spring Data JPA: deleteAllInBatch*() Methods Explained!
Internal code exploration with sample dataset, examples, and more.
Introduction
Spring Data JPA provides a lot of handy features to interact with databases for all kinds of operations.
One of the much-used operations is the deletion of the records. But deleting large numbers of records one by one is not resource efficient, hence doing it them in batches makes much sense.
Spring Data JPA provides helper methods that allow us to delete records in batches. In this article, we will deep dive into them, we will explore the internal code of these methods, and write logic along with the execution.
Sample Table/Model
First of all, let's create a sample table Campaign with some mock data.
Inserting Sample Data
Now let's insert 1M records to the Campaign table. The below script (plpgsql) does the job.
Now our table contains 1M records that we can delete from Spring Data JPA.
postgres=# select count(*) from campaign;
count
---------
1000000
(1 row)
JpaRepository
JpaRepository is one of the common repositories that we extend when we define our repository interface for managing our entities.
JpaRepository provides 3 helper methods that can delete records in batches:
1. deleteAllInBatch()
2. deleteAllInBatch(Iterable<T> entities)
3. deleteAllByIdInBatch(Iterable<ID> ids)Since JpaRepository is an interface, its implementation exists in the SimpleJpaRepository class.
Now let's deep dive into these methods and write some code along with execution.
deleteAllInBatch()
This method deletes all the records in the database in one batch.
If we look at the internal implementation of this method in the SimpleJpaRepository.java class, we see that all it's doing is getting a delete query, adding query hints if it's passed, and then applying executedUpdate().
If we use this method in our logic class, we just need to extend JpaRepository to campaignRepository and we will get access to deleteAllInBatch() since this method is an interface method and implemented by SimpleJpaRepository.java.
public interface CampaignRepository extends JpaRepository<Campaign, Long> { }
Below are the generated logs for our logic execution. As we can see the generated query is “delete from campaign” .
One thing we should keep in mind is that this method removes all the records from the database in one batch, so if we have a large table it might take too much of the time to finish the operation. In that case, the truncate operation may be faster to implement.
This method is not useful if we want to delete certain batches of records but not all.
If we want to delete certain batches of records, then we can use deleteAllInBatch(Iterable<T> entities) and deleteAllByIdInBatch(Iterable<ID> ids) methods.
deleteAllInBatch(Iterable<T> entities)
deleteAllInBatch(Iterable<T> entities) takes a collection of entities that we want to delete and builds a delete query for that many records.
If we see the internal code of the method, we will see the base query is “delete from %s x” but the applyAndBind method builds another part of the query that includes the target ids of the query.
Now we can write a simple method to delete a batch of entities. In the below example, we first query certain campaigns and then pass them to deleteAllInBatch(campaign) to delete those many campaigns.
Generated logs for the above method are listed below. As we can see deleteAllInBatch(campaign) generates a query following the query
“delete from campaign where id=? or id=? or id=?”Here ids are the ids of the campaign entities that we passed.
deleteAllInBatch(Iterable<T> entities) is a useful method when we want to delete certain batches of records. If we have to delete let's say 500k records then we can certainly build sub batches and loop over them and delete 10k records or 100k records at a time so that we don’t overwhelm our database.
deleteAllByIdInBatch(Iterable<ID> ids)
Like deleteAllInBatch(Iterable<T> entities), we can also use deleteAllByIdInBatch(Iterable<ID> ids) which takes a collection of IDs instead of entities but does the same job.
This method internally calls to deleteAllInBatch(Iterable<T> entities). Below is the internal code implementation for deleteAllByIdInBatch(Iterable<ID> ids).
Our simple logic queries the database for a range of campaigns and gets the IDs and targets for deletion by passing them to the deleteAllByIdInBatch(ids).
Generated logs show that the below query was executed.
“delete from campaign where id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)”If we have to delete large batch-size records then we can always paginate them and delete them in sub-batches instead of sending let's say 500k records in one query.
Conclusion
We should always keep in mind that operating over batches instead of doing all the work in one go makes sense for databases.
Spring Data JPA provides multiple helper methods that can delete records from the database. Depending on the use case we can choose one or the other.
Apart from using these helper methods, we can always write our own custom queries since Spring Data JPA allows that too.