Spring Data JPA: Replace multiple queries with a single query
Understanding query improvement tricks as a software engineer
🚀 Java + Spring Boot + SQL Interview + Certification Prep 🚀
🎯 Perfect for aspiring developers, job switchers & interview prep warriors!🔥 What’s Inside (PDF format):
Grokking the Java Interview (Vol 1 & 2)
Grokking the Spring Boot Interview
Grokking the SQL Interview
Java SE 17 Developer Certification (1Z0-829) Guide
250+ Spring Framework Certification Practice Questions
Introduction
As software developers, we usually tend to lack database knowledge and how to be efficient while executing database queries. ORM frameworks were built so that they can bring that abstraction where developers do not need to understand databases in detail.
But in reality, we are often interrupted by a database administrator where feedback is to optimize the query.
In this article, we will see such an opportunity where we will get a quick win on our naive query.
Use Case
Our marketing users perform ad-hoc data analysis and have created a list of target users to whom they want to send an email about the upcoming campaigns. They have uploaded this file to a remote server using UI.
There is a scheduler that runs sendEmail job, this job is a spring boot rest endpoint that is invoked by cron.
Before sending an email to a customer, this job checks if the user_email exists in the database ( to avoid sending emails to the non_customer or customer who is not in the OLTP database or not subscribed to get an email marketing campaign).
Then we finally send an email to the customer who exists in the database and a file uploaded by the marketing team both.
Setup
We will set up a simple spring boot app with postgres as a database. Please follow this article to set up one.
Entity
Basically, we have one table account in the database, our entity looks like the one below.
Controller
The controller is simple to get an endpoint that invokes the email sender logic.
Here the response is void but in reality, we can return a boolean response about success or failure.
Repository
In the repository interface, we have added one method to check if the given email exists in the table or not.
All we have to do is to define the method that’s it, no implementation is needed, spring data JPA takes care of the implementation internally.
Service
We have created a notificationLogic class which consists of sendEmail method. We are calling another private method that gets the list of emails that contains in the file and database both.
Additionally, we are also measuring the total time it took to execute this method so that we can compare it with another approach for comparison.
In the getEmailExistsInDB method, we are assuming that after reading the remote server file we have an immutable list of email addresses to the marketer who wants to send emails.
Now, before sending an email we would like to check if this email exists in the OLTP system, hence we iterate over each email address and execute a query on the OLTP database if they exist then we add them to the result in the list, and Finally, we return the list.
Execution
Now, let's execute the rest endpoint by using curl.
curl localhost:8080/accounts/sendEmail
In order to see the executed query in logs, add the below property in the applications.properties file.
spring.jpa.show-sql=true=
Once we execute the query, we see the below logs, which clearly shows that it executes 4 query check for 4 emails. If the number of emails is larger we have to make more database roundtrips. It's clearly inefficient.
Improvement
Now one of the modifications we can make to our query is to pass the list of email addresses as a parameter to the SQL query and use IN operator to check if the email exists in the table.
By doing so we save multiple round trips to the database and get a performance win, since now we just have to execute the query once.
Repository
We can use @query to write our custom query in spring data jpa. We are passing our input list as params and the output will be all the matches that we got from the database. All in a single query.
Service
Now let's modify our send email method and now call the improvised getEmailExistsInDB method.
getEmailExistsInDB method now passes the entire email list to the repository method emailExisits and gets the matching emailAddress from the database.
Execution
Now let's see the improvement by executing the query. Let's invoke the controller by typing the below curl command.
curl localhost:8080/accounts/sendEmail
We can see that the total time it took is 9 ms compared to 126 ms in the previous case. It's a significant improvement.
Conclusion
In this article, we learned how we can improve our JPA query as software developers and improve significant performance by reducing database roundtrips.
We replace executing the DB query for each record with passing the entire list in a single query by using a custom query in JPA.
As software engineers, we should understand how particular queries will impact the database and try to seek some help from database experts when necessary.