Implement Retry Mechanism - Java Interview Question

Solving Java Interview Questions

Oct 15, 2024

You are designing a service that needs to communicate with an external API, which occasionally fails due to transient network issues. Describe how you would implement a retry mechanism to handle these failures.
Follow-up: explain when you would use a circuit breaker instead of a retry mechanism, and discuss how the two can be implemented together.

Solution

Retry With Exponential Backoff and Jitter

When communicating with an external API that suffers from transient network issues, we should implement a retry mechanism that automatically retries failed requests.
The retry mechanism attempts to resend the request a limited number of times before giving up. To decide how long to wait between retries, we use exponential backoff with jitter. This strategy doubles the backoff time with each retry, and jitter (a random component in the delay) spreads retries from different clients over time, so the external service is not hit by a burst of simultaneous retries.

import java.util.Random;

public class RetryWithExponentialBackoff {
    private static final int MAX_ATTEMPTS = 5;
    private static final long INITIAL_BACKOFF_MILLIS = 1000; // 1 second
    private static final long MAX_BACKOFF_MILLIS = 10000; // 10 seconds
    private static final Random RANDOM = new Random();

    public static void main(String[] args) {
        try {
            retryTask();
            System.out.println("Task completed successfully.");
        } catch (Exception e) {
            System.err.println("Task failed after retries: " + e.getMessage());
        }
    }

    private static void retryTask() throws Exception {
        int attempts = 0;
        while(attempts < MAX_ATTEMPTS){
            try {
                performTask();
                return;
            } catch (Exception e){
                attempts++;
                if(attempts >= MAX_ATTEMPTS){
                    throw new Exception("Max retry reached.", e);
                }

                long backOffTime = calculateBackOffWithJitter(attempts);
                System.err.printf("Attempt %d failed. Retrying in %d ms...%n", attempts, backOffTime);
                try {
                    Thread.sleep(backOffTime);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException("Thread was interrupted during retry delay", ie);
                }
            }
        }
    }

    // Calculate the capped exponential backoff, then apply jitter
    private static long calculateBackOffWithJitter(int attempts) {
        double exponentialBackOff = Math.min(INITIAL_BACKOFF_MILLIS * Math.pow(2, attempts - 1), MAX_BACKOFF_MILLIS);
        return (long) (exponentialBackOff * RANDOM.nextDouble());
    }

    // Task that fails randomly (about 70% of the time) to simulate a transient failure
    private static void performTask(){
        if(Math.random() > 0.7){
            System.out.println("Task succeeded.");
        }else{
            throw new RuntimeException("Task failed.");
        }
    }
}

Explanation

Constants:

MAX_ATTEMPTS: The maximum number of attempts in total, including the first one (5 in this case, so at most 4 retries).
INITIAL_BACKOFF_MILLIS: The initial backoff time in milliseconds (1 second).
MAX_BACKOFF_MILLIS: The maximum backoff time in milliseconds (10 seconds).
RANDOM: A Random object used to generate jitter.

retryTask Method:

Calls performTask() up to MAX_ATTEMPTS times in total.
If an attempt fails and attempts remain, it sleeps for a backoff time calculated using exponential backoff with jitter, then tries again. After the final failed attempt, it throws an exception that wraps the last failure.
The upper bound of the backoff doubles with each failed attempt, and jitter is applied to prevent simultaneous retries from overwhelming the server.

performTask Method:

This method simulates an unreliable task that might fail randomly.
If Math.random() > 0.7, which happens roughly 30% of the time, the task succeeds and the method returns; otherwise, it throws an exception.

calculateBackOffWithJitter Method:

Exponential Backoff Calculation: The wait time is calculated using INITIAL_BACKOFF_MILLIS * 2^(attempt-1). The backoff is capped at MAX_BACKOFF_MILLIS to prevent excessively long wait times.
Jitter Addition: The capped backoff time is then multiplied by a random factor between 0 (inclusive) and 1 (exclusive), obtained from RANDOM.nextDouble(). The actual wait therefore falls anywhere between 0 and the capped backoff, which spreads out retry requests and reduces the risk of overwhelming the external service with simultaneous retries.
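To make the calculation concrete, the sketch below prints the capped backoff and one sampled delay for each attempt number. It reuses the constants from the solution; the class name BackoffSchedule and the helper names cappedBackoff and backoffWithJitter are hypothetical and exist only for this illustration.

```java
import java.util.Random;

// Illustrative sketch: shows the backoff cap and one jittered delay per attempt.
class BackoffSchedule {
    static final long INITIAL_BACKOFF_MILLIS = 1000;
    static final long MAX_BACKOFF_MILLIS = 10000;

    // Upper bound of the wait after the given failed attempt (1-based), before jitter.
    static long cappedBackoff(int attempt) {
        return (long) Math.min(INITIAL_BACKOFF_MILLIS * Math.pow(2, attempt - 1), MAX_BACKOFF_MILLIS);
    }

    // Delay drawn uniformly from [0, cappedBackoff(attempt)).
    static long backoffWithJitter(int attempt, Random random) {
        return (long) (cappedBackoff(attempt) * random.nextDouble());
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int attempt = 1; attempt <= 5; attempt++) {
            System.out.printf("attempt %d: cap %d ms, sampled delay %d ms%n",
                    attempt, cappedBackoff(attempt), backoffWithJitter(attempt, random));
        }
    }
}
```

The caps come out as 1000, 2000, 4000, 8000 and 10000 ms for attempts 1 through 5. Because retryTask gives up on the fifth failure without sleeping, the 10-second cap is never actually reached with MAX_ATTEMPTS set to 5.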

When to Use Circuit Breaker

We should use a circuit breaker when the service we are communicating with is failing consistently rather than occasionally. In such cases, retrying the request would only add unnecessary load and delay recovery.
A circuit breaker works by breaking the connection (opening the circuit) after a predefined number of consecutive failures, so that further requests fail immediately without reaching the service. It then waits for a specified amount of time before allowing a limited number of requests through to check whether the external service has recovered.
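The behaviour described above can be sketched as a small state machine. This is a minimal illustration rather than a production implementation: the class name SimpleCircuitBreaker, its method names, and the injected clock are all assumptions made for this example, and for simplicity the trial phase lets requests through until the first success or failure is recorded instead of counting a fixed number of trial requests.

```java
import java.util.function.LongSupplier;

// Hypothetical minimal circuit breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED/OPEN.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures that open the circuit
    private final long openMillis;      // how long to stay open before a trial request
    private final LongSupplier clock;   // time source in milliseconds (injectable for tests)
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    // Callers ask this before sending a request.
    synchronized boolean allowRequest() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN; // wait time elapsed: let trial requests through
        }
        return state != State.OPEN;
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    synchronized void recordFailure() {
        consecutiveFailures++;
        // A failed trial request, or too many consecutive failures, opens the circuit.
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }

    synchronized State state() {
        return state;
    }
}
```

A caller would check allowRequest() before each call to the external API, then report the outcome with recordSuccess() or recordFailure(). In real code, System::currentTimeMillis could be passed as the clock.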

Circuit Breaker With Retry Mechanism

Combining a circuit breaker with a retry mechanism is a common pattern for improving the resilience of applications that interact with external services. The circuit breaker stops requests from being made while a service is consistently failing, while the retry mechanism handles transient failures by attempting the request again.
When communicating with an external API, we start by attempting the request with a retry mechanism. If a failure occurs, we retry after a delay (using strategies like exponential backoff and jitter) to overcome transient issues.
We then wrap the entire retry logic within a circuit breaker. The circuit breaker monitors failures across retry attempts, and if a specified failure threshold is exceeded, it opens and blocks subsequent requests for a period of time.
While the circuit breaker is open, no retries are attempted for the configured duration, which avoids overwhelming the external service.
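One possible way to wire the two together is sketched below. It is an illustration under several assumptions: the class name ResilientCaller and its parameters are hypothetical, the breaker counts a call as failed only after all of its retries are exhausted (counting every individual attempt is an equally valid design), and the class is not thread-safe.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

// Hypothetical sketch: retry with backoff and jitter, guarded by a simple breaker.
class ResilientCaller {
    private final int maxAttempts;
    private final long initialBackoffMillis;
    private final long maxBackoffMillis;
    private final int failureThreshold; // failed calls (after retries) before opening
    private final long openMillis;      // how long the breaker stays open
    private int failedCalls = 0;
    private long openUntil = 0;

    ResilientCaller(int maxAttempts, long initialBackoffMillis, long maxBackoffMillis,
                    int failureThreshold, long openMillis) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.maxAttempts = maxAttempts;
        this.initialBackoffMillis = initialBackoffMillis;
        this.maxBackoffMillis = maxBackoffMillis;
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    boolean isOpen() {
        return System.currentTimeMillis() < openUntil;
    }

    <T> T call(Supplier<T> task) {
        if (isOpen()) {
            // Fail fast: no attempt and no retry reaches the external service.
            throw new IllegalStateException("Circuit open: request rejected");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T result = task.get();
                failedCalls = 0; // a success resets the failure streak
                return result;
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) sleepWithJitter(attempt);
            }
        }
        // All attempts failed: count it, and open the circuit at the threshold.
        if (++failedCalls >= failureThreshold) {
            openUntil = System.currentTimeMillis() + openMillis;
            failedCalls = 0;
        }
        throw last;
    }

    private void sleepWithJitter(int attempt) {
        long cap = (long) Math.min(initialBackoffMillis * Math.pow(2, attempt - 1), maxBackoffMillis);
        try {
            Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1)); // uniform in [0, cap]
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted during backoff", ie);
        }
    }
}
```

For example, new ResilientCaller(5, 1000, 10000, 3, 30000) would retry each call up to five times and, after three calls in a row have exhausted their retries, reject further calls for 30 seconds. Once that period ends, the next call is let through and acts as the recovery check. In practice, a library such as Resilience4j provides hardened versions of both patterns.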
