Modern AI makes async code deceptively easy to scaffold. You describe what you want: "Process this batch of data asynchronously without blocking the main request," and within seconds you have a working implementation. An @Async annotation on a service method, a few event listeners wired up, everything compiling and running locally. The code looks right. It passes tests and deploys without a fuss.
Then, under real load, the system fails in ways that local testing never revealed.
The lie is that async operations are safe by default.
They are not. Async operations without concurrency control are a way to move a problem from "visible now" to "invisible until production." You are not solving the problem of too much work. You are deferring it to a thread pool where it will exhaust resources silently.
The uncomfortable truth is that modern AI can generate async patterns perfectly well, but it has no concept of backpressure, resource limits, or when async becomes a liability instead of a benefit. That judgment—knowing when to async, how much concurrency is safe, and what guards need to be in place—is where engineering discipline actually matters. It is also exactly where AI-generated code fails most often.
The Problem: Async Operations Without Limits
When you mark a method @Async, Spring removes it from the normal request-response cycle and runs it on a thread pool. This solves the immediate problem: the request completes faster, the HTTP response goes back to the client immediately, and expensive work happens in the background.
But you have created a new problem. That background work still consumes resources. It still hits the database, it still uses memory. And if the work is queued faster than it can be processed, the queue grows indefinitely.
Consider a scenario where you async-publish events for every transaction sync. The sync listener runs on a thread pool, fetches data from an external API, and updates the database. Locally, with a handful of transactions, this works fine: the listener completes before the next transaction arrives. In production, with thousands of transactions arriving every minute, the queue explodes. The thread pool backs up, new tasks queue indefinitely, memory pressure climbs, and the server slows down until it becomes unresponsive.
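To make the shape concrete, here is a minimal sketch of such a listener; the class, event, and collaborator names are hypothetical. With typical Spring Boot defaults, the @Async task queue is effectively unbounded, so every event adds background work with no ceiling:

    import org.springframework.context.event.EventListener;
    import org.springframework.scheduling.annotation.Async;
    import org.springframework.stereotype.Component;

    @Component
    public class TransactionSyncListener {

        private final ExternalApiClient externalApi;    // hypothetical API client
        private final TransactionRepository repository; // hypothetical repository

        public TransactionSyncListener(ExternalApiClient externalApi,
                                       TransactionRepository repository) {
            this.externalApi = externalApi;
            this.repository = repository;
        }

        // Runs on a thread pool; nothing here limits how many of these
        // tasks may run or wait at once.
        @Async
        @EventListener
        public void onTransactionSynced(TransactionSyncedEvent event) {
            var details = externalApi.fetchDetails(event.transactionId()); // external API call
            repository.update(event.transactionId(), details);             // database write
        }
    }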
What makes this dangerous is that it does not feel like a failure. The code is running and tasks are being processed. Nothing throws an exception until the system runs out of memory or the connection pool is exhausted. By then, diagnosing the root cause is much harder than if the problem had surfaced synchronously.
I have debugged production outages where async operations were the culprit, but the investigation did not start there because the code "looked fine" and the problem manifested as database connection pool exhaustion, not as an obvious async issue. The real problem—unbounded queuing of background tasks—was hidden behind layers of infrastructure.
The async operation itself was not wrong. The mistake was treating it as safe without any control over how many concurrent operations could run. It is like opening a water valve and assuming the bucket will never overflow.
The Pattern: Virtual Threads Plus Semaphore Gating
The right approach to async operations is to be explicit about constraints. You make three decisions upfront:
How many concurrent operations should this system allow?
What resources do these operations consume?
How do you prevent unbounded growth?
Virtual threads, introduced in Java 21, change the equation for concurrency. Traditional threads are expensive OS constructs; spawning thousands of them causes context-switching overhead and memory pressure. Virtual threads are lightweight: the JVM manages them and multiplexes them efficiently onto a small pool of OS threads. This means you can spawn many more concurrent tasks without the resource cost of traditional threads.
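A minimal sketch of what that buys you, using the standard Java 21 API; the sleeping task body stands in for blocking I/O:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class VirtualThreadDemo {
        public static void main(String[] args) {
            // One virtual thread per task; the JVM multiplexes them onto a
            // small pool of OS carrier threads.
            try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < 10_000; i++) {
                    executor.submit(() -> {
                        Thread.sleep(100); // parks the virtual thread, frees the carrier
                        return null;
                    });
                }
            } // close() waits for all submitted tasks to finish
        }
    }

Ten thousand platform threads would be a serious resource problem; ten thousand virtual threads are routine. In Spring Boot 3.2+, setting spring.threads.virtual.enabled=true runs @Async work on virtual threads.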
But virtual threads are not unlimited. They still consume memory, they still need database connections. You still need to decide how many concurrent database operations your system can safely handle.
This is where the semaphore pattern comes in. A semaphore is a simple concurrency gate. You create one with a fixed number of permits. When a task wants to proceed, it acquires a permit. If all permits are taken, the task waits. When the task completes, it releases its permit, and a waiting task can acquire it. The semaphore enforces an upper bound on concurrent operations.
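In plain java.util.concurrent terms, the gate is just this; a sketch, with the Runnable standing in for the guarded operation:

    import java.util.concurrent.Semaphore;

    public class SemaphoreGateDemo {
        private static final Semaphore GATE = new Semaphore(3); // at most 3 concurrent tasks

        static void runGated(Runnable task) throws InterruptedException {
            GATE.acquire();     // blocks if all 3 permits are taken
            try {
                task.run();     // the guarded work
            } finally {
                GATE.release(); // hand the permit to the next waiting task
            }
        }
    }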
The implementation uses Spring's AOP to make this transparent. You create an annotation that marks which operations need gating, and an aspect that enforces the semaphore acquisition and release. The business logic never sees the gating logic. It is infrastructure that protects the business logic from itself.
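A sketch of the first piece, the marker annotation; the name is hypothetical:

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // Marks methods whose execution must hold a database permit.
    @Target(ElementType.METHOD)
    @Retention(RetentionPolicy.RUNTIME)
    public @interface GatedDatabaseAccess {
    }

The semaphore itself is exposed as a Spring bean with a fixed number of permits: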
    @Bean
    public Semaphore databaseSemaphore() {
        return new Semaphore(85);
    }
This one bean enforces a critical constraint: at most 85 concurrent database operations at any given time. The number is chosen based on your database connection pool size and the expected load. If your pool has 100 connections and you reserve some for synchronous requests, you might allow 85 concurrent async operations. If load increases and you hit that limit, new async operations do not queue indefinitely; they block until a permit becomes available. This backpressure is what prevents the system from falling over.
The annotation is applied to any method that needs protection; here it is added to the listener from the earlier sketch:
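    // Same listener as before, now gated: the aspect acquires a permit
    // before this method runs and releases it afterwards.
    @Async
    @EventListener
    @GatedDatabaseAccess
    public void onTransactionSynced(TransactionSyncedEvent event) {
        var details = externalApi.fetchDetails(event.transactionId());
        repository.update(event.transactionId(), details);
    }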
When this method is called, the aspect checks if a semaphore permit is available. If yes, it acquires it and proceeds. If no, it blocks the virtual thread until a permit is available. Once the method completes, the permit is released. The beauty of this pattern is that it is invisible to the method itself. The method just does the work. The infrastructure ensures that the work happens at a controlled rate.
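A sketch of the enforcing aspect, under the same hypothetical names:

    import java.util.concurrent.Semaphore;
    import org.aspectj.lang.ProceedingJoinPoint;
    import org.aspectj.lang.annotation.Around;
    import org.aspectj.lang.annotation.Aspect;
    import org.springframework.stereotype.Component;

    @Aspect
    @Component
    public class DatabaseGateAspect {

        private final Semaphore databaseSemaphore;

        public DatabaseGateAspect(Semaphore databaseSemaphore) {
            this.databaseSemaphore = databaseSemaphore;
        }

        // Acquire a permit before the gated method runs; always release it,
        // even when the method throws.
        @Around("@annotation(gated)")
        public Object gate(ProceedingJoinPoint joinPoint, GatedDatabaseAccess gated) throws Throwable {
            databaseSemaphore.acquire(); // parks the (virtual) thread if no permit is free
            try {
                return joinPoint.proceed();
            } finally {
                databaseSemaphore.release();
            }
        }
    }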
Why AI Makes This Decision Invisible
AI is very good at generating async code. It understands @Async annotations and event-driven patterns. It can wire up listeners and services in ways that look correct and compile without errors. What it cannot do is reason about your system's resource limits or the trade-offs between responsiveness and safety.
When you ask an AI to "make this operation async," it generates code that is locally correct. It runs fine in tests. But it has no concept of what happens when a thousand async operations are queued simultaneously. It does not know whether your database connection pool has 10 or 100 connections. It does not know if you have monitoring in place to alert on backpressure. It does not know if your team has experience debugging unbounded queuing issues.
This is not a flaw in the AI. It is a limitation of the interface. The AI is doing exactly what you asked: making the operation async. It is not answering the question you should have asked first: "Is async the right choice here, and if so, what constraints do we need to put in place?"
This is where the engineer's role fundamentally changes. The engineer is not writing code anymore. The engineer is making architectural decisions about concurrency, resource limits, and when synchronous blocking is actually the right answer. The AI can execute those decisions once they are made. But the decision itself is not something the AI can make.
What I find happening increasingly is that AI generates code so quickly that teams skip the decision stage entirely. They ask "can you make this async?" instead of "should this be async, and how do we control the concurrency?" One question results in fast code with invisible problems. The other results in code that is safe at scale.
When Async Is Actually Wrong
Not every expensive operation should be async. Some things need synchronous backpressure. If a task is queued faster than it can be processed, you want the client to experience that slowness immediately. You want the HTTP request to take longer, which signals to the client that the system is loaded. You want the client's retry logic to kick in. You want operational visibility that something is wrong.
When you make it async, you hide that signal. The client gets a fast response. The system appears responsive. But under the surface, the queue is growing. By the time you notice the problem, the damage is done.
I have seen teams make email sending async because the operation is slow. Emails start to stack up. Admins do not notice for hours. When they do, tens of thousands of emails are queued, and the system is in a bad state. If email sending were synchronous, the customer would have felt the delay on the first email and asked why the API was slow. The problem would have surfaced immediately.
The right question is not "is this operation expensive?" The right question is "does the client need to wait for this operation?" If yes, keep it synchronous. If no, make it async and add backpressure controls. The controls might be a semaphore limiting concurrency, or a message queue that can be monitored, or an SLA that says "we process these async jobs within N minutes." But there must be a control.
The Decision Framework
Before making any operation async, ask these questions in order.
Does the client actually need to wait for this operation? If yes, keep it synchronous. If no, continue to the next question.
How many concurrent instances of this operation might be running in production? If you cannot measure this, estimate conservatively: assume the number is much higher than you think.
What resources does each operation consume? Database connections, memory, external API calls? If it consumes scarce resources (like database connections), you need strict limits on concurrency.
What happens if the operation cannot proceed? Does it queue indefinitely and retry, or does it fail fast? Indefinite queueing requires monitoring and manual intervention. Fast failure surfaces the problem immediately.
What monitoring and alerting do you have in place? Can you see the queue depth, the average processing time, or the number of failed operations? If not, do not make it async. You cannot manage what you cannot see.
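If you use Micrometer, the gate's spare capacity is cheap to expose; a sketch with illustrative metric names:

    import io.micrometer.core.instrument.Gauge;
    import io.micrometer.core.instrument.MeterRegistry;
    import java.util.concurrent.Semaphore;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class GateMetricsConfig {

        @Bean
        public Gauge gatePermits(MeterRegistry registry, Semaphore databaseSemaphore) {
            // Zero free permits means the gate is saturated and callers are waiting.
            return Gauge.builder("db.gate.permits.available",
                            databaseSemaphore, Semaphore::availablePermits)
                    .register(registry);
        }

        @Bean
        public Gauge gateQueue(MeterRegistry registry, Semaphore databaseSemaphore) {
            // Approximate number of threads blocked waiting for a permit.
            return Gauge.builder("db.gate.queue.length",
                            databaseSemaphore, Semaphore::getQueueLength)
                    .register(registry);
        }
    }

Alert on a sustained nonzero queue length: that is backpressure becoming visible before the outage, not after.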
Decision Framework and Why It Matters
Use virtual threads for concurrent I/O operations: Allows many concurrent operations without OS thread overhead
Gate database operations with a semaphore: Prevents unbounded growth; forces backpressure at a safe limit
Make the gating transparent via AOP annotations: Business logic stays clean; infrastructure concerns are isolated
Measure queue depth and processing time: Visibility into concurrency issues; alerts before the system fails
Keep synchronous when the client needs immediate feedback: Synchronous operations provide natural backpressure to the client
Default to synchronous, make async only when justified: Simpler systems are usually safer until proven otherwise
Final Thoughts
This is the core of what it means to make architectural judgments in an era where AI can generate code faster than you can review it. AI will scaffold async patterns for you. It will compile and run, and it will pass your tests. What it will not do is reason about whether this particular async decision makes sense for your particular system.
Your job as an engineer is not to write code anymore. Your job is to decide what belongs in the system and what does not. Whether this operation should be async or synchronous is not a coding question. It is an architectural question. The decision requires understanding your system's constraints, your production load, and your team's ability to monitor and maintain async infrastructure.
The cost of getting this decision wrong is high. It is not a compilation error or a test failure. It is a production outage that manifests indirectly as resource exhaustion. The code works fine in staging, but it fails under real load. By then, the damage is done.
This is also where your engineering expertise becomes irreplaceable. No AI can make this judgment. No scaffolding tool can decide whether async is right for your system. This is the work that actually matters now.
Share this with your team if you are about to scaffold async code without explicitly deciding on concurrency controls. The question to ask before implementing is not "how do I make this async" but "should this be async, and if so, how much concurrency is safe?"