The conversation usually starts with confidence. "We need to load 1M records from a CSV file on startup. Should be straightforward. Stream it in, batch insert, we're done." Then production happens: startup hangs, the database connection pool is exhausted, and your API becomes unresponsive because a background task is saturating the database.
The lie is this: you can load massive datasets without architecture. You can't. The naive approach dies not because Java is slow but because you have made invisible assumptions about memory, concurrency, and what the database can actually do at startup time.
As AI implementations handle more of the routine work, this kind of architectural judgment becomes the differentiator. Knowing when to serialize, when to parallelize, and how to build systems that don't masquerade as simple when they are not matters more than ever.
The Problem: Naive Bulk Loading Doesn't Scale
You have seen this pattern:
List<Record> all = csvParser.parseAll();
repository.saveAll(all);
On a small dataset, it works. With 1M records:
The parser loads everything into heap memory. JVM heap exhaustion. Out of memory errors.
Even if it does fit, one saveAll() call means one database transaction, one lock acquisition, and N individual row inserts.
The database connection pool becomes a bottleneck. Concurrent operations (API requests, other batch jobs) starve.
Startup hangs and your system appears broken.
The real problem is not the data volume. It is that you have made the system's resource constraints invisible. You have pretended the load is simple when it demands coordination.
The Pattern: Event-Driven Batching
The solution is to decompose the load into stages and coordinate them through events.
Stage 1: Parse and Batch
Read the CSV file and accumulate records in memory, but only in chunks. When a chunk reaches a threshold (say, 20,000 records), emit an event and reset the accumulator.
Stage 2: Async Event Processing
Listen for the batch event on a separate, non-blocking thread pool. Process each batch independently. If batch 1 fails, batch 2 can still proceed (or retry, or be logged as failed).
Stage 3: Concurrency Control
Before any batch writes to the database, it acquires a permit from a semaphore. The semaphore limits concurrent database operations to a safe number (e.g., 85 permits when your connection pool has 100 total). This prevents one bulk job from starving other workloads.
The result: Your startup no longer blocks. The load proceeds asynchronously, respects database constraints, and other operations continue normally.
Implementation: The Bootstrap Layer
The entry point is an ApplicationRunner. This hook runs after the Spring context has fully initialized but before ApplicationReadyEvent is published, so readiness-based traffic routing waits for it to return.
@Component
public class BulkDataBootstrap implements ApplicationRunner {
private final BulkDataRepository repository;
private final ApplicationEventPublisher eventPublisher;
private final CsvParser csvParser;
private final RecordMapper mapper;
@Override
public void run(ApplicationArguments args) throws Exception {
if (repository.count() == 0) {
loadData();
}
}
private void loadData() throws IOException {
ClassPathResource resource = new ClassPathResource("data/bulk-data.csv.zip");
try (ZipInputStream zis = new ZipInputStream(resource.getInputStream())) {
ZipEntry entry = zis.getNextEntry();
while (entry != null) {
if (!entry.isDirectory() && entry.getName().endsWith(".csv")) {
parseAndBatch(zis);
}
entry = zis.getNextEntry();
}
}
}
private static final int BATCH_SIZE = 20_000;
private void parseAndBatch(InputStream stream) {
List<Entity> batch = new ArrayList<>();
// Iterate lazily (assuming the parser exposes a streaming iterator);
// parsing the whole file into one List would defeat the streaming goal
for (Record record : csvParser.iterate(stream)) {
batch.add(mapper.toEntity(record));
if (batch.size() >= BATCH_SIZE) {
eventPublisher.publishEvent(new BatchLoadEvent(batch));
batch = new ArrayList<>();
}
}
if (!batch.isEmpty()) {
eventPublisher.publishEvent(new BatchLoadEvent(batch));
}
}
}
Key decisions:
Idempotent guard (count() == 0): Skip loading if data already exists. Safe for re-deployment.
ClassPathResource + ZipInputStream: Load the ZIP from within the JAR. Stream entries without extracting to disk.
Batch size of 20,000: Large enough to amortize database overhead, small enough to fit comfortably in heap.
Events, not direct saves: Decouple the parser from the database. The listener can now be tested independently, and you can add multiple listeners if needed.
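One assumption worth making explicit: the BatchLoadEvent class itself is never shown. A minimal sketch consistent with the listener's event.getRecords() call might look like this, written generically for illustration (the article's code would use the concrete Entity type rather than a type parameter, which also sidesteps generic-event resolution concerns in Spring):

```java
import java.util.List;

// Minimal event carrier for one batch. Generic only for illustration;
// in the article's code the element type would be Entity.
public class BatchLoadEvent<T> {

    private final List<T> records;

    public BatchLoadEvent(List<T> records) {
        // Immutable snapshot: the async listener must never observe
        // mutations made later by the parsing thread.
        this.records = List.copyOf(records);
    }

    public List<T> getRecords() {
        return records;
    }
}
```

Copying the list on construction matters because the event crosses a thread boundary; an immutable snapshot rules out cross-thread mutation by construction.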
Async Processing with Virtual Threads
The event listener runs asynchronously on a dedicated thread pool.
@Component
public class BatchLoadListener {
private final BulkDataRepository repository;
private final TransactionTemplate transactionTemplate;
@Async(value = "bulkLoadExecutor")
@EventListener
@DatabaseSemaphoreGuard
public void handleBatchLoad(BatchLoadEvent event) {
transactionTemplate.executeWithoutResult(status -> {
repository.saveAllAndFlush(event.getRecords());
});
}
}
The executor uses Java 21+ virtual threads:
@Configuration
public class BulkLoadConfig {
@Bean(name = "bulkLoadExecutor")
public AsyncTaskExecutor bulkLoadExecutor() {
SimpleAsyncTaskExecutor executor = new SimpleAsyncTaskExecutor();
executor.setVirtualThreads(true);
executor.setThreadNamePrefix("BulkLoad-");
return executor;
}
}
Why virtual threads? They are lightweight threads scheduled by the JVM rather than mapped one-to-one onto OS threads, so they scale to thousands without the overhead of platform threads. For I/O-bound operations (like database inserts), a blocked virtual thread simply parks, allowing many batches to proceed concurrently without exhausting system resources.
I have seen teams spawn hundreds of background tasks and assume they need a massive thread pool. Virtual threads change that assumption. You can now spawn tasks freely without fear of exhausting threads; the scarce resource to guard becomes the database connections, not the threads.
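That claim scales down to a standalone experiment. This sketch (plain Java 21, no Spring; the class name and numbers are illustrative) runs ten thousand blocking tasks on a virtual-thread-per-task executor, something that would be painful with a fixed platform-thread pool:

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadDemo {

    // Runs `count` blocking tasks on virtual threads and returns how many finished.
    static int runBlockingTasks(int count) {
        AtomicInteger completed = new AtomicInteger();
        // try-with-resources: close() waits for all submitted tasks to finish
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < count; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(Duration.ofMillis(10)); // simulate blocking I/O
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("Completed: " + runBlockingTasks(10_000));
    }
}
```

Each sleeping task parks its virtual thread instead of occupying an OS thread, which is exactly the behavior the bulk-load listener relies on while a batch waits on the database.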
Concurrency Control: The Semaphore Guard
Multiple async operations can run simultaneously. Without coordination, they will exhaust the database connection pool.
The solution is an annotation-based semaphore guard:
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface DatabaseSemaphoreGuard {
}
An AOP aspect enforces it:
@Aspect
@Component
public class SemaphoreGuardAspect {
private final Semaphore semaphore;
@Around("@annotation(DatabaseSemaphoreGuard)")
public Object guard(ProceedingJoinPoint pjp) throws Throwable {
semaphore.acquire();
try {
return pjp.proceed();
} finally {
semaphore.release();
}
}
}
The semaphore is configured with a safe permit count:
@Bean
public Semaphore databaseSemaphore() {
return new Semaphore(85); // Leave headroom for other operations
}
This is transparent concurrency control. Any method annotated with @DatabaseSemaphoreGuard automatically queues when all permits are in use. The load slows slightly, but the database stays responsive and other workloads do not starve.
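The guard's effect can be observed without Spring or AOP at all. This standalone sketch (class and method names are illustrative) pushes 20 simulated writes through a 3-permit semaphore, using the same acquire/try/finally/release shape as the aspect, and records the peak number running at once:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class SemaphoreGuardDemo {

    // Runs `tasks` simulated writes through a `permits`-wide semaphore
    // and returns the highest number observed in flight at once.
    static int peakConcurrency(int permits, int tasks) {
        Semaphore semaphore = new Semaphore(permits);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        Thread[] workers = new Thread[tasks];
        for (int i = 0; i < tasks; i++) {
            workers[i] = Thread.ofVirtual().start(() -> {
                try {
                    semaphore.acquire(); // surplus tasks queue here
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(20); // simulate a database write
                        inFlight.decrementAndGet();
                    } finally {
                        semaphore.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("Peak concurrency: " + peakConcurrency(3, 20));
    }
}
```

However many tasks you throw at it, the in-flight count never exceeds the permit count; the rest wait in acquire(), which is exactly what happens to surplus batches during the bulk load.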
Structural Integrity: Spring Modulith
Keep the bulk load logic contained within a module. Spring Modulith's @ApplicationModuleListener is itself meta-annotated for asynchronous, transactional event handling, so it stands in for the @Async/@EventListener pair rather than stacking on top of it:
@Component
public class BatchLoadListener {
@ApplicationModuleListener
@DatabaseSemaphoreGuard
public void handleBatchLoad(BatchLoadEvent event) {
// ...
}
}
This prevents accidental cross-module coupling. The event is published within the module; it stays within the module.
Decisions and Why They Matter
Event-driven batching: Decouples parsing from persistence; allows async, independent batch processing. Easier to test.
Batch size = 20,000: Balances amortization (fewer round-trips) with memory footprint (fits safely in heap). Empirically fast.
Virtual threads: Allows many concurrent batches without spawning thousands of OS threads. Startup stays responsive.
Semaphore gating: Prevents one bulk job from starving other database operations. Maintains system responsiveness.
Idempotent initialization: Safe for re-deployment. No need for manual cleanup or schema versioning.
The shift in how engineers should think about bulk operations is clear. Implementation complexity moves from "how do I write the code" (now often handled by AI) to "what architecture prevents this from becoming a footgun later."
This pattern, event-driven batching with asynchronous processing and explicit concurrency control, is not novel. But it is increasingly important, because it is the difference between a system that appears simple on the surface while hiding real complexity, and one that is honest about what it demands.
Stop pretending your bulk loads are simple. Architect them properly. Make concurrency visible. Build in guards. Your future self, and your on-call team, will thank you.
If you work with large-scale data loads or build systems that coordinate multiple concurrent workloads, share this with your team. There is always someone learning this lesson the hard way.