Performance at 10,000 concurrent players: what actually bottlenecks

What really happens when a Minecraft server serves 10,000 active players: where Hibernate, the main thread, and the JVM actually stall, and which architectural decisions avoid it.

May 17, 2026 · 10 min read · by Justin Eiletz

Performance
Hibernate
Minecraft
Java
Backend

Two years ago I sat up at night in front of a server whose tick rate dropped from 20 to 7 under load. Eight thousand players online, an economy system with auctions, three custom game modes running in parallel. On paper, a solid plugin set. In practice, a performance nightmare.

What I learned in the months that followed is the foundation for JEHibernate and for the architecture I use today to write plugins that stay stable at ten thousand players. This post distills what actually bottlenecks: not what shows up in performance tutorials, but what you measure on a production server under peak load.

The three real bottlenecks

When your server stutters under load, the cause is almost always one of these three. Garbage collection, broken worldgen, or "too many entities" are symptoms, not causes.

1. The main thread is sacred

Bukkit, Spigot, and Paper are at their core single-threaded. Almost everything happening in the world (block updates, entity ticks, player movement, chunk generation) runs on one main thread that has to complete 20 ticks per second. If a tick takes longer than 50 ms, players see lag.

The most common sin I find in third-party plugins: a synchronous database call inside an event handler.

@EventHandler
public void onJoin(PlayerJoinEvent e) {
    PlayerStats stats = jdbc.queryForObject(
        "SELECT * FROM player_stats WHERE id = ?",
        new Object[]{ e.getPlayer().getUniqueId() },
        new StatsMapper()
    );
    e.getPlayer().sendMessage("Welcome back! Kills: " + stats.kills);
}

Looks harmless. At 50 joins per second (totally normal on a popular server) that's 50 synchronous DB queries on the main thread. At 5 ms per query that's 250 ms of pure stall. With a budget of 50 ms per tick, the server has lost five ticks' worth of time in that one second.
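The arithmetic above fits in a few lines. This is a minimal sketch with hypothetical class and method names, and it assumes the queries run strictly one after another on the main thread:

```java
public final class TickBudget {
    /** The server targets 20 ticks per second, so each tick has a 50 ms budget. */
    public static final double TICK_BUDGET_MS = 1000.0 / 20;

    /** Main-thread time spent blocked on queries per second, in milliseconds. */
    public static double stallMsPerSecond(int queriesPerSecond, double msPerQuery) {
        return queriesPerSecond * msPerQuery;
    }

    /** How many ticks' worth of budget that stall consumes. */
    public static double ticksLost(int queriesPerSecond, double msPerQuery) {
        return stallMsPerSecond(queriesPerSecond, msPerQuery) / TICK_BUDGET_MS;
    }

    public static void main(String[] args) {
        System.out.println(stallMsPerSecond(50, 5.0)); // prints 250.0
        System.out.println(ticksLost(50, 5.0));        // prints 5.0
    }
}
```

Plug in your own join rate and measured query latency to see how much of the tick budget a blocking handler costs you.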

Solution: anything that doesn't have to be strictly immediate goes async. JEHibernate does this by default. The query above becomes:

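@EventHandler
public void onJoin(PlayerJoinEvent e) {
    Player player = e.getPlayer();
    UUID id = player.getUniqueId();
    // The query runs off the main thread; only the callback hops back onto it.
    statsRepo.findById(id).thenAccept(opt ->
        opt.ifPresent(s -> Bukkit.getScheduler().runTask(plugin, () -> {
            // The player may have disconnected while the query was in flight.
            if (player.isOnline()) {
                player.sendMessage("Welcome back! Kills: " + s.kills);
            }
        }))
    );
}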

The query runs on a dedicated pool and the main thread stays clean. The runTask wrapper is necessary because Bukkit API calls must be made from the main thread. The database no longer eats into the tick; only a lambda that executes in microseconds does.
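The hop from main thread to database pool and back can be tried without a server. The sketch below is not JEHibernate code: MainThreadHop, loadKills, and the queue that stands in for the tick loop are all hypothetical, and in a real plugin the mainThread executor would wrap Bukkit.getScheduler().runTask(plugin, r).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public final class MainThreadHop {
    /** Stand-in for the tick loop: tasks queued here run on the "main" thread. */
    private final ConcurrentLinkedQueue<Runnable> mainQueue = new ConcurrentLinkedQueue<>();
    /** Assumption: in a plugin this would be r -> Bukkit.getScheduler().runTask(plugin, r). */
    private final Executor mainThread = mainQueue::add;
    private final ExecutorService dbPool = Executors.newFixedThreadPool(4);

    /** Simulated slow query; it only ever runs on the database pool. */
    private int loadKills(String playerName) {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return playerName.length(); // placeholder result
    }

    /** Loads on the pool, then builds the message back on the main thread. */
    public CompletableFuture<String> greet(String playerName) {
        return CompletableFuture
            .supplyAsync(() -> loadKills(playerName), dbPool)
            .thenApplyAsync(kills -> "Welcome back! Kills: " + kills, mainThread);
    }

    /** One simulated tick: run whatever was handed back to the main thread. */
    public void tick() {
        Runnable task;
        while ((task = mainQueue.poll()) != null) {
            task.run();
        }
    }

    public void shutdown() {
        dbPool.shutdown();
    }
}
```

Passing the executor to thenApplyAsync keeps the thread hop in one place, so a callback cannot end up touching the Bukkit API from a pool thread by accident.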

2. N+1, the silent killer

Classic Hibernate problem, doubly painful in plugins because the extra queries often run on the main thread as well.

Example: player profile with related achievements. Naive implementation: one query for the player, then one per achievement. 30 achievements means 31 queries. On a server with 10,000 players and an auction house that resolves player profiles often, those N+1 patterns add up to thousands of unnecessary queries per minute.

JEHibernate explicitly recommends fetch strategies per query:

public CompletableFuture<Optional<PlayerProfile>> findFull(UUID id) {
    return runAsync(session -> session
        .createQuery(
            "SELECT p FROM PlayerProfile p " +
            "LEFT JOIN FETCH p.achievements " +
            "LEFT JOIN FETCH p.statistics " +
            "WHERE p.id = :id",
            PlayerProfile.class
        )
        .setParameter("id", id)
        .uniqueResultOptional()
    );
}

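One query, three tables, no lazy-loading drama. Latency moves from 31 × 0.5 ms (about 15 ms) to a single joined query, under 5 ms on a modern indexed database. One caveat: fetching two collections in the same query multiplies the result rows (achievements × statistics), and Hibernate rejects it with a MultipleBagFetchException if both collections are mapped as List bags. Map them as Set, or fetch the second collection separately when the collections grow large.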
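To make the round-trip count tangible, here is a toy model that only counts simulated queries. It does not use Hibernate; the class name and the table names in the comments are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public final class NPlusOneDemo {
    /** Counts simulated database round trips. */
    private int roundTrips = 0;

    /** Simulates "SELECT ... FROM player_profile WHERE id = ?" and returns achievement ids. */
    private List<Integer> loadProfile(int achievementCount) {
        roundTrips++;
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < achievementCount; i++) ids.add(i);
        return ids;
    }

    /** Simulates one lazy load: "SELECT ... FROM achievement WHERE id = ?". */
    private String loadAchievement(int id) {
        roundTrips++;
        return "achievement-" + id;
    }

    /** Simulates a single SELECT with LEFT JOIN FETCH over the achievements. */
    private List<String> loadProfileWithJoin(int achievementCount) {
        roundTrips++;
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < achievementCount; i++) rows.add("achievement-" + i);
        return rows;
    }

    /** Naive access pattern: one query for the profile, one per achievement. */
    public int lazyRoundTrips(int achievementCount) {
        roundTrips = 0;
        for (int id : loadProfile(achievementCount)) loadAchievement(id);
        return roundTrips;
    }

    /** Join-fetch pattern: everything arrives in one round trip. */
    public int joinFetchRoundTrips(int achievementCount) {
        roundTrips = 0;
        loadProfileWithJoin(achievementCount);
        return roundTrips;
    }
}
```

With 30 achievements, lazyRoundTrips returns 31 and joinFetchRoundTrips returns 1, which is the gap the JOIN FETCH query closes.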

3. Connection pool size is not "more is better"

HikariCP defaults are written for web applications. Web applications have hundreds of concurrent requests, each short. Minecraft plugins have five to twenty parallel worker threads continuously reading and writing.

Concretely: a pool of 50 connections for 5 worker threads is wasteful. A pool of 5 connections for a plugin with 20 parallel tasks is a brake. I tune like this:

hikari.maximumPoolSize = (workerThreads * 2) + 1
hikari.minimumIdle = workerThreads
hikari.connectionTimeout = 5000     // ms
hikari.idleTimeout = 60000          // ms
hikari.maxLifetime = 1800000        // 30 min
hikari.leakDetectionThreshold = 30000

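The formula (threads × 2) + 1 is adapted from the pool-sizing guidance in HikariCP's wiki, which suggests connections = (core_count × 2) + effective_spindle_count, counted on the database server. I apply the same shape to worker threads as a starting point, because on game servers a non-trivial part of persistence work is serialisation and validation, not just network I/O. Treat it as a baseline and adjust it against measurements under load.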

Leak detection on. Always. A forgotten session in an async path will start choking your server after three days, and Hikari's leak warning logs the stack trace of the code that borrowed the connection, so you see exactly which method it happened in.
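The tuning above can be derived from the worker-thread count in code. This is a minimal sketch: PoolSettings is a hypothetical helper, and the keys mirror the HikariCP property names used above. To my knowledge HikariConfig accepts a Properties object with these names, but check that against the version you run; a jdbcUrl and credentials are still needed on top.

```java
import java.util.Properties;

public final class PoolSettings {
    /** Starting point from the text: (workerThreads * 2) + 1. */
    public static int maximumPoolSize(int workerThreads) {
        if (workerThreads < 1) {
            throw new IllegalArgumentException("workerThreads must be >= 1");
        }
        return workerThreads * 2 + 1;
    }

    /** Builds the pool settings for a given number of persistence worker threads. */
    public static Properties forWorkers(int workerThreads) {
        Properties p = new Properties();
        p.setProperty("maximumPoolSize", String.valueOf(maximumPoolSize(workerThreads)));
        p.setProperty("minimumIdle", String.valueOf(workerThreads));
        p.setProperty("connectionTimeout", "5000");       // ms
        p.setProperty("idleTimeout", "60000");            // ms
        p.setProperty("maxLifetime", "1800000");          // 30 min
        p.setProperty("leakDetectionThreshold", "30000"); // ms
        return p;
    }
}
```

For 5 worker threads this yields a pool of 11 connections, for 20 worker threads a pool of 41.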

What JEHibernate solves concretely

All of this would be folklore if you had to implement it in every plugin from scratch. JEHibernate is the collection of these patterns as a small wrapper:

A central, plugin-owned SessionFactory with the right classloader setup (problem #1 of every Hibernate-in-plugin attempt).
A RepositoryService abstraction that runs CRUD operations asynchronously by default and returns a CompletableFuture: the only way to work synchronously is an explicit blockingGet() call, which feels uncomfortable enough that you don't reach for it by accident.
Built-in Flyway integration for schema migrations without hbm2ddl.auto=update. You should never let hbm2ddl touch the schema of a live server.
Sane HikariCP defaults tuned for game-server workloads, plus leak detection on by default.
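To show the shape of the async-by-default idea, here is a sketch that is deliberately not JEHibernate's actual API. AsyncRepositorySketch, its in-memory map, and the static blockingGet with a timeout are assumptions made for illustration only.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class AsyncRepositorySketch<T> {
    /** Stand-in for the database table. */
    private final Map<UUID, T> table = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    /** Every operation returns a future; nothing blocks the caller. */
    public CompletableFuture<Void> save(UUID id, T entity) {
        return CompletableFuture.runAsync(() -> table.put(id, entity), pool);
    }

    public CompletableFuture<Optional<T>> findById(UUID id) {
        return CompletableFuture.supplyAsync(() -> Optional.ofNullable(table.get(id)), pool);
    }

    /** The explicit escape hatch: blocking is possible, but you have to ask for it. */
    public static <R> R blockingGet(CompletableFuture<R> future, long timeoutMs) {
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        }
    }

    public void close() {
        pool.shutdown();
    }
}
```

The design point is that the synchronous path has a name you can search for, so every blocking call shows up in a code review or an audit.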

Caching: the only real lever past 5k players

Even with perfectly async Hibernate, you eventually hit the wall. With 10k players and a player profile that is queried on login, inventory, ranking display, and auction house lookup, you quickly reach thousands of DB hits per minute for data that changes maybe every few minutes per player.

The solution is classic: read-through cache. Caffeine as L1 in-memory per server, Redis optional as L2 for multi-server setups.

private final AsyncLoadingCache<UUID, PlayerProfile> cache =
    Caffeine.newBuilder()
        .maximumSize(20_000)
        .expireAfterWrite(Duration.ofMinutes(5))
        .refreshAfterWrite(Duration.ofMinutes(2))
        .buildAsync(this::loadFromDb);

public CompletableFuture<PlayerProfile> get(UUID id) {
    return cache.get(id);
}

private CompletableFuture<PlayerProfile> loadFromDb(UUID id, Executor ex) {
    return repo.findById(id).thenApply(opt -> opt.orElseGet(...));
}

Now the typical lookup lands in 200 ns instead of 5 ms, a 25,000-fold speed-up. At 10k players that's the difference between a server that stutters at peak and one that runs unfazed.

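The refresh-after-write setting matters here: once an entry is older than 2 minutes, the next lookup still returns the cached value immediately and triggers a reload in the background, so no player lookup blocks on the database. Players only see cold cache-miss latency on their first login, or when their entry went 5 minutes without a refresh and expired.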
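If you want to see the read-through idea without pulling in Caffeine, the core fits into a ConcurrentHashMap. This sketch swaps in a plain map for the Caffeine cache above; ReadThroughCache is a hypothetical class, and it has no size bound, expiry, or refresh, so it only illustrates how concurrent lookups share one load.

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public final class ReadThroughCache<V> {
    private final ConcurrentHashMap<UUID, CompletableFuture<V>> entries = new ConcurrentHashMap<>();
    private final Function<UUID, CompletableFuture<V>> loader;
    private final AtomicInteger loads = new AtomicInteger();

    public ReadThroughCache(Function<UUID, CompletableFuture<V>> loader) {
        this.loader = loader;
    }

    /** Returns the cached future, or starts exactly one load for a missing key. */
    public CompletableFuture<V> get(UUID id) {
        return entries.computeIfAbsent(id, key -> {
            loads.incrementAndGet();
            return loader.apply(key);
        });
    }

    /** Drops an entry, for example after the profile was written. */
    public void invalidate(UUID id) {
        entries.remove(id);
    }

    /** Number of times the loader was actually invoked. */
    public int loadCount() {
        return loads.get();
    }
}
```

Caching the future rather than the value means a second lookup that arrives while the first load is still running waits on the same future instead of hitting the database again. Note that in this sketch a failed load stays cached until it is invalidated.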

How I write plugins today

Three non-negotiable rules:

No synchronous I/O on the main thread. Ever. If a plugin does it and I find it during an audit, I rewrite it before any other refactoring.
Every repository method returns a CompletableFuture. The async world isn't the special case. It's the default. Synchronous calls are explicit (blockingGet()), not implicit.
Caching from day one, even when it seems trivial. A plugin without a cache is a plugin that gets rewritten in six months once player count grows. Wiring up Caffeine takes 30 minutes and saves three days later.

These patterns run on production servers today with mid five-figure active player counts. Tick rate constant at 20.0, p99 query latency under 8 ms, GC pauses under 5 ms. That's not magic. It's the consequence of taking the three bottlenecks above seriously before they turn into a fire.

If you want to start now

If your server is currently stuttering under load and you don't know where the problem lies, here are three diagnostic steps that are free and take ten minutes:

Spark / Timings. Both show which thread is blocking. If the main thread is hanging in a DB method, you have bottleneck #1.
Hibernate stats logger or similar. Activate it (hibernate.generate_statistics=true). If you see over a thousand identical SELECTs in one minute, you have bottleneck #2 (N+1) and you're missing caching too.
HikariCP pool stats. Watch them. If activeConnections regularly sits near maximumPoolSize, your pool is too small or you have a connection leak.
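For the second step, counting identical statements by hand gets tedious. The sketch below groups logged SQL by shape and keeps only the shapes that repeat. RepeatedSelects is a hypothetical helper, and it assumes you have already collected the statements as plain strings, for example from a statement logger.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class RepeatedSelects {
    /** Replaces literals with '?' so statements that differ only in values group together. */
    static String normalize(String sql) {
        return sql.trim()
                  .replaceAll("'[^']*'", "?")     // quoted string literals
                  .replaceAll("\\b\\d+\\b", "?")  // standalone numeric literals
                  .replaceAll("\\s+", " ");       // collapse whitespace
    }

    /** Returns each statement shape that occurs at least threshold times, with its count. */
    public static Map<String, Integer> suspicious(List<String> statements, int threshold) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sql : statements) {
            counts.merge(normalize(sql), 1, Integer::sum);
        }
        counts.values().removeIf(n -> n < threshold);
        return counts;
    }
}
```

A statement shape that shows up hundreds of times per minute with only the id changing is the N+1 signature described above.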

If you're stuck after these three steps, get in touch. A plugin audit billed by the hour usually solves the performance problem faster than three more weekends of debugging do.

JEHibernate is open source and deliberately small so other developers can read the code and adapt it. Pull requests welcome.

If you'd like to see what else I work on, my GitRoll profile shows an honest overview of my open-source activity:

gitroll.io/profile/uJqhTlv0BCHcZ7Nz5hb2lUHyeiwR2