
Your 1M RPS Service is a Ticking Time Bomb

“Goroutines are cheap.” Every Go developer has heard this. And for common use cases, it’s true. But when you start aiming for 1 Million Requests Per Second, that saying becomes dangerous.

I ran some tests to see exactly when the “Naive” approach to Go concurrency falls apart and how an adaptive worker pool changes the game.

I used a 10-core machine and pushed 1 Million requests per second for 30 seconds. Each request was designed to be somewhat realistic: 20ms of processing and a 500KB memory allocation (like handling a decoded JSON object).
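Each simulated request looked roughly like the process function below. This is my reconstruction of the described workload, not the benchmark's exact code: 20ms of work holding a 500KB buffer.

import (
    "context"
    "time"
)

// process simulates one request: allocate ~500KB (as if a JSON payload
// had just been decoded) and hold it for 20ms of "work".
func process(ctx context.Context) {
    payload := make([]byte, 500*1024) // the 500KB allocation

    select {
    case <-time.After(20 * time.Millisecond): // the 20ms of processing
    case <-ctx.Done(): // bail out early if the request is cancelled
    }

    _ = payload // keep the buffer live until the work finishes
}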

The naive approach:

func handle(w http.ResponseWriter, r *http.Request) {
    go process(r.Context()) // Just spawn and hope
}

The adaptive pool approach:

pool, _ := adaptivepool.New(
    adaptivepool.WithMaxWorkers(5000),  // hard cap on concurrent workers
    adaptivepool.WithQueueSize(100000), // bounded backlog; beyond this, reject
)

// ... in the handler
err := pool.Submit(ctx, job)
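In the handler, a rejected submission becomes an immediate error response. The sketch below makes two assumptions that you should check against the library's actual API: the shape of the job argument, and that Submit returns a non-nil error when the queue is full.

func handle(w http.ResponseWriter, r *http.Request) {
    job := func(ctx context.Context) { process(ctx) } // assumed job signature
    if err := pool.Submit(r.Context(), job); err != nil {
        // Pool saturated: shed load with a fast 503 instead of letting
        // latency build up for everyone.
        http.Error(w, "overloaded, try again later", http.StatusServiceUnavailable)
        return
    }
    w.WriteHeader(http.StatusAccepted)
}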

The naive version immediately tried to spawn over 100,000 goroutines. The result was a mess: the CPU spent more time on context switching and garbage collection than on actual work.
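You can watch this meltdown happen on your own service with nothing but the standard library; a minimal monitoring sketch (the one-second interval is arbitrary):

import (
    "log"
    "runtime"
    "time"
)

// watch logs goroutine count and GC pressure once per second, so you
// can see the scheduler and the collector drowning under load.
func watch() {
    var m runtime.MemStats
    for range time.Tick(time.Second) {
        runtime.ReadMemStats(&m)
        log.Printf("goroutines=%d heap=%dMB gc_cpu=%.1f%%",
            runtime.NumGoroutine(),
            m.HeapAlloc/(1024*1024),
            m.GCCPUFraction*100)
    }
}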

With the adaptive pool, even though we were rejecting millions of incoming requests, we completed 4.6 times as many jobs as the naive version. By capping the workers at 5,000, the CPU could focus on finishing tasks rather than managing them.
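To be concrete about what "capping" means mechanically: the sketch below is not the adaptive pool itself, just the fixed-size core of the idea, N workers draining a bounded queue and rejecting overflow.

import "errors"

// ErrQueueFull is returned when the bounded queue cannot accept more work.
var ErrQueueFull = errors.New("queue full")

// Pool is a minimal fixed-size worker pool: N goroutines drain a
// bounded queue, and Submit fails fast when the queue is full.
type Pool struct {
    queue chan func()
}

func NewPool(workers, queueSize int) *Pool {
    p := &Pool{queue: make(chan func(), queueSize)}
    for i := 0; i < workers; i++ {
        go func() {
            for job := range p.queue {
                job()
            }
        }()
    }
    return p
}

func (p *Pool) Submit(job func()) error {
    select {
    case p.queue <- job:
        return nil
    default: // queue full: reject instead of blocking or spawning
        return ErrQueueFull
    }
}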

RAM usage is where the naive approach really fails.

  • Naive: Hit 50.86 GB of RAM. Most standard servers would have crashed long before this.
  • Adaptive Pool: Stayed steady at 1.41 GB.
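The number is no mystery: 100,000+ in-flight goroutines, each holding a 500KB allocation, works out to roughly 100,000 × 500 KB ≈ 50 GB of live heap before the garbage collector can reclaim any of it.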

The naive approach is essentially a landmine. It works fine during your dev tests, but in production it’ll just eat RAM until the OS kills your process.

When the system is overwhelmed, users feel it.

  • Naive: Average latency shot up to 2,063 ms. That’s a 100x increase for a 20ms task.
  • Adaptive Pool: Kept it around 161 ms under the same pressure.

The pool rejected 31 million tasks. That sounds bad, but it's exactly what you want: it's better to give some users an immediate error than to make every user wait 2 seconds before the whole server crashes anyway.
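Rejection at this scale is just arithmetic: 5,000 workers each finishing a 20ms task can sustain at most 5,000 / 0.02 s = 250,000 completed jobs per second, so a 1M RPS firehose forces the pool to shed roughly three quarters of the incoming traffic to stay healthy.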

Metric        | Naive Approach         | Adaptive Pool
--------------|------------------------|----------------------
Reliability   | High risk of OOM       | Resource-bounded
Latency       | 2,000ms+ (Unusable)    | 160ms (Acceptable)
System Health | Near-zero CPU for work | Focused on execution
Result        | Server crash           | System stays up

Scaling to high throughput isn't about how many goroutines you can create. It's about how many you can manage before the overhead kills your system.

Concurrency is a resource, just like memory or CPU. If you don’t bound it, your system isn’t production-ready.

Check out the code on GitHub: go-adaptive-pool