Scaling WebSockets to 1M+ Concurrent Connections
The Challenge
When we started building our real-time notification engine, we hit a wall at 50k concurrent connections: the Node.js event loop was getting blocked, and memory usage spiked unpredictably.
Kernel Tuning
The first step was tuning the Linux kernel. We raised the limit on open file descriptors (ulimit -n), since every open connection holds one descriptor, and adjusted the TCP keepalive settings. This let a single server hold many more idle connections without crashing.
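To make the two knobs above concrete, here is a minimal sketch of the arithmetic behind them. It is not the configuration used for this benchmark. The stock Linux keepalive defaults (7200 s idle, 75 s interval, 9 probes) are real, but the "tuned" values, the reserved-descriptor headroom, and every function and type name are hypothetical and chosen only for illustration. Note also that the kernel keepalive timers only apply to sockets that have keepalive enabled.

```typescript
// Linux TCP keepalive knobs. The matching sysctl names are
// net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_intvl
// and net.ipv4.tcp_keepalive_probes.
interface KeepaliveSettings {
  timeSec: number;  // idle seconds before the first probe is sent
  intvlSec: number; // seconds between unanswered probes
  probes: number;   // unanswered probes before the connection is dropped
}

// Stock Linux defaults.
const LINUX_DEFAULTS: KeepaliveSettings = { timeSec: 7200, intvlSec: 75, probes: 9 };

// Hypothetical tuned values, for illustration only (not taken from the post).
const ILLUSTRATIVE_TUNED: KeepaliveSettings = { timeSec: 60, intvlSec: 10, probes: 6 };

// Worst-case time before the kernel gives up on a dead, idle peer.
function deadPeerDetectionSec(s: KeepaliveSettings): number {
  return s.timeSec + s.intvlSec * s.probes;
}

// Every connection holds one file descriptor. reservedFds is an assumed
// allowance for log files, the Redis connection, and similar overhead.
function requiredOpenFiles(targetConnections: number, reservedFds: number): number {
  return targetConnections + reservedFds;
}

console.log(deadPeerDetectionSec(LINUX_DEFAULTS));     // 7875
console.log(deadPeerDetectionSec(ILLUSTRATIVE_TUNED)); // 120
console.log(requiredOpenFiles(50000, 1024));           // 51024
```

With the defaults, a dead idle peer can hold its descriptor for over two hours, which is why shortening the keepalive timers matters when a server holds a very large number of mostly idle connections.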
Cluster Mode & Redis Adapter
We moved to a clustered architecture using the Redis adapter for Socket.IO, which let us scale horizontally across multiple nodes. A message published on one node is propagated to all the others via Redis Pub/Sub.
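The fan-out pattern described above can be sketched without any dependencies. This is an in-memory stand-in, not the real adapter: FakeBroker plays the role of Redis Pub/Sub, and SocketNode plays the role of one Socket.IO server process. Both class names, the channel name, and the client ids are hypothetical. A real deployment would typically wire the server up with createAdapter from the @socket.io/redis-adapter package instead.

```typescript
type Handler = (channel: string, message: string) => void;

// In-memory stand-in for Redis Pub/Sub: every subscriber sees every message.
class FakeBroker {
  private subscribers: Handler[] = [];

  subscribe(handler: Handler): void {
    this.subscribers.push(handler);
  }

  publish(channel: string, message: string): void {
    for (const handler of this.subscribers) handler(channel, message);
  }
}

// One server process. It only knows about its own locally connected clients.
class SocketNode {
  // Messages delivered to each local client, keyed by client id.
  readonly delivered = new Map<string, string[]>();
  private broker: FakeBroker;

  constructor(broker: FakeBroker) {
    this.broker = broker;
    // Each node subscribes once and relays incoming messages to local clients.
    broker.subscribe((_channel, message) => this.deliverLocally(message));
  }

  connect(clientId: string): void {
    this.delivered.set(clientId, []);
  }

  // Simplification: the sender also receives its own message via the broker.
  broadcast(message: string): void {
    this.broker.publish("broadcast", message);
  }

  private deliverLocally(message: string): void {
    for (const inbox of this.delivered.values()) inbox.push(message);
  }
}

const broker = new FakeBroker();
const nodeA = new SocketNode(broker);
const nodeB = new SocketNode(broker);
nodeA.connect("alice");
nodeB.connect("bob");

// A broadcast that originates on node A reaches the client on node B.
nodeA.broadcast("deploy finished");
console.log(nodeB.delivered.get("bob")); // [ 'deploy finished' ]
```

The point of the design is that no node needs a list of clients connected elsewhere. Each node publishes once and every node delivers to its own clients, so adding a node adds one more subscriber rather than changing any existing node.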
The Result
With these changes, we benchmarked 1M+ concurrent connections at sub-100ms latency on a cluster of just 3 standard instances, which works out to roughly 333,000 connections per instance if the load is spread evenly.
Discussion (2)
Great detailed breakdown. The kernel tuning part is often overlooked.
Would love to see the specific sysctl.conf settings you used.