
12 Million Concurrent Connections with MigratoryData WebSocket Server


We have recently completed a new performance benchmark demonstrating that MigratoryData WebSocket Server can handle 12 million concurrent users on a single Dell PowerEdge R610 server while pushing a substantial amount of live data (1.015 gigabits per second). This benchmark scenario shows that MigratoryData WebSocket Server is well suited for infrastructures delivering real-time data to a huge number of users, especially the mobile push notification infrastructures typically required by telecom customers with tens of millions of users.

Benchmark Results

In this benchmark scenario, MigratoryData scales up to 12 million concurrent users on a single Dell PowerEdge R610 server while pushing up to 1.015 Gbps of live data (each user receives a 512-byte message every minute). The CPU utilization diagram below shows that MigratoryData WebSocket Server scales linearly with the hardware.

[Chart: MigratoryData WebSocket Server CPU utilization vs. number of concurrent connections]

According to the chart above, MigratoryData uses only 57% CPU to handle 12 million users. The remaining 43% CPU could be used to scale even further. However, we are limited by the RAM available on this machine: we use CentOS 6.4 with the standard Linux kernel, and the kernel memory footprint alone for 12 million sockets is about 36 GB.

Also, the linearity of the CPU utilization above indicates that MigratoryData WebSocket Server should scale vertically beyond 12 million concurrent users if additional RAM and CPU were available.
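
As a rough cross-check of why memory, rather than CPU, becomes the limit, the figures above can be turned into a back-of-the-envelope budget. The sketch below is plain arithmetic over the reported numbers; the roughly 3 KB of kernel memory per socket is simply 36 GB divided by 12 million, not an official kernel figure.

    // Back-of-the-envelope memory budget using the figures reported above:
    // 96 GB of RAM in total, 54 GB allocated to the JVM, and about 36 GB of
    // kernel memory for 12 million sockets.
    public class MemoryBudget {
        public static void main(String[] args) {
            long totalRamGb = 96;           // RAM of the Dell R610
            long jvmHeapGb = 54;            // memory allocated to the Java JVM
            long kernelSocketGb = 36;       // kernel footprint for 12 million sockets
            long connections = 12_000_000L;

            // Roughly 3 KB of kernel memory per established socket
            double kbPerSocket = kernelSocketGb * 1024.0 * 1024.0 / connections;

            // What is left for the operating system and everything else
            long headroomGb = totalRamGb - jvmHeapGb - kernelSocketGb;

            System.out.printf("Kernel memory per socket: ~%.1f KB%n", kbPerSocket);
            System.out.printf("Remaining headroom: ~%d GB%n", headroomGb);
        }
    }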

Detailed Results of MigratoryData WebSocket Server Running on a Single Dell R610 Server

It is important to note that the results in the table below were obtained using the default configuration of MigratoryData WebSocket Server, a fresh installation of CentOS 6.4 (without any kernel source code modification or other special tuning), and the standard network configuration (default MTU of 1500, etc.).

Number of concurrent client connections        3,000,000    6,000,000    9,000,000    12,000,000
Number of messages per minute to each client   1            1            1            1
Total message throughput (messages/second)     50,000       100,000      150,000      200,000
Average latency (milliseconds)                 5            35           92           268
Standard deviation of latency (milliseconds)   36           123          263          424
Maximum latency (milliseconds)                 640          951          1,292        2,024
Network utilization                            0.254 Gbps   0.507 Gbps   0.767 Gbps   1.015 Gbps
CPU utilization (average)                      14%          24%          39%          57%
RAM allocated to the Java JVM                  54 GB        54 GB        54 GB        54 GB

Latency is defined here as the time needed for a message to propagate from the publisher to the client, via the MigratoryData server. In other words, the latency of a message is the difference between the time at which the message is sent by the benchmark publisher to the MigratoryData server and the time at which the message is received by the benchmark client from the MigratoryData server.
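
As an illustration of how such a measurement can be taken, assuming the publisher embeds its send time at the front of each payload and the publisher and client clocks are synchronized (the class below is a hypothetical sketch, not the actual MigratoryData benchmark tool):

    import java.nio.ByteBuffer;

    // Illustrative payload format: the publisher prepends an 8-byte epoch-millis
    // timestamp to the message body; the client subtracts it from its receive time.
    public class LatencyExample {

        // Publisher side: build a payload carrying the send timestamp.
        static byte[] buildPayload(byte[] body) {
            ByteBuffer buf = ByteBuffer.allocate(8 + body.length);
            buf.putLong(System.currentTimeMillis()); // publisher send time
            buf.put(body);
            return buf.array();
        }

        // Client side: latency = client receive time - publisher send time,
        // meaningful only because both clocks are synchronized (e.g. via ntpd).
        static long latencyMillis(byte[] payload) {
            long sentAt = ByteBuffer.wrap(payload).getLong();
            return System.currentTimeMillis() - sentAt;
        }
    }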

Hardware & Setup

MigratoryData WebSocket Server version 4.0.7 ran on a single Dell PowerEdge R610 server with the following configuration:

Model Name               Dell PowerEdge R610
Manufacturing Date       Q4 2011
Dimension                1U
Number of CPUs           2
Number of Cores per CPU  6
Total Number of Cores    12
CPU Type                 Intel Xeon X5650 (12 MB cache, 2.66 GHz, 6.40 GT/s QPI)
Memory                   96 GB RAM (DDR3, 1333 MHz)
Network                  Intel X520-DA2 (10 Gbps)
Operating System         CentOS 6.4
Java Version             Oracle (Sun) JRE 1.7

The benchmark clients and benchmark publishers ran on 14 identical Dell PowerEdge SC1435 servers as follows:

  • Four Dell SC1435 servers were used to run up to four instances of the benchmark publisher. For example, to publish 100,000 messages per second, we used four instances of the benchmark publisher on the four Dell SC1435 servers, each publisher sending 25,000 messages per second.
  • Ten Dell SC1435 servers were used to run up to ten instances of the benchmark client. For example, to open 12,000,000 concurrent connections, we used ten instances of the benchmark client on the ten Dell SC1435 servers, each client opening 1,200,000 concurrent connections.

The Dell PowerEdge R610 server (used to run a single instance of MigratoryData WebSocket Server) and the 14 Dell PowerEdge SC1435 servers (used to run the benchmark clients and publishers) were connected via a Dell PowerConnect 6224 gigabit switch enhanced with a 2-port 10 Gbps module.

The Benchmark Scenario

  • Each client subscribes to a single, distinct subject; for example, to reach 12 million concurrent users, we used 12 million subjects.
  • Each client receives a message every minute; for example, to push one message per minute to 12 million concurrent users, the publishers sent 200,000 messages per second, with the subject of each message chosen randomly from the total of 12 million subjects (these numbers are cross-checked below).
  • The payload of each message is a 512-byte string consisting of 512 random alphanumeric characters.
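
As a quick cross-check of these scenario figures, the required publish rate and the payload-only bandwidth follow directly from the numbers above. The difference between the computed payload bandwidth and the measured 1.015 Gbps is presumably WebSocket framing plus TCP/IP overhead; that attribution is our assumption, not a measured breakdown.

    // Cross-check of the scenario: 12 million clients, each receiving one
    // 512-byte message per minute.
    public class ScenarioMath {
        public static void main(String[] args) {
            long clients = 12_000_000L;
            int payloadBytes = 512;

            // Required publish rate: one message per client per minute.
            long messagesPerSecond = clients / 60;   // 200,000 messages/second

            // Payload-only bandwidth, before WebSocket/TCP/IP framing overhead.
            double payloadGbps = messagesPerSecond * payloadBytes * 8 / 1e9;   // ~0.82 Gbps

            System.out.println("Publish rate: " + messagesPerSecond + " messages/second");
            System.out.printf("Payload bandwidth: ~%.2f Gbps (1.015 Gbps measured on the wire)%n",
                    payloadGbps);
        }
    }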

Methodology

We performed four benchmark tests, corresponding to the four results summarized above, simulating 3,000,000 / 6,000,000 / 9,000,000 / 12,000,000 concurrent users on a single instance of MigratoryData WebSocket Server.

The clock of the Dell R610 server (used to run the MigratoryData server) and the clocks of the 14 Dell SC1435 servers (used to run the benchmark clients and publishers) were synchronized via ntpd. Latency was measured for all messages, not only for a sample. We measured the mean latency, maximum latency, and standard deviation of latency over 10-minute runs, and the results are reported above. We also ran the most demanding scenario, with 12 million concurrent connections, for 6 hours and observed that MigratoryData WebSocket Server remained perfectly stable.
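
Because latency was recorded for every message rather than for a sample, the statistics have to be accumulated in a streaming fashion. The following sketch uses the standard online mean/variance (Welford) update plus a running maximum; it is illustrative only and is not the actual MigratoryData benchmark tool.

    // Streaming latency statistics: mean, standard deviation (Welford's online
    // algorithm) and maximum, updated once per received message.
    public class LatencyStats {
        private long count;
        private double mean;
        private double m2;   // sum of squared deviations from the running mean
        private long max;

        void record(long latencyMillis) {
            count++;
            double delta = latencyMillis - mean;
            mean += delta / count;
            m2 += delta * (latencyMillis - mean);
            if (latencyMillis > max) max = latencyMillis;
        }

        double mean()   { return mean; }
        double stdDev() { return count > 1 ? Math.sqrt(m2 / count) : 0.0; }  // population std dev
        long   max()    { return max; }
    }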

Linear Horizontal Scalability

MigratoryData WebSocket Server and its APIs make it possible to build a high-availability cluster.

Each instance of MigratoryData WebSocket Server in the cluster runs independently of the other cluster members, exchanging only negligible coordination information or, depending on the clustering type you configure, no information at all. Therefore, MigratoryData WebSocket Server offers linear horizontal scalability.

One can deploy a high-availability cluster of MigratoryData servers to reach any number of concurrent users. For example, combining the linear horizontal scalability of MigratoryData WebSocket Server with the 12 million vertical scalability demonstrated here, one could achieve, say, 60 million connections using a cluster of 5 instances of MigratoryData WebSocket Server running on 5 Dell PowerEdge R610 servers.

Note: Even though MigratoryData WebSocket Server offers linear horizontal scalability, a production deployment also needs to consider the situation where a cluster member goes down. If this were to occur, the users of the failed server would automatically be reconnected by the MigratoryData API to the other cluster members, which would then absorb the load of the failed member.

For the example above, this means a production deployment should use at least 7 or 8 servers to handle 60 million concurrent users, so that if a failure occurs, each remaining server has enough reserve capacity to accept its share of the users from the failed cluster member.
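
To put that recommendation into numbers, here is an illustrative check, based only on the 12-million-connection ceiling demonstrated above, of whether the remaining cluster members can absorb the load of a failed member:

    // Illustrative cluster sizing: can the remaining members absorb the load of a
    // failed member without exceeding the ~12M connections demonstrated per server?
    public class ClusterSizing {
        public static void main(String[] args) {
            long totalUsers = 60_000_000L;
            long perServerCeiling = 12_000_000L;   // vertical limit demonstrated above

            for (int servers : new int[] {5, 8}) {
                long normalLoad = totalUsers / servers;
                long afterOneFailure = totalUsers / (servers - 1);
                System.out.printf("%d servers: %,d users each normally, %,d each after one failure (%s)%n",
                        servers, normalLoad, afterOneFailure,
                        afterOneFailure <= perServerCeiling ? "OK" : "over the demonstrated ceiling");
            }
        }
    }

With 8 servers, each one normally carries 7.5 million users and would carry about 8.6 million after a failure, well within the 12 million demonstrated; with only 5 servers, a single failure would push each remaining server to 15 million, beyond that ceiling.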

Conclusion

In 2010, we achieved 1 million concurrent connections on a single 1U server. While handling 1 million concurrent connections on a small server still remains a challenge for the WebSocket server industry, we have shown here that MigratoryData WebSocket Server scales an order of magnitude higher, achieving 12 million concurrent connections on a single 1U server.

10 thoughts on “12 Million Concurrent Connections with MigratoryData WebSocket Server”

  1. Very exciting! I have a few questions. Based on my observation of 24×7 connected systems, a socket connection loses connectivity on average after 2-4 days; some take longer and some drop every few seconds, due to the ISP, location, etc. Now, in the 12 million benchmark, what would be fantastic to see is how your server holds up if you close 100 sockets/second and open 100 sockets/second while maintaining your 12 million connections. Is there any benchmark using a similar scenario? Using XMPP servers and some other technologies, 50k connections per server (8 cores) was my highest number.
    One more question: how long did it take to open all 12 million connections on your server?

    1. Hi Ramin and thanks for your comment.

      Because you talk about 24×7 availability, please note that multiple MigratoryData instances can be configured as an active/active cluster that provides horizontal scalability, load balancing and failover, high availability with no single point of failure, and guaranteed message delivery even in the presence of sudden hardware failures or network disconnections.

      The client monitors its connection to the cluster. When a disconnection is detected, it automatically reconnects to another cluster member. In addition, the client receives from the new cluster member any message published during the failover, so no message is lost.

      So, clients that disconnect from a MigratoryData server are automatically reconnected to another MigratoryData server in the cluster with no message loss.

      Currently we don’t have benchmark results for clients disconnecting and reconnecting while handling 12 million concurrent connections. We will try to include this in the next round of benchmarks. However, please note that in large production deployments MigratoryData works very well when many clients connect and disconnect while it is handling a large number of concurrent connections.

      Concerning your last question: we didn’t record this information. But because we have already received many questions about it, we will measure it during the next round of benchmarks. If I remember correctly, the MigratoryData server accepted clients at a rate of about 30k-40k connections per second (each benchmark client tool, which simulated 1.2 million concurrent users, connected at a rate of 3k-4k connections per second, and we started 10 instances of the client benchmark tool simultaneously). So, we needed about 6 minutes to open the 12 million concurrent connections.

      Mihai

      1. Hi Mihai,
        Thanks for your quick reply. The fact alone that you were able to have 12 million connections open is incredible. I’ve got to admit your numbers have 2 to 3 digits more than anything I have ever seen. After the holidays I’ve got to check out your system.
        On a separate note, in production, even if my system could hold up to 12 million connections, in case of a system crash a massive tsunami of 12 million reconnects would cripple the rest of the servers, the supporting database, etc. So I wouldn’t design my system to carry more than 100k-500k connections per server.
        R.

        1. Ramin,

          Just let us know when you are ready to commence the benchmark and we will provide you all the necessary tools and information.

          Concerning your note: the deployment should indeed take into consideration the situation when a cluster member goes down and its clients suddenly reconnect to the cluster. Please note, however, that our APIs will distribute the clients of the broken cluster member to *all* the remaining cluster members. Therefore, the larger the MigratoryData cluster is, the smaller the impact of a cluster member failure will be.

          Also, see my Note in the “Linear Horizontal Scalability” section above.

          So, the number of concurrent connections per cluster member should be correlated with the cluster size and your use case. The MigratoryData Benchmark Tools offer the ability to simulate your use case and calibrate the cluster size and how many concurrent connections to assign to a cluster member. You can test the stability of your cluster when a cluster member goes down and its clients automatically reconnect to the remaining cluster members.

          Mihai

      2. Around 2010 I played around with 2 million concurrent connections using Ubuntu. However, I too noticed that the server can only manage 40k connections per second *OR* service its existing connections. Likewise, if I’m connecting at a rate of 20k connections per second (and I can only manage 40k connections per second tops), then the rate at which I can service the existing millions of connections halves 😦

        In the real world on the real internet this is a problem. Why? 40k connections is only 0.4% of the 10 million connections. And as Ramin said, there are no end of reasons why connections can close unexpectedly. So if only 40k out of 10 million connections fail per second and try to reconnect then the single server is blocked doing only connections. And if only 20k out of the 10 million connections fail per second and try to reconnect then the single server is half blocked and the message delivering ability to the existing connections is halved 😦

        Do you have any servers in production on the internet which commonly have millions of concurrent connections? If so, what is the common ratio per second of incoming connections to existing connections? And how much are peak connections per second on an average day (not assuming the tsunami / thundering herd scenario)?

        Do you have any ideas on how to workaround the 40k connections per second bottleneck in the Linux network stack?
