Communication costs in real-world networks

17 May 2013

tl;dr: Network latencies in the wild can be expensive, especially at the tail and across datacenters: how bad are they, and what can we do about them? Make sure to explore the the demo.

Network latency makes distributed programming hard. Even when a distributed system is fault-free, any communication between servers affects performance. While the theoretical lower-bound on communication delay is the speed of light—not horrible, at least within a single datacenter—latencies are rarely this fast. I’ve been working on and benchmarking communication-avoiding databases and wanted to isolate and quantify the behavior of real-world networks both within and across datacenters. This post contains both an interactive demo of what we found, some high-level trends, and some implications for distributed systems designs.

I wasn’t aware of any datasets describing network behavior both within and across datacenters, so we launched m1.small Amazon EC2 instances in each of the eight geo-distributed “Regions,” across the three us-east “Availability Zones” (three co-located datacenters in Virginia), and within one datacenter (us-east-b). We measured RTTs between hosts for a week at a granularity of one ping per second.1

I've made the raw data available on Github but, as an excuse to play with D3.js, I built this interactive visualization; select different percentiles and drag and zoom into the graph:2,3

High-level Takeaways

Aside from the absolute numbers and the raw data, I think that there are a few interesting takeaways. If you’re a networking guru, these may be obvious, but I found the magnitude of these trends surprising. (N.B. These aren’t necessarily Amazon-specific, and this is hardly an indictment of AWS.)4

Implications for Distributed Systems Designers

Aside from any particular statistical behavior or correlations, this data highlights the importance of reasoning about latency in distributed system design. While many five-star wizards of distributed computing have long warned us of the pitfalls of network latency, there are at least two additional challenges today: almost every new system is distributed and many systems are operating at larger scale than ever before. The former means that more distributed systems developers need communication-avoiding techniques, while the latter means that the tail will continue to grow. Even if we solve the LAN latency problem, the lower bound on communication cost is still much higher than that of local data access, and multi-datacenter system deployments are increasingly common. While we can reduce some inefficiencies today, there are fundamental barriers to improvement, like the speed of light; I believe the solution to avoiding latency penalties will come from better software, algorithms, and programming techniques instead of better network hardware. Better languages, semantics, and libraries are a start.


Footnotes

[1] There’s a non-negligible chance that this post generates debate with respect to this methodology. My primary purpose for this experiment was to demonstrate the considerable gap between LAN and WAN latencies, which are easily captured by the data (if this is your cup of tea, let’s talk!). However, it’s possible that EC2 virtualization and the choice of m1.small instances led to higher latencies due to factors like multi-tenancy and VM migration. There’s also no doubt that larger packet sizes would change these trends; indeed, in recent database benchmarking, we’ve observed several additional effects related to local processing and EC2 NIC behavior under heavy traffic. Please feel free to leave a comment or get in contact, especially if you have suggestions for improvement or have any data to share; I’ll gladly link to it and use it if possible.

[2] If you like this stuff, there’s some really cool research that studies the metric spaces that arise from network topologies; one of my favorite papers is “Network Coordinates in the Wild” by Ledlie et al. in NSDI 2007, which applies network coordinates to the (real-world/”production”) Azureus file sharing network.

[3] My apologies for overlaying the cross-AZ and the us-east results on top of the cross-region data. I looked into pinning each of these two clusters in designated locations but was only able to pin one at a time and eventually gave up, settling for the effect that auto-adjusts the label size.

[4] I don’t study networks (rather, I spend my time building systems on top of them), but there’s a lot of ongoing work on alleviating these problems. For a short position paper regarding what should be possible and what we may need to fix, check out “It’s Time For Low Latency” by Rumble et al. in HotOS 2011.

You can follow me on Twitter here.