In the latest issue of the ZeroC newsletter Marc Laukien picks up the baton for binary protocol performance over XML/SOAP in an article rhetorically called "Why Smart People Defend Bad Standards." The article follows up on my posting on the excellent work being done at CSIRO and Macquarie University on evaluating performance (among other aspects) of enterprise middleware. That posting drew subsequent follow-ups from Michi Henning (also of ZeroC) rebutting those figures, Mark Little pointing out that you pick the right tool for the right job, and Savas pointing out that in distributed computing Waldo is king.
There are a couple of inaccuracies in the newsletter which I should point out before going any further, not Marc's fault though since he wasn't to know that:
- I'm a bad person cheering for a smart Internetworking standard; and
- I recently saw a presentation on the latest results from the SOAP performance work at a workshop at Sydney University, presented by Paul Greenfield of CSIRO, and hadn't merely read Alex's PowerPoints. I spent a great deal of time talking to Paul during that workshop about precisely this issue and getting my head round his initially counter-intuitive results.
One important point to remember when considering the claims made by the CSIRO/Macquarie University researchers is that they have no commercial axe to grind and in no way stand to make any gains from publishing the facts about system performance save enhancing their academic reputations. Conversely commercial organisations have much to gain by having performant products and therefore will always strive to unequivocally demonstrate their product's performance. You should also keep in mind that I suck at maths in any calculations that follow.
To kick things off, Marc puts forward figures for SOAP and binary protocol middleware where he states:
"Alex [Ng, of Macquarie] quotes 7ms for a simple SOAP message"
and
"Ice and other binary-protocol middleware, such as CORBA ORBs, are tens or hundreds of times faster, with latency figures in the range of 0.03ms to 0.3ms."
Quite so, I think those figures are very plausible, especially for a system like ICE which the clever folks at ZeroC have no doubt optimised very well. However, as I made clear in my original posting, over large distances even the speed of light becomes an important limiting factor in performance. So let's cast our comparison within the same framework that the recent CSIRO/Macquarie University work used - Internet scale integration.
A Simple Experiment
The CSIRO/Macquarie work used a fast private network between Sydney and Canberra. The distance between these two cities is approximately 300km. Short by Australian standards but I suspect a fairly average end-to-end distance in much of the rest of the world. Now, the speed of light is approximately 300,000km/sec which means that information can travel from Sydney to Canberra in around 1ms, assuming the intervening network equipment causes no latencies (which is slightly inaccurate, but will suffice for this experiment). For a roundtrip, which enables true RPC, this doubles to around 2ms (OK, this isn't the canonical case for Web Services, but just roll with it for now).
So before we even start we have a 2ms latency to contend with. Thank you very much Einstein.
Things are looking up in the binary protocol experiment since the additional latency is a meagre 0.03ms per node in the best case and a still, well, meagre 0.3ms per node in the worst case. This brings the total latency to somewhere between 2.06ms and 2.6ms (roundtrip plus latency at each end). Pretty respectable I think.
In the SOAP case, as one might expect, things are not quite as optimal; such is the penalty for using a general-purpose Internetworking protocol. We can assume that at best SOAP processing will incur an additional 7ms latency at each end of the link, bringing the SOAP total to 16ms - around 14ms slower than the binary version.
So in a realistic Internet situation, which both SOAP and (say) ICE are designed for, the difference is about 14ms. That is, a good binary protocol implementation is about 8x faster (16ms against 2.06ms at best) than the .Net 1.1 SOAP stack which was the basis for the CSIRO/Macquarie work. Now 8x faster is an impressive figure and in some cases will be well worth sacrificing some degree of interoperability for. However 8x falls a little short of the "tens or hundreds of times faster" that the latest ICE newsletter claims.
Now think for a minute that 300km is not actually that far. Darwin and Adelaide (both of which I am assured have Internet connectivity) are just over 3000km apart - a factor of 10 times larger than the Sydney to Canberra distance. Again we'll assume that the network gods are benevolent up to the point of general relativity which means that we have a latency of around 10ms one-way and so 20ms for a roundtrip. For the optimised binary case this means a total latency of somewhere between 20.06ms and 20.6ms. Still pretty respectable. For the SOAP case we have a total roundtrip time of 20ms plus the SOAP processing latency of two lots of 7ms which gives a total of 34ms. In this case the optimised binary version beats the SOAP version by 14ms again.
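For anyone who trusts my maths as little as I do, here is a minimal sketch of the sums above. It assumes the same idealised model as the text (light-speed propagation, no equipment delay, one lot of processing latency at each end); the function and constant names are mine, purely for illustration.

```python
SPEED_OF_LIGHT_KM_PER_S = 300_000  # approximate figure used in the text


def roundtrip_ms(distance_km, per_node_ms):
    """Idealised roundtrip: propagation there and back, plus message
    processing latency at each of the two endpoints."""
    one_way_ms = distance_km / SPEED_OF_LIGHT_KM_PER_S * 1000
    return 2 * one_way_ms + 2 * per_node_ms


if __name__ == "__main__":
    routes = [("Sydney to Canberra", 300), ("Darwin to Adelaide", 3000)]
    for name, km in routes:
        soap = roundtrip_ms(km, 7.0)          # 7ms per node quoted for SOAP
        binary_best = roundtrip_ms(km, 0.03)  # best-case binary figure
        binary_worst = roundtrip_ms(km, 0.3)  # worst-case binary figure
        print(f"{name}: SOAP {soap:.2f}ms, "
              f"binary {binary_best:.2f}ms to {binary_worst:.2f}ms, "
              f"gap at best {soap - binary_best:.2f}ms")
```

The per-node figures are the ones quoted from the newsletter; swap in your own distances and measurements as you see fit.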
"Can you see what it is yet?"
Eventually the limiting factor for integration of distant systems is the speed of light and not the latency of processing individual messages. The latency difference between the two approaches will always be approximately 14ms but as network latency gets larger (over distance) the cost of processing the message payload for either binary or SOAP has significantly less impact on overall latency.
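To see that argument in numbers, the rough sketch below holds the processing costs fixed and lets the network roundtrip grow. It assumes the best-case binary figure of 0.03ms per node and 7ms per node for SOAP, as quoted earlier; the `compare` helper and the chosen roundtrip values are hypothetical, just for illustration.

```python
SOAP_PER_NODE_MS = 7.0     # SOAP processing latency per endpoint, as quoted
BINARY_PER_NODE_MS = 0.03  # best-case binary latency per endpoint, as quoted


def compare(network_roundtrip_ms):
    """Return (absolute gap in ms, SOAP/binary ratio) for a given
    network roundtrip time, adding processing at both endpoints."""
    soap = network_roundtrip_ms + 2 * SOAP_PER_NODE_MS
    binary = network_roundtrip_ms + 2 * BINARY_PER_NODE_MS
    return soap - binary, soap / binary


if __name__ == "__main__":
    for rtt in (0, 2, 20, 200, 2000):
        gap, ratio = compare(rtt)
        print(f"{rtt:>5}ms network: gap {gap:.2f}ms, "
              f"SOAP takes {ratio:.2f}x as long as binary")
```

The gap stays put at a shade under 14ms whatever the network does, while the ratio only reaches "hundreds of times" when the network roundtrip is zero, and heads towards 1 as the roundtrip grows.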
Rockin' in the Real World
But as all network gamers know real networks don't approach the speed of light - not even over short distances - and lag is just as much your mortal enemy as frag. For example to ping a peer on my local wireless network the best latency (as measured by running a batch of pings) is around 2ms but my average latency is 100ms over a negligible distance compared to the speed of light. In the average case on this network the binary protocol is a mere 14% faster than SOAP - certainly not tens of times faster.
Of course everyone knows that wireless sucks for latency and real gamers play on real gamerboy LANs. I'm working in China at the moment and the Chinese love on-line gaming almost as much as they love smoking and gambling (lots) so of course the client on this engagement has a blisteringly quick Gigabit network for *ahem* work. Well at least during office hours it's for work - usually. The ping on this network is on average below 1ms which means that once again binary trumps SOAP by a long margin: approximately 1ms versus approximately 15ms - SOAP is 15 times slower on this network - the first measurement that got into the hitherto mythical "tens of times" faster. One and a half tens faster anyway.
However transfer protocols like SOAP and middleware like ICE are designed for Internet scale integration (though with ICE's latency there is no reason to not use it in a LAN if you don't have interoperability to worry about). Let's see how SOAP and binary stack up with a real, but fast, Internet connection. Since I'm in China I don't have access to any fast Internet connections so thanks to Savas for pinging a few places on my behalf.
Savas is currently (but not for much longer) based at the School of Computing at the University of Newcastle upon Tyne in the UK. Newcastle University is the central hub for the NorMAN network and is North Eastern England's route onto JANET, the super quick UK academic network, via a 2.5 Gbit/s link. From NorMAN Savas enjoys a 1 Gbit/s connection into his office which I miss and am supremely jealous of. While this is hardly a typical home broadband package it's a good starting point for seeing how latent the Internet can be, while factoring out concerns such as poor domestic ISPs.
In three separate and broadly representative pings, we observe the following latencies:
| Route | Best Ping Latency |
| Newcastle University to Imperial College London | 10ms |
| Newcastle University to ForthNet.gr in Greece | 70ms |
| Newcastle University to Columbia University | 78ms |
In terms of end-to-end latency for a roundtrip (assuming symmetry on the network) we get:
| Route | Average Case Binary Latency | SOAP Latency | SOAP Penalty |
| Newcastle University to Imperial College London | 20.66ms | 34ms | 64.57% |
| Newcastle University to ForthNet.gr in Greece | 140.66ms | 154ms | 9.48% |
| Newcastle University to Columbia University | 156.66ms | 170ms | 8.52% |
So when you are one hop away from an expensive national backbone network and a couple of hops (via that network) away from another well connected endpoint, SOAP is a little over one and one-half times slower than binary (tens? hundreds? Not really). Of course not many of us are afforded such luxuries and so our traffic has to traverse a few hops over better or worse networks. For example, over the path to ForthNet in Greece, the penalty is a little under 10% greater latency for SOAP. Over the geographically longer but somewhat faster transatlantic links to the USA the SOAP penalty is 8.5% versus binary.
Of course the typically larger messages which SOAP (even with MTOM) implies mean that SOAP messages are more susceptible to packet loss, retransmission, and therefore larger latencies than binary. And this will tend to degrade SOAP performance over distance too by some factor depending on the reliability of the underlying network. Nonetheless I stand by my original posting that while SOAP is slower than a good binary protocol it's not by much. And it's getting faster.
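The penalty figures can be recomputed from the best ping latencies with the small sketch below. It rests on a few assumptions: the ping is doubled to get the network roundtrip (as the table does), SOAP adds 14ms in total, and binary adds 0.66ms in total, which is simply the value implied by the table's binary column. The penalty is taken as SOAP latency divided by binary latency, minus one.

```python
SOAP_OVERHEAD_MS = 14.0    # two lots of 7ms SOAP processing
BINARY_OVERHEAD_MS = 0.66  # assumption: inferred from the table's binary column

BEST_PING_MS = {
    "Imperial College London": 10,
    "ForthNet.gr in Greece": 70,
    "Columbia University": 78,
}


def soap_penalty(ping_ms):
    """Return (binary ms, SOAP ms, penalty %) for one best-ping figure."""
    network = 2 * ping_ms  # the table's convention: ping doubled for a roundtrip
    binary = network + BINARY_OVERHEAD_MS
    soap = network + SOAP_OVERHEAD_MS
    return binary, soap, (soap / binary - 1) * 100


if __name__ == "__main__":
    for destination, ping in BEST_PING_MS.items():
        binary, soap, penalty = soap_penalty(ping)
        print(f"Newcastle University to {destination}: "
              f"binary {binary:.2f}ms, SOAP {soap:.0f}ms, penalty {penalty:.2f}%")
```

On these assumptions the London route comes out at roughly 65%, Greece a little under 10% and Columbia about 8.5%.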