Can Web services scale / RAPS of RACS? OMG.
To: The editor at Computer magazine (computer.org)
CC: Ken Birman at Cornell University
I enjoyed reading the October 2005 edition of Computer, and after the obligatory reading of “At Random”, I flipped with interest to the Web Technologies column: “Can Web Services Scale Up?” by Ken Birman.
In my simple take on it, Ken suggests the architectural solution to scaling Web services is in relying on the infrastructure (i.e., application servers) to sort it out, while the “user simply designs a data structure and employs multicast technology” for the magical but expensive fairies to sprinkle data dust everywhere. Of course, he points out, CORBA tried and failed, erring by embedding a powerful solution into a tool mismatched to developer needs. CORBA’s lack of wide adoption is arguably due to far more errors than just that (but that’s another topic). He then also sets the scene for even more infrastructure with the requirement for sophisticated monitoring and management tools for these behemoth clusters of RAPS of RACS.
I couldn’t find where Ken addresses the reason why such complex clustering technologies would be “needed” for scale in the first place: stateful context in the middle tier. Get rid of that and you get rid of the need for sophisticated infrastructure, too-smart-for-their-packets network devices, and super-uber-monitoring-management-of-management-monitors that monitor and manage clusters of hosts of virtual machines. And then you can also buy this neat tool from Quest to tell you why all that stuff has broken too.
State lives in the data tier. Those boys know how to manage and scale state real well. Let’s keep it there, rather than multicasting megabytes of it in the app tier to handle failover (because that’s naught to do with load in a real man’s architecture).
Web services scale horizontally very well when their context is provided by their consumers (and perhaps by handlers along the way), their state is persisted in the nether tiers of database land, and their load is balanced by simple little devices in the switch fabric. They’re a lot cheaper like that too. It’s sort of like the difference between the Library of Congress and the World Wide Web. Are we getting it yet?
Regards,
Josh
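
To make the stateless arrangement argued for in the letter concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not anything taken from Ken's column or from a real deployment: the names `StatelessService` and `RoundRobinBalancer`, the `balances` table, and the request shape are all hypothetical, and an in-memory SQLite database stands in for a real data tier. Every request carries its own context, the service instances hold no per-client state, and a trivial round-robin dispatcher plays the part of the "simple little device" in the switch fabric.

```python
import itertools
import sqlite3

# Stand-in for the data tier: the only place state lives (assumption: a real
# system would use a networked database rather than in-memory SQLite).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, total INTEGER NOT NULL)")


class StatelessService:
    """Hypothetical service instance that keeps no per-client state between calls."""

    def __init__(self, name, conn):
        self.name = name
        self.conn = conn  # handle to the shared data tier, not a local cache

    def deposit(self, request):
        # All context (which account, how much) arrives with the request itself.
        account, amount = request["account"], request["amount"]
        with self.conn:  # commits on success, rolls back on an exception
            self.conn.execute(
                "INSERT OR IGNORE INTO balances (account, total) VALUES (?, 0)",
                (account,),
            )
            self.conn.execute(
                "UPDATE balances SET total = total + ? WHERE account = ?",
                (amount, account),
            )
        row = self.conn.execute(
            "SELECT total FROM balances WHERE account = ?", (account,)
        ).fetchone()
        return {"served_by": self.name, "account": account, "total": row[0]}


class RoundRobinBalancer:
    """Hypothetical dispatcher: hands each request to the next instance in turn."""

    def __init__(self, instances):
        self._next_instance = itertools.cycle(instances)

    def handle(self, request):
        return next(self._next_instance).deposit(request)


instances = [StatelessService(f"node-{i}", db) for i in range(3)]
balancer = RoundRobinBalancer(instances)

# Four requests for the same account land on different nodes, yet the total is
# consistent because no node holds any of the state.
replies = [balancer.handle({"account": "alice", "amount": 10}) for _ in range(4)]

# "Failover" is just removing a node: there is no session state to replicate.
survivors = RoundRobinBalancer(instances[1:])
after_failure = survivors.handle({"account": "alice", "amount": 5})
```

Because no instance remembers anything between calls, losing a node costs only capacity. Whether the data tier itself scales as comfortably is a separate question that this sketch does not address.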

Do I use clustering? Sure, under very specific circumstances and with the design of the application being fully aware of the distribution. Do I use it for Web services? No, they don’t need it.
Refer to Waldo et al. (http://research.sun.com/techrep/1994/smli_tr-94-29.pdf): “There are fundamental differences between the interactions of distributed objects and the interactions of non-distributed objects. Further, work in distributed object-oriented systems that is based on a model that ignores or denies these differences is doomed to failure, and could easily lead to an industry-wide rejection of the notion of distributed object-based systems.”
RAPS of RACS enchant the developer into the very ignorance so described. If they are lucky enough to have a skilled, experienced practitioner cognizant of the pitfalls, such as yourself, they may very well indeed be able to successfully spend the copious amounts of funds and time required to make it work properly. The vast majority do not.
Comment by Joshua Graham — March 2, 2007 @ 12:00 am
Ken:
a) still hasn’t answered the question as to why these massively over-controlled clustering beasts are needed (which they aren’t if you don’t have state in the middle tiers)
b) only points out systems that have massive financial reserves
c) doesn’t actually point out how or why those systems use a cluster (or farms of clusters)
d) must never have used a web browser to reach, apart from the most massively scaled system on the planet (the WWW itself), one of the best-known non-clustered but heavily scaled-out information retrieval systems: www.google.com
Comment by Joshua Graham — March 2, 2007 @ 12:00 am
From Ken:
Thanks, Josh.
Code I (personally) wrote runs the New York Stock Exchange, the Swiss Exchange, the French (soon to be half of Europe) air traffic control system, the US Navy AEGIS, the core of Microsoft’s clustering system for the Vista release and the fault-tolerance mechanism in IBM Websphere. I guess these aren’t real systems.
Which is the real-man’s solution you had in mind?
Ken
Comment by Joshua Graham — March 2, 2007 @ 12:00 am
It is certainly true that there are many issues still to be resolved with a RAPS/RACS approach, but there is no doubt in my mind that scalability of Web services is a crucial issue that no one has solved so far. In my work we use WSs in a service-oriented grid environment for computational science. Our scientists’ workflows easily exhaust the capabilities of very powerful servers hosting Web services. If you know of a good solution that may already be out there, I would be genuinely interested to get some pointers/explanations, etc. I really mean it. If you don’t mind, please could you clarify this for me?
Wrt a monitoring and management solution, this makes a whole lot of sense, actually. We miss many opportunities to (let our systems) handle failures in a clever way by not having sufficient info at crucial moments during runtime. We still rely on human experts using expertise and intuition (sometimes looking for the needle in the haystack and almost always after the fact/failure) to keep complex, highly distributed systems running. There’s got to be a better way - and IMHO it starts with better/different monitoring and management.
Comment by Bruno Wassermann — March 15, 2007 @ 8:28 am