Jay Kreps of LinkedIn recently wrote an excellent, detailed 360° overview post explaining why the log is the underlying data structure of pretty much everything in distributed systems. Well, I feel tempted to add that the same holds true for collaborative editing (as in Google Docs and others). I would also like to point out the relation between partially ordered logs, the offline-first approach, AP systems and the so-called Web 3.0 (as in Meteor or PouchDB). Any real-time approach to collaborative editing decomposes a document into a sequence of atomic operations. The best-known approach is OT; there is also the WOOT/CRDT-based family, which includes CT. GDocs is an OT system; I built a CT system, currently in beta at letters.yandex.ru.
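To make "a sequence of atomic operations" concrete, here is a minimal TypeScript sketch (my own illustration, not letters' or GDocs' actual format): a document is just the result of replaying a log of tiny insert/remove operations.

```typescript
// An edit expressed as atomic operations; replaying the log reproduces the text.
type Op =
  | { kind: "insert"; pos: number; char: string }
  | { kind: "remove"; pos: number };

function apply(text: string, op: Op): string {
  return op.kind === "insert"
    ? text.slice(0, op.pos) + op.char + text.slice(op.pos)
    : text.slice(0, op.pos) + text.slice(op.pos + 1);
}

const ops: Op[] = [
  { kind: "insert", pos: 0, char: "h" },
  { kind: "insert", pos: 1, char: "i" },
  { kind: "remove", pos: 0 },
];
console.log(ops.reduce(apply, "")); // "i"
```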
Note that letters is a tiny pilot project, totally incomparable to GDocs in terms of man-years spent (I expect at least two orders of magnitude of difference; three is even more likely). Still, letters works offline effortlessly from day one; if your WiFi flaps badly, you may not even notice it. GDocs got its offline mode after some years of work, and it works sorta conditionally (i.e. there are caveats: a native extension is needed and the browser must be Chrome). This villariba-villabajo story definitely deserves some explanation.
In a real-time collaborative editor, concurrent operations arrive “faster” than they propagate, so it is no longer possible to pretend there is any linear order of events (which doesn’t exist in distributed systems anyway). Hence, OT transforms operations so that every replica arrives at the same final text independently of the partial order in which operations were applied. Because of these transformations and cross-dependencies, operations become entangled and the complexity snowballs.
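To illustrate the transformation idea, here is a hedged toy example (not GDocs' actual algorithm) with two concurrent insertions into "abc": each replica applies its own operation first and then the other one, transformed, and both end up with the same string.

```typescript
type Insert = { pos: number; char: string };

const applyInsert = (t: string, op: Insert) =>
  t.slice(0, op.pos) + op.char + t.slice(op.pos);

// Shift `op` to the right if the concurrent `against` inserted at or before it.
// Real OT also needs tie-breaking for equal positions, deletions, server
// round-trips and so on; this is where the complexity snowballs.
function transform(op: Insert, against: Insert): Insert {
  return op.pos >= against.pos ? { ...op, pos: op.pos + 1 } : op;
}

const a: Insert = { pos: 0, char: "x" }; // user A types at position 0
const b: Insert = { pos: 3, char: "y" }; // user B types at position 3, concurrently

const base = "abc";
const atA = applyInsert(applyInsert(base, a), transform(b, a)); // "xabcy"
const atB = applyInsert(applyInsert(base, b), transform(a, b)); // "xabcy"
console.log(atA === atB); // true: both replicas converge
```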
Meanwhile, CT borrows some serious tricks from the land of hardcore distributed systems, namely (1) Lamport timestamps, (2) vector clocks and (3) partially ordered logs of immutable operations. Attaching a timestamp to every letter seems a pipe dream at first, but it works quite well in practice: having 100KB of timestamped text instead of 10KB of plain text is not a major burden. Regarding partially ordered operation logs, Mr Kreps mentions them in the context of hardcore AP systems, Amazon Dynamo in particular. Such logs are trickier to use, as they require application-specific reconciliation, but the resulting system scales much, much better. (That’s why I’ll never buy MongoDB’s “humongous” scalability claims. Turn it this way or that, MongoDB is a single-master system from the C corner of the CAP triangle.)
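Here is a hedged sketch of the per-letter timestamping idea (my own toy model, not the letters codebase): every letter carries a Lamport-style timestamp plus an author id and points at the letter it was typed after; a deterministic walk over that tree yields the same text on every replica.

```typescript
type Id = { time: number; author: string };              // Lamport time + site id
type Atom = { id: Id; cause: Id | null; char: string };  // an immutable "letter" op

// Deterministic tie-break for concurrent siblings: newer first, then by author.
const newerFirst = (x: Atom, y: Atom) =>
  y.id.time - x.id.time || x.id.author.localeCompare(y.id.author);

const sameId = (x: Id | null, y: Id | null) =>
  x === y || (!!x && !!y && x.time === y.time && x.author === y.author);

// Depth-first walk of the causal tree; every replica renders the same string.
function render(atoms: Atom[], cause: Id | null = null): string {
  return atoms
    .filter(a => sameId(a.cause, cause))
    .sort(newerFirst)
    .map(a => a.char + render(atoms, a.id))
    .join("");
}

// Two authors type concurrently after the same letter "H":
const atoms: Atom[] = [
  { id: { time: 1, author: "A" }, cause: null, char: "H" },
  { id: { time: 2, author: "A" }, cause: { time: 1, author: "A" }, char: "i" },
  { id: { time: 2, author: "B" }, cause: { time: 1, author: "A" }, char: "!" },
];
console.log(render(atoms)); // "Hi!" on every replica, no matter the arrival order
```

Deletions, the compact storage format and the vector-clock bookkeeping are omitted here; the point is only that per-letter timestamps make the ordering deterministic.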
AP techniques let CT handle high-RTT cases nicely, including, of course, the extreme case of offline work (which is essentially an RTT of several hours or days). There are several points in a CT system where operations from different sources are simply mixed together, effortlessly, and the result is correct.
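Continuing the toy causal-tree sketch above, here is why the mixing is effortless: atoms are immutable and keyed by their ids, so merging logs from any number of sources is just a set union, and the rendered text does not depend on the order in which batches arrive.

```typescript
// Union of two atom logs; already-seen ids are dropped.
function merge(local: Atom[], incoming: Atom[]): Atom[] {
  const seen = new Set(local.map(a => `${a.id.time}/${a.id.author}`));
  return local.concat(incoming.filter(a => !seen.has(`${a.id.time}/${a.id.author}`)));
}

// merge(merge(mine, fromPeerX), fromPeerY) renders the same text as
// merge(merge(mine, fromPeerY), fromPeerX): the merge order does not matter.
```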
So, let me speculate a bit on the future of the so-called Web 3.0 technologies. WebSocket allows for real-time message exchange between a browser and a server; WebStorage and IndexedDB allow data to be kept at the client; WebRTC lets browsers talk to each other. These new technologies definitely support the trend of more data and more code being pushed closer to the end user and farther from the server. But the latter part (moving logic away from the server) is a great inconvenience for developers.
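As a hedged sketch of the plumbing these APIs enable (continuing the same toy model; the storage key is made up for the example): keystrokes are appended to a log kept in WebStorage so nothing is lost while offline, and each atom is shipped over a WebSocket whenever one happens to be open; whatever was not sent is simply merged later.

```typescript
const KEY = "op-log"; // hypothetical WebStorage key

// Load whatever log survived the last (possibly offline) session.
const log: Atom[] = JSON.parse(localStorage.getItem(KEY) ?? "[]");

function record(atom: Atom, socket: WebSocket | null): void {
  log.push(atom);
  localStorage.setItem(KEY, JSON.stringify(log)); // persists across offline periods
  if (socket && socket.readyState === WebSocket.OPEN) {
    socket.send(JSON.stringify(atom));            // best-effort real-time delivery
  }
  // Anything not sent now is merged later via merge() above; order does not matter.
}
```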
The first reaction of developers was to proxy a database to the client, to have a familiar LAMP-like architecture inside the browser instead of on the server. Well, there are two shortcomings here. First, proxying a single-master system to zillions of real-world clients actually puts it into the native land of AP systems, which is a round-peg, square-hole situation. Second, even if we use an AP database, such a level of tight frontend-backend coupling is not necessarily good. So, I see those as the ugly first generation of “Web 3.0” systems.
My bet stays with some sort of AP middleware that uses all those serious tricks inside the browser. As the aforementioned letters system shows, that is perfectly feasible, even when every keystroke is an event in a partially ordered log.
Interesting times are coming.