Friday, December 29, 2006

Out on the edge with the ZODB

Someday I'd love to write a book called something like Application Development with the Zope Object Database (ZODB). Writing it would let me share the information I've learned, as well as make me learn the bits I still haven't gotten around to. That's how I really think of my Zope 3 work--in that context, I'm

  1. A Python programmer
  2. who uses the ZODB and related libraries
  3. with web-related components and libraries.

The ZODB is a very nice tool to work with. The Zope books out there just touch on it, and I think many Zope technologies can be introduced from the perspective of the ZODB, rather than the other way around. Maybe I'll write that book yet.

In any case, last week at work we found another interesting ZODB edge case to contemplate. More than a year ago, a colleague had added an interesting and useful feature to a package I largely created, zc.catalog. Rather than processing catalog requests as they came in, the feature allowed the requests to be queued for the end of the transaction. The big win here is for objects that may have multiple changes within a transaction, such as during object creation and initialization: we can collapse all the index requests for a given object into one, and hopefully save redundant work.
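To make the collapsing idea concrete, here is a minimal sketch in plain Python. It is not the zc.catalog code: the class name IndexQueue and its methods index_doc and flush are hypothetical names invented for illustration, and a real version would be flushed by a transaction hook rather than by hand.

```python
class IndexQueue:
    """Hypothetical queue that keeps one pending index request per doc id."""

    def __init__(self):
        # doc id -> most recent version of the object to index
        self._pending = {}

    def index_doc(self, docid, obj):
        # A later request for the same doc id replaces the earlier one.
        self._pending[docid] = obj

    def flush(self, index_fn):
        """Call index_fn once per queued document and empty the queue."""
        pending, self._pending = self._pending, {}
        for docid, obj in pending.items():
            index_fn(docid, obj)
        return len(pending)


calls = []
queue = IndexQueue()
queue.index_doc(1, {"title": "draft"})                  # object created
queue.index_doc(1, {"title": "draft", "body": "text"})  # then initialized
queue.index_doc(2, {"title": "other"})
processed = queue.flush(lambda docid, obj: calls.append((docid, obj)))
```

Three requests come in, but the index function runs only twice, and document 1 is indexed once in its final state.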

When I studied how this had been done, I saw that the work queue was a persistent object with a connection to the database. Why, I wondered? This object should always begin and end every transaction empty! Can't we just use a purely transient object, discarded at the end of the transaction?

The answer (or at least my answer) was subtransactions. If you index something, then start a subtransaction, then index something else, then abort the subtransaction, then the first index request should remain in the queue. We really want transactional behavior, not persistence, and persistence is the easiest way to get that (without writing a transaction manager and so on).
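The behavior we want can be sketched without the ZODB at all. The class below is a hypothetical stand-in (TransactionalQueue, savepoint and rollback are names I made up for this example); it only illustrates the semantics that a persistent object gets for free from the database's own subtransaction handling, and is not how the ZODB implements them.

```python
class TransactionalQueue:
    """Hypothetical queue whose contents can be rolled back to a savepoint."""

    def __init__(self):
        self._items = []

    def add(self, request):
        self._items.append(request)

    def savepoint(self):
        # Remember how long the queue was when the subtransaction started.
        return len(self._items)

    def rollback(self, savepoint):
        # Drop only the requests added after the savepoint was taken.
        del self._items[savepoint:]

    def pending(self):
        return list(self._items)


queue = TransactionalQueue()
queue.add("index A")        # index something
sp = queue.savepoint()      # start a subtransaction
queue.add("index B")        # index something else
queue.rollback(sp)          # abort the subtransaction
remaining = queue.pending()
```

After the rollback the first request is still queued and the second is gone. A purely transient object would need something like this written by hand, plus the glue to tie it to the real transaction machinery.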

Our problem is somewhat obvious in retrospect: using a persistent object for this queue can cause write conflicts. In the case of an object without special conflict resolution code, concurrent transactions that both cause the catalog to index anything will make the queue "dirty" in both transactions, and thus generate a conflict error. The object must have a specific conflict resolution policy that notices when all three states (mine, the conflicting one, and the shared historical one) are the same; or (I learned) you can have the client code call _p_invalidate on the queue after it is empty to discard the dirty state and return to the initial version.
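The first option, a policy that resolves the conflict only when all three states agree, might look roughly like the sketch below. The ZODB hook for this is a method named _p_resolveConflict that receives the old, saved and new states. Everything else here is an assumption made so the example runs on its own: QueueState is a hypothetical plain class rather than a Persistent subclass, the local ConflictError stands in for the one in ZODB.POSException, and the states are plain dicts.

```python
class ConflictError(Exception):
    """Local stand-in for ZODB.POSException.ConflictError."""


class QueueState:
    """Hypothetical queue class; a real one would subclass Persistent."""

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        # old_state:   the shared historical state both transactions began with
        # saved_state: the state the conflicting transaction already committed
        # new_state:   the state this transaction is trying to commit
        if old_state == saved_state == new_state:
            # Nothing really changed, so any of the three is a safe answer.
            return new_state
        raise ConflictError("queue states differ; cannot resolve")


empty = {"data": {}}
resolved = QueueState()._p_resolveConflict(empty, dict(empty), dict(empty))

try:
    QueueState()._p_resolveConflict(empty, {"data": {1: "x"}}, dict(empty))
except ConflictError:
    raised = True
else:
    raised = False
```

When the queue is empty in all three states the write goes through; if any state differs, the conflict is reported as usual.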

The queue was a persistent dict. BTree buckets have conflict resolution code, but unfortunately not this bit of logic, so they would not have helped (if I remember and understand correctly, an empty bucket usually means that it should be discarded from a tree, so it can only resolve a conflict safely if all three states are empty, not just the two conflicting ones). The remaining solutions are to write an object with the conflict resolution code, to use _p_invalidate, or possibly to make a custom transaction manager. As I understand it, a colleague plans to choose _p_invalidate, as well as to add tests for the queueing behavior that were omitted from the initial addition.

The transaction manager seems like it might be a "purer" solution, but the other approaches might be more practical. In any case, I found the problem and resolution to be fascinating, and it added another few bits of ZODB knowledge to my store.

Tuesday, December 12, 2006

Web Component Development with Zope 3, Second Edition

I got my comp copy of Philipp von Weitershausen's Web Component Development with Zope 3, Second Edition yesterday. I was the primary technical reviewer for this second edition. It's an impressive book, and I'm happy I was able to be a part of it.

It's also fascinating as an artifact documenting the current state of Zope. The message it sends is overwhelmingly positive.

  • While the usual polish problems in technical books are still evident (shame on us reviewers), it still is one of the better technology books I've encountered--looking through my library, say, top 10%. It's impressive that we attract authors and books of this caliber.
  • Springer is a very high-quality publisher.
  • I don't know of any other Zope books to get to a second edition--and the first edition is from 2005!
  • Phil Eby's foreword is high praise from a well-respected voice in the Python community.
  • The system the book describes is largely attractive and powerful from my perspective--it reminds me why I got excited about Zope in the first place.
Generally, it gives me a strong, welcome feeling of health for Zope.

There is a negative side to be found as well.

  • The book highlights some obvious holes for the beginner--for instance, when I was reviewing the text and trying to help polish some of the story, I was really struck by how painful the Zope 3 indexing story must be for a beginner, even while I love the flexibility.
  • The book doesn't touch on AJAX, because Zope 3 doesn't have an integrated story for it. It's easy enough to build AJAX apps--nothing is stopping you--but it's up to you. No nice AJAX widget story.
  • Philipp mentions a lack of relational database support beyond the basic transactional support of the database adapters; this both highlights a legitimate lack, and points to a continuing tension in the Zope community over the importance and use of the ZODB, the Zope Object Database.
  • Phil Eby's foreword alludes to the larger Python community's apathy or even antagonism to Zope; he also admits that he himself does not use it (presumably favoring his own PEAK).
Ah, well.

In any case, I highly recommend the book to anyone wanting to learn Zope 3. On the one hand, it's really the only option now, with Stephan Richter's book not moving to a second edition, and no other books on the horizon; on the other hand, even without competitors, it's amazingly welcoming, clear and thorough.