Monday, February 16, 2009

Oh, the farmer and the cowman should be friends: URI parsing with Routes versus graph traversal

(This post's title alludes to a song from the musical Oklahoma, in case you were wondering.)

I, like many web application developers, am impressed with the Routes model for mapping a URI to application code (as in RoR, or any number of Python versions). I plan to use it for "hobby" work, and I'm advocating it at my job.

For many web applications, it seems to work as well or better than the other approach to web application URI parsing with which I'm familiar, graph traversal. In the graph traversal approach I know, you typically divide up the URI path elements by slashes into individual path elements. For example, "/musical_theater/rodgers_and_hammerstein/oklahoma" becomes ["", "musical_theater", "rodgers_and_hammerstein", "oklahoma"]). Then you start with a given graph node and use each path element as input to traverse the graph. For instance, repoze.bfg strictly uses __getitem__ to traverse the graph, so the example URI above might equate to root_object["musical_theater"]["rodgers_and_hammerstein"]["oklahoma"].

The Routes model is particularly nice for web sites publishing square, non-hierarchical data. If you don't have a graph to traverse, then you have to do something else!

Moreover, I buy into the argument that Routes encourages you to think about your URI space separately from your model. This fits in well with REST philosophies, in particular if you regard your URIs as a significant aspect of your user interface.

In defense of graph traversal, I generally have found that traversing model objects has resulted in reasonable URIs. Also, one could traverse a graph of abstract traversal controllers instead of models (and in fact, at my job, that is what the code of Launchpad does, as of this writing).

But typically, graph traversal does tend to mix model and URI in a way that can force "model" objects into a system when all you really want is a URI.

For instance, in Zope sites that I have designed, I have frequently felt awkward about the top-level design--the part of the design in which you are arranging top-level access to your models. This part of the website functionality often does not map naturally to model objects. In Zope using the ZODB, the nodes in the traversed graph are usually persistent objects, and so the top-level objects have a "model" feel; and yet they are usually just scaffolding until you get into the meat--the real models--of the application.

As another example, URIs in which path elements are really query-string-like filters on a view rather than true graph traversal are possible, but not as natural with graph traversal systems. For example, consider this URI from trulia.com: http://www.trulia.com/for_sale/3p_beds/2p_baths/SINGLE-FAMILY_HOME_type/resale,new_homes_lt/38.652833,38.976488,-85.838055,-85.455951_xy/10_zm/. (No, I'm not planning on moving to Indiana.) That URI reads well, and follows typical REST advice to move information into the URI. It's doable with graph traversal approaches, but is not really traversing a graph.

Graph traversal has some strengths as well, though.

An obvious one is when you have a graph to traverse. Perhaps you have a CMS in which documents can be arranged into arbitrarily nested folders. Or perhaps you have some concept of "projects" that can contain other projects, to an arbitrary depth.

Of course, in the same way that graph traversal can be made to handle pure-URI stories, such as with Launchpad's abstract traversal controllers, Routes can handle graph traversal. But I argue that graph traversal is more natural to, um, traversing graphs.

In particular, if you have graph nodes that can be dynamically created that have different traversal rules, as in the CMS example above, then defining how to traverse per graph node can be more natural and cleaner than specifying the rules in a routes file and a single controller.

Also, when a routes system starts to make heavy use of regular expressions--say, a rule that specifies anything beyond static strings, a controller, an id, a view, and a "catch all" for the rest of the URI--simple graph traversal approaches can be much easier to express and understand. (Examples of relatively simple traversal approaches are the Launchpad navigation traversers, or the repoze.bfg __getitem__ approach.)

So, they both have applicability. Maybe we can combine the two approaches when it makes sense. The farmer and the cowman should be friends. (You get to decide which approach is the farmer, and is which is the cowboy, though see the postscript.)

For some projects, Routes or graph traversal alone might fit the bill perfectly. I do tend to guess that Routes is the better general-purpose approach. But for some applications--if they present a complex data structure, for instance, and especially one in which one or more aspects of the site can be presented as a graph--then maybe you ought to have Routes for the top of your site, which then can defer to graph traversal for certain parts of your site that make sense.

megrok.trails goes down this road, but not quite the way I'm thinking of at the moment. It fits Routes-style traversal within a larger context of graph traversal. I'd like to turn that inside out: when appropriate, have a Routes mapping with a wildcard that consumes the entire tail end of a URI, and then sends this to an intermediate controller, which uses graph traversal on the wildcard part of the URI to find the "real" controller. Routes is entirely in charge initially, and explicitly defers to graph traversal if so requested.

I wouldn't be surprised to learn if such a thing existed for Routes. It would be pretty easy to code up. I'd like to use something like it.


Postscript: For what it's worth, I'm struck by an overwhelming desire to relate the farmer, making fences, to Routes, making nice, simple URI rules; and to relate the cowman, herding free-range cattle, to graph traversal, letting you walk over arbitrary model graphs. But metaphors like that sometimes get people up in arms, because the Routes people might want to be the rough-and-tumble cowboys, and the graph traversal people might want to be the practical and pragmatic farmers. So forget I said anything like that.)

2 comments:

chrism said...

FWIW, repoze.bfg does just what you indicate within the sentence "Routes is entirely in charge initially, and explicitly defers to graph traversal if so requested." See the urldispatch chapter of the repoze.bfg docs for more info.

Marius Gedminas said...

I don't normally do this, but I have to say that I agree completely and don't have anything to add.

Well, maybe a little note: one of the things I like about Routes is that it gives me a bidirectional mapping. I can go from a URL to a particular view, and I can generate the URL if I know the view I want. Doing this in the graph traversal way is more difficult: there can be more than one path from the root to the desired node, and some of the graph nodes are "more real" than other nodes.