Saturday, February 28, 2009

Getting VMWare Faster

I like Macs, and OS X. I like Ubuntu too, and with my job at Canonical, I develop with it.

I've been developing in Ubuntu in a VMWare Fusion image, for a variety of reasons. I recently got really tired of the slow speed I was experiencing, though. I decided to investigate what I could do to keep my Mac/Linux story portable but make it faster, without buying a new computer.

Random Googling to the rescue!

The most persistent advice I read was to get an external disk. I also read that using a reasonably fast external disk over a relatively slow connection like USB 2 might not do so well. I also read...a whole bunch of other things.

Here's what I did:

  • I already was using my FireWire 800 port for some other drives, so I decided to go for broke with an eSATA Express Card adapter.
  • I got a 500GB 5200RPM 2.5" disk drive in a Rocketfish enclosure with eSATA and USB connections.
  • After connecting everything up, I built a new VMWare image on the external drive. I used 64-bit Ubuntu, because I read that VMWare could take advantage of some 64-bit opcodes. The 64-bit Ubuntu image is named "AMD64" but I read that it works fine on 64-bit Intel. My experience bears this out.
  • I gave the image 1.5G of my 3G RAM.
  • I configured the image to pre-allocate the necessary space on the hard-drive because VMWare Fusion mentions that this might speed things up.
  • I made sure that the 3D graphics acceleration was not turned on for the image, since Linux can't use it anyway.

The results were gratifying! The image certainly feels snappier. More concretely, running a (presumably representative) subset of the Launchpad test suite took between 20 and 30 minutes on the old image, and between two and three minutes on the new one. <happy sigh>

Now the next question would be "which changes made the biggest difference?" I'd love to know--but probably not enough to actually do the experiment. Instead, I'll use the faster speed to help in this coming week's sprint to abstract a REST webservice framework!

Monday, February 16, 2009

Oh, the farmer and the cowman should be friends: URI parsing with Routes versus graph traversal

(This post's title alludes to a song from the musical Oklahoma, in case you were wondering.)

I, like many web application developers, am impressed with the Routes model for mapping a URI to application code (as in RoR, or any number of Python versions). I plan to use it for "hobby" work, and I'm advocating it at my job.

For many web applications, it seems to work as well as or better than the other approach to web application URI parsing with which I'm familiar, graph traversal. In the graph traversal approach I know, you typically split the URI path on slashes into individual path elements. For example, "/musical_theater/rodgers_and_hammerstein/oklahoma" becomes ["", "musical_theater", "rodgers_and_hammerstein", "oklahoma"]. Then you start with a given graph node and use each path element as input to traverse the graph. For instance, repoze.bfg strictly uses __getitem__ to traverse the graph, so the example URI above might equate to root_object["musical_theater"]["rodgers_and_hammerstein"]["oklahoma"].
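Here is a minimal sketch of that idea in plain Python. It is not repoze.bfg's actual code; the traverse function and the nested dictionaries are hypothetical stand-ins for a real traverser and a real object graph.

```python
def traverse(root, path):
    """Walk a graph from root, using each URI path element as a key."""
    node = root
    # "/a/b".split("/") yields ["", "a", "b"], so empty elements are skipped.
    for element in path.split("/"):
        if element:
            node = node[element]  # __getitem__ performs each traversal step
    return node


# Hypothetical nested data standing in for a real object graph.
root_object = {
    "musical_theater": {
        "rodgers_and_hammerstein": {
            "oklahoma": "the Oklahoma! node",
        },
    },
}

found = traverse(
    root_object, "/musical_theater/rodgers_and_hammerstein/oklahoma")
```

Any object that implements __getitem__ could take the place of the dictionaries, which is what makes the approach so small.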

The Routes model is particularly nice for web sites publishing square, non-hierarchical data. If you don't have a graph to traverse, then you have to do something else!

Moreover, I buy into the argument that Routes encourages you to think about your URI space separately from your model. This fits in well with REST philosophies, in particular if you regard your URIs as a significant aspect of your user interface.

In defense of graph traversal, I generally have found that traversing model objects has resulted in reasonable URIs. Also, one could traverse a graph of abstract traversal controllers instead of models (and in fact, at my job, that is what the code of Launchpad does, as of this writing).

But typically, graph traversal does tend to mix model and URI in a way that can force "model" objects into a system when all you really want is a URI.

For instance, in Zope sites that I have designed, I have frequently felt awkward about the top-level design--the part of the design in which you are arranging top-level access to your models. This part of the website functionality often does not map naturally to model objects. In Zope using the ZODB, the nodes in the traversed graph are usually persistent objects, and so the top-level objects have a "model" feel; and yet they are usually just scaffolding until you get into the meat--the real models--of the application.

As another example, URIs in which path elements are really query-string-like filters on a view rather than true graph traversal are possible, but not as natural with graph traversal systems. For example, consider this URI from trulia.com: http://www.trulia.com/for_sale/3p_beds/2p_baths/SINGLE-FAMILY_HOME_type/resale,new_homes_lt/38.652833,38.976488,-85.838055,-85.455951_xy/10_zm/. (No, I'm not planning on moving to Indiana.) That URI reads well, and follows typical REST advice to move information into the URI. It's doable with graph traversal approaches, but is not really traversing a graph.
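To make the "filters, not a graph" point concrete, here is a rough sketch that reads that kind of path as a set of filters. The suffix list and the parse_filters function are my own guesses from looking at the one URI above; they are not Trulia's actual scheme.

```python
# Suffixes guessed from the example URI; an assumption, not a documented API.
FILTER_SUFFIXES = {"beds", "baths", "type", "lt", "xy", "zm"}


def parse_filters(path):
    """Split a path into plain elements and suffix-tagged filter values."""
    plain, filters = [], {}
    for element in path.strip("/").split("/"):
        # "3p_beds" -> value "3p", suffix "beds"
        value, sep, suffix = element.rpartition("_")
        if sep and suffix in FILTER_SUFFIXES:
            filters[suffix] = value
        else:
            plain.append(element)
    return plain, filters


plain, filters = parse_filters(
    "/for_sale/3p_beds/2p_baths/SINGLE-FAMILY_HOME_type/resale,new_homes_lt/"
    "38.652833,38.976488,-85.838055,-85.455951_xy/10_zm/")
```

Nothing here walks from one node to the next: the order of the filter elements does not matter, which is a hint that this is really a query and not a traversal.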

Graph traversal has some strengths as well, though.

An obvious one is when you have a graph to traverse. Perhaps you have a CMS in which documents can be arranged into arbitrarily nested folders. Or perhaps you have some concept of "projects" that can contain other projects, to an arbitrary depth.

Of course, in the same way that graph traversal can be made to handle pure-URI stories, such as with Launchpad's abstract traversal controllers, Routes can handle graph traversal. But I argue that graph traversal is more natural to, um, traversing graphs.

In particular, if you have graph nodes that can be dynamically created that have different traversal rules, as in the CMS example above, then defining how to traverse per graph node can be more natural and cleaner than specifying the rules in a routes file and a single controller.

Also, when a routes system starts to make heavy use of regular expressions--say, a rule that specifies anything beyond static strings, a controller, an id, a view, and a "catch all" for the rest of the URI--simple graph traversal approaches can be much easier to express and understand. (Examples of relatively simple traversal approaches are the Launchpad navigation traversers, or the repoze.bfg __getitem__ approach.)

So, they both have applicability. Maybe we can combine the two approaches when it makes sense. The farmer and the cowman should be friends. (You get to decide which approach is the farmer, and which is the cowman, though see the postscript.)

For some projects, Routes or graph traversal alone might fit the bill perfectly. I do tend to guess that Routes is the better general-purpose approach. But for some applications--if they present a complex data structure, for instance, and especially one in which one or more aspects of the site can be presented as a graph--then maybe you ought to have Routes for the top of your site, which then can defer to graph traversal for certain parts of your site that make sense.

megrok.trails goes down this road, but not quite the way I'm thinking of at the moment. It fits Routes-style traversal within a larger context of graph traversal. I'd like to turn that inside out: when appropriate, have a Routes mapping with a wildcard that consumes the entire tail end of a URI, and then sends this to an intermediate controller, which uses graph traversal on the wildcard part of the URI to find the "real" controller. Routes is entirely in charge initially, and explicitly defers to graph traversal if so requested.
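Here is a sketch of the shape I have in mind. It deliberately does not use the Routes library: a hand-rolled table of regular expressions stands in for the route map, and every name in it (ROUTES, dispatch, docs_controller, the DOCS graph) is hypothetical.

```python
import re


def traverse(node, tail):
    """Graph traversal over the wildcard part of the URI."""
    for element in tail.split("/"):
        if element:
            node = node[element]
    return node


# Hypothetical content graph; nested dicts stand in for real nodes.
DOCS = {"guides": {"install": "install guide"}}


def about_controller(match):
    return "about page"


def docs_controller(match):
    # Intermediate controller: hands the wildcard tail to graph traversal
    # to find the "real" target.
    return traverse(DOCS, match.group("tail"))


# The route table is in charge first; only the second rule defers.
ROUTES = [
    (re.compile(r"^/about$"), about_controller),
    (re.compile(r"^/docs/(?P<tail>.*)$"), docs_controller),
]


def dispatch(path):
    for pattern, controller in ROUTES:
        match = pattern.match(path)
        if match:
            return controller(match)
    raise LookupError(path)


result = dispatch("/docs/guides/install")
```

The route table stays flat and readable, and only the one rule with the wildcard knows that a graph exists at all.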

I wouldn't be surprised to learn that such a thing already exists for Routes. It would be pretty easy to code up. I'd like to use something like it.


Postscript: For what it's worth, I'm struck by an overwhelming desire to relate the farmer, making fences, to Routes, making nice, simple URI rules; and to relate the cowman, herding free-range cattle, to graph traversal, letting you walk over arbitrary model graphs. But metaphors like that sometimes get people up in arms, because the Routes people might want to be the rough-and-tumble cowboys, and the graph traversal people might want to be the practical and pragmatic farmers. So forget I said anything like that.

Don't use __*__ in Python unless you are hacking Python

As an aside, I've essentially finished the REST book I was reading, so could theoretically launch into blogging about that; and I have been doing a lot of reading and exploring of the Inform 7 ideas I introduced earlier. But those are too big and daunting to write about quite yet.

When starting the Pylons book today, I noticed that their Routes library uses the __*__ pattern for some of their API (__before__ and __after__ at the least, it seems).

The same kind of pattern is sometimes in the Zope code base. zope.location uses __parent__ for back pointers. The component registry defined in zope.component uses __bases__ on instances, which is especially confusing because __bases__ has a special Python meaning on classes.
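A small sketch of why the __bases__ case is confusing. The Registry class below is a hypothetical stand-in, not the zope.component implementation; it only shows that the same name means two unrelated things on the class and on the instance.

```python
class Base:
    pass


class Registry(Base):
    """Hypothetical registry; not the real zope.component class."""

    def __init__(self, bases=()):
        # Framework-defined meaning: registries this one falls back to.
        self.__bases__ = tuple(bases)


global_registry = Registry()
local_registry = Registry(bases=[global_registry])

# Python's meaning: the classes that Registry inherits from.
class_bases = Registry.__bases__
# The framework's meaning: other registry instances.
instance_bases = local_registry.__bases__
```

Both lookups work, and they return completely different kinds of things, which is exactly the problem for a reader.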

Why, for goodness sake?

The __*__ pattern is explicitly claimed by the language for internal bits. Here's the pertinent bit from the language reference:

__*__
System-defined names. These names are defined by the interpreter and its implementation (including the standard library); applications should not expect to define additional names using this convention. The set of names of this class defined by Python may be extended in future versions.

That seems pretty clear. The pertinent section of PEP 8 is pretty clear too:

__double_leading_and_trailing_underscore__: "magic" objects or attributes that live in user-controlled namespaces. E.g. __init__, __import__ or __file__. Never invent such names; only use them as documented.

There are plenty of other naming conventions that a framework can claim for internal bits. The ZODB, for instance, even though it was first written a pretty long time ago for the web world, used prefixes like _p_ to signify its own internal bits. It conveys the same kind of "I'm magic" idea, but does not step on the language's toes unnecessarily.
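As a sketch of what a framework-owned prefix could look like for back pointers, here is a toy version. The _loc_ prefix and the Located, locate, and path_of names are invented for illustration; they are not from zope.location or the ZODB.

```python
class Located:
    """Hypothetical mixin; `_loc_` is an invented framework prefix."""
    _loc_parent = None
    _loc_name = None


def locate(obj, parent, name):
    """Record where obj lives, using prefixed names instead of __*__."""
    obj._loc_parent = parent
    obj._loc_name = name
    return obj


def path_of(obj):
    """Follow the back pointers up to the root to rebuild a path."""
    parts = []
    while obj is not None and obj._loc_name is not None:
        parts.append(obj._loc_name)
        obj = obj._loc_parent
    return "/" + "/".join(reversed(parts))


root = Located()
folder = locate(Located(), root, "folder")
doc = locate(Located(), folder, "doc")
doc_path = path_of(doc)
```

The prefix still says "framework magic, hands off," but it cannot collide with anything Python itself decides to define later.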

Apparently, there is a defense for older code that uses __*__: I read that Guido's initial style post from which PEP 8 evolved said that it was OK to claim __*__ names under special circumstances. The post is not at its old location any more, but thanks to the wayback machine we can see the evidence. Guido used to say this:

__double_leading_and_trailing_underscore__: "magic" objects or attributes that live in user-controlled namespaces, e.g. __init__, __import__ or __file__. Sometimes these are defined by the user to trigger certain magic behavior (e.g. operator overloading); sometimes these are inserted by the infrastructure for its own use or for debugging purposes. Since the infrastructure (loosely defined as the Python interpreter and the standard library) may decide to grow its list of magic attributes in future versions, user code should generally refrain from using this convention for its own use. User code that aspires to become part of the infrastructure could combine this with a short prefix inside the underscores, e.g. __bobo_magic_attr__.

OK. That was a bit of a waffle. But it's not there anymore, and in any case, some or all of the uses of the __*__ convention I've already listed have no particular need for claiming to be Python-level infrastructure.

How about we stop using __*__ now, unless we are hacking Python itself?

Monday, February 9, 2009

My Claim: "MTV" Is Silly.

repoze.bfg's creator, Chris McDonough, in an informative and corrective comment to my last blog post, among other things disagreed with me about my assertion that "repoze.bfg doesn't provide a model story; it provides a traversal story."

I'm afraid I didn't communicate myself well enough. He might still disagree, and I might still fail to make my points clearly, but I found this the most interesting observation I made, so let me try again.

  • I think applying Model-View-Controller to client-side application frameworks makes sense. MVC actually makes sense, with a reasonably clear delineation of responsibilities, when you look at Cocoa, for instance, or even in some of the more recent JS frameworks (Sproutcore, for instance).
  • All of the reasonably recent web frameworks out there with which I am currently even vaguely familiar (Zope 3, Django, RoR, bfg, Pylons) have a model and something responsible for rendering. The rendering usually goes out to a template, but not always. It's not really baked in that you have to use a template, and usually that's regarded as an advantage and a flexibility. "Render however you want! Use whatever library you want!" So, the heart of the system is Model-View. The Controller isn't there, and the Template is an implementation detail of the View. "Model-Template-View" or "MTV" just seems silly to me. I'd prefer it if everyone acknowledged that our web frameworks usually just have "MV," and moved on.
  • bfg doesn't care what it is traversing. Pretty much, give it things that it can use the __getitem__ protocol on, and then when it has consumed the path, it'll adapt to a view class. The __getitem__ bits could be a model...or not! What if your data model didn't jibe with your URL model? That's completely reasonable, and the routes guys have plenty of examples in their apps because of how they think about URLs. What if you like the __getitem__ pattern for your URLs, but your URL story is different from your model? You might build a true MVC system with bfg: pure data-driven models; the bfg traversal system used exclusively over "controller" objects that handle traversal and maybe request (i.e., form) parsing; and views that adapt the controllers to *only* render. Maybe the controllers even optionally have WADL-like contracts based on request inputs.
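Here is a rough sketch of that last arrangement in plain Python. It does not use the bfg API at all; the controller classes, the SHOWS data, and the traverse and show_view functions are all hypothetical, and only the __getitem__ protocol is taken from the discussion above.

```python
# Model: pure data, with no knowledge of URLs.
SHOWS = {"oklahoma": {"title": "Oklahoma!", "composer": "Rodgers"}}


class ShowController:
    """Terminal traversal node wrapping one model record."""

    def __init__(self, show):
        self.show = show

    def __getitem__(self, name):
        raise KeyError(name)  # nothing below a single show


class ShowsController:
    """Traversal node that maps a path element to a model lookup."""

    def __getitem__(self, name):
        return ShowController(SHOWS[name])


class RootController:
    """Top of the URL space; pure scaffolding, not a model."""

    def __getitem__(self, name):
        if name == "shows":
            return ShowsController()
        raise KeyError(name)


def traverse(node, path):
    for element in path.split("/"):
        if element:
            node = node[element]
    return node


def show_view(controller):
    # View: renders only; no lookup or traversal logic lives here.
    return "%(title)s by %(composer)s" % controller.show


context = traverse(RootController(), "/shows/oklahoma")
rendered = show_view(context)
```

The models never learn about URLs, the controllers own the URL space, and the view does nothing but render.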

So, my point was actually not that bfg was cheating in any way--certainly, for instance, nothing like Zope 2. To recap, then:

  1. I find the "MTV" term to be specious generally, whatever the web framework. That's just a criticism of the term, as web framework marketing has adopted it in the past few years. I'd love for it to retire.
  2. Interestingly, I don't think bfg is truly tied to the "MTV" model. It doesn't care what it traverses. The MTV model works fine, but a story like what I described, in which the models are maintained separately from the URL space, and the traversed objects are traversal "controllers," would also work well. Then thinking about any additional responsibility of the traversed objects is an interesting exercise, especially in light of REST-ian approaches.

So, yes, Chris is right, from one perspective, bfg is as much MTV as anybody else. That's fine. I'm just railing against the term, and saying that bfg can be used for more than just "MTV".

Saturday, February 7, 2009

Repoze.bfg

As mentioned earlier, I've spent time looking at other frameworks lately.

I've spent more time on Chris McDonough's repoze.bfg than any of the others so far. This is probably because, as discussed below, it's very minimal. It's also documented well. Finally, it follows a few standard old Zope patterns that don't require much thought for me to process. Given all that, I can understand it quickly, and so am enticed to spend more time to read and think a bit about its design.

I looked at it again because of this recent bit of marketing: http://plope.com/whats_your_web_framework_doing. Chris makes his point, which is valid; and he's selling to his design's goals, which is the point of this kind of presentation.

As an aside, I'm a "can't we all just get along" kind of guy, so the fact that the trade-offs of repoze.bfg's design are not discussed in comparison with the other frameworks bothers me, even though having done so would have made the piece much worse marketing and much harder-to-read communication. (A repoze.bfg design tradeoff example: if you always need authentication and authorization for your web apps, you'll need to plug in and understand more WSGI middleware, and then the given comparison is not as pertinent, at least for Grok and Django.) I actually wouldn't be surprised if repoze.bfg still would do very well in the chosen metric, if the set up actually did include authorization and authentication middleware in the profile; and if it didn't discard the webob.Response. That would have been mildly more interesting to me. But, whatever, it's marketing, and I get Chris' point.

The name of the framework, "bfg," is funny on multiple levels. The level that sticks with me is that the F[*&^%$#] G[un] is really not that B[ig]. As the documentation points out, this is a very minimal framework:

Minimalism: repoze.bfg provides only the very basics: URL to code mapping, templating, and security. There is not much more to the framework than these pieces: you are expected to provide the rest.

That's nice for a "pay for what you eat" story, as the documentation says elsewhere. But it's also insufficient for any website I've ever made. There are at least some suggested patterns to follow elsewhere within the repoze meta-project: repoze.who and repoze.what are available for authentication and authorization, for instance.

But it is a framework that wants more guidance, more "rails," more framework, to get some basics done. What about web form helpers: maybe we ought to use Ian Bicking's stuff. Or what about REST helpers? You might be able to write some interesting adapters from a generic RESTful view to a CRUD-ish interface, like the patterns I've seen in Rails. But it's not there now (and the documentation states that it is an active goal to hide the zope component architecture, which could have helped with this).

While it's nice to be lightweight, I think offering that kind of guidance would make a more appealing sales pitch. Maybe it's in the plans to build related libraries and integrate them in "building with repoze.bfg" tutorials, or maybe it's antithetical to Chris' goals, who knows.

Of the three features that the framework provides, the view and templating story is the least interesting to me. It seems very similar to the Zope 3/Grok story. I intend to check out Ian Bicking's webob library, and I hope to use chameleon at work and for hobby projects, but that's the extent of it.

The security story is very similar to Grok's. They both forgo framework-level security checks during traversal, I believe, while they differ in the last step: repoze.bfg security-protects the last traversed object within the view code, as I understand it, while Grok security-protects the view. For what it is worth, I prefer the repoze.bfg approach.

The traversal story is the most interesting to me. When I first heard the repoze.bfg traversal plan of "__getitem__ over the model, period," as opposed to the more flexible standard traversal story of Grok/Zope 3, I was skeptical, but the more I think about it the more I like it. The traversal story in Zope 3 has always been a pain to use for me, and while there might be a more powerfully flexible way to alleviate that pain than the repoze.bfg approach (and I think Grok might have tackled this already), the __getitem__ simplicity still is appealing.

In that vein, I find that the assertion that the repoze.bfg code is not MVC but "MVT" (Model-View-Template) like Django doesn't feel right. repoze.bfg doesn't provide a model story; it provides a traversal story. As such, you could be traversing over models or controllers; the code doesn't care which. This is "TV" (Traversable-View); or "[MC]V," from a regex perspective on MVC; or some other odd acronym. Not MVT.

In any case, while I might explore using repoze.bfg on some hobby projects, primarily to get my hands dirty with WSGI and get a better handle on a couple of Ian Bicking's libraries, I won't be working with this at work, and I have a higher personal priority to get some time with Django. I'll probably continue to follow repoze.bfg's development from a distance for now...and be glad that I don't have to write any marketing myself.