Monday, February 16, 2009

Don't use __*__ in Python unless you are hacking Python

As an aside, I've essentially finished the REST book I was reading, so could theoretically launch into blogging about that; and I have been doing a lot of reading and exploring of the Inform 7 ideas I introduced earlier. But those are too big and daunting to write about quite yet.

When starting the Pylons book today, I noticed that their Routes library uses the __*__ pattern for some of their API (__before__ and __after__ at the least, it seems).

The same kind of pattern sometimes shows up in the Zope code base. zope.location uses __parent__ for back pointers. The component registry defined in zope.component uses __bases__ on instances, which is especially confusing because __bases__ has a special Python meaning on classes.
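To make the __bases__ confusion concrete, here's a small sketch. The Registry class below is a hypothetical stand-in I made up for illustration, not the actual zope.component code; it only shows how the same name ends up meaning two unrelated things.

```python
class Base:
    pass


class Child(Base):
    pass


# On a class, __bases__ is maintained by the interpreter itself:
# it is the tuple of direct base classes.
print(Child.__bases__)


class Registry:
    """Hypothetical stand-in for a registry that keeps its parent
    registries in an *instance* attribute named __bases__."""

    def __init__(self, bases=()):
        # A plain instance attribute; Python attaches no meaning to it.
        self.__bases__ = tuple(bases)


parent = Registry()
child = Registry(bases=(parent,))

# Same name, two unrelated meanings:
print(child.__bases__)        # the parent registries we stored ourselves
print(type(child).__bases__)  # the class's real base classes
```

A reader who knows the class-level meaning has to stop and work out which one is in play.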

Why, for goodness sake?

The __*__ pattern is explicitly claimed by the language for internal bits. Here's the pertinent bit from the language reference:

System-defined names. These names are defined by the interpreter and its implementation (including the standard library); applications should not expect to define additional names using this convention. The set of names of this class defined by Python may be extended in future versions.

That seems pretty clear. The pertinent section of PEP 8 is pretty clear too:

__double_leading_and_trailing_underscore__: "magic" objects or attributes that live in user-controlled namespaces. E.g. __init__, __import__ or __file__. Never invent such names; only use them as documented.
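If you wanted to hunt for invented names in your own code, a rough sketch using the standard ast module might look like the following. The KNOWN set is a tiny, deliberately incomplete allowlist I picked for the example, and custom_dunders is my own made-up helper name; a real check would need the full list from the language reference.

```python
import ast
import re

# Matches names with double leading and trailing underscores.
DUNDER = re.compile(r"^__\w+__$")

# Assumption: a tiny, incomplete allowlist, just for this example.
KNOWN = {"__init__", "__name__", "__file__", "__all__", "__doc__",
         "__repr__", "__eq__"}


def custom_dunders(source):
    """Return the sorted __*__ names that `source` defines or assigns
    and that are not in the KNOWN allowlist."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                             ast.ClassDef)):
            name = node.name
        elif isinstance(node, ast.Attribute) and isinstance(node.ctx, ast.Store):
            name = node.attr
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            name = node.id
        else:
            continue
        if DUNDER.match(name) and name not in KNOWN:
            found.add(name)
    return sorted(found)


sample = '''
class Controller:
    def __init__(self):
        self.__parent__ = None

    def __before__(self):
        pass
'''

print(custom_dunders(sample))
```

On the sample it reports __before__ and __parent__ and leaves __init__ alone.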

There are plenty of other naming conventions that a framework can claim for internal bits. The ZODB, for instance, even though it was first written a pretty long time ago for the web world, used prefixes like _p_ to signify its own internal bits. It conveys the same kind of "I'm magic" idea, but does not step on the language's toes unnecessarily.
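Here's a minimal sketch of what a prefix convention like that could look like for a back pointer. The Located class, the _loc_ prefix, and path_of are all names I invented for this example; none of them come from Zope or the ZODB.

```python
class Located:
    """Hypothetical object that keeps a framework back pointer under a
    framework-specific prefix instead of a __*__ name."""

    def __init__(self, name, parent=None):
        self.name = name
        # "_loc_" is an invented prefix; it says "framework internals"
        # without borrowing the language's reserved naming pattern.
        self._loc_parent = parent


def path_of(obj):
    """Follow the back pointers up to the root and join the names."""
    parts = []
    while obj is not None:
        parts.append(obj.name)
        obj = obj._loc_parent
    return "/".join(reversed(parts))


root = Located("root")
folder = Located("folder", parent=root)
doc = Located("doc", parent=folder)

print(path_of(doc))
```

The attribute still reads as "hands off, framework internals", and it can never collide with a name that a future Python version decides to claim.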

Apparently, there is a defense for older code that uses __*__: I read that Guido's initial style post, from which PEP 8 evolved, said that it was OK to claim __*__ names under special circumstances. The post is not at its old location any more, but thanks to the Wayback Machine we can see the evidence. Guido used to say this:

__double_leading_and_trailing_underscore__: "magic" objects or attributes that live in user-controlled namespaces, e.g. __init__, __import__ or __file__. Sometimes these are defined by the user to trigger certain magic behavior (e.g. operator overloading); sometimes these are inserted by the infrastructure for its own use or for debugging purposes. Since the infrastructure (loosely defined as the Python interpreter and the standard library) may decide to grow its list of magic attributes in future versions, user code should generally refrain from using this convention for its own use. User code that aspires to become part of the infrastructure could combine this with a short prefix inside the underscores, e.g. __bobo_magic_attr__.

OK. That was a bit of a waffle. But it's not there anymore, and in any case, some or all of the uses of the __*__ convention I've already listed have no particular need for claiming to be Python-level infrastructure.

How about we stop using __*__ now, unless we are hacking Python itself?


D said...

Here is a long-winded description of how I see the current situation.

The Python developers are especially sensitive about changes to the language and go out of their way to demonstrate that making a change will not break existing code (too badly). For example, whenever a new reserved keyword is suggested, someone goes through the standard library and a few major projects (Zope, Twisted, etc...) to see how often it is used as an identifier. This is a measure of how painful it will be to adopt the new reserved keyword.

The prohibition on new __*__ names is a sort of insurance policy in backwards compatibility arguments like this. When a library developer starts complaining about a PEP that stomps all over his __*__ name, the core developers can always say, "back off - we told you that you can't have them." This is, in the grand scheme of things, very good since everyone wants the built-in magic methods to have simple, easy-to-remember names, and the core Python developers are a pretty level-headed bunch who don't go out of their way to push other developers around.

Fortunately, these arguments tend not to happen - the mere fact that __*__ names are strongly discouraged with ominous warnings has kept this namespace clear enough.

Furthermore, Python itself has historically not pushed this issue very hard. For example, the parser could issue warnings about such names (or, more hard-line yet, refuse to run programs with custom __*__ names), but it currently does not.

And everyone comes out reasonably happy.

Simon Rivada said...

D above has a nice example of why not to use your own __*__ functions. However, I disagree with the part about the Python interpreter giving warnings about it. As the title of this blog post says, if you are hacking you might want to overwrite certain built-in __*__ functions (not saying this is a wise plan).

Some packages are built on this, so that would be breaking backwards compatibility even more so.