Notes on repoze.bfg (now Pyramid) and ZODB
Published on 17 November 2010, updated on 13 April 2011
Introduction
I’ve been working on a couple of projects based on ZODB and repoze.bfg
(which is now becoming Pyramid). Along the way I learned a few useful things that I couldn’t find in the
official documentation. I think the documentation
assumes some familiarity with the Zope ecosystem, which I didn’t have when I
started. That assumption made my learning curve a bit steeper than it could
have been. The things I felt were missing from the documentation are not part of the framework
itself, but are provided by satellite packages, either part of Zope or of
Repoze. It’s not the role of the official manual to document these tools, even
though you will probably need them for any real-world development.
repoze.bfg deserves more introductory material, especially material
targeted at people who haven’t used Zope before. In this article, I’m going to
share the information that I wish I had had when I started. repoze.bfg
plays well with both ZODB and SQLAlchemy, but here I’m talking only about working with ZODB.
Code reloading
First of all, while developing, you’ll want the server to reload your code automatically when you make changes:
paster serve etc/myapp.ini --reload
ZODB
Persistent classes
In ZODB, in-place changes to regular mutable Python objects (lists, dictionaries and so on) are not automatically persisted. This confused me a few times. For example, if you have a list attribute on one of your objects:
>>> root['foo'].bar
[1, 2, 3]
You might think (as I did) that this would work:
>>> root['foo'].bar.append(4)
>>> transaction.commit() # Commit is needed in the shell
Well, as you might have guessed by now, the change won’t get saved: the list is a plain Python object, so ZODB never notices that it was modified in place. One way to tell ZODB to persist the change is to set the _p_changed attribute on the persistent object that holds the collection, after you’ve updated it:
>>> root['foo'].bar.append(4)
>>> root['foo']._p_changed = True
>>> transaction.commit()
It’s ugly and error-prone, but there is a better way: use one of the persistent
classes that ship with ZODB:
>>> from persistent.list import PersistentList
>>> root['foo'].my_list = PersistentList()
That way any change to your list will be persisted and you don’t have to
remember to set the _p_changed attribute. There is also PersistentMapping
for dictionary-like objects, and the generic Persistent class, which you can
subclass to make your own persistent objects:
>>> from persistent import Persistent
>>> from persistent.mapping import PersistentMapping
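To make the difference concrete, here is a toy model of this kind of change tracking. It is only a sketch of the idea, not ZODB’s actual code: TrackedObject and TrackedList are hypothetical stand-ins for Persistent and PersistentList, and the `changed` flag plays the role of _p_changed.

```python
class TrackedObject:
    """Toy stand-in for Persistent: flags itself on attribute assignment."""

    def __init__(self):
        # Bypass our own __setattr__ so creating the flag does not set it.
        object.__setattr__(self, "changed", False)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name != "changed":
            object.__setattr__(self, "changed", True)


class TrackedList(list):
    """Toy stand-in for PersistentList: flags itself when mutated in place."""

    changed = False

    def append(self, item):
        super().append(item)
        self.changed = True


obj = TrackedObject()
obj.bar = [1, 2, 3]      # attribute assignment: the object notices
obj.changed = False      # pretend we have just committed
obj.bar.append(4)        # in-place change to a plain list: nobody notices
plain_list_noticed = obj.changed

obj.bar = TrackedList([1, 2, 3])
obj.changed = False      # pretend we have just committed again
obj.bar.append(4)        # the list flags itself as changed
tracked_list_noticed = obj.bar.changed
```

Here plain_list_noticed ends up False and tracked_list_noticed ends up True, which is the whole story: assigning an attribute goes through the object, while appending to a plain list goes behind its back.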
Folders
While persistent classes are good for storing objects in ZODB, if we intend to use these objects with Traversal, they need to have __name__ and __parent__ attributes. The tutorial explains well how we can manage these attributes ourselves, but there is actually an easier way: repoze.folder provides a Folder class, a persistent dictionary-like container that manages the __name__ and __parent__ attributes of its children automatically. To get this benefit, you just need to define your models by subclassing Folder.
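If you are wondering what “managing __name__ and __parent__” amounts to, the following sketch shows the idea with a plain dict subclass. ToyFolder and path_of are hypothetical names made up for this illustration; the real Folder is persistent and does more than this.

```python
class ToyFolder(dict):
    """Dictionary-like container that tells each child where it lives."""

    __name__ = None
    __parent__ = None

    def __setitem__(self, name, child):
        # Record the child's key and container before storing it.
        child.__name__ = name
        child.__parent__ = self
        super().__setitem__(name, child)


class Book(ToyFolder):
    pass


def path_of(resource):
    """Rebuild a URL path by following __parent__ links back to the root."""
    segments = []
    while resource.__parent__ is not None:
        segments.append(resource.__name__)
        resource = resource.__parent__
    return "/" + "/".join(reversed(segments))


root = ToyFolder()
root["books"] = ToyFolder()
root["books"]["42"] = Book()

book = root["books"]["42"]
book_path = path_of(book)
```

Because every child knows its name and its parent, the framework can walk back up the tree from any object, for instance to generate its URL. That is what book_path illustrates.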
Database queries
Coming from an SQL background, I spent a comical amount of time searching the web for information about how to query data from ZODB. Actually, you just don’t do that. ZODB is a storage mechanism; it doesn’t provide any facility to query the data. Instead you use a third-party indexing package: repoze.catalog. There is even a tutorial on how to integrate it with repoze.bfg, which I ignored until I realized what it was for.
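To give an SQL-minded reader a feel for what “indexing” means here, below is a deliberately tiny sketch of a field index. It is not repoze.catalog’s API: FieldIndex, the Book class and the sample data are all made up for the illustration.

```python
from collections import defaultdict


class FieldIndex:
    """Maps a field value to the keys of the objects that carry it."""

    def __init__(self, field):
        self.field = field
        self._keys_by_value = defaultdict(set)

    def index(self, key, obj):
        """Register obj (stored under key) under its current field value."""
        self._keys_by_value[getattr(obj, self.field)].add(key)

    def search(self, value):
        """Return the sorted keys of all objects indexed under value."""
        return sorted(self._keys_by_value.get(value, ()))


class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author


books = {
    "1": Book("Dune", "Herbert"),
    "2": Book("Emma", "Austen"),
    "3": Book("Persuasion", "Austen"),
}

by_author = FieldIndex("author")
for key, book in books.items():
    by_author.index(key, book)

austen_keys = by_author.search("Austen")
```

The point is that the index is a separate data structure that you keep up to date yourself when objects are added or changed. A catalog package takes care of that bookkeeping and offers richer index types.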
Conflicts
ZODB transactions use optimistic concurrency control and therefore, now and again, a transaction can fail. This will give you an error such as:
ConflictError: database conflict error (oid 0x114f, class
myapp.models.MyModel, serial this txn started with 0x038d931ff0f3c944
2011-04-11 18:07:56.473193, serial currently committed 0x038d932e737a9077
If your code is committing transactions explicitly using
transaction.commit(), then you can catch this exception and try again. However,
your app may be using the transaction WSGI middleware repoze.tm2
(or maybe the older repoze.tm) so
that you don’t have to explicitly call transaction.commit() in your code. In
this case you won’t be able to catch the error because the commit is done by the
middleware, after your own code has been executed. The solution is to use
another middleware: repoze.retry. It will
retry the WSGI request in case of a ConflictError.
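For the explicit-commit case, “catch this exception and try again” can be sketched as a small helper. Everything here is an assumption made for illustration: retry_on_conflict is a hypothetical helper, and the local ConflictError class stands in for ZODB.POSException.ConflictError so that the sketch runs without ZODB installed.

```python
class ConflictError(Exception):
    """Stand-in for ZODB.POSException.ConflictError (assumption, see above)."""


def retry_on_conflict(func, attempts=3, on_conflict=None):
    """Call func(), retrying on ConflictError, at most `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ConflictError:
            if on_conflict is not None:
                on_conflict()  # in real code, typically transaction.abort
            if attempt == attempts:
                raise  # give up and let the caller see the conflict


calls = []


def flaky_update():
    """Simulated unit of work: conflicts twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise ConflictError("simulated conflict")
    return "committed"


result = retry_on_conflict(flaky_update)
```

In a real application, func would apply your changes and call transaction.commit(), and you would pass transaction.abort as on_conflict so that each retry starts from a fresh view of the database.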
Database Maintenance
It’s important to be aware of the fact that ZODB records all changes. Even if you save the same value multiple times, it will record copies of the value and your database file will grow and eventually fill up your disk space much faster than you’d think.
You can be careful not to write to the database when it’s not
really needed, but of course your app very likely needs to modify existing data.
In order to prevent the database file from growing more than necessary, you have to
regularly “pack” the database. ZODB comes with a command line tool called
zeopack which does this. For example, if you use buildout, the command line might look
like:
$ ./bin/zeopack localhost:9000
You will probably want to call it periodically using cron. This article has more
info about this issue.
Traversal
When I first read the description of Traversal, it seemed very mysterious and clever. All that talk about graphs, context finding, view lookup, etc. got me quite confused at first. The description is abstract because the mechanism itself is abstract and should in theory be usable in different contexts, but in reality it’s mainly used for one thing: mapping URLs to objects stored in ZODB. Be aware that I am intentionally simplifying here: Traversal is more than that, but if you’re a beginner my explanation should help you get started.
Matching model classes
The idea of Traversal is excellent. URLs in web applications often correspond to objects in a database. As web backend developers, we’ve probably all been doing something like:
# Pseudo code example
def show_book():
    book_id = request.params.get('book_id')
    book = Book.get(book_id)
    if book:
        # do something with book
        # ...
        return render_template("book.html", {'book': book})
    else:
        return 404
The object id could also be part of the URL path (/books/42), but the principle
is the same: we get an id, we try to get an object for that id and we do some
work on that object or return an error if the object wasn’t found. That’s
exactly what Traversal does, without you having to write a single line of code.
Of course, Traversal is no magic and your database has to follow a certain
structure for the mechanism to work. It has to follow the structure of the URLs
you want to map. ZODB databases are structured as nested dictionary-like objects (often Folder objects, as we mentioned before).
Now let’s say your ZODB database has the following structure:
database_root = {
    'books': {
        '42': <Book Object>
    }
}
Using repoze.bfg’s Traversal mechanism, here is what the
equivalent of the previous example would look like:
# Pseudo bfg-style code example
@bfg_view(for_=Book)
def show_book(context, request):
    # Do stuff here if needed...
    return render_template("book.html", {'book': context})
If you visit the URL /books/42, Traversal will automatically map it to your Book object located at root['books']['42'] in ZODB and pass it as the context argument of your function.
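If the mapping from URL to object still feels like magic, the core of it can be sketched in a few lines. This is a simplified illustration and not the framework’s implementation: the traverse function is hypothetical, it uses plain dictionaries instead of ZODB, and it ignores details such as any path segments left over after the view name.

```python
class Book:
    """Leaf resource: it is not a container, so traversal stops here."""


def traverse(root, path):
    """Return (context, view_name) for a URL path.

    Walk the path one segment at a time, descending with context[segment]
    for as long as that works. The first segment that cannot be resolved
    is taken as the view name ('' stands for the default view).
    """
    context = root
    segments = [s for s in path.split("/") if s]
    for segment in segments:
        try:
            context = context[segment]
        except (KeyError, TypeError):
            # KeyError: no such child. TypeError: context is not a container.
            return context, segment
    return context, ""


book = Book()
database_root = {"books": {"42": book}}

context, view_name = traverse(database_root, "/books/42")
```

For /books/42 the walk ends on the Book object with an empty view name, so the default view registered for Book gets called with that object as its context. A path such as /books/42/comment would stop on the same object with "comment" left over as the view name.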
This is already quite useful but it gets even more powerful when you combine this with interfaces.
Matching interfaces
Interfaces come from Zope and provide a way to declare expectations about how a class should behave (if you happen to know Java, you’re already familiar with the principle). For example, let’s say your app allows users to store books and photos and you want to allow visitors to leave comments on both books and photos. You can define an interface such as:
from zope.interface import Interface

class ICommentable(Interface):
    pass
Then you mark your models as implementing this interface:
from repoze.folder import Folder
from zope.interface import implements

class Book(Folder):
    implements(ICommentable)
    # ... rest of the class definition

class Photo(Folder):
    implements(ICommentable)
    # ... rest of the class definition
Now you can write a request handler that will work for any object implementing the interface:
@bfg_view(for_=ICommentable, name="comment")
def create_comment(context, request):
    comment = request.params.get('comment_content')
    # context is either a Book or a Photo
    context.comments.append(comment)
    # ... then send a response...
If you visit /books/42/comment or /photos/23/comment, your create_comment function will be called. In the case of books, Traversal knows it should use create_comment instead of the show_book view that we defined in the previous section because of the name argument we passed to the view definition. Names take precedence. Using interfaces with Traversal allows you to write generic request handlers easily.
Why don’t we specify any attribute in the interface definition?
As far as Traversal is concerned we don’t need to. Python uses duck-typing: if
our objects behave like commentable objects, they are commentable. Here the
interface is merely a marker. However, you can and maybe should specify attributes in your interface definition to make sure your generic code receives what it expects. I left it out for simplicity but zope.interface has all you need for that.
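As a rough idea of what that could look like, here is a sketch that declares a comments attribute on the interface and checks objects against it with zope.interface’s verifyObject. The BrokenPhoto class and the is_commentable helper are made up for the example. The sketch uses the implementer class decorator, which is the equivalent of implements(...) that also works on Python 3.

```python
from zope.interface import Attribute, Interface, implementer
from zope.interface.exceptions import Invalid
from zope.interface.verify import verifyObject


class ICommentable(Interface):
    comments = Attribute("A list-like collection of comments")


@implementer(ICommentable)
class Book:
    def __init__(self):
        self.comments = []


@implementer(ICommentable)
class BrokenPhoto:
    """Claims to be commentable but never defines `comments`."""


def is_commentable(obj):
    """True if obj really provides what ICommentable promises."""
    try:
        return verifyObject(ICommentable, obj)
    except Invalid:
        return False


good = is_commentable(Book())
bad = is_commentable(BrokenPhoto())
```

A check like this fits naturally in a test suite: it catches a model that declares the interface but forgot the attribute your generic view relies on.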
Templating
Chameleon is a popular choice among repoze.bfg users, probably because it reminds them of Zope templates.
In Chameleon templates, the equivalent of if statements is tal:condition. If you’re like me, you might find yourself looking for the equivalent of an else clause. Well, it just doesn’t exist. If you think about it, Chameleon is based on XML tags. Any XML tag needs a matching closing tag, so what would an else tag look like?
Instead you just write another condition with the opposite value:
<p tal:condition="foo">
foo is true
</p>
<p tal:condition="not foo">
foo is false
</p>
Beware of Middleware
Be careful when using middleware. If you don’t configure your application properly it could break your scripts and the bfg shell. You might end up seeing an error such as:
$ paster --plugin=repoze.bfg bfgshell etc/paste.ini zodb
Traceback (most recent call last):
[...]
File "[...]/site-packages/repoze.bfg-1.1-py2.5.egg/repoze/bfg/scripting.py", line 14, in get_root
registry = app.registry
AttributeError: MyApp instance has no attribute 'registry'
This message might be a bit confusing, but the key to the problem is actually given in the bfgshell help:
$ paster --plugin=repoze.bfg help bfgshell
[...]
Example::
$ paster bfgshell myapp.ini main
.. note:: You should use a ``section_name`` that refers to the
actual ``app`` section in the config file that points at
your BFG app without any middleware wrapping, or this
command will almost certainly fail.
[...]
There we are. The error message above is caused by doing precisely what we
shouldn’t do: call a section name that refers to an app wrapped in WSGI
middleware. So inspect your INI config file and check if the section you’re calling
(in our case a section called zodb) makes use of any middleware. If it’s a
pipeline or filter-app section, it does use middleware. If it’s just an
app section, look for a filter-with entry in that section. If you can’t
find anything suspicious in your INI file, the middleware might be called
programmatically within your app’s initialization code (grepping for
“middleware” is probably the quickest way to find out where).
Now that you’ve identified the cause of the problem, you will need to reorganize your config file so that it provides two different application sections:
- an app section that refers to your bare BFG app, which will be used by bfgshell and by scripts,
- a pipeline or filter-app section that wraps your bare BFG app with the WSGI middleware you need and that you’ll call with Paster, mod_wsgi or whatever you happen to use to serve your app.
There is more than one way to do this, so please refer to the Paste Deploy
reference documentation to make informed decisions about how to restructure your configuration.
Conclusion
repoze.bfg is a very robust framework. While working with it, I didn’t hit a single bug, which is quite rare with web frameworks. Using Traversal and ZODB is an interesting and refreshing approach to building web applications. I hope these notes can make it a little easier for beginners to get started.