As I already mentioned in my last posting, I participated in this year's Google Summer of Code. I applied to the MoinMoin Wiki project and was selected as one of the students allowed to work on MoinMoin over the summer. My task was to extend and refactor the MoinMoin storage engine.

Historically, MoinMoin has always stored everything as text files on the disk. There are several disadvantages to this old approach:

  • Due to the way the files were stored, the storage didn’t scale well.
  • It was almost impossible to run separate, dedicated “database” servers.
  • Pages, users and attachments weren’t stored uniformly, thus making the system more complex.

The idea behind my task (not the first attempt of its kind) was to introduce an abstract storage layer. MoinMoin now talks to an object it knows to be a storage backend and does not care how that backend handles storage technically. The things you store in a backend are called items, and items have revisions. Pages, users and attachments are now stored uniformly as, or inside, such items. MoinMoin just says “store this item”, and the backend does so in whatever way suits its kind.
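To make the idea a bit more concrete, here is a minimal sketch of what such a layer could look like. This is not MoinMoin's actual storage API: the names `Backend`, `MemoryBackend`, `store_revision`, `get_revision`, `list_items` and `save_page` are all made up for illustration.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical interface; NOT MoinMoin's real storage API."""

    @abstractmethod
    def store_revision(self, item_name, data):
        """Append a new revision to the item and return its revision number."""

    @abstractmethod
    def get_revision(self, item_name, revno=-1):
        """Return the data of one revision (the latest one by default)."""

    @abstractmethod
    def list_items(self):
        """Return the names of all stored items."""


class MemoryBackend(Backend):
    """Toy backend that keeps every revision in a dict of lists."""

    def __init__(self):
        self._items = {}

    def store_revision(self, item_name, data):
        revisions = self._items.setdefault(item_name, [])
        revisions.append(data)
        return len(revisions) - 1

    def get_revision(self, item_name, revno=-1):
        return self._items[item_name][revno]

    def list_items(self):
        return sorted(self._items)


def save_page(backend, name, text):
    """Caller code only sees the Backend interface, never the storage details."""
    return backend.store_revision(name, text.encode("utf-8"))
```

The point of the sketch is that `save_page` would work unchanged with any other class that implements the same three methods, whether it writes to files, a version control system or a database.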

The administrator defines which backends to use for user and data storage. You can choose between several backends, e.g. a Mercurial backend (which was another GSoC task) or a filesystem backend (still useful if you don’t have a database). Writing a new backend is not difficult, since all you need to do is implement a single class (the new API was designed for that). There are some other (still unfinished) backends that can be used as middleware, e.g. a backend that wraps other backends and stores each item in the correct one depending on the item's name.

There is also a converter script that takes a source and a target backend and transfers all the data from the source to the target. This is especially useful in combination with the read-only FS17 backend. As the name indicates, this backend supports reading data from your “old” MoinMoin 1.7 installations. This allows you to migrate to the new storage system and easily swap backends afterwards in case your requirements change.

The sheer number of backends I just mentioned should be proof enough that it is not hard to write a new one, especially since even more backends exist. As a side note: these changes make it possible to come up with a SQL or even a SQLAlchemy backend. (The benefit of the latter is that it is database-agnostic.)
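The middleware and converter ideas can be sketched in a few lines of Python. Again, this is only an illustration under my own assumptions and not the real MoinMoin code: `DictBackend`, `RouterBackend`, `convert` and the `User/` name prefix are hypothetical.

```python
class DictBackend:
    """Toy stand-in for a real backend: item name -> list of revisions."""

    def __init__(self):
        self.items = {}

    def store_revision(self, name, data):
        self.items.setdefault(name, []).append(data)

    def revisions(self, name):
        return list(self.items[name])

    def list_items(self):
        return sorted(self.items)


class RouterBackend:
    """Middleware sketch: picks the wrapped backend by item-name prefix."""

    def __init__(self, routes, default):
        self.routes = routes      # list of (prefix, backend) pairs
        self.default = default    # used when no prefix matches

    def _pick(self, name):
        for prefix, backend in self.routes:
            if name.startswith(prefix):
                return backend
        return self.default

    def store_revision(self, name, data):
        self._pick(name).store_revision(name, data)

    def revisions(self, name):
        return self._pick(name).revisions(name)

    def list_items(self):
        names = set(self.default.list_items())
        for _, backend in self.routes:
            names.update(backend.list_items())
        return sorted(names)


def convert(source, target):
    """Copy every revision of every item from source to target, in order."""
    count = 0
    for name in source.list_items():
        for data in source.revisions(name):
            target.store_revision(name, data)
            count += 1
    return count
```

Because the router exposes the same methods as the backends it wraps, `convert` can write into it without knowing that the data ends up in two different places.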

Note, however, that this will not make it into MoinMoin 1.8. There is still work that needs to be done. If you want to help or even contribute a backend, join #moin-dev on Freenode and we will help you get started.

It was fun to participate. The other developers were friendly and welcoming, which is essential in an Open Source environment. So thanks Thomas, Alexander, Reimar, Radomir and Armin! I especially need to thank Johannes for being a fantastic mentor (no objections)! I learned a lot from all of you. So thanks a bunch for allowing me to work on your project and thanks Google for driving the Summer of Code!

1 comment Sep 7, 2008 5:44:00 PM Coding, MoinMoin, Python

Comment by ThomasWaldmann, Sep 7, 2008 7:31:00 PM

Hi Christopher,

thanks for your great work in SOC 2008!

I just wanted to add some details to your post above (otherwise some people might misinterpret it):

  • The filesystem storage scales as well as the filesystem you use for it scales.

Modern filesystems use directory indexes, B-trees, etc. to make handling large numbers of files fast. One can see a filesystem as a database that is highly optimized for simple name -> data lookups (and that is the most frequent operation a wiki does). Thus, having many thousands or tens of thousands of pages is not really a problem.

  • A SQL database backend is not necessarily faster or better for everybody.

It adds processing overhead and latency. You need to have a SQL server and administer it. Of course, it can be a plus if you do complex queries or if you run a SQL database anyway.

  • There is no (SQL) database backend yet - thus, yes, the filesystem backend is still useful. :)

I am very curious what backends we’ll see in the next months. Python Developers, we need you!

Cheers,

Thomas