Force Midnight Commander to exit in current directory

I have some Gentoo systems with mc installed, but its default behaviour is to exit to the directory it was run from. I wanted to change this, because most of the time I'm using it to navigate the filesystem, and after exiting mc I want to be dropped into the directory I have navigated to. To achieve this you can use this little script, which will force Midnight Commander to exit in the current directory.

mkdir -p /etc/profile.d
cat > /etc/profile.d/mc-chdir.sh << EOF
#!/bin/sh
# Source mc's shell helper if it exists; it makes mc exit into the last
# directory you visited.
if [ -f /usr/libexec/mc/mc.sh ]; then
    . /usr/libexec/mc/mc.sh
fi
EOF
chmod 775 /etc/profile.d/mc-chdir.sh

 

Posted in Random

Why the all-famous "cloud" is not the answer

Edit: It seems my post may be moot, since projects like Tahoe-LAFS and Ceph already exist.

Being lucky enough to own a new machine capable of running multiple virtual systems, I decided to try a few of the systems I wasn't able to easily install and use before. Being a hardcore Linux user, but not inclined to bash Microsoft or any other operating system vendor right out of the box (hey, Windows 7 is not that bad after all), I tried a few Linux distros I wasn't very familiar with, and also the new Windows 8 Developer Preview.

Since I am familiar with and really like Mandriva, I decided to give the new Mandriva 2011 a spin. I also downloaded the new Ubuntu, Fedora, Arch, Mageia and some other distros, just to see if any of them really are that much different from the others. I wanted to see the progress in Linux and other operating systems, as I haven't been in touch with systems other than XP, Mandriva and Gentoo recently.

I won't go into the details of each and every OS I have tested, but I want to share with you some thoughts about the direction and general progress in the evolution of operating systems. My feelings are not very positive, I might say.

I'm currently using Mandriva 2010.2 mostly. My server is running multiple Gentoo installations on top of a VMware hypervisor. Those two distros are two different worlds and both have their pros and cons, but so far they have done the job for me. Mandriva 2010.2 is a really good desktop and laptop distro.

Gentoo makes a pretty good server system – however, I have some objections. The main one is that you have to put a significant amount of time into maintaining your system. Time that is precious for me: I just want things to work and update as they should, without spending hours on simple system upkeep. I'm a business person and I account for my time economically. Gentoo is a really cool system and I find it very satisfying and pleasant to use – when you have time. I don't, but unfortunately I must somehow stick with it for the time being, lacking any real alternative that would offer that much stability and speed. Gentoo, if maintained properly, is fast. It's customizable to a degree no other distro can match. However, if you're a business-conscious person, I would not recommend using it on a daily basis. Too much knowledge and time must be put into simple upkeep and maintenance.

Mandriva is a really great distribution for daily use, but I would think twice about using it on a server. Why? Because there have been some outstanding critical security bugs that led to the compromise of my server machines not once, but a few times. I wouldn't hold it against them if patches were provided in a reasonable time, but honestly, waiting half a year for a fix to a critical security bug in ProFTPD was way too much. Mandriva is also a distro whose longevity (i.e. support period) is way too short for a real production server.

Of course, it has many security features out of the box. I like msec and the way it's configured from the start. I like that it's so unintrusive for a power user and doesn't get in the way when you want to accomplish more advanced things without all that druid crap interfering. It just plays nicely, and I'm really amazed by how Mandriva found a balance between ease of use and the needs of power users. They had their ups and downs, but this distro was really solid most of the time. I have used many different distributions, but I was always amazed that in Mandriva, most of the time, everything just worked. I liked its logical layout of configuration files and the interoperation between editing configs by hand and via the druids – something Debian-based distros (including Ubuntu) could only dream of. Despite the fact that the OS was never pretty, and I must admit it lacked taste when it came to fancy user experience, it was still rock solid. There were some great tools that Mandriva offered, like msec, mcc and the relevant drak*/*drake utilities, and URPMI, which I still find superior to any other package manager, maybe except for Gentoo's emerge.

I'm still using MDV 2010.2 and honestly I'm a little bit disappointed that I'll have to stick with it for the foreseeable future. I simply find this system the most rock-solid distro ever. For now. I did some certifications from Novell, so I also know SLES and SLED. Those are not bad distros at all, and when you take into account some of the tools Novell built, you have to admit they are a really good choice for an enterprise network. Somehow, though, I've always found SuSE a really messy distro. They have their own ways of doing things which I personally don't like. SuSE without the Novell stuff is just too chaotic and inconsistent. Nevertheless, Novell OES2 is a great business system which is stable and reliable. However, I don't find the SuSE-based management tools too attractive, or even useful sometimes. Zypper is nowhere near URPMI. I know it's a little more focused on tech-heads, but I always found it distracting that Novell tools are made by engineers for engineers, and not for your ordinary next-door admin. I, being a highly technical person, find them somehow too overwhelming and not pleasant to work with. Sure, I'm amazed by the technical superiority of SLES over other distros, but honestly – technical superiority doesn't mean it has to be practically impossible to manage without an engineer's level of knowledge. Sure, I can manage it, but people who haven't been in IT for twenty years yet have some good knowledge should be able to manage it too. Simply put, using and managing Novell systems is not fun at all. And your daily work should be not only pleasant but also productive. Novell tools are technically superior to other solutions, but they are just too hard to use for an ordinary person. And a technical person also has to invest significant time into getting the idea of how things tick. It shouldn't be this way.

I'm a person who likes doing things my way, but I don't want to invest much time in learning all the tips and tricks needed to make things work my way. Configuring your system should be pretty straightforward, and when you want to tackle parts of the system on a more advanced level, the system shouldn't get in your way; it should let you configure and manage things the way you want, without the overhead of unnecessary technical details stealing your time for learning really unnecessary things. After all, a computer is just a tool for getting things done – and if you are like me, it is not a technical marvel to praise and to spend all of your time exploring its nuances, quirks and technicalities when it's really unnecessary.

Since a computer – and in fact the underlying operating system – is the key to your experience, we are slowly getting to the point of this essay. For me, productivity is the most valuable thing. I'm running a small company and I don't want to invest time in managing my systems. I work with many clients who treat the computer as a tool and not as their toy. Time is what counts. I don't want to be managing all the technical stuff, and from working with many clients I know from experience that they do not care about their updates, anti-virus scanning, backups, system configuration or management. They only care about how they can do their work with their computer and their operating system. Most of my clients treat IT as a necessary evil – but still evil. Most of my clients can't afford a dedicated network administrator or a full-blown IT department. And I share their beliefs. An ordinary operating system user should get to the computer, do his work and leave, not caring about all the technical details or computer/OS management. For many years I have seen that most of the open source community (and IT in general) simply forgets this. The client treats the computer/OS as a tool, not as a thing of admiration. No matter how much we love all the intricate details of how those binary logic things tick, the ordinary user doesn't share that love. And we shouldn't rebel. We have to accept that not everyone can find the beauty in IT, nor does everyone want to know anything about the technical details of his own operating system – never mind how beautiful, simple and logical it is to us. It's a little bit arrogant of us, IT guys, to force our knowledge and admiration on those poor souls who don't understand the art of IT. And the shame is ours, not theirs, because we thought our way is the only way that should be taught to others. It's not. And it's a tragedy of IT and the open source community as a whole that most of us don't understand this.

I will stress this once again: the ordinary user wants to do his work on his operating system. The computer is just a tool. And honestly, as a technical person with a background as a network administrator and software developer, not to mention an owner of a company – I want to do my work with my computer and do it as productively as I can. That's why I care about offloading tedious and mostly uninteresting work from my shoulders. I want my operating system to be able to do most of the ordinary management automatically. I don't want to care about backups, synchronization, antivirus sweeps, updates and such things when I really have to do my work. And do it quickly.

So there was an idea. Big companies and the industry thought: Hey! Why don't we do this by pushing more and more into the cloud? We can keep our users updated and synchronized wherever they want to be, on whatever device they like. We can scan their files to see if they are infected, ensure their data is always available, and they will enjoy the benefits of the (marketing bullshit) cloud. And we will slowly push everybody towards Software-as-a-Service, which we can monetize (not to mention we will learn everything about them – who will guarantee we won't?). Right?

Wrong. In my opinion the IT industry got it wrong. There are many objections to the cloud which we all know, privacy and data security being among the most significant. Let me express my opinion this way: the industry identified the right problem, but it provides the wrong solution. Of course many will insist this is the way, but I won't discuss their motives. I have just recently installed Ubuntu and Windows 8. What am I seeing? The two most significant operating systems (except Mac OS, but I'm not familiar with it) are subtly pushing their users towards their clouds. We all know about the privacy invasions done by mobile phone operating systems – guess what they will do with all your private or company data offloaded to the cloud. Who will guarantee that your data is properly stored, inaccessible to others, etc.? Knowledge is power. You'd better not hand your data right away to a corporation you have no control over. After all, corporations are known to hand over data about you to security agencies, or to use it against people who might endanger them. Corporations are entities making profit for their shareholders. Who can guarantee that your innovative technology company won't get in the way of their plans to dominate some part of the market where you compete with them? The cloud is dangerous.

But I'm not going to elaborate more on this issue. I'm sure that if you're interested in the topic you already know these problems with the cloud. The cloud is a centralisation of power, and of knowledge about its users, in one invisible hand. Why should the open source community be on high alert when hearing "cloud"? Because it goes against one of the most significant principles of the community itself. Open source was, and I hope still is, about freedom. Freedom of communication, freedom to share, freedom to express, freedom to innovate, freedom to creatively express oneself. It's not about the software. It's about art. It's about the humanistic ecosystem that was built around the software. It's about the exchange of thoughts and ideas, improvement and striving for excellence – even if it manifests itself only in software development.

The cloud is against open source's inherent ideal of decentralisation, of the Bazaar. The cloud is the next level of the Cathedral. You may own the free bazaar software, but you will be forced to use a cathedral cloud – one or another, but still in the hands of one entity or another. I'm really surprised that the community doesn't raise objections on this matter. Take it this way: it's an issue like Napster or Kazaa vs BitTorrent. The cloud is a centralised entity. No matter that there are many clouds, they all share one weakness – a controlling entity, just like Napster and Kazaa did one day. BitTorrent, on the other hand, is decentralised. And that is what we need. We don't need clouds. We need SWARMS.

What is a SWARM? It's a decentralised cloud. No single entity controls it. It's a community thing. A SWARM is a voluntary cloud. It's a concept of encrypted, decentralised storage allowing synchronisation of your data in a secure manner. Much as BitTorrent is a peer-to-peer network for data exchange, so a SWARM would be for synchronisation. Of course the SWARM poses significant technical difficulties to develop, but I'm sure there are many brilliant minds that can, and ultimately will, overcome the problems with this concept.

How would I see the SWARM implemented? It may be a global distributed filesystem. It may be a distributed database. Data synchronised with the SWARM would be distributed to active SWARM nodes and deliberately replicated. I imagine a daemon running on each and every participating SWARM node that would give up some resources for the whole SWARM. Think of it as a server running on each SWARM node, providing for example 100 MB of storage space to the SWARM. The amount of data you can store in the SWARM would be determined by the resources you contribute to it. The security of your data would be guaranteed by strong cryptography and by chunking your data. As in Freenet, nobody would know what his SWARM node is storing, or to whom that data belongs.

Unfortunately there are some problems with this; Freenet storage, for example, is based on the popularity of the requested content. And there are more problems. Which nodes should receive your chunk of data? How should the data be distributed and replicated? How many nodes should receive a copy of each chunk synchronised with the SWARM? How should the SWARM cope with data loss, or with a significant loss of storage-providing SWARM nodes? Should your node track copies of your data, and if it loses some of the nodes holding your data, how should it decide to redistribute the replicas? How do you ensure you can always reach your data if we must assume that SWARM nodes are unreliable? How do you deal with a contraction in the number of nodes hosting copies of your data? How much data can you store in the SWARM at any given time?

If the SWARM is ever going to be a real project, these and other questions must be answered first. Due to its chaotic and unreliable nature it would be hard, or even impossible, to devise algorithms that guarantee the availability of your data all the time. It may be necessary to loosen some of the restrictions on the SWARM concept. However, the idea might find its way into some private environments where a SWARM could be an attractive alternative to a non-controllable cloud.

If you want to talk about the SWARM concept with me feel free to do so in the comments section.

Posted in Random

Fancier logging in Python (with module and method/function names)

As Python programmers, most of us know the good old logging module in the Python standard library. It is a really good and useful library for debugging and logging the behaviour of a program, but I found it somewhat lacking.

What I needed was to be able to log which function or class method issued the log.debug() call, and at what line. I needed this for an asynchronous, multithreaded networking library project I'm working on.

The solution was to write my own custom logging.Logger class implementation that overrides its debug method with some fancy inspect module tricks. Works like a charm! Try it for yourself.

import logging
import inspect

logging.basicConfig(level = logging.DEBUG)

class MyLogger(logging.Logger):

    def debug(self, msg, *args, **kwargs):
        # Grab the caller's stack frame record (one level up from debug()).
        caller = inspect.stack()[1]
        frm = caller[0]       # frame object
        method = caller[3]    # function/method name
        # If the caller is a bound method, prefix the name with its class.
        if 'self' in frm.f_locals:
            clsname = frm.f_locals['self'].__class__.__name__
            method = clsname + '.' + method
        # Skip adding parentheses for things like '<module>'.
        if not method.startswith('<'):
            method += '()'
        # Prepend "caller:lineno:" to the original message.
        msg = ':'.join((method, str(frm.f_lineno), msg))
        logging.Logger.debug(self, msg, *args, **kwargs)

logging.setLoggerClass(MyLogger)
getLogger = logging.getLogger

Voilà! Now, after installing this new MyLogger class via logging.setLoggerClass(), you can use the loggers returned by getLogger() just like normal Logger instances, except for the bonus of having the module name, function or method name and line number in your log.
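For instance, a minimal usage sketch – assuming you save the snippet above as a module named, say, mylog.py (the file name is just an example) – could look like this:

from mylog import getLogger

log = getLogger('network.connpool')

class Connection(object):

    def analyze(self):
        # With the default basicConfig format this prints something like:
        # DEBUG:network.connpool:Connection.analyze():10:Found packet: [0, 1, 0, 42, 0]
        log.debug('Found packet: %s', [0, 1, 0, 42, 0])

Connection().analyze()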

Here's some sample debug output from this class:


DEBUG:network.connpool:Connection.analyze():187:Found packet: [0, 1, 0, 42, 0]
DEBUG:network.connpool:Connection.analyze():187:Found packet: [0, 1, 1, 69, 0]
DEBUG:network.server:PacketHandler.handle_close():102:Closing connection id: 139697871065392
DEBUG:core.event:emit():153:Emitted net.conn "CONN_DROP"
DEBUG:core.event:EventManager.process():194:Sending event net.conn (CONN_DROP) to ConnectionPool
DEBUG:core.event:ConnectionPool.notify():86:ConnectionPool got event of type "net.conn"
DEBUG:network.connpool:ConnectionPool.remove():294:Closing connection id: 139697871065392
DEBUG:network.connpool:ConnectionPool.parse_packets():332:Removing connection id: 139697871065392

Posted in Programming, Python

Python GUI in Linux frame buffer

I was recently wondering how I can display some graphics in the Linux frame buffer with Python. It turns out that if your text terminal is initialized in frame buffer mode (most modern distributions do this), what you need is our old friend PyGame. It supports a wide selection of video drivers, among which there are also some for utilizing the frame buffer. If you want to play with or develop GUI applications in Python without the X Window System, here's a simple example of how.

PyGame checks the environment variable SDL_VIDEODRIVER, which specifies which video driver PyGame should use for initializing the display. This can come in handy if you are running a pure PyGame-based application and don't have, or don't want, X.org. Unfortunately it also means that you have to manually export the SDL_VIDEODRIVER variable before running your application. There's a little trick you can use to automatically determine whether you want to run your app in the frame buffer.

When an application is run within an X.org session, the DISPLAY environment variable is passed to it (in fact it's set in the environment for the app, but that doesn't matter). You can test whether the app was run from the X Window System like this:

import os
disp_no = os.getenv('DISPLAY')
if disp_no:
    print "I'm running under X display = {0}".format(disp_no)

 

Once we know whether we have X Window running or not, we can act accordingly. What we want to do in our app is check for the X display first, then check whether SDL_VIDEODRIVER is set, and if not, set it automatically. This is how we do it:

driver = 'directfb'
if not os.getenv('SDL_VIDEODRIVER'):
    os.putenv('SDL_VIDEODRIVER', driver)

 

As the PyGame documentation states, there are various video drivers available under Linux: x11, dga, fbcon, directfb, ggi, vgl, svgalib and aalib. Obviously we are interested in only three of them: directfb, fbcon and svgalib. Those are the drivers for console frame buffers. Which one you should choose depends on the Linux distribution you have. Ideally you want directfb, because it's probably the most advanced and modern frame buffer library. You can either let your users choose the appropriate driver before you launch your primary app, or you can test which one is the most suitable. This is really simple:

drivers = ['directfb', 'fbcon', 'svgalib']

found = False
for driver in drivers:
    if not os.getenv('SDL_VIDEODRIVER'):
        os.putenv('SDL_VIDEODRIVER', driver)
    try:
        pygame.display.init()
    except pygame.error:
        print 'Driver: {0} failed.'.format(driver)
        continue
    found = True
    break

if not found:
    raise Exception('No suitable video driver found!')

 

This will automatically try all the drivers in the list and stick with the first one that works. If none is found, an exception is raised.

Once we have found an appropriate driver, we have to initialize our screen and display mode. With the frame buffer we have to determine its resolution before we can set the mode appropriately. We can do it like this:

size = (pygame.display.Info().current_w, pygame.display.Info().current_h)
screen = pygame.display.set_mode(size, pygame.FULLSCREEN)

 

pygame.display.Info() is a special object that carries information about the current display. We use its current_w and current_h attributes to set the new size for the PyGame display. If we didn't do this, we would not be using the whole frame buffer resolution, which would look awkward. I'm not sure whether it's possible to switch between frame buffer modes after the fb console terminals have been initialized.

After we've called pygame.display.set_mode() we can proceed with our application development as usual.
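Putting all the pieces together, here's a minimal self-contained sketch of the whole idea – the drawing at the end is just a made-up demo so you can see something on screen:

import os
import pygame

def init_display():
    """Initialize a fullscreen PyGame display, on X11 or on the console frame buffer."""
    if os.getenv('DISPLAY'):
        # Running under X - let SDL pick its usual x11 driver.
        pygame.display.init()
    else:
        # No X: try the console frame buffer drivers one by one.
        for driver in ('directfb', 'fbcon', 'svgalib'):
            if not os.getenv('SDL_VIDEODRIVER'):
                os.putenv('SDL_VIDEODRIVER', driver)
            try:
                pygame.display.init()
            except pygame.error:
                print 'Driver: {0} failed.'.format(driver)
                continue
            break
        else:
            raise Exception('No suitable video driver found!')
    size = (pygame.display.Info().current_w, pygame.display.Info().current_h)
    return pygame.display.set_mode(size, pygame.FULLSCREEN)

if __name__ == '__main__':
    screen = init_display()
    screen.fill((0, 0, 64))                      # dark blue background
    pygame.draw.circle(screen, (255, 255, 255),  # white circle in the middle
                       (screen.get_width() // 2, screen.get_height() // 2), 100)
    pygame.display.update()
    pygame.time.wait(3000)                       # show it for three seconds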

Posted in Game Programming, Linux, Programming, Python

Paster server encoding problems

If you have stumbled upon strange Unicode encoding errors within the Paste server, like this one:

  File "/usr/lib/python2.6/site-packages/paste/httpserver.py", line 437, in handle_one_request
    self.wsgi_execute()
  File "/usr/lib/python2.6/site-packages/paste/httpserver.py", line 290, in wsgi_execute
    self.wsgi_write_chunk(chunk)
  File "/usr/lib/python2.6/site-packages/paste/httpserver.py", line 150, in wsgi_write_chunk
    self.wfile.write(chunk)
  File "/usr/lib64/python2.6/socket.py", line 292, in write
    data = str(data) # XXX Should really reject non-string non-buffers
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 1280: ordinal not in range(128)

 

You can easily fix those encoding errors by adding the following to config/environment.py (if you are using Pylons):

# Workaround stupid Paster encoding bugs
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

 

just before the def load_environment() function. Reloading the sys module is needed because sys.setdefaultencoding() is removed from the module at interpreter startup (by site.py), so it's normally not accessible; reload() brings it back. This will make your encoding problems within Paster go away.
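Alternatively – if you'd rather not touch the interpreter's default encoding – you can make sure whatever builds the response hands Paste byte strings instead of unicode objects. A rough sketch (render_my_page() is just a hypothetical stand-in for your own code):

body = render_my_page()          # hypothetical function returning a unicode object
if isinstance(body, unicode):
    body = body.encode('utf-8')  # Paste's wsgi_write_chunk() then gets plain bytes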

Posted in Coding, Programming, Pylons, Python

Pylons, ContextObj attribute error, strict_c and fix

I've recently upgraded Pylons to version 1.0 for the CMS project I'm working on, and a few things broke as a result. There were some changes in dependency packages as well, most notably the change from url_for to url, which had to be replaced everywhere in the code.

However, the most irritating problem was with attributes of the template context – mainly their absence in the templates. The template context in previous versions of Pylons worked in such a way that whenever an attribute was missing on the template context object "c", it would be created on the fly as an empty string. Pylons 1.0 changed this behavior, so you have to create the attribute explicitly. Given the scale of the project, I'm not willing to go through each of more than a hundred templates and check whether I forgot some attribute.

A little bit of googling gave the answer that you had to insert this line into config/environment.py:

    config['pylons.strict_c'] = False

just before your template customization, but it didn’t work!

I've been looking for an answer to this problem for quite some time, and found that if you change the above line from pylons.strict_c to:

    config['pylons.strict_tmpl_context'] = False
 

it will happily work. So there you are. 🙂

Posted in Coding, Programming, Pylons, Python

How to get class method name in Python?

There are two ways to do this. You can either do:


def whoami():
    """Returns name of the method."""
    import sys
    return sys._getframe(1).f_code.co_name

def callersname():
    """Returns name of the caller."""
    import sys
    return sys._getframe(2).f_code.co_name

Or you can simply do:


import inspect
inspect.stack()[0][3]

This will give you the method name. Pretty simple, isn't it?
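Here's a tiny illustration of both approaches inside a class, assuming whoami() and callersname() from above are defined in the same module (the class and method names are made up):

class Greeter(object):

    def greet(self):
        import inspect
        print whoami()               # -> 'greet'
        print inspect.stack()[0][3]  # -> 'greet' as well
        self.helper()

    def helper(self):
        print callersname()          # -> 'greet', the method that called helper()

Greeter().greet()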

Posted in Coding, Programming, Python

PocketSphinx voice recognition with GStreamer and Mandriva

I've recently bumped into an open source speech recognition package for Linux based on the CMU Sphinx project, called PocketSphinx. Unfortunately the standard Mandriva repositories don't have the PocketSphinx library in a 64-bit version. So I downloaded SphinxBase and PocketSphinx and compiled them myself, by the usual means of ./configure --prefix=/usr --libdir=/usr/lib64 && make && make install

It all went well, except for one little thing I noticed when trying to follow some examples on how to use PocketSphinx from Python (yes, there are bindings for it). The compilation had omitted the GStreamer plugin for PocketSphinx, which was needed. I figured out that you have to install, via URPMI:
urpmi lib64gstreamer-plugins-base0.10-devel
and then recompile PocketSphinx (remember to do "make clean" first, or else nothing will change after reconfiguring). The plugin then compiled successfully and I can now use PocketSphinx from Python.
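For the curious, the way I'm driving it from Python roughly follows the GStreamer 0.10 examples from the CMU Sphinx documentation of that time. The element and signal names below ('pocketsphinx', 'vader', 'result') are written from memory, so treat this as a sketch rather than verified code:

import gobject
import pygst
pygst.require('0.10')
import gst

def on_result(asr, text, uttid):
    # Called by the pocketsphinx element when an utterance has been decoded.
    print 'Recognized:', text

pipeline = gst.parse_launch(
    'gconfaudiosrc ! audioconvert ! audioresample '
    '! vader name=vad auto-threshold=true '
    '! pocketsphinx name=asr ! fakesink')

asr = pipeline.get_by_name('asr')
asr.connect('result', on_result)
asr.set_property('configured', True)

pipeline.set_state(gst.STATE_PLAYING)
gobject.MainLoop().run()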

Posted in Mandriva, Python

Using SQLAlchemy-Migrate with Elixir model in Pylons

I've recently come across a database schema migration toolkit built on top of the awesome SQLAlchemy ORM library, named SQLAlchemy-Migrate. Since I'm working mostly on Pylons projects with an extensive schema written using Elixir, I was wondering if I could make SQLAlchemy-Migrate work with my Elixir models. I've managed to make it work perfectly. But before I tell you how, here's a quick review of the tools and libraries for those of you new to the world of SQLAlchemy.

SQLAlchemy is an Object-Relational Mapper, ORM for short, that maps Python classes to database objects, i.e. tables, records, foreign keys and all the other related stuff. Since SQLAlchemy is an ORM, you don't have to write the sometimes complicated SQL code for querying the database; instead you work with Python objects and methods. SQLAlchemy adds some overhead to SQL query execution time, but this tradeoff is worth it most of the time. Trust me. And besides, SQLAlchemy makes your project database-engine independent, at least in theory.

Elixir is a declarative-style layer on top of SQLAlchemy that implements the so-called ActiveRecord design pattern. It allows you to use some shortcut methods known mostly from the Rails ORM. Elixir is a library that fully interfaces with SQLAlchemy and wraps some of the methods and declarations into one Entity. SQLAlchemy as of version 0.6 has a declarative extension, but it doesn't have those ActiveRecord methods. Using Elixir is easier than bare SQLAlchemy, but I should warn you that without understanding SQLAlchemy first you'd be really lost. Elixir is much harder to debug when you stumble upon some kind of non-trivial error, since it's a wrapper on top of SQLAlchemy.

However Elixir, used with some caution and understanding of what you're doing, can be really satisfying to work with. I'm currently working on an advanced content management system and I can assure you that its model is really complicated. Without Elixir I would have been lost a long time ago, but since it's very clean and transparent, just a quick glance at a class based on an Elixir Entity gives you a complete picture of what's going on. I like it very much and would recommend using it, especially because you can mix Elixir and SQLAlchemy without any problems, provided you know what you're doing. But enough of this.

So. SQLAlchemy-Migrate. What is it? As I said at the beginning, it's a database schema versioning toolkit built on top of SQLAlchemy. It doesn't support Elixir out of the box, but with a little trickery you can make it work. What is database schema versioning? Well, it's a record of changes in the database structure between some arbitrary "points in time" called versions. Let's suppose you have a table in an Elixir model defined like this:

class Page(Entity):
    id = Field(Integer, primary_key = True)
    title = Field(Unicode(100))
    content = Field(UnicodeText, required = True)

 

This is a fairly basic Elixir entity that really works. That's all. You just run Elixir's setup_all() and you can use this entity in your code for querying, updating, inserting and making changes in the database. Let's assume that you use this model and it behaves really well, but as you develop your application further, you find it rather limiting.
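To make this concrete, here's a rough standalone sketch of using such an entity – the SQLite URL and the sample values are just placeholders for illustration:

from elixir import *

# Assumes the Page entity above has been declared in this module/session.
metadata.bind = 'sqlite:///:memory:'   # throwaway database, just for the example

setup_all()      # set up mappers and tables for all defined entities
create_all()     # create the tables in the bound database

page = Page(title = u'Home', content = u'Welcome!')
session.commit()

print Page.query.filter_by(title = u'Home').count()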

For example, you might wish you could add a new field, so the entity would automatically record the creation date and time when inserting a new Page record into the database. You can alter your model's Entity like this:

class Page(Entity):
    id = Field(Integer, primary_key = True)
    title = Field(Unicode(100))
    content = Field(UnicodeText, required = True)
    # Pass the callable itself, so the default is evaluated on each insert,
    # not once at class-definition time.
    ctime = Field(DateTime, required = True, default = datetime.datetime.now)

 

It will work without any problems, but... not on an existing database. You would either have to issue an SQL query directly against the database to bring it up to date with your defined model (i.e. add the new column ctime), or, in the case of Pylons, run paster setup-app, which will drop your tables and recreate the database from scratch. This is acceptable for a new project or a development database, but not for testing (to some degree) and especially not for production databases, since you would lose all your data.

Here comes SQLAlchemy-Migrate. It's an extension and toolset built on and working with SQLAlchemy models. It's intended for schema migration, which is exactly our problem: changing the database tables' structure and keeping it up to date with our application's model. Migrate creates a repository of changes for your application and an additional table in the database called migrate_version. After creating the repository your database is at version zero. The official documentation states that this version should represent an empty database – in our case, an empty model.

I'll digress here a little. I mostly agree with this approach, because it ensures that you build your model from scratch and then use Migrate to upgrade it to the current or desired schema version. There are, however, cases where I find it unacceptable and not worth the work. One such case is the CMS I'm currently working on. It's almost three years' worth of work, containing nearly a hundred tables, numerous Entity mix-ins, very deep ManyToOne, OneToMany, ManyToMany and other relations; each table and model class has a multitude of methods, decorators and much, much more. Rewriting this model according to the official documentation guidelines would be really hard work for virtually no benefit.

What I wanted was an established baseline database model for this CMS that could be extended, with changes committed on top of the baseline version. I need this kind of arrangement because the current version of the application, with the baseline model, is deployed in production for a few clients with whom I have an upgrade agreement for when new versions of the software become available. This way I can write migration scripts and upgrade their instances to the latest version, without all the tedious work of writing alteration and schema change scripts in pure SQL, dumping current data and readjusting it to the new schema.

SQLAlchemy-Migrate allows me to do this, because each change is registered sequentially as a new version of the schema. For each change I write a script with two functions: upgrade and downgrade. The first applies the changes, and the second reverts them if you want to return to the previous version of the database schema.

Basically, setting a baseline model is a good idea. The official Migrate docs state that the database should be empty, but they don't require that version zero of the migration repository be empty. This is probably related to problems that might occur when working in teams, where you need to quickly bootstrap a new application instance, but I won't digress here. You can find plenty of information on this topic on the Internet.

The only catch is that the baseline model must always be kept current in the migration repository. Using your application's models directly IS NOT recommended. Many problems can arise from such an arrangement, and that's why, in the spirit of the Zen of Python's "Explicit is better than implicit", you have to keep two similar models – one in your application and one in the repository.

Yes, that somewhat doubles the work, but when you set up a baseline like I did, you can simply track only the changes in your migration repository model, updating the application model accordingly. It shouldn't be much work, and it's a good tradeoff for the comfort of having a tested and verified database structure upgrade path. Most of the problems that can arise from keeping your two models in sync can be mitigated to some degree by using Mercurial or some other version control system.

Phew. Let's get back to our problem: how to make SQLAlchemy-Migrate work with Elixir. Let's suppose you'd be interested in adding migration facilities to your existing project. I'll assume that you have installed sqlalchemy-migrate via pip or easy_install. If so, you're ready to go.

This is how I've set up my migrate repository. The first step is to create a migration repository. I have assumed that it will reside under my project's root directory. The project is under virtualenv control and its main directory is a subdirectory within the virtualenv root. So from my main project directory I issued:

migrate create migration "Project Repo"

 

What this basically does is create a migration subdirectory in the main project's directory that will hold the sqlalchemy-migrate repository named "Project Repo". After successfully creating the repository you must put your database under version control. If you're using Pylons, check the sqlalchemy.url option in development.ini, which will give you the necessary credentials for your database. To put your database under migrate version control you have to issue:

migrate version_control postgres://login:password@host/database

 

This will initialize your database for version control at migrate repository version zero. You can also create a support script which holds the credentials for your database, so you don't have to pass them to the migrate command every time. It's a convenience script which wraps around migrate. We'll call it manage.py, and it can be created automatically by typing in this command:

migrate manage manage.py --repository=migration --url=postgres://login:password@host/database

This will create the manage.py script in your project's root directory, for the repository residing under the migration subdirectory, with credentials specified by the --url option. Note that I'm using the PostgreSQL engine, but you can use whatever engine SQLAlchemy supports. Note, however, that SQLite doesn't support full ALTER statements in SQL, so you might not get the results you expect when using sqlalchemy-migrate. Now you can use:

python manage.py

 

as a shortcut for your migration repository administration. You can issue:

python manage.py version

 

to check at what version your migration repo is at and you can also use:

python manage.py db_version

 

to see what version of the migration model your database is currently at.
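For reference, the generated manage.py is just a thin wrapper around migrate's shell entry point. On my setup it looks roughly like this (exact contents may differ between sqlalchemy-migrate versions, and your repository path and URL will obviously differ):

#!/usr/bin/env python
from migrate.versioning.shell import main

if __name__ == '__main__':
    main(repository = 'migration',
         url = 'postgres://login:password@host/database',
         debug = 'False')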

OK, so we're finally getting to the juicy part. We have our migration repository set up and ready to use. Let's establish that our baseline model is the first version of the Page Elixir Entity presented in this post (the one without the ctime field). We now want to create a change script that will update our baseline model with this new field. Let's say that we name this change "Add ctime DateTime field to Page". We can issue:

python manage.py script "Add ctime DateTime field to Page"

This will create a script named 001_Add_ctime_DateTime_field_to_Page.py in the migration/versions subdirectory. It will look like this:

from sqlalchemy import *
from migrate import *

def upgrade(migrate_engine):
    # Upgrade operations go here. Don't create your own engine; bind migrate_engine
    # to your metadata
    pass

def downgrade(migrate_engine):
    # Operations to reverse the above upgrade go here.
    pass

 

There are two functions: upgrade – which brings the database up to date with this version of the script (in this case version 1) – and downgrade – which rolls back all the changes introduced by this version's changeset (provided, of course, that you write the function body).

Since we're using Elixir and not pure SQLAlchemy, each change script needs some additional setup. Also, having our baseline model as version zero of the migration, we must make sure that we don't introduce conflicts with our model. If you're familiar with SQLAlchemy, you know that a MetaData instance holds database metadata, including database engine bindings. As you can see, the upgrade and downgrade functions take migrate_engine as their parameter. This is the engine specified by the --url option of the migrate command – in our case manage.py, which will feed in the engine automagically. You have to import additional modules in our script:

from sqlalchemy import *
from migrate import *
from elixir import *
import elixir

 

This imports Elixir and its namespace, so we can use Entity, Field and the other definitions directly, but it also allows us to use the elixir namespace itself. That will be needed for binding the engine to Elixir's metadata, since we don't want to use pure SQLAlchemy metadata. We want to update the Page Entity (and database table) accordingly. To do this we declare a Page class which inherits from Entity, like this:


import datetime

class Page(Entity):
    using_options(tablename = 'page', autoload = True)
    # Again, pass the callable so the default is evaluated per insert.
    ctime = Field(DateTime, required = True, default = datetime.datetime.now)

 

This sets up an Elixir Entity named Page, which takes our baseline model and adds an additional field named ctime. By itself it doesn't do anything except load our Page model from the database and declare the additional table field. Take a look at using_options. It has two parameters:

  • tablename – which holds a value of page; and
  • autoload – which is set to True.

What do those options do? The tablename option specifies the table name in the database this model maps to. This is important, because sometimes (for example with ManyToMany) Elixir creates its own distinct names. The second option, autoload set to True, states that Elixir should load the table definition from the database, so that our previously defined fields – namely id, content and title – are available to us. Defining ctime then adds a field to the table and to our model, rather than replacing the existing definition.

Note: You could also skip "autoload = True" in using_options if you had specified, at the top of the script, just after the imports, the directive:

options_defaults["autosetup"] = True

 

But since we want to have control over our model, I didn't use it. If you think it's safe (not many relations), go ahead and use it – just know what you're doing. This Entity basically creates the model foundation for the upgrade and downgrade functions. So let's get into them. First we'll take upgrade:

def upgrade(migrate_engine):
    # Upgrade operations go here. Don't create your own engine; bind migrate_engine
    # to your metadata
    elixir.metadata.bind = migrate_engine

    Page.table.c.ctime.create()

 

"Wait a second", you might say. "Everywhere I've looked, tutorials bound the engine to a global metadata variable instantiated as metadata = MetaData(), even the ones describing Elixir usage with migrate." Well, unfortunately, they are all wrong. You have to bind your database engine to Elixir's metadata and not to pure SQLAlchemy metadata. This way we don't need any global MetaData instance and we can be sure there are no "engine_dialect not found" errors. That's basically the magic trick. You could also bind the engine directly to the Elixir Entity's table, but that's too much fuss – if you need that kind of granularity, you probably wouldn't be reading this tutorial in the first place. 😉

To actually create a new column for the table, we point at the table's column definition and call its .create() method, so that migrate creates the column with an SQL ALTER statement. Just remember that SQLite has some limitations here; PostgreSQL and MySQL should be perfectly fine. OK, this is pretty straightforward. Without delay, we'll do something analogous in the downgrade function:

def downgrade(migrate_engine):
    # Operations to reverse the above upgrade go here.
    elixir.metadata.bind = migrate_engine

    Page.table.c.ctime.drop()

 

This causes the column to be dropped from the database. You can now test our repository change script by typing at your command prompt:

python manage.py test

And you should see:

Upgrading...
done
Downgrading...
done
Success

This confirms that the migration upgrade to version 1 and downgrade to version 0 are working fine. I could end right here, because that's almost all you need to know, but unfortunately migrate is a little finicky with Elixir when you use imported model Entities in relations – and with Elixir relations in general. I struggled with this quite a bit, but I've finally managed to get it up and running perfectly.

Let's suppose that we want to create another Entity named User, which for clarity is defined as:

class User(Entity):
    id = Field(Integer, primary_key = True)
    login = Field(Unicode(30), required = True, unique = True)

 

and that it resides somewhere else, like the project_directory.model.entities module. We want to create a ManyToOne Elixir relation in our Page model with the field name created_by. Migrate needs a little trickery for this to work. So let's suppose that instead of the ctime field, we want this created_by relation. We edit our Page entity accordingly:


from project.model.entities import User

class Page(Entity):
    using_options(tablename = 'page', autoload = True)
    created_by = ManyToOne('User')

 

This will initialize our Page model with a relation to the User entity. However, migrate does not know anything about the ManyToOne relation, so we have to fall back to good old SQLAlchemy within the upgrade and downgrade functions. Elixir implements ManyToOne relations with an Integer column named <RELATION_FIELD_NAME>_<PRIMARY_KEY_FIELD>; given RELATION_FIELD_NAME = created_by and PRIMARY_KEY_FIELD = id, we end up with a created_by_id column of Integer type. We have to create the relation manually within the upgrade function, like this:

def upgrade(migrate_engine):
    # Upgrade operations go here. Don't create your own engine; bind migrate_engine
    # to your metadata
    elixir.metadata.bind = migrate_engine

    clmn = Column('created_by_id', Integer)
    clmn.create(Page.table)

    cbid_fk = ForeignKeyConstraint([Page.table.c.created_by_id], [User.table.c.id])
    cbid_fk.create()

    Index('ix_page_created_by_id', Page.table.c.created_by_id).create()

 

This adds the created_by_id Integer column that will be used for the relation. Note that we have to define the relation manually: the cbid_fk ForeignKeyConstraint relates this Integer column to the User.id column and adds the constraint to the database table, and Index creates an index on the column to speed up lookups. This is what a full ManyToOne relation from Elixir looks like under SQLAlchemy. Determining the names for the ForeignKey and Index should be pretty straightforward for you.

By analogy we define the downgrade function. This time, however, we only need to drop the Integer column – Migrate will take care of removing the ForeignKey and Index for us. This is what the function looks like:

def downgrade(migrate_engine):
    # Operations to reverse the above upgrade go here.
    elixir.metadata.bind = migrate_engine

    Page.table.c.created_by_id.drop()

 

That's how you can use Elixir with SQLAlchemy-Migrate, relations included. Perhaps this should be designed better on Migrate's side, but the package is pretty awesome and I commend its developers for a really useful, easy-to-use and well-documented package.

Feel free to post comments if you have suggestions for improvement, or perhaps a better way to make Migrate work in accord with Elixir. That's all for now, folks. Hopefully you found this useful. Cheers.

Posted in Elixir, Programming, Pylons, Python, SQLAlchemy

How to retrieve current module path in Python?

If you've ever wondered how the current module's path can be retrieved reliably in Python, there is a relatively simple way to do it. Each module has a __file__ variable that can be used for exactly that.

The answer is pretty simple.


import os
path = os.path.abspath(__file__)
dir_path = os.path.dirname(path)

And that's all. If you put this in your module, the dir_path variable represents the current directory of the module. That's a pretty cool thing, because no matter which directory the program was called from, dir_path will always point at the module's own directory.
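For instance, you can use dir_path to reach files shipped alongside the module, regardless of the current working directory (the file name below is just a made-up illustration):

# Hypothetical: load a data file that sits next to this module.
config_file = os.path.join(dir_path, 'data', 'settings.ini')
with open(config_file) as cfg:
    settings = cfg.read()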

Another benefit of this simple construct is the ability to modify sys.path. What for, you might ask? Well, let's assume that you have your own project that spans multiple subdirectories. However, you also have supporting scripts or programs which are used only from the command line and are de facto not part of your primary application. They can be used, for example, for database migration, sanitizing, fixing, cleanup, backups and all the other administrative purposes you can imagine which are not part of your application. If your main project has its own virtual environment, e.g. when using Django or Pylons, it's hard to use the module import namespace from within the subdirectory as if the subdirectory were your import root. You always have to prefix the primary project namespace, and that somewhat defeats the purpose of supporting scripts, which are used for the primary application but are not part of it in the strict sense.

Say you have your project virtual environment set up, your primary application resides under, for example, /myawesomeproject/, your supporting scripts reside under /myawesomeproject/scripts, and some modules for the supporting scripts reside under /myawesomeproject/scripts/lib. If you're importing modules from this lib subdirectory, you'd have to do:


from myawesomeproject.scripts.lib import somemodule

But when you want such a script to be used standalone, without relying on the virtual environment of the primary application, you would certainly prefer to use:


from lib import somemodule

And here's where the dir_path variable comes in to serve exactly this purpose. When executing your supporting script, you can do:


PYTHONPATH=$PYTHONPATH:. python myscript.py

But then it can become a problem for your users, because they have to remember to adjust the Python module search path themselves. With dir_path, however, you can put this code at the beginning of your main script's module to automatically set the correct system path, so the imports work without any problems:


import sys, os
path = os.path.abspath(__file__)
dir_path = os.path.dirname(path)
sys.path.append(dir_path)

And that's all. After putting this code at the very beginning of the script's main module, you can proceed with the other imports (they will treat the main module's path as the import root) and with your script's program code.

PS. Just remember to put empty __init__.py files in each subdirectory you want to use as a package for your imports, for this to work correctly.

Posted in Coding, Programming, Python