Sunday, May 24, 2009

Python runtimes for OpenERP, Jython, Unladen Swallow, where do we stand?

Hi,

A little while ago, I decided to show the world a proof of concept of what I consider the most mature and promising OSS ERP so far, OpenERP, running on Jython, the Java-based Python interpreter. I made a branch, adapted the OpenERP database connection to use zxJDBC with the Java PostgreSQL JDBC driver, and disabled the code that did not run on Jython yet (C library wrappers). That attempt was reported here:

But first, to make it clear: this post is an update on the Python runtime situation for OpenERP. That doesn't mean I consider it a very important issue for OpenERP; it's just that there are reasons to hope for a convergence toward the Java platform over the year, nothing more. On the contrary, my only really hot topics for OpenERP right now are code quality, testing and business code refactoring/cleaning. That might be for another post, though; here I'll only give an update on the Java convergence and the Python runtime performance status.
Now, 3 months after my first post, where do we stand? There is mostly plenty of great news, as I'll explain.

But again, why Jython?
1) Speed in the long run
I prefer to make it clear: for one, OpenERP v5 is now fast enough for lots of situations. I can't see any performance obstacle to deploying it in large companies, and this is actually being done in several already (although it might not always compete yet in terms of native features or quality in large companies). And second, yes, I know that in most enterprise systems the bottleneck lies rather in the database load and in client/server bandwidth usage and latency. But this is especially true when the language layer is fast, as with Java for instance. In Python, and in OpenERP especially, even if the language layer is not the bottleneck, it's still common to see 30% of CPU usage eaten by Python. So any significant improvement here will also help.

Moreover, the SQL requests of OpenERP are already quite optimized in v5. For instance, they try as much as possible to handle n records in O(1) or O(log(n)) time. They also have a transactional database cache thanks to their fields.function + store=True/hash of invalidation triggers feature. They also model hierarchies in SQL using MPTT (Modified Preorder Tree Traversal). So in the end, having a faster Python runtime, whichever it is, will certainly help.
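
To make the MPTT idea concrete, here is a minimal sketch in plain Python. It is not OpenERP's code: the function names and the sample tree are made up for illustration. Each node gets a (left, right) pair from a preorder walk, so that the descendants of a node are exactly the nodes whose pair is nested inside its own. In SQL that becomes a single range condition instead of a recursive lookup.

```python
import itertools


def assign_intervals(children, root):
    """Walk the tree in preorder and give every node a (left, right) pair.

    `children` maps a node to the list of its child nodes (hypothetical
    in-memory stand-in for a parent/child table).
    """
    counter = itertools.count()
    intervals = {}

    def visit(node):
        left = next(counter)              # number taken when entering the node
        for child in children.get(node, []):
            visit(child)
        intervals[node] = (left, next(counter))  # number taken when leaving it

    visit(root)
    return intervals


def descendants(intervals, node):
    """Return all nodes whose interval is strictly nested inside `node`'s."""
    left, right = intervals[node]
    return sorted(
        name
        for name, (lo, hi) in intervals.items()
        if left < lo and hi < right
    )


# Illustrative data only: a tiny chart-of-accounts-like hierarchy.
tree = {"All": ["Assets", "Liabilities"], "Assets": ["Cash", "Bank"]}
intervals = assign_intervals(tree, "All")
print(descendants(intervals, "Assets"))
```

The price of this cheap read is that inserting or moving a node means renumbering part of the tree, which is why the scheme suits hierarchies that are read far more often than they are modified.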

2) Enterprise support
Speed is one thing, but another thing Jython would bring to large corporations is easy enterprise integration. In Jython, any Java class or library can be invoked seamlessly. And Java, thanks to its design-by-committee mantra, has at least all sorts of mature features the enterprise world likes to rely on: good SOAP integration (by contrast, we recently had some pain with the Python SOAPpy implementation), standard ESBs, JCR, JDBC, JMX, BI tools, ETLs...


Google Unladen Swallow is likely to provide the best short and mid term performance:
Still, OpenERP will probably enjoy its best short- and mid-term performance thanks to Google's commitment to 'Unladen Swallow', their branch of CPython. In an interview, the creators of Unladen Swallow explained that Google has a lot of "legacy" code where the high-level plumbing is done in Python while the low-level algorithms are done in C++ and invoked via SWIG wrappers. So they said they would certainly look into Jython for long-term performance, but they also need short-term performance for standard CPython, and that's why they launched the Unladen Swallow project.

Unladen Swallow wants to bring a simple JIT (Just In Time) compiler to Python and apply other standard optimizations. They claim they could reach a 5x performance increase by the end of 2009. Those improvements are meant to be merged back into the standard CPython distribution. So far they already have a solid 20% performance improvement. They might even try to remove the Python GIL efficiently.

In the long run, even though they rely on LLVM, it's unlikely that they will build a virtual machine as sophisticated as the Java one (often considered the best VM, especially now that they have added support for dynamic languages with the InvokeDynamic JSR). So if Jython then focuses on performance, as announced at PyCon 2009, Jython might win the performance war in the long run. In any case, the enterprise integration I was talking about can only be achieved on Jython.


OpenERP on Jython might be a reality by Q1 of 2010.
Yes, several pieces of very good news let me hope this can be done. Let's enumerate them:
Most of the C library wrappers will be dropped. Tiny, the publisher of OpenERP, has committed to this:
- Turbogears for the web-client layer -> done! The full-blown Turbogears is no longer required: on the web-client trunk branch (already very stable), Tiny completely removed the Turbogears dependency and now depends only on CherryPy (pure Python). Barring a few Jython bugs, the OpenERP web-client should now run unchanged on Jython.
- mx.DateTime: Tiny has committed to removing it within the next few months!
- the minidom XML lib: Tiny has committed to replacing it with the etree lib, which is implemented on Jython. This has already been started in community and publisher branches:
https://code.launchpad.net/~ajm-tech/openobject-server/server-50-lxml-fields-view-get
https://code.launchpad.net/~openerp-commiter/openobject-server/pap-etree-trunk
- libxml usage will be removed too.
- I looked a bit into ModJy, the standard Jython wrapper for Java servlets, and it should be quite easy to put the OpenERP server on ModJy using a WSGI wrapper.
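
To give an idea of what such a WSGI wrapper could look like, here is a minimal sketch, assuming the server is reached through its XML-RPC interface. It is an illustration, not code from the OpenERP branches: `handle_call` is a hypothetical stand-in for the real RPC dispatcher, and the module name `xmlrpc.client` is the Python 3 one (on Python 2 and Jython 2.5 the same module is called `xmlrpclib`).

```python
import xmlrpc.client


def handle_call(method, params):
    """Hypothetical stand-in for the server's real RPC dispatcher."""
    if method == "echo":
        return list(params)
    raise ValueError("unknown method: %s" % method)


def application(environ, start_response):
    """WSGI entry point: decode an XML-RPC request, dispatch it, encode the reply."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    params, method = xmlrpc.client.loads(body)
    try:
        result = handle_call(method, params)
        payload = xmlrpc.client.dumps((result,), methodresponse=True)
    except Exception as exc:
        # Report failures to the client as an XML-RPC fault, not an HTTP error.
        fault = xmlrpc.client.Fault(1, str(exc))
        payload = xmlrpc.client.dumps(fault, methodresponse=True)
    data = payload.encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "text/xml"), ("Content-Length", str(len(data)))],
    )
    return [data]
```

Since `application` is a plain WSGI callable, the same object could in principle be served by any WSGI container, which is the whole point of going through WSGI rather than binding the server to one HTTP stack.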


The best is what is being done at the database/ORM level: OpenERP is moving to SQLAlchemy!
My work on the Jython branch was to work around the psycopg2 native driver and replace it with zxJDBC, the Jython wrapper that exposes JDBC through the Python DB-API 2.0. Well, the good news is that OpenERP is moving away from psycopg2. They will indeed move all the SQL generation logic to the standard and awesome SQLAlchemy ORM. And SQLAlchemy is Jython compatible out of the box. It's great to see them coming back to standards (SQLAlchemy was not mature yet when OpenERP was already in business).
Tiny already began that work on a dedicated branch:
https://code.launchpad.net/~openerp-commiter/openobject-server/server-sa
and I encourage the community to help them do it right. Also note that they have already used SQLAlchemy for SQL generation in their emerging BI OLAP cube.

SQLAlchemy is not only good news for Jython support. It also means that the OpenERP domain (SQL filtering) logic is going to become more subtle and powerful. So far, that part of OpenERP has not been at the level of the best ORMs around, such as SQLAlchemy or Hibernate. Even ActiveRecord (Rails) was doing a smarter job with joins. Still, among the mature ERPs it was probably the best available, years ahead of Compiere or Openbravo with their millions of lines of pure PL/SQL legacy code (Openbravo recently added Hibernate support in their 2.50 platform, but their business code remains millions of lines of PL/SQL lurking in XML CDATA sections, which will take years to migrate to Java, if that is ever achieved).
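
For readers who have never seen an OpenERP domain: it is a list of (field, operator, value) tuples, implicitly AND-ed. The sketch below shows, under simplifying assumptions, how such a list can be turned into a parameterized WHERE clause. `domain_to_where` and the operator whitelist are hypothetical names of mine; this is neither the OpenERP implementation nor SQLAlchemy, and it deliberately ignores OR/NOT combinations and filters that span several tables, which is exactly where a real ORM earns its keep.

```python
# Operators accepted by this sketch (an illustrative subset, chosen by me).
ALLOWED_OPERATORS = {"=", "!=", "<", "<=", ">", ">=", "like", "ilike"}


def domain_to_where(domain):
    """Translate a flat, AND-ed domain into (where_clause, params).

    Values are never interpolated into the SQL text: they are returned
    separately, using the "%s" placeholder style of drivers like psycopg2.
    """
    clauses = []
    params = []
    for field, operator, value in domain:
        if operator not in ALLOWED_OPERATORS:
            raise ValueError("unsupported operator: %r" % (operator,))
        if not field.isidentifier():
            raise ValueError("suspicious field name: %r" % (field,))
        clauses.append('"%s" %s %%s' % (field, operator))
        params.append(value)
    # An empty domain matches every record.
    return " AND ".join(clauses) or "TRUE", params


where, params = domain_to_where([("state", "=", "draft"), ("amount", ">", 100)])
print(where)
print(params)
```

Keeping the values out of the SQL string and whitelisting operators and field names is the design choice that matters here; the string building itself is the part an ORM such as SQLAlchemy would take over.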

Oh, and yes, that SQLAlchemy transition will bring database independence, as there are SQLAlchemy adapters for most DBMSs on the market. In particular, MySQL and Oracle will be supported. That's great because it will strengthen the OpenERP community: in some organizations Oracle is not optional (and it can arguably provide marginally superior performance), while MySQL has a much larger community penetration than PostgreSQL.

Unlike most large software projects, OpenERP technical quality improves slowly but keeps improving
With all that refactoring being done, OpenERP belongs to that rare kind of complex enterprise project where the code improves over time, rather than sliding desperately into chaos as businessmen add layers of crap over crap while the commercial deals are made. Also, beyond the ORM transition and the web-client refactoring (Turbogears removed, templates changed to Mako), Tiny rewrote their reporting engine and has just allowed the Mako engine (a powerful standard) there too.

My vision of software engineering is indeed pretty much a thermodynamic one: code quality has to improve over time, much like a living system should not generate entropy within its scope. Design mistakes are very easily made along the road, and once tons of business code are built on non-optimal models, you are screwed: any energy you invest in trying to correct a part of the system will actually result in a larger energy waste, pretty much as if you tried to cool your home by leaving your fridge door open: all you will actually get is a warmer home.

Why this claim? Because in business-driven software, it's always easier in the short term (the only scope business cares about, as the current economic crisis teaches us) to make money by supporting your existing customer base. This means that once mistakes are made, it's always too expensive, and thus never attempted, to rethink the core abstractions. Instead, those companies tend to let several abstractions cohabit and prefer to invest in building chaotic, non-abstracted code instead of refactoring the concepts that could be abstracted into the core platform. And once you have millions of lines of chaotic code, you are never going to factor them back into clean, intelligible, maintainable concepts. That's pretty much how the money has been wasted in today's proprietary ERPs, and even in a few supposedly open ones. So while they might have lots of working features appealing to collapsing traditional industries, they are facing exploding maintenance costs and can't adapt further to newly emerging businesses the way the few viable OSS ERPs will.


So overall, by early 2010, I think we should be able to see OpenERP also distributed as a WAR archive that you deploy in one click on your enterprise web server, such as Glassfish. This might shift OpenERP adoption from SMEs to large organizations, which is always good because it will fuel more heavy engineering inside the project. And in the end, even small businesses will benefit from the resulting quality and usability improvements. Still, I don't expect extra performance from Jython before Q2 2010 at least.


OK, enough said about Jython and Python runtimes. All that enthusiasm should not overshadow my very urgent plea to Tiny to increase their quality (fewer bugs, fewer regressions). An efficient way to achieve this is to take the test-first approach far more seriously than they have so far. They just committed to this too at their May Community meeting, so we will see. Given what they did in the recent past (transition to international, transition to an open distributed forge, English documentation, transition to a software publisher business model, performance fixes...), I hope they will meet that challenge too.