Introduction
Every time I start a new large web project I am tempted to switch languages. Surely there’s something better? Faster? More fun? More robust? It seems the grass is always greener.
But then, just as I am on the point of switching, good reasons not to switch always appear. Sometimes it’s fear of the unknown, so I end up on an endless trail of late-night web surfing through half-constructed websites that promise the world’s greatest framework, appserver, or template language. I get really excited! These guys have the right idea, just what I’ve been looking for! Then I check the news section that hasn’t been updated in months or years and I sigh. A product is no use to me if there’s nobody to support it. I will find bugs in it, so I just can’t take that risk.
You could say that language shouldn’t matter, and that your software should be componentized and accessible over RPC or language bridges. Unfortunately those architectures have their own issues, starting with the burden of maintaining multiple versions of the same APIs, not to mention multiple developer skillsets.
So language really matters. Having downloaded and played with all the crap under the sun, what is the best language choice?
Disclaimer
What matters to me
The areas of assessment I have chosen to focus on are personal. They are insufficiently specified. They are not orthogonal (e.g. caching is much easier in a long-running server), but I have still given each its own section, because at a high level they matter equally.
High traffic
It is often said that performance is a secondary consideration. For me it is not. It is primary, alongside speed of development, not below it in the order of priorities. That’s because I am not interested in “mom and pop” sites (playing around in development) but in real-world, large-scale deployments (e.g. 1000 hits per second). If I just wanted a simple web framework to sit on a single colo box, this assessment would be very different, and would probably end with Ruby on Rails or Django.
First draft
I haven’t yet looked into every area of each language in detail. This is a work in progress. There are many details of my assessments that I have left out, and many gaps in my knowledge when it comes to PHP and Python production deployments. Where that occurs I have given the benefit of the doubt and allocated a reasonable score, not a harsh one just because I haven’t found anything good yet.
Dynamic languages versus Java
The divide is really between Java on one side and the dynamic languages on the other. Java has great appservers that take advantage of Java’s easy, stable threading. The dynamic languages currently run on lower-quality VMs, and have various threading issues and harder-to-use thread libraries.
Embedding dynamic language interpreters in Apache is the most common way of deploying these solutions. The difficulties in getting good per-machine performance are illustrated by the restrictions mentioned in the following extract (written for mod_python, but the same principles apply to any mod_XXX):
mod_python
- Where shared data needs to be visible to all handlers, regardless of which child process they execute in, and changes made to the data by one handler are immediately available to another, including any executing in another child process, an external data store such as a database or shared memory must be used. Global variables in normal Python modules cannot be used for this purpose.
- Access to and modification of shared data in an external data store must be protected so as to prevent multiple threads in the same or different processes from interfering with each other. This would normally be achieved through a locking mechanism visible to all child processes.
- A handler must be re-entrant, or simply put, be able to be called concurrently by multiple threads at the same time. Data which needs to exist for the life of the request, would need to be stored as stack base data, thread local data, or cached in the request object. Global variables within the actual handler module cannot be used for this purpose.
- Where global data in a module local to a child process is still used, for example as a cache, access to and modification of the global data must be protected by local thread locking mechanisms.
Assessment area
Each area is scored from 1 to 5, apart from base language, which is scored from 1 to 10.
- base language (e.g. ability to evaluate code from strings is great for dynamic template creation in a database and is something sorely lacking from Java)
- setup (always more important than you give it credit for. How many environments do you need to create? Easy to upgrade in packages? Is it supported at your host?)
- standards (best if you’re trying to hire people if the community has rallied around 1 or 2 frameworks not 10 competing ones)
- templates (simple powerful templating language is a must)
- server (nice to have at least the option of a long running server)
- database drivers (is there a robust fast driver for my choice of database)
- persistence APIs (this is a big deal)
- caching (it would be nice if I could keep some stuff like config info in memory, and some other stuff like sessions in remote caches, with clustering options)
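To make the base language bullet concrete, here is a minimal sketch in Python of a template expression stored as a string in a database and evaluated at request time. The `templates` table, the `render` function, and the in-memory SQLite database are assumptions made up for illustration, and evaluating stored strings is only reasonable when the stored content is trusted.

```python
import sqlite3

# In-memory database standing in for wherever templates are stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE templates (name TEXT PRIMARY KEY, body TEXT)")
conn.execute(
    "INSERT INTO templates VALUES (?, ?)",
    ("greeting", "'Hello, ' + user.title() + '!'"),
)


def render(name, **context):
    # Fetch the stored Python expression and evaluate it with the given names.
    row = conn.execute(
        "SELECT body FROM templates WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    code = compile(row[0], "<template:%s>" % name, "eval")
    # Empty __builtins__ limits what the expression can reach; it is not a sandbox.
    return eval(code, {"__builtins__": {}}, context)
```

Calling `render("greeting", user="alice")` evaluates the stored expression against the supplied context. Doing the same in Java would need a compile step or an embedded scripting engine.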
Java (Score: 31)
Base language Score: 2
Unicode since day 1 makes me add another point here (Unicode won’t be in PHP until PHP 6). Flexible in theory: multiple languages run on the JVM (PHP, Python, etc.), so technically you can dump “Java” the language but still use the JVM and write your web code in PHP or whatever (e.g. see Caucho’s PHP implementation). In practice there are a lot of issues with this. In general, Java is too low level for web work (think rigid “interface -> abstract base -> multiple implementations” code patterns). It just feels clunky and slow. JSP recompilation takes an age. If you go the servlet way, reloading your webapp takes even longer with a big model. Not to mention the massive memory usage.
Setup Score: 2
Installing a JVM and appserver is easy enough but somehow it’s still a hassle.
Standards Score: 3
Options are few. There’s just one JSTL and one JSP, which is a good thing. Frameworks are mostly crap (very slow to develop in) so I will ignore those.
Templates Score: 4
A few choices. Powerful and robust. JSP/JSTL is well established. JSP should be nice and dynamic, but ironically it suffers from not being object oriented (due to the way the classloader works). Easy tag creation. Alternatives include the well-established Velocity and FreeMarker.
Server Score: 5
Plenty of very fast, well proven, multi-threaded app servers available.
Database Drivers Score: 5
Solid, vendor-approved, portable JDBC drivers. Scrollable result sets, etc.
Persistence APIs Score: 5
Hibernate all the way.
Caching Score: 5
Great choice of excellent caches (Ehcache, etc.).
Python (Score: 30)
Base language Score: 8
Nice, but not good for embedding in templates because of its significant whitespace.
Setup Score: 3
Easy enough to install mod_python. App servers add a layer of hassle.
Standards Score: 2
I have always liked the look of Django, and not much else. There are too many web frameworks, templating systems and app servers; most are out of date, and all are pretty hacky and over-stretched for developer time.
Templates Score: 3
Haven’t found one I like yet. Spyce makes grandiose claims for itself, but at first glance these seem unjustified. It seems very lacking in features to me (check out their “core” tag library). Mind you, it’s nicer than PSP – have you seen how you end a block? That’s thanks to whitespace significance (which works OK elsewhere in Python but not in this context).
Server Score: 4
If you go the appserver route what options are there?
Medusa looks cool. Webware is at least quite well documented, but when you run its install script to create an app space it generates a “sessions/” folder on your filesystem, which makes me wonder whether it is targeted at play development rather than large production sites. The Twisted Matrix HTTP server has a great name and should perform well (as Twisted does for general network programming), but their own docs admit the new version is not ready for primetime.
Database Drivers Score: 4
C drivers.
Persistence APIs Score: 3
The DB-API is reasonable, but what about PyPersist and the other OO mapping layers? I have not assessed them yet.
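For reference, the DB-API style looks roughly like this. The sketch uses the standard library’s `sqlite3` module only because it needs no server; the `users` table and its rows are made up for illustration, and other drivers differ in details such as the parameter placeholder.

```python
import sqlite3

# Connect, get a cursor, and run statements: the same shape in every DB-API driver.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
conn.commit()

# Rows come back as plain tuples; mapping them to objects is left to the caller.
cur.execute("SELECT id, name FROM users ORDER BY id")
rows = cur.fetchall()
```

That last comment is the gap a persistence layer is supposed to fill: the DB-API stops at tuples.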
Caching Score: 3
Limited caching options.
PHP (Score: 31)
Base language Score: 8
Quite ugly but there’s no denying it gets the job done in the web arena. The single namespace is a pain at first glance but is not in practice. PHP5 copied some good bits from Java.
Setup Score: 5
mod_php or CGI mode is very easy to set up. Feels lightweight. Well supported by distros.
Standards Score: 4
Frameworks are mostly crap (feature poor) so I will ignore those. If the Zend Framework (when finished) becomes the standard, that will be an awesome boost. Unfortunately I don’t like it much, but I can live with it if it becomes well supported.
Templates Score: 4
PHP itself and Smarty. Both quite good.
Server Score: 1
No decent appserver that I’ve found. You could try Quercus I guess.
Database Drivers Score: 4
C drivers.
Persistence APIs Score: 3
PEAR DB and the other persistence frameworks are reasonable.
Caching Score: 2
Limited to memcached really.
Conclusion
There isn’t a winner. Just as I suspected. The grass is always greener.
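The totals are easy to check by adding up the per-area scores from the sections above. A small Python tally, with illustrative variable names:

```python
# Per-area scores copied from the sections above.
# Order: base language, setup, standards, templates, server,
# database drivers, persistence APIs, caching.
scores = {
    "Java":   [2, 2, 3, 4, 5, 5, 5, 5],
    "Python": [8, 3, 2, 3, 4, 4, 3, 3],
    "PHP":    [8, 5, 4, 4, 1, 4, 3, 2],
}

totals = {language: sum(values) for language, values in scores.items()}
best = max(totals.values())
leaders = sorted(name for name, total in totals.items() if total == best)
```

Java and PHP tie on 31, with Python one point behind on 30, so no language comes out clearly ahead.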