axeorcat: Coding

PHP MDB2 Quick Guide

The basics


require_once 'MDB2.php';

// normally want just a single connection instance
$db =& MDB2::singleton('pgsql://user:pwd@server/dbname');

// for autoExecute, MDB2_AUTOQUERY, limitQuery, etc
$db->loadModule('Extended');

$safe = $db->escape($s); // escape a string for use in SQL (quote($s, 'text') also adds the quotes)

// core selects
$db->queryAll('SELECT * FROM people');
$db->queryRow('SELECT * FROM people WHERE id = 1');
$db->queryCol('SELECT name FROM people');

// using Extended module
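$db->getOne('SELECT COUNT(*) FROM people');        // first column of the first row
$db->getRow('SELECT * FROM people WHERE id = 1');  // first row
$db->getCol('SELECT name FROM people');            // first column of every row
$db->getAll('SELECT * FROM people');               // every row
$db->getAssoc('SELECT id, name FROM people');      // rows keyed by the first column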


// MDB2 "types" are different from PHP types. The MDB2 types, with their default values:
// array(
//     'text'      => '',
//     'boolean'   => true,
//     'integer'   => 0,
//     'decimal'   => 0.0,
//     'float'     => 0.0,
//     'timestamp' => '1970-01-01 00:00:00',
//     'time'      => '00:00:00',
//     'date'      => '1970-01-01',
//     'clob'      => '',
//     'blob'      => '',
// )

Transaction blocks


if ($db->supports('transactions')) {
    $db->beginTransaction();
}
$result = $db->exec('DELETE FROM people');
if (PEAR::isError($result)) {
    if ($db->in_transaction) {
        $db->rollback(); // undo the failed work
    }
} else {
    if ($db->in_transaction) {
        $db->commit(); // make the change permanent
    }
}

Extras


// to allow empty strings (by default MDB2's portability mode converts '' to NULL)
$db->setOption('portability', MDB2_PORTABILITY_ALL ^ MDB2_PORTABILITY_EMPTY_TO_NULL);

Alternatively, go higher level and check out symfony.

Ruby performance notes

From the mongrel-users mailing list:

The mysql.rb is just for development, and it sucks horribly.

If you use inject() anywhere, benchmark that usage. inject() can be
elegant as heck, but it's also often as slow as it is elegant, and it
tends to create a lot of junk objects that have to be garbage
collected. It can sometimes be phenomenally slow compared to a
solution that doesn't use it.

Do you do any string creation, aggregating from smaller pieces? Use
<< (which appends to the existing string in place) instead of +
(which allocates a new string every time).

When you store a string key in a hash, Ruby dups (and freezes) the key
unless it is already frozen. If you give it a frozen string it's faster.
If you do a lot of hash stores with string keys this can add up.

In general, try to be aware of your frivolous object creation. Ruby
isn't terribly slow with object creation, but it's still more
expensive to use a new object than to reuse an old one, especially
when garbage collection is taken into account.

If you need to send large files, you're *best* off using mod_xsendfile for
Apache, which lets you write the giganto data to a file and then put an
"X-Sendfile" header in the response pointing at this file. Apache will
then pick the file up as the response body and shoot it to the client
without bothering you.

First, the memory leak was because of a bug in how the GC in Ruby was collecting threads after they used a Mutex. Don't ask me why, but switching to Sync fixed it. Problem is, this also causes the leak on Win32. Still working on that problem.

explain this:
http://pastie.caboo.se/10194
vs.
http://pastie.caboo.se/10317
First one leaks, second one doesn't (with graphs even).

Zed Shaw:
1) Threads suck ass in Ruby.
2) The GC sucks even worse.
3) Combining two forms of suckage makes for ultra suckage.

Ruby templating

I want a ruby templating system that is

  • clean syntax, ideally with customizable special characters, and which ideally allows the template itself to be valid XML if I want
  • quick and dirty (i.e. no XML based parsing step required)
  • works for any type of text (i.e. no XML based parsing step required!!)
  • support for templates as files (don't want to have to supply strings in memory)
  • easy way to include other templates
  • all the basic constructs must work well (conditionals, loops etc)
  • powerful & extendable

All this you would expect if you are coming from Java or Python or Perl which have several excellent choices which tick all the boxes.

As of 25 Sep 2006 this seems to be the state of play:

erubis
More powerful and faster version of erb.

erb/eruby
Built into the Ruby distribution. Insists on evaluating from a binding. eruby is marginally faster in my experience. Default ugly ASP/JSP-like syntax. Has few features itself, but you can bolt stuff on as it uses the power of Ruby.

cs/Template :: http://cstemplate.rubyforge.org/
written in C but can’t do a simple conditional like [ if foo.length > 2 ]

PageTemplate :: http://coolnamehere.com/products/pagetemplate/
reasonable syntax, immature, unsupported, include other files? Proper object path navigation missing

Canny :: http://canny.sourceforge.net/documentation.shtml
Smarty port. Easy, file support (includes), conditionals. Unsupported.

Kwartz :: http://www.kuwata-lab.com/kwartz/
Tries to be too clever. Slow. Nesting gets complicated.

Amrita :: http://amrita.sourceforge.jp/
Tries to be too clever. Slow.

XTemplate :: http://xtemplate.sourceforge.net/
Tries to be too clever. Slow.

Conclusion

erubis seems to be the winner. Some of the other systems could be ok but have no momentum behind them.

Obfuscated Ruby

#!/usr/local/bin/ruby
puts(
  ( `svn pl -R`.scan(/\S.*'(.*)':\n((?:  .*\n)+)/)\
    .inject({}) { |h, (d, p)| h[d] = p.strip.split(/\s+/); h }\
    .select { |d, ps| ps.include? 'svn:externals' }\
    .map { |xd, ps| [xd, `svn pg svn:externals #{xd}`] }\
    .map { |xd, exts| exts.strip.split(/\s*\n/).map { |l| xd + '/' + l.split(/\s+/).first } }\
    .inject { |a, b| a + b }\
    .map { |d| "cd #{d} && svn up 2>&1" } \
    << 'svn up . --ignore-externals 2>&1'
  )\
  .map { |cmd| [cmd, Thread.new { `#{cmd}` }] }\
  .map { |cmd, thread| "#{cmd}\n#{thread.value}" }.join("\n")
)
From here

Database object or page caching?

I’ve found in practice (in common with a lot of other people) that database object caching, such as Hibernate’s second-level object and query caches, is not that useful in general for my applications (it’s still too low level) and can complicate matters. I would rather cache higher-level output, such as entire blocks of page content, and let pages be rebuilt from the db without intervening db caches.

If your cache supports tagging, an interesting idea for web page caching is to work out which db objects a page depends on and apply id tags for those objects to the cached page. When any of those db objects is updated, an interceptor can automatically expire the tagged content from the page cache.

So for example, a page that depends on Article id 3445 and Message id 3938 in schema db is cached with tags [“db.Article.3445”, “db.Message.3938”].

The page builder generates the cache key and these tags, and any others such as generic labels like “content.latest” which can be used for periodic expiry of large numbers of cached articles which are tagged with that tag.
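The tagging scheme above can be sketched in a few lines. This is a minimal in-process illustration, assuming plain Java maps stand in for a real tag-aware cache; the class name TaggedPageCache and its methods are hypothetical, and a production cache would add size limits, periodic expiry and clustering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of tag-based page-cache expiry; maps stand in for a real tagging cache.
class TaggedPageCache {
    private final Map<String, String> pages = new HashMap<>();          // cache key -> rendered page
    private final Map<String, Set<String>> keysByTag = new HashMap<>(); // tag -> keys carrying that tag
    private final Map<String, Set<String>> tagsByKey = new HashMap<>(); // key -> its tags (for clean removal)

    /** Builds a tag following the "db.<Type>.<id>" convention, e.g. "db.Article.3445". */
    static String dbTag(String type, long id) {
        return "db." + type + "." + id;
    }

    /** Stores a page under key, labelled with every tag it depends on. */
    synchronized void put(String key, String page, List<String> tags) {
        remove(key); // drop any previous version and its tag links
        pages.put(key, page);
        tagsByKey.put(key, new HashSet<>(tags));
        for (String tag : tags) {
            keysByTag.computeIfAbsent(tag, t -> new HashSet<>()).add(key);
        }
    }

    synchronized String get(String key) {
        return pages.get(key);
    }

    /** Expires every page carrying tag (what an update interceptor would call); returns how many went. */
    synchronized int expireTag(String tag) {
        Set<String> keys = keysByTag.get(tag);
        if (keys == null) {
            return 0;
        }
        List<String> victims = new ArrayList<>(keys); // copy, because remove() edits keysByTag
        for (String key : victims) {
            remove(key);
        }
        return victims.size();
    }

    // Removes a page and unlinks it from all of its tags.
    private void remove(String key) {
        pages.remove(key);
        Set<String> tags = tagsByKey.remove(key);
        if (tags == null) {
            return;
        }
        for (String tag : tags) {
            Set<String> keys = keysByTag.get(tag);
            if (keys != null) {
                keys.remove(key);
                if (keys.isEmpty()) {
                    keysByTag.remove(tag);
                }
            }
        }
    }

    public static void main(String[] args) {
        TaggedPageCache cache = new TaggedPageCache();
        cache.put("/article/3445", "<html>article</html>",
                Arrays.asList(dbTag("Article", 3445), dbTag("Message", 3938), "content.latest"));
        cache.put("/home", "<html>home</html>", Arrays.asList("content.latest"));
        System.out.println(cache.expireTag(dbTag("Message", 3938))); // 1: only the article page used it
        System.out.println(cache.get("/home") != null);              // true: home page untouched
    }
}
```

Keeping the reverse key-to-tags map means an expired page is unlinked from all of its tags, so a later put under the same key is not expired by a stale tag.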

Java vs Python vs PHP vs Ruby. Best language for web programming?

Introduction

Every time I start a new large web project I am tempted to switch languages. Surely there’s something better? Faster? More fun? More robust? It seems the grass is always greener.

But then just as I am on the point of switching, there always appear good reasons not to switch. Sometimes it’s fear of the unknown, so I end up on an endless trail of late night web-surfing. Half constructed websites that promise the world’s greatest framework, appserver, or template language. I get really excited! These guys have the right idea, just what I’ve been looking for! Then I check the news section that hasn’t been updated in months or years and I sigh. A product is no use to me if there’s nobody to support it. I will find bugs in it so I just can’t take that risk.

You could say that language shouldn’t matter, and that your software should be componentized and accessible over RPC or language bridges. Unfortunately those architectures have their own issues, first of all the burden of maintaining multiple versions of the same APIs, not to mention multiple developer skillsets.

So language really matters. Having downloaded and played with all the crap under the sun, what is the best language choice?

Disclaimer

What matters to me
The areas of assessment I have chosen to focus on are personal. They are insufficiently specified. They are not orthogonal (e.g. caching is much easier in a long running server), but I have still allocated separate sections of equal importance because they are equally important at a high level.

High traffic
It is often quoted that performance is a secondary consideration. For me it is not. It is primary, alongside speed of development, not below it in the order of priorities. That’s because I am not interested in “mom and pop” sites (playing around in development) but in real world, large scale deployments (e.g. 1000 hits per second). If I just wanted a simple web framework to sit on a single colo box this assessment would be very different, and would probably end with Ruby on Rails or Django.

First draft
I haven’t looked into all the areas in detail of each language yet. This is a work in progress. There are many details to my assessments that I have left out, and many gaps in my knowledge when it comes to PHP and Python production deployments (where that occurs I have given the benefit of the doubt and allocated a reasonable score, not a harsh one just because I haven’t found anything good yet).

Dynamic languages versus Java

The divide is really between Java on one side and the dynamic languages on the other. Java has great appservers that take advantage of Java’s easy stable threading. The dynamic languages currently run on lower quality VMs and have various threading issues and harder to use thread libraries.

Embedding dynamic language interpreters in Apache is the most common way of deploying these solutions. The difficulties in getting good per-machine performance are illustrated by the restrictions mentioned in the following extract (for mod_python, but the same principles apply to any mod_XXX):

mod_python


  • Where shared data needs to be visible to all handlers, regardless of which child process they execute in, and changes made to the data by one handler are immediately available to another, including any executing in another child process, an external data store such as a database or shared memory must be used. Global variables in normal Python modules cannot be used for this purpose.

  • Access to and modification of shared data in an external data store must be protected so as to prevent multiple threads in the same or different processes from interfering with each other. This would normally be achieved through a locking mechanism visible to all child processes.

  • A handler must be re-entrant, or simply put, be able to be called concurrently by multiple threads at the same time. Data which needs to exist for the life of the request, would need to be stored as stack base data, thread local data, or cached in the request object. Global variables within the actual handler module cannot be used for this purpose.

  • Where global data in a module local to a child process is still used, for example as a cache, access to and modification of the global data must be protected by local thread locking mechanisms.

Assessment area

Each area is scored from 1-5, apart from base language which is scored 1-10.

  • base language (e.g. ability to evaluate code from strings is great for dynamic template creation in a database and is something sorely lacking from Java)
  • setup (always more important than you give it credit for. How many environments do you need to create? Easy to upgrade in packages? Is it supported at your host?)
  • standards (best if you’re trying to hire people if the community has rallied around 1 or 2 frameworks not 10 competing ones)
  • templates (simple powerful templating language is a must)
  • server (nice to have at least the option of a long running server)
  • database drivers (is there a robust fast driver for my choice of database)
  • persistence APIs (this is a big deal)
  • caching (it would be nice if I could keep some stuff like config info in memory, and other stuff like sessions in remote caches, with clustering options)

Java (Score: 31)

Base language Score: 2
Unicode since day 1 makes me add another point here (unicode won’t be in PHP until PHP 6). Flexible in theory: multiple languages run on the JVM (PHP, Python etc) so technically you can dump “Java” the language but still use the JVM and write your web code in PHP or whatever (e.g. see caucho’s PHP implementation). In practice there are a lot of issues with this. In general, Java’s too low level for web work (think rigid “interface->abstract base->multiple implementation” code patterns). It just feels clunky and slow. JSP recompilation takes an age. If you go the servlet way, reloading your webapp takes even longer with a big model. Not to mention massive memory usage.

Setup Score: 2
Installing a JVM and appserver is easy enough but somehow it’s still a hassle.

Standards Score: 3
Options are few. There’s just one JSTL and JSP which is a good thing. Frameworks are mostly crap (very slow to develop in) so will ignore those.

Templates Score: 4
A few choices. Powerful and robust. JSP/JSTL well established. JSP should be nice and dynamic but ironically suffers from not being object oriented (due to the way the classloader works). Easy tag creation. Alternatives are the well established Velocity or FreeMarker, etc.

Server Score: 5
Plenty of very fast, well proven, multi-threaded app servers available.

Database Drivers Score: 5
Solid vendor approved, portable, JDBC. Scrollable result sets etc.

Persistence APIs Score: 5
Hibernate all the way.

Caching Score: 5
Great choice of excellent caches (ehcache etc)

Python (Score: 30)

Base language Score: 8
Nice but not good for embedding due to whitespace significance.

Setup Score: 3
Easy enough to install mod_python. App servers add a layer of hassle.

Standards Score: 2
Always liked the look of Django and not much else. Too many web frameworks, templating systems and app servers, most out of date, all pretty hacky and over-stretched for developer time.

Templates Score: 3
Haven’t found one I like yet. Spyce makes grandiose claims for itself but at first glance this seems unjustified. It seems very lacking in features to me (check out their “core” tag library). Mind you, it’s nicer than PSP – have you seen how you end a block? That’s thanks to whitespace significance (which works ok elsewhere in Python but not in this context).

Server Score: 4
If you go the appserver route what options are there?
medusa looks cool. Webware is quite well documented at least, but as an example, when you run an install script to create an app space it generates a “sessions/” folder on your filesystem (is it targeted at play development, not large production sites?). TwistedMatrix http server has a great name and should perform well (as Twisted does for general network programming) but their own docs admit the new version is not ready for primetime.

Database Drivers Score: 4
C-based drivers.

Persistence APIs Score: 3
DB Api is reasonable but PyPersist and other OO mapping?

Caching Score: 3
Limited cacheing options.

PHP (Score: 31)

Base language Score: 8
Quite ugly but there’s no denying it gets the job done in the web arena. The single namespace is a pain at first glance but is not in practice. PHP5 copied some good bits from Java.

Setup Score: 5
mod_php or CGI mode is very easy to set up. Feels lightweight. Well supported by distros.

Standards Score: 4
Frameworks are mostly crap (feature poor) so will ignore those. If Zend Framework (when finished) becomes standard that will be an awesome boost. Unfortunately I don’t like it much, but I can live with it if it becomes well supported.

Templates Score: 4
PHP and smarty. Both quite good.

Server Score: 1
No decent appserver that I’ve found. You could try Quercus I guess.

Database Drivers Score: 4
C drivers.

Persistence APIs Score: 3
Pear DB stuff and other persistence frameworks are reasonable.

Caching Score: 2
Limited to memcached really.

Conclusion

There isn’t a winner. Just as I suspected. The grass is always greener.

The default is lax security

Time and time again with products intended for server applications, the default settings are for development rather than production. This just has to be wrong thinking; I'd expect it from Microsoft, perhaps.

Apache: why would you want Apache revealing its own version number, and those of its modules, to the world? ServerTokens should be Prod by default, not something you have to opt into.

PHP: the default php.ini admits as much itself:

;;;;;;;;;;;
; WARNING ;
;;;;;;;;;;;
; This is the default settings file for new PHP installations.
; By default, PHP installs itself with a configuration suitable for
; development purposes, and *NOT* for production purposes.
; For several security-oriented considerations that should be taken
; before going online with your site, please consult php.ini-recommended
; and http://php.net/manual/en/security.php.
If your application does not catch the exception thrown from the PDO constructor, the default action taken by the zend engine is to terminate the script and display a back trace. This back trace will likely reveal the full database connection details, including the username and password. It is your responsibility to catch this exception, either explicitly (via a catch statement) or implicitly via set_exception_handler().

Damn Java Gotchas

1. Commons BeanUtils.populate() is a very useful function but it

  • does not set public properties
  • will not call a setter method if its return type is not void

So if your class has a nicely chained setter method like


MyClass setFoo(String s) { this.foo = s; return this; }

this test will fail


map.put( "foo", "bar" );
BeanUtils.populate( myobj, map );
assertNotNull( myobj.getFoo() ); // fail!!

BeanUtils says “mmm, let’s see… I have a key ‘foo’ in my map and a ‘setFoo’ method on the class…mmm…let’s see…not sure what to do, so er, I won’t do anything at all”.
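The root cause is the JavaBeans naming convention rather than BeanUtils itself: by default BeanUtils discovers properties through java.beans.Introspector, which only treats a one-argument set method as a setter when it returns void, and which ignores public fields entirely. The sketch below (class and method names are hypothetical) asks the Introspector directly what it sees for the three styles of "foo".

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

// Hypothetical check of what java.beans.Introspector reports for three styles of "foo".
class SetterCheck {
    // Chained setter: returns this, so it is not a JavaBeans setter.
    public static class Fluent {
        private String foo;
        public String getFoo() { return foo; }
        public Fluent setFoo(String s) { this.foo = s; return this; }
    }

    // Conventional setter: returns void.
    public static class Plain {
        private String foo;
        public String getFoo() { return foo; }
        public void setFoo(String s) { this.foo = s; }
    }

    // Public field only: no accessor methods, so no bean property at all.
    public static class FieldOnly {
        public String foo;
    }

    /** True if the Introspector finds a write method (setter) for the named property. */
    static boolean hasWriteMethod(Class<?> type, String property) {
        try {
            BeanInfo info = Introspector.getBeanInfo(type);
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getName().equals(property)) {
                    return pd.getWriteMethod() != null;
                }
            }
            return false; // no such property
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Fluent:    " + hasWriteMethod(Fluent.class, "foo"));    // false
        System.out.println("Plain:     " + hasWriteMethod(Plain.class, "foo"));     // true
        System.out.println("FieldOnly: " + hasWriteMethod(FieldOnly.class, "foo")); // false
    }
}
```

So the chained setter is only visible as a read-only property, which is why populate() silently skips it.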

Hibernate Performance

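Instead of

SELECT a FROM A a ...

use

SELECT DISTINCT a FROM A a INNER JOIN FETCH a.rels ...

Then iterating over as, and over each a.getRels(), will not hit the DB again (this avoids the "N+1 selects" problem; DISTINCT removes the duplicate a rows the fetch join produces):
for( A a : as ) { for( Rel r : a.getRels() ) { ... } }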

Speed up Java XML validation with DTD caching

To cache DTDs on the filesystem:

  • download them into a specific shared directory /mydtddir and create an OASIS standard catalog.xml file in the directory which refers to each of your saved DTDs.
  • tell your SAXBuilder to use the Apache commons XML resolver.
  • specify where the resolver should look for the catalog.xml file, using either a properties file on the classpath called CatalogManager.properties or the system property xml.catalog.files.

Hey presto, all your java apps which use the commons resolver will then use cached DTDs for massive performance gains.
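The same OASIS catalog idea can be sketched without extra jars. Note this swaps in a different resolver: it uses the JDK's own javax.xml.catalog API (Java 9 and later) and a DOM DocumentBuilder instead of the Apache commons resolver and JDOM's SAXBuilder. The DTD, the REMOTE_DTD URL and the temp-directory layout are made up for the demo; the point is that validation succeeds without ever fetching the remote URL.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Sketch: DTD validation against a locally cached DTD via an OASIS catalog (JDK javax.xml.catalog).
class DtdCatalogDemo {
    // Hypothetical remote DTD URL; never fetched, because the catalog maps it to a local file.
    static final String REMOTE_DTD = "http://example.invalid/dtd/note.dtd";

    /** Writes note.dtd plus a catalog.xml mapping REMOTE_DTD to it into a temp dir; returns the catalog URI. */
    static URI writeCatalog() {
        try {
            Path dir = Files.createTempDirectory("mydtddir");
            Path dtd = dir.resolve("note.dtd");
            Files.write(dtd, "<!ELEMENT note (#PCDATA)>\n".getBytes(StandardCharsets.UTF_8));
            String catalog = "<?xml version=\"1.0\"?>\n"
                    + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
                    + "  <system systemId=\"" + REMOTE_DTD + "\" uri=\"" + dtd.toUri() + "\"/>\n"
                    + "</catalog>\n";
            Path catalogFile = dir.resolve("catalog.xml");
            Files.write(catalogFile, catalog.getBytes(StandardCharsets.UTF_8));
            return catalogFile.toUri();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parses xml with DTD validation; the DOCTYPE is resolved through the catalog, not the network. */
    static boolean isValid(String xml, URI catalogUri) {
        CatalogResolver resolver = CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogUri);
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setEntityResolver(resolver); // CatalogResolver implements org.xml.sax.EntityResolver
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }      // validity errors
                public void fatalError(SAXParseException e) throws SAXException { throw e; } // well-formedness errors
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        URI catalog = writeCatalog();
        String doctype = "<!DOCTYPE note SYSTEM \"" + REMOTE_DTD + "\">";
        System.out.println(isValid(doctype + "<note>hello</note>", catalog));    // true
        System.out.println(isValid(doctype + "<note><oops/></note>", catalog));  // false: <oops> is undeclared
    }
}
```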

The next obvious question is how/where to obtain and store the DTDs on your system. In theory under Debian you can just install a package of the DTDs you need such as for xhtml

apt-get install w3c-dtd-xhtml

and then just point your resolver at /etc/xml/catalog.

Unfortunately (as usual) the OS package is overly complicated. It relies on a large set of files distributed all over the place which delegate to each other. Apart from making resolution slower due to multiple parses and file opens, at the time of writing the files simply seem to be broken and refer to unresolvable OASIS extension DTDs.

e.g. the default catalog for xhtml is /usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml.
This has a doctype which refers to an unresolvable GlobalTransCorp DTD. This fails to resolve:

$ xmlcatalog /etc/xml/catalog
"-//GlobalTransCorp//DTD XML Catalogs V1.0-Based Extension V1.0//EN"
No entry for PUBLIC
-//GlobalTransCorp//DTD XML Catalogs V1.0-Based Extension V1.0//EN

Therefore I usually choose to manage my own DTDs under /etc/dtds.

memcached

Spent a while looking at memcached as its simplicity and cross platform nature is very appealing. It’s something you just make a socket to and dump some data into, and it just keeps it in memory. Problem is that memcached itself doesn’t do much other than that – put stuff in memory and define a protocol.
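To make the "socket plus protocol" point concrete, here is a sketch of the framing for memcached's text protocol: a set is a command line "set <key> <flags> <exptime> <bytes>" followed by the raw data block, each terminated by \r\n, and a get reply is a "VALUE <key> <flags> <bytes>" line, the data block, then "END". The class MemcachedText is hypothetical and deliberately does no networking, pooling or server hashing; that is exactly the client-side work the paragraph above says is left to you.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of memcached text-protocol framing only (no sockets).
class MemcachedText {
    private static final byte[] CRLF = {'\r', '\n'};

    /** Encodes a storage command: "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n". */
    static byte[] set(String key, int flags, int exptimeSeconds, byte[] data) {
        checkKey(key);
        byte[] header = ("set " + key + " " + flags + " " + exptimeSeconds + " " + data.length)
                .getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(header, 0, header.length);
        out.write(CRLF, 0, CRLF.length);
        out.write(data, 0, data.length); // raw bytes; the length in the header tells the server where they end
        out.write(CRLF, 0, CRLF.length);
        return out.toByteArray();
    }

    /** Encodes a single-key retrieval command: "get <key>\r\n". */
    static byte[] get(String key) {
        checkKey(key);
        return ("get " + key + "\r\n").getBytes(StandardCharsets.UTF_8);
    }

    /** Parses the reply to a single-key get: the value bytes, or null on a miss (reply is just "END\r\n"). */
    static byte[] parseGetReply(byte[] reply) {
        int lineEnd = indexOfCrlf(reply);
        String header = new String(reply, 0, lineEnd, StandardCharsets.UTF_8);
        if (header.equals("END")) {
            return null;
        }
        String[] parts = header.split(" "); // VALUE <key> <flags> <bytes> [<cas unique>]
        if (parts.length < 4 || !parts[0].equals("VALUE")) {
            throw new IllegalArgumentException("unexpected reply line: " + header);
        }
        int length = Integer.parseInt(parts[3]);
        byte[] data = new byte[length];
        // Copy by length rather than scanning for CRLF: the value itself may contain "\r\n".
        System.arraycopy(reply, lineEnd + 2, data, 0, length);
        return data;
    }

    // Keys must be non-empty, at most 250 bytes, with no whitespace or control characters.
    private static void checkKey(String key) {
        int len = key.getBytes(StandardCharsets.UTF_8).length;
        if (len == 0 || len > 250) {
            throw new IllegalArgumentException("bad key length: " + key);
        }
        for (char c : key.toCharArray()) {
            if (c <= ' ' || c == 127) {
                throw new IllegalArgumentException("bad character in key: " + key);
            }
        }
    }

    private static int indexOfCrlf(byte[] buf) {
        for (int i = 0; i + 1 < buf.length; i++) {
            if (buf[i] == '\r' && buf[i + 1] == '\n') {
                return i;
            }
        }
        throw new IllegalArgumentException("no CRLF in reply");
    }
}
```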

Some of the work you’d expect of a caching solution is left to the client implementation. Talking of which, I tried out the Perl and the Java clients. The Java client is bigger than you might think, as it contains non-blocking code with quite complicated thread queues. Not sure I’d want that bloat in my application, but sensibly getInstance returns a blocking instance by default. The Java client is not the worst code I’ve ever seen but it’s not exactly great either. There’s quite a lot of repetition, especially the code which sends each call type over the network. No big deal; I or anyone else could easily tidy it up and optimize it.

From previous experience I expect the overhead of all the non-blocking client code is actually not worth it, and that was confirmed by a (very simple) test class I wrote.

The Perl test application I wrote performed better than the Java one in general. I expect this is due to some bloat in the Java client. In particular the Java client performed very poorly for large cache values, whilst the Perl client's timings stayed roughly flat as the value size grew.

For PHP, Perl and so on, memcached could be useful just running on the same box as a semi-persistent store for cross-request data sharing, but you need to keep a very close eye on the client code, and probably write your own wrapper to implement all the other cache features you need in practice.

For Java, there are great caching solutions that run in-process of course, like ehcache, which I have had good results with (although I haven’t benchmarked them much yet), and several Java distributed cache solutions.

Vertical Centering in XHTML

Use XHTML doctype and this technique to center vertically.

Playlist generation JSP

Simple JSP to generate asx, pls and m3u content.
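As a rough idea of what such a page emits, here is a sketch of the text for an extended .m3u ("#EXTM3U", then "#EXTINF:<seconds>,<title>" before each URL) and a .pls (an INI-style [playlist] section with FileN/TitleN/LengthN keys numbered from 1, then NumberOfEntries and Version=2). The Playlists and Entry names are hypothetical, ASX (an XML format) is not covered, and titles are assumed not to contain newlines.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of .m3u and .pls playlist text generation.
class Playlists {
    /** One track or stream; lengthSeconds = -1 conventionally means "unknown" (e.g. a live stream). */
    static final class Entry {
        final String url;
        final String title;
        final int lengthSeconds;

        Entry(String url, String title, int lengthSeconds) {
            this.url = url;
            this.title = title;
            this.lengthSeconds = lengthSeconds;
        }
    }

    /** Extended M3U: header line, then an #EXTINF line before each URL. */
    static String m3u(List<Entry> entries) {
        StringBuilder sb = new StringBuilder("#EXTM3U\n");
        for (Entry e : entries) {
            sb.append("#EXTINF:").append(e.lengthSeconds).append(',').append(e.title).append('\n');
            sb.append(e.url).append('\n');
        }
        return sb.toString();
    }

    /** PLS: numbered FileN/TitleN/LengthN keys, then the entry count and format version. */
    static String pls(List<Entry> entries) {
        StringBuilder sb = new StringBuilder("[playlist]\n");
        for (int i = 0; i < entries.size(); i++) {
            Entry e = entries.get(i);
            int n = i + 1; // PLS entries are numbered from 1
            sb.append("File").append(n).append('=').append(e.url).append('\n');
            sb.append("Title").append(n).append('=').append(e.title).append('\n');
            sb.append("Length").append(n).append('=').append(e.lengthSeconds).append('\n');
        }
        sb.append("NumberOfEntries=").append(entries.size()).append('\n');
        sb.append("Version=2\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Entry> list = Arrays.asList(new Entry("http://example.com/a.mp3", "Song A", 215));
        System.out.print(m3u(list));
        System.out.print(pls(list));
    }
}
```

In a JSP or servlet you would also set a matching content type; audio/x-mpegurl (.m3u) and audio/x-scpls (.pls) are the commonly used ones.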

Video/Audio Info (Standards and Formats)

Format, standards etc for mp4, H264, etc

Windows video compression tools

Windows MP4 video processing software

Postgres Quick Reference

Postgres help sheet

Character Encoding in Forms & Java

Some fearsome reading on form textareas.
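The core problem in one example: a form on a UTF-8 page sends "café" as caf%C3%A9, and if the server decodes it with the wrong charset you get mojibake. This sketch uses java.net.URLDecoder; FormDecoding is a hypothetical name. In a servlet the equivalent fix is calling request.setCharacterEncoding("UTF-8") before reading any parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo: the same submitted form bytes decoded with two charsets.
class FormDecoding {
    /** Decodes an application/x-www-form-urlencoded value using the named charset. */
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String submitted = "caf%C3%A9"; // "caf\u00e9" as sent by a form on a UTF-8 page
        System.out.println(decode(submitted, "UTF-8"));      // correct: the e-acute survives
        System.out.println(decode(submitted, "ISO-8859-1")); // mojibake: two characters, "\u00c3\u00a9"
    }
}
```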

Escaping & - when to use &amp;?

Demo of when plain ampersands in URLs go wrong
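The rule of thumb: any & inside an HTML attribute such as href should be written as &amp;, because a bare & can be misread as the start of a character reference and is a well-formedness error in XHTML. A minimal escaping helper might look like this; HtmlAttr is a hypothetical name, and it assumes the input is not already escaped (otherwise you would get &amp;amp;).

```java
// Hypothetical helper: escape a raw URL for use inside a double-quoted HTML attribute.
class HtmlAttr {
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: <a href="page.php?id=1&amp;sort=asc">
        System.out.println("<a href=\"" + escape("page.php?id=1&sort=asc") + "\">");
    }
}
```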

Don't trust PHP? Use wget to mirror to static HTML

Use a mirror tool to create HTML files from a dynamic system.

Analog script from mailing list

A script found on pipermail

Tomcat JSP Strangeness

More web.xml weirdness.

Ongoing Drupal Issues

I record Drupal experiences in this post which changes over time.

Choosing a free CMS

Looking at some open source CMS software. Textpattern, Drupal, Bricolage, Mambo, Xoops, ezPublish…

hibernate, cglib etc

Found that the final modifier stopped CGLIB proxies working.
Also, the critical difference between s.load( Class, id ) and s.load( obj, id ) is that only the former can return a lazy proxy without hitting the DB; the latter loads the persistent state into the instance you pass in, so it cannot defer the load.
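Why final breaks things: CGLIB-style proxies are generated subclasses, so they cannot extend a final class, and a final method cannot be overridden to trigger the lazy load; it runs directly against the empty proxy instance. A hypothetical reflective check like the one below can flag such classes early; ProxyCheck, Article and Locked are made-up names, and it only looks at public methods.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical check: what a subclass-based proxy (such as CGLIB's) could not intercept.
class ProxyCheck {
    static List<String> problems(Class<?> type) {
        List<String> found = new ArrayList<>();
        if (Modifier.isFinal(type.getModifiers())) {
            found.add("class " + type.getSimpleName() + " is final");
        }
        for (Method m : type.getMethods()) {
            if (m.getDeclaringClass() == Object.class) {
                continue; // getClass(), wait(), notify() are final by design; ignore them
            }
            if (Modifier.isFinal(m.getModifiers())) {
                found.add("method " + m.getName() + " is final");
            }
        }
        return found;
    }

    // Example entity: getTitle() could not be intercepted, so it would bypass lazy loading.
    public static class Article {
        public Long getId() { return 1L; }
        public final String getTitle() { return "t"; }
    }

    // Example entity that cannot be subclassed at all.
    public static final class Locked { }

    public static void main(String[] args) {
        System.out.println(problems(Article.class)); // [method getTitle is final]
        System.out.println(problems(Locked.class));  // [class Locked is final]
    }
}
```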
