Virtual Surreality

It's too real to be true


Object/Relational Mappings (ORMs) are in the wrong place in the architecture.

An application should have minimal impedance mismatch with the persistence of its own data. External or ancillary systems should bear the cost of mapping between paradigms. If you are writing an application with the concepts modelled in an object-oriented fashion, then the state of those objects should be stored in an object-oriented fashion. Mapping from the object-oriented structure to a relational structure should be done where and when it’s needed. If you want to access application data in a relational way for reports, do the mapping for the report.

There has been a low uptake of OODBMSs. Why is outside the scope of this post. However, there are other, older (albeit feeling newer) data storage techniques available that more readily persist an object graph than a relational database. A flurry of open-source implementations has come along. You may have heard of MongoDB, neo4j, Redis, Hadoop, CouchDB, Memcached, and Raven DB (a native .NET document database). These are all under active development, and almost all have been proven on many large commercial projects (Raven was being developed by Oren while he was at speakerconf 2010!).

For an introductory look at evaluating the landscape, check out Sarah Mei’s RailsConf presentation slides.

One of the projects I’ve worked on at Hashrocket involved a large MongoDB database. The reasons for the choice for MongoDB were clear:

  • the way the users handled most of the data more naturally fitted an embedded document model;
  • the scale and speed required were far greater than relational databases could achieve; and
  • the need for ACID transactions was almost nil.

Additionally, the data was mostly read, rather than written (although MongoDB writes blazingly fast too). I’m hoping to address the equivocations in those reasons in a subsequent post.

MongoDB uses a binary form of JSON. As JSON is a serialization format for object state, it’s a good candidate for persisting an object graph. JavaScript, remember, favours composition over class-based inheritance, so there is also a tendency to “embed” document instances within other document instances. So, for MongoDB, most of the object graph is stored as a tree structure. Cyclic graphs and associated objects (those that aren’t aggregated with composition) are handled with references to identifiers.
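As a rough illustration of that last point, here is a sketch using plain Ruby hashes in place of real MongoDB documents. The field names, identifiers, and the `author_name` helper are all made up for the example; no driver or mapper API is involved.

```ruby
require 'json'

# Hypothetical documents; every name here is invented for illustration.
author = { '_id' => 'author-1', 'name' => 'Ada' }

post = {
  '_id'       => 'post-1',
  'title'     => 'Hello',
  # Composition: the comments live inside the post, forming a tree.
  'comments'  => [
    { 'body' => 'First!',    'votes' => 2 },
    { 'body' => 'Nice post', 'votes' => 5 }
  ],
  # Association: the author is a separate document, referenced by identifier.
  'author_id' => author['_id']
}

# A stand-in for looking a document up by _id in another collection.
AUTHORS = { author['_id'] => author }

# Resolve the reference by following the stored identifier.
def author_name(post, authors)
  authors.fetch(post['author_id'])['name']
end

puts JSON.pretty_generate(post)
puts author_name(post, AUTHORS)
```

The embedded comments travel with the post in a single read, while the author needs a second lookup. That is the trade-off between aggregating by composition and referencing by identifier.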

As a consequence of using Rails and MongoDB, the team looked at various ways of manipulating the JSON structures from Ruby: directly through the driver, MongoMapper, MongoDoc, and Mongoid. The rationales behind these libraries and their different and interesting approaches are the subject of additional blog posts (hopefully jointly with Nunemaker, Hill, and Jordan respectively). The needs and makeup of the project we were on fell into the lap of Mongoid to solve.

DBAs want to interact directly with the database. Because of this “new-fangled” approach to data storage that has been around since before SQL, some are unwilling or unable to learn the query language to interact directly with the database. So, to make their world a less scary place and let them retire without the stress of intellectual endeavour, I built a simple, declarative language where the mapping from the tree structure of MongoDB to the tuple space in an RDBMS can be scripted.
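To make the tree-to-tuple mapping concrete, here is a rough sketch in plain Ruby. It is not the DSL itself; the document shape, the table names, and the `flatten_post` helper are invented purely for illustration.

```ruby
# Hypothetical source document: the comments are embedded in the post.
post = {
  '_id'      => 'post-1',
  'title'    => 'Hello',
  'comments' => [
    { 'body' => 'First!',    'votes' => 2 },
    { 'body' => 'Nice post', 'votes' => 5 }
  ]
}

# Flatten one document tree into rows for two hypothetical tables.
# The parent row keeps the document's _id as its key; each embedded
# comment becomes its own row carrying that _id as a foreign key.
def flatten_post(doc)
  post_row = { id: doc['_id'], title: doc['title'] }
  comment_rows = doc.fetch('comments', []).each_with_index.map do |comment, i|
    { post_id: doc['_id'], position: i, body: comment['body'], votes: comment['votes'] }
  end
  { posts: [post_row], comments: comment_rows }
end

rows = flatten_post(post)
rows.each do |table, tuples|
  tuples.each { |tuple| puts "#{table}: #{tuple.values.inspect}" }
end
```

Each level of nesting in the document turns into another table on the relational side, with the parent's identifier copied down so the rows can be joined back together.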

Durran Jordan (author of Mongoid) and I paired on “squealer” (and on emptying bottles) in our evenings while he was stationed in the Chicago office of Hashrocket. The name came from “SQL” (which some of us still pronounce “squeal” rather than “sequel”) and the facetious notion that it turned the data into a pig.

We decided to focus on MySQL as the target and MongoDB as the source, as these are the technologies we were using. Upcoming targets will include PostgreSQL, SQL Server, and Oracle. Upcoming sources will include Redis and CouchDB.

UPDATE: squealer 2.2 now supports PostgreSQL!

Matt Yoho provided multiple insights into implementing the Ruby DSL, and Bernerd Schaefer helped me reduce the DSL syntax and provide sensible defaults (as well as the righteous progress bar). Bernerd then struck on the idea of reflecting on the Mongoid declarations in the Rails domain model classes to generate an initial squealer script and SQL DDL to build the target SQL database. He then set about writing it overnight and we tweaked it over lunch. We called this tool “skewer”, because that’s what you do to something to make it squeal ;-) . It’s part of the squealer RubyGem.

Next time: how to use skewer and squealer.

Recently, Les Hatton wrote a compelling article entitled The Chimera of Software Quality. (Les Hatton, “The Chimera of Software Quality,” Computer, vol. 40, no. 8, pp. 104, 102-103, Aug., 2007)

A section of the article headed The cost of poor quality points out the economic impact of poor software quality, and the almost belligerently ignorant and/or apathetic approach to it by people and businesses in the technological nations.

He summarises this in what he postulates as Hatton’s Law:

The technological societies will collectively trash around [US]$250 per person per year on systems which will never see the light of day or, if they do, do not come close to what their users wanted, assuming they were asked in the first place. This they will ignore.

(emphasis mine.)

If you aren’t flabbergasted or at least peeved by that figure (except because you’ve already come to the sad realisation), stop reading now and go and defragment your hard disk, ignoramus.

As we say Downunder, “Goodonya, Les!”. Although he didn’t suggest where that money should go, he did indicate that it’s not just about the money, it’s about the engineering legacy we’re leaving. I have no idea if he shares my views, but other engineering disciplines have helped make the world better by avoiding the cost of poor quality. Pastoral tools, houses, clothes, toilets, aqueducts, fishing nets, boats, glass, the list goes on (and will probably not include the iPhone, people!).

Of course in software there are tactical actions taken, especially when shareholders start raising concerns over short-term cost-benefits of IT (or, more accurately, when executive bonuses are impacted by it by more than a few percentage points). But this is not enough.

Sometimes, I’m accused of being aggressive in the indignation I display when I see woeful technological systems. I suppose that’s because something inside of me has always been thinking that the money spent on building that pile of steaming crap could have been used for much more constructive purposes. In that sense, I’m not nearly aggressive enough. Perhaps I’ll stop calling it “technical debt” and start calling it “karmic debt”.

Stop writing shitty software and you could do something even better than increase shareholder value or boost your annual bonus. You could help take a stand against poverty – in your own country and around the world.

There are many corporations that divert some resources to social responsibility. Think how much more they could do if they did some of the simplest, mind-numbingly basic things to avoid spending massive sums on poor software. Those things aren’t just about throwing money at a problem, but the money saved, or the greater freedom from cost constraints, allows corporations and their employees to do so much more.

Even corporations in industries you’d normally least expect to understand are being more socially responsible in a way that doesn’t raise the eyebrows of the cynics like me. I’ve personally witnessed the activities undertaken by employees at Westpac, a large Australian bank. I also know that avoiding the cost of poor software quality would allow them to do more. ThoughtWorks is not immune either. We use a couple of systems internally that are so fantastically crappy and impact both our effectiveness and efficiency that it beggars belief (I must point out this is relative to a benchmark one would expect of a company with our ideals, and would be otherwise more than reasonable almost anywhere else — we have the luxury of being very picky when critiquing internally). One of my clients laughed at me the other day when he found out what email client we use, because he made a lot of money (and stomach ulcers) supporting it in years past and he knew the impact that its poor quality has on its users (not that most of the alternatives are a benchmark for excellence).

ThoughtWorks is on an ambitious mission to change the nature of IT. In doing so, we hope to contribute to a better world. Avoiding the cost of poor software quality is one of the things we’re doing to help.

I still hate the handling of spaces – especially when you’re throwing in mixed path delimiters using nice GNU tools on DOS/NTFS.

If you ever want to get rid of SVN or CVS folders (or whatever) in a large source tree after someone has zipped it and sent it to you straight from their workspace, and Explorer’s Search window barfs with that many matching folders all over the place, then try this:

  1. Get unxutils
  2. Use the zsh included
  3. Realise zsh has bipolar tendencies when dealing with “DOS backslash to delimit directories in paths” and “UNIX slash to delimit directories in paths” and “UNIX backslash to quote special characters”
  4. Put unxutils’ usr/local/wbin at the front of your PATH (to get the GNU find, not the DOS one)
  5. Unfortunately -exec ls {} \; doesn’t do the trick because ls can’t handle the spaces and/or backslashes and -exec ls "{}" \; doesn’t have any effect
  6. Use the following neat trick (after 30 mins of mucking around with the bipolar tendencies)

  7. find . -type d -name 'CVS' -print | tr '\\' '/' | while read -r filename
    do
    rm -r "$filename"
    done

  8. The little translate sorts out GNU find -print returning paths with backslashes and the while loop allows you to do lots of things, including removing the directory :-)

It’s slower than -exec or xargs but it works.
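If Ruby happens to be on the machine, the same clean-up can be sketched without any shell quoting at all. This is only an illustration using Ruby's standard `find` and `fileutils` libraries; the `sweep_dirs` helper and its `delete:` flag are names I've made up, and it defaults to a dry run that only lists what it would remove.

```ruby
require 'find'
require 'fileutils'

# Walk root and collect every directory called name. Each match is deleted
# only when delete is true, so the default run is a harmless listing.
def sweep_dirs(root, name, delete: false)
  matches = []
  Find.find(root) do |path|
    next unless File.directory?(path) && File.basename(path) == name
    matches << path
    FileUtils.rm_rf(path) if delete
    Find.prune # do not descend into a matched (possibly deleted) directory
  end
  matches
end

# Dry run: list the CVS directories under the current directory.
puts sweep_dirs('.', 'CVS')
```

Once the listing looks right, calling `sweep_dirs('.', 'CVS', delete: true)` does the actual removal. Paths are passed around as whole strings, so spaces never get a chance to split them.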

It’s 2007 – you’d reckon everyone would be able to handle spaces in directory names on Windows by now.

You’d reckon the command interpreter on Windows would be able to parse this stuff too, so that sensible command-line arguments reach the program being executed:


C:\Program Files\Java\jdk1.5.0no_not_again\bin\java -cp C:\Program Files\My App\classes you.goddabe.kidding

And even its own built-ins:


dir %JAVA_HOME%

Perhaps the dweeb who decided to call it progra~1 should have decided on ProgramFiles instead. Who wants to clean up batch files with double-quotes around every possible environment variable expansion point?

Java™ isn’t really the success we think it is, you know.

Java programs written the way Gosling et al originally intended would clearly be using the object-oriented paradigm (sure, they’d be applets in a browser, but that’s another story).

What happens when the COBOL crowd (of which I was one) gets their working-storage sections and procedure divisions pulled from under their feet and replaced with interfaces and immutable objects?

What we have, ladies and gentlemen, is a fantastic volume of COBOL programs re-written using Java syntax but with few of those pesky OO semantics.

Look out for RuBOL – the next big thing, where you can MOVE SPACES TO boolean just like in the good ol’ days.