Saturday, June 14, 2008 1:42:23 PM (GMT Standard Time, UTC+00:00)

My checklist has taken a hit this week, with a busy social calendar (organised by me a couple of months ago in the past apparently).

Not only that but I hit some technical issues after I migrated the server-side install of the Scrobbles system across to a new version, along with the database (a process that took over six hours, there being now over a year's data in there).  A migration that was supposed to make the whole system a lot more scaleable and maneagable in the future. De-coupling page requests from page generation and making a whole load of the resource intensive operations parallelisable.

The new system also removed the data-loss incurred by using discrete blocks of time, and moved the system across to an entirely continuous method of time-based data storage. And most importantly was meant to make it a lot easier to form complicated queries by storing a lot of this data in the original XML format, for querying using the XQuery support built into SQL Server 2005. I thought that there was no way I could do anything better than this with my limited knowledge, and that SQL Server's magic black box would just index this data and keep things fast and nifty for me.

In my trials, this seemed valid, and page generation was no faster or slower than the previous iterations of the querying system. On migrating across a larger data set, SQL Server started eating ridiculous amounts of Ram and CPU, before finally giving up in a big heap. Back to the drawing board again, for probably the fourth time.

It should go without saying that I still think the XQuery support in SQL Server 2005 is fantastic, and I think that the solution was simply not compatible with my needs. It was educational to play with however, and I can think of a few projects which would benefit from having the masses of XML on the hard drive transferred into a database which can then do the hard work of actually querying it for information.

I downloaded a full copy of the working database for testing and set about trying to index this massive amount of data myself. Many conversations were had with different people with varying levels of expertise - my colleagues (Karsten and Pat) were helpful over coffee and a pad of paper, and we came up with a new design which should hopefully have been more efficient. I also had quite a few conversations with Paul Evans who let me know of the oh so many potential pitfalls before I even began work on the new prototype system. Sadly, these pitfalls seemed to pop up all too soon and my query graphs were soon looking just as complicated, if not more complicated than the original XML driven attempt.

I think I've finally come to a proper solution now, which involves creating tables on the fly to fit the needs of the system as it evolves. This is a scary solution for me, as it gives away some of the intelligence of the database design to an indirect process rather than directly from me. It also increases the complexity of the code by quite a bit - and I was hoping by just having the one-size fits all database solution to avoid that. 

Sadly in the world where speed and efficiency counts more than anything else, and as I was originally warned at the start of the week, this seems to be quite a standard compromise in the world of database design.