Tuesday, June 24, 2008 10:53:41 AM (GMT Standard Time, UTC+00:00)

Last week on a rather long train journey to Morecambe to see British Sea Power, I decided to write the next layer underneath the top level querying system, to cache 'collections' of individual common queries, which the results of could then also be queried in order to speed things up.

This has been planned since the original xml based querying system, but I had decided that writing it on top of that system would be like slapping a band aid on a broken neck. When something is that unworkable, trying to speed things up just by adding additional caching levels is a lost cause.

Happily, also on the train I wrote yet another version of the core querying system, which used a ridiculous level of nested inner joins to whittle down arbitrary data into a workable form. This is entirely counter intuitive, as you're told academically that joins are expensive, and nesting queries is expensive - and that by combining them you must be in for a world of pain. Imagine my surprise on running the query against SQL Server 2005's rather excellent query visualiser, that the resulting path was not only incredibly simple, but also incredibly fast due to the rapidly diminishing size of the data.

Because of the analytics code I already had written, it was very easy for the initial inner statement to cut down the size of the data by a huge amount in one simple query. Because quite a lot of the core queries tend to revolve around single keys, and single values (for example, "get me all the keys/values where KEY=WoWActionName and VALUE=MINING). Caching the results of this query, and even subsequent queries can result in replacing that part of the overall nested query with ("get me all the keys/values where the query = <>").

Running some trials, with the twenty snippets I have written for testing purposes, resulted in just 10 actual unique cached queries. It is important to note that in the original system I was filtering by date/time in the inner-most query - resulting in a unique query for every single page, and in this version I moved the date/time filter to the outermost query. This meant the results of this query could be re-used across all pages generated for that user.

This cut down the time taken to generate a typical 'all-time' page from 50-60 seconds, to 3-5 seconds. A rather massive increase of performance. (The fifteen seconds listed below wasn't when doing it concurrently)

These cached queries could then be refreshed or culled periodically using the background windows service I have written for Scrobbles, and most importantly rather than these resource intensive queries happening on different threads and really killing the server from over-zealous usage of RAM and therefore over-zealous paging and hard disk thashing, I can keep them on a single pipeline until I have a server that can deal with parallelising the whole shebang.

I foresee this not being enough when I have a huge amount of data in there for each user, the amount of data generated for an eight hour session of World of Warcraft is immense, and I only probably have sixty hours of data in there myself. I'll be able to add date and time to those cached queries later, and combine the daily results across a month, and the monthly results across a year and the yearly results across 'all time'. So I'm not worried about that.

I ran some tests on my laptop on the train, and was able to generate 2000 pages (all the pages possible for my single user over a year) in under ten minutes. This was using a threadpool to generate pages concurrently, as that is how it will happen once on the internet. At any one point during this, thirty pages were being generated simultaneously and taking seconds to complete. SQL Server's ram usage didn't even get above 300mb so I'm fairly confident this will work for my first few users. (Going forward!).

I appreciate the above is probably quite hard to follow, so here is a picture of the whole process - showing how each process has been decoupled to allow parallelisation as required in the future.

In other news, life is really busy at the moment and Scrobbles is having a week or so of hiatus. This weekend saw me at the Natural History Museum watching British Sea Power (again, yes I know), culminating in me lying on the stage drunkely trying to play the cornet which I had 'borrowed' from the band, tonight sees me at Victoria Park for Radiohead, as does tomorrow, Friday sees me at Koko seeing a band I saw a few weeks ago called "Die Die Die" (Awful name), but they were quite cool, and Saturday sees me up in Manchester seeing My Bloody Valentine, the legends that they are.

Yes, I have purchased and am using ear plugs...

Tuesday, June 24, 2008 11:42:43 AM (GMT Standard Time, UTC+00:00)
Loving your blogs about Scrobbles development. Question: You moved onto using Stored Procedures yet?
Wednesday, June 25, 2008 12:44:04 AM (GMT Standard Time, UTC+00:00)
Nah, I haven't seen a real need for them.

I am however setting proper constraints in my DB you'll be pleased to hear. No crap data in my db even by accident!
Thursday, June 26, 2008 10:07:33 PM (GMT Standard Time, UTC+00:00)
You should see a performance boost using stored procedures. SQL Server will store the execution plan so it doesn't need to be created each time, which using dynamic SQL like you are it is needing to. Also you arn't needing to pass the SQL from application to SQL Server each time.

Its also far easier to maintain and write complex stored procedures than it is with SQL embedded within your VB/C#. Use the database to its advantage. Securer against SQL injection also, as you can't accidentally just concatenate SQL with user entered values, they have to be passed as parameters.

Tho I've never looked into it but LINQ to SQL basically does the opposite of what I have said above (with the exception of SQL injection)...Nevermind you get to use some cool funky syntax to do that, and its far easier than writing SQL to do the same thing, so it's ok ;)
All comments require the approval of the site owner before being displayed.
Name
E-mail
Home page

Comment (HTML not allowed)  

Enter the code shown (prevents robots):