Monday, June 16, 2008 1:45:17 AM (GMT Standard Time, UTC+00:00)
There we go, all that effort has paid off.
Time taken to generate a single 'one day' page of even the more complicated Scrobbles info is down to under a second, and down to about fifteen seconds for an 'all time' page view. (I cache these with quite a high longevity - think of the single default view that last.fm gives you and how often that updates..). On the old system with the current amount of data, this was going into the 'several minutes' for even a single day page. (Yeah, XQuery not so good...)
Getting the right balance of normalisation was key to this success, and I have written a lot more code than I would have originally liked to, most of it not being used in the final solution. Some of it quite experimental and rather cool though - and perhaps the key to future attempts in optimising Scrobbles. One of my solutions worked out 'groups' of keys which could isolate the data for most requests down over 90%, to generate permanent tables for crunching data into - and this may end up being a good way to go if my current solution doesn't last the distance. With the data for the entire year, this algorithm only generated ten tables - and I was able to index the entire year's data through this process in under five minutes so it was quite scaleable.
That was a bit hard to integrate with the query system as it stood however, so I'm bypassing it and going straight to the core data store for now.
Should be grand though, the background service can happily be constantly ticking over the 'all time' pages on a low priority (Well of course I have a priority queue based system!), and generating the one-day views as they are requested - and perhaps some sort of balance over the monthly views and yearly views based on what month or year it currently is.
At say, three pages per user, and 5 views to be updated constantly, that's fifteen views taking about 120 seconds in processing time altogether. I could still update every single users page more often than last.fm does (for its users' music pages) and support 500 users on the rather underpowered server this system currently sits on. Not that I would of course, because there is little point in updating pages unless people look at them sometimes.
Lots more work to do yet on making things even faster - but with a firm database design to now stand on, I feel a bit more confident about pushing ahead with the real development of the system.