Wednesday, August 13, 2008 4:43:43 PM (GMT Standard Time, UTC+00:00)
After getting a few silly, random things exorcised from my system, I sat down last night and set to work on the Scrobbles client and libraries, as these have been neglected while I have been working on the core server-side stuff.
Last time I left them, I was dealing with some problems to do with the situation where the user upgrades the Scrobbles client, and that client uses a newer database schema for the local cache, or the web services change so data can no longer be submitted the old way. This is quite a rare occurrence, but it just so happens that in my latest overhaul a couple of months ago I completely changed the web services and now have 500 MB of un-uploaded data sat on my hard drive needing to be migrated one way or another. And if I'm going to do it properly, I may as well write a system that can cope if I have to change things again in the future rather than just doing a one-time migration on my computer alone.
I had a few options to choose from, that I could think of.
- When a new version of Scrobbles is installed, do an in-place migration from the old cache to the new cache
- Keep the old web services intact and add newer ones separately, with migration happening server-side per submission
- Write an adapter for each new schema, mapping old data into new data before passing through any new code
Each of these had its own pros and cons, chiefly to do with the resources that each method would require from either the client computer or the server, but also to do with the maintainability and reliability of each method.
- Doing an in-place migration would require that it be capable of migration from any previous version to the modern version, and potentially have to migrate across several hundred thousand rows - this is hardly a background operation and would be prone to problems if migration was cancelled by the user.
- Not breaking older clients wouldn't give users an incentive to upgrade, and the server would have to do quite a bit of work to translate older requests into newer ones. Having resolved to make these web service calls as thin as possible, this would go against that.
- Adapting the data client-side moves the burden of translation from the server to the client, and while translation from any previous version would still be required, not being done in bulk would mean this could be a transparent process.
All the above would require that the client be capable of dealing with multiple cache files being present, and be able to find out the version of each cache file. Initially this was going to be achieved through naming the cache file by its version, but I've never been one for naming conventions, having previously been completely disgusted by the heavy reliance on them in Lionhead's The Movies. I instead added a Metadata table to the cache database and set a version in that. This means the version can be checked with a simple query on opening the cache and the relevant actions chosen.
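A minimal sketch of that version check, written in Python with SQLite rather than whatever the real client uses. The table layout, the `SchemaVersion` key, the version numbers and the action names are all assumptions made for illustration; the post only says that a Metadata table holds a version.

```python
import sqlite3

CURRENT_SCHEMA_VERSION = 3  # hypothetical: the version this client writes


def create_cache(conn: sqlite3.Connection, version: int) -> None:
    """Create a cache database stamped with its schema version."""
    conn.execute("CREATE TABLE Metadata (name TEXT PRIMARY KEY, value TEXT NOT NULL)")
    conn.execute(
        "INSERT INTO Metadata (name, value) VALUES ('SchemaVersion', ?)",
        (str(version),),
    )
    conn.commit()


def read_cache_version(conn: sqlite3.Connection) -> int:
    """Return the stored version; a cache with no Metadata table counts as 0."""
    try:
        row = conn.execute(
            "SELECT value FROM Metadata WHERE name = 'SchemaVersion'"
        ).fetchone()
    except sqlite3.OperationalError:
        # The table does not exist, so this cache predates versioning.
        return 0
    return int(row[0]) if row else 0


def choose_action(version: int) -> str:
    """Pick what to do with a cache file once its version is known."""
    if version == CURRENT_SCHEMA_VERSION:
        return "use-directly"
    if version < CURRENT_SCHEMA_VERSION:
        return "read-through-adapter"
    return "refuse-newer-cache"
```

Keeping the version inside the file means a renamed or copied cache still identifies itself, which is the advantage over a file-naming convention.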
I decided in the end to go with the final option, of creating adapters around existing code, mapping various methods and classes through a common interface. It involves a bit of work anytime I have to change the data structure between storage and uploading, but it means not having to modify existing code that already works when upgrading. It also means that each client can deal with software that uses older versions of the client library to create old cache databases.
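The adapter idea can be sketched roughly as follows, in Python for brevity. The `Submission` shape, the two old row formats and the version numbers are invented for the example; only the pattern (one adapter per old schema, all feeding a common interface) comes from the post.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator


@dataclass
class Submission:
    """Hypothetical current record shape that the upload code expects."""
    key: str
    value: float
    start: str
    end: str


def adapt_v1(row: dict) -> Submission:
    # Assumed v1 layout: a single timestamp, so start and end are the same.
    return Submission(row["name"], float(row["count"]), row["time"], row["time"])


def adapt_v2(row: dict) -> Submission:
    # Assumed v2 layout: already close to the current shape.
    return Submission(row["key"], float(row["value"]), row["start"], row["end"])


# One adapter per cache schema version; existing upload code never changes.
ADAPTERS: Dict[int, Callable[[dict], Submission]] = {1: adapt_v1, 2: adapt_v2}


def read_submissions(version: int, rows: Iterable[dict]) -> Iterator[Submission]:
    """Translate rows lazily, one at a time, so no bulk migration is needed."""
    adapt = ADAPTERS.get(version)
    if adapt is None:
        raise ValueError(f"no adapter for cache schema version {version}")
    return (adapt(row) for row in rows)
```

Supporting a new schema then means adding one function and one dictionary entry, which matches the "bit of work anytime I change the data structure" trade-off described above.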
It doesn't strike me as the best solution because it doesn't feel as elegant as I generally like things to be, but it shall suffice as I don't expect to be changing things too often anyway!!
In other news, I have resigned from my post at the University of Reading, and have taken up employment elsewhere. This saddens me slightly, but the new company does look like it's going to provide some interesting times. Because it's a real job with a scary looking contract, I'll refrain from mentioning who they are until I know what their blogging policy is. I don't want to get in trouble by suddenly becoming googleable to those concerned.
Anyway, onwards to a great deal more money, and to a more structured day - it should be interesting (at least, until Scrobbles makes me a millionaire...)
Wednesday, July 02, 2008 12:00:57 AM (GMT Standard Time, UTC+00:00)
Spent most of the evening designing a flag for the upcoming Kendal's Calling and Tan Hill festivals, with my partner in crime Jo, who last worked with me to create the artwork for a present when visiting a band we like called The Witch and the Robot. The flag was a lot harder to come up with an idea for, having a canvas which is so many square feet in diameter. We settled on something we liked in the end though, so we'll have fun sewing and painting that later this week. Stand by for photos!
Then the rest of the evening wasted in Diablo II with Owen, but that's all fun and nostalgic so why worry about that eh?
Because of the above, I've barely gotten any Scrobbles done tonight, but I did just sit down and polish off the embedded data from external websites, a very simple fix with a fresh mind and it was done.
So, this is the product of that.
This is a snippet, with nothing in it but an application calling into an external service.
[Snippet XML: linked from the original post]
Walking through that from top to bottom:
- We have some inputs. These are entered by the user on adding the snippet to their profile. This allows the user to add the snippet multiple times and configure it differently each time.
- We then have the actual application itself, inside a Data section, which simply calls a third party web page and passes it the Character and the Realm originally entered by the user.
- In the Layout section, we simply say "I want this Application's data here". This way, several applications can be utilised within the same snippet, along with all the ordinary stuff you might find in a snippet.
Behind the aspx page being called, the following code can be found:
[Page code: linked from the original post]
All I'm doing here is:
- Ensuring the page returns plain old XML
- Creating a context from the SDK which takes care of authentication for us
- Pulling the variables out and pushing them into a query
- Outputting the results of that query along with a bunch of text as HTML. (Well, a limited subset of HTML, of course.)
This results in the following being displayed in the user's page. It's a really trivial example so the amount of overhead required for two lines of text doesn't seem worth it, but I guess that's always the way with trivial examples.

With this, the possibilities for representing data are endless. I can think of hundreds of ways I'd like to use this system, so hopefully so will my users. If I get any, that is.
It's kinda cool to get this working, as this was one of my original visions when I started last year.
Tomorrow I'll be going all out on Scrobbles again, I think I'm going to work through the documentation and get that completely sorted out, with perhaps a few more API tweaks to allow for direct access to pre-generated queries from external websites.
Friday, June 27, 2008 1:21:08 AM (GMT Standard Time, UTC+00:00)
Whew.
Spent this evening firstly trying to get PHP to play nicely with my nice Windows 2003 Server, and then installing PHPBB on top of that. Once that was done, I wanted to tie together PHPBB with my current authentication system. And that is where the fun really started.
There isn't a lot out there on writing authentication modules for PHPBB, most people seem to write plug-ins for other web software to authenticate against PHPBB rather than the other way around. I decided therefore that the best way would be to take the LDAP plug-in provided with PHPBB and rip the relevant bits out, replacing them with a SOAP web service call to my existing authentication system.
That's right, I'm authenticating across a web service instead of going directly into the database. I couldn't be faffed playing about with the hashing method I use in the ASP.NET application for salting the passwords and storing them in the database, so I decided to use my existing .NET code to do the legwork for me.
It wasn't actually that hard in the end, although I came across a few problems that I should probably document in case I ever have to try this again.
- Creating a SoapClient around my ASP.NET WSDL endpoint - remember to add ?wsdl to the end of the url for the web service, so that the SoapClient actually gets the wsdl instead of the html placeholder...
- Calling into the Webservice method.
- $authToken = $Client->Login($username, $password); did not work
- $authToken = $Client->Login( array( 'username' => $username, 'password' => $password, ) ); did work. I don't know why this is the case, as the docs didn't demonstrate this way of calling the method.
- My web service method returns a 'string', but in PHP this is represented as a stdClass object, meant to deal with potentially complex return types from the web service. The actual string can be found in $authToken->LoginResult - go figure...
- In PHP, the md5 method returns a lower-case hexadecimal string. In .NET, most examples tend to use the format string "X2", which creates an upper-case hexadecimal string. Wrapping up the password hash with strtoupper before passing it to .NET solved this.
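The hex-case mismatch in the last point is easy to reproduce. The sketch below uses Python's `hashlib` purely as an illustration: like PHP's `md5()`, `hexdigest()` returns lower-case hex, so the digest has to be upper-cased to match a .NET hash formatted with "X2". The function name is made up for the example.

```python
import hashlib


def md5_hex_upper(password: str) -> str:
    """Return the MD5 digest as upper-case hex, matching .NET's "X2" format."""
    # hexdigest() is lower-case, the same as PHP's md5().
    lower = hashlib.md5(password.encode("utf-8")).hexdigest()
    # Equivalent of wrapping the PHP hash in strtoupper().
    return lower.upper()
```

The alternative would be to compare the two hashes case-insensitively on the receiving side, but normalising before sending keeps the web service untouched.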
The actual process of writing the authentication plug-in couldn't be simpler using the LDAP plug-in as a base. Simply take the username and password, and attempt to authenticate against the web service. If this fails, then try going through the database directly. If it succeeds and the user doesn't exist in PHPBB, then tell PHPBB to create the user. Store the password in a PHPBB hash and retrieve the user's details from the web service. If it exists, then just make sure the password is up to date and carry on.
This way, if the web service goes down due to Scrobbles being updated or whatever, my users can still log in and complain on the forums. Happy days.
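The login flow described above might look something like this, sketched in Python rather than as a real PHPBB plug-in. Everything here is hypothetical: the `Forum` class stands in for PHPBB's user table, `remote_login` stands in for the SOAP call, and "fails" is read as "the web service could not be reached", with rejected credentials treated as a failed login.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


class ServiceUnavailable(Exception):
    """Raised by the (hypothetical) web service client when it is unreachable."""


@dataclass
class Forum:
    """Stand-in for the forum's own user table: username -> stored password."""
    users: Dict[str, str] = field(default_factory=dict)

    def check_local(self, username: str, password: str) -> bool:
        return self.users.get(username) == password


def login(
    forum: Forum,
    remote_login: Callable[[str, str], Optional[str]],
    username: str,
    password: str,
) -> bool:
    try:
        token = remote_login(username, password)
    except ServiceUnavailable:
        # Web service is down: fall back to the forum's own database.
        return forum.check_local(username, password)
    if token is None:
        return False  # the service answered and rejected the credentials
    # Create the user if missing, otherwise keep the stored password current.
    # A real plug-in would store a hash here, not the plain password.
    forum.users[username] = password
    return True
```

Because every successful remote login refreshes the local copy, anyone who has logged in once can still get onto the forums while the web service is offline.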
Monday, June 16, 2008 1:45:17 AM (GMT Standard Time, UTC+00:00)
There we go, all that effort has paid off.
Time taken to generate a single 'one day' page of even the more complicated Scrobbles info is down to under a second, and down to about fifteen seconds for an 'all time' page view. (I cache these with quite a high longevity - think of the single default view that last.fm gives you and how often that updates..). On the old system with the current amount of data, this was going into the 'several minutes' for even a single day page. (Yeah, XQuery not so good...)
Getting the right balance of normalisation was key to this success, and I have written a lot more code than I would have originally liked to, most of it not being used in the final solution. Some of it is quite experimental and rather cool though - and perhaps the key to future attempts at optimising Scrobbles. One of my solutions worked out 'groups' of keys which could cut the data needed for most requests by over 90%, and used them to generate permanent tables for crunching data into - and this may end up being a good way to go if my current solution doesn't last the distance. With the data for the entire year, this algorithm only generated ten tables - and I was able to index the entire year's data through this process in under five minutes, so it was quite scalable.
That was a bit hard to integrate with the query system as it stood however, so I'm bypassing it and going straight to the core data store for now.
Should be grand though, the background service can happily be constantly ticking over the 'all time' pages on a low priority (Well of course I have a priority queue based system!), and generating the one-day views as they are requested - and perhaps some sort of balance over the monthly views and yearly views based on what month or year it currently is.
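A priority queue of that sort can be sketched in a few lines of Python using `heapq`. The priority values and page names are invented for the example, and the real background service is certainly more involved; the point is only that on-demand pages jump ahead of the constantly ticking 'all time' work.

```python
import heapq
import itertools
from typing import List, Optional, Tuple

# Lower number = more urgent. These values are arbitrary assumptions.
ON_DEMAND = 0
BACKGROUND = 10


class PageQueue:
    """Min-heap of page generation jobs, FIFO within the same priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()  # tie-breaker preserves insert order

    def push(self, priority: int, page: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), page))

    def pop(self) -> Optional[str]:
        """Return the most urgent page, or None when nothing is queued."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

A worker loop would simply call `pop()` repeatedly, re-queuing each 'all time' page at `BACKGROUND` priority once it has been regenerated.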
At, say, three pages per user and 5 views to be updated constantly, that's fifteen views taking about 120 seconds in processing time altogether. I could still update every single user's page more often than last.fm does (for its users' music pages) and support 500 users on the rather underpowered server this system currently sits on. Not that I would of course, because there is little point in updating pages unless people look at them sometimes.
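To put a rough number on that, here is the arithmetic as a small Python sketch. It assumes a single worker processing users one after another with no other load, which the post does not state; the figures themselves (three pages, five views, 120 seconds, 500 users) are the ones quoted above.

```python
PAGES_PER_USER = 3
VIEWS_PER_PAGE = 5
SECONDS_PER_USER = 120  # quoted cost of regenerating all of one user's views
USERS = 500


def views_per_user(pages: int, views: int) -> int:
    """Total views regenerated for a single user."""
    return pages * views


def full_refresh_hours(users: int, seconds_per_user: float) -> float:
    """Hours for one sequential pass over every user's views."""
    return users * seconds_per_user / 3600


if __name__ == "__main__":
    print(views_per_user(PAGES_PER_USER, VIEWS_PER_PAGE))
    print(round(full_refresh_hours(USERS, SECONDS_PER_USER), 1))
```

Under those assumptions a full pass over 500 users takes a little under 17 hours, so every page could be refreshed more than once a day.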
Lots more work to do yet on making things even faster - but with a firm database design to now stand on, I feel a bit more confident about pushing ahead with the real development of the system.
Saturday, June 14, 2008 1:42:23 PM (GMT Standard Time, UTC+00:00)
My checklist has taken a hit this week, with a busy social calendar (organised by me a couple of months ago, apparently).
Not only that, but I hit some technical issues after I migrated the server-side install of the Scrobbles system across to a new version, along with the database (a process that took over six hours, there being now over a year's data in there). This was a migration that was supposed to make the whole system a lot more scalable and manageable in the future, de-coupling page requests from page generation and making a whole load of the resource-intensive operations parallelisable.
The new system also removed the data-loss incurred by using discrete blocks of time, and moved the system across to an entirely continuous method of time-based data storage. And most importantly was meant to make it a lot easier to form complicated queries by storing a lot of this data in the original XML format, for querying using the XQuery support built into SQL Server 2005. I thought that there was no way I could do anything better than this with my limited knowledge, and that SQL Server's magic black box would just index this data and keep things fast and nifty for me.
In my trials, this seemed valid, and page generation was no faster or slower than the previous iterations of the querying system. On migrating across a larger data set, SQL Server started eating ridiculous amounts of RAM and CPU, before finally giving up in a big heap. Back to the drawing board again, for probably the fourth time.
It should go without saying that I still think the XQuery support in SQL Server 2005 is fantastic, and I think that the solution was simply not compatible with my needs. It was educational to play with however, and I can think of a few projects which would benefit from having the masses of XML on the hard drive transferred into a database which can then do the hard work of actually querying it for information.
I downloaded a full copy of the working database for testing and set about trying to index this massive amount of data myself. Many conversations were had with different people with varying levels of expertise - my colleagues (Karsten and Pat) were helpful over coffee and a pad of paper, and we came up with a new design which should hopefully have been more efficient. I also had quite a few conversations with Paul Evans who let me know of the oh so many potential pitfalls before I even began work on the new prototype system. Sadly, these pitfalls seemed to pop up all too soon and my query graphs were soon looking just as complicated as, if not more complicated than, the original XML-driven attempt.
I think I've finally come to a proper solution now, which involves creating tables on the fly to fit the needs of the system as it evolves. This is a scary solution for me, as it gives away some of the intelligence of the database design to an indirect process rather than it coming directly from me. It also increases the complexity of the code by quite a bit - and I was hoping to avoid that by just having the one-size-fits-all database solution.
Sadly in the world where speed and efficiency counts more than anything else, and as I was originally warned at the start of the week, this seems to be quite a standard compromise in the world of database design.
Friday, June 06, 2008 2:19:35 PM (GMT Standard Time, UTC+00:00)
With work on Scrobbles swiftly moving ahead, I find myself thinking about the inevitable and yet seemingly unreachable release date.
"Users are going to want this, so I should add it now.."
How many times have I now said that to myself? "Users are going to want to customize their pages", "Users are going to want to submit custom data", "Users are going to want to embed content in their pages", "Users will need snippets to be configurable so they don't need to write a new one for each 'key' or 'value'"... Each time I do this, it's for a good reason - I don't want to fail as a service, and therefore I need to be the best service around.
As mentioned previously, my main competitor (in the World of Warcraft arena anyway) is probably WWS (WoW Web Stats) - who have a mature, but nowhere near as flexible system as the one I have written. The keyword there however, is "mature". I took a look earlier and the wealth of information available from it is astounding. I can of course do better, and I do aim to write snippets which emulate the statistics that it throws out.
However, I then need to think about the groups of people who will be involved in these events, and think that perhaps they will want to combine their data and compare each other during raids. I possibly need to write a system that allows snippets to link to further in-depth data based on a keyword in that snippet. I need to write a system that allows users to create 'views' of their data between a user-defined period of time, with inputs coming through the existing snippet data. It needs to be really easy to create these views, possibly from templates so that data about a raid can be retrieved within a set period of time.
What about those casual users who are wanting statistics not about raids, but about their day to day activities? There is still a wealth of data that I am not collecting, and I'm going to have to create a character of each class and profession in order to find out about them. It is absolutely terrifying how much stuff I might "miss out" in the initial release of the software.
And there is the clincher, if I get it wrong, there is the chance that a future version of the WoW stuff might make previous data invalid. I can't be having that, so it has to be perfect, or at least - forward compatible to begin with - do I need a system for this??
What about those users for whom stats don't mean too much? I need to write that 'third party' website, WoWScrolls.com, so they can see the potential of throwing all their data at Scrobbles. (Public services are *awesome*). How do I achieve that? My colleagues at work have suggested that I use something like AIML to generate the blog posts and I can see their point, but it still leaves me with the daunting task of actually populating the database with "witty" phrases about each location, each task, each type of character, etc. - never mind creating the actual profiles from the data available at the WoW Armory.
My head is full of ideas, and getting that final feature list is not easy - because the moment I allow people to use Scrobbles they're going to start having even more ideas than I can deal with, and being the sole developer it's going to be very hard to keep up with the demand for features - never mind technical support, complaints and all the normal day to day problems that come with running a website.
There is also that niggling issue, that releasing a service like this feels a bit like throwing down your cards at the end of a poker round, there is always the risk that your opponents might have a full house - and then what do you do?
Where do I call it quits? I could do with Scrobbles being out before the summer holidays so I could just prioritise my list of ideas and just work on them as much as possible before then, throwing it out in whatever form it has at that time. (Limiting the total users so I have time to assess server load and start thinking about monetizing the operation so I can spend more time on it - World of Warcraft is not the be all and end all of this system after all!).
I start to understand why games and software in general can often take such a long time to get out the door, there is always that one little thing that you just know the software will not be complete without. At some point, the users need to start leading the development strategy, and if their ideas conflict with mine - what on earth do I do then?
Sunday, December 30, 2007 12:12:40 PM (GMT Standard Time, UTC+00:00)
Picture the scene: It's the 23rd of December and I'm sat quietly with a bottle of wine and the remains of the chilli that my housemate and I had shared earlier, armed with Visual Studio. I was in a good coding mood.
I'd done quite a lot of code and I was interested in seeing how much work that actually was, so I headed over to Robstats to take a look at the past few days of carnage to satisfy my curiosity, because I could. Except I couldn't, because for some reason the server was being very slow and the data hadn't been updated.
I remoted in, expecting that perhaps doomsday had finally come and Robstats had finally exceeded its capability of data crunching after 6 months of data. I was correct, but I didn't realise the extent to which this was going to kill my web server. Half a million xml files storing 6 months' data, things were bad. I knew when I was writing it that it was going to be bad, I just expected to have its replacement completed a long time ago.
The generation utility was eating memory (200 MB on a 256 MB server) and it was only halfway through its cycle. The server was groaning under the sheer weight of data and I had to stop it before I lost communication with the server altogether. Too late! I was disconnected and thus began the next few days of woe.
The crash caused a complete memory dump, there weren't enough resources to create that memory dump and the server completely locked up. I popped off a support request to SynergyWorks and they gave me a hard-reboot. Cue file corruption. Cue explorer corruption. Cue lots of corruption. Over the next few days, things kept crashing and locking up and I had to ask SynergyWorks to reboot several times - just making the problem worse.
I decided to complete Lolstats and Scrobbles, the proper version of all this. Lolstats: A multi-user extensible and very generic stats crunching backend with a fairly flexible xml based data-transformation and presentation layer; and Scrobbles: A public website and frontend for managing your lolstats.
The only thing preventing me from completing it was that I was focusing on the multi-user and ease of use side of things. It was taking up too much time without enough personal reward or gain. I suddenly had a reason to do it for myself and I got cracking on the bits that would be important to just display the data that Robstats had collected.
I got it working and by the 29th of December the server was completely dead. I got in touch with SynergyWorks and they decided that it would be best to give me a clean server, double the ram on the machine and mount the old data as a disk so I could recover what I could. Not bad support for a Saturday afternoon eh? </Shameless plug>
I went through the checklist and backed up the websites and data, and began an XCOPY of the half million odd xml files that constituted the 'backend' of Robstats. I had a migration utility and I was prepared to use it. Half a million xml files into half a million database tables, ready to be crunched into less than 20,000 records!
Some files were corrupt, some files had missing data because the format of the Robstats backend had changed over the months that the service had been up. I kept tweaking the migration utility until it could finally process all half million files and chuck them into the Scrobbles backend "processing queue". Actually converting the data and adding it to the database only took about 20 minutes, so it wasn't that painful.
As I write this blog post, the Lolstats backend is crunching the data into a more manageable form. I've written the Windows background tasks and Visual Studio 2005/2008 add-in to capture the data and send it to Scrobbles instead of Robstats. The backend is processing about 1000 records every minute and a half, which means with a few improvements the current server should be able to deal with a few hundred users. It's going to take 10 hours to crunch the six months' data into those hourly chunks and the pages on the site are now getting generated dynamically per-request (Yes, there is some complex caching going on too) through the Lolstats data presentation system.
I'll be documenting the web services and the xml format needed to pull the arbitrary data out and display it sometime over the next month. That should appear on the Lolstats domain once I've set that up.
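The throughput figures above can be sanity-checked with a couple of lines of Python. The only inputs are the numbers quoted in the post (about 1000 records per 90 seconds, roughly 10 hours, about half a million files); the function names are made up, and a constant processing rate is assumed.

```python
RECORDS_PER_BATCH = 1000
SECONDS_PER_BATCH = 90  # "a minute and a half"


def records_processed(hours: float) -> float:
    """Records crunched in the given time at a constant rate."""
    return hours * 3600 / SECONDS_PER_BATCH * RECORDS_PER_BATCH


def hours_needed(records: float) -> float:
    """Hours to crunch the given number of records at a constant rate."""
    return records / RECORDS_PER_BATCH * SECONDS_PER_BATCH / 3600


if __name__ == "__main__":
    print(records_processed(10))   # records handled in the quoted 10 hours
    print(hours_needed(500_000))   # time for a full half million records
```

At that rate ten hours covers about 400,000 records, and a full half million would take about twelve and a half hours, so the quoted estimate is in the right region.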
I first need to write a decent website around the backend and provide a front-end for editing the pages being displayed. There is a lot of potential for third-party development and a lot of potential for a lot of data capture systems (games and various). I'd also quite like to use the Lolstats backend to capture data about the Scrobbles website and services. "Because I can".
I also need to purchase some decent graphing software. The freeware I'm currently using to generate graphs is nice and everything, but it's not really suitable for moving forward. If you want to be a part of the beta program, e-mail me on robashton@codeofrob.com and I'll consider you. I'd prefer to only have people who I know won't need help getting things set up. I haven't got time for support just yet.
Friday, January 19, 2007 11:16:55 AM (GMT Standard Time, UTC+00:00)
While I may have misgivings about how dasBlog does certain things, I will hand it to the dasBlog team that the blogging engine is not only very powerful but appears to have a very nice API.
It is with this in mind that I am now writing a presentation layer as part of the main application that drives the rest of this site so I can maintain my standards compliance (it's become a bit of an obsession for some reason) and keep my consistent theme.
Once I have done this, I may consider refactoring it for public consumption for those wishing a more themeable dasBlog. Because I'm making no modifications to dasBlog itself it can be considered as a plug-in on steroids :)
Thursday, January 18, 2007 1:04:12 PM (GMT Standard Time, UTC+00:00)
Some of you have noticed that the blog history on the right hand side is weirdly formatted in contrast with the neat left-align of all other content menus.
This is not something I'm currently able to do anything about without writing my own macro for generating the history menu.
The reason for this deficiency is that the dasBlog macro which generates the table, does just that. It generates a table, with inline style!
I thought the whole point of theming was that a theme provided the style, but obviously this is not the case. I have a great respect for Scott Hanselman, having been to one of his talks at TechED, but whoever wrote the code for generating these menus - shame on you.
Of course dasBlog also doesn't comply with XHTML standards, in either content body (hard to do) or even in those macros themselves, so I already know it's not perfect - you may have also noticed I don't have the standards menu to the left hand side of this page either.
I just feel they should have approached the design of this slightly differently. I know that the majority of people out there just use the default themes, but I'm not like that. I like consistency, it makes me happy and I've taken great care in making sure the theme for the blog matches that on the rest of the site.
Foiled by somebody else's mistake. It's always the way :)
Sunday, January 14, 2007 8:00:00 AM (GMT Standard Time, UTC+00:00)
So, I've moved house - to the somewhat more sensible domain of CodeOfRob.com, and from now on I'll be keeping my sensible entries here, along with my sensible projects and my sensible pages.
For those of you who are easily worried - rest assured I have not grown up and will shortly be doing something childish and amusing with the old domain of IReallyDontCare.com.
My next post will be on the inadequacies of Internet Explorer, and also the inadequacies of Firefox. If I was bothered about Opera I'm sure I'd find something to say about that browser - but I'm not. It is not a major player and I don't care whether it works with this site or not :).