RavenDB – Basic usage considerations

Note: The interfaces have been updated since this entry was written, and there is now LINQ query support built into the .NET client. I’ve updated these posts to use the LuceneQuery syntax, but that’s probably not the preferred way of doing things.

There will be plenty more of these to talk about as I carry on developing this application against RavenDB, but there are a few immediate concepts that I thought would be worth writing about to do with the basic manner in which you interact with RavenDB.

DocumentSession vs DocumentStore

This is the most basic consideration:

  • When do you create a DocumentStore?
  • When do you create a DocumentSession?

The simple answer is that you create a DocumentStore on application start-up, and you create a DocumentSession for every unit of work after that.

In an ASP.NET MVC web application, this would be:

  • Create a DocumentStore in Application_Start
  • Create a DocumentSession on BeginRequest
  • Destroy the DocumentSession on EndRequest
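
As a rough illustration of that lifecycle, here is a minimal sketch that runs without RavenDB at all. FakeStore, FakeSession and RequestPipeline are hypothetical stand-ins that I've made up for this example; they are not part of the RavenDB client. In a real MVC application the store would be the DocumentStore created in Application_Start, and the session would be opened on BeginRequest and disposed on EndRequest.

```csharp
using System;

// Hypothetical stand-in for DocumentStore: created once per application.
public sealed class FakeStore
{
    public int SessionsOpened { get; private set; }

    public FakeSession OpenSession()
    {
        SessionsOpened++;
        return new FakeSession();
    }
}

// Hypothetical stand-in for a document session: one per unit of work.
public sealed class FakeSession : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;
    }
}

public static class RequestPipeline
{
    // Equivalent of Application_Start: a single store shared by every request.
    public static readonly FakeStore Store = new FakeStore();

    // Equivalent of BeginRequest/EndRequest wrapped around one unit of work.
    public static TResult HandleRequest<TResult>(Func<FakeSession, TResult> work)
    {
        using (FakeSession session = Store.OpenSession())
        {
            return work(session);
        }
    }
}
```

The point of the shape is simply that the store outlives every request, while each session lives exactly as long as the unit of work that uses it.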

Creating Indices

Because nearly every RavenDB tutorial includes index creation as part of its example, it is tempting to get into the habit of running the index-creation code every time your application starts (or to simply forget that you started off this way and leave the code in there).

    documentStore.DatabaseCommands.PutIndex(
        "BookByTitle",
        new IndexDefinition<Book, Book>()
        {
            Map = docs => from doc in docs
                          where doc.Title != null
                          select new
                          {
                              Title = doc.Title
                          },
            Stores = { { x => x.Title, FieldStorage.Yes } }
        });

As data is added to the system or modified, RavenDB will (in its own time) run that dirty data across those indexes, and the application will use those indexes to pull the data out for display and manipulation purposes.

If an index is re-created, all of that indexed data becomes obsolete, and thus RavenDB must re-run *all* of the data in the system against that index. If your application is re-creating indexes or simply creating indexes on the fly as a regular action then performance will suffer.

The best practice is to treat these indices as a management function: something that is done once when the document database is first created, and then updated as part of maintenance or upgrades, like database changes in a traditional system (only somewhat easier!).

I have a simple script to create all the indexes in a blank, freshly created RavenDB instance, so while I’m developing the application I can start from scratch again at any time. The important thing to note is that I don’t run this every time I start the application up, just when I’ve made changes to those indexes.

I might talk about this in a future blog post as I’ve ended up with a nice structure that involves disposing of the magic strings that form the names of the indexes in RavenDB and that can’t be a bad thing.

Saving new objects

This actually goes for most operations, such as deletions and updates to objects, but saving objects is probably the most complete example of the collection. None of this is too dissimilar to the considerations we’d apply when working against a traditional RDBMS and an ORM, but it’s worth reiterating for those who are unfamiliar with the concepts.

Consider a simple repository for entities in our system whose interface looks something like this.

    public interface IBookRepository
    {
        Book Get(string id);
        void Save(Book book);
    }

A sample implementation of this repository might look like this:

    public class BookRepository : IBookRepository
    {
        private IDocumentSession mDocumentSession;

        public BookRepository(IDocumentSession documentSession)
        {
            mDocumentSession = documentSession;
        }

        public Book Get(string id)
        {
            return mDocumentSession.Load<Book>(id);
        }

        public void Save(Book book)
        {
            mDocumentSession.Store(book);
        }
    }

Ignoring the rest of the repository, there are decisions to be made at this point about what the Save method should actually do.

Consider a basic use of the repository like so:

    public void PublishBook(Book book)
    {
        mRepository.Save(book);
        mEventInvoker.RaiseEvent(new BookPublishedEvent(book.Id));
    }

Ignoring the obvious (like the fact that this publish method isn’t actually publishing a book!), our problem here is that the newly created book does not yet have an Id because we haven’t called SaveChanges, and yet we’re attempting to use this Id as the argument for another action in our application.

The proposed fix? Change the repository so we call SaveChanges of course!

    public void Save(Book book)
    {
        mDocumentSession.Store(book);
        mDocumentSession.SaveChanges();
    }

That appears to have fixed the problem, but in actual fact if we were using IDocumentSession to control our unit of work, calling SaveChanges just broke that because all the changes (including others made across the rest of the system) were just flushed across to the server.

We can fix that by wrapping our whole unit of work inside a TransactionScope (which RavenDB respects), but we’ve still got one problem we need to be aware of:

    foreach (Book book in booksToCreate)
    {
        mRepository.Save(book);
    }

Now we’re saving a collection of books, let’s say there are 100 of them – that’s 100 calls to SaveChanges, which is 100 calls across the wire, and 100 calls to ‘whatever RavenDB does when you push an object to RavenDB’ (It’s expensive okay?).

That’s not to say you shouldn’t use this hammer to solve the problem, but you should think about it and do what makes sense in your application.

  • You could still add more interfaces/methods specifically for batch operations, and still call SaveChanges at that level
  • You could use your own client-side key generation code (RavenDB allows this) – and perhaps adopt something like HiLo against the Type of the document – thus negating the need to call SaveChanges at all until everything has been done that needs doing
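
To make the first option concrete, here is a minimal sketch of a batch-level save. CountingSession is a hypothetical stand-in for the session (it only counts round trips, so the example runs without a server), and BatchBookRepository and SaveAll are names I've invented for illustration rather than anything from RavenDB itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a session: Store queues work locally,
// SaveChanges represents a single round trip to the server.
public sealed class CountingSession
{
    private readonly List<object> pending = new List<object>();

    public int RoundTrips { get; private set; }
    public int DocumentsSent { get; private set; }

    public void Store(object entity)
    {
        pending.Add(entity);
    }

    public void SaveChanges()
    {
        if (pending.Count == 0)
        {
            return; // nothing queued, nothing to send
        }
        RoundTrips++;
        DocumentsSent += pending.Count;
        pending.Clear();
    }
}

public sealed class Book
{
    public string Id { get; set; }
    public string Title { get; set; }
}

public sealed class BatchBookRepository
{
    private readonly CountingSession session;

    public BatchBookRepository(CountingSession session)
    {
        this.session = session;
    }

    // Batch-level save: queue every book, then flush once.
    public void SaveAll(IEnumerable<Book> books)
    {
        foreach (Book book in books)
        {
            session.Store(book);
        }
        session.SaveChanges();
    }
}
```

With this shape, saving 100 books costs one flush in the stand-in rather than 100, at the price of a repository method that knows it is ending the unit of work.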

I’m probably going to experiment with the second option and write a blog entry once I’ve worked out what it is I want to achieve.

Update: I have since written a HiLo generator, and Oren has integrated it so that HiLo is now the default generator for RavenDB. This means a call to SaveChanges is not needed in order to get the id for an item, so this point is now almost irrelevant unless you override this behaviour to use keys generated by the server.
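
For anyone unfamiliar with the idea, here is a minimal sketch of how a HiLo generator hands out ids. This is not the generator that went into RavenDB, just an illustration under my own assumptions: nextHi is a hypothetical delegate standing in for the server call that returns the next "hi" value (1, 2, 3, ...), and capacity is the number of ids each hi value reserves.

```csharp
using System;

// Minimal HiLo sketch: one "hi" value reserves a block of ids, which are
// then handed out locally without any further calls to the server.
public sealed class HiLoKeyGenerator
{
    private readonly object sync = new object();
    private readonly Func<long> nextHi; // hypothetical server call
    private readonly long capacity;
    private long currentHi;
    private long currentLo;

    public HiLoKeyGenerator(Func<long> nextHi, long capacity)
    {
        if (nextHi == null) throw new ArgumentNullException("nextHi");
        if (capacity <= 0) throw new ArgumentOutOfRangeException("capacity");
        this.nextHi = nextHi;
        this.capacity = capacity;
        this.currentLo = capacity; // forces a fetch on first use
    }

    public long NextId()
    {
        lock (sync)
        {
            if (currentLo >= capacity)
            {
                // Block exhausted: reserve the next one.
                currentHi = nextHi();
                currentLo = 0;
            }
            currentLo++;
            return (currentHi - 1) * capacity + currentLo;
        }
    }

    // Builds a key such as "books/5" from a collection name and the next id.
    public string NextKey(string collection)
    {
        return collection + "/" + NextId();
    }
}
```

Because the id is known as soon as NextKey returns, code like the PublishBook example can use it straight away and leave SaveChanges until the whole unit of work is done.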

Stale Data

Let’s say we have a top-level page on our website which displays the top 20 books by popularity in a certain category. The following query is executed:

    Book[] categoryBooks = documentSession.LuceneQuery<Book>("BookByCategory")
                            .WaitForNonStaleResults()
                            .Where(String.Format("Category:{0}", category))
                            .Take(20)
                            .OrderBy("Popularity").ToArray();

The temptation is there to always make that call to WaitForNonStaleResults, because most demo code does so as a matter of course (invoking it deterministically says “give me back the results I expect for this demo”).

The problem is that WaitForNonStaleResults will do exactly what it says: it will wait until the results coming back are no longer stale. That means your page request will hang and your application won’t be responsive, and the whole point of using a database like RavenDB is that you want the application to be responsive!

There is a good reason that WaitForNonStaleResults is not the default. When you find yourself writing it, consider what it is you actually want. In this example, it really doesn’t matter if the data displayed on this high-traffic top-level page is a bit out of date, so the call simply is not needed.

Paging

Let’s say there are 100,000 books in the document store and we invoke the following code:

    Book[] books = documentSession.LuceneQuery<Book>()
                             .ToArray();

How many books do you expect there to be in that collection? 100,000? If 100,000 objects were returned into that collection, how long would it take? What would you be doing with those 100,000 objects? How much memory would it take to hold them all at once? Yeah, it’s unlikely that you’d ever write the above code in your production application, because bringing back all the objects is rarely what the developer actually intends.

Thankfully RavenDB safeguards against this kind of sloppy code and automatically limits the number of results returned back. Both the .NET client and the server have this behaviour built in, which means that (at the moment) you’ll only get 128 objects back for the above query. This is equally true for all types of queries, including queries against indices with where clauses, orderings and everything else you might want to put in a query.

Currently the server itself will only let you page 1024 objects at one time, so you can’t be lazy and make a call to Take(100000) because it won’t let you. I’ve actually got an extension method which *does* bring back *all* the objects for testing purposes, but I’ll leave that one out of this blog entry for fear of people actually using it!

Just be aware that paging is there to help you and don’t be surprised when you don’t get all the documents back when doing a blanket query. Use paging properly!
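
As a rough sketch of what "properly" means here: ask for one page at a time using Skip and Take, and never ask for more than the server will give you. The helper below works on any in-memory sequence, which is a stand-in for a real query; Paging and GetPage are names I've made up, and MaxPageSize simply reuses the 1024 figure quoted above, which may well change.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Assumed cap on page size, taken from the figure quoted in this post.
    public const int MaxPageSize = 1024;

    // Returns the items for a 1-based page number, clamping the page size
    // to MaxPageSize. The source sequence stands in for a real query.
    public static List<T> GetPage<T>(IEnumerable<T> source, int pageNumber, int pageSize)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (pageNumber < 1) throw new ArgumentOutOfRangeException("pageNumber");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");

        int take = Math.Min(pageSize, MaxPageSize);
        int skip = (pageNumber - 1) * take;
        return source.Skip(skip).Take(take).ToList();
    }
}
```

For example, Paging.GetPage(books, 3, 20) would return books 41 to 60 of whatever ordering the query defines, and asking for a page size of 100000 would quietly be treated as 1024.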



   


Posted on Wednesday, May 12, 2010 2:05 PM

Feedback

# Storing PDF

Left by Jon at 5/12/2010 4:29 PM
Hi
How would you store PDF, Words ... and is intended for that ?
thanks
John

# re: RavenDB – Basic usage considerations

Left by robashton at 5/12/2010 4:34 PM
You can store such things as attachments (I'll make sure to mention these in a future blog entry).

It entirely depends what your usage is, if you're wanting to search and index those documents, then you'll have to write code to get their contents in some sort of text format and save them as a document rather than an attachment.

If you're just wanting to store them in there as files that can be retrieved, then they are effectively binary blobs.

The job of RavenDB isn't to break down and analyse those things for you.

I hope that makes sense

# re: RavenDB – Basic usage considerations

Left by David Kemp at 5/13/2010 10:54 AM
@Jon

Don't be fooled by the name: a Document Database stores objects as documents; it's not (necessarily) for storing files in.

# re: RavenDB – Basic usage considerations

Left by josh at 5/13/2010 4:44 PM
You just got another subscriber! Keep posting on ravendb. It helps me clarify my own thoughts as I build a sample app on it.

# re: RavenDB – Basic usage considerations

Left by Robert The Grey at 5/13/2010 8:35 PM
Seriously Rob, this post is the Dogs' Danglies!

I love reading posts that flow straight from the page and into my cognitive process. Your explanations were so clear and concise that I had no choice but to nod my head all the way through.

I loved the approach of using "typical demo" code and pointing out why that may not be ideal in real world scenarios.

Nuggets like these are hard to come by - keep up the series, I look forward to it.

All the best
RobertTheGrey

P.S. I'll be at the RavenDB launch next Tues night if you're around then - maybe grab a pint and chew the fat.

# re: RavenDB – Basic usage considerations

Left by robashton at 5/13/2010 8:44 PM
Unfortunately I'm running a training evening for our developers at work that night - but I'm hoping we'll finish early enough for me to get a train into London and meet everybody at the pub after the event.

A pint and chat will definitely be in order (Although I really want to get hold of Oren's ear at some point in the evening for some serious in depth discussion!)

# re: RavenDB – Basic usage considerations

Left by robashton at 5/13/2010 8:44 PM
And thanks! :D :)

# re: RavenDB – Basic usage considerations

Left by Jon at 5/14/2010 7:57 AM
I have been fooled... I truly thought that DocumentDB were built to store DOCS.

It looks like such DB exists only for performance where RDBMS are overlkilling performance. Is that true ?

# re: RavenDB – Basic usage considerations

Left by robashton at 5/14/2010 8:48 AM
There is no short answer to that question, because it's a "yes/no" answer.

I'd advise you go read other people's opinions on what the NoSql movement is about, why we're better off now it's taking off, and why RDBMS vs DocDb usage has to be thought about carefully when starting a new project :)

# re: RavenDB – Basic usage considerations

Left by jdn at 5/17/2010 12:58 PM
"Thankfully RavenDB safeguards against this kind of sloppy code and automatically limits the number of results returned back."

-1 on this. Code should do what it says it does. If people write dumb code, then that's something they should work at.

This is, IMHO, equivalent to 'security by obscurity.'

# re: RavenDB – Basic usage considerations

Left by robashton at 5/17/2010 1:11 PM
Funny, that's the argument I just made against limiting the number of calls made to the server by each individual session by throwing an exception if it detects bad behaviour :)

(That's gone in now)

I think this paging example isn't entirely by choice, and there are other pressing reasons as to why it's ended up in there.

I think Oren is dead set on preventing the developer doing dumb things against RavenDB, and while it's a choice that I might not necessarily support, it's one that I respect.

If you've got an opinion and want to start a debate, it's worth hitting the mailing list, because from my experience thus far he's quite open to it.

# re: RavenDB – Basic usage considerations

Left by jdn at 5/18/2010 3:01 PM
Yeah, that was useful. "Submit a patch to work around my bad design."

Oh, well.

Copyright © Rob Ashton
