RavenDB & CouchDB – Basic Queries

Once you have a number of documents in the database, you soon want to do more complex operations than simply retrieving a list of them.

Consider the following, rather over-used, example document:

    {
        "title": "Another blog entry",
        "content": "blah blah blah",
        "category": "code",
        "author": "robashton"
    }

Our example query would be to get all of the documents from the database that were written by a particular author AND in a certain category.

Getting all the entries written by a single author, or all the entries in a certain category, would be fairly common queries too.

Indexes in RavenDB

In order to perform any queries whatsoever in RavenDB, we first need to create an index.

    from doc in docs
    select new {
        doc.author,
        doc.category
    };

This is effectively a map function written as a LINQ query. It returns a single value: an object containing the fields to be indexed and their values.

Get all the documents by author and category

indexes/entriesByAuthorAndCategory?query=category:tech AND author:robashton

Get all the documents by category

indexes/entriesByAuthorAndCategory?query=category:tech

Get all the documents by author

indexes/entriesByAuthorAndCategory?query=author:robashton

Those queries will return a list of whole documents which match the queries passed in.
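The query strings above contain spaces and colons, so they need URL-encoding before being sent. As a minimal sketch, assuming the index is named entriesByAuthorAndCategory as in the URLs above, the relative URL could be built like this. The helper buildIndexQuery is hypothetical, not part of any RavenDB client, and it does not escape special characters inside the values.

```javascript
// Hypothetical helper: builds the relative query URL for a RavenDB index.
// `terms` maps each indexed field name to the value it must match.
function buildIndexQuery(indexName, terms) {
  const clauses = Object.keys(terms).map(function (field) {
    return field + ":" + terms[field];
  });
  // Clauses are joined with AND, then URL-encoded as a single parameter value.
  const query = encodeURIComponent(clauses.join(" AND "));
  return "indexes/" + indexName + "?query=" + query;
}

const byBoth = buildIndexQuery("entriesByAuthorAndCategory", {
  category: "tech",
  author: "robashton"
});
const byAuthor = buildIndexQuery("entriesByAuthorAndCategory", {
  author: "robashton"
});

console.log(byBoth);
console.log(byAuthor);
```

The same index serves all three queries; only the set of clauses changes.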

Indexes in CouchDB

The same goes for CouchDB, except that map functions in CouchDB have two outputs (a key and a value) and are written in JavaScript.

    function(doc) {
      // Key: [category, author]; value: the whole document.
      emit([doc.category, doc.author], doc);
    }

Return values are specified by calling emit. Because emit can be called more than once for each document, a single map function can create multiple keys per document. The first parameter of emit is the “key” to be searched on, and the second parameter is the data associated with that key (in this case, the document).
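To make the "more than once for each document" point concrete, here is a small sketch that runs a map function outside CouchDB. It is an assumption-laden stand-in: in CouchDB emit is provided by the view server, whereas here it is passed in as a second argument, and runMap and byCategoryAndByAuthor are hypothetical names.

```javascript
// Runs a CouchDB-style map function over an array of documents and
// returns the rows it emits. `emit` is a local stand-in, not CouchDB's own.
function runMap(mapFn, docs) {
  const rows = [];
  docs.forEach(function (doc) {
    // Each call to emit adds one row to the index for the current document.
    const emit = function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    };
    mapFn(doc, emit);
  });
  return rows;
}

// Hypothetical map function that emits two keys for every document.
function byCategoryAndByAuthor(doc, emit) {
  emit(["category", doc.category], null);
  emit(["author", doc.author], null);
}

const emitted = runMap(byCategoryAndByAuthor, [
  { _id: "entry-1", category: "code", author: "robashton" }
]);

console.log(JSON.stringify(emitted));
```

One document produces two index rows here, each searchable by its own key.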

Get all the documents by author and category

blogs/_view/byAuthorAndCategory?key=["tech","robashton"]

Get all the documents by category

blogs/_view/byAuthorAndCategory?startkey=["tech"]&endkey=["tech",{}]
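A view query with only a startkey returns every row from that key to the end of the view, so an exact match is usually expressed with key, and a "starts with" match with a startkey/endkey pair. The sketch below builds both query strings; exactKey and keyPrefix are hypothetical helper names, and the keys are assumed to be arrays of strings.

```javascript
// Builds the query string for an exact match on a complete view key.
function exactKey(key) {
  return "key=" + encodeURIComponent(JSON.stringify(key));
}

// Builds the query string matching every key that begins with `prefix`.
// An empty object sorts after strings in CouchDB's view collation,
// so [prefix..., {}] works as an upper bound for the range.
function keyPrefix(prefix) {
  const start = JSON.stringify(prefix);
  const end = JSON.stringify(prefix.concat([{}]));
  return "startkey=" + encodeURIComponent(start) +
         "&endkey=" + encodeURIComponent(end);
}

console.log(exactKey(["tech", "robashton"]));
console.log(keyPrefix(["tech"]));
```

Because the range only works on leading elements of the key, this approach covers "by category" but not "by author" for a key of [category, author].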

Get all the documents by author

Ah. This is suddenly a bit more complicated. I’ve not actually managed to find a convenient solution. As far as I can understand from the docs, if you want to query on specific fields within the key, you have to submit a POST request containing a JSON document with the fields you wish to search on.

So it’s either that or create specific indexes for the queries you wish to perform. Performance-wise this is probably optimal but I don’t actually know for sure.
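As a sketch of the "specific index" route, a second view could put the author first in the key, which turns "all entries by an author" into a lookup on the leading key element. Everything here is illustrative: byAuthorThenCategory and buildRows are hypothetical names, emit is passed in as an argument rather than supplied by CouchDB, and the filter stands in for a real view query.

```javascript
// Hypothetical second view: same data, but the author leads the key.
function byAuthorThenCategory(doc, emit) {
  emit([doc.author, doc.category], null);
}

// Local stand-in for the view index: collects the rows a map function emits.
function buildRows(mapFn, docs) {
  const rows = [];
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  return rows;
}

const sampleDocs = [
  { _id: "a", author: "robashton", category: "code" },
  { _id: "b", author: "someoneelse", category: "code" },
  { _id: "c", author: "robashton", category: "tech" }
];

// With the author first, selecting by author only inspects key[0].
const byRob = buildRows(byAuthorThenCategory, sampleDocs).filter(function (row) {
  return row.key[0] === "robashton";
});

console.log(byRob.map(function (row) { return row.id; }));
```

The trade-off is an extra index to maintain for each distinct key ordering you need.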

Paging in RavenDB

Paging in RavenDB is as simple as appending start and pageSize parameters to the query string:

indexes/entriesByAuthorAndCategory?query=category:tech&start=10&pageSize=10

This will perform the query across the entire index and retrieve only the documents requested, so the operation is cheap.
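The start value is just the number of results to skip, so it can be derived from a page number. A minimal sketch, assuming zero-based page numbers; pagedQuery is a hypothetical helper, not part of any RavenDB client.

```javascript
// Hypothetical helper: converts a zero-based page number into the
// start and pageSize parameters used in the query string above.
function pagedQuery(indexName, query, page, pageSize) {
  const start = page * pageSize;
  return "indexes/" + indexName +
    "?query=" + encodeURIComponent(query) +
    "&start=" + start +
    "&pageSize=" + pageSize;
}

const secondPage = pagedQuery("entriesByAuthorAndCategory", "category:tech", 1, 10);
console.log(secondPage);
```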

Paging in CouchDB

In CouchDB, a similar query string can be used, with “skip” and “limit” parameters, but skipping is considered expensive. Instead, to perform paging you should:

  • Get the first collection of documents, limiting the request to the page size plus one
  • Get the next collection of documents, starting at the last document in the first collection (the extra one), again limiting the request to the page size plus one
  • Etc
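The steps above can be sketched against an in-memory stand-in for a view. This is only an illustration under stated assumptions: queryView and fetchPage are hypothetical names, the rows are already sorted, keys are plain strings, and every key is unique. Against a real view with duplicate keys you would also need to pass the document id of the extra row (CouchDB's startkey_docid parameter).

```javascript
// In-memory stand-in for a view query: returns up to `limit` rows whose
// key is >= startkey. Rows are assumed sorted by key, with unique keys.
function queryView(rows, startkey, limit) {
  const from = startkey === undefined
    ? 0
    : rows.findIndex(function (row) { return row.key >= startkey; });
  if (from === -1) { return []; }
  return rows.slice(from, from + limit);
}

// Fetches one page: asks for pageSize + 1 rows and keeps the key of the
// extra row as the starting point for the next page.
function fetchPage(rows, startkey, pageSize) {
  const result = queryView(rows, startkey, pageSize + 1);
  const hasMore = result.length > pageSize;
  return {
    rows: result.slice(0, pageSize),
    nextStartkey: hasMore ? result[pageSize].key : undefined
  };
}

const viewRows = ["a", "b", "c", "d", "e"].map(function (key) {
  return { key: key, id: "doc-" + key };
});

const first = fetchPage(viewRows, undefined, 2);
const second = fetchPage(viewRows, first.nextStartkey, 2);
const third = fetchPage(viewRows, second.nextStartkey, 2);

console.log(first, second, third);
```

The extra row is never shown to the user; it only tells you whether another page exists and where it begins.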

Summary

This really is just a whistle-stop tour of some basic functionality in these two systems, although it does highlight some fairly major differences between them.

Next up, some more advanced functionality will be covered, going over the differences between writing reduce functions in the two.



   


Posted on Wednesday, June 02, 2010 8:00 AM

Feedback

# re: RavenDB & CouchDB – Basic Queries

Left by Peter Curd at 6/2/2010 8:35 AM
Excellent summary! From this I can see that RavenDB makes more sense to me and supports notation I can understand.

Great introduction for document database newbies like me :)

What are the practical implications of having to index before you can select from a field? Does index maintenance become costly?

# re: RavenDB & CouchDB – Basic Queries

Left by robashton at 6/2/2010 8:39 AM
"Does index maintenance become costly?"

In RavenDB, indexing is strictly a background process, the idea being that they'll eventually become consistent (providing you aren't hammering updates in constantly).

This is why it's great for low-write, high-read scenarios, because reads just look at an already computed index and are incredibly cheap.

I wouldn't say it becomes *costly*, because of it being a background task, but sure - the more indexes you add, the more effort the server will have to make to keep them up to date as data is inserted/modified. That's when you look at scaling out by adding more servers and perhaps splitting your data/indexes out over those servers and using sharding strategies or whatever makes sense.

# re: RavenDB & CouchDB – Basic Queries

Left by Simone Chiaretta at 6/2/2010 11:09 AM
Indexes here are not like indexes in relational databases: they look to me a lot like stored procedures.
What happens if I want to do the same query without having the index? Will it just scan the whole document library looking for objects with the given properties?
Wouldn't it be possible to have the server itself understand the query performed and create the indexes automatically?

# re: RavenDB & CouchDB – Basic Queries

Left by robashton at 6/2/2010 11:14 AM
Very astute, indexes in these systems can be considered more like materialized views or indeed stored procedures.

Put simply, you don't perform the query without having the index - Couch does support temporary indexes but they're *slow* and the point of being up front about your querying needs is to make the act of reading data from the database *cheap*

Sure it would be possible to get the server to understand the query, and create the index and then query against that index - but that would be expensive and defeat the point of moving to a system like this :)

# re: RavenDB & CouchDB – Basic Queries

Left by Simone Chiaretta at 6/2/2010 11:44 AM
By "create the index automatically" I meant creating the index just the first time, and then using it in subsequent queries as if it had been defined up front. So, after a warm-up period the system would be as fast as with the manually created indexes.

# re: RavenDB & CouchDB – Basic Queries

Left by robashton at 6/2/2010 11:46 AM
Neither of the systems supports this.

It would be a bit hard to do it in a sensible way too.

# re: RavenDB & CouchDB – Basic Queries

Left by Simone Chiaretta at 6/2/2010 3:11 PM
Nothing is too hard for Ayende :)

# re: RavenDB & CouchDB – Basic Queries

Left by robashton at 6/2/2010 3:12 PM
I'm sure the discussion would be welcome on the mailing list!

http://groups.google.com/group/ravendb/topics

# re: RavenDB & CouchDB – Basic Queries

Left by Simone Chiaretta at 6/2/2010 3:46 PM
Moved the conversation over there

Copyright © Rob Ashton
