FredZvt.WriteLines();

No-SQL – MongoDB – From introduction to high level usage in C# with NoRM.


Click here to read the original version of this article in Portuguese.

First I would like to apologize in advance for my bad English. This is a translation of my original article in Portuguese. I will be grateful for any corrections.

In this article I’ll introduce the No-SQL movement, the document-oriented database MongoDB, and three ways to use it: via the shell, via the C# driver mongodb-csharp and via the NoRM library.

The goal of this article is to make the reader aware of what MongoDB is and of the basics of how to use it. To that end, the examples we will cover are intentionally simple; they do not try to explore the various ways to perform an operation on MongoDB, nor to demonstrate its full capacity.

Please keep in mind that I’m still studying the software and do not yet have practical experience with MongoDB in real projects; what I know comes from my private studies. Although I’ve made every effort to use reliable sources and to verify the information as of this writing, and I tested every code example cited here on my computer, I cannot give full guarantees. Constructive criticism and corrections are extremely welcome.

No-SQL

Today there are several databases that break with the requirements met by traditional relational database management systems. Their main characteristics are:

  • SQL is not used as the query API. (Examples of APIs and formats used instead include JSON, BSON, REST, RPC, etc.)
  • No guarantee of ACID transactions.
  • Distributed and horizontally scalable.
  • No predefined schemas.
  • Non-tabular data storage (e.g. key-value, object, graph, etc.).

Note: Not all databases categorized as No-SQL have all of these characteristics; some have one, others another, and some have all of them.

The community effort to develop and evangelize these new technological approaches was named No-SQL and has been making a lot of noise since early 2009. No-SQL should be understood as “Not only SQL”, although this interpretation is not very obvious. In my opinion, it would have made more sense to label it Non-Relational.

The most complete list I found of databases that can be considered part of No-SQL is at http://nosql-database.org/. When I wrote this article, it listed 47 databases distributed among different categories.

There is a nice introduction (in Portuguese) to No-SQL databases from the point of view of scalability on the blog Escalabilidade.

Here I focus on MongoDB which, along with CouchDB, has interested me the most since I started researching the subject.

No-SQL or relational databases!? Both!

Note that non-relational databases are not a complete replacement for relational databases. Ideally you should know the pros and cons of each approach and use the one best suited to your needs in each scenario you encounter.

I believe the most obvious scenario that draws criticism is report generation. These cases are handled very well by playing to the strengths of relational databases and Data Warehouse systems, which perform frequent aggregations of the production data stored in non-relational databases.

Rob Conery wrote an excellent article about this topic.

MongoDB

MongoDB is a schemaless, document-oriented, high-performance, scalable database that tries to offer the best of two worlds: key-value stores, with documents stored as JSON-structured documents, and relational database management systems, with features such as multiple kinds of indexes and dynamic queries.

Its name comes from “humongous” (something giant), perhaps with a sarcastic tone, since it is such a tiny yet powerful piece of software, or perhaps because it can scale up to become something really monstrous. Who knows…

Today it is used in production by at least 40 high-performance, high-traffic sites, including SourceForge, GitHub, EA and The New York Times.

MongoDB vs CouchDB

Although MongoDB and CouchDB differ in their implementation approaches in aspects such as versioning and concurrency control, scalability, query API, etc., and although both are open source projects, I still consider them competitors because they offer similar features. I believe MongoDB has a great advantage and will develop more quickly because it is written in C++ rather than in Erlang, as CouchDB is.

Installation and initialization of MongoDB server

MongoDB can be downloaded in precompiled versions for Linux, OS X and Windows. For the examples in this article I use version 1.4.0 compiled for 64-bit Windows, but I believe the 32-bit version behaves the same.

When the download finishes, unzip the file into any folder on your computer. Four directories will be created: bin, include, lib and lib64. The folders include, lib and lib64 are for those who will use MongoDB with C++; in our case what matters is the application mongod.exe inside the bin folder, which is the MongoDB server.

By default, MongoDB uses c:\data\db as the path for its database files, but this folder is not created automatically, so create it before running the server. If you prefer to store your database files in another folder, start the server with the parameter --dbpath [path]. I personally also like to use the parameter --directoryperdb, which makes MongoDB create a subfolder for each database. Finally, when you run mongod.exe, the server starts by default on localhost, port 27017.

Console with mongod.exe started and waiting for connections.

Next we’ll perform the four basic data manipulation operations (CRUD) and some simple queries on a test database in MongoDB, using the Mongo shell, C# with the mongodb-csharp driver, and the high-level NoRM library.

Using MongoDB with Mongo shell

The executable mongo.exe is the mongo shell. Run it without parameters and it will start connected to the server on its default port (27017).

Enter the command show dbs and Mongo will list all existing databases. By default there are admin, local and test. To change the database that will receive our commands, use the command use [database name]. A new database is created automatically when we add data to a database that doesn’t exist yet.

We’ll now insert our first document into a new database, effectively creating it. Enter the command use mybase; this tells the shell that we will send commands to the database mybase, even though it does not exist yet. You can confirm that it does not exist by running show dbs again. Now we’ll use the command db.[collection name].save( [JSON document] ) to create and save a new document; note that the document must be written in JavaScript Object Notation (JSON). Enter the command db.People.save({ name: "Fred", age: 28 }) and then the command db.People.save({ name: "Carlos", age: 30 }). Now we have a database named mybase with a collection named People and two JSON documents in this collection: one representing a person named Fred who is 28 years old and one representing a person named Carlos who is 30 years old.

There are two additional restrictions on JSON documents to be stored in MongoDB: key names cannot start with $ or contain . (dots) anywhere. Thus, documents such as { $test : 1 } or { "test.key" : "value" } are invalid.
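
The two key-name rules can be checked with a few lines of JavaScript (the shell's own language). This is only a sketch: findInvalidKeys is a hypothetical helper, not part of MongoDB or of any driver, and it inspects top-level keys only.

```javascript
// Hypothetical helper (not a MongoDB API): returns the top-level keys that
// break the two rules above, i.e. keys starting with "$" or containing ".".
function findInvalidKeys(doc) {
  return Object.keys(doc).filter(
    (key) => key.startsWith("$") || key.includes(".")
  );
}

console.log(findInvalidKeys({ name: "Fred", age: 28 }));   // []
console.log(findInvalidKeys({ $test: 1 }));                // [ '$test' ]
console.log(findInvalidKeys({ "test.key": "value" }));     // [ 'test.key' ]
```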

One of the most interesting features of some document-oriented No-SQL databases (including Mongo) is that documents in the same collection need not follow a predefined schema. In other words, a collection can hold documents whose structures are completely different. For those familiar with relational databases, where the structure (columns) of each record is dictated by the table that contains it, this is a huge paradigm shift. To illustrate, let’s add to our collection a document that is not structured like the two we inserted before. Enter the command db.People.save({ firstname: "Joaquim", lastname:"Silva", age: "15 years old" }). With this command we have just inserted into the People collection a document whose structure is completely different from the previous ones, and this is absolutely valid.
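
To make the idea of differently shaped documents more concrete, here is a small plain JavaScript sketch that runs without a MongoDB server. It holds the three documents in an ordinary array and counts how many of them carry each field; countFields and mixedDocs are names made up for this illustration.

```javascript
// The three documents from the text, kept in a plain array for illustration.
const mixedDocs = [
  { name: "Fred", age: 28 },
  { name: "Carlos", age: 30 },
  { firstname: "Joaquim", lastname: "Silva", age: "15 years old" },
];

// Counts, for each field name, how many documents contain that field.
function countFields(docs) {
  const counts = {};
  for (const doc of docs) {
    for (const key of Object.keys(doc)) {
      counts[key] = (counts[key] || 0) + 1;
    }
  }
  return counts;
}

console.log(countFields(mixedDocs));
// { name: 2, age: 3, firstname: 1, lastname: 1 }
```

Only age appears in all three documents, and even there the type differs (two numbers and one string), so code that reads such a collection has to be prepared for missing fields and mixed types.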

Now that we have data stored, it’s time to run some queries on it. In the shell, queries are made primarily with the command db.[collection name].find(...). Run without parameters, it lists all documents of a collection, so db.People.find() returns the three documents we just inserted. The find command can be used in many ways, allowing us to run practically any query we can imagine against our collections. This is a feature that makes MongoDB stand out among other No-SQL databases. To illustrate a little (very little indeed!) of this feature, we’ll run a query with a simple criterion: to return only the document that represents the person named Fred, we run the command db.People.find({ name: "Fred" }). The find command is one of many that take a JSON document as a criteria specification, so that Mongo applies the desired operation only to the documents that meet the criteria.
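
The "criteria document" idea can be sketched in plain JavaScript without a server. The code below is a toy model, not how MongoDB evaluates queries: matchesCriteria and findIn are hypothetical helpers that support only exact equality on top-level fields, which is enough to mirror the two find calls above.

```javascript
// Toy model only: a document matches when every field in the criteria
// document is present in it with exactly the same value.
function matchesCriteria(doc, criteria) {
  return Object.keys(criteria).every((key) => doc[key] === criteria[key]);
}

// With an empty criteria document every document matches, like find().
function findIn(docs, criteria = {}) {
  return docs.filter((doc) => matchesCriteria(doc, criteria));
}

const peopleDocs = [
  { name: "Fred", age: 28 },
  { name: "Carlos", age: 30 },
  { firstname: "Joaquim", lastname: "Silva", age: "15 years old" },
];

console.log(findIn(peopleDocs).length);             // 3
console.log(findIn(peopleDocs, { name: "Fred" }));  // [ { name: 'Fred', age: 28 } ]
```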

Note that the returned document has one property beyond those we declared explicitly: the _id property. Every document stored in a MongoDB collection must have a unique _id property. We can declare it explicitly when creating the document; otherwise MongoDB automatically creates a property called _id of type ObjectID for us. This property is the primary identifier of a document in MongoDB and is indexed by default in all collections. The identifier does not have to be of type ObjectID and may be of any type supported by JSON documents (BSON, actually), but using ObjectID is recommended to facilitate the scalability of MongoDB databases.

To delete documents from a collection, we use the command db.[collection name].remove([criteria JSON document]). So, to remove the document that represents the person named Carlos, we run db.People.remove({ name: "Carlos" }).

To update a document in the collection, we first get the desired document, modify it and save it back to the database, effectively updating it. The shell provides a very useful command for getting a single document from a collection: db.[collection name].findOne([criteria JSON document]). So, by running joaquim = db.People.findOne({ firstname: "Joaquim" }); we assign the document returned by the database to the variable joaquim. Now we can change it and save it back to the collection: run joaquim.lastname = "Souza"; to change his last name and db.People.save(joaquim); to update the document in the database.
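
The reason saving the modified document updates it instead of creating a duplicate is the _id property: a document without an _id is inserted, while a document whose _id already exists replaces the stored one. The sketch below models that behavior in memory. makeToyCollection is a made-up helper, and its numeric _id counter is a simplification standing in for the ObjectID that MongoDB would generate.

```javascript
// Toy in-memory model of save/findOne; not MongoDB's implementation.
function makeToyCollection() {
  const docs = [];
  let nextId = 1; // simplification: MongoDB would generate an ObjectID
  return {
    save(doc) {
      if (doc._id === undefined) {
        doc._id = nextId++;          // new document: assign an identifier
        docs.push({ ...doc });
        return;
      }
      const index = docs.findIndex((d) => d._id === doc._id);
      if (index >= 0) docs[index] = { ...doc }; // known _id: replace
      else docs.push({ ...doc });               // unknown _id: insert
    },
    findOne(criteria) {
      const found = docs.find((d) =>
        Object.keys(criteria).every((k) => d[k] === criteria[k])
      );
      return found ? { ...found } : null;
    },
    count() {
      return docs.length;
    },
  };
}

const toyPeople = makeToyCollection();
toyPeople.save({ firstname: "Joaquim", lastname: "Silva" });

const joaquimDoc = toyPeople.findOne({ firstname: "Joaquim" });
joaquimDoc.lastname = "Souza";
toyPeople.save(joaquimDoc);

console.log(toyPeople.count());                                    // 1
console.log(toyPeople.findOne({ firstname: "Joaquim" }).lastname); // Souza
```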

In my opinion, the shell is the main administration interface to MongoDB, although MongoDB is mostly used through a programming-language-specific driver. All the features of MongoDB can be accessed from the shell, so becoming intimate with this tool will greatly help you work with MongoDB databases. See the documentation for the shell on the official MongoDB site.

Using MongoDB with mongodb-csharp

Just as the shell is the interface between humans and MongoDB, the programming-language-specific drivers are the interfaces between software and MongoDB. Today there are drivers for 25 programming languages. For the .NET Framework alone there are 6 drivers/tools, according to an article by Bryan Migliorisi on his blog, and among these the MongoDB-csharp driver, developed by a team led by Sam Corder, is the most mature to date.

A small criticism of the project is its poor documentation. I think the project wiki should contain at least some examples of the major features, but it is possible to understand how the driver works by analyzing the test projects. I believe this will improve greatly over time, as the project is still very young.

In this article we will use version 0.82.2 of MongoDB-csharp to run our examples. Its API is very similar to the shell’s, which makes our lives even easier.

After unpacking the downloaded file, we’ll have three libraries: MongoDB.Driver.dll, MongoDB.GridFS.dll and MongoDB.Linq.dll. The one we must add as a reference in our project is MongoDB.Driver.dll. The other two are, respectively, an API supporting the GridFS specification, which allows storing binary objects over 4MB in MongoDB, and basic support for LINQ.

The first thing we must do in our code to use MongoDB is connect to the server:

using System;
using MongoDB.Driver;

// Create the client and open the connection to the MongoDB server.
Mongo mongo = new Mongo();
mongo.Connect();

Then we get a reference to the database and the collection that we want to handle:

Database db = mongo.GetDatabase("mybase");
IMongoCollection collection = db.GetCollection("People");

The class MongoDB.Driver.Document represents a MongoDB document and works much like a System.Collections.Generic.Dictionary<String, Object>:

Document doc = new Document();
doc.Add("name", "Guilherme");
doc.Add("age", 22);

To persist the new document in the referenced collection:

collection.Save(doc);

When we run the find command in the shell, what we get back is actually a cursor that the shell iterates automatically, although we could assign it to a variable and manipulate it manually. MongoDB-csharp uses the same principle, but here iterating over the values is always manual:

ICursor cur = collection.FindAll();
foreach (Document iDoc in cur.Documents)
{
    Console.WriteLine("Name: {0}, Age: {1}", iDoc["name"], iDoc["age"]);
}
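
The cursor behavior itself does not depend on the driver, so it can be sketched in JavaScript, the shell's language. In the toy below, toyCursor is a made-up generator that hands out documents one at a time. A real cursor fetches its documents from the server, which this sketch does not model; it only shows that a cursor is consumed by iterating over it.

```javascript
// Toy cursor: a generator that yields one document per iteration step.
function* toyCursor(docs) {
  for (const doc of docs) {
    yield doc;
  }
}

const cursorDocs = [
  { name: "Fred", age: 28 },
  { name: "Carlos", age: 30 },
];

const demoCursor = toyCursor(cursorDocs);
const printedLines = [];
for (const d of demoCursor) {
  printedLines.push(`Name: ${d.name}, Age: ${d.age}`);
}
console.log(printedLines.join("\n"));

// Iterating again yields nothing: this toy cursor is already exhausted.
console.log([...demoCursor].length); // 0
```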

To update a document, the process is, again, similar to the one in the shell: get a reference to a specific document, modify it and save it back to the collection:

Document doc2 = collection.FindOne(new Document() {{"firstname", "Joaquim"}});
doc2["lastname"] = "Silva";

collection.Save(doc2);

And, finally, to delete a document:

collection.Delete(doc2);
collection.Delete(new Document() {{ "name", "Guilherme" }});

Could it be simpler? The MongoDB-csharp driver provides APIs for almost all the features of MongoDB and, as we saw, is very easy to use.

Using MongoDB with NoRM

Since I met the NoRM library, using MongoDB-csharp to access a database leaves a bitter low-level taste in my mouth; it reminds me of the time when I used Recordsets to access databases in classic ASP.

The NoRM library provides a strongly typed, high-level abstraction for manipulating documents in MongoDB, as well as a friendly API for sending commands to the server.

NoRM, like MongoDB-csharp, is an open-source project hosted on GitHub. Andrew Theken is its leader and primary developer, and it has heavyweight contributors such as Rob Conery (the mind behind the SubSonic project). The project is so young that it doesn’t even have a release yet, but it is already attracting lots of attention!

Because there are no releases so far, we have to download the code and compile it on our machine. On the project’s GitHub page, click “Download Source” and then select the compression format you prefer. Save and unpack it in any folder on your computer, or clone the Git repository instead. Open the solution NoRM.sln, compile the NoRM project and add a reference to the generated DLL in your application; alternatively, add the project NoRM/NoRM.csproj to your solution and reference the project from your application.

The library works in a strongly typed manner, automatically translating your C# classes into BSON documents to store in MongoDB and vice versa. So let’s create the class Person:

class Person
{
    public ObjectId ID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
}

NoRM imposes certain conventions on our classes so that serialization/deserialization runs without problems. They are:

  • The supported types for serialization/deserialization (as of this writing) are:
    • int
    • bool
    • double
    • float
      • Will always be serialized/deserialized as double
    • DateTime
    • string
    • byte[]
    • Norm.ObjectId
    • Regex
    • Guid
    • Enums
      • The underlying type must be uint, int, ulong or long
      • (e.g. public enum meu_enum : int {...})

    • Norm.ScopedCode
    • IEnumerable<T>
      • T must be one of these supported types as well.
      • Will always be deserialized as List<T>
      • Update: As Andrew Theken informed me, it seems that other types of collections are now supported.

  • It’s recommended that, for value types, we use their nullable versions, because when the document being deserialized lacks a property of your C# class, that property is assigned null. (E.g. use int? instead of int.)
  • Every C# class must have a public property called _id or ID, of one of the supported types; if we use Guid or Norm.ObjectId, its initialization will be automatic.
    • If we want to use another property as our identifier, we can use the [MongoIdentifier] attribute, which removes the requirement to create the property _id or ID.
  • All public properties must have a public getter and setter. Update: As Andrew Theken informed me, Karl Seguin’s work has made private/protected getters/setters available.
  • NoRM does not yet detect cyclic references, leaving it to us, the clients, to take care of this. Be warned: if one occurs, it will cause an infinite loop at runtime.
  • NoRM does not yet detect attempts to deserialize a value into a property of an incompatible type. (E.g. a type conversion error will occur if you try to deserialize a BSON integer value into a DateTime property.)
  • The size of a document cannot be greater than 4MB. (In fact, this is a limitation of MongoDB, not of NoRM. If you must store a binary larger than 4MB, use the GridFS specification.)
  • The precision of the DateTime type is milliseconds. If you need more precision, use doubles instead.
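
Since cyclic references are left for the client to handle, one option is to check an object graph before handing it to a serializer. The sketch below shows the idea in JavaScript. hasCycle is a hypothetical helper, not part of NoRM, and the parent/child objects are invented for the example.

```javascript
// Hypothetical pre-check (not part of NoRM): returns true when an object
// graph contains a reference back to one of its own ancestors.
function hasCycle(value, ancestors = new Set()) {
  if (value === null || typeof value !== "object") return false;
  if (ancestors.has(value)) return true;
  ancestors.add(value);
  for (const child of Object.values(value)) {
    if (hasCycle(child, ancestors)) return true;
  }
  // Leaving this branch: the same object may legitimately appear elsewhere.
  ancestors.delete(value);
  return false;
}

const parentDoc = { name: "Bruno", children: [] };
const childDoc = { name: "Ana", parent: null };
parentDoc.children.push(childDoc);
console.log(hasCycle(parentDoc)); // false

childDoc.parent = parentDoc;      // child now points back to its parent
console.log(hasCycle(parentDoc)); // true
```

Tracking only the current chain of ancestors, rather than every object visited, means an object that is merely shared by two properties is not reported as a cycle.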

I know that at first glance there seem to be many restrictions but, in my opinion, for a project whose first commit was on January 30, 2010, it’s quite an achievement. Remember that this library is open source, so if you miss a feature that is not implemented you can submit a patch. Besides, everything I need to implement my domain entities as plain POCOs is already offered by the library.

With everything in place, let’s open a connection to the MongoDB server and get a reference to our collection:

using System;
using System.Collections.Generic;
using Norm;
using Norm.Collections;

Mongo mongo = new Mongo("mybase");
MongoCollection<Person> Collection = mongo.GetCollection<Person>("People");

To insert a new person in our collection we just need to create an instance of Person and save it in our collection:

Person Bruno = new Person();
Bruno.Name = "Bruno";
Bruno.Age = 35;

Collection.Save(Bruno);

To update a document, we first get the document from the collection, this time in a strongly typed way. To specify the criteria we'll use an anonymous type with a single property called Name. Then we'll update our instance of Person and save it back to the collection:

Person Bruno2Update = Collection.FindOne(new { Name = "Bruno" });
Bruno2Update.Age = 40;
Bruno2Update.Name = "Bruno Garcia";

Collection.Save(Bruno2Update);

To neatly list the people in the collection:

IEnumerable<Person> People = Collection.Find();
foreach (Person p in People)
{
    Console.WriteLine("Name: {0}, Age: {1}", p.Name, p.Age);
}

And, finally, it could not be simpler to delete a person from the collection:

Collection.Delete(Bruno2Update);

Updates after the publication of this article

The wide acceptance of this article was a big surprise for me, and I would like to thank everyone who gave me some kind of feedback.

I would like to add some notes and corrections sent in by readers:

First of all, I want to clarify that in this article I'm not really giving preference to either the mongodb-csharp driver or the NoRM library. To me, it is clear that the two libraries aim at different things. While mongodb-csharp sets out to provide an API closer to the shell's, giving us greater flexibility, NoRM sets out to provide greater productivity with an approach further from the shell and closer to the developer's everyday reality: a direct, strongly typed mapping of C# classes, at the cost of some flexibility.

I was happy to be informed by Craig Wilson that soon we'll have a release of mongodb-csharp that brings feature parity with NoRM plus a little extra.

If the mongodb-csharp project now intends to offer all the features offered by NoRM and vice versa, I would at first give preference to mongodb-csharp, since I consider it the more mature project. But in that case, I don't understand why the two teams don't come together to develop a single library, taking the best of both worlds.

Finally, note that the article body may be updated to better inform readers as I receive new information. Where there is a fix, I'll strike through the text and add an inline comment.

Written by Frederico Zveiter

24 Apr - 2010 at 16:26

Posted in MongoDB, No-SQL, NoRM

14 Responses

  2. Great writeup – one thing you missed about NoRM is the LINQ stuff. You can create a LINQ query by…

    var query = new MongoQuery();
    from p in query where p.Name == "Steve" select p;
    :)

    Rob Conery

    24 Apr - 2010 at 18:11

    • Hi Rob! Thanks for the comment, it’s a great honor!

      I intentionally did not go into LINQ support, in keeping with the objective of this article, which is just to give an introduction to the subject. I’m preparing a whole article dedicated to the ways of running queries on MongoDB and then, yes, I will cover LINQ.

      fredzvt

      25 Apr - 2010 at 12:31

  3. Thanks for the post! I should mention – we’ve had many awesome contributions/contributors on NoRM, so Rob and myself are hardly the only two. The restrictions on how the class needs to be defined are mostly non-existent now – Karl Seguin’s work has made private/protected getters/setters, non nullable types, other types of collections possible. The problem with pointing out the contributors of this project is that they have all made excellent contributions.

    Andrew Theken

    24 Apr - 2010 at 18:52

    • Hi Andrew! Thanks for the comment, it’s a great honor too! :)

      I apologize to the other contributors; I didn’t mean to leave them out of the credits. I just wanted to convey to the reader the caliber of this project’s contributors by citing you and Rob Conery. But of course, without the contributions of everyone involved the project would not be the great success it is.

      That’s great news that the restrictions are being reduced; in fact I expected that. Please let me know which restrictions are outdated so I can update the article.

      fredzvt

      25 Apr - 2010 at 12:39

  7. Thanks for the info. Any plans to describe how atomic operations are handled?

    Hadi

    29 Apr - 2010 at 03:43

    • Hi Hadi, thanks for the comment.
      I’m preparing an article about ways to query MongoDB, and atomic operations may be one of the topics, but unfortunately I have been very busy lately and do not know when I’ll finish it. I hope it does not take long.

      fredzvt

      30 Apr - 2010 at 08:41

  10. Nice article.. got me started with MongoDb c# Samus… How do I do indexing with this driver?

    – Lalith

    Lalith

    10 Dec - 2010 at 18:03

  11. There is a nice collection of tips for translating SQL style queries into queries that MongoDB can understand. It helps a lot especially if you are new to NoSQL: sql to mongo db

    arubyguy

    3 Mar - 2011 at 07:03

