January 12th, 2013

MongoDB Database Design Research


MongoDB is a shared-nothing, document-oriented database management system. Unlike relational database systems, which store all data in tables defined by a schema, MongoDB stores data in schema-less JSON documents. Users query the database through procedural API calls rather than a declarative language such as SQL.
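
As a rough illustration of that difference, the sketch below models a collection as a list of Python dictionaries with differing keys and filters it with a function call instead of a SQL string. The `find` helper and the sample documents are hypothetical stand-ins written for this page; they only mimic the style of a query-by-example call and are not MongoDB's implementation.

```python
# Hypothetical in-memory stand-in for a collection: documents are plain
# dictionaries, and no two of them need to share the same set of keys.
users = [
    {"_id": 1, "name": "alice", "city": "Providence"},
    {"_id": 2, "name": "bob"},  # no "city" key
    {"_id": 3, "name": "carol", "city": "Boston", "tags": ["admin"]},
]


def find(collection, query):
    """Return documents whose fields equal every key/value pair in query.

    A document that lacks a queried key simply does not match.
    """
    missing = object()  # sentinel so that a stored None is still comparable
    return [
        doc for doc in collection
        if all(doc.get(key, missing) == value for key, value in query.items())
    ]


matches = find(users, {"city": "Providence"})
print([doc["name"] for doc in matches])
```

The query is an ordinary data structure passed to a function, which is the sense in which the interface is procedural rather than declarative.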

The Brown Data Management Group and 10gen (developers of MongoDB) are developing an automatic database design tool that can create the optimal physical layout in a MongoDB cluster for an arbitrary database. This tool would be able to:

  1. Select the optimal shard keys to horizontally partition data.
  2. Select indexes for important keys that are not used as shard keys.
  3. Optionally generate a normalization/denormalization scheme for JSON structures (e.g., decide to embed one document inside another).
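
To make the third item concrete, here is a minimal sketch of what one denormalization step could look like, assuming two collections held in memory as lists of dictionaries. The `embed_children` function, the `order_id` reference key, and the sample data are all hypothetical; this is not the design tool's algorithm, only an example of the kind of layout change it would choose between.

```python
import copy


def embed_children(parents, children, foreign_key, target_field):
    """Return copies of parents with their matching children embedded.

    A child refers to its parent through child[foreign_key] == parent["_id"].
    Children without the foreign key are skipped.
    """
    grouped = {}
    for child in children:
        if foreign_key not in child:
            continue
        # Drop the reference key: it is redundant once the child is embedded.
        embedded = {k: v for k, v in child.items() if k != foreign_key}
        grouped.setdefault(child[foreign_key], []).append(embedded)

    result = []
    for parent in parents:
        doc = copy.deepcopy(parent)  # leave the normalized input untouched
        doc[target_field] = grouped.get(parent["_id"], [])
        result.append(doc)
    return result


# Normalized layout: orders and their line items live in separate collections.
orders = [{"_id": 10, "customer": "alice"}, {"_id": 11, "customer": "bob"}]
items = [
    {"_id": 100, "order_id": 10, "sku": "A1", "qty": 2},
    {"_id": 101, "order_id": 10, "sku": "B7", "qty": 1},
    {"_id": 102, "sku": "C3", "qty": 5},  # no order_id: cannot be embedded
]

denormalized = embed_children(orders, items, "order_id", "items")
print(denormalized[0])
```

Embedding lets a single read return an order together with its items, at the cost of larger documents and duplicated data if a child belongs to more than one parent.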

Because there is no fixed schema for documents in MongoDB, the design tool must also account for keys that may not exist in all documents in a collection.
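
One simple way to reason about such keys is to measure how often each one appears in a sample of documents. The sketch below does this for top-level keys only; the `key_presence` function and the sample data are hypothetical, and treating partially present keys as weaker candidates is an assumption made for illustration, not a description of the tool.

```python
from collections import Counter


def key_presence(documents):
    """Map each top-level key to the fraction of documents that contain it."""
    if not documents:
        return {}
    counts = Counter(key for doc in documents for key in doc)
    total = len(documents)
    return {key: count / total for key, count in counts.items()}


sample = [
    {"_id": 1, "user": "alice", "region": "us-east"},
    {"_id": 2, "user": "bob"},
    {"_id": 3, "user": "carol", "region": "eu-west"},
    {"_id": 4, "user": "dave", "region": "us-east"},
]

presence = key_presence(sample)
# Assumption: keys missing from some documents deserve extra scrutiny
# before being proposed as shard keys or index keys.
sparse_keys = sorted(k for k, frac in presence.items() if frac < 1.0)
print(presence["region"], sparse_keys)
```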




Source Code

All of the source code for the project is available on GitHub.


This work is supported (in part) by an Amazon AWS Research Grant.