January 12th, 2013

MongoDB Database Design Research

Summary

MongoDB is a shared-nothing, document-oriented database management system. Unlike relational database systems, which store all data in tables defined by a schema, MongoDB stores data as schema-less, JSON-like documents. Users query the database through procedural API calls rather than a declarative language such as SQL.
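
To make the contrast concrete, the following minimal Python sketch models a collection as a list of dictionaries with differing keys and queries it through a function call instead of a SQL string. The `find` helper and the sample data are hypothetical and exist only for illustration; this is not MongoDB's driver API, and it mimics nothing beyond simple equality matching.

```python
# Toy in-memory stand-in for a document collection (NOT the MongoDB driver).
# Documents in the same collection need not share the same keys.
users = [
    {"_id": 1, "name": "alice", "city": "Providence"},
    {"_id": 2, "name": "bob"},  # no "city" key at all
    {"_id": 3, "name": "carol", "city": "Providence", "tags": ["admin"]},
]


def find(collection, query):
    """Return documents whose fields equal every key/value pair in `query`.

    A document that lacks a queried key simply does not match.
    """
    missing = object()  # sentinel so a stored None is not confused with "absent"
    return [
        doc for doc in collection
        if all(doc.get(key, missing) == value for key, value in query.items())
    ]


# Relational equivalent, for comparison:
#   SELECT * FROM users WHERE city = 'Providence';
matches = find(users, {"city": "Providence"})
print([doc["name"] for doc in matches])
```

The query is an ordinary data structure passed to a function, and the document without a `city` key is skipped without any schema error.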

The Brown Data Management Group and 10gen (developers of MongoDB) are developing an automatic database design tool that can create the optimal physical layout in a MongoDB cluster for an arbitrary database. This tool would be able to:

  1. Select the optimal shard keys to horizontally partition data.
  2. Select indexes for important, non-sharded keys.
  3. Optionally generate a normalization/denormalization scheme for JSON structures (e.g., decide whether to embed one document inside another).
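
As a rough illustration of the first task, the sketch below scores candidate shard keys by how evenly they would spread documents across shards. It assumes hash-based placement and uses a single balance metric; the function names, the sample data, and the metric itself are hypothetical and are not taken from the design tool, which may weigh very different factors (for example, the query workload).

```python
import hashlib
from collections import Counter


def shard_of(value, num_shards):
    """Map a key value to a shard index using a deterministic hash (SHA-256 here)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def largest_shard_fraction(docs, key, num_shards):
    """Fraction of documents on the fullest shard if `key` were the shard key.

    1 / num_shards means perfectly balanced; 1.0 means every document lands
    on a single shard. Documents missing `key` are grouped under None.
    """
    if not docs:
        return 0.0
    counts = Counter(shard_of(doc.get(key), num_shards) for doc in docs)
    return max(counts.values()) / len(docs)


# Hypothetical sample data: unique order ids, but only two distinct statuses.
orders = [
    {"order_id": i, "status": "open" if i % 2 == 0 else "closed"}
    for i in range(1000)
]

for candidate in ("order_id", "status"):
    fraction = largest_shard_fraction(orders, candidate, num_shards=4)
    print(f"{candidate}: fullest shard holds {fraction:.0%} of documents")
```

A key with few distinct values, such as `status` here, can never occupy more shards than it has values, so at least half of the documents end up on one shard no matter how many shards the cluster has.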

Because there is no fixed schema for documents in MongoDB, the design tool must also account for keys that may not exist in all documents for a collection.
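
One simple way to quantify this, sketched below, is to measure what fraction of a collection's documents contain each top-level key. The `key_coverage` helper and the sample documents are hypothetical; how the actual tool accounts for missing keys is not described here.

```python
from collections import Counter


def key_coverage(docs):
    """Return {key: fraction of documents containing that top-level key}."""
    if not docs:
        return {}
    counts = Counter(key for doc in docs for key in doc)
    return {key: n / len(docs) for key, n in counts.items()}


# Hypothetical sample collection with optional fields.
accounts = [
    {"_id": 1, "email": "a@example.com"},
    {"_id": 2},
    {"_id": 3, "email": "c@example.com", "phone": "555-0100"},
    {"_id": 4},
]

coverage = key_coverage(accounts)
for key in sorted(coverage):
    print(f"{key}: present in {coverage[key]:.0%} of documents")
```

A key with low coverage is a weak candidate for a shard key or index under this view, since many documents would have no value for it.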

People

Brown:

10gen:

Source Code

All of the source code for the project is available on GitHub.

Acknowledgements

This work is supported (in part) by an Amazon AWS Research Grant.