MongoDB Best Practices

Author: Peter

The following are best practices to follow when designing and creating NoSQL databases such as MongoDB. For further reading, see the References section.

Try to keep data all in one Document
Avoid large documents
Don’t store binary data in documents
Have shorter field names
Normalize data and avoid complex structures
Use indexing
Keep queries in mind when creating structures
Data Integrity

Keeping data in a Single Document

Part of the power of MongoDB comes from keeping all related data in a single document where possible. This enables faster searching and fetching, because a single read from one collection returns everything needed, with no follow-up queries against other collections.
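As a rough illustration of the single-document approach, the sketch below models a user with embedded addresses as one Python dictionary, which is the shape a driver such as pymongo would store as a single document. The user, the field names, and the helper cities_for are all hypothetical and exist only for this example.

```python
# Hypothetical user document: everything needed to render a profile
# lives in one document, so one lookup by _id returns all of it.
user = {
    "_id": "user-1001",
    "name": "Ada Example",
    "email": "ada@example.com",
    "addresses": [
        {"type": "home", "city": "Denver", "zip": "80202"},
        {"type": "work", "city": "Boulder", "zip": "80301"},
    ],
}


def cities_for(document):
    """Read embedded data directly; no second collection is consulted."""
    return [address["city"] for address in document["addresses"]]


print(cities_for(user))
```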

Avoid Large Documents

From the documentation: “MongoDB imposes a 4MB (16MB with 1.8) size limit on a single document.” In practice this means 16 MB on any current release. While that may seem small, it is enough to hold most values and data as needed. With that limit in mind, we do not want to store large binaries in the document itself.

Avoid Storing Large Binaries

As a best practice, store references or links to larger files or binaries in the database rather than the files themselves. This helps prevent database bloat and reduces the risk of data corruption. Files and other such data should be stored on a NAS or another large-data solution, such as S3 on AWS.
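A minimal sketch of the reference approach follows, assuming the file has already been uploaded elsewhere. The helper file_reference, its field names, and the s3://example-bucket path are hypothetical; only the metadata dictionary would be saved in the database.

```python
import hashlib


def file_reference(data: bytes, location: str, content_type: str) -> dict:
    """Build the small metadata document stored in the database.

    The bytes themselves are assumed to have been uploaded to `location`
    (for example an S3 or NAS path) by some other part of the system.
    """
    return {
        "location": location,
        "content_type": content_type,
        "size_bytes": len(data),
        # The checksum lets the application detect a corrupted or swapped file.
        "sha256": hashlib.sha256(data).hexdigest(),
    }


payload = b"example file contents"
doc = file_reference(
    payload, "s3://example-bucket/reports/report.pdf", "application/pdf"
)
print(doc["location"], doc["size_bytes"])
```

The document stays a few hundred bytes no matter how large the file is, and the checksum gives the application a way to verify the file it later fetches.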

Avoid Large Field Names

For the sake of readability and easier query writing, keep field names short where possible. Large documents with many fields benefit the most from this.
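Beyond readability, short names also save space, because BSON stores every field name inside every document. The sketch below estimates that overhead for two hypothetical versions of the same record; field_name_bytes is an illustrative helper, not a driver function, and it only counts top-level names.

```python
def field_name_bytes(document: dict) -> int:
    """Bytes spent on top-level field names, counting the trailing NUL
    byte that BSON appends to each name."""
    return sum(len(name.encode("utf-8")) + 1 for name in document)


# Two hypothetical layouts holding the same values.
long_names = {
    "customer_first_name": "Ada",
    "customer_last_name": "Example",
    "customer_email_address": "ada@example.com",
}
short_names = {
    "first": "Ada",
    "last": "Example",
    "email": "ada@example.com",
}

saved_per_document = field_name_bytes(long_names) - field_name_bytes(short_names)
print(saved_per_document)
```

The saving is small per document but is repeated in every document of the collection.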

Normalize Data in Documents and Avoid Complex Structures

When a document gets large and needs to store many values within it, it is good to split that data off into another collection. When this happens, we need to flatten and normalize the data as best we can. While creating nested and complex data structures within a document is possible, queries on deeply nested data are harder to write and maintain and can cause issues. Instead of storing a large complex structure, we can store an ObjectId in its place and load and query the referenced data only when needed. This way the application decides when to load the related data.
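The split described above can be sketched in plain Python. The order layout, the helper split_order, and the item_ids field are assumptions for illustration, and uuid4 stands in for the ObjectId a MongoDB driver would normally generate.

```python
import uuid


def split_order(order: dict) -> tuple:
    """Move the embedded line items out of an order.

    Returns the slimmed-down order, which keeps only the item ids, and the
    item documents destined for their own collection.
    """
    items = []
    for item in order["items"]:
        # uuid4 is a stand-in for a driver-generated ObjectId.
        items.append({"_id": uuid.uuid4().hex, "order_id": order["_id"], **item})
    slim_order = {key: value for key, value in order.items() if key != "items"}
    slim_order["item_ids"] = [item["_id"] for item in items]
    return slim_order, items


order = {
    "_id": "order-1",
    "customer": "Ada",
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}
slim_order, item_docs = split_order(order)
print(slim_order["item_ids"])
```

The order document now holds only ids, and the application fetches the item documents by those ids when it actually needs them.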

Make use of Indexing

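As in a SQL database, we want to index unique data when possible. This way, large queries can use an index lookup, which is roughly O(log n) for MongoDB's B-tree indexes, instead of an O(n) scan of the whole collection. In MongoDB, indexes other than the default one on _id are not created automatically, so we need to define them as we build our collections.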
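One way to keep index definitions explicit is to list them in one place and apply them at startup. The sketch below assumes a pymongo-style collection object whose create_index method accepts a list of (field, direction) pairs plus a unique keyword; the field names, the INDEXES list, and ensure_indexes are hypothetical.

```python
# Index definitions kept in one place. Field names are hypothetical;
# 1 means ascending (pymongo.ASCENDING has the same value).
INDEXES = [
    {"keys": [("email", 1)], "unique": True},
    {"keys": [("last_name", 1), ("first_name", 1)], "unique": False},
]


def ensure_indexes(collection):
    """Create each index on a pymongo-style collection and return
    whatever create_index reports for each one (pymongo returns the
    index name)."""
    return [
        collection.create_index(spec["keys"], unique=spec["unique"])
        for spec in INDEXES
    ]
```

Against a real deployment this might be called with a collection obtained from a pymongo MongoClient; that call is left out here because it needs a live server.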

Keep queries in mind when creating structures

As we build forward, we want to keep in mind the queries we will be running. By applying these practices and looking at which queries need to be run, we can reduce tech debt and latency when querying large datasets in the future.

Data Integrity

The practice of Data Integrity is a foundation of database design. There are 3 parts to this practice: WHO is accessing the data, WHAT can access the data, and WHERE the data is located.

WHO

The first part is the practice of recording transactions within the data itself. This can be done with “created_by”, “updated_by”, and timestamp fields, which should go into each document.
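A minimal sketch of stamping these fields follows. The created_by and updated_by names come from the text above; the timestamp field names created_at and updated_at, the helper stamp, and the user names are assumptions.

```python
from datetime import datetime, timezone


def stamp(document: dict, user: str, now=None) -> dict:
    """Return a copy of `document` with audit fields filled in.

    created_* is set only the first time; updated_* is refreshed on
    every call.
    """
    now = now or datetime.now(timezone.utc)
    stamped = dict(document)
    stamped.setdefault("created_by", user)
    stamped.setdefault("created_at", now)
    stamped["updated_by"] = user
    stamped["updated_at"] = now
    return stamped


first = stamp({"name": "widget"}, user="alice")
second = stamp(first, user="bob")
print(second["created_by"], second["updated_by"])
```

Routing every write through one helper like this keeps the audit fields consistent across collections.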

WHAT

WHAT refers to which services have access to the data. Depending on the deployment, there should be limits and roles governing what can access this data. Limiting the number of services and users that can connect directly to the database helps limit vulnerabilities.
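As one possible way to express such roles, the sketch below builds documents in the shape of MongoDB's createUser database command, using the built-in read and readWrite roles. The service names, the placeholder password, and the database name app are hypothetical.

```python
def create_user_command(username: str, password: str, role: str, database: str) -> dict:
    """Build a document for MongoDB's createUser database command.

    The command name is the first key, which is where the server
    expects it.
    """
    return {
        "createUser": username,
        "pwd": password,
        "roles": [{"role": role, "db": database}],
    }


# Hypothetical services: reporting only reads, the API reads and writes.
commands = [
    create_user_command("reporting_service", "change-me", "read", "app"),
    create_user_command("api_service", "change-me", "readWrite", "app"),
]
for command in commands:
    print(command["createUser"], command["roles"])
```

Each document could then be sent to the server by an administrator, for example through a driver's command method; real passwords would come from a secret store rather than source code.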

WHERE

WHERE is where the data is located. For development and deployment, there should be at least 3 levels of deployment.

PRD: Production is live, user-created data. There should be a protocol in place for updating, migrating, backing up and deploying this data.

STG/TST: Staging and Testing. This database should be a mirror of production, meaning its data is as close to the production database as we can get. This is where new releases and features are set up to run before they are released to production. The typical flow is that when PROD is backed up, the backup is then loaded into staging and testing. This way we can ensure QA/QC on new features and releases as needed.

DEV/TMP: Development/temporary environments are the wild west. Each developer should have a local database on their machine and can dump and reload data as needed. They should have access to backups and staging so they can run up-to-date data on their platforms as needed.