JSON Schema Validation & Expressive Query Syntax in MongoDB 3.6


Key Takeaways

  • MongoDB 3.6 enhances data integrity and control by introducing JSON Schema Validation, a more robust and flexible method than the previous Document Validation, enabling developers to enforce schemas within arrays and restrict unapproved attributes.
  • JSON Schema Validation in MongoDB 3.6 supports the $jsonSchema operator, allowing detailed attribute specifications such as type, requirement status, and additional properties, thus preventing insertion of documents with misspelled or unauthorized attributes.
  • The new expressive query syntax in MongoDB 3.6 brings the power of aggregation expressions to query language, facilitating complex business validation rules and dynamic data checks that enhance data accuracy and integrity.
  • MongoDB 3.6’s JSON Schema Validation allows the enforcement of complex structures within documents, such as ensuring that arrays like ‘lineitems’ in orders contain required fields and adhere to specified data types and constraints.
  • The combination of JSON Schema Validation and expressive query syntax in MongoDB 3.6 moves significant portions of business logic from the application layer to the database layer, simplifying code bases and potentially reducing bugs and inconsistencies in data handling.

This article was originally published on MongoDB. Thank you for supporting the partners who make SitePoint possible.

One of MongoDB’s key strengths has always been developer empowerment: by relying on a flexible schema architecture, MongoDB makes it easier and faster for applications to move through the development stages from proof-of-concept to production and iterate over update cycles as requirements evolve.

However, as applications mature and scale, they tend to reach a stable stage where frequent schema changes are no longer critical or must be rolled out in a more controlled fashion, to prevent undesirable data from being inserted into the database. These controls are especially important when multiple applications write into the same database, or when analytics processes rely on predefined data structures to be accurate and useful.

MongoDB 3.2 introduced Document Validation, one of the features that developers and DBAs accustomed to relational databases had long been demanding. As MongoDB’s CTO, Eliot Horowitz, highlighted in Document Validation and What Dynamic Schemas Means:

Along with the rest of the 3.2 “schema when you need it” features, document validation gives MongoDB a new, powerful way to keep data clean. These are definitely not the final set of tools we will provide, but is rather an important step in how MongoDB handles schema.

Announcing JSON Schema Validation Support

Building upon MongoDB 3.2’s Document Validation functionality, MongoDB 3.6 introduces a more powerful way of enforcing schemas in the database, with its support of JSON Schema Validation, a specification which is part of IETF’s emerging JSON Schema standard.

JSON Schema Validation extends Document Validation in many different ways, including the ability to enforce schemas inside arrays and prevent unapproved attributes from being added. These are the new features we will focus on in this blog post, as well as the ability to build business validation rules.

Starting with MongoDB 3.6, JSON Schema is the recommended way of enforcing Schema Validation. The next section highlights the features and benefits of using JSON Schema Validation.

Switching from Document Validation to JSON Schema Validation

We will start by creating an orders collection (based on an example we published in the Document Validation tutorial blog post):

db.createCollection("orders", {
  validator: {
    item: { $type: "string" },
    price: { $type: "decimal" }
  }
});

With this document validation configuration, we not only make sure that both the item and price attributes are present in any order document, but also that item is a string and price a decimal (which is the recommended type for all currency and percentage values). Therefore, the following document cannot be inserted (because its price value is the string "rogue", not a decimal):

db.orders.insert({
    "_id": 6666, 
    "item": "jkl", 
    "price": "rogue",
    "quantity": 1 });

However, the following document could be inserted (notice the misspelled “pryce” attribute):

db.orders.insert({
    "_id": 6667, 
    "item": "jkl", 
    "price": NumberDecimal("15.5"),
    "pryce": "rogue" });

Prior to MongoDB 3.6, you could not prevent the addition of misspelled or unauthorized attributes. Let’s see how JSON Schema Validation can prevent this behavior. To do so, we will use a new operator, $jsonSchema:

db.runCommand({
  collMod: "orders",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["item", "price"],
      properties: {

       item: {
            bsonType: "string"
       },
       price: {
          bsonType: "decimal"
        }
      }
    }
  }
});

The JSON Schema above is the exact equivalent of the document validation rule we previously set on the orders collection. Let’s check that our schema has indeed been updated to use the new $jsonSchema operator by calling the db.getCollectionInfos() method in the Mongo shell:

db.getCollectionInfos({name:"orders"})

This command prints out a wealth of information about the orders collection. For the sake of readability, below is the section that includes the JSON Schema:

...
"options" : {
    "validator" : {
        "$jsonSchema" : {
            "bsonType" : "object",
            "required" : [
                "item",
                "price"
            ],
            "properties" : {
                "item" : {
                    "bsonType" : "string"
                },
                "price" : {
                    "bsonType" : "decimal"
                }
            }
        }
    },
    "validationLevel" : "strict",
    "validationAction" : "error"
}
...

Now, let’s enrich our JSON schema a bit to make better use of its powerful features:

db.runCommand({
  collMod: "orders",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      additionalProperties: false,
      required: ["item", "price"],
      properties: {
       _id: {},
       item: {
            bsonType: "string",
            description: "'item' must be a string and is required"
        },
        price: {
          bsonType: "decimal",
          description: "'price' must be a decimal and is required"
        },
        quantity: {
          <strong>bsonType: ["int", "long"]</strong>,
          minimum: 1,
          maximum: 100,
          exclusiveMaximum: true,
          description:
            "'quantity' must be short or long integer between 1 and 99"
        }
      }
    }
  }
});

Let’s go through the additions we made to our schema:

  • First, note the use of the additionalProperties: false attribute: it prevents us from adding any attribute other than those mentioned in the properties section. For example, it is no longer possible to insert data containing a misspelled pryce attribute (the sketch after this list shows the error returned in that case). As a result, using additionalProperties: false at the root level of the document also makes the declaration of the _id property mandatory: whether or not our insert code explicitly sets it, it is a field MongoDB requires and would automatically create if not present. Thus, we must include it explicitly in the properties section of our schema.
  • Second, we have chosen to declare the quantity attribute as either a short or long integer between 1 and 99 (using the minimum, maximum and exclusiveMaximum attributes). Of course, because our schema only allows integers lower than 100, we could simply have set the bsonType property to int. But adding long as a valid type makes application code more flexible, especially if there are plans to lift the maximum restriction later.
  • Finally, note that the description attribute (present in the item, price, and quantity attribute declarations) is entirely optional and has no effect on the schema aside from documenting the schema for the reader.
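For illustration, here is roughly what the shell reports when an insert violates the schema above, for example because of the misspelled pryce attribute (the exact output may vary slightly across shell and server versions):

db.orders.insert({
    "item": "jkl",
    "price": NumberDecimal("15.50"),
    "pryce": NumberDecimal("15.50")
});

// The insert is rejected with a write error similar to:
// WriteResult({
//   "nInserted" : 0,
//   "writeError" : {
//     "code" : 121,
//     "errmsg" : "Document failed validation"
//   }
// })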

With the schema above, the following documents can be inserted into our orders collection:

db.orders.insert({ 
    "item": "jkl", 
    "price": NumberDecimal(15.50),
    "quantity": NumberInt(99)
  });

  db.orders.insert({ 
    "item": "jklm", 
    "price": NumberDecimal(15.50),
    "quantity": NumberLong(99)
  });

However, the following documents are no longer considered valid:

db.orders.insert({ 
    "item": "jkl", 
    "price": NumberDecimal(15.50),
    <strong>"quantity": NumberInt(100)</strong>
  });
  db.orders.insert({ 
    "item": "jkl", 
    "price": NumberDecimal(15.50),
    <strong>"quantity": "98"</strong>
  });
  db.orders.insert({ 
    "item": "jkl", 
    <strong>"pryce": NumberDecimal(15.50),</strong>
    "quantity": NumberInt(99)
  });

You probably noticed that the orders above look odd: they each contain only a single item. More realistically, an order consists of multiple items, and a possible JSON structure might be as follows:

{
    _id: 10000,
    total: NumberDecimal(141),
    VAT: 0.20,
    totalWithVAT: NumberDecimal(169),
    lineitems: [
        {
            sku: "MDBTS001",
            name: "MongoDB Stitch T-shirt",
            quantity: NumberInt(10),
            unit_price:NumberDecimal(9)
        },
        {
            sku: "MDBTS002",
            quantity: NumberInt(5),
            unit_price: NumberDecimal(10)
        }
    ]
}

With MongoDB 3.6, we can now control the structure of the lineitems array, for instance with the following JSON Schema:

db.runCommand({
    collMod: "orders",
    validator: {
      $jsonSchema: {
        bsonType: "object",       
        required: ["lineitems"],
        properties: {
        lineitems: {
              <strong>bsonType: ["array"],</strong>
              minItems: 1,
              maxItems:10,
              items: {
                  required: ["unit_price", "sku", "quantity"],
                  bsonType: "object",
                  additionalProperties: false,
                  properties: {
                      sku: {
                        bsonType: "string",
                        description: "'sku' must be a string and is required"
                      },
                      name: {
                        bsonType: "string",
                        description: "'name' must be a string"
                      },
                      unit_price: {
                        bsonType: "decimal",
                        description: "'unit_price' must be a decimal and is required"
                      },
                      quantity: {
                        bsonType: ["int", "long"],
                        minimum: 0,
                        maximum: 100,
                        exclusiveMaximum: true,
                        description:
                          "'quantity' must be a short or long integer in [0, 100)"
                      },
                  }                    
              }
          }
        }
      }
    }
  });

With the schema above, we enforce that any order inserted or updated in the orders collection contain a lineitems array of 1 to 10 documents that all have sku, unit_price and quantity attributes (with quantity required to be an integer).

The schema would prevent inserting the following, badly formed document:

db.orders.insert({
        total: NumberDecimal(141),
        VAT: NumberDecimal(0.20),
        totalWithVAT: NumberDecimal(169),
        lineitems: [
            {
                sku: "MDBTS001",
                name: "MongoDB Stitch T-shirt",
                quantity: NumberInt(10),
                price: NumberDecimal(9) //this should be 'unit_price'
            },
            {
                name: "MDBTS002", //missing a 'sku' property
                quantity: NumberInt(5),
                unit_price: NumberDecimal(10)
            }
        ]
})

But it would allow inserting the following, schema-compliant document:

db.orders.insert({
        total: NumberDecimal(141),
        VAT: NumberDecimal(0.20),
        totalWithVAT: NumberDecimal(169),
        lineitems: [
            {
                sku: "MDBTS001",
                name: "MongoDB Stitch T-shirt",
                quantity: NumberInt(10),
                unit_price: NumberDecimal(9)
            },
            {
                sku: "MDBTS002",
                quantity: NumberInt(5),
                unit_price: NumberDecimal(10)
            }
        ]
})

However, if you pay close attention to the order above, you may notice that it contains a few errors:

  1. The totalWithVAT attribute value is incorrect (it should be equal to 141 * 1.20 = 169.2).
  2. The total attribute value is incorrect (it should be equal to the sum of each line item sub-total, i.e. 10*9 + 5*10 = 140).

Is there any way to enforce that total and totalWithVAT values be correct using database validation rules, without relying solely on application logic?

Introducing MongoDB Expressive Query Syntax

Adding more complex business validation rules is now possible thanks to the expressive query syntax, a new feature of MongoDB 3.6.

One of the objectives of the expressive query syntax is to bring the power of MongoDB’s aggregation expressions to MongoDB’s query language. An interesting use case is the ability to compose dynamic validation rules that compute and compare multiple attribute values at runtime. Using the new $expr operator, it is possible to validate the value of the totalWithVAT attribute with the following validation expression:

$expr: {
   $eq: [
     "$totalWithVAT",
     {$multiply: [
       "$total", 
       {$sum: [1, "$VAT"]}
     ]}
   ]
}

The above expression checks that the totalWithVAT attribute value is equal to total * (1+VAT). In its compact form, here is how we could use it as a validation rule, alongside our JSON Schema validation:

db.runCommand({
    collMod: "orders",
    validator: {
      $expr: {$eq: [
            "$totalWithVAT",
            {$multiply: ["$total", {$sum: [1, "$VAT"]}]}
          ]},
      $jsonSchema: {
        bsonType: "object",       
        required: ["lineitems"],
        properties: {
          lineitems: {
              bsonType: ["array"],
              minItems: 1,
              maxItems:10,
              items: {
                  required: ["unit_price", "sku", "quantity"],
                  bsonType: "object",
                  additionalProperties: false,
                  properties: {
                      sku: {
                        bsonType: "string",
                        description: "'sku' must be a string and is required"
                      },
                      name: {
                        bsonType: "string",
                        description: "'name' must be a string"
                      },
                      unit_price: {
                        bsonType: "decimal",
                        description: "'unit_price' must be a decimal and is required"
                      },
                      quantity: {
                        bsonType: ["int", "long"],
                        minimum: 0,
                        maximum: 100,
                        exclusiveMaximum: true,
                        description:
                          "'quantity' must be a short or long integer in [0, 100)"
                      },
                  }                    
              }
          }
        }
      }
    }
  });

With the validator above, the following insert operation is no longer possible:

db.orders.insert({
        total: NumberDecimal(141),
        VAT: NumberDecimal(0.20),
        totalWithVAT: NumberDecimal(169),
        lineitems: [
            {
                sku: "MDBTS001",
                name: "MongoDB Stitch T-shirt",
                quantity: NumberInt(10),
                unit_price: NumberDecimal(9)
            },
            {
                sku: "MDBTS002",
                quantity: NumberInt(5),
                unit_price: NumberDecimal(10)
            }
        ]
})

Instead, the totalWithVAT value must be adjusted according to our new VAT validation rule:

db.orders.insert({
    total: NumberDecimal(141),
    VAT: NumberDecimal(0.20),
    totalWithVAT: NumberDecimal(169.2),
    lineitems: [
            {
                sku: "MDBTS001",
                name: "MongoDB Stitch T-shirt",
                quantity: NumberInt(10),
                unit_price: NumberDecimal(9)
            },
            {
                sku: "MDBTS002",
                quantity: NumberInt(5),
                unit_price: NumberDecimal(10)
            }
        ]
})

If we also want to make sure that the total value is the sum of each line item’s value (i.e. quantity * unit_price), the following expression should be used:

$expr: { 
    $eq: [
       "$total", 
       {$sum: {
          $map: {
             "input": "$lineitems",
             "as": "item",
             "in": { 
                "$multiply": [
                   "$$item.quantity", 
                   "$$item.unit_price"
                ]
             } 
          }
       }}
    ]
  }

The above expression uses the $map operator to compute each line item’s sub-total, then sums all these sub-totals, and finally compares the result to the total value. To make sure that both the total and VAT validation rules are checked, we must combine them using the $and operator. Finally, our collection validator can be updated with the following command:

db.runCommand({
    collMod: "orders",
    validator: {
      $expr:{ $and:[
          {$eq:[ 
            "$totalWithVAT",
                   {$multiply:["$total", {$sum:[1,"$VAT"]}]}
          ]}, 
          {$eq: [
                   "$total", 
                {$sum: {$map: {
                    "input": "$lineitems",
                    "as": "item",
                    "in":{"$multiply":["$$item.quantity","$$item.unit_price"]}
                   }}}
             ]}
        ]},
      $jsonSchema: {
        bsonType: "object",       
        required: ["lineitems", "total", "VAT", "totalWithVAT"],
        properties: {
          total: { bsonType: "decimal" },
          VAT: { bsonType: "decimal" },
          totalWithVAT: { bsonType: "decimal" },
          lineitems: {
              bsonType: ["array"],
              minItems: 1,
              maxItems:10,
              items: {
                  required: ["unit_price", "sku", "quantity"],
                  bsonType: "object",
                  additionalProperties: false,
                  properties: {
                      sku: {bsonType: "string"},
                      name: {bsonType: "string"},
                      unit_price: {bsonType: "decimal"},
                      quantity: {
                        bsonType: ["int", "long"],
                        minimum: 0,
                        maximum: 100,
                        exclusiveMaximum: true
                      },
                  }                    
              }
          }
        }
      }
    }
  });

Accordingly, we must update the total and totalWithVAT properties to comply with our updated schema and business validation rules (without changing the lineitems array):

db.orders.insert({
      total: NumberDecimal(140),
      VAT: NumberDecimal(0.20),
      totalWithVAT: NumberDecimal(168),
      lineitems: [
          {
              sku: "MDBTS001",
              name: "MongoDB Stitch T-shirt",
              quantity: NumberInt(10),
              unit_price: NumberDecimal(9)
          },
          {
              sku: "MDBTS002",
              quantity: NumberInt(5),
              unit_price: NumberDecimal(10)
          }
      ]
  })

Next Steps

With the introduction of JSON Schema Validation in MongoDB 3.6, database administrators are now better equipped to address data governance requirements coming from compliance officers or regulators, while still benefiting from MongoDB’s flexible schema architecture.

Additionally, developers will find the new expressive query syntax useful to keep their application code base simpler by moving business logic from the application layer to the database layer.

If you want to learn more about everything new in MongoDB 3.6, download our What’s New guide.

If you want to get deeper on the technical side, visit the Schema Validation and Expressive Query Syntax pages in our official documentation.

If you want to get more practical, hands-on experience, take a look at this JSON Schema Validation hands-on lab. You can try it right away on the MongoDB Atlas database service, which has supported MongoDB 3.6 since its general availability date.

Last but not least, sign up for our free MongoDB 3.6 training from MongoDB University.

Frequently Asked Questions on JSON Schema Validation and Expressive Query Syntax in MongoDB 3.6

How does JSON Schema Validation work in MongoDB 3.6?

JSON Schema Validation is a feature in MongoDB 3.6 that allows you to define the structure of your documents and the types of data that can be stored in your collections. It uses a JSON schema, which is a JSON document that describes the structure of other JSON documents. When you insert or update a document in a collection, MongoDB checks the document against the schema. If the document does not match the schema, MongoDB will reject the operation and return an error.
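As a minimal sketch (the collection and field names here are illustrative, not from the article above), a validator can be attached when the collection is created, and any non-conforming write is rejected:

db.createCollection("products", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "price"],
      properties: {
        name: { bsonType: "string" },
        price: { bsonType: "decimal" }
      }
    }
  }
});

// Rejected: "price" is a string rather than a decimal
db.products.insert({ name: "widget", price: "9.99" });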

What is the expressive query syntax in MongoDB 3.6?

The expressive query syntax in MongoDB 3.6 brings aggregation expressions into the query language through the new $expr operator. It supports a wide range of aggregation operators, including comparison, logical, arithmetic, and array operators, which makes it possible to compare fields of the same document and compute values at query time, both in regular queries and in validation rules.

How can I specify query expression rules in MongoDB?

Query expression rules in MongoDB can be specified using the $expr operator. This operator allows you to use aggregation expressions within the query language. You can use it to compare fields from the same document, perform mathematical operations, and use aggregation functions.
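For instance, reusing the orders collection from this article, the following query finds documents whose totalWithVAT value does not match total * (1 + VAT), the same expression used earlier as a validation rule:

db.orders.find({
  $expr: {
    $ne: [
      "$totalWithVAT",
      { $multiply: ["$total", { $sum: [1, "$VAT"] }] }
    ]
  }
});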

How can I validate a MongoDB query syntax programmatically?

MongoDB does not provide a built-in function to validate a query syntax programmatically. However, you can use a try-catch block to execute the query and catch any errors. If the query is invalid, MongoDB will throw an error, which you can catch and handle in your code.
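A minimal sketch of this pattern in the mongo shell (most drivers offer an equivalent try/catch mechanism):

try {
  // The unknown $typ operator causes the server to reject the query
  db.orders.find({ price: { $typ: "decimal" } }).toArray();
} catch (e) {
  // e.message contains the server error, e.g. "unknown operator: $typ"
  print("Query failed: " + e.message);
}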

What is the db.collection.validate() method in MongoDB?

The db.collection.validate() method in MongoDB is a diagnostic method that checks a collection for errors and returns a document that describes the state of the collection. It checks the structure of the collection’s data files and indexes, and reports any inconsistencies or corruption.

How can I use the db.collection.validate() method?

You can use the db.collection.validate() method by calling it on a collection in your MongoDB database. The method takes an optional boolean argument that specifies whether to perform a more thorough, but slower, validation.
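For example, against the orders collection used throughout this article:

// Quick validation of the collection's data files and indexes
db.orders.validate();

// More thorough, but slower, full validation
db.orders.validate(true);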

What is the purpose of schema validation in MongoDB?

Schema validation in MongoDB serves to enforce a specific document structure in a collection. It ensures that all documents in a collection have the same basic structure, and that the data stored in the documents is of the correct type. This can help prevent data corruption and make your application more robust and reliable.

How can I define a JSON schema in MongoDB?

You can define a JSON schema in MongoDB using the $jsonSchema operator. This operator allows you to specify a JSON schema that describes the structure of the documents in a collection. You can specify the types of the fields, whether they are required, and other constraints.

Can I change the JSON schema of a collection in MongoDB?

Yes, you can change the JSON schema of a collection in MongoDB using the collMod command with the validator option. This allows you to modify the schema of an existing collection without having to recreate it.

What are the limitations of JSON Schema Validation in MongoDB?

While JSON Schema Validation in MongoDB is a powerful feature, it does have some limitations. For example, it cannot enforce uniqueness of field values across documents (that requires a unique index), and it cannot validate a document against data stored in other documents or collections. Additionally, schema validation adds some overhead to write operations, which may impact performance.

Raphael Londner

Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space.
