Mastering MongoDB Schema: Using Variety for Validation and Analysis
This guide explains how to leverage MongoDB's flexible document model, introduces the open‑source Variety tool for schema analysis, demonstrates practical commands for sampling, depth control, filtering, sorting and result persistence, and covers MongoDB 3.2+ document validation features and their limitations.
Benefits of the MongoDB Document Model
MongoDB stores data as JSON‑like documents, which offers developers a natural way to persist objects without needing a predefined schema. The model provides high read/write performance because related data can be embedded or denormalized, reducing costly joins and random I/O typical of relational databases.
Variety – A Schema Analyzer for MongoDB
Variety is an open‑source utility that scans a collection, reports field types and their distribution, and generates a concise report that highlights potential schema issues.
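The per-key statistics at the heart of such a report are easy to sketch. The JavaScript (Node.js) below is a minimal illustration, not Variety's implementation. The helper names typeName and analyzeKeys are hypothetical, only top-level keys are examined, and the type labels are simplified.

```javascript
// Minimal sketch of Variety-style per-key statistics (NOT Variety's code).
// Assumptions: only top-level keys are examined, and the type labels are
// simplified stand-ins for the labels Variety prints.
function typeName(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "Array";
  if (value instanceof Date) return "Date";
  switch (typeof value) {
    case "string": return "String";
    case "number": return "Number";
    case "boolean": return "Boolean";
    case "object": return "Object";
    default: return typeof value; // e.g. "undefined", "bigint"
  }
}

// Returns one row per key: the types seen, how many documents contain the
// key, and that count as a percentage of all documents analyzed.
function analyzeKeys(docs) {
  const stats = new Map(); // key -> { types: { typeName: count }, occurrences }
  for (const doc of docs) {
    for (const [key, value] of Object.entries(doc)) {
      if (!stats.has(key)) stats.set(key, { types: {}, occurrences: 0 });
      const entry = stats.get(key);
      const t = typeName(value);
      entry.types[t] = (entry.types[t] || 0) + 1;
      entry.occurrences += 1;
    }
  }
  const rows = [];
  for (const [key, entry] of stats) {
    rows.push({
      key,
      types: entry.types,
      occurrences: entry.occurrences,
      percents: docs.length ? (100 * entry.occurrences) / docs.length : 0,
    });
  }
  // Most common keys first; ties broken alphabetically.
  rows.sort((a, b) => b.occurrences - a.occurrences || a.key.localeCompare(b.key));
  return rows;
}
```

Feeding it a few documents in which pets is sometimes an array and sometimes a string produces a pets row with two types and an occurrence below 100%. That is exactly the kind of inconsistency the report is meant to expose.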
Using Variety: Commands and Options
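After populating a collection (users in the examples that follow), run Variety with the mongo shell, passing the collection name through --eval:
$ mongo test --eval "var collection = 'users'" variety.js
The report lists every key found in the collection, the types observed for it, and how many documents (as a count and as a percentage) contain it.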
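For large collections, limit the sample size to avoid long scans. Variety then analyzes only the newest documents:
$ mongo test --eval "var collection = 'users', limit = 1000" variety.js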
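Variety's limit option works together with the query and sort options described next. As far as we understand it, documents are selected as if by find(query).sort(sort).limit(limit), with a default sort of { _id: -1 } so that limit keeps the newest documents. The sketch below reproduces that selection on an in-memory array. selectSample is a hypothetical helper, and it only supports equality queries and a single sort key.

```javascript
// Hypothetical in-memory stand-in for find(query).sort(sort).limit(limit).
// Assumptions: query is an equality match on top-level fields, and sort
// names a single key with 1 (ascending) or -1 (descending).
function selectSample(docs, { query = {}, sort = { _id: -1 }, limit = Infinity } = {}) {
  // Keep only documents whose fields equal every value in the query.
  const matches = docs.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
  const sortEntries = Object.entries(sort);
  if (sortEntries.length > 0) {
    const [field, direction] = sortEntries[0];
    matches.sort((a, b) => {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
      return 0;
    });
  }
  // slice() clamps Infinity to the array length, so "no limit" keeps everything.
  return matches.slice(0, limit);
}
```

With the default sort, selectSample(docs, { limit: 1000 }) keeps the 1,000 documents with the highest _id values. For ObjectId keys this roughly means the most recently inserted ones.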
Control nesting depth with maxDepth to ignore overly deep embedded documents:
$ mongo test --eval "var collection = 'users', maxDepth = 3" variety.js
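The effect of maxDepth can be sketched as a recursive walk that builds dotted key paths and stops descending once the depth limit is reached. In this hypothetical keyPaths helper, top-level keys are assumed to count as depth 1, and arrays and dates are treated as leaves. The sketch does not reproduce how Variety handles arrays.

```javascript
// Sketch: collect dotted key paths from a document, descending into
// embedded objects at most maxDepth levels (top-level keys are depth 1).
// Assumption: arrays and Dates are leaves, a simplification of Variety.
function isPlainObject(value) {
  return value !== null && typeof value === "object" &&
    !Array.isArray(value) && !(value instanceof Date);
}

function keyPaths(doc, maxDepth = Infinity, prefix = "", depth = 1) {
  const paths = [];
  for (const [key, value] of Object.entries(doc)) {
    const path = prefix ? `${prefix}.${key}` : key;
    paths.push(path);
    // Only recurse while we are still above the depth limit.
    if (isPlainObject(value) && depth < maxDepth) {
      paths.push(...keyPaths(value, maxDepth, path, depth + 1));
    }
  }
  return paths;
}
```

For { a: { b: { c: 1 } } }, keyPaths(doc, 2) stops at a.b, while the unlimited call also returns a.b.c.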
Filter by a condition, for example only documents where caredAbout is true:
$ mongo test --eval "var collection = 'users', query = {'caredAbout':true}" variety.js
Sort the documents before they are analyzed. Combined with limit, the sort decides which documents are sampled:
$ mongo test --eval "var collection = 'users', sort = { updated_at : -1 }" variety.js
Render the report as JSON instead of the default ASCII table (--quiet suppresses the shell banner so the output stays valid JSON):
$ mongo test --quiet --eval "var collection = 'users', outputFormat='json'" variety.js
Run the analysis against a secondary, such as a hidden member, to keep the load off the primary. slaveOk = true allows the shell to read from a secondary:
$ mongo secondary.replicaset.member:31337/somedb --eval "var collection = 'users', slaveOk = true" variety.js
Persist the analysis results back into MongoDB:
$ mongo test --quiet --eval "var collection = 'users', persistResults=true" variety.js
Additional parameters let you specify the destination database, collection, and authentication details:
resultsDatabase – target database name
resultsCollection – target collection name
resultsUser – username for the target instance
resultsPass – password for the target instance
$ mongo test --quiet --eval "var collection = 'users', persistResults=true, resultsDatabase='db.example.com/variety'" variety.js
Why Use Variety?
Even though MongoDB does not enforce a schema, inconsistent field types and missing fields still cause data-quality problems. For example, a query such as {age: 30} does not match a document that stores age as the string "30", so results silently omit data. Variety quickly reveals type mismatches, missing fields, and unexpected structures, which lets teams correct them before problems surface.
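Once you have the report, picking out these problems is mechanical. The sketch below assumes a simplified, hypothetical row shape (key, types as a name-to-count map, occurrences, percents) modeled on the columns of Variety's table. Variety's JSON and persisted output may use different field names, so adapt the accessors before running it on real results.

```javascript
// Sketch: flag keys stored with more than one type and keys missing from
// some documents. The row shape is a hypothetical simplification of a
// Variety report row, not Variety's actual output format.
function flagIssues(rows) {
  const issues = [];
  for (const row of rows) {
    const types = Object.keys(row.types);
    if (types.length > 1) {
      issues.push(`${row.key}: mixed types (${types.join(", ")})`);
    }
    if (row.percents < 100) {
      issues.push(`${row.key}: present in only ${row.percents.toFixed(1)}% of documents`);
    }
  }
  return issues;
}
```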
Document Validation in MongoDB 3.2+
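MongoDB 3.2 introduced Document Validation, which lets administrators define rules that enforce data integrity while keeping the flexibility of a schema‑free system. Rules are written as a validator, a query expression that every inserted or updated document must match.
For example, a validator on a contacts collection can require that phone is a string, email ends with @mongodb.com, and status is either "Unknown" or "Incomplete". For a new collection, the validator is supplied through the validator option of db.createCollection().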
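Written out, the validator from this example is an ordinary query document. The sketch below shows it as a JavaScript object named contactsValidator (a hypothetical name). In the mongo shell it would be passed as db.createCollection("contacts", { validator: contactsValidator }). Because top-level conditions are implicitly ANDed, a document must satisfy all three rules; wrap them in $or if any single rule should suffice. The client-side checker satisfiesContactsRules is a hypothetical approximation for local tests and does not reproduce MongoDB's matching of array-valued fields.

```javascript
// The contacts validator as a query document. Top-level conditions are
// ANDed, so a document must satisfy all three to be accepted.
const contactsValidator = {
  phone: { $type: "string" },
  email: { $regex: /@mongodb\.com$/ },
  status: { $in: ["Unknown", "Incomplete"] },
};

// Hypothetical client-side approximation of the same rules, useful for
// unit-testing data before it reaches the server. Simplification: it
// ignores MongoDB's special handling of array-valued fields.
function satisfiesContactsRules(doc) {
  return (
    typeof doc.phone === "string" &&
    typeof doc.email === "string" &&
    contactsValidator.email.$regex.test(doc.email) &&
    contactsValidator.status.$in.includes(doc.status)
  );
}
```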
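For an existing collection, add or change the rules with the collMod command, which accepts the same validator option together with validationLevel and validationAction.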
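The shape of the collMod command can be illustrated with a small hypothetical helper, buildCollMod, that assembles the command document you would pass to db.runCommand() in the mongo shell. The field names collMod, validator, validationLevel and validationAction belong to the command itself. The helper, its defaults (strict and error, matching the server defaults), and its checks are illustrative. These checks include refusing the admin, local and config databases and system.* collections, per the limitations listed later in this section.

```javascript
// Hypothetical helper that builds a collMod command document. Only the
// field names are MongoDB's; the validation logic is illustrative.
const LEVELS = ["off", "strict", "moderate"];
const ACTIONS = ["error", "warn"];
const NO_VALIDATOR_DBS = ["admin", "local", "config"];

function buildCollMod(dbName, collName, validator, level = "strict", action = "error") {
  // Validators are not allowed on these databases or on system.* collections.
  if (NO_VALIDATOR_DBS.includes(dbName) || collName.startsWith("system.")) {
    throw new Error(`validators are not allowed on ${dbName}.${collName}`);
  }
  if (!LEVELS.includes(level)) throw new Error(`unknown validationLevel: ${level}`);
  if (!ACTIONS.includes(action)) throw new Error(`unknown validationAction: ${action}`);
  return { collMod: collName, validator, validationLevel: level, validationAction: action };
}
```

Choosing "moderate" and "warn" is a cautious way to introduce rules on a collection that already holds non-conforming documents.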
The validationLevel parameter controls when validation is applied:
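strict – applies the rules to all inserts and updates (default).
moderate – applies the rules to inserts and to updates of documents that already satisfy them; updates to existing documents that violate the rules are not checked.
off – disables validation.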
The validationAction parameter defines the response to violations:
error – rejects offending inserts/updates (default).
warn – logs a warning but allows the operation.
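The way the two settings combine for a single write can be summarized as a small decision function. This models the documented behavior for illustration and is not MongoDB's implementation. The outcome labels are hypothetical.

```javascript
// Model of validationLevel x validationAction for one write.
//   isUpdate - true for an update, false for an insert
//   oldValid - for updates, whether the existing document satisfied the rules
//   newValid - whether the resulting document satisfies the rules
function outcome({ level, action, isUpdate, oldValid, newValid }) {
  if (level === "off" || newValid) return "accepted";
  // moderate skips checks on updates to documents that were already invalid
  if (level === "moderate" && isUpdate && !oldValid) return "accepted";
  return action === "warn" ? "accepted with warning" : "rejected";
}
```

The moderate branch is what makes it practical to add rules to a collection holding legacy documents. Those documents can still be updated, while new inserts must conform.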
Validation Limitations
Cannot be applied to collections in the admin, local, or config databases.
System collections (e.g., system.*) are excluded.
The article focuses on practical usage of Variety and MongoDB's built‑in validation to maintain data quality in production environments.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
