I learned about normal forms when databases were designed before the applications that used them. At the time, relational data models focused on enterprise-wide entities, defined before access patterns were known, so future applications could share a stable, normalized schema.
Today, we design databases for specific applications or bounded domains. Instead of defining a full model up front, we add features incrementally, gather feedback, and let the schema evolve with the application.
Normal forms aren’t just relational theory: they describe real data dependencies. MongoDB’s document model doesn’t exempt you from thinking about normalization. It gives you more flexibility in how you apply it.
Example: Pizzerias
We’re starting a new business: a large network of pizzerias across many areas with a wide variety of pizzas. But let’s start small.
Tabular: One Pizza, One Area
As a minimum viable product (MVP), each pizzeria has one manager, sells only one variety, and delivers to one area. You can choose any database for this: key-value, relational, document, or even a spreadsheet. The choice will matter only when your product evolves.
Here is our first pizzeria:
{
  name: "A1 Pizza",
  manager: "Bob",
  variety: "Thick Crust",
  area: "Springfield"
}
With no repeating groups or multi-valued attributes, the model is already in First Normal Form (1NF). Because the MVP data model is simple, with one value per attribute and a single key, there are no dependencies that could violate higher normal forms.
Many database designs start out fully normalized, not because the designer worked through every normal form, but because the initial dataset is too simple for complex dependencies to exist.
Normalization becomes necessary later, as business rules evolve and new varieties, areas, and independent attributes introduce dependencies that higher normal forms address.
1NF: More Menu Options
The business started quite well and is evolving. A pizzeria can now offer several varieties.
The following, which puts several varieties in a single field, would violate 1NF:
{
  name: "A1 Pizza",
  manager: "Bob",
  varieties: "Thick Crust, Stuffed Crust",
  area: "Springfield"
}
1NF requires atomic values: each field should hold one indivisible piece of data. A comma-separated string breaks this rule. You can manipulate it as a character string, but you can’t treat each entry as a distinct pizza variety, so you can’t easily query or update individual varieties, and you can’t index them efficiently.
SQL and NoSQL databases avoid this pattern for different reasons. In a relational database, the logical model must be independent of cardinalities and access patterns. Because the relational model doesn’t know whether there are two or one million pizza varieties, it treats every one-to-many relationship as unbounded and stores it in a separate table as a set of pizzeria-variety relationships rather than embedding varieties within the pizzeria entity.
Once we understand the application domain, we can set realistic bounds. Thousands of pizza varieties on the menu would be impractical from a business perspective well before hitting database limits, so storing the varieties together can be acceptable. When object-oriented applications use richer structures than two-dimensional tables, it’s better to represent such lists as arrays rather than comma-separated strings:
{
  name: "A1 Pizza",
  manager: "Bob",
  email: "bob@a1-pizza.it",
  varieties: ["Thick Crust", "Stuffed Crust"]
}
Arrays of atomic values satisfy a document-oriented equivalent of 1NF, since each element is atomic and independently addressable, even though the document model isn’t bound by the relational requirement of flat tuples. While SQL databases provide abstraction and logical-physical data independence, MongoDB keeps data colocated down to the storage and CPU caches for more predictable performance.
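To make the difference concrete, here is a minimal sketch in plain JavaScript, with no database involved. The two documents and the helper names `hasVarietyInString` and `hasVarietyInArray` are illustrative assumptions, not part of any MongoDB API. It shows that a comma-separated string has to be parsed before individual varieties can be tested, while array elements are already distinct values.

```javascript
// Hypothetical in-memory documents mirroring the two shapes discussed above.
const asString = { name: "A1 Pizza", varieties: "Thick Crust, Stuffed Crust" };
const asArray = { name: "A1 Pizza", varieties: ["Thick Crust", "Stuffed Crust"] };

// With a string, the application must split and trim before comparing items.
function hasVarietyInString(doc, variety) {
  return doc.varieties
    .split(",")
    .map((item) => item.trim())
    .includes(variety);
}

// With an array, each element is already an independent value.
function hasVarietyInArray(doc, variety) {
  return doc.varieties.includes(variety);
}

// A naive substring test on the string wrongly matches the fragment "Crust".
const naiveMatch = asString.varieties.includes("Crust");
const parsedMatch = hasVarietyInString(asString, "Crust");
const arrayMatch = hasVarietyInArray(asArray, "Crust");

console.log({ naiveMatch, parsedMatch, arrayMatch });
// → { naiveMatch: true, parsedMatch: false, arrayMatch: false }
```

The same reasoning applies on the database side: an array field lets each element be matched as a value, whereas a string can only be matched as text.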
Normal form definitions assume keys for each 1NF relation. In a document model, multiple relations can appear as embedded sub-documents or arrays. Treating the parent key and the array element together as a composite key lets us apply higher normal forms to analyze partial and transitive dependencies within a single document.
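One way to picture this composite key is to flatten a document into one row per array element. The sketch below is plain JavaScript, and the function name `toKeyedRows` is a hypothetical helper chosen for illustration. It pairs the parent key (the pizzeria name) with each array element (the variety).

```javascript
// Flatten a pizzeria document into rows keyed by (pizzeria, variety).
function toKeyedRows(pizzeria) {
  return pizzeria.varieties.map((variety) => ({
    pizzeria: pizzeria.name, // parent key
    variety,                 // array element, second part of the composite key
  }));
}

const pizzeriaDoc = {
  name: "A1 Pizza",
  manager: "Bob",
  varieties: ["Thick Crust", "Stuffed Crust"],
};

const keyedRows = toKeyedRows(pizzeriaDoc);
console.log(keyedRows);
// → [ { pizzeria: 'A1 Pizza', variety: 'Thick Crust' },
//     { pizzeria: 'A1 Pizza', variety: 'Stuffed Crust' } ]
```

Any attribute later attached to an array element can then be examined against this (pizzeria, variety) pair, which is what the higher normal forms do.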
2NF: Pizza Pricing
We want to add the price of the pizzas to our database. If each pizzeria defines its own base price, it can be added to the items of the varieties array:
{
  name: "A1 Pizza",
  manager: "Bob",
  email: "bob@a1-pizza.it",
  varieties: [
    { name: "Thick Crust", basePrice: 10 },
    { name: "Stuffed Crust", basePrice: 12 }
  ]
}
Second Normal Form (2NF) builds on 1NF by requiring that every non-key attribute depends on the entire primary key, not just part of it. This only becomes relevant when dealing with composite keys.
In our embedded model, consider the composite key (“pizzeria”, “variety”) for each item in the varieties array. If the price depends on the pizzeria and variety together, meaning different pizzerias can set different prices for the same variety, then “basePrice” depends on the full composite key, and we satisfy 2NF.
However, if prices are standardized across all pizzerias, so the same variety costs the same everywhere, then a partial dependency exists: “basePrice” depends only on “variety”, not on the full (“pizzeria”, “variety”) key. This violates 2NF.
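The distinction between the two pricing rules can be sketched as a check on sample rows. This is plain JavaScript under stated assumptions: `holdsFD` is a hypothetical helper, and the second pizzeria, "B2 Pizza", is invented for illustration. A sample can only refute a dependency, never prove it, because the real rule comes from the business and not from the data.

```javascript
// Returns true when each combination of `lhs` values maps to a single `rhs` value
// in the given rows, i.e. the rows do not contradict the dependency lhs → rhs.
function holdsFD(rows, lhs, rhs) {
  const seen = new Map();
  for (const row of rows) {
    const key = JSON.stringify(lhs.map((attr) => row[attr]));
    if (seen.has(key) && seen.get(key) !== row[rhs]) return false;
    seen.set(key, row[rhs]);
  }
  return true;
}

// Standardized pricing: the same variety costs the same everywhere.
const standardized = [
  { pizzeria: "A1 Pizza", variety: "Thick Crust", basePrice: 10 },
  { pizzeria: "A1 Pizza", variety: "Stuffed Crust", basePrice: 12 },
  { pizzeria: "B2 Pizza", variety: "Thick Crust", basePrice: 10 },
];

// Local pricing: each pizzeria sets its own price for a variety.
const localPricing = [
  { pizzeria: "A1 Pizza", variety: "Thick Crust", basePrice: 10 },
  { pizzeria: "B2 Pizza", variety: "Thick Crust", basePrice: 9 },
];

const partialInStandardized = holdsFD(standardized, ["variety"], "basePrice");
const partialInLocal = holdsFD(localPricing, ["variety"], "basePrice");

console.log({ partialInStandardized, partialInLocal });
// → { partialInStandardized: true, partialInLocal: false }
```

When "variety" alone determines "basePrice", the price is a fact about the variety and is duplicated in every pizzeria that sells it.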
To resolve this, we define pricing in a separate collection where the base price depends only on the pizza variety:
{ variety: "Thick Crust", basePrice: 10 }
{ variety: "Stuffed Crust", basePrice: 12 }
We can remove the base price from the pizzeria’s varieties array and retrieve it from the pricing collection at query time:
db.createView(
  "pizzeriasWithPrices",
  "pizzerias",
  [
    { $unwind: "$varieties" },
    {
      $lookup: {
        from: "pricing",
        localField: "varieties.name",
        foreignField: "variety",
        as: "priceInfo"
      }
    },
    { $unwind: "$priceInfo" },
    { $addFields: { "varieties.basePrice": "$priceInfo.basePrice" } },
    { $project: { priceInfo: 0 } }
  ]
);
Alternatively, we can use the pricing collection as a reference, where the application retrieves the price and stores it in the pizzeria document for faster reads.
To avoid update anomalies, the application updates all affected documents when a variety’s price changes:
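const session = db.getMongo().startSession();
const sessionDB = session.getDatabase(db.getName());
session.startTransaction();
try {
  // Update the reference price for the variety
  sessionDB.getCollection("pricing").updateOne(
    { variety: "Thick Crust" },
    { $set: { basePrice: 11 } }
  );
  // Propagate the new price to every pizzeria that embeds this variety
  sessionDB.getCollection("pizzerias").updateMany(
    { "varieties.name": "Thick Crust" },
    { $set: { "varieties.$[v].basePrice": 11 } },
    { arrayFilters: [{ "v.name": "Thick Crust" }] }
  );
  session.commitTransaction();
} catch (error) {
  // Roll back both updates if either one fails
  session.abortTransaction();
  throw error;
} finally {
  session.endSession();
}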
SQL databases avoid such multiple updates because they’re designed for direct end-user access, sometimes bypassing the application layer. Without applying normal forms to break dependencies into multiple tables, there’s a risk of overlooking replicated data. A document database is updated by an application service responsible for maintaining consistency.
While normalizing to 2NF is possible, it may not always be the best choice in a domain-driven design. Keeping the price embedded in each pizzeria allows asynchronous updates and supports future requirements where some pizzerias may offer different prices, without breaking integrity, as the application enforces updates atomically.
In practice, many applications accept this controlled duplication when price changes are infrequent and prefer fast single-document reads over perfectly normalized writes.
3NF: Manager’s Contacts
When we started, each pizzeria had a single email contact:
{
  name: "A1 Pizza",
  manager: "Bob",
  email: "bob@a1-pizza.it",
  varieties: [
    { name: "Thick Crust", basePrice: 10 },
    { name: "Stuffed Crust", basePrice: 12 }
  ]
}
Third Normal Form (3NF) builds on 2NF by requiring that non-key attributes depend only on the primary key, not on other non-key attributes. When a non-key attribute depends on another non-key attribute, we have a transitive dependency.
Here, the email actually belongs to the manager, not the pizzeria directly. This creates a transitive dependency: “pizzeria” → “manager” → “email”. Since “email” depends on “manager” (a non-key attribute) rather than directly on the pizzeria, this violates 3NF.
We can normalize this by grouping the manager’s attributes into an embedded subdocument:
{
  name: "A1 Pizza",
  manager: { name: "Bob", email: "bob@a1-pizza.it" },
  varieties: [
    { name: "Thick Crust", basePrice: 10 },
    { name: "Stuffed Crust", basePrice: 12 }
  ]
}
Now the email is clearly an attribute of the manager entity embedded within the pizzeria. If a pizzeria has several managers, we can simply use an array of subdocuments without creating new collections or changing index definitions.
A generic relational model would probably split this into several tables, with manager being a foreign key to a “contacts” table. However, in our business domain, we don’t manage contacts outside of pizzerias. Even if the same person manages several pizzerias, they are recorded as separate manager entries. Bob may have several emails and use a different one for each of his pizzerias.
4NF: Delivery Areas
We want to record the areas where a pizzeria can deliver its pizza varieties:
{
  name: "A1 Pizza",
  manager: { name: "Bob", email: "bob@a1-pizza.it" },
  offerings: [
    { variety: { name: "Thick Crust", basePrice: 10 }, area: "Springfield" },
    { variety: { name: "Thick Crust", basePrice: 10 }, area: "Franceville" }
  ]
}
Fourth Normal Form (4NF) addresses multi-valued dependencies. A multi-valued dependency exists when one attribute determines a set of values for another attribute, independent of all other attributes. 4NF requires that a relation have no non-trivial multi-valued dependencies except on superkeys.
If varieties and areas were dependent, for example if certain varieties were only available in certain areas, then storing (“variety”, “area”) combinations would represent a single multi-valued fact, and there would be no 4NF violation.
However, since our pizzerias deliver all varieties to all areas, these are independent multi-valued dependencies: “pizzeria” →→ “variety” and “pizzeria” →→ “area”. Storing all combinations creates redundancy: if we add a new area, we must add entries for every variety.
We normalize by storing each independent fact in a separate array:
{
  name: "A1 Pizza",
  manager: { name: "Bob", email: "bob@a1-pizza.it" },
  varieties: [
    { name: "Thick Crust", basePrice: 10 },
    { name: "Stuffed Crust", basePrice: 12 }
  ],
  deliveryAreas: ["Springfield", "Franceville"]
}
With this schema, we avoid violating 4NF because delivery areas and varieties are stored independently, even though the document model allows us to embed them together.
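Whether varieties and areas are independent is a business rule, but existing data can at least be checked against it. The following plain JavaScript sketch uses a hypothetical helper, `isFullCrossProduct`, and made-up offerings lists. It tests whether the stored (variety, area) pairs cover every possible combination, which is what you would expect if the two facts were independent.

```javascript
// True when the offerings contain every (variety, area) pair that can be formed
// from the varieties and areas they mention.
function isFullCrossProduct(offerings) {
  const varieties = new Set(offerings.map((o) => o.variety));
  const areas = new Set(offerings.map((o) => o.area));
  const pairs = new Set(offerings.map((o) => JSON.stringify([o.variety, o.area])));
  // The distinct pairs are a subset of the product, so equal sizes mean equality.
  return pairs.size === varieties.size * areas.size;
}

// Every variety is delivered to every area: candidates for two separate arrays.
const independentOfferings = [
  { variety: "Thick Crust", area: "Springfield" },
  { variety: "Thick Crust", area: "Franceville" },
  { variety: "Stuffed Crust", area: "Springfield" },
  { variety: "Stuffed Crust", area: "Franceville" },
];

// Stuffed Crust is not delivered to Franceville: the pairs carry real information.
const dependentOfferings = [
  { variety: "Thick Crust", area: "Springfield" },
  { variety: "Thick Crust", area: "Franceville" },
  { variety: "Stuffed Crust", area: "Springfield" },
];

const independentResult = isFullCrossProduct(independentOfferings);
const dependentResult = isFullCrossProduct(dependentOfferings);
console.log({ independentResult, dependentResult });
// → { independentResult: true, dependentResult: false }
```

A full cross product in the data suggests that two arrays would carry the same information. A partial one means the combinations themselves are facts and should stay together.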
BCNF: Per-Area Pricing
Our network grows further. Some pizzerias now charge different prices depending on the delivery area, because remote areas cost more:
{
  name: "A1 Pizza",
  manager: { name: "Bob", email: "bob@a1-pizza.it" },
  offerings: [
    { variety: "Thick Crust", area: "Springfield", price: 10 },
    { variety: "Thick Crust", area: "Franceville", price: 11 },
    { variety: "Stuffed Crust", area: "Springfield", price: 12 },
    { variety: "Stuffed Crust", area: "Franceville", price: 13 }
  ]
}
The composite key for each offering is (“pizzeria”, “variety”, “area”). The price depends on the full key, satisfying 2NF and 3NF.
Now our franchise assigns an area manager to each area: one manager per area, regardless of pizzeria. We add it to our offerings:
offerings: [
  { variety: "Thick Crust", area: "Springfield", price: 10, areaManager: "Alice" },
  { variety: "Stuffed Crust", area: "Springfield", price: 12, areaManager: "Alice" },
  { variety: "Thick Crust", area: "Franceville", price: 11, areaManager: "Eve" },
  { variety: "Stuffed Crust", area: "Franceville", price: 13, areaManager: "Eve" }
]
Boyce-Codd Normal Form (BCNF) is a stricter version of 3NF. It requires that for every non-trivial functional dependency X → Y, the determinant X must be a superkey. Unlike 3NF, BCNF doesn’t make an exception for dependencies where the dependent attribute is part of a candidate key.
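This model fails BCNF: the dependency “area” → “areaManager” has a determinant (“area”) that isn’t a superkey of the offerings relation. The area alone doesn’t uniquely identify an offering, because you need the full (“pizzeria”, “variety”, “area”) key for that. Strictly speaking, because “areaManager” is a non-key attribute that depends on only part of the key, the same dependency is also a partial dependency in the 2NF sense, so the model fails the lower forms as well. BCNF simply states the rule in its most general form: every determinant must be a superkey.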
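The superkey condition can be illustrated on the flattened offerings. The sketch below is plain JavaScript and `isSuperkey` is a hypothetical helper. As with any check on sample data, it can show that a set of attributes fails to identify rows, but it cannot prove that a key holds in general.

```javascript
// True when the given attributes take a distinct combination of values in every row,
// i.e. they identify each row of this sample uniquely.
function isSuperkey(rows, attrs) {
  const keys = new Set(rows.map((row) => JSON.stringify(attrs.map((a) => row[a]))));
  return keys.size === rows.length;
}

// Offerings flattened with their parent key, as (pizzeria, variety, area) rows.
const offeringRows = [
  { pizzeria: "A1 Pizza", variety: "Thick Crust", area: "Springfield", price: 10, areaManager: "Alice" },
  { pizzeria: "A1 Pizza", variety: "Stuffed Crust", area: "Springfield", price: 12, areaManager: "Alice" },
  { pizzeria: "A1 Pizza", variety: "Thick Crust", area: "Franceville", price: 11, areaManager: "Eve" },
  { pizzeria: "A1 Pizza", variety: "Stuffed Crust", area: "Franceville", price: 13, areaManager: "Eve" },
];

const areaIsSuperkey = isSuperkey(offeringRows, ["area"]);
const fullKeyIsSuperkey = isSuperkey(offeringRows, ["pizzeria", "variety", "area"]);

// Rows that repeat the Springfield manager and would all need to change together.
const springfieldCopies = offeringRows.filter((row) => row.area === "Springfield").length;

console.log({ areaIsSuperkey, fullKeyIsSuperkey, springfieldCopies });
// → { areaIsSuperkey: false, fullKeyIsSuperkey: true, springfieldCopies: 2 }
```

Because "area" determines "areaManager" without being a superkey, the manager's name is stored once per offering in that area rather than once per area.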
The practical problem: if Alice is replaced by Carol for Springfield, we must update every offering for that area across every pizzeria. The relational solution is to extract area managers to a separate table.
In MongoDB, we can keep the embedded structure and handle updates explicitly:
db.pizzerias.updateMany(
  { "offerings.area": "Springfield" },
  { $set: { "offerings.$[o].areaManager": "Carol" } },
  { arrayFilters: [{ "o.area": "Springfield" }] }
)
This trades strict BCNF compliance for simpler queries and faster reads. The application ensures consistency during updates.
5NF: Adding Pizza Sizes
We now offer several sizes (Small, Medium, Large). Sizes, varieties, and delivery areas are all independent: any combination is valid.
Storing every combination explodes quickly:
offerings: [
  { variety: "Thick Crust", size: "Large", area: "Springfield" },
  { variety: "Thick Crust", size: "Large", area: "Franceville" },
  { variety: "Thick Crust", size: "Medium", area: "Springfield" },
  // ... 150 entries for 5 varieties × 3 sizes × 10 areas
]
Fifth Normal Form (5NF), also called Project-Join Normal Form, addresses join dependencies. A relation is in 5NF if it cannot be decomposed into smaller relations that, when joined, reconstruct the original without losing information or introducing spurious tuples.
When valid combinations can be reconstructed from independent sets (the Cartesian product of varieties, sizes, and areas), storing all combinations explicitly creates redundancy and risks inconsistency. This violates 5NF.
The fix stores each independent fact separately:
{
  name: "A1 Pizza",
  varieties: ["Thick Crust", "Stuffed Crust"],
  sizes: ["Large", "Medium"],
  deliveryAreas: ["Springfield", "Franceville"]
}
Adding a new size requires updating one array, not hundreds of entries. The application or query logic reconstructs valid combinations when needed.
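Reconstructing the combinations in the application can be as small as a nested loop. This is a plain JavaScript sketch, and `allCombinations` is a hypothetical helper name. It assumes, as stated above, that every variety, size, and area can be combined freely.

```javascript
// Rebuild every (variety, size, area) combination from the three independent arrays.
function allCombinations(pizzeria) {
  const result = [];
  for (const variety of pizzeria.varieties) {
    for (const size of pizzeria.sizes) {
      for (const area of pizzeria.deliveryAreas) {
        result.push({ variety, size, area });
      }
    }
  }
  return result;
}

const menuDoc = {
  name: "A1 Pizza",
  varieties: ["Thick Crust", "Stuffed Crust"],
  sizes: ["Large", "Medium"],
  deliveryAreas: ["Springfield", "Franceville"],
};

const combos = allCombinations(menuDoc);
console.log(combos.length); // → 8 (2 varieties × 2 sizes × 2 areas)
console.log(combos[0]);     // → { variety: 'Thick Crust', size: 'Large', area: 'Springfield' }
```

The document stores six values across three arrays, while the derived list holds eight entries here and would hold 150 at the scale mentioned earlier.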
6NF: Tracking Price History
Our finance team needs to track price changes over time. We could embed the history:
offerings: [
  {
    variety: "Thick Crust",
    area: "Springfield",
    currentPrice: 12,
    priceHistory: [
      { price: 10, effectiveDate: ISODate("2024-01-01") },
      { price: 11, effectiveDate: ISODate("2024-03-15") },
      { price: 12, effectiveDate: ISODate("2024-06-01") }
    ]
  }
]
This works for a moderate amount of history, but the array grows without bound over time.
Sixth Normal Form (6NF) decomposes relations so that each stores a single non-key attribute along with its time dimension. Every row represents one fact at one point in time:
// price_history collection
{ pizzeria: "A1 Pizza", variety: "Thick Crust", area: "Springfield", price: 10, effectiveDate: ISODate("2024-01-01") }
{ pizzeria: "A1 Pizza", variety: "Thick Crust", area: "Springfield", price: 11, effectiveDate: ISODate("2024-03-15") }
{ pizzeria: "A1 Pizza", variety: "Thick Crust", area: "Springfield", price: 12, effectiveDate: ISODate("2024-06-01") }
6NF is rarely used for operational data because it requires extensive joins for common queries. However, for auditing, analytics, and temporal queries, where you need to answer “what was the price on March 10th?”, it provides a clean model for tracking changes over time.
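The "price on March 10th" question reduces to finding the latest entry whose effective date is not after the requested date. The sketch below is plain JavaScript over an in-memory copy of the history. `priceAsOf` is a hypothetical helper, and standard `Date` objects stand in for the `ISODate` values used in the shell.

```javascript
// Return the price in effect on `date`: the entry with the latest effectiveDate
// that is not after `date`, or null when no price was defined yet.
function priceAsOf(history, date) {
  let current = null;
  for (const entry of history) {
    const applies = entry.effectiveDate <= date;
    const isLater = current === null || entry.effectiveDate > current.effectiveDate;
    if (applies && isLater) current = entry;
  }
  return current === null ? null : current.price;
}

// One fact per entry: the price that starts to apply on a given date.
const thickCrustHistory = [
  { price: 10, effectiveDate: new Date("2024-01-01") },
  { price: 11, effectiveDate: new Date("2024-03-15") },
  { price: 12, effectiveDate: new Date("2024-06-01") },
];

console.log(priceAsOf(thickCrustHistory, new Date("2024-03-10"))); // → 10
console.log(priceAsOf(thickCrustHistory, new Date("2024-03-15"))); // → 11
console.log(priceAsOf(thickCrustHistory, new Date("2023-12-31"))); // → null
```

Because each entry is a self-contained fact, the lookup does not depend on the order in which entries are stored.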
Abstract
Normal forms are not a relic of relational theory. They describe fundamental data dependencies present in any system, regardless of storage technology. MongoDB’s document model does not remove the need to consider normalization. Instead, it lets you decide where, when, and how strictly to apply it, based on domain boundaries and access patterns.
In relational/SQL databases, schemas are usually designed as enterprise-wide information models. Many applications and users share the same database, which accepts ad hoc SQL. To avoid update, insertion, and deletion anomalies in this shared environment, the schema must enforce functional dependencies, making higher normal forms essential. Because the database is the system of record, normalization centralizes integrity rules in the data model.
Modern architectures, by contrast, often follow Domain-Driven Design (DDD). Each bounded context owns its data model, which evolves with the application. With CQRS and microservices, each aggregate is updated only by a single application service that encapsulates business rules. Here, the database isn’t a shared integration point but a private persistence component of the service.
MongoDB fits this style well:
- Documents model aggregates as they exist in the domain
- Arrays capture bounded one-to-many relationships
- Denormalization and controlled duplication improve read performance and scalability
- Consistency is enforced by application logic, not global database constraints
Because one service owns all updates, violating higher normal forms can be acceptable, and sometimes beneficial, provided the service preserves its invariants. Normalization becomes a design tool, not a rigid checklist.
In short:
- Use relational normalization when the database is a shared, queryable system of record accessed by many applications and users via SQL.
- Use document modeling with selective denormalization when building domain-aligned services with clear ownership, CQRS, and microservices.
Normal forms still matter, but in MongoDB, they guide your choices instead of dictating your schema.
