When efficiency issues, the way you retailer your utility objects, write queries, and design indexes may be the distinction between scanning a group for minutes and getting ends in milliseconds—with out touching a single doc.
On this article, we’ll stroll by means of an instance in MongoDB. We’ll begin with a normalized mannequin, transfer to an embedded mannequin, add an index to keep away from a full assortment scan, after which refine each the index and question to learn solely what’s wanted for the consequence. The execution plan and execution statistics will information us at every step.
State of affairs
Our area is an e‑commerce order‑entry utility with clients, merchandise, orders, and order strains. We wish to:
Discover the newest order in Switzerland containing the e-book “MongoDB Knowledge Modeling and Schema Design”.
Preliminary mannequin: normalized with references
If we construct the info mannequin with out contemplating how purposes will entry it—or just migrate from a relational database by creating one assortment per desk—we find yourself with a mannequin the place a single area combination is break up throughout a number of collections.
On this mannequin, we insert a buyer, some merchandise (books), an order for this buyer, and the corresponding order strains for these books:
// Buyer
db.clients.insertOne({
"_id": "C0001",
"identify": "Franck Pachot",
"e mail": "franck.pachot@instance.com",
"tackle": { "metropolis": "Geneva", "nation": "Switzerland" }
})
// Merchandise
db.merchandise.insertOne({
"_id": "P0001",
"title": "MongoDB in Motion, Third Version: Constructing on the Atlas Knowledge Platform",
"creator": "Arek Borucki",
"value": 59.99, "class": "Database", "format": "Paperback"
})
db.merchandise.insertOne({
"_id": "P0002",
"title": "MongoDB Knowledge Modeling and Schema Design",
"creator": ["Daniel Coupal", "Pascal Desmarets", "Steve Hoberman"],
"value": 54.99, "class": "Database", "format": "Paperback"
})
db.merchandise.insertOne({
"_id": "P0003",
"title": "Excessive Efficiency with MongoDB: Greatest Practices for Efficiency Tuning, Scaling, and Structure",
"creator": ["Asya Kamsky", "Ger Hartnett", "Alex Bevilacqua"],
"value": 49.99, "class": "Database", "format": "Paperback"
})
// Order
db.orders.insertOne({
"_id": "O0001",
"customerId": "C0001",
"orderDate": ISODate("2024-10-22T09:00:00Z"),
"standing": "Processing"
})
// Order strains
db.orderlines.insertMany([
{ "orderId": "O0001", "productId": "P0001", "quantity": 1 },
{ "orderId": "O0001", "productId": "P0002", "quantity": 1 },
{ "orderId": "O0001", "productId": "P0003", "quantity": 1 }
])
It is a traditional relational database sample, however querying and optimizing it’s complicated as a result of joins should happen earlier than filtering.
In MongoDB, to rebuild the area object combination from such mannequin, we use an aggregation pipeline with $lookup joins:
db.orders.combination([
// Get customer info to filter by country
{ $lookup: {
from: "customers",
localField: "customerId",
foreignField: "_id",
as: "customer"
}},
{ $unwind: "$customer" },
// Get product info to filter by title
{ $lookup: {
from: "orderlines",
let: { orderIdVar: "$_id" },
pipeline: [
{ $match: { $expr: { $eq: ["$orderId", "$$orderIdVar"] } } },
{ $lookup: {
from: "merchandise",
localField: "productId",
foreignField: "_id",
as: "product"
}},
{ $unwind: "$product" }
],
as: "orderLines"
}},
// Apply filters
{ $match: {
"buyer.tackle.nation": "Switzerland",
"orderLines.product.title": "MongoDB Knowledge Modeling and Schema Design"
}},
// Kind and restrict
{ $type: { orderDate: -1 } },
{ $restrict: 1 }
])
This question works, however it scans the complete assortment. Indexes aren’t doable right here as a result of $match runs solely after the $lookup and $unwind levels.
With a million orders, it will learn all a million, carry out three million lookups, filter on nation and e-book title, type a big intermediate consequence, after which return only one doc.
That’s why joins may be sluggish: when information is break up throughout many tables or collections, you lose the good thing about environment friendly index entry—compound indexes can solely goal one assortment. It additionally requires to code each question within the area repository.
We have to rethink the info mannequin.
Step 1: Embedding to suit the area mannequin
Understanding the database is for an order‑entry system, we will design a doc mannequin that matches the area. An order must be a self‑contained doc with all buyer data and full product particulars for every order line—as a result of that’s the way it’s displayed, printed, or emailed. If it’s a single logical doc on paper, it must be a single doc in MongoDB.
Right here’s the embedded type:
db.orders.insertOne(
{
_id: 'O0002',
customerId: 'C0001',
orderDate: ISODate('2024-10-22T09:00:00.000Z'),
standing: 'Processing',
buyer: {
_id: 'C0001',
identify: 'Franck Pachot',
e mail: 'franck.pachot@instance.com',
tackle: { metropolis: 'Geneva', nation: 'Switzerland' }
},
orderLines: [
{
quantity: 1,
product: {
_id: 'P0001',
title: "'MongoDB in Action, Third Edition: Building on the Atlas Data Platform',"
author: [ 'Arek Borucki' ],
value: 59.99,
class: 'Database',
format: 'Paperback'
}
},
{
amount: 1,
product: {
_id: 'P0002',
title: "'MongoDB Knowledge Modeling and Schema Design',"
creator: [ 'Daniel Coupal', 'Pascal Desmarets', 'Steve Hoberman' ],
value: 54.99,
class: 'Database',
format: 'Paperback'
}
},
{
amount: 1,
product: {
_id: 'P0003',
title: "'Excessive Efficiency with MongoDB: Greatest practices for efficiency tuning, scaling, and structure',"
creator: [ 'Asya Kamsky', 'Ger Hartnett', 'Alex Bevilacqua' ],
value: 49.99,
class: 'Database',
format: 'Paperback'
}
}
]
}
);
Duplicated information isn’t a priority right here: paperwork are compressed on disk, and adjustments to product descriptions or buyer’s tackle don’t have an effect on accomplished orders. Techniques of document retailer information and relationships as they had been on the time of the enterprise occasion.
Querying turns into far less complicated:
db.orders.discover(
{
"buyer.tackle.nation": "Switzerland",
"orderLines.product.title": "MongoDB Knowledge Modeling and Schema Design"
}
).type({ orderDate: -1 }).restrict(1)
No lookups wanted. But it surely nonetheless does a group scan, filters, types, and eventually returns one doc.
I add a million paperwork and run the question with execution statistics:
db.orders.insertMany(Array.from({size: 333333},()=>({buyer:{tackle:{nation:"Switzerland"}}})))
db.orders.insertMany(Array.from({size: 666666},()=>({buyer:{tackle:{nation:"Germany"}}})))
db.orders.discover(
{
"buyer.tackle.nation": "Switzerland",
"orderLines.product.title": "MongoDB Knowledge Modeling and Schema Design"
}
).type({ orderDate: -1 }).restrict(1).clarify("executionStats").executionStats
{
executionSuccess: true,
nReturned: 1, -- 👀
executionTimeMillis: 559,
totalKeysExamined: 0, -- 👀
totalDocsExamined: 1000001, -- ❌
executionStages: {
isCached: false,
stage: 'SORT', -- 👀
nReturned: 1,
executionTimeMillisEstimate: 542,
works: 1000004, -- 👀
superior: 1,
needTime: 1000002,
needYield: 0,
saveState: 29,
restoreState: 29,
isEOF: 1,
sortPattern: { orderDate: -1 },
memLimit: 104857600,
limitAmount: 1,
sort: 'easy',
totalDataSizeSorted: 0,
usedDisk: false,
spills: 0,
spilledRecords: 0,
spilledBytes: 0,
spilledDataStorageSize: 0,
inputStage: {
stage: 'COLLSCAN', -- 👀
filter: { -- 👀
'$and': [
{ 'customer.address.country': { '$eq': 'Switzerland' } },
{
'orderLines.product.title': { '$eq': 'MongoDB Data Modeling and Schema Design' }
}
]
},
nReturned: 1, -- 👀
executionTimeMillisEstimate: 516,
works: 1000002, -- 👀
superior: 1,
needTime: 1000000,
needYield: 0,
saveState: 29,
restoreState: 29,
isEOF: 1,
route: 'ahead',
docsExamined: 1000001 -- 👀
}
}
}
In an effort to return a single doc (nReturned: 1), a million paperwork had been learn (totalDocsExamined: 1000001), and sorted (stage: 'SORT', works: 1000004).
Now we we have now multi function assortment, we will optimize with an index. Our goal is to scale back totalDocsExamined to nReturned.
Step 2: Indexing for filtering, sorting, and pagination
Our question filters by:
- Equality on buyer nation
- Equality on product title
- A spread on order date (as a result of
type().restrict()).
A really perfect compound index:
db.orders.createIndex({
"buyer.tackle.nation": 1, // equality
"orderLines.product.title": 1, // equality
"orderDate": 1 // type/vary
})
Right here is the execution plan:
db.orders.discover(
{
"buyer.tackle.nation": "Switzerland",
"orderLines.product.title": "MongoDB Knowledge Modeling and Schema Design"
}
).type({ orderDate: -1 }).restrict(1).clarify("executionStats").executionStats
{
executionSuccess: true,
nReturned: 1, -- 👀
executionTimeMillis: 0,
totalKeysExamined: 1, -- ✅
totalDocsExamined: 1, -- ✅
executionStages: {
isCached: false,
stage: 'LIMIT',
nReturned: 1,
executionTimeMillisEstimate: 0,
works: 2,
superior: 1,
needTime: 0,
needYield: 0,
saveState: 0,
restoreState: 0,
isEOF: 1,
limitAmount: 1,
inputStage: {
stage: 'FETCH', -- 👀
nReturned: 1,
executionTimeMillisEstimate: 0,
works: 1,
superior: 1,
needTime: 0,
needYield: 0,
saveState: 0,
restoreState: 0,
isEOF: 0,
docsExamined: 1,
alreadyHasObj: 0,
inputStage: {
stage: 'IXSCAN', -- 👀
nReturned: 1,
executionTimeMillisEstimate: 0,
works: 1,
superior: 1,
needTime: 0,
needYield: 0,
saveState: 0,
restoreState: 0,
isEOF: 0,
keyPattern: { -- 👀
'buyer.tackle.nation': 1,
'orderLines.product.title': 1,
orderDate: 1
},
indexName: 'buyer.tackle.country_1_orderLines.product.title_1_orderDate_1',
isMultiKey: true,
multiKeyPaths: {
'buyer.tackle.nation': [],
'orderLines.product.title': [ 'orderLines' ],
orderDate: []
},
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
route: 'backward', -- 👀
indexBounds: {
'buyer.tackle.nation': [ '["Switzerland", "Switzerland"]' ],
'orderLines.product.title': [
'["MongoDB Data Modeling and Schema Design", "MongoDB Data Modeling and Schema Design"]'
],
orderDate: [ '[MaxKey, MinKey]' ] -- 👀
},
keysExamined: 1, -- 👀
seeks: 1, -- 👀
dupsTested: 1,
dupsDropped: 0
}
}
}
}
With this index, the question jumps straight to the goal doc:
-
totalKeysExamined: 1from one vary (seeks: 1instage: 'IXSCAN) -
totalDocsExamined: 1fromstage: 'FETCH'with out the necessity to type nReturned: 1
The result’s retrieved in sorted order straight from the index, minimizing the variety of paperwork to learn. We will do higher. Our subsequent objective is to elimiate totalDocsExamined.
Step 3: Projection and masking index
If the applying solely wants the client’s identify and nation, we will mission:
db.orders.discover(
{
"buyer.tackle.nation": "Switzerland",
"orderLines.product.title": "MongoDB Knowledge Modeling and Schema Design"
},
{ "buyer.identify": 1, "buyer.tackle.nation": 1, _id: 0 }
).type({ orderDate: -1 })
[
{
customer: { name: 'Franck Pachot', address: { country: 'Switzerland' } }
}
]
Now, we will create a masking index:
db.orders.createIndex({
"buyer.tackle.nation": 1,
"orderLines.product.title": 1,
"orderDate": 1,
"buyer.identify": 1
})
Right here is the execution plan:
db.orders.discover(
{
"buyer.tackle.nation": "Switzerland",
"orderLines.product.title": "MongoDB Knowledge Modeling and Schema Design"
},
{ "buyer.identify": 1, "buyer.tackle.nation": 1, _id: 0 }
).type({ orderDate: -1}).restrict(1).clarify("executionStats").executionStats
{
executionSuccess: true,
nReturned: 1, -- 👀
executionTimeMillis: 1,
totalKeysExamined: 1, -- ✅
totalDocsExamined: 0, -- ✅
executionStages: {
isCached: false,
stage: 'LIMIT',
nReturned: 1,
executionTimeMillisEstimate: 0,
works: 2,
superior: 1,
needTime: 0,
needYield: 0,
saveState: 0,
restoreState: 0,
isEOF: 1,
limitAmount: 1,
inputStage: {
stage: 'PROJECTION_DEFAULT', -- 👀
nReturned: 1,
executionTimeMillisEstimate: 0,
works: 1,
superior: 1,
needTime: 0,
needYield: 0,
saveState: 0,
restoreState: 0,
isEOF: 0,
transformBy: { 'buyer.identify': 1, 'buyer.tackle.nation': 1, _id: 0 },
inputStage: {
stage: 'IXSCAN', -- 👀
nReturned: 1,
executionTimeMillisEstimate: 0,
works: 1,
superior: 1,
needTime: 0,
needYield: 0,
saveState: 0,
restoreState: 0,
isEOF: 0,
keyPattern: {
'buyer.tackle.nation': 1,
'orderLines.product.title': 1,
orderDate: 1,
'buyer.identify': 1
},
indexName: 'buyer.tackle.country_1_orderLines.product.title_1_orderDate_1_customer.name_1',
isMultiKey: true,
multiKeyPaths: {
'buyer.tackle.nation': [],
'orderLines.product.title': [ 'orderLines' ],
orderDate: [],
'buyer.identify': []
},
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
route: 'backward', -- 👀
indexBounds: {
'buyer.tackle.nation': [ '["Switzerland", "Switzerland"]' ],
'orderLines.product.title': [
'["MongoDB Data Modeling and Schema Design", "MongoDB Data Modeling and Schema Design"]'
],
orderDate: [ '[MaxKey, MinKey]' ],
'buyer.identify': [ '[MaxKey, MinKey]' ] -- 👀
},
keysExamined: 1,
seeks: 1,
dupsTested: 1,
dupsDropped: 0
}
}
}
}
That is the optimum plan the place all paperwork returned (nReturned: 1) come from the index entries that had been scanned (totalKeysExamined: 1) with none FETCH (totalDocsExamined: 0)
We will nonetheless optimize—for instance, by making the “orderDate” index descending, as ahead scans may be barely quicker than backward ones. However right here, the place just one index entry is learn, it doesn’t matter. What issues most is figuring out when to cease tuning and holding solely the minimal set of indexes that delivers the required efficiency in your utility.
This index helps a number of queries. It permits you to:
- Discover orders from a particular nation
- Discover orders containing a particular product title, mixed with the nation filter, or alone (with skip scan)
- Filter additional by date vary
- Kind orders by date (ascending or descending)
- Filter by buyer identify
- Retrieve the client identify, nation, and order date with out accessing the total doc
Conclusion
MongoDB question efficiency isn’t nearly including indexes—it’s about aligning your information mannequin and queries with how your utility actually makes use of information. Turning a four-collection be a part of with tens of millions of lookups right into a single index scan that runs in milliseconds comes down to 3 guidelines:
-
Mannequin paperwork as aggregates in your area layer
Embed information that’s at all times accessed collectively. If it suits on the identical “piece of paper” in the actual world—like an order with buyer and product particulars—it belongs in a single MongoDB doc. This simplifies queries, reducestotalDocsExaminedand add extra indexing potentialities. -
Design indexes in your question entry patterns
Use compound indexes that serve your filters and pagination order (ESR guideline). This reducestotalKeysExaminedandtotalDocsExamined. -
Refine repositories to question solely the required fields
Venture solely the fields you want. Totally coated queries (all fields within the index) can obtaintotalDocsExamined: 0and return outcomes straight from the index — quick and resource-efficient.
You don’t want to use this to all queries—use these steps solely when essential. Extreme lookups and restricted indexing choices drove us to optimize the info mannequin. After that, the remaining want for selective filters and the absence of an acceptable index led to creating a brand new one. A masking projection wasn’t wanted right here as a result of we solely fetch one doc. It turns into helpful primarily when doc fetch time is a major a part of the response time, or once we wish to keep away from loading massive paperwork that enhance reminiscence stress.
