Sunday, November 30, 2025

The Cost of Not Knowing MongoDB, Part 3: appV6R0 to appV6R4


Welcome to the third and final part of the series "The Cost of Not Knowing MongoDB." Building upon the foundational optimizations explored in Part 1 and Part 2, this article delves into advanced MongoDB design patterns that can dramatically transform application performance.

In Part 1, we improved application performance by concatenating fields, changing data types, and shortening field names. In Part 2, we implemented the Bucket Pattern and the Computed Pattern and optimized the aggregation pipeline to achieve even better performance.

In this final article, we address the issues and improvements identified in appV5R4. Specifically, we focus on reducing the document size in our application to alleviate the disk throughput bottleneck on the MongoDB server. This reduction is achieved by adopting a dynamic schema and changing the storage compression algorithm.

All the application versions and revisions in this article would be developed by a senior MongoDB developer, as they build on all the previous versions and use the Dynamic Schema pattern, which is not very common to see.

Application version 6 revision 0 (appV6R0): A dynamic monthly bucket document

As mentioned in the Issues and Improvements of appV5R4 from the previous article, the primary limitation of our MongoDB server is its disk throughput. To address this, we need to reduce the size of the documents being stored.

Consider the following document from appV5R3, which has provided the best performance so far:

const doc = {
  _id: Buffer.from("...01202202"),
  items: [
    { date: new Date("2022-06-05"), a: 10, n: 3 },
    { date: new Date("2022-06-16"), p: 1, r: 1 },
    { date: new Date("2022-06-27"), a: 5, r: 1 },
    { date: new Date("2022-06-29"), p: 1 },
  ],
};


The items array in this document contains only four elements, but on average it will have around 10 elements, and in the worst-case scenario it can have up to 90 elements. These elements are the primary contributors to the document size, so they should be the focus of our optimization efforts.

One commonality among the elements is the presence of the date field, with its value including the year and month, as seen in the previous document. By rethinking how this field and its value are stored, we can reduce storage requirements.

An unconventional solution we could use is:

  • Changing the items field type from an array to a document.

  • Using the date value as the field name in the items document.

  • Storing the status totals as the value for each date field.

Here is the previous document represented using the new schema idea:

const doc = {
  _id: Buffer.from("...01202202"),
  items: {
    20220605: { a: 10, n: 3 },
    20220616: { p: 1, r: 1 },
    20220627: { a: 5, r: 1 },
    20220629: { p: 1 },
  },
};
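The three steps above can be sketched in JavaScript. The helper name and document shapes below are illustrative assumptions for this article's examples, not the application's actual code.

```javascript
// Hypothetical sketch: converting an array-based bucket document into the
// dynamic-schema form described above. Names and shapes are illustrative.
function toDynamicSchema(arrayDoc) {
  const items = {};
  for (const { date, ...statusTotals } of arrayDoc.items) {
    // The YYYYMMDD string becomes the field name; only the totals are kept.
    const fieldName = date.toISOString().slice(0, 10).replace(/-/g, "");
    items[fieldName] = statusTotals;
  }
  return { _id: arrayDoc._id, items };
}

const converted = toDynamicSchema({
  _id: "key+2022+06",
  items: [
    { date: new Date("2022-06-05"), a: 10, n: 3 },
    { date: new Date("2022-06-29"), p: 1 },
  ],
});
// converted.items → { "20220605": { a: 10, n: 3 }, "20220629": { p: 1 } }
```

Note that the status totals survive unchanged; only the date moves from a value into a field name.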


While this schema may not significantly reduce the document size compared to appV5R3, we can further optimize it by leveraging the fact that the year is already embedded in the _id field. This eliminates the need to repeat the year in the field names of the items document.

With this approach, the items document adopts a Dynamic Schema, where field names encode information and are not predefined.

To demonstrate various implementation possibilities, we will revisit all the bucketing criteria used in the appV5RX implementations, starting with appV5R0.

For appV6R0, which builds upon appV5R0 but uses a dynamic schema, data is bucketed by year and month. The field names in the items document represent only the day of the date, since the year and month are already stored in the _id field.

A detailed explanation of the bucketing logic and functions used to implement the current application can be found in the appV5R0 introduction.

The following document stores data for January 2022 (2022-01-XX), applying the newly presented idea:

const doc = {
  _id: Buffer.from("...01202201"),
  items: {
    "05": { a: 10, n: 3 },
    16: { p: 1, r: 1 },
    27: { a: 5, r: 1 },
    29: { p: 1 },
  },
};


Schema

The application implementation presented above would have the following TypeScript document schema, named SchemaV6R0:

export type SchemaV6R0 = {
  _id: Buffer;
  items: Record<
    string,
    {
      a?: number;
      n?: number;
      p?: number;
      r?: number;
    }
  >;
};

Bulk upsert

Based on the specification presented, we have the following updateOne operation for each event generated by this application version:

const DD = getDD(event.date); // Extract the `day` from the `event.date`

const operation = {
  updateOne: {
    filter: { _id: buildId(event.key, event.date) }, // key + year + month
    update: {
      $inc: {
        [`items.${DD}.a`]: event.approved,
        [`items.${DD}.n`]: event.noFunds,
        [`items.${DD}.p`]: event.pending,
        [`items.${DD}.r`]: event.rejected,
      },
    },
    upsert: true,
  },
};

filter:

  • Targets the document where the _id field matches the concatenated value of key, year, and month.

  • The buildId function converts the key+year+month into a binary format.

update:

  • Uses the $inc operator to increment the fields corresponding to the same DD as the event by the status values provided.

  • If a field doesn't exist in the items document and the event provides a value for it, $inc treats the non-existent field as having a value of 0 and performs the operation.

  • If a field exists in the items document but the event doesn't provide a value for it (i.e., undefined), $inc treats it as 0 and performs the operation.

upsert:

  • Ensures a new document is created if no matching document exists.

Get reports

To fulfill the Get Reports operation, five aggregation pipelines are required, one for each date interval. Each pipeline follows the same structure, differing only in the filtering criteria of the $match stage:

const pipeline = [
  { $match: docsFromKeyBetweenDate },
  { $addFields: buildTotalsField },
  { $group: groupSumTotals },
  { $project: { _id: 0 } },
];

The complete code for this aggregation pipeline is quite complicated, so only pseudocode is presented here.

1: { $match: docsFromKeyBetweenDate }

  • Range-filters documents by _id to retrieve only the buckets within the report date range. It has the same logic as appV5R0.
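As a hedged illustration of such a range filter: the article does not show the real buildId encoding, so the layout below (key bytes followed by the "YYYYMM" digits as text) is an assumption, but it shows why a binary range on _id selects all of a key's monthly buckets between two dates.

```javascript
// Hypothetical sketch of docsFromKeyBetweenDate. The actual buildId encoding
// is not shown in the article; key bytes + "YYYYMM" text is an assumption.
function buildId(key, date) {
  const yyyymm =
    `${date.getUTCFullYear()}${String(date.getUTCMonth() + 1).padStart(2, "0")}`;
  return Buffer.concat([Buffer.from(key, "hex"), Buffer.from(yyyymm)]);
}

// Because the year+month suffix sorts lexicographically, a binary range on
// _id covers every monthly bucket of the key between the two report dates.
const docsFromKeyBetweenDate = {
  _id: {
    $gte: buildId("0f", new Date("2022-01-01")),
    $lte: buildId("0f", new Date("2022-06-30")),
  },
};
```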

2: { $addFields: buildTotalsField }

  • The logic is similar to the one used in the Get Reports of appV5R3.

  • The $objectToArray operator is used to convert the items document into an array, enabling a $reduce operation.

  • Filtering the items fields within the report's range involves extracting the year and month from the _id field and the day from the field names in the items document.

  • The following JavaScript code is logically equivalent to the real aggregation pipeline code.

// Equivalent JavaScript logic:
const MM = _id.slice(-2).toString(); // Get month from _id
const YYYY = _id.slice(-6, -2).toString(); // Get year from _id
const items_array = Object.entries(items); // Convert the object to an array of [key, value]

const totals = items_array.reduce(
  (accumulator, [DD, status]) => {
    let statusDate = new Date(`${YYYY}-${MM}-${DD}`);

    if (statusDate >= reportStartDate && statusDate < reportEndDate) {
      accumulator.a += status.a ?? 0;
      accumulator.n += status.n ?? 0;
      accumulator.p += status.p ?? 0;
      accumulator.r += status.r ?? 0;
    }

    return accumulator;
  },
  { a: 0, n: 0, p: 0, r: 0 }
);
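For readers who want to see the aggregation side, here is an illustrative sketch (an assumption, not the article's actual code) of what buildTotalsField could look like with $objectToArray and $reduce. For brevity, YYYY and MM are plain strings here; the real pipeline derives them from the binary _id, which is the complicated part the article skips.

```javascript
const YYYY = "2022"; // placeholder for the year expression extracted from _id
const MM = "01"; // placeholder for the month expression extracted from _id
const reportStartDate = new Date("2022-01-10");
const reportEndDate = new Date("2022-01-28");

const buildTotalsField = {
  totals: {
    $reduce: {
      input: { $objectToArray: "$items" }, // [{ k: "05", v: { a: 10, n: 3 } }, ...]
      initialValue: { a: 0, n: 0, p: 0, r: 0 },
      in: {
        $let: {
          vars: {
            // Rebuild the date from the year/month placeholders and the
            // day encoded in the field name ($$this.k).
            statusDate: {
              $dateFromString: {
                dateString: { $concat: [YYYY, "-", MM, "-", "$$this.k"] },
              },
            },
          },
          in: {
            $cond: [
              {
                $and: [
                  { $gte: ["$$statusDate", reportStartDate] },
                  { $lt: ["$$statusDate", reportEndDate] },
                ],
              },
              {
                // Accumulate each status, treating missing fields as 0.
                a: { $add: ["$$value.a", { $ifNull: ["$$this.v.a", 0] }] },
                n: { $add: ["$$value.n", { $ifNull: ["$$this.v.n", 0] }] },
                p: { $add: ["$$value.p", { $ifNull: ["$$this.v.p", 0] }] },
                r: { $add: ["$$value.r", { $ifNull: ["$$this.v.r", 0] }] },
              },
              "$$value",
            ],
          },
        },
      },
    },
  },
};
```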

3: { $group: groupSumTotals }

  • Groups the totals of each document in the pipeline into the final status totals using $sum operations.

4: { $project: { _id: 0 } }

  • Formats the resulting document into the reports format.

Indexes

No additional indexes are required, maintaining the single _id index approach established in the appV4 implementation.

Initial scenario statistics

Collection statistics

To evaluate the performance of appV6R0, we inserted 500 million event documents into the collection using the schema and Bulk Upsert function described earlier. For comparison, the tables below also include statistics from previous comparable application versions:

Collection  Documents   Data Size  Document Size  Storage Size  Indexes  Index Size
appV5R0     95,350,431  19.19GB    217B           5.06GB        1        2.95GB
appV5R3     33,429,492  11.96GB    385B           3.24GB        1        1.11GB
appV6R0     95,350,319  11.1GB     125B           3.33GB        1        3.13GB

Event statistics

To evaluate the storage efficiency per event, the Event Statistics are calculated by dividing the total data size and index size by the 500 million events.

Collection  Data Size/Events  Index Size/Events  Total Size/Events
appV5R0     41.2B             6.3B               47.5B
appV5R3     25.7B             2.4B               28.1B
appV6R0     23.8B             6.7B               30.5B
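As a sanity check, the per-event numbers for appV6R0 follow directly from the Collection Statistics above (treating GB as GiB, as MongoDB's stats do):

```javascript
// Reproducing appV6R0's Event Statistics from the Collection Statistics table.
const events = 500_000_000;
const dataSize = 11.1 * 1024 ** 3; // 11.1GB of data
const indexSize = 3.13 * 1024 ** 3; // 3.13GB of index

const dataPerEvent = dataSize / events; // ≈ 23.8 bytes/event
const indexPerEvent = indexSize / events; // ≈ 6.7 bytes/event
```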

It's challenging to make a direct comparison between appV6R0 and appV5R0 from a storage perspective. The appV5R0 implementation is the simplest bucketing possible, where event documents were simply appended to the items array without being grouped by day, as is done in appV6R0.

However, we can attempt a comparison between appV6R0 and appV5R3, the best solution so far. In appV6R0, data is bucketed by month, while in appV5R3 it's bucketed by quarter. Assuming document size scales linearly with the bucketing criteria (though this isn't entirely accurate), a quarter-sized appV6R0 document would be roughly 3 * 125 = 375 bytes, about 2.6% smaller than appV5R3's 385 bytes.

Another indicator of improvement is the Data Size/Events metric in the Event Statistics table. For appV6R0, each event uses an average of 23.8 bytes, compared to 25.7 bytes for appV5R3, a 7.4% reduction in size.

Load test results

Executing the load test for appV6R0 and plotting it alongside the results for appV5R0 and the desired rates, we have the following results for Get Reports and Bulk Upsert.

Get Reports rates

The two versions exhibit very similar rate performance, with appV6R0 showing a slight edge in the second and third quarters, while appV5R0 is superior in the first and fourth quarters.

Figure 1. Graph showing the rates of appV5R0 and appV6R0 when executing the load test for Get Reports functionality. Both have similar performance, but without reaching the desired rates.

Get Reports latency

The two versions exhibit very similar latency performance, with appV6R0 showing slight advantages in the second and third quarters, while appV5R0 is superior in the first and fourth quarters.

Figure 2. Graph showing the latency of appV5R0 and appV6R0 when executing the load test for Get Reports functionality. appV5R0 has lower latency than appV6R0.


Bulk Upsert rates

Both versions have similar rate values, but it can be seen that appV6R0 has a small edge over appV5R0.

Figure 3. Graph showing the rates of appV5R0 and appV6R0 when executing the load test for Bulk Upsert functionality. appV6R0 has better rates than appV5R0, but without reaching the desired rates.


Bulk Upsert latency

Although both versions have similar latency values for the first quarter of the test, for the final three quarters appV6R0 has a clear advantage over appV5R0.

Figure 4. Graph showing the latency of appV5R0 and appV6R0 when executing the load test for Bulk Upsert functionality. appV6R0 has lower latency than appV5R0.


Performance summary

Despite the significant reduction in document and storage size achieved by appV6R0, the performance improvement was not as substantial as expected. This indicates that the bottleneck in the application when bucketing data by month may not be related to disk throughput.

Examining the collection stats table reveals that the index size for both versions is close to 3GB. This is near the 4GB of memory available on the machine running the database and exceeds the 1.5GB allocated by WiredTiger for its cache. Therefore, the limiting factor in this case is likely memory/cache rather than document size, which explains the lack of a significant performance improvement.

Issues and improvements

To address the limitations observed in appV6R0, we propose adopting the same line of improvements applied from appV5R0 to appV5R1. Specifically, we will bucket the events by quarter in appV6R1. This approach not only follows the established pattern of improvements but also aligns with the need to optimize performance further.

As highlighted in the Load Test Results, the current bottleneck lies in the size of the index relative to the available cache/memory. By increasing the bucketing interval from month to quarter, we can reduce the number of documents by roughly a factor of 3. This reduction will, in turn, decrease the number of index entries by the same factor, leading to a smaller index size.

Application version 6 revision 1 (appV6R1): A dynamic quarterly bucket document

As discussed in the previous Issues and Improvements section, the primary bottleneck in appV6R0 was the index size nearing the memory capacity of the machine running MongoDB. To mitigate this issue, we increase the bucketing interval from a month to a quarter for appV6R1, following the approach used in appV5R1.

This adjustment aims to reduce the number of documents and index entries by roughly a factor of 3, thereby lowering the overall index size. By adopting a quarter-based bucketing strategy, we follow the established pattern of improvements from the appV5RX revisions while addressing the specific memory/cache constraints identified in appV6R0.

The implementation of appV6R1 retains most of the code from appV6R0, with the following key differences:

  • The _id field is now composed of key+year+quarter.

  • The field names in the items document encode both month and day, as this information is necessary for filtering date ranges in the Get Reports operation.

The following example demonstrates how data for June 2022 (2022-06-XX), within the second quarter (Q2), is stored using the new schema:

const doc = {
  _id: Buffer.from("...01202202"),
  items: {
    "0605": { a: 10, n: 3 },
    "0616": { p: 1, r: 1 },
    "0627": { a: 5, r: 1 },
    "0629": { p: 1 },
  },
};
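A hypothetical helper for deriving these month+day field names could look as follows; the article does not show the real implementation, so the name and details here are assumptions.

```javascript
// Hypothetical sketch of deriving the MMDD field names described above.
function getMMDD(date) {
  const MM = String(date.getUTCMonth() + 1).padStart(2, "0"); // month, 2 digits
  const DD = String(date.getUTCDate()).padStart(2, "0"); // day, 2 digits
  return `${MM}${DD}`;
}

// getMMDD(new Date("2022-06-05")) → "0605"
```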

Schema

The application implementation presented above would have the following TypeScript document schema, named SchemaV6R1:

export type SchemaV6R1 = {
  _id: Buffer;
  items: Record<
    string,
    {
      a?: number;
      n?: number;
      p?: number;
      r?: number;
    }
  >;
};

Bulk upsert

Based on the specification presented, we have the following updateOne operation for each event generated by this application version:

const MMDD = getMMDD(event.date); // Extract the month (MM) and day (DD) from the `event.date`

const operation = {
  updateOne: {
    filter: { _id: buildId(event.key, event.date) }, // key + year + quarter
    update: {
      $inc: {
        [`items.${MMDD}.a`]: event.approved,
        [`items.${MMDD}.n`]: event.noFunds,
        [`items.${MMDD}.p`]: event.pending,
        [`items.${MMDD}.r`]: event.rejected,
      },
    },
    upsert: true,
  },
};

This updateOne operation has logic similar to the one in appV6R0, with the only differences being the filter and update criteria.

filter:

  • Targets the document where the _id field matches the concatenated value of key, year, and quarter.

  • The buildId function converts the key+year+quarter into a binary format.

update:

  • Uses the $inc operator to increment the fields corresponding to the same MMDD as the event by the status values provided.

Get reports

To fulfill the Get Reports operation, five aggregation pipelines are required, one for each date interval. Each pipeline follows the same structure, differing only in the filtering criteria of the $match stage:

const pipeline = [
  { $match: docsFromKeyBetweenDate },
  { $addFields: buildTotalsField },
  { $group: groupSumTotals },
  { $project: { _id: 0 } },
];

This aggregation operation has logic similar to the one in appV6R0, with the only difference being the implementation of the $addFields stage.

{ $addFields: buildTotalsField }:

  • A similar implementation to the one in appV6R0.

  • The difference lies in extracting the year (YYYY) from the _id field and the month and day (MMDD) from the field name.

  • The following JavaScript code is logically equivalent to the real aggregation pipeline code.

const YYYY = _id.slice(-6, -2).toString(); // Get year from _id
const items_array = Object.entries(items); // Convert the object to an array of [key, value]

const totals = items_array.reduce(
  (accumulator, [MMDD, status]) => {
    let [MM, DD] = [MMDD.slice(0, 2), MMDD.slice(2, 4)];
    let statusDate = new Date(`${YYYY}-${MM}-${DD}`);

    if (statusDate >= reportStartDate && statusDate < reportEndDate) {
      accumulator.a += status.a ?? 0;
      accumulator.n += status.n ?? 0;
      accumulator.p += status.p ?? 0;
      accumulator.r += status.r ?? 0;
    }

    return accumulator;
  },
  { a: 0, n: 0, p: 0, r: 0 }
);
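Running this logic against the example Q2/2022 document, with an assumed report range of 2022-06-10 (inclusive) to 2022-06-28 (exclusive), gives a concrete feel for the filtering:

```javascript
// Self-contained demo of the equivalent JavaScript logic; the report range
// here is an assumption chosen for illustration.
const YYYY = "2022";
const items = {
  "0605": { a: 10, n: 3 },
  "0616": { p: 1, r: 1 },
  "0627": { a: 5, r: 1 },
  "0629": { p: 1 },
};
const reportStartDate = new Date("2022-06-10");
const reportEndDate = new Date("2022-06-28");

const totals = Object.entries(items).reduce(
  (accumulator, [MMDD, status]) => {
    const [MM, DD] = [MMDD.slice(0, 2), MMDD.slice(2, 4)];
    const statusDate = new Date(`${YYYY}-${MM}-${DD}`);
    if (statusDate >= reportStartDate && statusDate < reportEndDate) {
      accumulator.a += status.a ?? 0;
      accumulator.n += status.n ?? 0;
      accumulator.p += status.p ?? 0;
      accumulator.r += status.r ?? 0;
    }
    return accumulator;
  },
  { a: 0, n: 0, p: 0, r: 0 }
);
// totals → { a: 5, n: 0, p: 1, r: 2 } ("0605" and "0629" fall outside the range)
```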

Indexes

No additional indexes are required, maintaining the single _id index approach established in the appV4 implementation.

Initial scenario statistics

Collection statistics

To evaluate the performance of appV6R1, we inserted 500 million event documents into the collection using the schema and Bulk Upsert function described earlier. For comparison, the tables below also include statistics from previous comparable application versions:

Collection  Documents   Data Size  Document Size  Storage Size  Indexes  Index Size
appV5R3     33,429,492  11.96GB    385B           3.24GB        1        1.11GB
appV6R0     95,350,319  11.1GB     125B           3.33GB        1        3.13GB
appV6R1     33,429,366  8.19GB     264B           2.34GB        1        1.22GB

Event statistics

To evaluate the storage efficiency per event, the Event Statistics are calculated by dividing the total data size and index size by the 500 million events.

Collection  Data Size/Events  Index Size/Events  Total Size/Events
appV5R3     25.7B             2.4B               28.1B
appV6R0     23.8B             6.7B               30.5B
appV6R1     17.6B             2.6B               20.2B

In the previous Initial Scenario Statistics analysis, we assumed that document size would scale linearly with the bucketing range. However, this assumption proved inaccurate. The average document size in appV6R1 is roughly twice as large as in appV6R0, even though it stores three times more data. Already a win for this new implementation.

Since appV6R1 buckets data by quarter at the document level and by day within the items sub-document, a fair comparison is with appV5R3, the best-performing version so far. From the tables above, we observe a significant improvement in Document Size, and consequently Data Size, when moving from appV5R3 to appV6R1. Specifically, there was a 31.4% reduction in Document Size. From an index size perspective, there was no change, as both versions bucket events by quarter.

Load test results

Executing the load test for appV6R1 and plotting it alongside the results for appV5R3 and the desired rates, we have the following results for Get Reports and Bulk Upsert.

Get Reports rates

For the first three quarters of the test, both versions have similar rate values, but for the final quarter appV6R1 has a notable edge over appV5R3.

Figure 5. Graph showing the rates of appV5R3 and appV6R1 when executing the load test for Get Reports functionality. appV5R3 has better rates than appV6R1, but without reaching the desired rates.


Get Reports latency

The two versions exhibit very similar latency performance, with appV6R1 showing slight advantages in the second and third quarters, while appV5R3 is superior in the first and fourth quarters.

Figure 6. Graph showing the latency of appV5R3 and appV6R1 when executing the load test for Get Reports functionality. appV5R3 has lower latency than appV6R1.


Bulk Upsert rates

Both versions have similar rate values, but it can be seen that appV6R1 has a small edge over appV5R3.

Figure 7. Graph showing the rates of appV5R3 and appV6R1 when executing the load test for Bulk Upsert functionality. appV6R1 has better rates than appV5R3, but without reaching the desired rates.


Bulk Upsert latency

Although both versions have similar latency values for the first quarter of the test, for the final three quarters appV6R1 has a clear advantage over appV5R3.

Figure 8. Graph showing the latency of appV5R3 and appV6R1 when executing the load test for Bulk Upsert functionality. appV6R1 has lower latency than appV5R3.



Initial scenario statistics

Collection statistics

To evaluate the performance of appV6R2, we inserted 500 million event documents into the collection using the schema and Bulk Upsert function described earlier. For comparison, the tables below also include statistics from previous comparable application versions:

Collection  Documents   Data Size  Document Size  Storage Size  Indexes  Index Size
appV5R3     33,429,492  11.96GB    385B           3.24GB        1        1.11GB
appV6R1     33,429,366  8.19GB     264B           2.34GB        1        1.22GB
appV6R2     33,429,207  9.11GB     293B           2.8GB         1        1.26GB

Event statistics

To evaluate the storage efficiency per event, the Event Statistics are calculated by dividing the total data size and index size by the 500 million events.

Collection  Data Size/Events  Index Size/Events  Total Size/Events
appV5R3     25.7B             2.4B               28.1B
appV6R1     17.6B             2.6B               20.2B
appV6R2     19.6B             2.7B               22.3B
