Monday, December 22, 2025

JSONB vs. BSON: Tracing PostgreSQL and MongoDB Wire Protocols


There may be an important distinction between MongoDB’s BSON and PostgreSQL’s JSONB. Each are binary JSON codecs, however they serve completely different roles. JSONB is only an inner storage format for JSON knowledge in PostgreSQL. BSON, alternatively, is MongoDB’s native knowledge format: it’s utilized by software drivers, over the community, in reminiscence, and on disk.



JSONB: PostgreSQL inner storage format

JSONB is a storage format, as outlined by the PostgreSQL documentation:

PostgreSQL presents two sorts for storing JSON knowledge: json and jsonb

PostgreSQL makes use of JSONB solely for inner storage, requiring the whole construction to be learn to entry a subject, as noticed in JSONB DeTOASTing (learn amplification).



BSON: MongoDB storage and trade format

BSON is used for storage and in addition as an trade format between the appliance and the database, as outlined within the BSON specification:

BSON might be com­pared to bin­ary inter­change for­mats, like Proto­col Buf­fers. BSON is extra “schema-less” than Proto­col Buf­fers

On the appliance facet, the MongoDB driver converts software objects to BSON, which helps extra knowledge sorts than JSON or JSONB, together with datetime and binary. This BSON is shipped and obtained over the community and saved and manipulated on the server as-is, with no additional parsing. Each the driving force and the database can effectively entry fields through the binary construction as a result of BSON contains metadata resembling subject size prefixes and express kind data, even for big or nested paperwork.



PostgreSQL protocol is JSON (textual content), not JSONB

As an example this, I’ve written a small Python program that inserts a doc right into a PostgreSQL desk with a JSONB column, and queries that desk to retrieve the doc:

from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Merchandise(Base):
    __tablename__ = 'gadgets'
    id = Column(Integer, primary_key=True)
    knowledge = Column(JSONB)  # our JSONB column

# Hook up with Postgres
engine = create_engine('postgresql+psycopg2://', echo=True)
Session = sessionmaker(bind=engine)
session = Session()

# Create desk
Base.metadata.create_all(engine)

# Insert an object into JSONB column
obj = {"title": "widget", "value": 9.99, "tags": ["new", "sale"]}
session.add(Merchandise(knowledge=obj))
session.commit()

# Learn again the desk
for row in session.question(Merchandise).all():
    print(row.id, row.knowledge)

Enter fullscreen mode

Exit fullscreen mode

This system makes use of SQLAlchemy to ship and retrieve Python objects to and from PostgreSQL through the Psycopg2 driver. I’ve saved it in demo.py.

After I run this system, with python demo.py, earlier than it shows the ultimate outcome, it logs all SQL statements:

2025-12-21 12:50:22,484 INFO sqlalchemy.engine.Engine choose pg_catalog.model()
2025-12-21 12:50:22,485 INFO sqlalchemy.engine.Engine [raw sql] {}
2025-12-21 12:50:22,486 INFO sqlalchemy.engine.Engine choose current_schema()
2025-12-21 12:50:22,486 INFO sqlalchemy.engine.Engine [raw sql] {}
2025-12-21 12:50:22,486 INFO sqlalchemy.engine.Engine present standard_conforming_strings
2025-12-21 12:50:22,486 INFO sqlalchemy.engine.Engine [raw sql] {}
2025-12-21 12:50:22,487 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-12-21 12:50:22,488 INFO sqlalchemy.engine.Engine choose relname from pg_class c be part of pg_namespace n on n.oid=c.relnamespace the place pg_catalog.pg_table_is_visible(c.oid) and relname=%(title)s
2025-12-21 12:50:22,488 INFO sqlalchemy.engine.Engine [generated in 0.00015s] {'title': 'gadgets'}
2025-12-21 12:50:22,489 INFO sqlalchemy.engine.Engine
CREATE TABLE gadgets (
        id SERIAL NOT NULL,
        knowledge JSONB,
        PRIMARY KEY (id)
)


2025-12-21 12:50:22,489 INFO sqlalchemy.engine.Engine [no key 0.00011s] {}
2025-12-21 12:50:22,491 INFO sqlalchemy.engine.Engine COMMIT
2025-12-21 12:50:22,493 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-12-21 12:50:22,494 INFO sqlalchemy.engine.Engine INSERT INTO gadgets (knowledge) VALUES (%(knowledge)s) RETURNING gadgets.id
2025-12-21 12:50:22,494 INFO sqlalchemy.engine.Engine [generated in 0.00018s] {'knowledge': '{"title": "widget", "value": 9.99, "tags": ["new", "sale"]}'}
2025-12-21 12:50:22,495 INFO sqlalchemy.engine.Engine COMMIT
2025-12-21 12:50:22,497 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-12-21 12:50:22,498 INFO sqlalchemy.engine.Engine SELECT gadgets.id AS items_id, gadgets.knowledge AS items_data
FROM gadgets
2025-12-21 12:50:22,498 INFO sqlalchemy.engine.Engine [generated in 0.00013s] {}

1 {'title': 'widget', 'tags': ['new', 'sale'], 'value': 9.99}

Enter fullscreen mode

Exit fullscreen mode

To see what is shipped and obtained by means of the community by the PostgreSQL protocol, I run this system with strace, exhibiting the sendto and recv system calls with their arguments: strace -e hint=sendto,recvfrom -yy -s 1000 python demo.py.

Like most SQL database drivers, the protocol is fundamental: ship SQL instructions as textual content, and fetch a tabular outcome set. Within the PostgreSQL protocol’s messages, the primary letter is the message kind (Q for Easy Question Message, adopted by the size of the message, and the message in textual content, X to terminate the session, C for command completion standing, T abd D for the resultset).

Right here is the output, the traces beginning with the timestamp are the logs from SQL Alchemy, these beginning with sendto and recv are the community system calls with the message to the database, and the outcome from it

The place is the hint when inserting one doc:

2025-12-21 16:52:20,278 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-12-21 16:52:20,279 INFO sqlalchemy.engine.Engine INSERT INTO gadgets (knowledge) VALUES (%(knowledge)s) RETURNING gadgets.id
2025-12-21 16:52:20,279 INFO sqlalchemy.engine.Engine [generated in 0.00029s] {'knowledge': '{"title": "widget", "value": 9.99, "tags": ["new", "sale"]}'}

sendto(3<TCPv6:[[::1]:41858->[::1]:5432]>, "QnBEGIN", 11, MSG_NOSIGNAL, NULL, 0) = 11

recvfrom(3<TCPv6:[[::1]:41858->[::1]:5432]>, "CnBEGINZ5T", 16384, 0, NULL, NULL) = 17

sendto(3<TCPv6:[[::1]:41858->[::1]:5432]>, "QvINSERT INTO gadgets (knowledge) VALUES ('{"title": "widget", "value": 9.99, "tags": ["new", "sale"]}') RETURNING gadgets.id", 119, MSG_NOSIGNAL, NULL, 0) = 119

recvfrom(3<TCPv6:[[::1]:41858->[::1]:5432]>, "T331id@3101274377377377377Dv1011C17INSERT 0 1Z5T", 16384, 0, NULL, NULL) = 62
2025-12-21 16:52:20,281 INFO sqlalchemy.engine.Engine COMMIT

sendto(3<TCPv6:[[::1]:41858->[::1]:5432]>, "QvCOMMIT", 12, MSG_NOSIGNAL, NULL, 0) = 12

recvfrom(3<TCPv6:[[::1]:41858->[::1]:5432]>, "CvCOMMITZ5I", 16384, 0, NULL, NULL) = 18

Enter fullscreen mode

Exit fullscreen mode

It began a transaction (QnBEGIN), obtained command completion (CnBEGIN), then despatched the complete textual content of the INSERT command, together with the JSON payload (QvINSERT INTO gadgets (knowledge) VALUES ('{"title": "widget", "value": 9.99, "tags": ["new", "sale"]}). It subsequently obtained command completion (INSERT 0 1) and the returned ID (T331id, Dv101).

Right here is the hint once I question and fetch the doc:

2025-12-21 16:52:20,283 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-12-21 16:52:20,285 INFO sqlalchemy.engine.Engine SELECT gadgets.id AS items_id, gadgets.knowledge AS items_data
FROM gadgets
2025-12-21 16:52:20,285 INFO sqlalchemy.engine.Engine [generated in 0.00024s] {}

sendto(3<TCPv6:[[::1]:41858->[::1]:5432]>, "QnBEGIN", 11, MSG_NOSIGNAL, NULL, 0) = 11

recvfrom(3<TCPv6:[[::1]:41858->[::1]:5432]>, "CnBEGINZ5T", 16384, 0, NULL, NULL) = 17

sendto(3<TCPv6:[[::1]:41858->[::1]:5432]>, "QFSELECT gadgets.id AS items_id, gadgets.knowledge AS items_data nFROM gadgets", 71, MSG_NOSIGNAL, NULL, 0) = 71

recvfrom(3<TCPv6:[[::1]:41858->[::1]:5432]>, "T>2items_id@3101274377377377377items_data@310216332377377377377377377DI2011:{"title": "widget", "tags": ["new", "sale"], "value": 9.99}CrSELECT 1Z5T", 16384, 0, NULL, NULL) = 157
Enter fullscreen mode

Exit fullscreen mode

It began one other transaction, despatched the SELECT assertion as textual content and obtained the outcome as JSON textual content (DI2011:{"title": "widget", "tags": ["new", "sale"], "value": 9.99}).

Lastly, the transaction ends, and the sessionis disconnected:

sendto(3<TCPv6:[[::1]:41858->[::1]:5432]>, "QrROLLBACK", 14, MSG_NOSIGNAL, NULL, 0) = 14

recvfrom(3<TCPv6:[[::1]:41858->[::1]:5432]>, "CrROLLBACKZ5I", 16384, 0, NULL, NULL) = 20

sendto(3<TCPv6:[[::1]:41858->[::1]:5432]>, "X4", 5, MSG_NOSIGNAL, NULL, 0) = 5

Enter fullscreen mode

Exit fullscreen mode

If you wish to dig into the code, the server-side parsing is in jsonb_send and jsonb_recv (“The kind is shipped as textual content in binary mode”), and whereas it exams the model earlier than changing to textual content, there’s just one model. The client-side for Psycopg2 reveals that register_default_jsonb is similar as register_default_json



Evaluating with MongoDB (BSON from end-to-end)

To match with MongoDB, created the next demo-mongodb.py:

from pymongo import MongoClient
shopper = MongoClient("mongodb://127.0.0.1:27017")
db = shopper.my_database
insert_result = db.gadgets.insert_one({"title": "widget", "value": 9.99, "tags": ["new", "sale"]})
print("Inserted doc ID:", insert_result.inserted_id)
for doc in items_collection.discover():
    print(doc["_id"], doc)
Enter fullscreen mode

Exit fullscreen mode

I used the identical strace command, however displaying all characters as hexadecimal to have the ability to decode them with bsondump:

$ strace -e hint=sendto,recvfrom -xx -yy -s 1000 python demo-mongodb.py 2>&1
Enter fullscreen mode

Exit fullscreen mode

Right here is the community request for the insert:


sendto(5127.0.0.1:27017]>, "xd6x00x00x00x51xdcxb0x74x00x00x00x00xddx07x00x00x00x00x00x00x00x5ax00x00x00x02x69x6ex73x65x72x74x00x06x00x00x00x69x74x65x6dx73x00x08x6fx72x64x65x72x65x64x00x01x03x6cx73x69x64x00x1ex00x00x00x05x69x64x00x10x00x00x00x04x31xb8x9ax81xfdx35x42x1ax88x44xa8x69xe8xbax6fx30x00x02x24x64x62x00x0cx00x00x00x6dx79x5fx64x61x74x61x62x61x73x65x00x00x01x66x00x00x00x64x6fx63x75x6dx65x6ex74x73x00x58x00x00x00x07x5fx69x64x00x69x48x3fx7fx87x46xd5x2exe2x0bxbcx0bx02x6ex61x6dx65x00x07x00x00x00x77x69x64x67x65x74x00x01x70x72x69x63x65x00x7bx14xaex47xe1xfax23x40x04x74x61x67x73x00x1cx00x00x00x02x30x00x04x00x00x00x6ex65x77x00x02x31x00x05x00x00x00x73x61x6cx65x00x00x00", 214, 0, NULL, 0) = 214

recvfrom(5127.0.0.1:27017]>, "xsecondx00x00x00x06x00x00x00x51xdcxb0x74xddx07x00x00", 16, 0, NULL, NULL) = 16

recvfrom(5127.0.0.1:27017]>, "x00x00x00x00x00x18x00x00x00x10x6ex00x01x00x00x00x01x6fx6bx00x00x00x00x00x00x00xf0x3fx00", 29, 0, NULL, NULL) = 29

Inserted doc ID: 69483f7f8746d52ee20bbc0b
Enter fullscreen mode

Exit fullscreen mode

Right here is the fetch question that receives the doc:


sendto(5127.0.0.1:27017]>, "x70x00x00x00xffx5cx49x19x00x00x00x00xddx07x00x00x00x00x00x00x00x5bx00x00x00x02x66x69x6ex64x00x06x00x00x00x69x74x65x6dx73x00x03x66x69x6cx74x65x72x00x05x00x00x00x00x03x6cx73x69x64x00x1ex00x00x00x05x69x64x00x10x00x00x00x04x31xb8x9ax81xfdx35x42x1ax88x44xa8x69xe8xbax6fx30x00x02x24x64x62x00x0cx00x00x00x6dx79x5fx64x61x74x61x62x61x73x65x00x00", 112, 0, NULL, 0) = 112

recvfrom(5127.0.0.1:27017]>, "xc5x00x00x00x07x00x00x00xffx5cx49x19xddx07x00x00", 16, 0, NULL, NULL) = 16

recvfrom(5127.0.0.1:27017]>, "x00x00x00x00x00xb0x00x00x00x03x63x75x72x73x6fx72x00x97x00x00x00x04x66x69x72x73x74x42x61x74x63x68x00x60x00x00x00x0$x30x00x58x00x00x00x07x5fx69x64x00x69x48x3fx7fx87x46xd5x2exe2x0bxbcx0bx02x6ex61x6dx65x00x07x00x00x00x77x69x64x67x65x74x00x01x70x72x69x63x65x00x7bx14xaex47xe1xfax23x40x04x74x61x67x73x00x1cx00x00x00x02x30x00x04x00x00x00x6ex65x77x00x02x31x00x05x00x00x00x73x61x6cx65x00x00x00x00x12x69x64x00x00x00x00x00x00x00x00x00x02x6ex73x00x12x00x00x00x6dx79x5fx64x61x74x61x62x61x73x65x2ex69x74x65x6dx73x00x00x01x6fx6bx00x00x00x00x00x00x00xf0x3fx00", 181, 0, NULL, NULL) = 181

69483f7f8746d52ee20bbc0b {'_id': ObjectId('69483f7f8746d52ee20bbc0b'), 'title': 'widget', 'value': 9.99, 'tags': ['new', 'sale']}
Enter fullscreen mode

Exit fullscreen mode

I exploit bsondump, obtainable within the MongoDB container, to decode the messages.

Insert begins with a 20 bytes message header: Whole message measurement in little-endian = 0xd6 = 214 bytes xd6x00x00x00, requestID x51xdcxb0x74, responseTo (0 for client->server) x00x00x00x00, opCode = 2013 (OP_MSG) xddx07x00x00, x00x00x00x00 after which begins BSON:

root@9574ecd2d248:/# bsondump <(echo -ne 'x5ax00x00x00x02x69x6ex73x65x72x74x00x06x00x00x00x69x74x65x6dx73x00x08x6fx72x64x65x72x65x64x00x01x03x6cx73x69x64x00x1ex00x00x00x05x69x64x00x10x00x00x00x04x31xb8x9ax81xfdx35x42x1ax88x44xa8x69xe8xbax6fx30x00x02x24x64x62x00x0cx00x00x00x6dx79x5fx64x61x74x61x62x61x73x65x00x00x01x66x00x00x00x64x6fx63x75x6dx65x6ex74x73x00x58x00x00x00')
{  
  "insert": "gadgets",  
  "ordered": true,  
  "lsid": {  
    "id": {  
      "$binary": {  
        "base64": "Mbiagf01QhqIRKhp6LpvMA==",  
        "subType": "04"  
      }  
    }  
  },  
  "$db": "my_database"  
}  
2025-12-21T19:09:39.214+0000    1 objects discovered
2025-12-21T19:09:39.214+0000    sudden EOF
root@9574ecd2d248:/#
Enter fullscreen mode

Exit fullscreen mode

This reveals sudden EOF as a result of the “paperwork” array is definitely despatched within the subsequent part of the OP_MSG, not embedded right here. The second BSON part begins with its personal size subject (x58x00x00x00 = 88 bytes) and accommodates the precise doc to be inserted:

root@9574ecd2d248:/# bsondump <(echo -ne 'x58x00x00x00x07x5fx69x64x00x69x48x3fx7fx87x46xd5x2exe2x0bxbcx0bx02x6ex61x6dx65x00x07x00x00x00x77x69x64x67x65x74x00x01x70x72x69x63x65x00x9ax99x99x99x99x99x23x40x04x74x61x67x73x00x1cx00x00x00x02x30x00x04x00x00x00x6ex65x77x00x02x31x00x05x00x00x00x73x61x6cx65x00x00x00')
{  
  "_id": {  
    "$oid": "69483f7f8746d52ee20bbc0b"  
  },  
  "title": "widget",  
  "value": {  
    "$numberDouble": "9.8"  
  },  
  "tags": [  
    "new",  
    "sale"  
  ]  
}  
2025-12-21T19:09:49.278+0000    1 objects discovered
root@9574ecd2d248:/#
Enter fullscreen mode

Exit fullscreen mode

BSON holds the doc in a versatile binary format, together with all subject names, datatypes, and values, which is what’s exchanged between the appliance driver and the database server.

I can do the identical with the question

root@9574ecd2d248:/# bsondump <(echo -ne 'xb0x00x00x00x03x63x75x72x73x6fx72x00x97x00x00x00x04x66x69x72x73x74x42x61x74x63x68x00x60x00x00x00x03x30x00x58x00x00x00x07x5fx69x64x00x69x48x3fx7fx87x46xd5x2exe2x0bxbcx0bx02x6ex61x6dx65x00x07x00x00x00x77x69x64x67x65x74x00x01x70x72x69x63x65x00x7bx14xaex47xe1xfax23x40x04x74x61x67x73x00x1cx00x00x00x02x30x00x04x00x00x00x6ex65x77x00x02x31x00x05x00x00x00x73x61x6cx65x00x00x00x00x12x69x64x00x00x00x00x00x00x00x00x00x02x6ex73x00x12x00x00x00x6dx79x5fx64x61x74x61x62x61x73x65x2ex69x74x65x6dx73x00x00x01x6fx6bx00x00x00x00x00x00x00xf0x3fx00')
{  
  "cursor": {  
    "firstBatch": [  
      {  
        "_id": {  
          "$oid": "69483f7f8746d52ee20bbc0b"  
        },  
        "name": "widget",  
        "price": {  
          "$numberDouble": "9.99"  
        },  
        "tags": [  
          "new",  
          "sale"  
        ]  
      }  
    ],  
    "id": {  
      "$numberLong": "0"  
    },  
    "ns": "my_database.gadgets"  
  },  
  "okay": {  
    "$numberDouble": "1.0"  
  }  
}  
2025-12-21T18:44:08.110+0000    1 objects discovered
Enter fullscreen mode

Exit fullscreen mode

Once more, the doc is obtained in BSON format, which shops binary values with the proper datatypes.



Conclusion: no JSONB within the software

With PostgreSQL, the JSON textual content is seen within the community messages, even when it comes from a JSONB column:

In PostgreSQL, storing as TEXT, JSON, or JSONB impacts storage and indexing, however the wire protocol nonetheless sends and receives plain JSON textual content. Each question requires the shopper and server to parse and serialize it, including CPU overhead and risking a lack of kind constancy for big or advanced paperwork.

MongoDB makes use of BSON from finish to finish — in storage and on the wire. Drivers map BSON sorts on to software objects, preserving sorts like dates and binary fields with out additional parsing. This reduces CPU price on either side, improves scalability, and makes giant‑doc dealing with extra environment friendly.

Related Articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Latest Articles