Wednesday, February 4, 2026

Databases, Knowledge Lakes, And Encryption


The Evolution of Object Storage

Let’s begin by stating one thing actually apparent; object storage has turn out to be the preeminent storage system on the planet at present. Initially created to fulfill a must retailer massive quantities of occasionally accessed knowledge, it has since grown to the purpose of changing into the dominant archival medium for unstructured content material. Its significance is such that many conventional knowledge programs are being eclipsed by API pushed object storage in cloud-native environments.

Consequently, an unintended improvement is that individuals have come to understand some great benefits of not solely storing however, extra importantly, processing the info saved in object storage containers. And whereas it will probably appear extraordinarily advantageous over time reaching profitability turns into extra advanced, laborious and costly because the dataset grows. This hodgepodge of knowledge is named a knowledge swamp. The danger is that this swamp of knowledge can  turn out to be so disorganized that the price of processing it exceeds any potential financial profit. In a phrase it turns into unusable.

What to do? The answer? Knowledge Lakes!

Opposite to the info swamp, which is a disorganized repository of remoted knowledge sources, the Knowledge Lake is a centralized storage repository able to archiving extraordinarily massive datasets dealing with each structured and unstructured knowledge coming from varied DBMS.

Whereas Knowledge Lakes are an enormous step ahead, it however requires in depth planning not just for its storage however extra importantly accessing and processing the info.

Enter the Knowledge Lakehouse: The info lakehouse leverages the low price and versatile storage of a knowledge lake and strives to attain the identical knowledge administration capabilities and options of a conventional knowledge warehouse. By including a metadata/transactional layer to the thing storage container, not solely does the Knowledge Lakehouse method the equal ACID compliance of an RDBMS nevertheless it has the added bonus allowing a number of varieties of database engines that may work together with it.

What’s Object Storage Containers acquired to do with PostgreSQL? 

I’m glad you requested!

As you might know, the PostgreSQL extension framework permits one so as to add new options, knowledge sorts, features, operators and so forth to PostgreSQL as self-contained modules. And within the realm of analytics there may be one extension specifically that represents a outstanding paradigm of producing extremely prized knowledge insights gleaned from very massive datasets i.e. pg_lake.

Leveraging the pg_lake extension makes it potential to control uncooked knowledge from any supply, whether or not it’s structured or unstructured, by transparently accessing the info in a fashion that may be learn after which be processed in a postgres database.

The problem is what to do with the ensuing insights which in itself turns into a really helpful commodity. Whereas many cloud suppliers do state they implement encryption at relaxation, it doesn’t change the truth that its final safety is out of your palms. And this leap of religion you might not need to take. A lot in order that the chance of knowledge corruption, not to mention its theft, can have an effect on the corporate’s backside line.

Now let’s pivot and speak about safety.

These insights, as soon as generated, could be saved and secured in postgres in an encrypted at relaxation state by the advantage of pg_tde.

Wouldn’t it’s good if we may retailer the encrypted knowledge on postgres to object storage instantly?

What to do?

Summarizing;

  • pg_tde: Is the open-source PostgreSQL extension that gives Clear Knowledge Encryption (TDE), securing delicate knowledge at relaxation by mechanically encrypting database information on disk and defending towards unauthorized entry to its storage.
  • pg_lake: Is a set of extensions permitting PostgreSQL to combine instantly with knowledge lakehouses. It allows customers to entry and question knowledge saved in open codecs comparable to Apache Iceberg, Parquet, CSV, and JSON in cloud primarily based storage.

What you are able to do:

  • pg_tde:
    • Clear Encryption: Encrypts knowledge on disk (at relaxation) on the storage degree
    • Granular Management: Encrypt particular tables
    • Customizable Key Administration
    • WAL Encryption could be carried out
    • Replication Assist: logical replication can be utilized
    • Multi-Tenancy
      • the next WAL CLI are at the moment supported
        • pg_tde_basebackup
        • pg_tde_resetwal
        • pg_tde_rewind
        • pg_tde_waldump
        • pgBackRest (partial assist; check with part “what you’ll be able to’t do” under)
  • pg_lake:
    • Can carry out DML operations towards “lake home” structured knowledge
    • Can learn knowledge from a knowledge lake
    • Can learn a number of file codecs from the “knowledge lake” i.e. Apache Iceberg/Delta Lake/Parquet
    • Allows performing analytics on extraordinarily massive datasets with exterior Iceberg and file-backed tables in a single question

 

What can’t you do:

  • pg_tde:
    • System tables, catalogs, are at the moment not encrypted
    • Non permanent information aren’t absolutely encrypted
    • The next WAL CLI are at the moment unsupported
      • pg_createsubscriber
      • pg_receivewal
      • Barman
      • pg_verifybackup by default fails, due to this fact use -s (skip checksum) and -n (–no-parse-wal)
      • pgBackRest: the asynchronous archiving function
    • Can’t learn/write knowledge instantly from cloud storage providers
    • Encrypt knowledge totally different than PostgreSQL relations saved on its storage
  • pg_lake:
    • Requires a CLI interfacing with DuckDB which in flip interacts with the goal object storage service comparable to for instance “S3”
    • Superb-Grained Entry Management shouldn’t be at the moment out there
    • Iceberg v2 protocol is partial:
      • not all knowledge sorts are absolutely supported
      • not all administrative/housekeeping operations on the info lake is absolutely supported
    • Clear Knowledge Encryption shouldn’t be potential besides inside the object storage’s function set itself if it’s out there.

Whereas there’s plenty of curiosity in creating this dynamic duo, the problem is determining the place to start out.

For instance, a simple kludge is to archive encrypted WALs on to S3 suitable providers by leveraging the postgres runtime parameter “archive_command” and utilizing the suitable CLI storing it to S3 as an example.

Alternatively, as a primary step, allow the info that’s encrypted at relaxation to be each written and browse by pg_lake. 

However as a remaining answer; we may have a look at a overseas knowledge wrapper implementation of pg_tde. This FDW may carry out operations much like pg_lake however with the added functionality of performing client-side encryption that may write and browse postgres pg_tde encrypted knowledge on suitable cloud storage providers.

Inform us what you suppose.

For these desirous to check out these extensions, right here’s how one can set up every part into your personal private improvement atmosphere on Linux, you shouldn’t want something greater than 50GB HDD area to obtain and compile and set up.

Obtain and compile the next:

  • pg_tde: The Percona PostgreSQL supply code which has pg_tde builtin
  • pg_lake: The supply code construct consumes plenty of CPU, and area
  • DuckDB: A quick, open-source, in-process analytical database administration system (DBMS)
  • MinIO: An open supply S3 suitable, object storage system, designed for cloud-native environments

Whereas the aforementioned hyperlinks do present sources and directions you’ll want to notice just a few “further” particulars that aren’t readily understood:

Relating to downloading Percona pg_tde: Use these instructions downloading the supply

 

Relating to pg_lake: There’s a mistake within the supply code, use the next to repair it

 

Related Articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Latest Articles