Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics

Posted on June 16, 2022   1 minute read


History of the Data Warehouse


Schema-on-write: the schema is defined and enforced when data is written to storage.

Schema-on-read: the schema is applied only when data is read for analysis.
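
The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SCHEMA` dictionary, the `validate` helper, and both store classes are hypothetical names invented for this example and do not come from the paper or from any real system. The first store rejects bad records at write time; the second accepts any raw line and applies the schema when a reader asks for rows.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
SCHEMA = {"user_id": int, "amount": float}


def validate(record, schema):
    """Return the record coerced to the schema, or raise ValueError."""
    if not isinstance(record, dict):
        raise ValueError("record is not an object")
    missing = set(schema) - set(record)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    try:
        return {name: typ(record[name]) for name, typ in schema.items()}
    except (TypeError, ValueError) as exc:
        raise ValueError(f"bad value: {exc}") from exc


class SchemaOnWriteStore:
    """Records are checked against the schema before they are stored."""

    def __init__(self, schema):
        self.schema = schema
        self.rows = []

    def write(self, record):
        # Bad data is rejected at ingest time and never reaches storage.
        self.rows.append(validate(record, self.schema))

    def read(self):
        return list(self.rows)


class SchemaOnReadStore:
    """Raw lines are stored as-is; the reader supplies the schema."""

    def __init__(self):
        self.raw = []

    def write(self, line):
        # Anything is accepted, including malformed input.
        self.raw.append(line)

    def read(self, schema):
        rows, rejected = [], []
        for line in self.raw:
            try:
                rows.append(validate(json.loads(line), schema))
            except ValueError:  # also covers json.JSONDecodeError
                rejected.append(line)
        return rows, rejected
```

In this sketch the schema-on-write store pays the validation cost once at ingest and guarantees clean rows to every reader, while the schema-on-read store is cheap to write to but leaves each reader to deal with malformed input.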


First-generation data warehouses coupled compute and storage into an on-premises appliance, so enterprises had to pay for both together.

Datasets are also increasingly unstructured (video, audio, and text), and a data warehouse cannot store or query them.

Second generation


The paper argues that the Lakehouse has the following advantages:

  1. It is based on open direct-access data formats, such as Parquet and ORC.
  2. It has first-class support for ML.
  3. It offers state-of-the-art performance.

It can address several challenges: data staleness, reliability, the total cost of ownership, data lock-in, and limited use-case support.

The paper shows that a Lakehouse is competitive with cloud data warehouses on TPC-DS.

