Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics

Posted on June 16, 2022

Introduction

History of the Data Warehouse

Schema

Schema-on-write: the schema is defined up front and enforced when data is written into storage.

Schema-on-read: data is stored as-is, and the schema is applied only when the data is read for analysis.
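
To make the difference concrete, here is a minimal sketch in Python. It is not from the paper: the `SCHEMA` dictionary, the `validate` helper, and the two store classes are hypothetical names made up for illustration, and a real warehouse or data lake does far more than this.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
SCHEMA = {"user_id": int, "event": str}


def validate(record, schema):
    """Return record if its fields and types match schema, else raise ValueError."""
    if set(record) != set(schema):
        raise ValueError(f"fields do not match schema {sorted(schema)}")
    for field, expected in schema.items():
        if not isinstance(record[field], expected):
            raise ValueError(f"{field!r} should be {expected.__name__}")
    return record


class SchemaOnWriteStore:
    """Warehouse-style: records are checked when written, so reads are always clean."""

    def __init__(self, schema):
        self.schema = schema
        self.rows = []

    def write(self, record):
        # A bad record is rejected here and never reaches storage.
        self.rows.append(validate(record, self.schema))

    def read(self):
        return list(self.rows)


class SchemaOnReadStore:
    """Data-lake-style: raw text is stored as-is, the schema is applied at read time."""

    def __init__(self):
        self.raw = []

    def write(self, line):
        # Anything is accepted; problems surface only when someone reads.
        self.raw.append(line)

    def read(self, schema):
        good, bad = [], []
        for line in self.raw:
            try:
                good.append(validate(json.loads(line), schema))
            except (ValueError, TypeError):
                bad.append(line)
        return good, bad


if __name__ == "__main__":
    lake = SchemaOnReadStore()
    lake.write('{"user_id": 2, "event": "view"}')
    lake.write("not json at all")
    good, bad = lake.read(SCHEMA)
    print(len(good), "usable record(s),", len(bad), "rejected at read time")
```

The trade-off in this sketch is where the failure shows up: the schema-on-write store refuses bad data immediately, while the schema-on-read store accepts everything cheaply and leaves each reader to deal with malformed records.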

First-generation

First-generation data warehouses coupled compute and storage in an on-premises appliance, so enterprises had to provision and pay for peak load and data volume.

More and more datasets are unstructured (video, audio, and text), and a data warehouse cannot store or query them.

Second-generation

Lakehouse

The paper argues that a Lakehouse has the following advantages:

  1. It is based on open direct-access data formats, such as Parquet and ORC.
  2. It has first-class support for ML.
  3. It offers state-of-the-art performance.

It can address several major challenges of data warehouses: data staleness, reliability, total cost of ownership, data lock-in, and limited use-case support.

The paper also shows that a Lakehouse system is competitive with cloud data warehouses on the TPC-DS benchmark.




