distributed olap

Posted on August 12, 2021   1 minute read ∼ Filed in  : 

OLAP Features

  • Long running, read-only queries.
  • Complex joins
  • Exploratory queries.

Execution Models

Pull or Push

Push query to data.

In shard-nothing system. 这种系统下,可以push 也可以pull

image-20220418212054128

Pull data to query machine.

In shard-disk system, 共享磁盘系统通常要从shard disk pull data

image-20220418212312997

Fault Tolerance

Traditional OLAP dont have fail tolerate feature.

Query Planning

Break query Into partition-specific fragments based on physical information. (Most system.)

image-20220418213659854

Distributed Join Algorithms

Replica all tables from other nodes to one node is not efficient and lose the parallelism of distributed DBMS.

Replica small table to each node

image-20220418214037442

Each table is partitioned on joinKey.

image-20220418214142188

image-20220418214955498

Cloud System

Vendors provide database-as-a-service. DBaaS

Shard-noting and shard-disk 不明显了

DBMS直接部署

Cloud-Native DBMS

System is designed explicitly to run in a cloud environment,

Based on shard-disk architecutre.

SnowFlake, BigQuery. Etc

Data Formats

Apache Parquet

Apache ORC

Apache Arrow

image-20220418221552387





END OF POST




Tags Cloud


Categories Cloud




It's the niceties that make the difference fate gives us the hand, and we play the cards.