Distributed OLAP
Filed in: CMU database
OLAP Features
- Long-running, read-only queries.
- Complex joins.
- Exploratory queries.
Execution Models
Pull or Push
Push query to data.
In a shared-nothing system, either push or pull can be used.
Pull data to the query node.
In a shared-disk system, nodes usually have to pull data from the shared disk.
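To make the difference concrete, here is a toy Python sketch. It assumes a hypothetical cluster where each `Node` holds one partition of `(key, value)` rows and the query is `SUM(value) WHERE value > threshold`. The push version runs the filter and a partial sum on every node and ships back one number per node; the pull version ships every row to the query node. Both functions return `(result, items_sent_over_network)`. All names are illustrative, not any system's API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical toy model: each node stores one partition of (key, value) rows.
Row = Tuple[int, int]


@dataclass
class Node:
    rows: List[Row]

    def run_fragment(self, threshold: int) -> int:
        # Push model: filter + partial SUM run where the data lives.
        return sum(v for _, v in self.rows if v > threshold)

    def scan(self) -> List[Row]:
        # Pull model: hand the raw partition to the query node.
        return list(self.rows)


def push_query(nodes: List[Node], threshold: int) -> Tuple[int, int]:
    # Only one partial sum per node crosses the network.
    partials = [n.run_fragment(threshold) for n in nodes]
    return sum(partials), len(partials)


def pull_data(nodes: List[Node], threshold: int) -> Tuple[int, int]:
    # Every row crosses the network; the query node does all the filtering.
    shipped = [row for n in nodes for row in n.scan()]
    return sum(v for _, v in shipped if v > threshold), len(shipped)


nodes = [Node([(1, 5), (2, 20)]), Node([(3, 30), (4, 1), (5, 50)])]
print(push_query(nodes, 10))  # (100, 2)
print(pull_data(nodes, 10))   # (100, 5)
```

Both approaches compute the same sum, but push sends 2 values across the network while pull sends all 5 rows. The gap grows with table size, which is why pushing work to the data is attractive whenever nodes can execute query fragments themselves.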
Fault Tolerance
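Traditional distributed OLAP systems are not fault tolerant: they assume nodes will not fail during query execution, so if one node fails, the whole query fails.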
Query Planning
Break the query into partition-specific fragments based on physical information (e.g., where each partition is stored). Most systems take this approach.
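A small Python sketch of what partition-specific fragments might look like for `SELECT AVG(price) FROM t WHERE price >= min_price`. The `Fragment` class, `plan_avg`, and `run_avg` are hypothetical names. The key point it shows is that each fragment must return something the coordinator can combine correctly, here a partial `(sum, count)` rather than a partial average.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Fragment:
    # Hypothetical plan fragment for:
    #   SELECT AVG(price) FROM t WHERE price >= min_price
    partition_id: int
    min_price: int

    def execute(self, partition: List[int]) -> Tuple[int, int]:
        # Return a partial (sum, count), not a partial average:
        # averaging per-partition averages would weight partitions incorrectly.
        matching = [p for p in partition if p >= self.min_price]
        return sum(matching), len(matching)


def plan_avg(num_partitions: int, min_price: int) -> List[Fragment]:
    # One fragment per partition; a real planner would use catalog metadata
    # (which node holds which partition) to decide where each fragment runs.
    return [Fragment(pid, min_price) for pid in range(num_partitions)]


def run_avg(partitions: List[List[int]], min_price: int) -> Optional[float]:
    fragments = plan_avg(len(partitions), min_price)
    partials = [f.execute(partitions[f.partition_id]) for f in fragments]
    # Final combine step on the coordinator.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


print(run_avg([[10, 20], [30], [5]], 10))  # 20.0
```

Averaging the per-partition averages (15 and 30) would give 22.5 instead of the correct 20.0, which is why the fragment output and the combine step are planned together.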
Distributed Join Algorithms
Copying all tables from the other nodes onto one node and joining there is inefficient, and it gives up the parallelism of a distributed DBMS.
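- Replicate the small table to every node, then join it locally with each node's partition of the large table.
- Each table is partitioned on the join key, so every node can join its local partitions without moving any data.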
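Both join strategies above reduce to running an ordinary local hash join on each node. The following Python sketch simulates nodes as list elements; `local_hash_join`, `broadcast_join`, `hash_partition`, and `copartitioned_join` are hypothetical names, and `hash(join_key) mod n` is an assumed partitioning scheme chosen for illustration.

```python
from collections import defaultdict
from typing import Any, List, Tuple

# Rows are (join_key, payload) tuples; names below are illustrative only.
Row = Tuple[Any, Any]


def local_hash_join(left: List[Row], right: List[Row]) -> List[Tuple[Row, Row]]:
    # Build a hash table on `right`, probe it with `left`; runs on one node.
    table = defaultdict(list)
    for rrow in right:
        table[rrow[0]].append(rrow)
    return [(lrow, rrow) for lrow in left for rrow in table.get(lrow[0], [])]


def broadcast_join(large_parts: List[List[Row]], small: List[Row]) -> List[Tuple[Row, Row]]:
    # Replicate the small table to every node; each node joins it with its
    # local partition of the large table (in parallel in a real system).
    out = []
    for part in large_parts:  # one iteration per node
        out.extend(local_hash_join(part, small))
    return out


def hash_partition(rows: List[Row], n: int) -> List[List[Row]]:
    # Assumed partitioning scheme: hash(join_key) mod n.
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[0]) % n].append(row)
    return parts


def copartitioned_join(left_parts: List[List[Row]], right_parts: List[List[Row]]) -> List[Tuple[Row, Row]]:
    # Both tables use the same partitioning on the join key, so matching rows
    # already live on the same node and no rows move between nodes.
    out = []
    for lp, rp in zip(left_parts, right_parts):
        out.extend(local_hash_join(lp, rp))
    return out


orders = [(1, "a"), (2, "b"), (3, "c"), (1, "d")]
customers = [(1, "x"), (3, "y")]
print(len(broadcast_join([orders[:2], orders[2:]], customers)))  # 3
print(len(copartitioned_join(hash_partition(orders, 3), hash_partition(customers, 3))))  # 3
```

The broadcast variant works for any partitioning of the large table but copies the small table n times; the co-partitioned variant moves nothing, but only because both tables were already partitioned on the join key with the same function.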
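Cloud Systems
Vendors provide database-as-a-service (DBaaS) offerings.
The distinction between shared-nothing and shared-disk becomes blurred.
Managed DBMS: an existing DBMS is deployed directly in the cloud, largely unmodified.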
Cloud-Native DBMS
The system is designed explicitly to run in a cloud environment.
It is usually based on a shared-disk architecture.
Examples: Snowflake, Google BigQuery, etc.
Data Formats
- Apache Parquet
- Apache ORC
- Apache Arrow
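Parquet and ORC are columnar file formats for data at rest, and Arrow is a columnar in-memory format. The toy Python sketch below uses none of those libraries; it only illustrates the layout idea they share, storing each column contiguously so that an OLAP scan reads just the columns it needs. `rows_to_columns` is a hypothetical helper.

```python
from typing import Any, Dict, List


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    # Pivot row-oriented records into one contiguous list per column.
    # Assumes every row has the same keys as the first row (a fixed schema).
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


rows = [
    {"id": 1, "region": "us", "price": 10},
    {"id": 2, "region": "eu", "price": 25},
    {"id": 3, "region": "us", "price": 7},
]
columns = rows_to_columns(rows)
# An analytical scan like SUM(price) only touches the "price" column.
print(sum(columns["price"]))  # 42
```

Real columnar formats add much more on top of this (typed encodings, compression, per-chunk metadata), but the column-at-a-time layout is what makes them a good fit for the long-running, scan-heavy queries listed at the top.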