Dremel Interactive Analysis of WebScale Datasets
2 minute read ∼ Filed in : A paper noteIntroduction
Motivation
Data is often non-relational, and Nested. Normalizing and recombing those is hard. A better solution is to store all values of a given field consecutively to improve retrieval efficiency.
The main challenge is how to preserve all structural information and can reconstruct records from an arbitrary subset of fields.
Contribution
The paper proposes a scalable (trillion-record), interactive, ad-hoc query system for the analysis of read-only nested data.
- The paper proposes a novel columnar storage format for nested data. it basically dissects nested records into columns and reassembles them.
- Query processing is efficient and does not require restructuring of the records.
- Conducts experiments with trillion-record, multi-terabyte datasets. 1k-4k nodes. it shows The system can combine multi-level execution trees and columnar data layout to run aggregation over trillion-row tables in seconds.
System
Data Model
The data model is based on strongly-typed nested records.
Nested Columnar Storage
Mainly address the following challenges:
- lossless representation of a record in columnar format => with r and d
- fast encoding => tree writer, build a tree from schema.
- efficient record assembly.
Repetition and Definition Levels
Use r and d to record the meta information of each value, eg, which record the value belongs to. etc
Definition levels are not stored for values that are always defined. Similarly, repetition levels are stored only if required.
Splitting Records into Columns
Most datasets are sparse and have thousands of fields, the paper tries to process missing fields as cheaply as possible.
The paper uses a tree of field writers, whose structure matches the field hierarchy in the schema. It then updates field writers only when they have their own data.
Record Assembly
Efficiently reconstruct the original records from columnar data.
Query Language & Execution
Mainly for a read-only system. Many queries are one-pass aggregations.
Experiments
All query is about the sum, count, groupBy,
And then measures
- Executing time of the same query on row-based and column-based storage.
- influences the number of servers in the server tree on execution time.
- Scalability: Executing time vs increasing of leaf servers.
Conclusion
Record assembly and parsing are expensive