Steering Query Optimizers: A Practical Take on Big Data Workloads
1 minute read ∼ Filed in: A paper note

Introduction
This paper applies BAO's approach to SCOPE, Microsoft's big data processing system, in order to produce more efficient query plans. Since SCOPE's workload differs from the OLAP workloads BAO targets, the paper addresses several novel challenges:
- SCOPE's optimizer has 219 rules, so exploring all rule combinations is infeasible.
- Executing each job is time-consuming, so there is not enough data for training.
- A job in SCOPE is a large DAG with hundreds of nodes, so graph-based featurization is not directly applicable.
Details
To overcome these problems, the paper:
- Uses a heuristic algorithm to approximate the set of useful rules for a given job (the rules that actually affect the final query plan):
  - Compile the job with the full search space (all 219 rules), similar to EXPLAIN, to obtain a query plan and the set of rules that were used.
  - Set the new search space to the previous search space minus the used rules, then repeat until the search space is empty.
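The iterative rule-discovery loop above can be sketched as follows. This is a minimal illustration, not SCOPE's real API: `compile_job` is a hypothetical stand-in for SCOPE's compile-only mode that reports which of the enabled rules the optimizer actually applied.

```python
def find_useful_rules(job, all_rules, compile_job):
    """Approximate the set of rules that can affect `job`'s final plan.

    `compile_job(job, enabled)` is a hypothetical stand-in for SCOPE's
    compile-only mode (similar to EXPLAIN): it returns the subset of
    `enabled` rules the optimizer actually applied.
    """
    search_space = set(all_rules)
    useful = set()
    while search_space:
        used = compile_job(job, search_space)
        if not used:              # no enabled rule fires any more: stop
            break
        useful |= used            # record rules that shaped the plan
        search_space -= used      # disable them and probe for alternatives
    return useful
```

Removing already-seen rules before recompiling is what lets the loop surface alternative rules that were previously shadowed by higher-priority ones, without enumerating all 2^219 configurations.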
- Featurizes only the important features, including job-level features and rule-configuration features, instead of the whole job DAG.
- Trains a regression model on these features to predict job runtime.
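A minimal sketch of this featurization and prediction step, assuming the feature vector is simply a list of numeric job-level statistics concatenated with a binary rule-configuration vector, and using plain least-squares linear regression as the model (the paper's actual features and model may differ; all names here are illustrative):

```python
import numpy as np

def featurize(job_stats, rule_config, all_rules):
    """Build a feature vector from job-level statistics (e.g. input size,
    operator counts) plus a 0/1 indicator per rule in `all_rules`."""
    rule_bits = [1.0 if r in rule_config else 0.0 for r in all_rules]
    return np.array(list(job_stats) + rule_bits)

def fit_runtime_model(X, y):
    # Least-squares linear regression with an appended bias term.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_runtime(w, x):
    # Predicted runtime for one feature vector.
    return float(np.append(x, 1.0) @ w)
```

With such a model, comparing predicted runtimes across candidate rule configurations for the same job is what lets the system steer the optimizer toward a cheaper plan.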