Steering Query Optimizers: A Practical Take on Big Data Workloads
1 minute read ∼ Filed in: A paper note

Introduction
This paper applies BAO's approach to SCOPE, Microsoft's big data processing system, in order to produce more efficient query plans. Since SCOPE's workload differs from the OLAP workloads BAO targets, the paper addresses several novel challenges:
- SCOPE's optimizer has 219 rules, so exploring all rule combinations is infeasible.
- Executing each job is time-consuming, so there is not enough data for training.
- A job in SCOPE is a large DAG with hundreds of nodes, so graph-based featurization is not directly applicable.
Details
To overcome these problems, the paper:
- Uses a heuristic algorithm to approximate the set of useful rules for a given job (the rules that actually affect the final query plan):
  - Compile the job with the full search space (all 219 rules), similar to EXPLAIN, to obtain a query plan and the set of rules that were used.
  - Set the new search space to the previous search space minus the used rules, then repeat until the search space is empty.
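The iterative rule-discovery loop above can be sketched as follows. This is a minimal illustration, not SCOPE's real API: `compile_job` is a hypothetical stand-in for SCOPE's compile-only mode that reports which of the enabled rules the optimizer actually applied.

```python
def find_useful_rules(job, all_rules, compile_job):
    """Approximate the set of rules that can affect `job`'s final plan.

    `compile_job(job, enabled)` is a hypothetical stand-in for SCOPE's
    compile-only mode (similar to EXPLAIN): it returns the subset of
    `enabled` rules the optimizer actually applied.
    """
    search_space = set(all_rules)
    useful = set()
    while search_space:
        used = compile_job(job, search_space)
        if not used:              # no enabled rule fires any more: stop
            break
        useful |= used            # record rules that shaped the plan
        search_space -= used      # disable them and probe for alternatives
    return useful
```

Removing already-seen rules before recompiling is what lets the loop surface alternative rules that were previously shadowed by higher-priority ones, without enumerating all 2^219 configurations.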
- Featurizes only the important features, including job-level features and rule-configuration features, instead of the whole job DAG.
- Trains a regression model on these features to predict job runtime.
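A minimal sketch of this featurization and prediction step, assuming the feature vector is simply a list of numeric job-level statistics concatenated with a binary rule-configuration vector, and using plain least-squares linear regression as the model (the paper's actual features and model may differ; all names here are illustrative):

```python
import numpy as np

def featurize(job_stats, rule_config, all_rules):
    """Build a feature vector from job-level statistics (e.g. input size,
    operator counts) plus a 0/1 indicator per rule in `all_rules`."""
    rule_bits = [1.0 if r in rule_config else 0.0 for r in all_rules]
    return np.array(list(job_stats) + rule_bits)

def fit_runtime_model(X, y):
    # Least-squares linear regression with an appended bias term.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_runtime(w, x):
    # Predicted runtime for one feature vector.
    return float(np.append(x, 1.0) @ w)
```

With such a model, comparing predicted runtimes across candidate rule configurations for the same job is what lets the system steer the optimizer toward a cheaper plan.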