Steering Query Optimizers: A Practical Take on Big Data Workloads

Posted on February 15, 2024 · 1 minute read

Introduction

This paper applies BAO's steering approach to the big data processing system SCOPE in order to produce more efficient query plans.

Since SCOPE's workloads differ from the OLAP workloads that BAO targets, the paper has to address several new challenges:

  1. SCOPE has 219 optimizer rules, so it is infeasible to explore all possible rule combinations.
  2. Executing each job is time-consuming, so there is not enough data to train on.
  3. A SCOPE job is a large DAG with hundreds of nodes, so graph-based featurization is not directly applicable.

Details

To overcome these problems, the paper:

  1. Uses a heuristic algorithm to approximate which rules are useful for a given job, i.e., the rules that actually affect the final query plan (see the first sketch below):
    1. Take the full search space (all 219 rules), compile the job (similar to EXPLAIN), and record the resulting query plan plus the rules that were used.
    2. Set the search space to all rules minus the used rules, then repeat until the search space is empty.
  2. Instead of featurizing the whole job DAG, selects only the important features, including job-level features and rule-configuration features.
  3. Uses a regression model over these features to predict job runtime (see the second sketch below).
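
A minimal Python sketch of the rule-discovery loop is below. Here `compile_job` is a hypothetical stand-in for SCOPE's compile-only mode (similar to EXPLAIN); it is assumed to return the query plan together with the set of rules the optimizer actually applied given the enabled rules.

```python
# Sketch of the iterative rule-discovery heuristic; compile_job is hypothetical.
ALL_RULES = {f"rule_{i}" for i in range(219)}    # stand-ins for SCOPE's 219 rules


def compile_job(job, enabled_rules):
    """Hypothetical wrapper around SCOPE's compile-only mode (like EXPLAIN).

    Returns (query_plan, used_rules), where used_rules is the subset of
    enabled_rules the optimizer actually applied to this job.
    """
    raise NotImplementedError


def discover_useful_rules(job):
    """Approximate the set of rules that can affect this job's final plan."""
    search_space = set(ALL_RULES)
    useful_rules = set()
    while search_space:
        _plan, used_rules = compile_job(job, enabled_rules=search_space)
        newly_used = used_rules & search_space
        if not newly_used:            # nothing left affects the plan; stop early
            break
        useful_rules |= newly_used
        search_space -= newly_used    # drop the used rules and recompile
    return useful_rules
```

Recompiling without the rules already seen surfaces alternative rules that the default plan would otherwise hide, which keeps the candidate set far smaller than all 2^219 on/off combinations.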

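For the featurization and runtime model, the sketch below shows one plausible shape; the concrete job-level features and the choice of regressor are illustrative assumptions (scikit-learn's GradientBoostingRegressor on synthetic data), not the paper's exact setup. Job-level features are concatenated with a 0/1 vector encoding the rule configuration, and the regression model predicts the job's runtime under that configuration.

```python
# Illustrative featurization + runtime regression; features and model are assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

RULE_ORDER = [f"rule_{i}" for i in range(219)]   # fixed ordering of the 219 rules


def featurize(job_features, rule_config):
    """Concatenate job-level features with a 0/1 rule-configuration vector."""
    job_vec = [job_features["input_bytes"], job_features["num_operators"]]
    rule_vec = [1.0 if rule_config.get(rule, True) else 0.0 for rule in RULE_ORDER]
    return np.array(job_vec + rule_vec)


# Synthetic training data standing in for logged (job, rule config, runtime) records.
rng = np.random.default_rng(0)
X = rng.random((200, 2 + len(RULE_ORDER)))
y = rng.random(200) * 100.0                      # runtimes in seconds (synthetic)

model = GradientBoostingRegressor().fit(X, y)

# Score a candidate rule configuration for a new job.
job = {"input_bytes": 2e9, "num_operators": 340}
candidate = {"rule_7": False, "rule_42": False}  # rules to disable for this job
pred = model.predict(featurize(job, candidate).reshape(1, -1))[0]
print(f"predicted runtime: {pred:.1f} s")
```

Predicted runtimes can then be used to rank candidate rule configurations and steer the optimizer toward the cheapest one, in the spirit of BAO.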