Cost-based or Learning-based A Hybrid Query Optimizer for Query Plan Selection

Posted on March 18, 2024   1 minute read ∼ Filed in  : 

Learning-based methods:

  • they cannot support dynamic workloads well where the testing workloads are out of distributions with the training workloads.

  • Dynamic workloads: have different distributions with training examples.

This paper proposes a hybrid optimizer.

  • hint-based candidate generation method that leverages the learning-based method to generate highly beneficial leading hints (basically join order).
  • cost-based method + hint => complete plans.
  • predict execution time + uncertainty of each plan (confidence).

Challenges:

  • How to find a good join order (leading hints) from the vast space?
  • How to predict the latency + uncertainty for a given complete plan.

MCTS in join order

insights: MCTS follows an exploration-exploitation strategy, where we will focus more on the directions that have led to join orders with high estimated performance (𝑖.𝑒., exploitation), and will also pay attention to the directions that are rarely picked (𝑖.𝑒., exploration).

Methods: define node utility,

This process will return top-H nodes.

Optimal plan selection

Encoding: node encoding = operator + table + cost model

Multi-head Performance Estimator with Aleatoric Uncertainty.

Retraining

For every 10 executed queries, HybridQO will perform a training process.

Bao: retrain the neural network every 100 queries with a window size set to 2000.





END OF POST




Tags Cloud


Categories Cloud




It's the niceties that make the difference fate gives us the hand, and we play the cards.