Kepler Robust Learning for Faster Parametric Query Optimization

Posted on January 2, 2024   1 minute read ∼ Filed in  : 

Introduction

Parametric query optimization (PQO): queries that have identical SQL structure and only differ in the value of bound parameters

Motivations

Traditionally, PQO is studied from the perspective of reducing query planning time by avoiding re-optimization, etc. But those approaches still rely on the traditional query optimizers, and are thus sub-optimal.

ML-based approaches have those drawbacks:

  • Inference times are higher than traditional methods.
  • inconsistent performance across dataset sizes and distributions.
  • unclear query performance improvements.
  • lack robustness.

Contributions

  1. Decoupled plan generation and learning-bsaed plan prediction architecture.
    • better candidate plan generation algorithms: Row count evolution: generate candidate plans by perturbing the optimizer’s cardinality estimates.
  2. It leverages actual query execution data to build a training dataset for best-plan prediction, avoiding the well-studied mismatch between cost models and execution latency.
  3. Robust neural network prediction techniques to reduce tail latency and reduce query regressions.
    • Use Spectral-normalized Neural Gaussian Processes to accurately quantify how confident it is about a prediction.

Solutions

Problem

Candidate generation + best plan prediction.

Assumption It assumes a fixed system state, including database configuration, optimizer implementation, and data distribution.

  • the database is reconfigured infrequently, data distribution drifts slowly.
  • when those things change, the model needs to re-train with new training data.

System

Planner: It proposes Row count evolutions, which is a computationally efficient algorithm that generates new plans by randomly perturbing the optimizer’s cardinality estimates.

  • Multiplicative perturbations: it is motivated by the standard metric of Q-error in cardinality estimation.

Models: each query template -> one ML model.

  • encoding: it uses the parameter value as an input feature, each going through embeddings for strings integer features.

  • training:

    • It forces the optimizer to produce a candidate plan by providing all join/scan methods and the join order via hints

    • It trains the model based on a multiple-label classification problem.

    • Objective function: regression is unstable and hard to train. multi-class classification is not suitable since multiple queries may be near-optimal.





END OF POST




Tags Cloud


Categories Cloud




It's the niceties that make the difference fate gives us the hand, and we play the cards.