Deploying a Steered Query Optimizer in Production at Microsoft
1 minute read ∼ Filed in : A paper noteIntroduction
General query optimizer is far from optimal for a given customer and workloads, learned query optimizer works but
- hard to understand why it is better.
- high experimentation costs/pre-production cost (collecting training data)
- regression to newer queries.
This paper makes steering actions more manageable, keeping the costs of steering within budget, and avoiding unexpected performance regressions in production.
- Key idea: breaking down the steering process into smaller incremental steps that are easily explainable and reversible.
- Use contextual bandit model to redue the pre-production cost.
- Use validation model to accept or reject the suggested modifications to query plan.
Design
The paper propose QO-Advisor, which is recurrently triggered every day. It accepts historical metadata and return a list of job template identifiers and rule hint pairs.
Contextual Bandit => suggest one rule hint for each job.
- Actions: rules. Context: features of query. Reward: optimizer’s estimated cost change.
- It uses a tool named as Azure Personalizer to implement.
Flighting:
- Execute job from low cost to high cost and outputs total DataRead and DataWritten features.
Validation model => detect regression.
- It use a linear regression model to predict the PNhours delta based on DataRead and DataWritten features given job, and only when PNhours delta is below a predefined threshold, it is safe to execute.
- Intuition: If job reads and writes less data, PNhours will be reduced.