An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning
1 minute read ∼ Filed in : A paper noteThis paper proposes an end-to-end automatic CDB tuning system, which
- uses the deep deterministic policy gradient method to find the optimal configurations in high-dimensional continuous space.
- adopts a try-and-error strategy to learn knob settings with a limited number of samples to accomplish the initial training
- adopts the reward-feedback mechanism in RL instead of traditional regression, which enables end-to-end learning and accelerates the convergence speed
RL for tunning
State: Got from the “show status”.
Reward: performance changed (latency/throughput)
Action: change tunable knobs
Policy: a DNN, db status => DNN => recommended knobs.
Training data:
(q, a, s, r), q is a set of query workloads, a is the knobs and values, s is the db status, and r is the performance when processing q.
Insights:
value-based and policy-based,
Q-learning is effective in a relatively small state space. However, it is hard to solve the problem of a large state
DQN can model the states, but DQN is a discrete-oriented control algorithm, which means the actions of output are discrete
It uses Deep Deterministic Policy Gradient