AWARE: Automate Workload Autoscaling with Reinforcement Learning in Production Cloud Systems
1 minute read ∼ Filed in : A paper note

Insights
RL models system tasks as a sequential decision-making problem and provides a feedback signal for exploring the decision space.
Introduction
meta-learning + bootstrapping => quickly adapt to various workloads.
safe and robust RL exploration.
Challenges of applying RL to production
First, a learned RL policy is workload-specific and infrastructure-specific. Nontrivial retraining is needed to adapt to new workloads and environment shifts in each problem domain, which is a critical obstacle to making RL practical in production.
Without timely retraining, the online policy-serving performance of the RL agent fluctuates and leads to undesired degradation.
Second, RL training proceeds through trial and error, and unconstrained exploration in a live production system can take unsafe actions that degrade application performance.
Objective
Fast-adapting, effective, and robust RL-based solutions under the constraints of production cloud systems.
RL model trained in a robust manner.
RL policy can be adapted to new workloads without significant retraining.
The online policy-serving performance of the RL model can be kept stable.
Techniques
- Fast model adaptation: meta-learning models the RL agent as a base-learner plus a meta-learner that learns to generalize to new applications and environment shifts. With the learned workload embedding, fewer retraining iterations are needed for new, previously unseen workloads (see the first sketch after this list).
- Stable online RL policy-serving performance: continuous monitoring, retraining detection, and trigger mechanisms (see the second sketch after this list).
- Safe exploration: an RL bootstrapping module that combines offline and online training. The agent is first bootstrapped offline against a heuristics-based controller; by comparing their rewards, the agent is allowed to continue training online only when it performs comparably (see the third sketch after this list).
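A minimal PyTorch sketch of the base-learner/meta-learner split from the first bullet. The module names, layer sizes, and the embedding interface are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class MetaLearner(nn.Module):
    """Encodes a window of recent workload/system metrics into a task embedding."""
    def __init__(self, metric_dim: int, embed_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(metric_dim, 64), nn.ReLU(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, recent_metrics: torch.Tensor) -> torch.Tensor:
        # recent_metrics: (window, metric_dim); average so the embedding
        # summarizes the current workload.
        return self.encoder(recent_metrics).mean(dim=0)

class BaseLearner(nn.Module):
    """Autoscaling policy conditioned on the state and the workload embedding."""
    def __init__(self, state_dim: int, embed_dim: int, num_actions: int):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(state_dim + embed_dim, 64), nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, state: torch.Tensor, embedding: torch.Tensor) -> torch.Tensor:
        return self.policy(torch.cat([state, embedding], dim=-1))

# For a new workload, the meta-learner's embedding shifts the policy's behavior
# immediately, so only a few fine-tuning steps on the base-learner are needed
# instead of full retraining.
```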
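A sketch of the continuous monitoring and retraining-trigger idea in the second bullet. The window size, tolerance, and the fixed baseline reward are assumptions for illustration, not AWARE's actual detection logic.

```python
from collections import deque

class RetrainingMonitor:
    """Tracks online policy-serving rewards and signals when retraining is needed."""
    def __init__(self, baseline_reward: float, window: int = 200, tolerance: float = 0.9):
        self.rewards = deque(maxlen=window)
        self.baseline = baseline_reward
        self.tolerance = tolerance

    def record(self, reward: float) -> bool:
        """Record one serving-step reward; return True if retraining should be triggered."""
        self.rewards.append(reward)
        if len(self.rewards) < self.rewards.maxlen:
            return False  # not enough evidence yet
        rolling_avg = sum(self.rewards) / len(self.rewards)
        # Trigger when the rolling average falls below a fraction of the baseline,
        # e.g. after a workload or environment shift.
        return rolling_avg < self.tolerance * self.baseline

# Usage: if monitor.record(r) returns True, kick off (meta-)retraining and
# refresh the baseline once the updated policy is validated.
```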
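A sketch of the reward-comparison gate after offline bootstrapping (third bullet). `eval_env`, `policy.act`, and the episode loop are assumed interfaces used only to show the fallback logic.

```python
def choose_serving_policy(rl_agent, heuristic_controller, eval_env, episodes: int = 20):
    """Compare average rewards of the bootstrapped RL agent and the heuristics-based
    controller; serve (and keep training) the RL agent only if it is at least as good."""
    def avg_reward(policy) -> float:
        total = 0.0
        for _ in range(episodes):
            state, done = eval_env.reset(), False
            while not done:
                state, reward, done = eval_env.step(policy.act(state))
                total += reward
        return total / episodes

    rl_score = avg_reward(rl_agent)
    heuristic_score = avg_reward(heuristic_controller)

    # Fall back to the safe heuristic controller whenever the agent underperforms.
    return rl_agent if rl_score >= heuristic_score else heuristic_controller
```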