AWARE: Automate Workload Autoscaling with Reinforcement Learning in Production Cloud Systems

Posted on March 12, 2024 ∼ 1 minute read


RL models a system task as a sequential decision-making problem and provides reward feedback for exploring the state-action space.
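To make the sequential decision-making framing concrete, here is a minimal sketch of autoscaling as a state/action/reward loop. Everything in it (`State`, `step`, `rollout`, `CAPACITY_PER_REPLICA`, and the reward shape) is a hypothetical illustration and not the formulation used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

CAPACITY_PER_REPLICA = 100.0  # hypothetical: requests/s one replica can serve


@dataclass
class State:
    replicas: int  # current number of replicas
    load: float    # incoming requests per second


def step(state: State, action: int, next_load: float) -> Tuple[State, float]:
    """Apply a scaling action (-1, 0, or +1) and return (next_state, reward)."""
    replicas = max(1, state.replicas + action)
    utilization = state.load / (replicas * CAPACITY_PER_REPLICA)
    # Assumed reward: penalize overload, otherwise reward high utilization.
    reward = -1.0 if utilization > 1.0 else utilization
    return State(replicas, next_load), reward


def rollout(policy: Callable[[State], int], loads: List[float],
            start_replicas: int = 1) -> Tuple[State, float]:
    """Run the policy over a load trace and return (final_state, total_reward)."""
    state = State(start_replicas, loads[0])
    total = 0.0
    for next_load in loads[1:]:
        state, reward = step(state, policy(state), next_load)
        total += reward
    return state, total
```

A policy here is any function from `State` to an action; an RL agent would learn that function from the rewards returned by `step` instead of having it hand-written.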


meta-learning + bootstrapping => quickly adapt to various workloads.

safe and robust RL exploration.

Challenges of applying RL to production

First, a learned RL policy is workload-specific and infrastructure-specific. Nontrivial retraining is needed to adapt to new workloads and environment shifts in each problem domain, which is a critical obstacle to making RL practical in production.

Without timely retraining, the online policy-serving performance of the RL agent fluctuates, leading to undesired degradation.

Second, RL training proceeds by trial and error, which makes unconstrained exploration risky in a production system.


Goal: fast-adapting, effective, and robust RL-based solutions under the constraints of production cloud systems.

The RL model is trained in a robust manner.

The RL policy can be adapted to new workloads without significant retraining.

The online RL model policy-serving performance can be kept stable.
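Keeping serving performance stable implies some way of noticing when it drifts. The following is a minimal sketch of a reward-based retraining trigger, assuming a positive baseline reward; the class name `RetrainingTrigger` and the parameters `baseline`, `window`, and `tolerance` are hypothetical and not taken from the paper.

```python
from collections import deque
from statistics import mean


class RetrainingTrigger:
    """Flags retraining when the recent mean reward falls below a baseline.

    Hypothetical parameters:
      baseline  - expected mean reward of a healthy policy (assumed > 0)
      window    - number of recent rewards to average
      tolerance - allowed relative drop before retraining is triggered
    """

    def __init__(self, baseline: float, window: int = 5, tolerance: float = 0.2):
        self.baseline = baseline
        self.tolerance = tolerance
        self.rewards = deque(maxlen=window)

    def observe(self, reward: float) -> bool:
        """Record one reward; return True if retraining should be triggered."""
        self.rewards.append(reward)
        if len(self.rewards) < self.rewards.maxlen:
            return False  # not enough observations yet
        return mean(self.rewards) < self.baseline * (1.0 - self.tolerance)
```

A sliding window is used so that a single bad step does not trigger retraining, only a sustained drop does.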


  1. Fast model adaptation: meta-learning models the RL agent as a base-learner plus a meta-learner that learns to generalize to new applications and environment shifts.

    With the learned embedding, fewer retraining iterations are needed for new, previously unseen workloads.

  2. Stable online RL policy-serving performance.

    Continuous monitoring, retraining detection, and trigger mechanisms.

  3. Safe exploration: an RL bootstrapping module that combines offline and online training.

    The agent is compared against a heuristics-based controller by reward, and the agent continues to be trained online.
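The bootstrapping idea in the last item can be sketched as a reward comparison that decides which controller serves actions. This is only an illustration under stated assumptions: the function names, the threshold rule, the `min_samples` window, and the idea that the agent's rewards are available for comparison (for example from offline evaluation) are all hypothetical and not confirmed by the source.

```python
from typing import List


def heuristic_action(utilization: float, target: float = 0.6,
                     band: float = 0.1) -> int:
    """Hypothetical threshold rule: +1 scale out, -1 scale in, 0 hold."""
    if utilization > target + band:
        return 1
    if utilization < target - band:
        return -1
    return 0


def choose_controller(agent_rewards: List[float],
                      heuristic_rewards: List[float],
                      min_samples: int = 3) -> str:
    """Serve the heuristic until the agent's recent mean reward catches up."""
    if len(agent_rewards) < min_samples or len(heuristic_rewards) < min_samples:
        return "heuristic"  # too little evidence to trust the agent
    agent_mean = sum(agent_rewards[-min_samples:]) / min_samples
    heuristic_mean = sum(heuristic_rewards[-min_samples:]) / min_samples
    return "agent" if agent_mean >= heuristic_mean else "heuristic"
```

Whichever controller is serving, the agent can keep training on the observed transitions, so the comparison is re-evaluated as new rewards arrive.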

