Decision Transformer: Reinforcement Learning via Sequence Modeling

Posted on April 10, 2024 ∼ 1 minute read ∼ Filed in: A paper note

This paper mainly casts the problem of RL as conditional sequence modeling.

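The model takes the expected reward (the return-to-go, i.e. the sum of future rewards we want to achieve) together with past states and actions as input, and predicts the action to take next.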
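To make the input format concrete, here is a minimal sketch of how a recorded trajectory could be turned into the sequence the model is trained on. The function names (`returns_to_go`, `build_sequence`) and the tuple-based token layout are my own illustration, not the paper's code, and I assume undiscounted returns.

```python
def returns_to_go(rewards):
    """For each timestep, sum the rewards from that step to the end (no discounting)."""
    rtg = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def build_sequence(rewards, states, actions):
    """Interleave (return-to-go, state, action) per timestep into one flat sequence."""
    tokens = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        tokens.extend([("return", g), ("state", s), ("action", a)])
    return tokens


# Toy trajectory with three steps (hypothetical values).
sequence = build_sequence([1.0, 0.0, 2.0], ["s0", "s1", "s2"], ["a0", "a1", "a2"])
```

During training the model sees the return-to-go and state tokens and is asked to predict the action token that follows them.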

One problem is how to define the expected rewards.

The experiments show that it should fall within the range of returns seen in the training dataset, but I feel there is a problem.

How to decide the expected rewards if we cannot predict future rewards?
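One workaround, as far as I understand it, is to pick the target from the returns already observed in the training dataset, then subtract each reward actually received during the rollout. The sketch below is only my reading of that idea: `pick_target_return`, `update_target`, and the `quantile` parameter are hypothetical names, not anything from the paper's code.

```python
def pick_target_return(dataset_returns, quantile=1.0):
    """Pick a target from the empirical episode returns (quantile=1.0 means the best seen)."""
    ordered = sorted(dataset_returns)
    idx = int(round(quantile * (len(ordered) - 1)))
    idx = max(0, min(len(ordered) - 1, idx))
    return ordered[idx]


def update_target(target, reward):
    """After each environment step, reduce the remaining target by the reward received."""
    return target - reward


# Hypothetical episode returns from an offline dataset.
target = pick_target_return([10.0, 50.0, 30.0])
target = update_target(target, 5.0)  # the agent just received a reward of 5
```

This keeps the target inside the range the model was trained on, but it does not answer the underlying question: someone still has to choose the quantile, and the choice depends on what the dataset happens to contain.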

