Decision Transformer: Reinforcement Learning via Sequence Modeling
1 minute read ∼ Filed in: A paper note
This paper casts reinforcement learning (RL) as a conditional sequence modeling problem.
The model takes the expected reward (the target return) as input, together with past states and actions, and predicts the action that should achieve it.
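To make the conditioning concrete, here is a minimal sketch of how the return signal could be computed. It assumes an undiscounted sum of future rewards; the function names `returns_to_go` and `decrement_target` are my own and not taken from the paper's code.

```python
from typing import List


def returns_to_go(rewards: List[float]) -> List[float]:
    """For each step t, sum the rewards from t to the end of the episode.

    Assumption: no discounting is applied.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each entry accumulates all later rewards.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def decrement_target(target_return: float, observed_reward: float) -> float:
    """At test time, subtract the reward just received from the remaining target."""
    return target_return - observed_reward


if __name__ == "__main__":
    print(returns_to_go([1.0, 0.0, 2.0]))  # [3.0, 2.0, 2.0]
    print(decrement_target(3.0, 1.0))      # 2.0
```

During training these values come from the logged trajectories; at test time the first value has to be supplied by the user, and that choice is where the difficulty lies.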
One problem is how to choose this expected reward.
The experiments suggest it should fall within the range of returns seen in the training dataset, but I feel this leaves a problem open.
How do we decide the expected reward if we cannot predict what future rewards are achievable?
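One possible workaround, sketched below, is to read the target off the training data itself, for example by taking the best return in the dataset or a chosen quantile of it. This is only an illustration of staying within the dataset range; `pick_target_return` and its `quantile` parameter are hypothetical and not something the paper prescribes.

```python
from typing import List, Sequence


def pick_target_return(dataset: Sequence[Sequence[float]], quantile: float = 1.0) -> float:
    """Pick a target return from the episode returns observed in the dataset.

    dataset: one reward sequence per episode (hypothetical layout).
    quantile: 1.0 selects the best observed return, 0.0 the worst.
    """
    if not dataset:
        raise ValueError("dataset must contain at least one episode")
    # Total (undiscounted) return of every logged episode, in ascending order.
    returns: List[float] = sorted(sum(episode) for episode in dataset)
    q = min(max(quantile, 0.0), 1.0)  # clamp to [0, 1]
    index = int(q * (len(returns) - 1))
    return returns[index]


if __name__ == "__main__":
    logged = [[1.0, 1.0], [0.0, 5.0], [2.0, 0.0, 1.0]]
    print(pick_target_return(logged))       # best observed return
    print(pick_target_return(logged, 0.5))  # median-like target
```

This keeps the target inside the range the model has seen, but it does not answer the underlying question: in a new environment without logged returns, there is no obvious value to start from.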