Summary of RL papers I have read

Posted on April 2, 2024 1 minute read ∼ Filed in : A paper note

Basic RL categories

Model-Free

Model-Based

Offline RL

Online RL

Larger models require fewer samples to reach the same performance.

(Desired return, past states, actions) => sequence model (Decision Transformer) => actions with desired return.
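A minimal sketch of how that input sequence could be built, assuming the "desired return" at each step is the return-to-go (the sum of rewards from that step to the end of the episode). The function names and the tagged-tuple token format are illustrative assumptions, not the paper's code.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Return-to-go at step t: sum of rewards from t to the end of the episode."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the sum already computed for later steps.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def interleave(
    rtg: Sequence[float], states: Sequence[Any], actions: Sequence[Any]
) -> List[Tuple[str, Any]]:
    """Interleave (return-to-go, state, action) per timestep into one token list.

    The ("kind", value) tuples are a hypothetical stand-in for real embeddings.
    """
    tokens: List[Tuple[str, Any]] = []
    for r, s, a in zip(rtg, states, actions):
        tokens.extend([("rtg", r), ("state", s), ("action", a)])
    return tokens


rewards = [1.0, 0.0, 2.0]
rtg = returns_to_go(rewards)
tokens = interleave(rtg, states=["s0", "s1", "s2"], actions=["a0", "a1", "a2"])
```

A sequence model trained on such tokens learns to predict each action token from everything that precedes it, including the return-to-go for that step.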

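This work targets offline RL: it learns from a fixed dataset of logged trajectories rather than by interacting with the environment.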

The model is conditioned on a desired return and tries to predict the action that achieves it.
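At evaluation time this conditioning could look roughly like the loop below: start from a target return, and after each step subtract the reward actually received so the model is always conditioned on the return still to be collected. This is a sketch under stated assumptions; `toy_policy` and `toy_step` are hypothetical stand-ins for a trained model and a real environment.

```python
from typing import Callable, List, Tuple


def rollout(
    policy: Callable[[List[float], List[int], List[int]], int],
    step: Callable[[int, int], Tuple[int, float, bool]],
    initial_state: int,
    target_return: float,
    max_steps: int,
) -> Tuple[float, List[float]]:
    """Run one episode, conditioning the policy on the remaining desired return."""
    rtgs, states, actions = [target_return], [initial_state], []
    total = 0.0
    for _ in range(max_steps):
        action = policy(rtgs, states, actions)
        next_state, reward, done = step(states[-1], action)
        actions.append(action)
        total += reward
        if done:
            break
        # The return still to be achieved shrinks by the reward just received.
        rtgs.append(rtgs[-1] - reward)
        states.append(next_state)
    return total, rtgs


def toy_policy(rtgs: List[float], states: List[int], actions: List[int]) -> int:
    """Hypothetical policy: take the rewarding action while return is still owed."""
    return 1 if rtgs[-1] > 0 else 0


def toy_step(state: int, action: int) -> Tuple[int, float, bool]:
    """Hypothetical environment: reward equals the action; episode ends at state 3."""
    next_state = state + 1
    return next_state, float(action), next_state >= 3


total, rtgs = rollout(toy_policy, toy_step, initial_state=0, target_return=2.0, max_steps=5)
```

In this toy setup the policy stops collecting reward once the target has been reached, which is the behaviour return conditioning is meant to produce.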

Large transformer models have shown strong generalization capability on language understanding when fine-tuned with limited data.