Multimodal dialogue response generation
Propose a new task, multimodal dialogue response generation: given the dialogue context, the model should not only generate a pure text response but also have the capacity to generate a multimodal response (e.g., one containing both image and text).
Challenges:
- the model easily overfits to the training dataset and cannot generalize to a new domain.
- it is not easy to collect enough multimodal training data for a new domain.
Ideas:
Make the parameters that rely on multimodal dialogues small and independent by disentangling textual response generation from image response generation. The major part of the generation model can then be learned from text-only dialogues and <image description, image> pairs, both of which are much easier to obtain (see the sketch below).
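A minimal sketch of what this disentangled design could look like, under my own assumptions (the class names, the `[DST]` marker for "this response is an image description", and the toy outputs are hypothetical placeholders, not the paper's implementation):

```python
# Sketch: a text response generator handles dialogues; a separate
# text-to-image generator handles <description, image> pairs.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    text: str
    image_tokens: Optional[List[int]] = None  # discrete image codes, if any


class TextResponseGenerator:
    """Trained on text-only dialogues (image descriptions treated as text)."""

    def generate(self, context: List[str]) -> str:
        # Placeholder: a real model would be a seq2seq / decoder-only LM.
        return "[DST] a photo of a golden retriever on the beach"


class TextToImageGenerator:
    """Trained on <image description, image> pairs, independently of dialogues."""

    def generate(self, description: str) -> List[int]:
        # Placeholder: a real model would emit discrete image tokens that an
        # image decoder (e.g. a discrete auto-encoder) turns back into pixels.
        return [101, 57, 891, 23]


def respond(context: List[str]) -> Response:
    text = TextResponseGenerator().generate(context)
    if text.startswith("[DST]"):  # marker: this text is an image description
        description = text[len("[DST]"):].strip()
        return Response(text=description,
                        image_tokens=TextToImageGenerator().generate(description))
    return Response(text=text)


if __name__ == "__main__":
    print(respond(["Do you have a dog?", "Yes! Want to see a picture?"]))
```

Only the small piece that decides when to emit an image description depends on multimodal dialogues; the two generators can be pre-trained on their own, abundant data.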
Problem formulation:
(dialogue context U, response R) => learned model P(R | U; \theta)
Both U and R may contain images.
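Spelled out, a hedged reading of this formulation (the dataset symbol D, index i, and size n are my notation, not from the note):

```latex
% Training data: dialogues whose context and response may each contain
% text tokens, image tokens, or both.
D = \{(U_i, R_i)\}_{i=1}^{n}

% Goal: learn parameters \theta that maximize the likelihood of the
% observed responses given their contexts.
\theta^{\ast} = \arg\max_{\theta} \sum_{i=1}^{n} \log P(R_i \mid U_i; \theta)
```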
- Unified representations of both text and images => express an image as a sequence of tokens.
- Texts => BPE-encoded tokens.
- Images => discrete tokens produced by a discrete auto-encoder (each image maps to a sequence of codebook indices).
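A toy sketch of this unified token view, assuming a BPE text tokenizer and a discrete auto-encoder image tokenizer; both encoders below are hash-based stand-ins (and the vocabulary/codebook sizes are assumptions), meant only to show how text and image tokens can share one index space:

```python
from typing import List

TEXT_VOCAB_SIZE = 50_000       # assumed BPE vocabulary size
IMAGE_CODEBOOK_SIZE = 8_192    # assumed discrete auto-encoder codebook size


def bpe_encode(text: str) -> List[int]:
    # Toy stand-in: hash each whitespace token into the text vocabulary.
    return [hash(tok) % TEXT_VOCAB_SIZE for tok in text.split()]


def image_encode(image_pixels: List[List[int]]) -> List[int]:
    # Toy stand-in: a real discrete auto-encoder maps image patches to
    # codebook indices; here each "patch" (row) is hashed into the codebook.
    return [hash(tuple(row)) % IMAGE_CODEBOOK_SIZE for row in image_pixels]


def to_unified_sequence(text: str, image_pixels: List[List[int]]) -> List[int]:
    # Offset image codes by the text vocabulary size so text tokens and
    # image tokens live in one shared index space.
    return bpe_encode(text) + [TEXT_VOCAB_SIZE + c for c in image_encode(image_pixels)]


if __name__ == "__main__":
    fake_image = [[0, 1, 2], [3, 4, 5]]  # tiny fake "image" with two patches
    print(to_unified_sequence("here is my dog", fake_image))
```

With this shared representation, one sequence model can read and generate responses that mix text tokens and image tokens.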