Mixture Content Selection for Diverse Sequence Generation...
j-min.io
DOCCI: Descriptions of Connected and Contrasting Images...
TVLT: Textless Vision-Language Transformer | Jaemin Cho
Fine-grained Image Captioning with CLIP Reward | Jaemin Cho
Unifying Vision-and-Language Tasks via Text Generation | Jaemin...
Jaemin Cho
A Hierarchical Latent Structure for Variational Conversation...
Hierarchical Video-Moment Retrieval and Step-Captioning...