Robuta

Mixture Content Selection for Diverse Sequence Generation... - j-min.io - jaemin cho, mixture
DOCCI: Descriptions of Connected and Contrasting Images... - j-min.io - jaemin cho, connected
TVLT: Textless Vision-Language Transformer | Jaemin Cho - j-min.io - jaemin cho, textless
Fine-grained Image Captioning with CLIP Reward | Jaemin Cho - j-min.io - jaemin cho, fine, image
Unifying Vision-and-Language Tasks via Text Generation | Jaemin... - j-min.io - text generation, via
Jaemin Cho - j-min.io - jaemin cho
A Hierarchical Latent Structure for Variational Conversation... - j-min.io - jaemin cho, structure
Hierarchical Video-Moment Retrieval and Step-Captioning... - j-min.io - jaemin cho, video, step