Robuta

https://j-min.io/publication/vl-adapter_cvpr2022/
Oct 22, 2024 - Adapter-based Parameter-Efficient Training for V&L tasks - *[CVPR 2022](https://cvpr2022.thecvf.com)*
transfer learning, vl, adapter, parameter, efficient
https://beta.dopple.ai/profile/45e907ce-f31c-4459-bb8f-906f8096e5e9
Dopple.ai is the premier AI platform bringing fictional worlds to life. Engage in meaningful conversations with iconic characters, seek guidance from virtual...
dopple ai, park, jaemin
https://j-min.io/publication/bifrost-1_2025/
Sep 29, 2025 - a unified framework that bridges multimodal LLMs and diffusion models with patch-level CLIP latents
diffusion models, bifrost, bridging, multimodal, llms
https://j-min.io/tag/spatial-reasoning/
Jaemin Cho Academic website.
spatial reasoning, jaemin, cho
https://j-min.io/publication/video-skill-cot-findingsinemnlp2025/
Sep 16, 2025 - a framework that automatically constructs and leverages skill-aware CoT supervisions for domain-adaptive video reasoning
video, skill, cot, based, chain
https://j-min.io/publication/diagrammergpt_colm2024/
Aug 6, 2024 - Using LLM (GPT-4) to generate a 'diagram plan' for fine-grained layouts (object/text labels/arrows, etc.) and render in either raster images (via diffusion)...
open domain, generating, platform, diagrams, via
https://www.wattpad.com/story/291404508-snow-na-jaemin-end
The story of the life struggles of a young man named Jaemin. Since the age of 10 he has been hated by his family. Even on reaching his teens he has often...
na jaemin, snow, end, hanni, wattpad
https://j-min.io/publication/tvlt_neurips2022/
Feb 11, 2025 - Vision-and-Language modeling without text, by using a transformer which takes only raw visual and audio inputs - *[NeurIPS 2022](https://nips.cc/) (Oral)*
textless, vision, language, transformer, jaemin
https://porndeepfake.net/hot-jaemin-solo-sex-scene-boy-wanna-suck/
Feb 5, 2024 - Hot Jaemin solo sex scene - boy wanna suck / 재민 엔시티
solo sexscene boyhotjaeminwanna
https://j-min.io/publication/x-lxmert_emnlp2020/
May 12, 2024 - Text-to-Image Generation via predicting vector-quantized image patches with multimodal LMs - *[EMNLP 2020](https://2020.emnlp.org/)*
x, paint, caption, answer, questions
https://j-min.io/publication/vhcr_naacl2018/
May 12, 2024 - Propose a hierarchical VAE model and utterance drop regularization to mitigate posterior collapse problem - *[NAACL 2018](http://naacl.org/naacl-hlt-2018/)*...
hierarchical, latent, structure, variational, conversation
https://j-min.io/publication/envgen_colm2024/
Aug 6, 2024 - EnvGen is a novel framework that uses LLMs to adaptively create training environments to help smaller embodied RL agents learn useful skills that they are weak...
generating, adapting, environments, via, llms
https://j-min.io/publication/videodirectorgpt_colm2024/
Aug 6, 2024 - Using LLM (GPT-4) to generate a 'video plan' for consistent multi-scene video generation - *[COLM 2024](https://colmweb.org/)*
consistent, multi, scene, video, generation
https://j-min.io/publication/vidlankd_neurips2021/
Sep 2, 2023 - Video-based grounding can improve diverse NLU tasks - *[NeurIPS 2021](https://nips.cc/Conferences/2021)*
language understanding, knowledge transfer, improving, via, video