Robuta

https://openreview.net/forum?id=9hd5WA6QCn
Multimodal large language models (MLLMs) recently showed strong capacity in integrating data among multiple modalities, empowered by generalizable attention...
cognition and emotionmultimodal perceptionmodularduplexattention
https://praeclarumjj3.github.io/publication/visper-lm/
perceptionmultimodalllmsvisualembedding