Robuta

https://deepai.org/publication/mobilevitv3-mobile-friendly-vision-transformer-with-simple-and-effective-fusion-of-local-global-and-input-features
09/30/22 - MobileViT (MobileViTv1) combines convolutional neural networks (CNNs) and vision transformers (ViTs) to create light-weight models...
mobile friendly, vision transformer, simple, effective, fusion
https://blog.deeplite.ai/deeplite-token-pruning-bavit-transformers-0-0-1
We introduce BAViT, the Background Aware Vision Transformer designed to identify & prune background tokens to make transformers smaller for edge devices
vision transformer, background, aware, token, pruning
https://openreview.net/forum?id=M2pQrhlUSY&referrer=%5Bthe%20profile%20of%20Yilin%20Wang%5D(%2Fprofile%3Fid%3D~Yilin_Wang4)
Despite the impressive representation capacity of vision transformer models, current light-weight vision transformer models still suffer from inconsistent and...
vision transformer, lite, enhanced, self, attention
https://huggingface.co/papers/2103.14030
paper, swin, transformer, hierarchical, vision
https://j-min.io/publication/tvlt_neurips2022/
Feb 11, 2025 - Vision-and-language modeling without text, using a transformer that takes only raw visual and audio inputs - *[NeurIPS 2022](https://nips.cc/) (Oral)*
textless, vision, language, transformer, jaemin
https://huggingface.co/papers/2407.08083
paper, hybrid, mamba, transformer, vision
https://openreview.net/forum?id=vlOfFI9vWO&referrer=%5Bthe%20profile%20of%20Wei%20Wang%5D(%2Fprofile%3Fid%3D~Wei_Wang60)
Vision Transformers (ViT) have revolutionized the field of computer vision by leveraging self-attention mechanisms to process images. However, the...
reinforcement learning, vision transformer, multiagent, efficient
https://github.com/NVlabs/RelViT
[ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning - NVlabs/RelViT
vision transformer, github, iclr, concept, guided
https://sciencedaily.com/releases/2023/06/230601160053.htm
Vision transformers (ViTs) are powerful artificial intelligence (AI) technologies that can identify or categorize objects in images -- however, there are...
vision transformer, new, method, improves, efficiency
https://huggingface.co/papers/2303.14189
vision transformer, paper, fast, hybrid, using