https://deepai.org/publication/mobilevitv3-mobile-friendly-vision-transformer-with-simple-and-effective-fusion-of-local-global-and-input-features
09/30/22 - MobileViT (MobileViTv1) combines convolutional neural networks (CNNs) and vision transformers (ViTs) to create light-weight models...
mobile friendly, vision transformer, simple, effective, fusion
https://blog.deeplite.ai/deeplite-token-pruning-bavit-transformers-0-0-1
We introduce BAViT, the Background Aware Vision Transformer designed to identify & prune background tokens to make transformers smaller for edge devices
vision transformer, background, aware, token, pruning
https://openreview.net/forum?id=M2pQrhlUSY&referrer=%5Bthe%20profile%20of%20Yilin%20Wang%5D(%2Fprofile%3Fid%3D~Yilin_Wang4)
Despite the impressive representation capacity of vision transformer models, current light-weight vision transformer models still suffer from inconsistent and...
vision transformer, lite, enhanced, self-attention
https://j-min.io/publication/tvlt_neurips2022/
Feb 11, 2025 - Vision-and-Language modeling without text, by using a transformer which takes only raw visual and audio inputs - *[NeurIPS 2022](https://nips.cc/) (Oral)*
textless, vision, language, transformer, jaemin
https://openreview.net/forum?id=vlOfFI9vWO&referrer=%5Bthe%20profile%20of%20Wei%20Wang%5D(%2Fprofile%3Fid%3D~Wei_Wang60)
Vision Transformers (ViT) have revolutionized the field of computer vision by leveraging self-attention mechanisms to process images. However, the...
reinforcement learning, vision transformer, multi-agent, efficient
https://sciencedaily.com/releases/2023/06/230601160053.htm
Vision transformers (ViTs) are powerful artificial intelligence (AI) technologies that can identify or categorize objects in images -- however, there are...
vision transformer, new, method, improves, efficiency