Robuta

https://aclanthology.org/2025.loresmt-1.8/
Jamo-Level Subword Tokenization in Low-Resource Korean Machine Translation - ACL Anthology
Junyoung Lee, Marco Cognetta, Sangwhan Moon, Naoaki Okazaki. Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource...
subword tokenization

https://lrec.elra.info/lrec2018-main-473
BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages - LREC 2018 | LREC -...
May 1, 2018 - We present BPEmb, a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE). In an evaluation using fine-grained e