https://aclanthology.org/2025.loresmt-1.8/
Jamo-Level Subword Tokenization in Low-Resource Korean Machine Translation - ACL Anthology
Junyoung Lee, Marco Cognetta, Sangwhan Moon, Naoaki Okazaki. Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource...
subword tokenization
https://lrec.elra.info/lrec2018-main-473
BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages - LREC 2018
May 1, 2018 - We present BPEmb, a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE). In an evaluation using fine-grained e