https://aclanthology.org/2025.emnlp-main.1169/
R-BPE: Improving BPE-Tokenizers with Token Reuse - ACL Anthology
Nancy Hamdan, Osama Rakan Al Mraikhat, Fadi A. Zaraket. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
https://lib.rs/crates/rust_tokenizers/rev
Reverse dependencies of rust_tokenizers // Lib.rs
High performance tokenizers for Rust
https://schalkneethling.com/posts/flexsearch-tokenizers-learning-through-writing-tests/
Flexsearch tokenizers - Learning through writing tests - Schalk Neethling - Open Web Engineer
An introduction to Flexsearch and its tokenizers through writing and interpreting tests.
https://www.freecodecamp.org/news/train-algorithms-from-scratch-with-hugging-face/
How to Train BPE, WordPiece, and Unigram Tokenizers from Scratch using Hugging Face
Sep 26, 2024 - If you've had some experience with NLP, you probably know that tokenization is at the helm of any NLP pipeline. Tokenization is often regarded as a subfield of...
https://deepwiki.com/tadashi-aikawa/obsidian-various-complements-plugin/4.2-language-specific-tokenizers
Language-Specific Tokenizers | tadashi-aikawa/obsidian-various-complements-plugin | DeepWiki
This document covers the specialized tokenizers for languages that require non-standard word boundary detection: Japanese, Chinese, and Arabic. These...
https://soyuj.com/blog/nepali-tokenizers/
Nepali Tokenizers: A Python Package for Nepali NLP | Soyuj Jung Basnet
Pre-trained Tokenizers for the Nepali language with an interface to HuggingFace's tokenizers library for customizability.
https://docs.vllm.ai/en/latest/api/vllm/tokenizers/
tokenizers - vLLM
https://someai.org/ai/HuggingFaceFW-Dev-lang-word-tokenizers
Lang Word Tokenizers | Free AI Tool on SomeAI.org
Explore and visualize language family trees with Lang Word Tokenizers, an interactive Visual QA tool for linguistic analysis and education.
https://hex.pm/packages/tokenizers
tokenizers | Hex
Bindings to Hugging Face Tokenizers for Elixir
https://lib.rs/crates/tokenizers/features
Feature flags of Tokenizers crate // Lib.rs
Provides an implementation of today's most used tokenizers, with a focus on performances and versatility
https://joshuaberkowitz.us/blog/news-1/how-tokenizers-are-transforming-ai-image-editing-and-generation-589
How Tokenizers Are Transforming AI Image Editing and Generation | Joshua Berkowitz
Unlocking a New Era in AI Image Manipulation