https://www.elastic.co/docs/reference/elasticsearch/plugins/analysis-kuromoji-tokenizer
kuromoji_tokenizer | Elasticsearch Reference
The kuromoji_tokenizer accepts the following settings: As a demonstration of how the user dictionary can be used, save the following dictionary to...
tokenizer elasticsearch reference
https://www.elastic.co/docs/reference/text-analysis/analysis-thai-tokenizer
Thai tokenizer | Elasticsearch Reference
The thai tokenizer segments Thai text into words, using the Thai segmentation algorithm included with Java. Text in other languages in general will be...
tokenizer elasticsearch thai reference
https://www.elastic.co/docs/reference/text-analysis/analysis-chargroup-tokenizer
Character group tokenizer | Elasticsearch Reference
The char_group tokenizer breaks text into terms whenever it encounters a character which is in a defined set. It is mostly useful for cases where a simple...
character group tokenizer elasticsearch reference
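The splitting rule described in the char_group snippet can be sketched in a few lines of Python. This is only an illustration of the idea, not Elasticsearch's implementation; the function name `char_group_tokenize` and the parameter `split_chars` are hypothetical, and dropping empty tokens is an assumption.

```python
def char_group_tokenize(text, split_chars):
    """Break text into terms whenever a character from split_chars appears.

    Assumption: empty terms (from adjacent split characters) are dropped.
    """
    tokens = []
    current = []
    for ch in text:
        if ch in split_chars:
            # A split character ends the current term, if there is one.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    # Flush the final term once the input is exhausted.
    if current:
        tokens.append("".join(current))
    return tokens


print(char_group_tokenize("The QUICK brown-fox", {" ", "-"}))
# → ['The', 'QUICK', 'brown', 'fox']
```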
https://www.elastic.co/docs/reference/elasticsearch/plugins/analysis-icu-tokenizer
ICU tokenizer | Elasticsearch Reference
Tokenizes text into words on word boundaries, as defined in UAX #29: Unicode Text Segmentation. It behaves much like the standard tokenizer, but adds...
tokenizer elasticsearch icu reference
https://www.elastic.co/docs/reference/text-analysis/analysis-letter-tokenizer
Letter tokenizer | Elasticsearch Reference
The letter tokenizer breaks text into terms whenever it encounters a character which is not a letter. It does a reasonable job for most European languages,...
tokenizer elasticsearch letter reference
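The behaviour in the letter tokenizer snippet (break on any character that is not a letter) can be approximated as follows. This is a rough sketch, assuming Python's `str.isalpha` is an acceptable stand-in for the tokenizer's notion of a letter; the function name `letter_tokenize` is hypothetical.

```python
from itertools import groupby


def letter_tokenize(text):
    """Return maximal runs of letters; every non-letter acts as a separator.

    Assumption: str.isalpha approximates the tokenizer's letter test.
    """
    return [
        "".join(run)
        for is_letter, run in groupby(text, key=str.isalpha)
        if is_letter
    ]


print(letter_tokenize("The 2 QUICK Brown-Foxes jumped, dog's bone."))
# → ['The', 'QUICK', 'Brown', 'Foxes', 'jumped', 'dog', 's', 'bone']
```

Note how the digit, the hyphen, and the apostrophe all end a term, which is why "dog's" becomes two tokens.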
https://www.elastic.co/docs/reference/text-analysis/analysis-simplepatternsplit-tokenizer
Simple pattern split tokenizer | Elasticsearch Reference
The simple_pattern_split tokenizer uses a regular expression to split the input into terms at pattern matches. The set of regular expression features...
tokenizer elasticsearch simplepatternsplit reference
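Splitting at pattern matches, as the simple_pattern_split snippet describes, looks roughly like this in Python. It is a sketch only: Python's `re` module supports a different and larger feature set than the restricted regular expressions the tokenizer accepts, and the function name `simple_pattern_split` here is a hypothetical helper, not an Elasticsearch API.

```python
import re


def simple_pattern_split(text, pattern):
    """Split text at every match of pattern and drop empty terms.

    Assumption: pattern has no capturing groups, since re.split would
    otherwise include the captured text in its result.
    """
    return [term for term in re.split(pattern, text) if term]


print(simple_pattern_split("an_underscored_phrase", "_"))
# → ['an', 'underscored', 'phrase']
```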
https://www.elastic.co/docs/reference/text-analysis/analysis-uaxurlemail-tokenizer
UAX URL email tokenizer | Elasticsearch Reference
The uax_url_email tokenizer is like the standard tokenizer except that it recognises URLs and email addresses as single tokens. The above sentence would...
tokenizer elasticsearch uaxurlemail reference
https://www.elastic.co/docs/reference/text-analysis/tokenizer-reference
Tokenizer reference | Elasticsearch Reference
A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens. For instance,...
reference elasticsearch tokenizer