Robuta

https://www.elastic.co/docs/reference/elasticsearch/plugins/analysis-kuromoji-tokenizer
kuromoji_tokenizer | Elasticsearch Reference
The kuromoji_tokenizer accepts the following settings: As a demonstration of how the user dictionary can be used, save the following dictionary to...
Tags: tokenizer, elasticsearch, reference

https://www.elastic.co/docs/reference/text-analysis/analysis-thai-tokenizer
Thai tokenizer | Elasticsearch Reference
The thai tokenizer segments Thai text into words, using the Thai segmentation algorithm included with Java. Text in other languages in general will be...
Tags: tokenizer, elasticsearch, thai, reference

https://www.elastic.co/docs/reference/text-analysis/analysis-chargroup-tokenizer
Character group tokenizer | Elasticsearch Reference
The char_group tokenizer breaks text into terms whenever it encounters a character which is in a defined set. It is mostly useful for cases where a simple...
Tags: character group, tokenizer, elasticsearch, reference

https://www.elastic.co/docs/reference/elasticsearch/plugins/analysis-icu-tokenizer
ICU tokenizer | Elasticsearch Reference
Tokenizes text into words on word boundaries, as defined in UAX #29: Unicode Text Segmentation. It behaves much like the standard tokenizer, but adds...
Tags: tokenizer, elasticsearch, icu, reference

https://www.elastic.co/docs/reference/text-analysis/analysis-letter-tokenizer
Letter tokenizer | Elasticsearch Reference
The letter tokenizer breaks text into terms whenever it encounters a character which is not a letter. It does a reasonable job for most European languages,...
Tags: tokenizer, elasticsearch, letter, reference

https://www.elastic.co/docs/reference/text-analysis/analysis-simplepatternsplit-tokenizer
Simple pattern split tokenizer | Elasticsearch Reference
The simple_pattern_split tokenizer uses a regular expression to split the input into terms at pattern matches. The set of regular expression features...
Tags: tokenizer, elasticsearch, simple pattern split, reference

https://www.elastic.co/docs/reference/text-analysis/analysis-uaxurlemail-tokenizer
UAX URL email tokenizer | Elasticsearch Reference
The uax_url_email tokenizer is like the standard tokenizer except that it recognises URLs and email addresses as single tokens. The above sentence would...
Tags: tokenizer, elasticsearch, uax url email, reference

https://www.elastic.co/docs/reference/text-analysis/tokenizer-reference
Tokenizer reference | Elasticsearch Reference
A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens. For instance,...
Tags: reference, elasticsearch, tokenizer
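
The letter and char_group descriptions listed above can be illustrated with a small pure-Python sketch. It only approximates the splitting rules as the snippets state them; it is not Elasticsearch's implementation, and the function names letter_tokenize and char_group_tokenize are hypothetical helpers introduced for this example.

```python
import re


def letter_tokenize(text):
    """Approximate a letter tokenizer: emit maximal runs of letters.

    [^\\W\\d_] matches any Unicode word character that is neither a
    digit nor an underscore, which is a close stand-in for "a letter".
    """
    return re.findall(r"[^\W\d_]+", text)


def char_group_tokenize(text, split_chars):
    """Approximate a char_group tokenizer: break on any character in split_chars.

    Assumption: empty tokens between adjacent split characters are dropped.
    """
    tokens = []
    current = []
    for ch in text:
        if ch in split_chars:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    print(letter_tokenize("The 2 QUICK Brown-Foxes"))
    print(char_group_tokenize("The QUICK brown-fox", {" ", "-"}))
```

Both functions receive a string and return a list of tokens, mirroring the "stream of characters in, stream of tokens out" description in the tokenizer reference entry.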