sqlite

Author	SHA1	Message	Date
dan	3aaa4cd9ed	Add tests to check that the "unicode61" and "icu" tokenizers both identify white-space codepoints outside the ASCII range. FossilOrigin-Name: bfb2d4730cbbe18fb940e72f4fde9122d550734e	2012-06-19 06:35:39 +00:00
dan	25cdf46ae4	Add the "tokenchars=" and "separators=" options, for customizing the set of characters considered to be token separators, to the unicode61 tokenizer. FossilOrigin-Name: e56fb462aa1f11bb23303ae0dc62815c21e26a52	2012-06-07 15:53:48 +00:00
dan	754d3adf7c	Have the FTS unicode61 tokenizer strip out diacritics when tokenizing text. This can be disabled by specifying the tokenizer option "remove_diacritics=0". FossilOrigin-Name: 790f76a5898dad1a955d40edddf11f7b0fec0ccd	2012-06-06 19:30:38 +00:00
dan	7946c53009	If SQLITE_DISABLE_FTS3_UNICODE is defined, do not build the "unicode61" tokenizer. FossilOrigin-Name: e71495a817b479bc23c5403d99255e3f098eb054	2012-05-26 18:28:14 +00:00
dan	7a796731db	Add coverage tests for fts3_unicode.c. FossilOrigin-Name: 07d3ea8a3cb179fab6c48934fc6751f53b507d36	2012-05-26 16:22:56 +00:00
dan	ab322bd21e	Change the name of the "unicode" tokenizer to "unicode61" to emphasize that the case folding and separator-character identification routines are based on unicode version 6.1. FossilOrigin-Name: 8f3e60aa2253f21bcee5d03982cfdd7f16c00060	2012-05-26 14:54:50 +00:00
dan	3d403c71a8	Add an experimental tokenizer to fts4 - "unicode". This tokenizer works in the same way as "simple" except that it understands unicode "simple case folding" and recognizes all characters not classified as "Letters" or "Numbers" by unicode as token separators. FossilOrigin-Name: 0c13570ec78c6887103dc99b81b470829fa28385	2012-05-25 17:50:19 +00:00

7 Commits
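
The commit messages above describe the tokenizer's behaviour only in prose, so a rough Python model may help make it concrete. This is a sketch, not the C code in fts3_unicode.c. The function names (`is_token_char`, `fold`, `tokenize`) and the keyword arguments are hypothetical and only mirror the option names "remove_diacritics=", "tokenchars=" and "separators=". It relies on Python's `unicodedata` tables instead of Unicode 6.1, approximates simple case folding with `str.lower()`, and handles only precomposed accented characters.

```python
import unicodedata


def is_token_char(ch, tokenchars="", separators=""):
    """Letters and numbers form tokens unless an option overrides that."""
    if ch in separators:      # models "separators=": force a split here
        return False
    if ch in tokenchars:      # models "tokenchars=": keep inside tokens
        return True
    # Assumption: Unicode general categories L* and N* stand in for
    # "Letters" and "Numbers" as named in the commit message.
    return unicodedata.category(ch)[0] in ("L", "N")


def fold(ch, remove_diacritics=True):
    """Lower-case one character, optionally dropping its combining marks."""
    if remove_diacritics:
        decomposed = unicodedata.normalize("NFD", ch)
        ch = "".join(c for c in decomposed if not unicodedata.combining(c))
    return ch.lower()


def tokenize(text, remove_diacritics=True, tokenchars="", separators=""):
    """Split text into folded tokens at every non-token character."""
    tokens, current = [], []
    for ch in text:
        if is_token_char(ch, tokenchars, separators):
            current.append(fold(ch, remove_diacritics))
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    print(tokenize("Crème Brûlée"))
    print(tokenize("Crème Brûlée", remove_diacritics=False))
    print(tokenize("one\u3000two\u00a0three"))   # non-ASCII white space
    print(tokenize("a.b=c", tokenchars=".="))
    print(tokenize("aXbXc", separators="X"))
```

In this model the separator and token-character overrides are checked before the category test, so a letter listed in `separators` splits tokens and punctuation listed in `tokenchars` does not.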