dan
|
3aaa4cd9ed
|
Add tests to check that the "unicode61" and "icu" tokenizers both identify white-space codepoints outside the ASCII range.
FossilOrigin-Name: bfb2d4730cbbe18fb940e72f4fde9122d550734e
|
2012-06-19 06:35:39 +00:00 |
|
dan
|
25cdf46ae4
|
Add the "tokenchars=" and "separators=" options, for customizing the set of characters considered to be token separators, to the unicode61 tokenizer.
FossilOrigin-Name: e56fb462aa1f11bb23303ae0dc62815c21e26a52
|
2012-06-07 15:53:48 +00:00 |
|
dan
|
754d3adf7c
|
Have the FTS unicode61 tokenizer strip out diacritics when tokenizing text. This can be disabled by specifying the tokenizer option "remove_diacritics=0".
FossilOrigin-Name: 790f76a5898dad1a955d40edddf11f7b0fec0ccd
|
2012-06-06 19:30:38 +00:00 |
|
dan
|
7946c53009
|
If SQLITE_DISABLE_FTS3_UNICODE is defined, do not build the "unicode61" tokenizer.
FossilOrigin-Name: e71495a817b479bc23c5403d99255e3f098eb054
|
2012-05-26 18:28:14 +00:00 |
|
dan
|
7a796731db
|
Add coverage tests for fts3_unicode.c.
FossilOrigin-Name: 07d3ea8a3cb179fab6c48934fc6751f53b507d36
|
2012-05-26 16:22:56 +00:00 |
|
dan
|
ab322bd21e
|
Change the name of the "unicode" tokenizer to "unicode61" to emphasize that the case folding and separator-character identification routines are based on Unicode version 6.1.
FossilOrigin-Name: 8f3e60aa2253f21bcee5d03982cfdd7f16c00060
|
2012-05-26 14:54:50 +00:00 |
|
dan
|
3d403c71a8
|
Add an experimental tokenizer to fts4 - "unicode". This tokenizer works in the same way as "simple" except that it understands Unicode "simple case folding" and recognizes all characters not classified as "Letters" or "Numbers" by Unicode as token separators.
FossilOrigin-Name: 0c13570ec78c6887103dc99b81b470829fa28385
|
2012-05-25 17:50:19 +00:00 |
|
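
The tokenizing rule described across these commits (characters that are not Letters or Numbers act as separators, case is folded, diacritics are stripped unless disabled, and "tokenchars="/"separators=" adjust the separator set) can be illustrated with a rough sketch. This is not SQLite's implementation: the function name `tokenize` and its keyword arguments are hypothetical, it relies on Python's `unicodedata` tables rather than the Unicode 6.1 data, and `str.lower()` only approximates "simple case folding".

```python
import unicodedata


def tokenize(text, remove_diacritics=True, tokenchars="", separators=""):
    """Approximate the unicode61-style rule from the commit messages.

    Hypothetical helper for illustration only; not the SQLite code.
    """
    tokens, current = [], []
    for ch in text:
        if ch in separators:
            # Explicitly listed separators always split tokens.
            is_token_char = False
        elif ch in tokenchars:
            # Explicitly listed token characters never split tokens.
            is_token_char = True
        else:
            # Letters (L*) and Numbers (N*) form tokens; the rest separate.
            is_token_char = unicodedata.category(ch)[0] in ("L", "N")
        if is_token_char:
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))

    result = []
    for tok in tokens:
        tok = tok.lower()  # stand-in for Unicode "simple case folding"
        if remove_diacritics:
            # Decompose, then drop combining marks (category Mn).
            decomposed = unicodedata.normalize("NFD", tok)
            tok = "".join(
                c for c in decomposed if unicodedata.category(c) != "Mn"
            )
        result.append(tok)
    return result


# U+3000 (ideographic space) is white space outside the ASCII range.
print(tokenize("Crème Brûlée\u3000Café"))
print(tokenize("Crème", remove_diacritics=False))
print(tokenize("well-known fact", tokenchars="-"))
```

In this sketch an explicit separator takes precedence over an explicit token character; that ordering is an assumption made for the example, not something stated in the commit messages.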