dan
|
b651084713
|
Add tests to restore coverage of fts5_tokenizer.c.
FossilOrigin-Name: 8f9257361b05e368bf433e56d0698923b0f97d12e7c0ad7760aaab6746c0e467
|
2024-08-17 17:22:49 +00:00 |
|
dan
|
ec8962869a
|
Update mkunicode.tcl to match the change erroneously made to machine generated file fts5_unicode2.c in [b7b7bde9].
FossilOrigin-Name: 326d579d777fdede6bc64f9525248767f4730de4e50260b0387e614a9d006416
|
2020-11-26 20:13:54 +00:00 |
|
mistachkin
|
065f3bf4f2
|
Fix various harmless compiler warnings seen with MSVC.
FossilOrigin-Name: 1c0fe5b5763fe5cbace9773dcdab742e126d0bd035ab13d61f9d134afa0afc0c
|
2019-03-20 05:45:03 +00:00 |
|
drh
|
8fc4a11c94
|
Fix harmless compiler warnings in the unicode2 logic of FTS3 and FTS5.
FossilOrigin-Name: 703029ac6d24860230a8c30fcbf5e7e1da619e84f1cc9b9e65ebc74879a184d2
|
2019-01-02 23:49:47 +00:00 |
|
dan
|
b163b57212
|
Fix problems in fts5 found by ASAN.
FossilOrigin-Name: c564bf870106faef297594a51995619c80311d06bd5f8a0c7644f666f22ba576
|
2018-12-28 07:37:22 +00:00 |
|
drh
|
f8c2fea195
|
Remove the unused sqlite3Fts5UnicodeNCat() function.
FossilOrigin-Name: 7149dacf1d440a19f62808b4591c3fa8da202b2ec742d5490a63f2ec005ff9e7
|
2018-12-03 17:40:46 +00:00 |
|
dan
|
e89feee5c3
|
Add the "remove_diacritics=2" option to the unicode61 tokenizer in both FTS5 and FTS3/4.
FossilOrigin-Name: 06177f3f114b5d804b84c27ac843740282e2176fdf0f7a999feda0e1b624adec
|
2018-12-03 16:14:49 +00:00 |
|
dan
|
b80bb6ce88
|
Add the "categories" option to the unicode61 tokenizer in fts5.
FossilOrigin-Name: 80d2b9e635e3100f90cffdcffa5b5038da6fbbfccc9f5777c59a4ae760d4cb62
|
2018-07-13 19:52:43 +00:00 |
|
dan
|
920c83f18f
|
Fix some problems in fts3 found by address-sanitizer.
FossilOrigin-Name: 16a8e84fa7f67a467f824bdd7f72cbd6a6e95dab8cc7aa1e0e751720b98f3e31
|
2017-03-20 18:53:32 +00:00 |
|
dan
|
53ff9c2972
|
Fix a potential buffer overread provoked by invalid utf-8 in fts5.
FossilOrigin-Name: a049fbbde5da2e43d41aa8c2b41f9eb21507ac76
|
2016-02-12 18:48:09 +00:00 |
|
dan
|
3f09beda45
|
Remove "#ifdef SQLITE_ENABLE_FTS5" from individual fts5 source files. Add a single "#if !defined(SQLITE_CORE) || defined(SQLITE_ENABLE_FTS5)" to fts5.c.
FossilOrigin-Name: 7819002ed85497bbd0f9cf4d39df641573324436
|
2015-07-02 15:52:21 +00:00 |
|
dan
|
2e7d35e2fe
|
Avoid making redundant copies of position-lists within the fts5 code.
FossilOrigin-Name: 5165de548b84825cb000d33e5d3de12b0ef112c0
|
2015-05-23 15:43:05 +00:00 |
|
dan
|
21b7d2a9b8
|
Improve test coverage of fts5_unicode2.c.
FossilOrigin-Name: fea8a4db9d8c7b9a946017a0dc984cbca6ce240e
|
2015-05-22 06:08:25 +00:00 |
|
dan
|
57fec54b53
|
Fix some problems with building fts5 and fts3 together using the amalgamation.
FossilOrigin-Name: fb10bbb9f9c4481e6043d323a3018a4ec68eb0ff
|
2015-02-02 11:32:20 +00:00 |
|
dan
|
37db72f1f7
|
Merge latest trunk changes with this branch.
FossilOrigin-Name: 4b3651677e7132c4c45605bc1f216fc08ef31198
|
2015-01-01 18:03:49 +00:00 |
|
dan
|
6024772ba2
|
Add a version of the unicode61 tokenizer to fts5.
FossilOrigin-Name: d09f7800cf14f73ea86d037107ef80295b2c173a
|
2015-01-01 16:46:10 +00:00 |
|
drh
|
858b638d1f
|
A couple more harmless compiler warnings eliminated.
FossilOrigin-Name: bcf6d775f90f4d1ba018a1b965f2f710df130f01
|
2014-08-06 18:50:51 +00:00 |
|
drh
|
e8f2c9dc71
|
Fix two more harmless compiler warnings. Make sure the fts3_unicode2.c file is in sync with mkunicode.tcl.
FossilOrigin-Name: a2a60307ea68a3230952a56cb65369ba0a208967
|
2014-08-06 17:49:13 +00:00 |
|
dan
|
f2c9229f73
|
Up until now the fts4 "unicode61" tokenizer has treated all private use codepoints except the first and last of each of the three ranges as alphanumeric (eligible to be part of tokens). This commit fixes this so that all private use codepoints are considered alphanumeric. In other words, it fixes the handling of codepoints 0xE000, 0xF8FF, 0xF0000, 0xFFFFD, 0x100000 and 0x10FFFD.
FossilOrigin-Name: 6cfd9af5250029c0d275be027b4208c48954a8a1
|
2013-06-05 16:17:21 +00:00 |
|
dan
|
754d3adf7c
|
Have the FTS unicode61 strip out diacritics when tokenizing text. This can be disabled by specifying the tokenizer option "remove_diacritics=0".
FossilOrigin-Name: 790f76a5898dad1a955d40edddf11f7b0fec0ccd
|
2012-06-06 19:30:38 +00:00 |
|
drh
|
a9cfaba95a
|
Omit the fts3 unicode character class routines from the build if fts3/4 is disabled.
FossilOrigin-Name: c00bb5d4601efc15933f222349e96a043b610a19
|
2012-05-28 12:22:00 +00:00 |
|
dan
|
7946c53009
|
If SQLITE_DISABLE_FTS3_UNICODE is defined, do not build the "unicode61" tokenizer.
FossilOrigin-Name: e71495a817b479bc23c5403d99255e3f098eb054
|
2012-05-26 18:28:14 +00:00 |
|
dan
|
501c74d3e1
|
Change the format of the tables used by sqlite3FtsUnicodeTolower() to make them a little smaller.
FossilOrigin-Name: b89d3834f6690073fca0fc22c18afa1fb280ea7d
|
2012-05-26 17:57:02 +00:00 |
|
dan
|
1c7016c9a5
|
Add special fast paths to sqlite3FtsUnicodeTolower() and Isalnum() for codepoints in the ASCII range.
FossilOrigin-Name: cf7b25d47687635a04f4347d45f135c686b9d758
|
2012-05-25 19:50:12 +00:00 |
|
dan
|
80ed5a56a5
|
Fix comments in generated file fts3_unicode2.c.
FossilOrigin-Name: 3dc567ef4702d9a63d78d11ff705cb7f7359f7a6
|
2012-05-25 18:48:48 +00:00 |
|
dan
|
3d403c71a8
|
Add an experimental tokenizer to fts4 - "unicode". This tokenizer works in the same way except that it understands unicode "simple case folding" and recognizes all characters not classified as "Letters" or "Numbers" by unicode as token separators.
FossilOrigin-Name: 0c13570ec78c6887103dc99b81b470829fa28385
|
2012-05-25 17:50:19 +00:00 |
|