18 Commits

Author SHA1 Message Date
dan 920c83f18f Fix some problems in fts3 found by address-sanitizer.
FossilOrigin-Name: 16a8e84fa7f67a467f824bdd7f72cbd6a6e95dab8cc7aa1e0e751720b98f3e31
2017-03-20 18:53:32 +00:00
dan 53ff9c2972 Fix a potential buffer overread provoked by invalid utf-8 in fts5.
FossilOrigin-Name: a049fbbde5da2e43d41aa8c2b41f9eb21507ac76
2016-02-12 18:48:09 +00:00
dan 3f09beda45 Remove "#ifdef SQLITE_ENABLE_FTS5" from individual fts5 source files. Add a single "#if !defined(SQLITE_CORE) || defined(SQLITE_ENABLE_FTS5)" to fts5.c.
FossilOrigin-Name: 7819002ed85497bbd0f9cf4d39df641573324436
2015-07-02 15:52:21 +00:00
dan 2e7d35e2fe Avoid making redundant copies of position-lists within the fts5 code.
FossilOrigin-Name: 5165de548b84825cb000d33e5d3de12b0ef112c0
2015-05-23 15:43:05 +00:00
dan 21b7d2a9b8 Improve test coverage of fts5_unicode2.c.
FossilOrigin-Name: fea8a4db9d8c7b9a946017a0dc984cbca6ce240e
2015-05-22 06:08:25 +00:00
dan 57fec54b53 Fix some problems with building fts5 and fts3 together using the amalgamation.
FossilOrigin-Name: fb10bbb9f9c4481e6043d323a3018a4ec68eb0ff
2015-02-02 11:32:20 +00:00
dan 37db72f1f7 Merge latest trunk changes with this branch.
FossilOrigin-Name: 4b3651677e7132c4c45605bc1f216fc08ef31198
2015-01-01 18:03:49 +00:00
dan 6024772ba2 Add a version of the unicode61 tokenizer to fts5.
FossilOrigin-Name: d09f7800cf14f73ea86d037107ef80295b2c173a
2015-01-01 16:46:10 +00:00
drh 858b638d1f A couple more harmless compiler warnings eliminated.
FossilOrigin-Name: bcf6d775f90f4d1ba018a1b965f2f710df130f01
2014-08-06 18:50:51 +00:00
drh e8f2c9dc71 Fix two more harmless compiler warnings. Make sure the fts3_unicode2.c file is in sync with mkunicode.tcl.
FossilOrigin-Name: a2a60307ea68a3230952a56cb65369ba0a208967
2014-08-06 17:49:13 +00:00
dan f2c9229f73 Up until now the fts4 "unicode61" tokenizer has treated all private use codepoints except the first and last of each of the three ranges as alphanumeric (eligible to be part of tokens). This commit fixes this so that all private use codepoints are considered alphanumeric. In other words, it fixes the handling of codepoints 0xE000, 0xF8FF, 0xF0000, 0xFFFFD, 0x100000 and 0x10FFFD.
FossilOrigin-Name: 6cfd9af5250029c0d275be027b4208c48954a8a1
2013-06-05 16:17:21 +00:00
dan 754d3adf7c Have the FTS unicode61 strip out diacritics when tokenizing text. This can be disabled by specifying the tokenizer option "remove_diacritics=0".
FossilOrigin-Name: 790f76a5898dad1a955d40edddf11f7b0fec0ccd
2012-06-06 19:30:38 +00:00
drh a9cfaba95a Omit the fts3 unicode character class routines from the build if fts3/4 is disabled.
FossilOrigin-Name: c00bb5d4601efc15933f222349e96a043b610a19
2012-05-28 12:22:00 +00:00
dan 7946c53009 If SQLITE_DISABLE_FTS3_UNICODE is defined, do not build the "unicode61" tokenizer.
FossilOrigin-Name: e71495a817b479bc23c5403d99255e3f098eb054
2012-05-26 18:28:14 +00:00
dan 501c74d3e1 Change the format of the tables used by sqlite3FtsUnicodeTolower() to make them a little smaller.
FossilOrigin-Name: b89d3834f6690073fca0fc22c18afa1fb280ea7d
2012-05-26 17:57:02 +00:00
dan 1c7016c9a5 Add special fast paths to sqlite3FtsUnicodeTolower() and Isalnum() for codepoints in the ASCII range.
FossilOrigin-Name: cf7b25d47687635a04f4347d45f135c686b9d758
2012-05-25 19:50:12 +00:00
dan 80ed5a56a5 Fix comments in generated file fts3_unicode2.c.
FossilOrigin-Name: 3dc567ef4702d9a63d78d11ff705cb7f7359f7a6
2012-05-25 18:48:48 +00:00
dan 3d403c71a8 Add an experimental tokenizer to fts4 - "unicode". This tokenizer works in the same way except that it understands unicode "simple case folding" and recognizes all characters not classified as "Letters" or "Numbers" by unicode as token separators.
FossilOrigin-Name: 0c13570ec78c6887103dc99b81b470829fa28385
2012-05-25 17:50:19 +00:00