71 Commits

Author SHA1 Message Date
dan
de30fb5fc2 Store the values of any UNINDEXED columns of a contentless fts5 table persistently in the database. Warning: This currently creates a (technically) incompatible file-format for contentless fts5 tables that have UNINDEXED columns.
FossilOrigin-Name: dcacb1a8ef359b4507b4733356d3150ba5dc105cc9867c103d16a0908a1a9f64
2024-09-03 18:55:38 +00:00
dan
342984075b Add tests to restore coverage of fts5_config.c.
FossilOrigin-Name: 9d971b31df7ad4d68eb348f95d8f996071cf87d41c47033bde3fcc4dba732e06
2024-08-16 19:05:47 +00:00
dan
ff6ab9dd2b Update the fts5_tokenizer_v2 API so that the locale is passed as parameter to xTokenize(), instead of via a separate call to xSetLocale().
FossilOrigin-Name: 03e63ed24e7a08817341e59b075ae2d4e3f7a5e5b37e0e6138359d5fd29a5e9e
2024-08-10 18:59:36 +00:00
dan
e317e7f4b4 Change things so that locale=1 is required to write fts5_locale() values to an fts5 table, and so that blobs may not be stored in indexed (i.e. not UNINDEXED) columns of these tables.
FossilOrigin-Name: c98ccc12169419b8b27ead89ef0665de40320277c5daa748b80869337419e43e
2024-08-02 21:06:13 +00:00
dan
5bd8cc7dd5 Fix various problems with the code on this branch.
FossilOrigin-Name: 8bd4ae7e95c7b6ce34db5ea705dc136e742a22f333d0e7370b485ebd736b5ec2
2024-07-31 20:49:00 +00:00
dan
2ec78c0e4b Add the fts5_locale() function, and begin adding the related functionality to fts5.
FossilOrigin-Name: 8839ef7cfb49239e7f1c4812a53a93a672827c88d6921408b1d5062b352c87cc
2024-07-26 20:50:33 +00:00
mistachkin
6593b340ff Fix harmless compilation issues seen with MSVC.
FossilOrigin-Name: 816d4749384c7f398912c905a16c83b88f4c55632050b4c6117c61301d1a53e1
2024-06-05 20:50:39 +00:00
drh
7da33383c7 Remove an unused parameter from fts5ConfigParseSpecial(). Compiler-warning
fix only - no functional changes.

FossilOrigin-Name: c08dd245f7706f2fd2269d700be480477619a722e27e6b439462ae543302c49f
2024-05-29 15:16:17 +00:00
dan
12b205c637 Allow existing fts5 tables to be dropped even if the associated tokenizer is not available.
FossilOrigin-Name: 69ef47eeee8b53684c321393be34f03600694fbc86377f8720ff80307846aff6
2024-05-13 20:06:08 +00:00
dan
03204e9106 Add the tokendata=1 option to ignore trailing token-data when querying an fts5 table.
FossilOrigin-Name: 122935182ad5869ce3a4c6d796c38a0509f6f3384dd1b3e60a3f2f0f366cc5f5
2023-10-11 21:08:12 +00:00
dan
3f874b58fb Change the name of the fts5 'delete-automerge' option to 'deletemerge'. And add tests for it.
FossilOrigin-Name: 1079300db2a7d1fbc86a01c215c234a3af64889c5396e6da63ff4f3c7efae4c5
2023-07-25 15:48:58 +00:00
dan
24730de8d1 Add the fts5 'delete-automerge' integer option. A level is eligible for auto-merging if the percentage of its entries deleted by tombstones is greater than or equal to the value of the 'delete-automerge' option. Default value is 10.
FossilOrigin-Name: b314be66b9ac0190b5373b3b6baec012382bc588c2d86c2edab796669a4303c3
2023-07-24 19:13:06 +00:00
dan
6788c7b7c0 Begin adding support for deleting rows from contentless fts5 tables.
FossilOrigin-Name: e513bea84dfaf2280f7429c9a528b3a1354a46c36e58ab178ca45478975634e0
2023-07-10 20:44:09 +00:00
dan
3f23eb6813 Add an assert() to fts5_config to ensure that a potential OOM is being handled correctly.
FossilOrigin-Name: fe9c207657400f9d9f4e822eb658157bc147ed538e2701322f6f973933f023ed
2023-05-03 13:57:57 +00:00
dan
015020cd1a Add the 'secure-delete' option to fts5. Used to configure fts5 to aggressively remove old full-text-index entries belonging to deleted or updated rows.
FossilOrigin-Name: 4240fd09b717dbc69dffe3b88ec9149777ca4c3efa12f282af65be3af6fa5bb0
2023-04-12 17:40:44 +00:00
drh
7a3b4451a1 Fixes for harmless static-analyzer warnings. This also makes the code easier
for humans to understand.

FossilOrigin-Name: 36177a62feeb4fa93ab6e3c6f4dbe1ddcf63bb02f93284abab979da0261b218e
2021-10-05 17:41:12 +00:00
drh
f1f12661c3 Avoid taking the address of a NULL pointer following an OOM in FTS5. Doing
so is harmless in actual practice, but it is technically UB so we want to
avoid it.

FossilOrigin-Name: 1cfcd9dceb56b5987e6900a36a0ec092f0e1b13a7e754b8c3d8efb943e5bcc66
2021-04-12 18:32:33 +00:00
dan
33a99fad08 Add experimental unicode-aware trigram tokenizer to fts5. And support for LIKE and GLOB optimizations for fts5 tables that use said tokenizer.
FossilOrigin-Name: 0d7810c1aea93c0a3da1ccc4911dbce8a1b6e1dbfe1ab7e800289a0c783b5985
2020-09-30 20:35:37 +00:00
dan
db5ed35609 Avoid a buffer overread in fts5 that could occur when parsing corrupt configuration records.
FossilOrigin-Name: 355afd77df21a2265871ca6d075f26b1fa121c7c2682cf512281944ff0c2186d
2019-12-10 03:40:11 +00:00
dan
ae55737fbf Do not allow users to effectively disable fts5 crisismerge operations by setting the crisismerge threshold higher than the maximum allowable number of segment b-trees on a single level. Fix for [d392017c].
FossilOrigin-Name: 86e497209217abb7bcb491a023cd353f3c7c9c103ebd9f58dd8661b12cf3694c
2019-10-09 18:36:32 +00:00
dan
a6bd1871d1 Disallow fts5 page sizes greater than 65536 bytes - as there are 16-bit offsets used in the page header.
FossilOrigin-Name: 75775c5ab44e497cb19be10397229637f1374f05c3244e8f92d6c54fcea94f5f
2019-10-09 15:26:45 +00:00
dan
b186a622ee Disallow page-sizes smaller than 32 bytes in fts5. Also ensure the fts5 integrity-check works even when "PRAGMA reverse_unordered_selects" is true. Fix for [265e935b26].
FossilOrigin-Name: 8ab0aebdb3c2d6fb3160b2c58ce6cc0495a6ddd960878a6395958c837f3d1b71
2019-10-07 20:36:18 +00:00
dan
685b2ee0c3 Allow fts5 to filter on multiple MATCH clauses in a single scan.
FossilOrigin-Name: 9d418a7a491761eeb38a70898677a493e2631e5d62e75ee88431f52d3dfd2344
2019-09-12 19:38:40 +00:00
mistachkin
065f3bf4f2 Fix various harmless compiler warnings seen with MSVC.
FossilOrigin-Name: 1c0fe5b5763fe5cbace9773dcdab742e126d0bd035ab13d61f9d134afa0afc0c
2019-03-20 05:45:03 +00:00
drh
2d77d80a65 Use 64-bit math to compute the sizes of memory allocations in extensions.
FossilOrigin-Name: ca67f2ec0e294384c397db438605df1b47aae5f348a8de94f97286997625d169
2019-01-08 20:02:48 +00:00
dan
e8c20120ce Fix handling of strings that contain zero tokens in fts5. And other problems found by fuzzing.
FossilOrigin-Name: 72b3ff0f0df83e62adda6584b4281cf086d45e45
2016-03-12 16:32:16 +00:00
dan
4dbc65b29a Add an incremental optimize capability to fts5. Make the 'merge' command independent of the 'automerge' settings.
FossilOrigin-Name: 556671444c03e3afca072d0f5e9bea2657de6fd3
2016-03-09 20:54:14 +00:00
drh
df3a907ecc Add JSON1 and FTS5 to the set of extensions subject to close compiler warning
analysis. Fix some warnings in each. More (harmless) warnings still exist
in FTS5.

FossilOrigin-Name: cfe2eb88b504f5e9b1351022036641b1ac4c3e78
2016-02-11 15:37:18 +00:00
dan
8631402e6a Add further tests for fts5. Fix some problems with detail=col mode and auxiliary functions.
FossilOrigin-Name: de77d6026e8035c505a704e7b8cfe5af6579d35f
2016-01-16 18:58:51 +00:00
dan
f705e9deab Fix compiler warnings in fts5.
FossilOrigin-Name: 5a343cc0336bba056df4449e6cd2e3fb9e75a105
2016-01-14 14:15:54 +00:00
dan
9f44deed93 Change the name of the offsets=0 option to "detail=column". Have the xInst, xPhraseFirst and other API functions work by parsing the original text for detail=column tables.
FossilOrigin-Name: 228b4d10e38f7d70e1b008c3c9b4a1ae3e32e30d
2015-12-28 19:55:00 +00:00
dan
b12dc84fbb Add the "offsets=0" option to fts5, to create a smaller index without term offset information. A few things are currently broken on this branch.
FossilOrigin-Name: 40b5bbf02a824ca73b33aa4ae1c7d5f65b7cda10
2015-12-17 20:36:13 +00:00
dan
f5d8c58950 Fix the fts5 "prefix=" option to match the documentation (space separated list, multiple prefix= options supported). The undocumented comma-separated format (compatible with fts4) still works.
FossilOrigin-Name: 11eb8e877e2ba859ef6b44318f286597186dfaf2
2015-11-25 11:56:24 +00:00
dan
dbbda39453 Have fts5 load its configuration when the xConnect() method is invoked. This ensures that the very first query run uses the correct value of the 'rank' option.
FossilOrigin-Name: 33e6606f5e497e81119ec491cf2370f60bddafc0
2015-11-06 12:50:57 +00:00
dan
d82211db56 Add the 'hashsize' configuration option to fts5, for configuring the amount of memory allocated to the in-memory hash table while writing.
FossilOrigin-Name: 445480095e6877cce8220b1c095f334bbb04c1c3
2015-11-05 18:09:16 +00:00
mistachkin
cdabd7bd50 Fix harmless compiler warnings.
FossilOrigin-Name: 1c46c194a2da24fe613d77b5a8d727cc2fc9faa4
2015-10-14 20:34:57 +00:00
dan
fe8e2eba0a Remove the 0x00 terminators from the end of doclists stored on disk.
FossilOrigin-Name: 00d990061dec3661b0376bd167082942d5563bfe
2015-09-08 19:55:26 +00:00
dan
ee0c0a8de3 Another change to the fts5 tokenizer API.
FossilOrigin-Name: fc71868496f45f9c7a79ed2bf2d164a7c4718ce1
2015-08-29 15:44:27 +00:00
dan
57e0add3f9 Change the fts5 tokenizer API to allow more than one token to occupy a single position within a document.
FossilOrigin-Name: 90b85b42f2b2dd3e939b129b7df2b822a05e243d
2015-08-28 19:56:47 +00:00
dan
e3229c19cb Use a WITHOUT ROWID table to index fts5 btree leaves. This is faster to query and only slightly larger than storing btree nodes within an intkey table.
FossilOrigin-Name: 862418e3506d4b7cca9c44d58c2eb9dc915d75c9
2015-07-15 19:46:02 +00:00
dan
3f09beda45 Remove "#ifdef SQLITE_ENABLE_FTS5" from individual fts5 source files. Add a single "#if !defined(SQLITE_CORE) || defined(SQLITE_ENABLE_FTS5)" to fts5.c.
FossilOrigin-Name: 7819002ed85497bbd0f9cf4d39df641573324436
2015-07-02 15:52:21 +00:00
dan
6394d99a0e Fix a segfault that could follow an OOM error in fts5.
FossilOrigin-Name: 713239b8cf2900e8f7d97646c7f350248b4e804f
2015-06-26 20:08:25 +00:00
mistachkin
ed52f9ff48 Initial changes to get FTS5 working with MSVC.
FossilOrigin-Name: ef2052f81e33ca98e85a60f8a78cdd19a7c1c35c
2015-06-26 04:34:36 +00:00
dan
51ef0f57c7 Improve test coverage of fts5.
FossilOrigin-Name: df5ccea80e8f0da83af5e595b539687006085120
2015-06-23 18:47:55 +00:00
dan
bcc2f04c68 Add the "columnsize=" option to fts5, similar to fts4's "matchinfo=fts3".
FossilOrigin-Name: aa12f9d9b79c2f523fd6b00e47bcb66dba09ce0c
2015-06-09 20:58:39 +00:00
dan
27aac274b9 Improve test coverage of fts5_config.c.
FossilOrigin-Name: 47dbfadb994814c9349d4c9c113b862c2e97c01a
2015-05-18 17:50:17 +00:00
dan
76724372ae Improve the error message returned by FTS5 if it encounters an unknown file format.
FossilOrigin-Name: f369caec145f311bb136cf7af144e2695badcb9b
2015-05-08 09:21:05 +00:00
dan
4591334dd4 Change to storing all keys in a single merge-tree structure instead of one main structure and a separate one for each prefix index. This is a file-format change. Also introduce a mechanism for managing file-format changes.
FossilOrigin-Name: a684b5e2d9d52cf4700e7e5f9dd547a2ba54e8e9
2015-05-07 19:29:46 +00:00
dan
7c479d51e5 Reorganize some of the fts5 expression parsing code. Improve test coverage of the same.
FossilOrigin-Name: c4456dc5f5f8f45f04e3bbae53b6bcc209fc27d5
2015-05-02 20:35:24 +00:00
dan
7b2ec1ae41 Improve fts5 tests.
FossilOrigin-Name: c1f07a3aa98eac87e2747527d15e5e5562221ceb
2015-04-29 20:54:08 +00:00