Commit Graph

125 Commits

Author SHA1 Message Date
shess
b6a75606ed Change prefix search from O(N*M) to O(NlogM). The previous code
linearly merged the doclists, so as the accumulated list got large,
things got slow (the M term, a fucntion of the number of documents in
the index).  This change does pairwise merges until a single doclist
remains.  A test search of 't*' against a database of RFC text
improves from 1m16s to 4.75s. (CVS 4599)

FossilOrigin-Name: feef1b15d645d638b4a05742f214b0445fa7e176
2007-12-07 23:47:53 +00:00
drh
8255feca02 The FTS3 amalgamation can now be appended to the SQLite amalgamation to
generate a single source file that contains both components. (CVS 4558)

FossilOrigin-Name: 0fc61f99b54bd269fcc011f448b9b971e902cb01
2007-11-24 00:41:52 +00:00
drh
a6f46e991e Do not require SQLITE_ENABLE_BROKEN_FTS2 if FTS2 is not enabled.
The same for FTS1.  Ticket #2777. (CVS 4556)

FossilOrigin-Name: f94cdcfd1171fd110ed9cd4c47f1fb5fa7e99ca9
2007-11-23 18:06:23 +00:00
drh
ac320cc14a Add a #include of sqlite3.h to fts3_hash.c. Tickets #2762 and #2777. (CVS 4555)
FossilOrigin-Name: c8485eb8bc62c810ec9f73e103468c57116fd94c
2007-11-23 18:01:07 +00:00
drh
613a0fe455 Changes fts3 to use only sqlite3_malloc() and not system malloc.
Ticket #2762. (CVS 4554)

FossilOrigin-Name: 460af6bb668094c99a1d4dc1540b44b6d1d036b6
2007-11-23 17:31:17 +00:00
shess
cd7274ceb0 Don't do anything when input doclists are both empty. Ticket #2774 (CVS 4546)
FossilOrigin-Name: 75cb46f82a6a95dbe9e279dede299bafa2e91cae
2007-11-16 00:23:07 +00:00
shess
adafd5747f Be a bit more susicious of invalid results from the tokenizer. (CVS 4514)
FossilOrigin-Name: deb8f56d3adea0025d28b8effabec7c7b7fe3026
2007-10-24 23:24:22 +00:00
shess
b6d78dc7bb fts3.c buildTerms() passes -1 for nInput. (CVS 4511)
FossilOrigin-Name: e87c883a1235ac47ee340a31051dcd5deb369d4e
2007-10-24 21:52:37 +00:00
danielk1977
1c1764ae66 Add the NEAR operator to fts3. (CVS 4502)
FossilOrigin-Name: aef7720e0bb49d52332ddebe6f698feb926ef7d7
2007-10-22 18:02:20 +00:00
drh
8a07c7a414 Cleanup the hash functions in FTS3. (CVS 4440)
FossilOrigin-Name: ac645c8f30aac0d98fc481260084c9bd3975a845
2007-09-20 12:53:27 +00:00
shess
961303c1e7 Drop the forced error from fts3.c and add forced errors to fts2.c and
fts1.c. (CVS 4427)

FossilOrigin-Name: fec6567a0f8a868cda9bba2a473491dfb17b6c88
2007-09-13 18:16:08 +00:00
shess
d83ae45639 Add an implicit (HIDDEN) docid column. This works as an alias to
rowid, similar to how things work in SQLite tables with INTEGER
PRIMARY KEY.  Add tests to verify operation. (CVS 4426)

FossilOrigin-Name: c8d2345200f9ece1af712543982097d0b6f348c7
2007-09-13 18:14:49 +00:00
shess
0ec85ae216 Mark the table-named column HIDDEN. Add tests to make sure it's
working as expected. (CVS 4425)

FossilOrigin-Name: ca669eaf1b4af441741129bee4af02f32a7c74b8
2007-09-13 18:12:09 +00:00
shess
999cc5d7e8 Fix memory leak reported by an fts1 user. Was losing a doclist on a
query error. (CVS 4347)

FossilOrigin-Name: eee025024972852990e704253d1443c1cefb376c
2007-08-30 19:56:37 +00:00
shess
27a770e044 Fix memory leak of InteriorReader.term. Comes up when doing queries
against large segments. (CVS 4315)

FossilOrigin-Name: 6c617bd89fc57881a2a308a6360e8ebb42835d46
2007-08-28 20:36:53 +00:00
shess
bae37537b0 Make comments and variable naming more consistent WRT rowid versus
docid/blockid.  This should have no code impact. (CVS 4281)

FossilOrigin-Name: 76f1e18ebc25d692f122784e87d202992c4cfed2
2007-08-23 20:28:49 +00:00
shess
6beeb0329a Fix fts3 to not have the VACUUM bug from fts2. %_content.docid is an
alias to fix the rowid for documents, %_segments.blockid is an alias
to fix the rowid for segment blocks.  Unit test for the problem. (CVS 4280)

FossilOrigin-Name: 6eb2d74a8cfce322930f05c97d4ec255f3711efb
2007-08-23 20:23:37 +00:00
shess
acce22f5c7 Copy fts2 to fts3, renaming, and replacing references to fts2 with
fts3, including capitalization variants. (CVS 4249)

FossilOrigin-Name: 216c91d2fc49792d9ff53596746f1162f5b7f8d4
2007-08-20 17:37:02 +00:00
shess
9fa502205d Convert fts2 to use sqlite3_prepare_v2() to prevent certain logic
errors around SQLITE_SCHEMA handling.  This also allows
sql_step_statement() and sql_step_leaf_statement() to be replaced with
sqlite3_step().

Also fix a logic error in flushPendingTerms() which was clearing the
term table in case of error.  This was wrong in the face of
SQLITE_SCHEMA.  Even though the change to sqlite3_prepare_v2() should
cause us not to see SQLITE_SCHEMA any longer, it was still a logic
error... (CVS 4205)

FossilOrigin-Name: 16730cb137eaf576b87cdc17913564c9c5c0ed82
2007-08-10 23:47:03 +00:00
drh
e6e4d6bb1a Fix some compiler warnings. (CVS 4196)
FossilOrigin-Name: 6cc15409ad6baefbe6e2214a4ac1cb3a0433f922
2007-08-05 23:52:05 +00:00
rse
e21733baa5 Fix ticket #2439: the FTS1 and FTS2 extensions use the non-standard,
unportable and highly deprecated <malloc.h> header on all platforms
except Apple Mac OS X. The <malloc.h> actually is never required on
any OS with an at least partly POSIX-conforming API as the malloc(3) &
friends functions officially live in <stdlib.h> since over 10 years.
Under some platform like FreeBSD the inclusion of <malloc.h> since a few
years even causes an "#error" and this way a build failure. So, just get
rid of the bad <malloc.h> usage in FTS1 and FTS2 extensions at all and
stick with <stdlib.h> there only. (CVS 4191)

FossilOrigin-Name: 3f9a666143a8aafa0b1a5d56ec68f69f2b3d6a21
2007-07-30 18:55:36 +00:00
shess
a2d04e9a0f Implement xRename() for fts1 so that it is possible to rename fts1 tables.
See http://www.sqlite.org/cvstrac/chngview?cn=4143 (CVS 4184)

FossilOrigin-Name: febf75f022b9414fc456ddf274d301f95d61e1b8
2007-07-25 00:56:09 +00:00
shess
443ecd036d Replicates http://www.sqlite.org/cvstrac/chngview?cn=4151 which
modified fts2:

Modify handling of SQLITE_SCHEMA in fts2 code. An SQLITE_SCHEMA error
may cause SQLite to reload the internal schema, deleting and
recreating v-table objects. So the sqlite3_vtab structure can be
deleted out from under a v-table implementation. (CVS 4183)

FossilOrigin-Name: f9020cffda02923ef45979bb447ec2e232086ad5
2007-07-25 00:38:05 +00:00
shess
f6e3624cfc Sorry, previous check-in included a last-minute "Did it really work?"
change :-). (CVS 4182)

FossilOrigin-Name: 5db25e369a1a4b5a4d87947abdbf25f96fe64807
2007-07-25 00:27:59 +00:00
shess
9f8a4b43ef Apply change 4095 to fts1. Fix snippet generation when the left-most
column of an fts table is used in the MATCH clause. Fix for ticket
#2429. (CVS 4181)

FossilOrigin-Name: c2ba3cc0f7ac9f5dfe5ffb554f9a1cd96b28335a
2007-07-25 00:25:20 +00:00
danielk1977
ab9749ebb9 Modify handling of SQLITE_SCHEMA in fts2 code. An SQLITE_SCHEMA error may cause SQLite to reload the internal schema, deleting and recreating v-table objects. So the sqlite3_vtab structure can be deleted out from under a v-table implementation. (CVS 4151)
FossilOrigin-Name: dee1a0fd28e8341af6523ab0c5628b671d7d2811
2007-07-02 10:16:49 +00:00
danielk1977
c033b64276 Implement xRename() for fts2 so that it is possible to rename fts2 tables. (CVS 4143)
FossilOrigin-Name: 488474fde753c5a7a14ed8f2fad7f16efd236491
2007-06-27 16:26:07 +00:00
danielk1977
9ff802627a Reorganize comments in fts2_tokenizer.h. No code changes. (CVS 4132)
FossilOrigin-Name: b331e30395e9fc90abe40ab802972a67648cf48e
2007-06-26 12:54:07 +00:00
danielk1977
08ada518ff Remove the unused EXTSRC variable from the non-configure makefile. (CVS 4129)
FossilOrigin-Name: bbdcf372c6f2144a62fba742b3f4bd6b2fe58593
2007-06-26 10:56:40 +00:00
danielk1977
4877ef2aae Fix an unitialized variable in fts2. (CVS 4128)
FossilOrigin-Name: c349cf942534357955f80fc2aa8c96206af97b78
2007-06-26 10:55:01 +00:00
danielk1977
576d3db541 Modify the non-configure build system to make it easier to build the library with the fts2 or icu extensions linked in. (CVS 4121)
FossilOrigin-Name: 02b23c4394da7efb82e9318146f10818b0f68b1f
2007-06-25 14:28:48 +00:00
drh
397aa141ed Put #ifdefs in fts2_tokenizer so that the build works even when FTS2
is omitted.  Add the SQLite blessing to the header comments on all FTS2
source files. (CVS 4120)

FossilOrigin-Name: c795e6fd8f01bcbc1967062632c13d4952abf4d8
2007-06-25 13:50:03 +00:00
drh
5665b3ea44 All the use of MySQL-style quoting in the FTS modules. Ticket #2446. (CVS 4119)
FossilOrigin-Name: 3be2a6d1c342454d93b05c38f3d9a960ab15dae2
2007-06-25 12:49:05 +00:00
danielk1977
46760820a1 Add a test that calls fts2_tokenizer() with an argument set via C code. (CVS 4118)
FossilOrigin-Name: fbcf2d75cd2b88d175c122477aa483f0771870e5
2007-06-25 12:05:40 +00:00
danielk1977
f86643b32f Add some tests for the fts2 icu tokenizer. (CVS 4117)
FossilOrigin-Name: b79ced3e0a26b0db13613073c847c2d2ba7e174e
2007-06-25 11:24:38 +00:00
danielk1977
24e1afa222 Add some documentation for user-defined fts2 tokenizers. (CVS 4116)
FossilOrigin-Name: 5a9eee86587219a68655d548864d129edec969ae
2007-06-25 09:52:31 +00:00
danielk1977
832a58a68c Extend fts2 so that user defined tokenizers may be added. Add a tokenizer that uses the ICU library if available. Documentation and tests to come. (CVS 4108)
FossilOrigin-Name: 68677e420c744b39ea9d7399819e0f376748886d
2007-06-22 15:21:15 +00:00
danielk1977
86889fc3c6 Fix snippet generation when the left-most column of an fts2 table is used in the MATCH clause. Fix for ticket #2429. (CVS 4095)
FossilOrigin-Name: fec56ad2ede53e3e202d9ad869a059eeb315796f
2007-06-20 06:23:54 +00:00
shess
401b80656d Minor comment edits from my prefix development client. No code changes. (CVS 4058)
FossilOrigin-Name: 6953cd0935b5526756ab745545420e40adc3c56d
2007-06-12 18:20:04 +00:00
danielk1977
b39fa65289 Add a README.txt file for the ICU extension. (CVS 4055)
FossilOrigin-Name: 7b6927829f18d39052e67eebca4275e7aa496035
2007-06-11 08:00:00 +00:00
shess
8a7de08a8b Fix overzealous fts2 assertions WRT rowid 0 or lower. Only check that
docids are ascending if there was a prior docid set for the doclist,
ignore the initial docid of 0. (CVS 4026)

FossilOrigin-Name: ed3a131f1d3fe51d1e79bdfe1bfafa55f825afa9
2007-05-21 21:59:18 +00:00
danielk1977
7de68a097e Add a version of the LIKE operator to the icu extension. Requires optimisation. (CVS 3939)
FossilOrigin-Name: 3e96105c1f084a4ab4dad4de6f4759e43fc497f7
2007-05-07 16:58:02 +00:00
danielk1977
2559136971 Add interface to configure SQLite to use ICU collation functions. (CVS 3936)
FossilOrigin-Name: b29a81b4fbb926fa09186340342848b9fe589033
2007-05-07 11:53:13 +00:00
danielk1977
a9808b31a8 Add the experimental create_collation_x() api. (CVS 3934)
FossilOrigin-Name: ff49d48f2f025898a0f4ace1fc227e1d367ea89f
2007-05-07 09:32:45 +00:00
danielk1977
83852acc44 Add the start of the ICU extension. (CVS 3931)
FossilOrigin-Name: f473e8526770b6a332dfde3e1fd1ddf8df493e9a
2007-05-06 16:04:11 +00:00
shess
290283fe69 Enable prefix-search in query-parsing and snippet generation. If the
character immediately after the end of a term is '*', that term is
marked for prefix matching.  Modify term comparison in
snippetOffsetsOfColumn() to respect isPrefix.  fts2n.test runs prefix
searching through some obvious test cases. (CVS 3893)

FossilOrigin-Name: 7c4c65924035d9f260f6b64eb92c5c6cf6c04b7b
2007-05-01 18:25:52 +00:00
shess
cc3e986643 Modify loadSegmentLeavesInt() to correctly handle prefix searching.
The new function docListUnion() is used to accumulate a union of the
hits for the matching terms, which will be merged across segments
using docListMerge(). (CVS 3891)

FossilOrigin-Name: 72c796307338c2751a91c30f6fb16989afbf3816
2007-05-01 17:14:59 +00:00
shess
0b6212090f Propagate prefix flag through implementation of doclist query code.
Also implement correct prefix-handling for traversal of interior nodes
of segment tree.  A given prefix can span multiple children of an
interior node, and from there the branches need to be followed in
parallel. (CVS 3889)

FossilOrigin-Name: cae844a01a1d87ffb00bba8b4e7b62a92e633aa9
2007-04-30 22:09:36 +00:00
shess
f055154108 Lift docListMerge() call out of loadSegmentLeavesInt() for prefix
search.  Doclists from multiple prefix matches will need a union merge
function, which will have to logically happen across a segment before
doclists are merged between segments. (CVS 3887)

FossilOrigin-Name: 7ddb82668906e33e2d6a796f2da1795032e036d5
2007-04-30 17:52:51 +00:00
shess
8ffcadb57e Break interior-node and leaf-node readers apart in loadSegment().
Previously, the code looped until the block was a leaf node as
indicated by a leading NUL.  Now the code loops until it finds a block
in the range of leaf nodes for this segment, then reads it using
LeavesReader.  This will make it easier to traverse a range of leaves
when doing a prefix search. (CVS 3884)

FossilOrigin-Name: 9466367d65f43d58020e709428268dc2ff98aa35
2007-04-27 22:02:57 +00:00