where excessively large terms keep the tree from finding a single
root. A downside is that large terms could produce large interior
nodes, which may be prone to fragmentation. Smaller nodes, however,
would mean more levels in the tree, which would have the same
problem. (CVS 3510)
FossilOrigin-Name: 64b7e3406134ac4891113b9bb432ad97504268bb
( http://www.sqlite.org/cvstrac/chngview?cn=3486 ) broke test fts2a-5.3.
This change should make the expected result more obvious. (CVS 3489)
FossilOrigin-Name: cde383eb467de0d752e94a22cd2f890c2dc599cc
http://www.sqlite.org/cvstrac/tktview?tn=2036,35 describes some cases
where we were passing memset() a length that was the size of a
pointer rather than the size of the structure pointed to. Instead, wrap this
idiom up in CLEAR() and SCRAMBLE() macros. (CVS 3488)
FossilOrigin-Name: 5878add0839f9c5bec77caae2361ec20cb60b48b
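
The memset() mistake described above, and a macro-based guard against it, might look roughly like the following sketch. The macro bodies and the DemoBuffer struct are assumptions made for illustration; the actual CLEAR() and SCRAMBLE() definitions in the fts sources may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the macros named in the commit message.
** Taking sizeof(*(b)) means the length always matches the pointed-to
** structure, never the pointer itself. */
#define CLEAR(b)    memset((b), 0, sizeof(*(b)))
#define SCRAMBLE(b) memset((b), 0x55, sizeof(*(b)))

/* Illustrative structure, not taken from the fts sources. */
typedef struct DemoBuffer {
  char *pData;
  int nData;
  int nCapacity;
  char aPad[32];
} DemoBuffer;

/* The bug pattern: given "DemoBuffer *p", the call
**   memset(p, 0, sizeof(p));
** clears only sizeof(DemoBuffer*) bytes. CLEAR(p) avoids that. */

/* Return 1 if every byte of *p equals v, else 0. */
static int demoAllBytes(const DemoBuffer *p, unsigned char v){
  const unsigned char *z = (const unsigned char *)p;
  size_t i;
  assert( p!=0 );
  for(i=0; i<sizeof(*p); i++){
    if( z[i]!=v ) return 0;
  }
  return 1;
}
```

A scramble pattern such as 0x55 is one common choice for making use-after-destroy bugs show up quickly; the value here is arbitrary.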
distinguish reading from a static buffer from writing to a dynamic
buffer. This allows n-way doclist merging, and in-place merging of
segment leaf nodes, which together cut segment merge times in half. (CVS 3486)
FossilOrigin-Name: af5bfb986e39248abbfc6fff2e13c6f9e634a751
was writing out a segment made up of a single leaf node containing the
\0 header. LeafReader assumed that leaf nodes always contained at
least one term, so assertions would fail.
While it would be possible to support reading and merging empty
segments, there's no reason to do so. This change could have been
made in writeZeroSegment(), but I put it in leafWriterFlush() so that
it also works if segmentMerge() creates an empty segment, which
could happen with future changes to how deleted documents are handled. (CVS 3484)
FossilOrigin-Name: fed79beec7da24a26ae94494bdc0c98dd102bc06
updates. Groups of documents form segments which are encoded in a
btree layered over a table of blocks, with various tricks to make
merges fast. This performs 20x-25x faster than fts1 when loading the
Enron corpus, and is only slightly slower for queries. (CVS 3474)
FossilOrigin-Name: 85272b2f5394e37916afb1d509e7296810d976f5
docListRestrictColumn() generates a DL_POSITIONS doclist, which means
that after the first doclist is processed, the second doclist is
initialized as DL_POSITIONS, but with DL_POSITIONS_OFFSETS data.
(Note that DL_DEFAULT is now DL_POSITIONS, which masks this bug.) (CVS 3467)
FossilOrigin-Name: 144e3f11e22c6efd6f2d960599ab2d93542db406
We handle an UPDATE to a row by performing an UPDATE on the content table and by building new position lists for each term which appears in either the old or new versions of the row. We write these position lists all at once; this is presumably more efficient than a delete followed by an insert (which would first write empty position lists, then new position lists). (CVS 3434)
FossilOrigin-Name: 757fa22400b363212b4d5f648bdc9fcbd9a7f152
method of a virtual table. In FTS1, use strcmp instead of strcasecmp.
Ticket #1981. (CVS 3428)
FossilOrigin-Name: efa8fb32a596c7232bb1754b3231e4f2421df75b
table. Offsets are retrieved using a special "offsets" function whose
first argument is the magic column. Snippets will ultimately be retrieved
in the same way. (CVS 3427)
FossilOrigin-Name: 5e35dc1ffadfe7fa47673d052501ee79903eead9
a string containing byte offset information for all matching terms.
Also added a large test case based on SQLite mailing list entries. (CVS 3417)
FossilOrigin-Name: f25cfa1aec0e4c1fe07176039a1b7f4e6a2c66ec
names in the spec that are SQL keywords or have special characters, etc.
Also added support for additional control lines. Column names can be
followed by a type specifier (which is ignored). (CVS 3410)
FossilOrigin-Name: adb780e0dc8bc7dcd1102efbfa4bc17eefdf968e
For now, each posting list stores position/offset information for multiple columns. We may implement separate posting lists for separate columns at some future point. (CVS 3408)
FossilOrigin-Name: 366a70b086c817bddecd83053472ec76ef20f309
surprising impact on performance, I believe because it keeps the index
smaller (by keeping rowids smaller), and also because it improves
locality in the table (deleting a row means we've already touched the
pages leading to that rowid). (CVS 3405)
FossilOrigin-Name: 2f5f6290c9ef99c7b060aecc4d996c976c50c9d7
arguments during initialization. Recognized arguments include a
tokenizer selector and a list of virtual table columns. (CVS 3403)
FossilOrigin-Name: 227dc3feb537e6efd5b0c1d2dad40193db07d5aa
in order to provide better error reporting. This is an interface change
for virtual tables. Prior virtual table implementations will need to be
modified and recompiled. (CVS 3402)
FossilOrigin-Name: f44b8bae97b6872524580009c96d07391578c388
New items for a term are merged with the term's segment 0 doclist,
until that doclist exceeds CHUNK_MAX. Then the segments are merged in
exponential fashion, so that segment 1 contains approximately
2*CHUNK_MAX data, segment 2 4*CHUNK_MAX, and so on. (CVS 3398)
FossilOrigin-Name: b6b93a3325d3e728ca36255c0ff6e1f63e03b0ac
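
The exponential merge policy described above can be modelled with a few lines of C. This is only a sketch of the size-doubling idea: it tracks byte counts per segment rather than real doclists, and DemoTerm, demoAddData(), DEMO_CHUNK_MAX, and DEMO_MAX_SEGMENTS are hypothetical names, not identifiers from fts1.

```c
#include <assert.h>

#define DEMO_CHUNK_MAX    256   /* assumed value, for illustration only */
#define DEMO_MAX_SEGMENTS 16

/* Per-term bookkeeping: aSize[i] is the number of doclist bytes
** currently held in segment i. */
typedef struct DemoTerm {
  long aSize[DEMO_MAX_SEGMENTS];
} DemoTerm;

/* Add nNew bytes of doclist data to segment 0. Whenever segment i
** grows past its limit (DEMO_CHUNK_MAX doubled i times), fold it into
** segment i+1 and repeat the check one level up. */
static void demoAddData(DemoTerm *p, long nNew){
  int i;
  long nLimit = DEMO_CHUNK_MAX;
  assert( p!=0 && nNew>=0 );
  p->aSize[0] += nNew;
  for(i=0; i+1<DEMO_MAX_SEGMENTS && p->aSize[i]>nLimit; i++){
    p->aSize[i+1] += p->aSize[i];
    p->aSize[i] = 0;
    nLimit *= 2;
  }
}
```

Because each level's limit is twice the previous one, data in this model is rewritten into a higher segment only a logarithmic number of times as a term's doclist grows.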
making sure we always pass around ptr/len, but there were a few places
where we actually relied on nul-termination.
An earlier change had additionally changed appropriate
sqlite3_bind_text() calls to sqlite3_bind_blob(). I've found that
this changes what's actually stored in the database, so backed those
changes out. Also (and this is weird), I found that I could no longer
do straightforward = queries against %_term.term at the command line. (CVS 3379)
FossilOrigin-Name: 5844db1aa9c23a005c88104b084f68afb21891c7
strcspn() and a nul-terminated delimiter list, I just flagged
delimiters in an array and wrote things inline. Submitting this for
review separately because it's pretty standalone. (CVS 3378)
FossilOrigin-Name: 2631ceaeefaca3aa837e3b439399f13c51456914
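
Flagging delimiters in an array, as described above, could be sketched as follows. The names DemoDelims, demoDelimsInit(), and demoNextToken() are hypothetical and this is not the tokenizer code from the check-in; it only illustrates a 256-entry lookup table driven by ptr/len input, so neither the delimiter list nor the text needs to be nul-terminated.

```c
#include <assert.h>
#include <string.h>

/* aFlag[c] is nonzero when byte value c is a delimiter. */
typedef struct DemoDelims {
  unsigned char aFlag[256];
} DemoDelims;

/* Build the lookup table from a ptr/len delimiter list. */
static void demoDelimsInit(DemoDelims *p, const char *zDelim, int nDelim){
  int i;
  assert( p!=0 && nDelim>=0 );
  memset(p->aFlag, 0, sizeof(p->aFlag));
  for(i=0; i<nDelim; i++){
    p->aFlag[(unsigned char)zDelim[i]] = 1;
  }
}

/* Scan z[0..n) starting at *piPos. On success, store the token's start
** offset and length, advance *piPos past the token, and return 1.
** Return 0 when no tokens remain. */
static int demoNextToken(
  const DemoDelims *p, const char *z, int n,
  int *piPos, int *piStart, int *pnToken
){
  int i = *piPos;
  while( i<n && p->aFlag[(unsigned char)z[i]] ) i++;     /* skip delimiters */
  if( i>=n ){
    *piPos = n;
    return 0;
  }
  *piStart = i;
  while( i<n && !p->aFlag[(unsigned char)z[i]] ) i++;    /* consume token */
  *pnToken = i - *piStart;
  *piPos = i;
  return 1;
}
```

The cast to unsigned char before indexing matters on platforms where char is signed, since bytes above 0x7f would otherwise produce negative indexes.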