Commit Graph

257 Commits

Author SHA1 Message Date
Andrew Dunstan d6eaeb335b Adjust contrib/tsearch2 regression results to use XML tag and XML entity descriptions, as now used by core text search default parser. 2007-11-20 04:23:10 +00:00
Tom Lane f00d75b8d7 Add snb_ru_init(internal) to list of stub functions in tsearch2
compatibility module.  Needed to support loading of 8.1-era tsearch2
configuration data.
2007-11-16 00:34:54 +00:00
Bruce Momjian fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
Tom Lane 4394c1b09c Resurrect the code for the rewrite(ARRAY[...]) aggregate function,
and put it into contrib/tsearch2 compatibility module.
2007-11-13 22:14:50 +00:00
Tom Lane abd183e4e7 Ooops, missed one file to remove. 2007-11-13 21:25:25 +00:00
Tom Lane 90e3f2aca7 Replace the now-incompatible-with-core contrib/tsearch2 module with a
compatibility package.  This supports importing dumps from past versions
using tsearch2, and provides the old names and API for most functions
that were changed.  (rewrite(ARRAY[...]) is a glaring omission, though.)

Pavel Stehule and Tom Lane
2007-11-13 21:02:29 +00:00
Bruce Momjian 33e2e02493 Add CVS version labels to all install/uninstall scripts. 2007-11-13 04:24:29 +00:00
Bruce Momjian 926bbab448 Make /contrib install/uninstall script consistent:
remove transactions
	use create or replace function
	make formatting consistent
	set search patch on first line

Add documentation on modifying *.sql to set the search patch, and
mention that major upgrades should still run the installation scripts.

Some of these issues were spotted by Tom today.
2007-11-11 03:25:35 +00:00
Peter Eisentraut 5f9869d0ee Use "alternative" instead of "alternate" where it is clearer. 2007-11-07 12:24:24 +00:00
Tom Lane a190eb3d7d Avoid possibly-unportable initializer, per buildfarm warning. 2007-07-15 22:57:48 +00:00
Tom Lane 7176e60bc8 Silence Solaris compiler warnings, per buildfarm. 2007-07-15 22:49:36 +00:00
Tom Lane af18d3d05c Fix compile warning on Solaris, per buildfarm. (Why have we got
three slightly different copies of this file?)
2007-07-15 22:40:28 +00:00
Tom Lane c11b8dcdbb Fix unportable use of isspace(), per buildfarm results. 2007-07-15 22:32:53 +00:00
Tom Lane b09c248bdd Fix PGXS conventions so that extensions can be built against Postgres
installations whose pg_config program does not appear first in the PATH.
Per gripe from Eddie Stanley and subsequent discussions with Fabien Coelho
and others.
2007-06-26 22:05:04 +00:00
Tom Lane 3e23b68dac Support varlena fields with single-byte headers and unaligned storage.
This commit breaks any code that assumes that the mere act of forming a tuple
(without writing it to disk) does not "toast" any fields.  While all available
regression tests pass, I'm not totally sure that we've fixed every nook and
cranny, especially in contrib.

Greg Stark with some help from Tom Lane
2007-04-06 04:21:44 +00:00
Teodor Sigaev 9477f12ea8 Fix caching of unsuccessful initialization of parser or configuration.
Per report from Listmail <lists@peufeu.com>
2007-04-02 11:42:04 +00:00
Tom Lane 0565579c5b Fix uninitialized-variable bug. 2007-03-28 01:28:34 +00:00
Teodor Sigaev 66daeb074b Add checking of end of line in parsing stopword list. Thanks to sharp eyes of Tom lane 2007-03-26 13:57:07 +00:00
Teodor Sigaev debb3aa8e9 Fix stopword and synonym files parsing bug in MSVC build, per report from
Magnus Hagander. Also, now it ignores space symbol after stopwords.
2007-03-26 12:25:35 +00:00
Teodor Sigaev bb8998a475 Fix parser bug on Windows with UTF8 encoding and C locale, the reason was
sizeof(wchar_t) = 2 instead of 4.
2007-03-22 15:58:24 +00:00
Tom Lane 9f652d430f Fix up several contrib modules that were using varlena datatypes in not-so-obvious
ways.  I'm not totally sure that I caught everything, but at least now they pass
their regression tests with VARSIZE/SET_VARSIZE defined to reverse byte order.
2007-02-28 22:44:38 +00:00
Tom Lane 234a02b2a8 Replace direct assignments to VARATT_SIZEP(x) with SET_VARSIZE(x, len).
Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with
VARSIZE and VARDATA, and as a consequence almost no code was using the
longer names.  Rename the length fields of struct varlena and various
derived structures to catch anyplace that was accessing them directly;
and clean up various places so caught.  In itself this patch doesn't
change any behavior at all, but it is necessary infrastructure if we hope
to play any games with the representation of varlena headers.
Greg Stark and Tom Lane
2007-02-27 23:48:10 +00:00
Teodor Sigaev 44655290cc Fix backend crash in parsing incorrect tsquery.
Per report from Jon Rosebaugh <jon@inklesspen.com>
2007-02-12 14:14:33 +00:00
Peter Eisentraut c138b966d4 Replace useless uses of := by = in makefiles. 2007-02-09 15:56:00 +00:00
Peter Eisentraut 086c189456 Normalize fgets() calls to use sizeof() for calculating the buffer size
where possible, and fix some sites that apparently thought that fgets()
will overwrite the buffer by one byte.

Also add some strlcpy() to eliminate some weird memory handling.
2007-02-08 11:10:27 +00:00
Peter Eisentraut f11aa82d03 Use memcpy() instead of strncpy() for copying into varlena structures. 2007-02-07 00:32:15 +00:00
Bruce Momjian 8b4ff8b6a1 Wording cleanup for error messages. Also change can't -> cannot.
Standard English uses "may", "can", and "might" in different ways:

        may - permission, "You may borrow my rake."

        can - ability, "I can lift that log."

        might - possibility, "It might rain today."

Unfortunately, in conversational English, their use is often mixed, as
in, "You may use this variable to do X", when in fact, "can" is a better
choice.  Similarly, "It may crash" is better stated, "It might crash".
2007-02-01 19:10:30 +00:00
Teodor Sigaev d4c6da1527 Allow GIN's extractQuery method to signal that nothing can satisfy the query.
In this case extractQuery should returns -1 as nentries. This changes
prototype of extractQuery method to use int32* instead of uint32* for
nentries argument.
Based on that gincostestimate may see two corner cases: nothing will be found
or seqscan should be used.

Per proposal at http://archives.postgresql.org/pgsql-hackers/2007-01/msg01581.php

PS tsearch_core patch should be sightly modified to support changes, but I'm
waiting a verdict about reviewing of tsearch_core patch.
2007-01-31 15:09:45 +00:00
Neil Conway 8ff2bccee3 Squelch some VC++ compiler warnings. Mark float literals with the "f"
suffix, to distinguish them from doubles. Make some function declarations
and definitions use the "const" qualifier for arguments consistently.
Ignore warning 4102 ("unreferenced label"), because such warnings
are always emitted by bison-generated code. Patch from Magnus Hagander.
2007-01-26 17:45:42 +00:00
Teodor Sigaev f2a01b0d5a Fix localization support for multibyte encoding and C locale.
Slightly reworked patch from Tatsuo Ishii
2007-01-15 15:16:28 +00:00
Tom Lane 859b8dd51a Add a defense to prevent core dumps if 8.2 version of rank_cd() is used with
the 8.1 SQL function definition for it.  Per report from Rajesh Kumar Mallah,
such a DBA error doesn't seem at all improbable, and the cost of checking for
it is not very high compared to the cost of running this function.  (It would
have been better to change the C name of the function so it wouldn't be called
by the old SQL definition, but it's too late for that now in the 8.2 branch.)
2006-12-28 01:09:01 +00:00
Teodor Sigaev 49b64d346f Fix memory reallocation condition 2006-12-26 14:54:24 +00:00
Teodor Sigaev ca5bc1ae51 Fix convertion for 'PFX flag N num' 2006-12-21 17:35:28 +00:00
Teodor Sigaev 6cd9a58480 Fix core dump of ispell for case of non-successfull initialization.
Previous versions aren't affected.

Fix synonym dictionary init: string should be malloc'ed, not palloc'ed. Bug
introduced recently while fixing lowerstr().
2006-12-04 09:26:57 +00:00
Teodor Sigaev 3de2682a1e Fix lowercasing while parse OO dictionary 2006-11-23 17:35:14 +00:00
Teodor Sigaev 84151d0644 Avoid infinity calculations in rank_cd 2006-11-22 15:55:05 +00:00
Teodor Sigaev dd92a8c33f Fix type in return value 2006-11-21 18:31:28 +00:00
Teodor Sigaev 419fe7cd1b Fix bug http://archives.postgresql.org/pgsql-bugs/2006-10/msg00258.php.
Fix string's length calculation for recoding, fix strlower() to avoid wrong
assumption about length of recoded string (was: recoded string is no greater
that source, it may not true for multibyte encodings)
Thanks to Thomas H. <me@alternize.com> and Magnus Hagander <mha@sollentuna.net>
2006-11-20 14:03:30 +00:00
Neil Conway 9d6f26325f Fix two typos. 2006-11-08 19:06:15 +00:00
Teodor Sigaev 092ed294fc New README, forgotten when docs was updated 2006-11-08 16:00:29 +00:00
Teodor Sigaev bf028fa8a6 Add description of new features 2006-10-31 16:23:05 +00:00
Tom Lane e378f82e00 Make use of qsort_arg in several places that were formerly using klugy
static variables.  This avoids any risk of potential non-reentrancy,
and in particular offers a much cleaner workaround for the Intel compiler
bug that was affecting ginutil.c.
2006-10-05 17:57:40 +00:00
Tom Lane c48f2e3124 Improve error messages from to_tsquery per yesterday's discussion:
provide the bad input, and be sure to mention that we are talking about
a tsearch query.
2006-10-04 17:52:52 +00:00
Bruce Momjian f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Bruce Momjian 26ffa627ac Update tsearch2 README.
Robert Treat
2006-10-02 22:32:10 +00:00
Tom Lane 41dcc65c0e Rename the uninstall scripts for contrib/lo and contrib/tsearch2 to
match the convention that foo's uninstall script is uninstall_foo.sql.
Also, stop installing lo_test.sql, which really ought to be made into
a regression test anyway (though it's unclear how to avoid a dependency
on the current OID counter...)
2006-09-11 15:14:46 +00:00
Tom Lane 684ad6a92f Rename contrib contains/contained-by operators to @> and <@, per discussion. 2006-09-10 17:36:52 +00:00
Bruce Momjian 0c4f2894f9 Use '' rather than \' for literal single quotes in strings in
/contrib/tsearch2.

Teodor Sigaev
2006-09-02 22:03:30 +00:00
Teodor Sigaev 72a3582522 Add description of tsvector type layout 2006-08-29 13:57:34 +00:00
Teodor Sigaev 3a214ab0f1 Remove pos comparison in silly_cmp_tsvector(): it is not a semantically significant 2006-08-29 13:39:20 +00:00
Teodor Sigaev 9711782628 Fix incorrect length of lexemes in silly_cmp_tsvector() 2006-08-29 13:31:54 +00:00
Teodor Sigaev 74dbba701f Fix regression tests: after changing comparing function
order is changed.
2006-08-25 07:39:08 +00:00
Teodor Sigaev 8f91e2b607 Fix compare bug for tsvector: problem was in aligment. Per Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> and Phil Frost <indigo@bitglue.com> 2006-08-24 17:37:34 +00:00
Tom Lane a7143b3088 Fix some makefiles that fail to yield good results from 'make -qp'.
This doesn't really matter for ordinary building of Postgres, but it's
useful for automated checks, such as my just-committed pgcheckdefines.
2006-07-15 03:33:14 +00:00
Tom Lane ae643747b1 Fix a passel of recently-committed violations of the rule 'thou shalt
have no other gods before c.h'.  Also remove some demonstrably redundant
#include lines, mostly of <errno.h> which was added to c.h years ago.
2006-07-14 05:28:29 +00:00
Bruce Momjian 66c15dfda1 Adjust /contrib for new include file contents. 2006-07-13 16:57:31 +00:00
Bruce Momjian ac230e7431 Alphabetically order reference to include files, "S"-"Z". 2006-07-11 18:26:11 +00:00
Bruce Momjian 0ff3461bcc Alphabetically order reference to include files, "N" - "S". 2006-07-11 17:26:59 +00:00
Teodor Sigaev 234163649e GIN improvements
- Replace sorted array of entries in maintenance_work_mem to binary tree,
  this should improve create performance.
- More precisely calculate allocated memory, eliminate leaks
  with user-defined extractValue()
- Improve wordings in tsearch2
2006-07-11 16:55:34 +00:00
Bruce Momjian fa601357fb Sort reference of include files, "A" - "F". 2006-07-11 16:35:33 +00:00
Bruce Momjian c5133e5920 Allow /contrib include files to compile on their own. 2006-07-10 22:06:11 +00:00
Teodor Sigaev 1f7ef548ec Changes
* new split algorithm (as proposed in http://archives.postgresql.org/pgsql-hackers/2006-06/msg00254.php)
  * possible call pickSplit() for second and below columns
  * add spl_(l|r)datum_exists to GIST_SPLITVEC -
    pickSplit should check its values to use already defined
    spl_(l|r)datum for splitting. pickSplit should set
    spl_(l|r)datum_exists to 'false' (if they was 'true') to
    signal to caller about using spl_(l|r)datum.
  * support for old pickSplit(): not very optimal
    but correct split
* remove 'bytes' field from GISTENTRY: in any case size of
  value is defined by it's type.
* split GIST_SPLITVEC to two structures: one for using in picksplit
  and second - for internal use.
* some code refactoring
* support of subsplit to rtree opclasses

TODO: add support of subsplit to contrib modules
2006-06-28 12:00:14 +00:00
Teodor Sigaev 04e9704b9e Now ispell dictionary can eat dictionaries in MySpell format,
used by OpenOffice. Dictionaries are placed at
http://lingucomponent.openoffice.org/spell_dic.html
Dictionary automatically recognizes format of files.

Warning. MySpell's format has limitation with compound
word support: it's impossible to mark affix as
compound-only affix. So for norwegian, german etc
languages it's recommended to use original ispell format.
For that reason I don't want to remove my2ispell
scripts, it's has workaround at least for norwegian language.
2006-06-09 13:25:59 +00:00
Teodor Sigaev 92bcb5abe0 Allow do not lexize words in substitution.
Docs will be submitted some later, now it's at
 http://www.sai.msu.su/~megera/oddmuse/index.cgi/Thesaurus_dictionary
2006-06-06 16:25:55 +00:00
Teodor Sigaev a513ce2dff Fix wrong NOTICE/ERROR levels 2006-06-02 18:03:06 +00:00
Teodor Sigaev efe1d427da Distinguish between stop-word recognized in thesaurus_lexize() 2006-06-02 17:55:40 +00:00
Teodor Sigaev c7faf45160 Add more strict check of stop and non-recognized words,
allow only recognized words in thezaurus configuration file.
2006-06-02 15:35:42 +00:00
Teodor Sigaev c269f0f1e2 fix comparison with SPI_processed 2006-05-31 14:53:41 +00:00
Teodor Sigaev 22505f4703 Add thesaurus dictionary which can replace N>0 lexemes by M>0 lexemes.
It required some changes in lexize algorithm, but interface with
dictionaries stays compatible with old dictionaries.

Funded by Georgia Public Library Service and LibLime, Inc.
2006-05-31 14:05:31 +00:00
Tom Lane a0ffab351e Magic blocks don't do us any good unless we use 'em ... so install one
in every shared library.
2006-05-30 22:12:16 +00:00
Bruce Momjian 19892feb3c Back out \' change for tsearch2, broke regression tests. 2006-05-19 04:39:47 +00:00
Bruce Momjian cc84163fa9 Use SQL standard '' rather than \' in /contrib. Backpatch to 8.1.X. 2006-05-19 02:38:47 +00:00
Teodor Sigaev 8a3631f8d8 GIN: Generalized Inverted iNdex.
text[], int4[], Tsearch2 support for GIN.
2006-05-02 11:28:56 +00:00
Teodor Sigaev e30df619cd Fix stupid mistake in rank_cd_def cleanup 2006-04-10 09:56:52 +00:00
Bruce Momjian f3d99d160d Add CVS tag lines to files that were lacking them. 2006-03-11 04:38:42 +00:00
Teodor Sigaev 38c4fe87ac Significantly improve ranking:
1) rank_cd now use weight of lexemes
2) rank_cd and rank can use any combination of normalization methods:
        no normalization
        normalization by log(length of document)
        -----/------- by length of document
        -----/------- by number of unique word in document
        -----/------- by log(number of unique word in document)
        -----/------- by number of covers (only rank_cd)

Improve cover's search.

TODO: changes in documentation
2006-03-02 19:07:19 +00:00
Neil Conway 8e5a10d46c This patch makes the error message strings throughout the backend
more compliant with the error message style guide. In particular,
errdetail should begin with a capital letter and end with a period,
whereas errmsg should not. I also fixed a few related issues in
passing, such as fixing the repeated misspelling of "lexeme" in
contrib/tsearch2 (per Tom's suggestion).
2006-03-01 06:30:32 +00:00
Peter Eisentraut 7f4f42fa10 Clean up CREATE FUNCTION syntax usage in contrib and elsewhere, in
particular get rid of single quotes around language names and old WITH ()
construct.
2006-02-27 16:09:50 +00:00
Teodor Sigaev dde9457294 Fixing and improve compound word support. This changes cannot be applied to
previous version iwthout recreating tsvector fields...

Thanks to Alexander Presber <aljoscha@weisshuhn.de> to discover a problem.
2006-02-20 17:51:05 +00:00
Tom Lane b35fdaaa1a Clean up some signedness warnings. 2006-02-10 15:57:58 +00:00
Teodor Sigaev 01f2172ec1 Allow "'" symbol in affixes ("'s" affix in english): it was diallowed during
multibyte support work.
Add line number to error output during affix file parsing.
2006-02-10 12:56:14 +00:00
Teodor Sigaev 011c520cb6 renew output of regression test accordingly to
http://archives.postgresql.org/pgsql-committers/2006-02/msg00089.php
2006-02-10 11:18:40 +00:00
Teodor Sigaev 46a25ce6a9 1 Fix bug with very short word: prefix and suffix might be overlapped,
sorry but fix can't be applyed to previous version: it's require
  refill tsvector...
2 Small optimize of load time for huge dictionaries
3 use palloc instead of malloc during load dict file
2006-02-09 18:04:20 +00:00
Teodor Sigaev a6fefc866c Check number of affixes to prevent core dump with zero number of affixes 2006-02-06 15:45:34 +00:00
Teodor Sigaev 5e2707c45f Snowball multibyte. It's a pity, but snowball sources is very diferent for multibyte and
singlebyte encodings, so we should have snowball for every encodings.

I hope that finalize multibyte support work in tsearch2, but testing is needed...
2006-01-27 16:32:31 +00:00
Teodor Sigaev 80324fb1e3 Fix typeing as Tom suggest 2006-01-23 14:24:06 +00:00
Tom Lane 33feb55c47 Replace bitwise looping with bytewise looping in hemdistsign and
sizebitvec of tsearch2, as well as identical code in several other
contrib modules.  This provided about a 20X speedup in building a
large tsearch2 index ... didn't try to measure its effects for other
operations.  Thanks to Stephan Vollmer for providing a test case.
2006-01-20 22:46:16 +00:00
Teodor Sigaev 7ac8a4be89 Multibyte encodings support for ISpell dictionary 2005-12-21 13:05:49 +00:00
Teodor Sigaev cb4ea994c6 Improve support of multibyte encoding:
- tsvector_(in|out)
- tsquery_(in|out)
- to_tsvector
- to_tsquery, plainto_tsquery
- 'simple' dictionary
2005-12-12 11:10:12 +00:00
Teodor Sigaev faacdab101 Improve tag recognizing 2005-12-08 09:11:19 +00:00
Teodor Sigaev 9551ab2fe9 Fix small memory leak 2005-12-07 13:30:15 +00:00
Teodor Sigaev 4f94b49a31 Improve word parser.
- allow ~ in filenames
 - -8.2.1 now is '-' and '8.2.1' instead of '-8.2' '.' '3'
 - '.text' now is not a file
2005-12-07 13:12:54 +00:00
Teodor Sigaev e8c81e179e Improve word parser.
- improve file and path recognition
 - fix misspeling
 - improve tag recognition
2005-12-05 18:13:22 +00:00
Bruce Momjian 436a2956d8 Re-run pgindent, fixing a problem where comment lines after a blank
comment line where output as too long, and update typedefs for /lib
directory.  Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).

Backpatch to 8.1.X.
2005-11-22 18:17:34 +00:00
Teodor Sigaev 3c6cd8a113 Fixes motivated by snake and spoonbill pgbuildfarm members 2005-11-22 09:01:35 +00:00
Teodor Sigaev 62699337bc remove forgotten // comments 2005-11-21 18:00:52 +00:00
Teodor Sigaev c52795d18a Text parser rewritten:
- supports multibyte encodings
        - more strict rules for lexemes
        - flex isn't used
Add:
        - tsquery plainto_tsquery(text)
          Function makes tsquery from plain text.
        - &&, ||, !! operation for tsquery for combining
          tsquery from it's parts:  'foo & bar' || 'asd' => 'foo & bar | asd'
2005-11-21 12:27:57 +00:00
Tom Lane 1d0d8d3c38 Mop-up for nulls-in-arrays patch: fix some places that access array
contents directly.
2005-11-18 02:38:24 +00:00
Tom Lane cecb607559 Make SQL arrays support null elements. This commit fixes the core array
functionality, but I still need to make another pass looking at places
that incidentally use arrays (such as ACL manipulation) to make sure they
are null-safe.  Contrib needs work too.
I have not changed the behaviors that are still under discussion about
array comparison and what to do with lower bounds.
2005-11-17 22:14:56 +00:00
Teodor Sigaev bad1a5c217 Use postgres-wide macros BITS_PER_BYTE instead self-definenig macros, also use it for calculating bit length of TPQTGist 2005-11-14 14:44:06 +00:00