postgres

Author	SHA1	Message	Date
Tom Lane	dbaec70c15	Rename and slightly redefine the default text search parser's "word" categories, as per discussion. asciiword (formerly lword) is still ASCII-letters-only, and numword (formerly word) is still the most general mixed-alpha-and-digits case. But word (formerly nlword) is now any-group-of-letters-with-at-least-one-non-ASCII, rather than all-non-ASCII as before. This is no worse than before for parsing mixed Russian/English text, which seems to have been the design center for the original coding; and it should simplify matters for parsing most European languages. In particular it will not be necessary for any language to accept strings containing digits as being regular "words". The hyphenated-word categories are adjusted similarly.	2007-10-23 20:46:12 +00:00
Tom Lane	bb36c51fcd	Fix several bugs in tsvectorin, including crash due to uninitialized field and miscomputation of required palloc size. The crash could only occur if the input contained lexemes both with and without positions, which is probably not common in practice. The miscomputation would definitely result in wasted space. Also fix some inconsistent coding around alignment of strings and positions in a tsvector value; these errors could also lead to crashes given mixed with/without position data and a machine that's picky about alignment. And be more careful about checking for overflow of string offsets. Patch is only against HEAD --- I have not looked to see if same bugs are in back-branch contrib/tsearch2 code.	2007-10-23 00:51:23 +00:00
Tom Lane	638bd34f89	Found another small glitch in tsearch API: the two versions of ts_lexize() are really redundant, since we invented a regdictionary alias type. We can have just one function, declared as taking regdictionary, and it will handle both behaviors. Noted while working on documentation.	2007-10-19 22:01:45 +00:00
Teodor Sigaev	689df1bc77	Fix crash of to_tsvector() function on huge input: compareWORD() function didn't return the correct result for word positions greater than the limit. Per report from Stuart Bishop <stuart@stuartbishop.net>	2007-09-26 10:09:57 +00:00
Tom Lane	33b9c8bd68	Temporarily modify tsearch regression tests to suppress notice that comes out at erratic times, because it is creating a totally unacceptable level of noise in our buildfarm results. This patch can be reverted when and if the code is fixed to not issue notices during cache reload events.	2007-09-23 15:58:58 +00:00
Teodor Sigaev	8544110042	Avoid possibly-unportable initializer, per buildfarm warning per notice by Gregory Stark <stark@enterprisedb.com>	2007-09-18 15:03:23 +00:00
Teodor Sigaev	13553cbbff	Fix the header-size defines for structs in ispell. Backpatch is needed for contrib version.	2007-09-11 12:57:05 +00:00
Teodor Sigaev	64def09592	Add regression tests for ispell, synonym and thesaurus dictionaries. Rename synonym.syn.sample and thesaurs.ths.sample to synonym_sample.syn and thesaurs_sample.ths respectively, so that they can be used in the regression tests. The ispell dictionary uses simple synthetic dictionary files.	2007-09-11 11:54:42 +00:00
Teodor Sigaev	53ef36cb4a	Fix recently introduced bugs in parsing ispell/hunspell files. Most were caused by unneeded lowercasing of flags. Per experiments with regression checks using an ispell dictionary.	2007-09-10 20:27:12 +00:00
Teodor Sigaev	d982daae0b	Change void* opaque argument to Datum type, add argument's name to PushFunction type definition. Per suggestion by Tom Lane <tgl@sss.pgh.pa.us>	2007-09-10 12:36:41 +00:00
Teodor Sigaev	83d0b9f3ca	Fixes from Heikki Linnakangas <heikki@enterprisedb.com>: Apparently it's a bug I introduced when I refactored spell.c to use the readline function for reading and recoding the input file. I didn't notice that some calls to STRNCMP used the non-lowercased version of the input line.	2007-09-10 10:39:56 +00:00
Teodor Sigaev	e5be89981f	Refactoring by Heikki Linnakangas <heikki@enterprisedb.com> with minor editorializing by me - Break the QueryItem struct into QueryOperator and QueryOperand. Type was really the only common field between them. QueryItem still exists, and is used in the TSQuery struct as before, but it's now a union of the two. Many other changes fell from that, like separation of pushval_asis function into pushValue, pushOperator and pushStop. - Moved some structs that were for internal use only from header files to the right .c-files. - Moved tsvector parser to a new tsvector_parser.c file. Parser code was about half of the size of tsvector.c, it's also used from tsquery.c, and it has some data structures of its own, so it seems better to separate it. Cleaned up the API so that TSVectorParserState is not accessed from outside tsvector_parser.c. - Separated enumerations (#defines, really) used for QueryItem.type field and as return codes from gettoken_query. It was just accidental code sharing. - Removed ParseQueryNode struct used internally by makepol and friends. push*-functions now construct QueryItems directly. - Changed int4 variables to just ints for variables like "i" or "array size", where the storage-size was not significant.	2007-09-07 15:09:56 +00:00
Tom Lane	6d871a2538	Restrict tsearch config file base names to contain a-z, 0-9, and underscore, instead of the initial policy of whatever isalpha() likes. Per discussion.	2007-09-04 02:16:56 +00:00
Tom Lane	a13cefafb1	Fix synonym-dict breakage introduced in last patch :-(. Minor other cleanups.	2007-08-25 02:29:45 +00:00
Tom Lane	7351b5fa17	Cleanup for some problems in tsearch patch: - ispell initialization crashed on empty dictionary file - ispell initialization crashed on affix file with prefixes but no suffixes - stop words file was run through pg_verify_mbstr, with database encoding, but it's supposed to be UTF-8; similar bug for synonym files - bunch of comments added, typos fixed, and other cleanup Introduced consistent encoding checking/conversion of data read from tsearch configuration files, by doing this in a single t_readline() subroutine (replacing direct usages of fgets). Cleaned up API for readstopwords too. Heikki Linnakangas	2007-08-25 00:03:59 +00:00
Tom Lane	f4ccdb3a17	Fix VPATH-build problem in new tsearch makefile, per Chad Wagner.	2007-08-22 06:11:56 +00:00
Tom Lane	b77c6c7311	Whoops, missed updating dsynonym_init for new dictionary parameter method.	2007-08-22 04:13:15 +00:00
Tom Lane	d321421d0a	Simplify the syntax of CREATE/ALTER TEXT SEARCH DICTIONARY by treating the init options of the template as top-level options in the syntax. This also makes ALTER a bit easier to use, since options can be replaced individually. I also made these statements verify that the tmplinit method will accept the new settings before they get stored; in the original coding you didn't find out about mistakes until the dictionary got invoked. Under the hood, init methods now get options as a List of DefElem instead of a raw text string --- that lets tsearch use existing options-pushing code instead of duplicating functionality.	2007-08-22 01:39:46 +00:00
Tom Lane	140d4ebcb4	Tsearch2 functionality migrates to core. The bulk of this work is by Oleg Bartunov and Teodor Sigaev, but I did a lot of editorializing, so anything that's broken is probably my fault. Documentation is nonexistent as yet, but let's land the patch so we can get some portability testing done.	2007-08-21 01:11:32 +00:00

19 Commits