From de085820bf7f9dbff4b6c427a7a3689b7909c690 Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Wed, 14 Nov 2007 03:26:24 +0000
Subject: [PATCH] Update discussion of tsearch2 migration. I'm not entirely
 sure about the division of material between here and the tsearch2 contrib
 page, but at least it's not obviously unfinished any more.

---
 doc/src/sgml/textsearch.sgml | 122 ++++++++++++++---------------------
 1 file changed, 50 insertions(+), 72 deletions(-)

diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index e556c6dd78..0ba401c2a4 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -1,4 +1,4 @@
-
+
 Full Text Search
@@ -3489,99 +3489,77 @@ Parser: "pg_catalog.default"
 Migration from Pre-8.3 Text Search
- This area needs lots of work. Here is a quick list of known issues:
+ Applications that used the contrib/tsearch2 add-on module
+ for text searching will need some adjustments to work with the
+ built-in features:
-
+
- The old contrib/tsearch2 objects must be removed from
- the pg_dump output from a pre-8.3 database. While many of them won't
- load for lack of a tsearch2.so library, some do and cause problems.
- We have a working perl script for doing this with a custom- or tar-format
- backup, but there is a proposal to incorporate the functionality directly
- into pg_restore. Neither approach will help for pg_dumpall output.
+ Some functions have been renamed or had small adjustments in their
+ argument lists, and all of them are now in the pg_catalog
+ schema, whereas in a previous installation they would have been in
+ public or another non-system schema. There is a new
+ version of contrib/tsearch2 (see )
+ that provides a compatibility layer to solve most problems in this
+ area.
- The old dump may include schema-qualified references to the old
- contrib/tsearch2 objects; for example public.tsvector
- columns in table definitions. These will fail since the objects
- are now in the pg_catalog schema. Given current pg_dump behavior
- this will happen only for tables that are in a different schema
- from the tsearch2 objects; which makes it more likely to bite
- people who carefully put their tsearch2 objects in a
- non-public schema.
-
-
-
- Question: will restore-time failures of this type happen for
- any objects other than the tsvector and tsquery datatypes?
-
-
-
- The basic alternatives for fixing this seem to involve creating
- a dummy linkage, such as a public.tsvector domain linking to the
- base pg_catalog.tsvector type (which only helps for the datatypes);
- or stripping the schema references out of the dump. We could
- just recommend that users do this manually, or try to provide
- some tools to help.
+ The old contrib/tsearch2 functions and other objects
+ must be suppressed when loading pg_dump
+ output from a pre-8.3 database. While many of them won't load anyway,
+ a few will and then cause problems. One simple way to deal with this
+ is to load the new contrib/tsearch2 module before restoring
+ the dump; then it will block the old objects from being loaded.
- We have renamed the built-in tsvector update triggers, and changed
- their arguments too. This will result in CREATE TRIGGER commands
- failing during load, which can be ignored, but users will need to
- re-issue them with suitable argument adjustment. We probably
- can't automate that for them. Also, the old tsearch2 trigger
- function offered an option to invoke functions, which was removed
- as being a security hole. Users who were relying on that will need to
- write custom trigger functions as a substitute. I think all we
- can do here is document what to do to fix it.
+ Text search configuration setup is completely different now.
+ Instead of manually inserting rows into configuration tables,
+ search is configured through the specialized SQL commands shown
+ earlier in this chapter. There is not currently any automated
+ support for converting an existing custom configuration for 8.3;
+ you're on your own here.
- We have renamed a number of other functions besides the triggers,
- compared to the tsearch2 versions. This seems unlikely to cause
- any problems during dump/reload but it will require adjustments in
- the bodies of stored procedures and in client application code.
- Again, not much to do except document it.
-
-
+ Most types of dictionaries rely on some outside-the-database
+ configuration files. These are largely compatible with pre-8.3
+ usage, but note the following differences:
-
-
- Configuration setup is completely different now. Can we provide
- any automated assistance for translating an old custom setup?
- It probably can't be 100% automatic in any case, so maybe documentation
- is the best we can do here too. Aside from the inside-the-database
- differences, outside-the-database configuration files now have
- prescribed location and extensions, which was not true before.
-
-
+
+
+
+ Configuration files now must be placed in a single specified
+ directory ($SHAREDIR/tsearch_data), and must have
+ a specific extension depending on the type of file, as noted
+ previously in the descriptions of the various dictionary types.
+ This restriction was added to forestall security problems.
+
+
-
-
- Relocation of configuration from add-on tables into core system catalogs
- will break client queries that looked at the add-on tables.
-
-
+
+
+ Configuration files must be encoded in UTF-8 encoding,
+ regardless of what database encoding is used.
+
+
-
-
- Thesaurus files now use ? for stop words.
-
-
-
-
-
- What else?
+
+
+ In thesaurus configuration files, stop words must be marked with
+ ?.
+
+