diff --git a/contrib/tsearch/README.tsearch b/contrib/tsearch/README.tsearch
index e3bb9d91ec..fcc15e11db 100644
--- a/contrib/tsearch/README.tsearch
+++ b/contrib/tsearch/README.tsearch
@@ -6,6 +6,8 @@ All work was done by Teodor Sigaev (teodor@stack.net) and Oleg Bartunov
 
 
 CHANGES:
+August 29, 2002
+   Space usage and using CLUSTER command documented
 August 22, 2002
    Fix works with 'bad' queries
 August 13, 2002
@@ -286,8 +288,8 @@ is strongly depends on many factors (query, collection, dictionaries and
 hardware).
 
 Collection is available for download from
-http://www.sai.msu.su/~megera/postgres/gist/tsearch/
-as mw_titles.gz (about 3Mb).
+http://www.sai.msu.su/~megera/postgres/gist/tsearch/mw_titles.gz
+(377905 titles from postgresql mailing lists, about 3Mb).
 
 0. install contrib/tsearch module
 1. createdb test
@@ -353,3 +355,61 @@ using gist indices (morph)
 
 There are no visible difference between these 2 cases but your mileage
 may vary.
+
+
+NOTES:
+
+1. The size of txtidx column should be smaller than the size of the corresponding column.
+   Below are some real numbers from the test database (link above).
+
+   a) After loading data
+
+-rw-------    1 postgres users    23191552 Aug 29 14:08 53016937
+-rw-------    1 postgres users    81059840 Aug 29 14:08 52639027
+
+Table titles (52639027) occupies 80Mb, index on txtidx column (53016937)
+occupies 22Mb. Use contrib/oid2name to get mappings from oid to names.
+After doing
+
+test=# select title into titles_tmp from titles;
+SELECT
+
+I got the size of table 'titles' without the txtidx field
+
+-rw-------    1 postgres users    30105600 Aug 29 14:14 53016938
+
+So, txtidx column itself occupies about 50Mb.
+
+   b) after running 'vacuum full analyze' I got:
+
+-rw-------    1 postgres users    30105600 Aug 29 14:26 53016938
+-rw-------    1 postgres users    36880384 Aug 29 14:26 53016937
+-rw-------    1 postgres users    51494912 Aug 29 14:26 52639027
+
+53016938 = titles_tmp
+
+So, actual size of 'txtidx' field is 20 Mb ! "quod erat demonstrandum"
+
+2. CLUSTER command is highly recommended if you need fast searching.
+   For example:
+
+       test=# cluster t_idx on titles;
+
+   BUT ! In 7.2 CLUSTER command forgets about other indices and permissions,
+   so you need to be careful and rebuild these indices and restore permissions
+   after clustering. Also, clustering isn't dynamic, so you'd need to
+   use CLUSTER from time to time. In 7.3 CLUSTER command should work
+   fine.
+
+   after clustering:
+
+-rw-------    1 postgres users    23404544 Aug 29 14:59 53394850
+-rw-------    1 postgres users    30105600 Aug 29 14:26 53016938
+-rw-------    1 postgres users    50995200 Aug 29 14:45 53394845
+pg@zen:/usr/local/pgsql/data/base/52638986$ oid2name -d test
+All tables from database "test":
+---------------------------------
+53394850 = t_idx
+53394845 = titles
+53016938 = titles_tmp
+
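
The space figures quoted in note 1 can be re-derived from the ls listings by
subtracting the size of titles_tmp (the table without the txtidx field) from
the size of titles. The following is only a minimal sketch of that arithmetic:
the byte counts are copied from the listings in the patch, while the helper
name column_size and the MIB constant are hypothetical and introduced purely
for illustration.

```python
# Byte counts copied from the ls listings in note 1 of the patch.
MIB = 1024 * 1024  # assumption: "Mb" in the README is read as 2**20 bytes

def column_size(table_bytes, table_without_column_bytes):
    """Estimate a column's on-disk footprint as the difference between
    the full table and a copy of the table made without that column."""
    return table_bytes - table_without_column_bytes

titles_after_load = 81059840    # 52639027, right after loading data
titles_after_vacuum = 51494912  # 52639027, after 'vacuum full analyze'
titles_tmp = 30105600           # 53016938, table without the txtidx field

before = column_size(titles_after_load, titles_tmp)
after = column_size(titles_after_vacuum, titles_tmp)

print(f"txtidx after load:        {before / MIB:.1f} MiB")
print(f"txtidx after vacuum full: {after / MIB:.1f} MiB")
```

The two differences come out near the "about 50Mb" and "20 Mb" figures given
in the note. The estimate ignores per-row overhead, so it should be read as a
rough check rather than an exact measurement.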