Michael Paquier 59f47fb98d unaccent: Add support for quoted translated characters
As reported in bug #18057, the extension unaccent removes in its rule
file whitespace characters that are intentionally specified when
building unaccent.rules from UnicodeData.txt, causing an incorrect
translation for some characters like numeric symbols.  This is caused by
the fact that all whitespaces before and after the origin and target
characters are all discarded (this limitation is documented).

This commit makes possible the use of quotes around target characters,
so as whitespaces can be considered part of target characters.  Some
target characters use a double quote, these require an extra double
quote.

The documentation is updated to show how to use quoted areas,
generate_unaccent_rules.py is updated to generate unaccent.rules and a
couple of tests are added for numeric symbols.  While working on this
patch, I have implemented a fake rule file to test the parsing logic
implemented, which is not included here as it would just consume extra
cycles in the tests, and it requires the manipulation of an installation
tree to be able to work correctly.

As this requires a change of format in unaccent.rules, this cannot be
backpatched, unfortunately.  The idea to use double quotes as escaped
characters comes from Tom Lane.

Reported-by: Martin Schlossarek
Author: Michael Paquier
Discussion: https://postgr.es/m/18057-62712cad01bd202c@postgresql.org
2023-09-20 12:29:36 +09:00
..
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-08-25 13:31:24 +02:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00
2023-01-02 15:00:37 -05:00

The PostgreSQL contrib tree
---------------------------

This subtree contains porting tools, analysis utilities, and plug-in
features that are not part of the core PostgreSQL system, mainly
because they address a limited audience or are too experimental to be
part of the main source tree.  This does not preclude their
usefulness.

User documentation for each module appears in the main SGML
documentation.

When building from the source distribution, these modules are not
built automatically, unless you build the "world" target.  You can
also build and install them all by running "make all" and "make
install" in this directory; or to build and install just one selected
module, do the same in that module's subdirectory.

Some directories supply new user-defined functions, operators, or
types.  To make use of one of these modules, after you have installed
the code you need to register the new SQL objects in the database
system by executing a CREATE EXTENSION command.  In a fresh database,
you can simply do

    CREATE EXTENSION module_name;

See the PostgreSQL documentation for more information about this
procedure.