postgres/contrib/unaccent
Thomas Munro 18501841bc Add simple codepoint redirections to unaccent.rules.
Previously we searched for code points where the Unicode data file
listed an equivalent combining character sequence that added accents.
Some codepoints redirect to a single other codepoint, instead of doing
any combining.  We can follow those references recursively to get the
answer.

Per bug report #18362, which reported missing Ancient Greek characters.
Specifically, precomposed characters with oxia (from the polytonic
accent system used for old Greek) just point to precomposed characters
with tonos (from the monotonic accent system for modern Greek), and we
have to follow the extra hop to find out that they are composed with
an acute accent.
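
The extra hop described above can be illustrated with a minimal sketch.
It uses Python's standard unicodedata module rather than parsing the
Unicode data file directly, and the helper names follow_redirects and
base_letter are hypothetical, not taken from
generate_unaccent_rules.py:

```python
import unicodedata


def follow_redirects(ch):
    """Follow decompositions that name exactly one other codepoint.

    Hypothetical helper: stops at the first character whose
    decomposition is empty, tagged, or longer than one codepoint.
    """
    seen = set()
    while ch not in seen:
        seen.add(ch)
        fields = unicodedata.decomposition(ch).split()
        if len(fields) != 1 or fields[0].startswith("<"):
            break
        ch = chr(int(fields[0], 16))
    return ch


def base_letter(ch):
    """Return the unaccented base of ch, or None if no rule applies."""
    target = follow_redirects(ch)
    fields = unicodedata.decomposition(target).split()
    if len(fields) >= 2 and not fields[0].startswith("<"):
        marks = [chr(int(f, 16)) for f in fields[1:]]
        # Only accept "base + combining marks" sequences.
        if all(unicodedata.combining(m) for m in marks):
            return chr(int(fields[0], 16))
    return None


oxia = unicodedata.lookup("GREEK SMALL LETTER ALPHA WITH OXIA")
tonos = unicodedata.lookup("GREEK SMALL LETTER ALPHA WITH TONOS")
alpha = unicodedata.lookup("GREEK SMALL LETTER ALPHA")

hop = follow_redirects(oxia)   # oxia form redirects to the tonos form
plain = base_letter(oxia)      # tonos form is alpha + combining acute
```

Without the redirection step, the oxia form would never be matched,
because its own decomposition contains no combining character.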

Besides those, the new rule also:

* pulls in a lot of 'Mathematical Alphanumeric Symbols', which are
  copies of the Latin and Greek alphabets and numbers rendered
  in different typefaces, and

* corrects a single mathematical letter whose translation previously
  came from the CLDR transliteration file and is now extracted from the
  main Unicode database file; the latter is clearly right and the
  former is wrong (reported to CLDR).
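
As a rough illustration of the first point, the Unicode database gives
each Mathematical Alphanumeric Symbol a <font>-tagged decomposition
that names a single plain character. The sketch below assumes Python's
standard unicodedata module; font_variant_target is a hypothetical
helper, and the real script may handle the tag differently:

```python
import unicodedata


def font_variant_target(ch):
    """Return the plain character a <font> variant points to, else None."""
    fields = unicodedata.decomposition(ch).split()
    # A font variant decomposes as: "<font>" followed by one codepoint.
    if len(fields) == 2 and fields[0] == "<font>":
        return chr(int(fields[1], 16))
    return None


bold_a = unicodedata.lookup("MATHEMATICAL BOLD CAPITAL A")
italic_alpha = unicodedata.lookup("MATHEMATICAL ITALIC SMALL ALPHA")
bold_zero = unicodedata.lookup("MATHEMATICAL BOLD DIGIT ZERO")

targets = [font_variant_target(c) for c in (bold_a, italic_alpha, bold_zero)]
```

Here the three symbols map back to a Latin letter, a Greek letter and
a digit respectively, which matches the "copies of the Latin and Greek
alphabets and numbers" wording above.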

Reported-by: Cees van Zeeland <cees.van.zeeland@freedom.nl>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/18362-be6d0cfe122b6354%40postgresql.org
2024-07-05 15:25:31 +12:00
expected                    Add simple codepoint redirections to unaccent.rules.                  2024-07-05 15:25:31 +12:00
sql                         unaccent: Add support for quoted translated characters                2023-09-20 12:29:36 +09:00
.gitignore                  Add support for automatically updating Unicode derived files          2020-01-09 10:08:14 +01:00
Makefile                    unaccent: Tweak value of PYTHON when building without Python support  2023-09-27 14:40:23 +09:00
generate_unaccent_rules.py  Add simple codepoint redirections to unaccent.rules.                  2024-07-05 15:25:31 +12:00
meson.build                 Update copyright for 2024                                             2024-01-03 20:49:05 -05:00
unaccent--1.0--1.1.sql      Update unaccent extension for parallel query.                         2016-06-14 14:55:49 -04:00
unaccent--1.1.sql           Update unaccent extension for parallel query.                         2016-06-14 14:55:49 -04:00
unaccent.c                  Update copyright for 2024                                             2024-01-03 20:49:05 -05:00
unaccent.control            Mark some contrib modules as "trusted".                               2020-02-13 15:02:35 -05:00
unaccent.rules              Add simple codepoint redirections to unaccent.rules.                  2024-07-05 15:25:31 +12:00