* xraydict functionality and usage improvements
Add a filter_function to xraydict, so that fewer large data structures are
needed. Make uses of xraydict prefer exclusion sets to exclusion lists, to
avoid repeated linear searches of a list.
* Make `big5_coded_forms_from_hkscs` a set, remove set trailing commas.
* Remove `big5_coded_forms_from_hkscs` in favour of a filter function.
* Similarly, use sets for 7-bit exclusion lists except when really short.
* Revise mappings for seven 78JIS codepoints.
Mappings for 25-23 and 90-22 were previously the same as those used for
97JIS; they have been swapped to correspond with how the IBM extension
versus the standard code are mapped in the "old sequence" (78JIS-based)
as opposed to the "new sequence".
Mappings for 32-70, 34-45, 35-29, 39-77 and 54-02 in 78JIS have been
changed to reflect disunifications made in 2000-JIS and 2004-JIS, assigning
the 1978-edition unsimplified variants of those characters separate coded
forms (where previously, only swaps and disunifications in 83JIS and
disunifications in 90JIS (including JIS X 0212) had been considered).
This only affects the `jis_encoding` codec (including the decoding
direction for `iso-2022-jp-2`, `iso-2022-jp-3` and `iso-2022-jp-2004`),
and the decoding is only affected when `ESC $ @` (not `ESC $ B`) is used.
The `iso-2022-jp` codec is unaffected, and remains similar to (but more
consistently pedantic than) the WHATWG specification, thus using the same
table for both 78JIS and 97JIS.
* Make the `johab-ebcdic` decoder use many-to-one mappings, not the corporate PUA.
Many-to-one decodes are not uncommon in CJK encodings (e.g. Windows-31J),
and mapping to the IBM Corporate PUA (code page 1449) would, in practice,
probably make the text render as completely the wrong character, if it
rendered at all.
* Switch `cp950_no_eudc_encoding_map` away from a hardcoded exclusion list.
* Codec support for `x-mac-korean`.
* Add a test bit for the UTF-8 wrapper.
* Document the unique error-condition definition of the ISO-2022-JP codec.
* Update docs now that there is an actual implementation for `x-mac-korean`.
* Further explanations of the hazards of `jis_encoding`.
* Sanitised → Sanitised or escaped.
* Further clarify the status of not verifying Shift In.
* Corrected description of End State 2.
* Changes to MacKorean to avoid mapping non-ASCII using ASCII punctuation.
* Remove extraneous word "still".
* Fix omitting MacKorean single-byte codes.
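
To illustrate the first entry above (a filter function combined with exclusion sets rather than exclusion lists), here is a minimal Python sketch. `FilteredView` and its parameters are hypothetical stand-ins written for this note, not the actual `xraydict` interface, and the data is a toy ASCII table rather than a real codec mapping.

```python
from collections.abc import Mapping


class FilteredView(Mapping):
    """Hypothetical read-only view of a mapping that hides some keys.

    A key is hidden if it is in `exclude` or if `filter_function`
    returns a false value for it. Not the real xraydict API.
    """

    def __init__(self, base, exclude=frozenset(), filter_function=None):
        self._base = base
        # A frozenset gives average O(1) membership tests; a list would
        # be searched linearly on every single lookup.
        self._exclude = frozenset(exclude)
        self._filter = filter_function

    def _visible(self, key):
        if key in self._exclude:
            return False
        return self._filter is None or bool(self._filter(key))

    def __getitem__(self, key):
        if not self._visible(key):
            raise KeyError(key)
        return self._base[key]

    def __iter__(self):
        return (key for key in self._base if self._visible(key))

    def __len__(self):
        return sum(1 for _ in self)


# Toy data: printable ASCII code points mapped to their characters.
base = {n: chr(n) for n in range(0x20, 0x7F)}

# Hide two individual codes via a set, and a whole range via a predicate,
# instead of building a large explicit structure listing every hidden key.
view = FilteredView(
    base,
    exclude={0x5C, 0x7E},
    filter_function=lambda code: code >= 0x30,
)
```

The predicate replaces what would otherwise be a large precomputed collection of keys, while the small set of individual exceptions stays cheap to test against.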