a580a835b8
* xraydict functionality and usage improvements. Add a filter_function to xraydict, allowing fewer big data structures. Make uses of xraydict prefer exclusion sets to exclusion lists, to avoid repeated linear search of a list.
* Make `big5_coded_forms_from_hkscs` a set, remove set trailing commas.
* Remove `big5_coded_forms_from_hkscs` in favour of a filter function.
* Similarly, use sets for 7-bit exclusion lists except when really short.
* Revise mappings for seven 78JIS codepoints. Mappings for 25-23 and 90-22 were previously the same as those used for 97JIS; they have been swapped to correspond with how the IBM extension versus the standard code are mapped in the "old sequence" (78JIS-based) as opposed to the "new sequence". Mappings for 32-70, 34-45, 35-29, 39-77 and 54-02 in 78JIS have been changed to reflect disunifications made in 2000-JIS and 2004-JIS, assigning the 1978-edition unsimplified variants of those characters separate coded forms (where previously, only swaps and disunifications in 83JIS and disunifications in 90JIS (including JIS X 0212) had been considered). This only affects the `jis_encoding` codec (including the decoding direction for `iso-2022-jp-2`, `iso-2022-jp-3` and `iso-2022-jp-2004`), and the decoding is only affected when `ESC $ @` (not `ESC $ B`) is used. The `iso-2022-jp` codec is unaffected, and remains similar to (but more consistently pedantic than) the WHATWG specification, thus using the same table for both 78JIS and 97JIS.
* Make `johab-ebcdic` decoder use many-to-one, not corporate PUA. Many-to-one decodes are not uncommon in CJK encodings (e.g. Windows-31J), and mapping to the IBM Corporate PUA (code page 1449) would probably make it render as completely the wrong character if at all in practice.
* Switch `cp950_no_eudc_encoding_map` away from a hardcoded exclusion list.
* Codec support for `x-mac-korean`.
* Add a test bit for the UTF-8 wrapper.
* Document the unique error-condition definition of the ISO-2022-JP codec.
* Update docs now there is an actual implementation for `x-mac-korean`.
* Further explanations of the hazards of `jis_encoding`.
* Sanitised → Sanitised or escaped.
* Further clarify the status with not verifying Shift In.
* Corrected description of End State 2.
* Changes to MacKorean to avoid mapping non-ASCII using ASCII punctuation.
* Extraneous word "still".
* Fix omitting MacKorean single-byte codes.

| Name |
|---|
| .. |
| codecs |
| foo |
| syntax |
| callgrind.krk |
| collections.krk |
| dis.krk |
| dummy.krk |
| help.krk |
| json.krk |
| maindemo.krk |
| string.krk |
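
The first items of the commit message above describe two related ideas: testing exclusions against a set rather than a list, and replacing an enumerated exclusion structure with a filter function. The following is a minimal Python sketch of that general idea only. `FilteredView`, its `keep` parameter, and the sample keys are hypothetical names invented for illustration; they are not the real xraydict interface, whose signature is not shown here.

```python
from collections.abc import Mapping


class FilteredView(Mapping):
    """Read-only view of `base` that hides entries rejected by `keep`.

    Hypothetical stand-in for illustration; not the actual xraydict API.
    """

    def __init__(self, base, keep=lambda key, value: True):
        self._base = base
        self._keep = keep

    def __getitem__(self, key):
        value = self._base[key]  # raises KeyError if the key is absent
        if not self._keep(key, value):
            raise KeyError(key)  # present in base, but filtered out
        return value

    def __iter__(self):
        for key, value in self._base.items():
            if self._keep(key, value):
                yield key

    def __len__(self):
        return sum(1 for _ in self)


# Made-up table: integer codes mapped to placeholder strings.
base = {0x41: "A", 0x8140: "B", 0x8740: "C", 0x8EA1: "D"}

# Exclusion set: membership is a hash lookup rather than a linear scan.
excluded = {0x8740}
by_set = FilteredView(base, keep=lambda key, value: key not in excluded)

# Filter function: the exclusion is a rule, so no exclusion table is stored.
by_rule = FilteredView(base, keep=lambda key, value: not (0x8700 <= key <= 0x87FF))
```

Both views hide the same entry here. The second never materialises the excluded keys, which is the sense in which a filter function can allow fewer large data structures.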