Commit Graph

7 Commits

Author SHA1 Message Date
HarJIT
14db828233
Fix an oversight in the UTF-32 endian sniffing. (#18)
(I'd commented about the heuristic that characters at the start of a plane
are rare, but failed to actually implement said heuristic, having
implemented only the check that the high eight bits (which can be expanded
to eleven) must all be unset.)
2021-10-13 17:04:11 +09:00
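
A rough Python sketch of the kind of heuristic this message describes, not the code from the commit itself. The function name and the scoring are assumptions made for illustration. A reading is rejected if any 32-bit word exceeds U+10FFFF (that is, has any of its high eleven bits set). Otherwise the reading with fewer code points at the very start of a plane wins, since a byte-swapped U+0100 looks like U+10000.

```python
import struct

def sniff_utf32_endianness(data):
    """Guess 'big' or 'little' for BOM-less UTF-32 bytes, or None if undecided.

    Hypothetical helper for illustration only.
    """
    usable = len(data) - len(data) % 4  # ignore a trailing partial word
    scores = {}
    for name, fmt in (("big", ">I"), ("little", "<I")):
        score = 0
        for (value,) in struct.iter_unpack(fmt, data[:usable]):
            if value > 0x10FFFF:
                # High eleven bits are not all zero: not a valid code point.
                score = None
                break
            if value != 0 and value & 0xFFFF == 0:
                # First code point of a plane (e.g. U+10000): rare in real text.
                score += 1
        scores[name] = score
    candidates = {k: v for k, v in scores.items() if v is not None}
    if not candidates:
        return None
    if len(candidates) == 1:
        return next(iter(candidates))
    if candidates["big"] == candidates["little"]:
        return None  # both readings equally plausible
    return min(candidates, key=candidates.get)
```

Without the second check, the bytes `00 00 01 00` would be equally acceptable as big-endian U+0100 or little-endian U+10000.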
HarJIT
fa6dbc8365
Expansion and fixes to codecs.sbextra docs. (#16)
Mostly expanding docs with more information, but also correcting a mistake
where the cp424 docstrings refer to cp273.
2021-10-07 07:18:11 +09:00
HarJIT
f5f314a42d
One fix and one improvement to GB18030: (#15)
— The codec had been failing to decode 0x81308130 to U+0080, even though
it successfully encoded it. Since U+0080 is not used for anything in most
contexts (it's allocated as a control code in the ECMA-35 sense, but
ECMA-48 does not use it) this is unlikely to have hurt anything, but I
have fixed it anyway (it arose from 0 and None being conflated in a
conditional).

— The encoding and decoding of GB18030 four-byte codes now uses binary
search rather than linear search. This significantly improves performance
on four-byte codes, though performance on two-byte codes is unaffected.
2021-08-12 19:17:59 +09:00
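
A minimal Python sketch of both points in this message, under stated assumptions: the function names are hypothetical, and the range table is a short excerpt believed to match the start of the published GB18030 ranges data, used here only as sample input. The four-byte code 0x81308130 yields pointer 0, so a truthiness test on the pointer would wrongly treat it as "no pointer"; the lookup itself uses `bisect` rather than a linear scan.

```python
from bisect import bisect_right

# Sample (pointer, first code point) pairs, sorted by pointer.
# Excerpt only: it covers pointers 0..49 and nothing beyond.
RANGES = [(0, 0x0080), (36, 0x00A5), (38, 0x00A9), (45, 0x00B2)]
POINTERS = [pointer for pointer, _ in RANGES]
SAMPLE_END = 50

def four_byte_pointer(b1, b2, b3, b4):
    """Convert a GB18030 four-byte code to a linear pointer, or None if malformed."""
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        return None
    return (b1 - 0x81) * 12600 + (b2 - 0x30) * 1260 + (b3 - 0x81) * 10 + (b4 - 0x30)

def pointer_to_codepoint(pointer):
    """Map a pointer to a code point using binary search over the sample ranges."""
    # `if not pointer:` would also reject pointer 0, which is the kind of
    # 0-versus-None conflation the commit message mentions.
    if pointer is None or pointer >= SAMPLE_END:
        return None
    index = bisect_right(POINTERS, pointer) - 1  # last range starting <= pointer
    if index < 0:
        return None
    start_pointer, start_codepoint = RANGES[index]
    return start_codepoint + (pointer - start_pointer)
```

With this sample data, `pointer_to_codepoint(four_byte_pointer(0x81, 0x30, 0x81, 0x30))` gives 0x80, the case the fix addresses.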
HarJIT
0ef38bb6ee
Corrected documentation for iso-2022-jp-ext (implementation unchanged) (#8)
2021-04-09 17:59:39 +09:00
HarJIT
614193b8a1
Codecs package docs, as well as some assorted tweaks or minor additions (#5)
* Add some docs, and remove second Code page 874 codec (they handled the
non-overridden C1 area differently, but we only need one).

* More docs work.

* Doc stuff.

* Adjusted.

* More tweaks (table padding is not the docstring's problem).

* CSS and docstring tweaks.

* Link from modules to parent packages and vice versa.

* More documentation.

* Docstrings for all `codecs` submodules.

* Move encode_jis7_reduced into dbextra_data_7bit (thus completing the lazy
startup which was apparently not complete already) and docstrings added to
implementations of base class methods referring up to the base class.

* Remove FUSE junk that somehow made it into the repo.

* Some more docstrings.

* Fix some broken references to `string` (rather than `data`) which would have
caused a problem if any existing error handler had returned a negative
offset (which no current handler does, but it's worth fixing anyway).

* Add a cp042 codec to accompany the x-user-defined codec, and to pave the
way for maybe adding Adobe Symbol, Zapf Dingbats or Wingdings codecs
in future.

* Better Japanese Autodetect behaviour for ISO-2022-JP (add yet another
condition in which it will be detected, making it able to conclusively
detect it prior to end of stream without being fed an entire escape
sequence in one call). Also some docs tweaks.

* idstr() → _idstr() since it's internal.

* Docs for codecs.pifonts.

* Docstrings for dbextra.

* Document the sbextra classes.

* Docstrings for the web encodings.

* Possibly a fairer assessment of likely reality.

* Docstrings for codecs.binascii

* The *encoding* isn't removed (the BOM is).

* Make it clearer when competing OEM code pages use different letter layouts.

* Fix copied in error.

* Stop generating links to the non-existent "← tools" from tools.gendoc.

* Move .fuse_hidden* exclusion to my user-level config.

* Constrain the table style changes to class .markdownTable, to avoid any
effect on other interface tables generated by Doxygen.

* Refer to `__ispackage__` when generating help.
2021-04-02 16:34:10 +09:00
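
On the bullet above about error handlers returning a negative offset: the following Python sketch shows the convention in CPython's own `codecs` module, where a handler may return a negative resume position that is interpreted relative to the end of the input. That this package's handlers follow the same convention is an assumption, and the handler name is made up for the example.

```python
import codecs

def resume_two_from_end(error):
    """Hypothetical handler: substitute '?' and resume two bytes before the end."""
    # A negative position is counted back from the end of the input data,
    # so the codec must resolve it against the input (not some other object).
    return ("?", -2)

codecs.register_error("example.resume_two_from_end", resume_two_from_end)

# The invalid byte is at index 2; -2 resolves to index 3 of the 5-byte input.
decoded = b"ab\xffcd".decode("ascii", "example.resume_two_from_end")
```

Resolving such an offset against the wrong variable would go unnoticed until a handler actually returned a negative value, which matches the message's remark that no current handler does.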
K. Lange
40836cba21
Implement Python 3 division semantics
2021-04-02 16:02:05 +09:00
HarJIT
5c2de206b9
Codecs package (#4)
Codecs package

Co-authored-by: HarJIT <harjit@harjit.moe>
2021-03-24 04:53:02 -07:00