Codecs package docs, as well as some assorted tweaks or minor additions (#5)
* Add some docs, and remove second Code page 874 codec (they handled the non-overridden C1 area differently, but we only need one).
* More docs work.
* Doc stuff.
* Adjusted.
* More tweaks (table padding is not the docstring's problem).
* CSS and docstring tweaks.
* Link from modules to parent packages and vice versa.
* More documentation.
* Docstrings for all `codecs` submodules.
* Move encode_jis7_reduced into dbextra_data_7bit (thus completing the lazy startup which was apparently not complete already) and docstrings added to implementations of base class methods referring up to the base class.
* Remove FUSE junk that somehow made it into the repo.
* Some more docstrings.
* Fix some broken references to `string` (rather than `data`) which would have caused a problem if any existing error handler had returned a negative offset (which no current handler does, but it's worth fixing anyway).
* Add a cp042 codec to accompany the x-user-defined codec, and to pave the way for maybe adding Adobe Symbol, Zapf Dingbats or Wingdings codecs in future.
* Better Japanese Autodetect behaviour for ISO-2022-JP (add yet another condition in which it will be detected, making it able to conclusively detect it prior to end of stream without being fed an entire escape sequence in one call). Also some docs tweaks.
* idstr() → _idstr() since it's internal.
* Docs for codecs.pifonts.
* Docstrings for dbextra.
* Document the sbextra classes.
* Docstrings for the web encodings.
* Possibly a fairer assessment of likely reality.
* Docstrings for codecs.binascii
* The *encoding* isn't removed (the BOM is).
* Make it clearer when competing OEM code pages use different letter layouts.
* Fix copied in error.
* Stop generating linking to non-existent "← tools" from tools.gendoc.
* Move .fuse_hidden* exclusion to my user-level config.
* Constrain the table style changes to class .markdownTable, to avoid any effect on other interface tables generated by Doxygen.
* Refer to `__ispackage__` when generating help.
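
One of the fixes listed above concerns decoders that resolved a negative resume offset from an error handler against `string` rather than `data`. The following is a minimal Python sketch of the intended behaviour, not the package's Kuroko code; `resolve_resume_offset` is a hypothetical helper name introduced only for illustration.

```python
def resolve_resume_offset(offset, data):
    """Return an absolute resume position within `data`.

    Hypothetical helper: a negative offset returned by an error handler
    is taken as relative to the end of the buffer being decoded.
    """
    if offset < 0:
        offset += len(data)
    return offset


data = b"ab\xffcd"
# A handler asking to resume one byte before the end of the input.
print(resolve_resume_offset(-1, data))  # 4
# Non-negative offsets pass through unchanged.
print(resolve_resume_offset(3, data))   # 3
```

The point of the fix is simply that the length used must be that of the buffer actually being walked; in a decoder that buffer is `data`, and there is no `string` in scope.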
parent 0fd2849fd8
commit 614193b8a1
@@ -388,3 +388,14 @@ div.memdoc {
    color: #bbb;
    content: '->';
}

.markdownTableRowOdd {
    background: rgba(204, 204, 204, 0.2);
}
table.markdownTable {
    margin-bottom: 1em;
}
.markdownTable th, .markdownTable td {
    padding-left: 0.5ex;
    padding-right: 0.5ex;
}
@@ -1,2 +1,167 @@
"""@brief Convert a string to and from various encodings.

The basic supported encodings are roughly as specified in the [WHATWG Encoding Standard
](https://encoding.spec.whatwg.org/), but more are also supported unless restriction to web
encodings is explicitly specified.

Most encodings supported by Python are implemented, but not currently `idna` or `punycode`. Note
however that Python makes `x-mac-japanese` and `x-mac-korean` aliases of `shift_jis` and `euc-kr`;
this has not been done here. Also note that the behaviour in regards to association of encoding
names with variants is somewhat different to Python's, partly due to following WHATWG: this affects
most CJK codecs (e.g. Python treats `shift_jis` and `ms-kanji` differently, while this package does
not), but also e.g. "ISO-8859-1".

Main entry points for the package are `codecs.infrastructure.encode`, `codecs.infrastructure.decode`
and `codecs.infrastructure.lookup`, all three of which are also available as e.g. `codecs.encode`
for convenience.

The list of codecs (not an exhaustive list of labels, nor close to one) is as follows.

### Single-byte extended ASCII encodings:

|Major label(s)|Meaning|
|---|---|
|`cp437`|8-bit United States (DOS)|
|`cp720`|8-bit Arabic Letters and Box Drawing (DOS)|
|`cp737`|8-bit Greek and Box Drawing (DOS)|
|`cp775`|8-bit Baltic Rim (DOS)|
|`cp850`|8-bit Western Europe and Canada (DOS)|
|`cp852`|8-bit Central European (DOS)|
|`cp855`|8-bit Balkan Cyrillic (DOS)|
|`cp856`|8-bit Hebrew (DOS)|
|`cp857`|8-bit Turkish (DOS)|
|`cp858`|8-bit Western Europe and Canada with Euro (DOS)|
|`cp860`|8-bit European Portuguese (DOS)|
|`cp861`|8-bit Icelandic (DOS)|
|`cp862`|8-bit Hebrew and Box Drawing (DOS)|
|`cp863`|8-bit Quebecois French (DOS)|
|`cp864`|8-bit Arabic Positional Forms (DOS)|
|`cp865`|8-bit Continental Nordic (DOS)|
|`cp866`, `ibm866`|8-bit Russian Cyrillic (DOS)|
|`cp869`|8-bit Greek (DOS)|
|`cp1006`|8-bit Urdu|
|`cp1125`|8-bit Ukrainian Cyrillic (DOS)|
|`ecma-43-dv`, `cp367`, `csascii`|"8-bit Plain ASCII", i.e. ASCII without backspace composition, and with high bit unused. Note: most ASCII labels are mapped to Windows-1252, per WHATWG.|
|`hp-roman8`|8-bit Roman (HP)|
|`iso-8859-2`|8-bit Central European (ISO)|
|`iso-8859-3`|8-bit South European (Maltese/Esperanto)|
|`iso-8859-4`|8-bit North European|
|`iso-8859-5`|8-bit Cyrillic (ISO)|
|`iso-8859-6`|8-bit Arabic (ASMO/ISO)|
|`iso-8859-7`|8-bit Greek (ISO)|
|`iso-8859-8`, `iso-8859-8-i`|8-bit Hebrew (without vowel points). Although some, but not all, of the labels using this mapping request legacy visual-order behaviour (e.g. `iso-8859-8`, `iso-8859-8-e` or even `visual`, but not e.g. `iso-8859-8-i`), bidirectional conversion for any given markup format is beyond the scope of this package: determining from the label whether legacy visual-order behaviour should be used, and responding if so, should be implemented separately if needed.|
|`iso-8859-10`|8-bit Nordic|
|`iso-8859-13`|8-bit Baltic Rim (ISO)|
|`iso-8859-14`|8-bit Celtic|
|`iso-8859-15`|8-bit New Western European|
|`iso-8859-16`|8-bit South-Eastern European (ISO)|
|`koi8-r`|8-bit Russian Cyrillic (KOI8)|
|`koi8-u`, `koi8-ru`|8-bit Ruthenian/Ukrainian/Belarusian Cyrillic (KOI8)|
|`koi8-t`|8-bit Tajik Cyrillic|
|`kz1048`|8-bit Kazakh Cyrillic|
|`macintosh`|8-bit Roman (Macintosh)|
|`palmos`|PalmOS code page|
|`ptcp154`|8-bit Asian Cyrillic (Paratype)|
|`windows-874`, `iso-8859-11`, `tis-620`, `cp874`|8-bit Thai|
|`windows-1250`|8-bit Central European (Windows)|
|`windows-1251`|8-bit Cyrillic (Windows)|
|`windows-1252`, `ascii`, `iso-8859-1`, `latin1`|8-bit Western European. This is in accordance with WHATWG specification _in re_ which mappings to associate with which labels. Note: Python's `latin1` is sometimes used to round-trip arbitrary _sensu stricto_ extended ASCII data; in Kuroko, it is better to use `x-user-defined` for that.|
|`windows-1253`|8-bit Greek (Windows)|
|`windows-1254`, `iso-8859-9`|8-bit Turkish|
|`windows-1255`|8-bit Hebrew (logical with vowel points)|
|`windows-1256`|8-bit Arabic (Windows)|
|`windows-1257`|8-bit Baltic Rim (Windows)|
|`windows-1258`|8-bit Vietnamese (Windows). Basic codec: encoder will accept text in the form generated by the decoder, but neither NFC nor NFD normalised forms. This follows both Python and WHATWG behaviour. Conversion of text in NFC or NFD forms to encodable form may need to be done in a separate step before using the encoder.|
|`x-mac-arabic`|8-bit Arabic (Macintosh)|
|`x-mac-ce`|8-bit Central European (Macintosh)|
|`x-mac-croatian`|8-bit Gajica|
|`x-mac-cyrillic`|8-bit Cyrillic (Macintosh)|
|`x-mac-farsi`|8-bit Persian (Macintosh)|
|`x-mac-greek`|8-bit Greek (Macintosh)|
|`x-mac-icelandic`|8-bit Icelandic (Macintosh)|
|`x-mac-romanian`|8-bit Romanian (Macintosh)|
|`x-mac-turkish`|8-bit Turkish (Macintosh)|
|`x-user-defined`|8-bit User Defined (ASCII based variant: using U+0000–007F, U+F780–F7FF)|

### Single-byte symbol or dingbat font encodings:

|Major label(s)|Meaning|
|---|---|
|`cp042`|8-bit User Defined (variant using U+0000–001F, U+F020–F0FF). Windows uses that mapping for symbol fonts in some contexts.|

### 8-bit multi-byte Unicode codecs:

|Major label(s)|Meaning|
|---|---|
|`cesu-8`, `utf8mb3`, `utf8-ucs2`|CESU-8 (to UTF-16 as UTF-8 is to UTF-32). Mostly for interoperability with existing systems that use it.|
|`gb18030`|Chinese GB18030, WHATWG version. Not technically a full UTF in this implementation, since one PUA character is changed to an ideographic space per WHATWG.|
|`utf-8`, `utf8mb4`, `utf8-ucs4`|UTF-8 without a byte order mark|
|`utf-8-sig`|UTF-8 with a byte order mark|
|`utf-16`|UTF-16 with byte order mark, little endian if missing|
|`utf-16be`|UTF-16, big endian, no byte order mark|
|`utf-16le`|UTF-16, little endian, no byte order mark|
|`utf-32`|UTF-32 with byte order mark (though byte order can usually also be detected in its absence)|
|`utf-32be`|UTF-32, big endian, no byte order mark|
|`utf-32le`|UTF-32, little endian, no byte order mark|

### 8-bit multi-byte legacy CJK codecs:

|Major label(s)|Meaning|
|---|---|
|`big5`, `big5-eten`|Traditional Chinese Big-5, ETen version, condoning HKSCS extensions when decoding.|
|`big5-hkscs`|Traditional Chinese Big-5 with HKSCS extensions in both directions.|
|`big5-nonetenkana`, `big5-tw`|Traditional Chinese Big-5, with BIG5.TXT (non-ETen) layout for kana and Cyrillic.|
|`euc-jp`, `x-euc-jp`|Japanese EUC-JP, with Microsoft extensions, permitting JIS X 0212 only when decoding.|
|`euc-jp-full`|Japanese EUC-JP, with Microsoft extensions, permitting JIS X 0212 in both directions.|
|`euc-jisx0213`, `euc-jis-2004`|Japanese EUC-JP, with JIS X 0213 mappings and extensions.|
|`euc-kr`, `uhc`, `windows-949`|Korean Unified Hangul Code (superset of EUC-KR, encodes KS C 5601).|
|`gbk`, `gb2312`|Chinese GBK (GB2312 extension), condoning GB18030 when decoding.|
|`johab`, `johab-ascii`|Korean Johab (ASCII-compatible stateless standard version)|
|`shift_jis`, `ms-kanji`, `windows-31j`|Japanese Shift JIS (Windows compatible version)|
|`shift-jisx0213`, `shift-jis-2004`|Japanese Shift JIS (JIS X 0213 version)|
|`x-mac-chinesesimp`|Simplified Chinese GB2312, Macintosh version|
|`x-mac-chinesetrad`|Traditional Chinese Big5, Macintosh version|

### 7-bit stateful codecs:

|Major label(s)|Meaning|
|---|---|
|`hz-gb-2312`|HZ (Usenet Simplified Chinese) encoding|
|`iso-2022-cn`|7-bit stateful Chinese (Simplified and Traditional)|
|`iso-2022-jp`|7-bit stateful Japanese, web version|
|`iso-2022-jp-ext`|7-bit stateful Japanese, preserving katakana width|
|`iso-2022-jp-1`|7-bit stateful Japanese, including JIS X 0212|
|`iso-2022-jp-2`|7-bit stateful Multilingual (Japanese, Korean, Greek, Simplified Chinese, Western European)|
|`iso-2022-jp-3`|7-bit stateful Japanese, including JIS X 0213 (2000 edition format)|
|`iso-2022-jp-2004`|7-bit stateful Japanese, including JIS X 0213 (2004 edition format)|
|`iso-2022-kr`|7-bit stateful Korean|
|`jis_encoding`|7-bit stateful Japanese, comprehensive version|
|`utf-7`|A largely obsolete scheme for mixing ASCII and Base64'd UTF-16BE in e-mail. Included mostly for Python parity.|

### EBCDIC codecs:

|Major label(s)|Meaning|
|---|---|
|`cp037`|EBCDIC Default (United States, Netherlands, Portugal, Brazil, Australia, New Zealand, Canadian ESA/390)|
|`cp273`|EBCDIC German|
|`cp424`|EBCDIC Hebrew|
|`cp500`|EBCDIC "International" (Belgium, Switzerland, Canadian AS/400)|
|`cp875`|EBCDIC Greek|
|`cp933`, `ibm-933`, `ibm-1364`, `johab-ebcdic`|EBCDIC Korean (Johab, IBM stateful version for EBCDIC)|
|`cp1026`|EBCDIC Turkish|
|`cp1140`|EBCDIC with Euro Sign|

### Codecs with unusual behaviour:

|Major label(s)|Meaning|
|---|---|
|`inverse-base64`|Base64 with inverse semantics to preserve type correctness (encoder reads, decoder creates). Error handler is ignored.|
|`inverse-base64hqx`|Same, but using the BinHex4 alphabet (note: does not in and of itself create the BinHex4 *format*)|
|`inverse-base64uu`|Same, but using the uuencode alphabet (note: does not in and of itself create the uuencode *format*)|
|`inverse-quopri`|Quoted-Printable, with inverse semantics (encoder reads, decoder creates). Error handler is ignored.|
|`japanese`|Attempts to detect the encoding of a Japanese document (like the unified "Japanese" option now offered by some browsers' encoding override menus), and raises `ValueError` if it cannot. Not intended to be used in the encode direction, but will behave as `utf-8-sig` in that case.|
|`undefined`, `replacement`|Represents data for which encoding/decoding must not be attempted. Following WHATWG (and differing from Python), error handlers are accepted, though only by the decoder: the encoder will ignore them.|
"""

from codecs.infrastructure import encode, decode, lookup
import codecs.sbencs, codecs.dbdata, codecs.bespokecodecs, codecs.sbextra, codecs.dbextra, codecs.binascii
import codecs.sbencs, codecs.dbdata, codecs.bespokecodecs, codecs.sbextra, codecs.dbextra, codecs.binascii, codecs.pifonts
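
The package docstring above maps `latin1` and `iso-8859-1` to Windows-1252, following WHATWG, whereas Python keeps them distinct. As a rough illustration of what that note about round-tripping refers to, this sketch uses only Python's own standard codecs (not this package, and not Kuroko):

```python
raw = bytes(range(256))

# Python's latin1 maps every byte to the code point of the same value,
# so arbitrary bytes survive a decode/encode round trip.
assert raw.decode("latin-1").encode("latin-1") == raw

# Byte 0x80 is a C1 control character under Python's latin1 ...
latin1_char = b"\x80".decode("latin-1")
# ... but the Euro sign under Windows-1252.
cp1252_char = b"\x80".decode("cp1252")

print(hex(ord(latin1_char)))  # 0x80
print(hex(ord(cp1252_char)))  # 0x20ac
```

Because a WHATWG-style `latin1` label behaves like the second case, the docstring points to `x-user-defined` for byte-preserving round trips instead.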
@@ -1,11 +1,19 @@
"""Contains various WHATWG-defined codecs which require dedicated implementations.

Also includes `utf-8-sig` which, while not a WHATWG-specified codec _per se_, is detected,
interpreted and handled by WHATWG BOM tag logic, in preference above any label, before the codec
gets to see it. WHATWG BOM tag logic is not implemented here (it is not always sensible in a
non-browser context); hence, they remain separate codecs."""
from codecs.infrastructure import register_kuroko_codec, ByteCatenator, StringCatenator, UnicodeEncodeError, UnicodeDecodeError, lookup_error, lookup, IncrementalDecoder, IncrementalEncoder, lazy_property
from codecs.dbdata import more_dbdata

class Gb18030IncrementalEncoder(IncrementalEncoder):
    """IncrementalEncoder implementation for GB18030 (Mainland Chinese Unicode format)"""
    name = "gb18030"
    html5name = "gb18030"
    four_byte_codes = True
    def encode(string, final = False):
        """Implements `IncrementalEncoder.encode`"""
        let out = ByteCatenator()
        let offset = 0
        while 1: # offset can be arbitrarily changed by the error handler, so not a for
@@ -64,6 +72,8 @@ class Gb18030IncrementalEncoder(IncrementalEncoder):
            offset += 1

class GbkIncrementalEncoder(Gb18030IncrementalEncoder):
    """IncrementalEncoder implementation for GBK (Chinese),
    extension of GB2312 (Simplified Chinese)"""
    name = "gbk"
    html5name = "gbk"
    four_byte_codes = False
@@ -77,9 +87,12 @@ def _get_gbsurrogate_pointer(leader, i):
    return ret

class Gb18030IncrementalDecoder(IncrementalDecoder):
    """IncrementalDecoder implementation for GB18030 (Mainland Chinese Unicode),
    extension of GB2312 (Simplified Chinese)"""
    name = "gb18030"
    html5name = "gb18030"
    def decode(data_in, final = False):
        """Implements `IncrementalDecoder.decode`"""
        let data = self.pending + data_in
        self.pending = b""
        let out = StringCatenator()
@@ -150,7 +163,7 @@ class Gb18030IncrementalDecoder(IncrementalDecoder):
                out.add(errorret[0])
                offset = errorret[1]
                if offset < 0:
                    offset += len(string)
                    offset += len(data)

register_kuroko_codec(["gb18030", "gb18030_2000"], Gb18030IncrementalEncoder, Gb18030IncrementalDecoder)
register_kuroko_codec(
@@ -159,6 +172,7 @@ register_kuroko_codec(
    GbkIncrementalEncoder, Gb18030IncrementalDecoder)

class Iso2022JpIncrementalEncoder(IncrementalEncoder):
    """IncrementalEncoder implementation for ISO-2022-JP (7-bit stateful Japanese JIS)"""
    name = "iso-2022-jp"
    html5name = "iso-2022-jp"
    encodes_sbcs = []
@@ -187,6 +201,7 @@ class Iso2022JpIncrementalEncoder(IncrementalEncoder):
            raise ValueError("set to invalid state: " + repr(state))
        self.state = state
    def encode(string, final = False):
        """Implements `IncrementalEncoder.encode`"""
        let out = ByteCatenator()
        let offset = 0
        while 1: # offset can be arbitrarily changed by the error handler, so not a for
@@ -261,17 +276,21 @@ class Iso2022JpIncrementalEncoder(IncrementalEncoder):
            else:
                raise RuntimeError("inconsistently configured encoder")
    def reset():
        """Implements `IncrementalEncoder.reset`"""
        self.state = 0
        self.state_greekmode = False
        self.state_desigsupershift = False
    def getstate():
        """Implements `IncrementalEncoder.getstate`"""
        return (self.state, self.state_desigsupershift, self.state_greekmode)
    def setstate(state):
        """Implements `IncrementalEncoder.setstate`"""
        self.state = state[0]
        self.state_desigsupershift = state[1]
        self.state_greekmode = state[2]

class Iso2022JpIncrementalDecoder(IncrementalDecoder):
    """IncrementalDecoder implementation for ISO-2022-JP (7-bit stateful Japanese JIS)"""
    name = "iso-2022-jp"
    html5name = "iso-2022-jp"
    @lazy_property
@@ -291,6 +310,7 @@ class Iso2022JpIncrementalDecoder(IncrementalDecoder):
    super_shift = False
    concat_lenient = False
    def decode(data_in, final = False):
        """Implements `IncrementalDecoder.decode`"""
        let data = self.pending + data_in
        self.pending = b""
        let out = StringCatenator()
@@ -457,8 +477,9 @@ class Iso2022JpIncrementalDecoder(IncrementalDecoder):
                out.add(errorret[0])
                offset = errorret[1]
                if offset < 0:
                    offset += len(string)
                    offset += len(data)
    def reset():
        """Implements `IncrementalDecoder.reset`"""
        self.pending = b""
        self.state_set = 0
        self.state_greekmode = False
@@ -475,9 +496,11 @@ class Iso2022JpIncrementalDecoder(IncrementalDecoder):
        self.state_last646seen = None
        self.scrutinising_inter646 = False
    def getstate():
        """Implements `IncrementalDecoder.getstate`"""
        return (self.pending, self.state_set, self.state_greekmode, self.state_shiftoutmode,
                self.state_justswitched, self.state_last646seen, self.scrutinising_inter646)
    def setstate(state):
        """Implements `IncrementalDecoder.setstate`"""
        self.pending = state[0]
        self.state_set = state[1]
        self.state_greekmode = state[2]
@@ -491,6 +514,7 @@ register_kuroko_codec(["iso-2022-jp", "iso2022-jp", "iso2022jp", "csiso2022jp",


class Utf16IncrementalEncoder(IncrementalEncoder):
    """IncrementalEncoder implementation for UTF-16 with Byte Order Mark"""
    name = "utf-16"
    html5name = "utf-16"
    encoding_map = {}
@@ -507,6 +531,7 @@ class Utf16IncrementalEncoder(IncrementalEncoder):
        else:
            raise ValueError("unexpected endian value: " + repr(self.endian))
    def encode(string, final = False):
        """Implements `IncrementalEncoder.encode`"""
        let out = ByteCatenator()
        let offset = 0
        if self.include_bom and self.state == -1:
@@ -536,13 +561,17 @@ class Utf16IncrementalEncoder(IncrementalEncoder):
                if offset < 0:
                    offset += len(string)
    def reset():
        """Implements `IncrementalEncoder.reset`"""
        self.state = -1
    def getstate():
        """Implements `IncrementalEncoder.getstate`"""
        return self.state
    def setstate(state):
        """Implements `IncrementalEncoder.setstate`"""
        self.state = state

class Utf16IncrementalDecoder(IncrementalDecoder):
    """IncrementalDecoder implementation for UTF-16"""
    name = "utf-16"
    html5name = "utf-16"
    force_endian = None # subclass may set to "little" or "big"
@@ -552,6 +581,7 @@ class Utf16IncrementalDecoder(IncrementalDecoder):
    state = None
    pending = b""
    def decode(data_in, final = False):
        """Implements `IncrementalDecoder.decode`"""
        let data = self.pending + data_in
        self.pending = b""
        let out = StringCatenator()
@@ -616,34 +646,41 @@ class Utf16IncrementalDecoder(IncrementalDecoder):
                out.add(errorret[0])
                offset = errorret[1]
                if offset < 0:
                    offset += len(string)
                    offset += len(data)
    def reset():
        """Implements `IncrementalDecoder.reset`"""
        self.pending = b""
        self.state = -1
    def getstate():
        """Implements `IncrementalDecoder.getstate`"""
        return (self.pending, self.state)
    def setstate(state):
        """Implements `IncrementalDecoder.setstate`"""
        self.pending = state[0]
        self.state = state[1]

class Utf16BeIncrementalEncoder(Utf16IncrementalEncoder):
    """IncrementalEncoder implementation for UTF-16 Big Endian without Byte Order Mark"""
    name = "utf-16be"
    html5name = "utf-16be"
    endian = "big"
    include_bom = False

class Utf16BeIncrementalDecoder(Utf16IncrementalDecoder):
    """IncrementalDecoder implementation for UTF-16 Big Endian without Byte Order Mark"""
    name = "utf-16be"
    html5name = "utf-16be"
    force_endian = "big"

class Utf16LeIncrementalEncoder(Utf16IncrementalEncoder):
    """IncrementalEncoder implementation for UTF-16 Little Endian without Byte Order Mark"""
    name = "utf-16le"
    html5name = "utf-16le"
    endian = "little"
    include_bom = False

class Utf16LeIncrementalDecoder(Utf16IncrementalDecoder):
    """IncrementalDecoder implementation for UTF-16 Little Endian without Byte Order Mark"""
    name = "utf-16le"
    html5name = "utf-16le"
    force_endian = "little"
@@ -660,6 +697,7 @@ register_kuroko_codec(["utf-16be", "utf-16-be", "unicodefffe", "unicodebigunmark


class Utf8IncrementalEncoder(IncrementalEncoder):
    """IncrementalEncoder implementation for UTF-8"""
    name = "utf-8"
    html5name = "utf-8"
    # -1: expecting BOM
@@ -667,6 +705,7 @@ class Utf8IncrementalEncoder(IncrementalEncoder):
    state = None
    include_bom = False
    def encode(string, final = False):
        """Implements `IncrementalEncoder.encode`"""
        # We use UTF-8 natively, so this is fairly simple
        let out = ByteCatenator()
        if self.include_bom and self.state == -1:
@@ -675,13 +714,17 @@ class Utf8IncrementalEncoder(IncrementalEncoder):
        out.add(string.encode())
        return out.getvalue()
    def reset():
        """Implements `IncrementalEncoder.reset`"""
        self.state = -1
    def getstate():
        """Implements `IncrementalEncoder.getstate`"""
        return self.state
    def setstate(state):
        """Implements `IncrementalEncoder.setstate`"""
        self.state = state

class Utf8IncrementalDecoder(IncrementalDecoder):
    """IncrementalDecoder implementation for UTF-8"""
    name = "utf-8"
    html5name = "utf-8"
    # -1: expecting BOM
@@ -692,6 +735,7 @@ class Utf8IncrementalDecoder(IncrementalDecoder):
    def _error_handler(error):
        return lookup_error(self.errors)(error)
    def decode(data_in, final = False):
        """Implements `IncrementalDecoder.decode`"""
        # We use UTF-8 natively, so this only validates it and applies the error handler
        # (and removes a BOM if remove_bom is set)
        let data = self.pending + data_in
@@ -762,7 +806,7 @@ class Utf8IncrementalDecoder(IncrementalDecoder):
                out.add(errorret[0])
                running_offset = errorret[1]
                if running_offset < 0:
                    running_offset += len(string)
                    running_offset += len(data)
                countdown = 0
                bolster = 1
                first_offset = running_offset
@@ -777,20 +821,25 @@ class Utf8IncrementalDecoder(IncrementalDecoder):
        self.pending = bytes(dlist[second_offset:])
        return out.getvalue()
    def reset():
        """Implements `IncrementalDecoder.reset`"""
        self.pending = b""
        self.state = -1
    def getstate():
        """Implements `IncrementalDecoder.getstate`"""
        return (self.pending, self.state)
    def setstate(state):
        """Implements `IncrementalDecoder.setstate`"""
        self.pending = state[0]
        self.state = state[1]

class Utf8SigIncrementalEncoder(Utf8IncrementalEncoder):
    """IncrementalEncoder implementation for UTF-8 with Byte Order Mark"""
    name = "utf-8-sig"
    html5name = None
    include_bom = True

class Utf8SigIncrementalDecoder(Utf8IncrementalDecoder):
    """IncrementalDecoder implementation for UTF-8 with Byte Order Mark"""
    name = "utf-8-sig"
    html5name = None
    remove_bom = True
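
The decoders above keep unconsumed bytes in `pending` and expose them through `getstate`/`setstate`. As a loose analogy only, here is how the same pattern looks with Python's standard-library incremental `utf-8-sig` decoder; this is a sketch of the general technique, not of the Kuroko classes in this diff, and the chunk boundaries are made up for the example.

```python
import codecs

# A BOM, "h" and "é" (0xC3 0xA9), deliberately split mid-sequence.
chunks = [b"\xef\xbb", b"\xbfh\xc3", b"\xa9"]

decoder = codecs.getincrementaldecoder("utf-8-sig")("strict")
first_two = decoder.decode(chunks[0]) + decoder.decode(chunks[1])

# Snapshot the decoder (pending bytes plus BOM flag) and restore it elsewhere.
saved = decoder.getstate()
resumed = codecs.getincrementaldecoder("utf-8-sig")("strict")
resumed.setstate(saved)

text = first_two + resumed.decode(chunks[2], final=True)
print(text)  # hé
```

The BOM is dropped, and the lone `0xC3` lead byte is held back until its continuation byte arrives, which is the job `pending` does in the classes above.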
@@ -1,3 +1,6 @@
"""
Defines functions and codecs pertaining to binary-to-text encodings.
"""
from codecs.infrastructure import StringCatenator, ByteCatenator, IncrementalEncoder, IncrementalDecoder, UnicodeDecodeError, UnicodeEncodeError, register_kuroko_codec

let _base64_alphabet = (
@@ -11,10 +14,14 @@ let _base64_alphabet_hqx = [ord(i) for i in
    "!\"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr"]

class Base64IncrementalCreator(IncrementalDecoder):
    """
    IncrementalDecoder implementation to create (yes) Base64 from bytes.
    """
    name = "inverse-base64"
    alphabet = _base64_alphabet
    padchar = "="
    def decode(data_in, final = False):
        """Implements `IncrementalDecoder.decode`"""
        let data = self.pending + data_in
        self.pending = b""
        let offset = 0
@@ -39,10 +46,14 @@ class Base64IncrementalCreator(IncrementalDecoder):
            offset += 3

class Base64IncrementalParser(IncrementalEncoder):
    """
    IncrementalEncoder implementation to parse (yes) Base64 from a string to bytes.
    """
    name = "inverse-base64"
    alphabet = _base64_alphabet
    padchar = "="
    def encode(string_in, final = False):
        """Implements `IncrementalEncoder.encode`"""
        let string = self.pending + string_in
        self.pending = ""
        let offset = 0
@@ -90,19 +101,35 @@ class Base64IncrementalParser(IncrementalEncoder):
                raise UnicodeEncodeError(self.name, string, offset, suboffset,
                    "Base64 truncated or with invalid number of pad characters")
            offset = suboffset
    def reset(): self.pending = ""
    def getstate(): return self.pending
    def setstate(state): self.pending = state
    def reset():
        """Implements `IncrementalEncoder.reset`"""
        self.pending = ""
    def getstate():
        """Implements `IncrementalEncoder.getstate`"""
        return self.pending
    def setstate(state):
        """Implements `IncrementalEncoder.setstate`"""
        self.pending = state

register_kuroko_codec(["inverse-base64"],
    Base64IncrementalParser, Base64IncrementalCreator)

class Base64UUIncrementalCreator(Base64IncrementalCreator):
    """
    IncrementalDecoder implementation to create (yes) the flavour of Base64 used in uuencode.

    Note that this does not output the uuencode format, and is only one component of implementing it.
    """
    name = "inverse-base64uu"
    alphabet = _base64_alphabet_uu
    padchar = " "

class Base64UUIncrementalParser(Base64IncrementalParser):
    """
    IncrementalEncoder implementation to parse (yes) the flavour of Base64 used in uuencode.

    Note that this does not take the uuencode format, and is only one component of implementing it.
    """
    name = "inverse-base64uu"
    alphabet = _base64_alphabet_uu
    padchar = " "
@@ -111,10 +138,20 @@ register_kuroko_codec(["inverse-base64uu"],
    Base64UUIncrementalParser, Base64UUIncrementalCreator)

class Base64HQXIncrementalCreator(Base64IncrementalCreator):
    """
    IncrementalDecoder implementation to create (yes) the flavour of Base64 used in BinHex4.

    Note that this does not output the BinHex4 format, and is only one component of implementing it.
    """
    name = "inverse-base64hqx"
    alphabet = _base64_alphabet_hqx

class Base64HQXIncrementalParser(Base64IncrementalParser):
    """
    IncrementalEncoder implementation to parse (yes) the flavour of Base64 used in BinHex4.

    Note that this does not take the BinHex4 format, and is only one component of implementing it.
    """
    name = "inverse-base64hqx"
    alphabet = _base64_alphabet_hqx

@@ -123,8 +160,12 @@ register_kuroko_codec(["inverse-base64hqx"],


class QuoPriIncrementalCreator(IncrementalDecoder):
    """
    IncrementalDecoder implementation to create (yes) Quoted-Printable from bytes.
    """
    name = "inverse-quopri"
    def decode(data_in, final = False):
        """Implements `IncrementalDecoder.decode`"""
        let data = self.pending + data_in
        self.pending = b""
        let offset = 0
@@ -171,17 +212,24 @@ class QuoPriIncrementalCreator(IncrementalDecoder):
                self.linelength += 3
            offset += 1
    def reset():
        """Implements `IncrementalDecoder.reset`"""
        self.linelength = 0
        self.pending = b""
    def getstate():
        """Implements `IncrementalDecoder.getstate`"""
        return (self.linelength, self.pending)
    def setstate(state):
        """Implements `IncrementalDecoder.setstate`"""
        self.linelength = state[0]
        self.pending = state[1]

class QuoPriIncrementalParser(IncrementalEncoder):
    """
    IncrementalEncoder implementation to parse (yes) Quoted-Printable from a string to bytes.
    """
    name = "inverse-quopri"
    def encode(string_in, final = False):
        """Implements `IncrementalEncoder.encode`"""
        let string = self.pending + string_in
        self.pending = ""
        let offset = 0
@@ -220,15 +268,26 @@ class QuoPriIncrementalParser(IncrementalEncoder):
                let byteval = (hexd.index(procsubst[1]) << 4) | hexd.index(procsubst[2])
                out.add(bytes([byteval]))
                offset += 3
    def reset(): self.pending = ""
    def getstate(): return self.pending
    def setstate(state): self.pending = state
    def reset():
        """Implements `IncrementalEncoder.reset`"""
        self.pending = ""
    def getstate():
        """Implements `IncrementalEncoder.getstate`"""
        return self.pending
    def setstate(state):
        """Implements `IncrementalEncoder.setstate`"""
        self.pending = state

register_kuroko_codec(["inverse-quopri"],
    QuoPriIncrementalParser, QuoPriIncrementalCreator)


def base64_file_create(data, filename=None, mode=0o666):
    """
    Create a Base64 string containing the provided data, with lines wrapped as required by some
    formats. If a filename and optional UNIX mode are provided, Base64 headers as recognised by
    some modern versions of uudecode are added.
    """
    let out = StringCatenator()
    let creator = Base64IncrementalCreator("strict")
    if filename != None:
@@ -246,6 +305,9 @@ def base64_file_create(data, filename=None, mode=0o666):


def uu_file_create(data, filename="-", mode=0o666):
    """
    Create a string in the uuencode file format containing the provided data.
    """
    let out = StringCatenator()
    let creator = Base64UUIncrementalCreator("strict")
    let octmode = oct(mode)[2:]
|
||||
|
@ -1,6 +1,7 @@
|
||||
"""
|
||||
This module includes some additional variable-width or wide encodings not specified by WHATWG. As
|
||||
such, none of the codecs in this module should be used in HTML.
|
||||
This module includes some additional variable-width or wide encodings not specified by WHATWG.
|
||||
|
||||
As such, none of the codecs in this module should be used in HTML.
|
||||
"""
|
||||
|
||||
from codecs.dbextra_data_8bit import data_8bit
|
||||
@ -13,6 +14,17 @@ from collections import xraydict
|
||||
|
||||
|
||||
class Big5NonEtenKanaIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for Big5 with non-ETEN layout of kana, Cyrillic and list markers.
|
||||
|
||||
The other ETEN extension section (the one retained by Microsoft's version) is still included.
|
||||
|
||||
Although this is the kana/Cyrillic/list marker layout included in the UTC's BIG5.TXT, it is the
|
||||
less common of the two (most extension schemes for Big5 use the ETEN layout), and has several
|
||||
problems (katakana lacks the vowel extender, and Cyrillic lacks several capitals) which the
|
||||
ETEN layout does not have. However, this codec corresponds roughly to Python's `big5`, and more
|
||||
closely to its `cp950` (the built-in codec, as opposed to if/when Python aliases it to `mbcs`).
|
||||
"""
|
||||
name = "big5-nonetenkana"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -20,6 +32,17 @@ class Big5NonEtenKanaIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return xraydict(data_8bit.cp950_no_eudc_encoding_map, data_8bit.encode_big5_nonetenkana)
|
||||
|
||||
class Big5NonEtenKanaIncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for Big5 with non-ETEN layout of kana, Cyrillic and list markers.
|
||||
|
||||
The other ETEN extension section (the one retained by Microsoft's version) is still included.
|
||||
|
||||
Although this is the kana/Cyrillic/list marker layout included in the UTC's BIG5.TXT, it is the
|
||||
less common of the two (most extension schemes for Big5 use the ETEN layout), and has several
|
||||
problems (katakana lacks the vowel extender, and Cyrillic lacks several capitals) which the
|
||||
ETEN layout does not have. However, this codec corresponds roughly to Python's `big5`, and more
|
||||
closely to its `cp950` (the built-in codec, as opposed to if/when Python aliases it to `mbcs`).
|
||||
"""
|
||||
name = "big5-nonetenkana"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -32,6 +55,13 @@ register_kuroko_codec(["big5-nonetenkana", "big5-tw"],
|
||||
Big5NonEtenKanaIncrementalEncoder, Big5NonEtenKanaIncrementalDecoder)
|
||||
|
||||
class XMacChineseTradIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for Big5 with Apple's additions and reduced lead byte range.
|
||||
|
||||
The Unicode mappings are partly changed to be closer to Apple's (as opposed to Microsoft's)
|
||||
correspondences; however, Microsoft's are retained where following Apple's would have required
|
||||
PUA transcoding hints to round-trip.
|
||||
"""
|
||||
name = "x-mac-chinesetrad"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -54,6 +84,13 @@ class XMacChineseTradIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
})
|
||||
|
||||
class XMacChineseTradIncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for Big5 with Apple's additions and reduced lead byte range.
|
||||
|
||||
The Unicode mappings are partly changed to be closer to Apple's (as opposed to Microsoft's)
|
||||
correspondences; however, Microsoft's are retained where following Apple's would have required
|
||||
PUA transcoding hints to round-trip.
|
||||
"""
|
||||
name = "x-mac-chinesetrad"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -83,6 +120,12 @@ register_kuroko_codec(["x-mac-chinesetrad", "x-mac-trad-chinese"],
|
||||
|
||||
|
||||
class XMacChineseSimpIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for EUC-CN, Apple version (hence slightly reduced lead byte range).
|
||||
|
||||
Mappings to more-recently added characters are used for the vertical forms, rather than
|
||||
Apple transcoding hints (or GB18030 private use codes).
|
||||
"""
|
||||
name = "x-mac-chinesesimp"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -102,6 +145,12 @@ class XMacChineseSimpIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
})
|
||||
|
||||
class XMacChineseSimpIncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for EUC-CN, Apple version (hence slightly reduced lead byte range).
|
||||
|
||||
Mappings to more-recently added characters are used for the vertical forms, rather than
|
||||
Apple transcoding hints (or GB18030 private use codes).
|
||||
"""
|
||||
name = "x-mac-chinesesimp"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -128,6 +177,10 @@ register_kuroko_codec(["x-mac-chinesesimp", "x-mac-simp-chinese", "euc-cn", "euc
|
||||
|
||||
|
||||
class Cesu8IncrementalEncoder(IncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for CESU-8, a deprecated UTF-8-like encoding still used by
|
||||
some systems, such as Tcl, and still mis-called "utf8" in some places for legacy reasons.
|
||||
"""
|
||||
name = "cesu-8"
|
||||
html5name = None
|
||||
# -1: expecting BOM
|
||||
@ -135,6 +188,7 @@ class Cesu8IncrementalEncoder(IncrementalEncoder):
|
||||
state = None
|
||||
include_bom = False
|
||||
def encode(string, final = False):
|
||||
"""Implements `IncrementalEncoder.encode`"""
|
||||
let out = ByteCatenator()
|
||||
if self.include_bom and self.state == -1:
|
||||
out.add("\uFEFF".encode())
|
||||
@ -161,13 +215,20 @@ class Cesu8IncrementalEncoder(IncrementalEncoder):
|
||||
out.add(string[first_offset:second_offset].encode())
|
||||
return out.getvalue()
|
||||
def reset():
|
||||
"""Implements `IncrementalEncoder.reset`"""
|
||||
self.state = -1
|
||||
def getstate():
|
||||
"""Implements `IncrementalEncoder.getstate`"""
|
||||
return self.state
|
||||
def setstate(state):
|
||||
"""Implements `IncrementalEncoder.setstate`"""
|
||||
self.state = state
|
||||
|
||||
class Cesu8IncrementalDecoder(Utf8IncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for CESU-8, a deprecated UTF-8-like encoding still used by
|
||||
some systems, such as Tcl, and still mis-called "utf8" in some places for legacy reasons.
|
||||
"""
|
||||
name = "cesu-8"
|
||||
html5name = None
|
||||
def _error_handler(error):
|
||||
@ -211,6 +272,10 @@ let _base64_alphabet = (
|
||||
let _utf7_not_need_hyphen = [ord(i) for i in "(),.:? \r\n"]
|
||||
|
||||
class Utf7IncrementalEncoder(IncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for UTF-7, a largely obsolete (and forbidden in HTML5)
|
||||
scheme for mixing ASCII with Base64'd UTF-16BE in e-mail.
|
||||
"""
|
||||
name = "utf-7"
|
||||
html5name = None
|
||||
utf16encoder = None
|
||||
@ -220,6 +285,7 @@ class Utf7IncrementalEncoder(IncrementalEncoder):
|
||||
self.utf16encoder = Utf16BeIncrementalEncoder(errors)
|
||||
IncrementalEncoder.__init__(self, errors)
|
||||
def encode(data, final=False):
|
||||
"""Implements `IncrementalEncoder.encode`"""
|
||||
let incoming = self.pending + list(self.utf16encoder.encode(data, final=final))
|
||||
self.pending = []
|
||||
let offset = 0
|
||||
@ -254,16 +320,23 @@ class Utf7IncrementalEncoder(IncrementalEncoder):
|
||||
self.mode = "ascii"
|
||||
return out.getvalue()
|
||||
def reset():
|
||||
"""Implements `IncrementalEncoder.reset`"""
|
||||
self.utf16encoder.reset()
|
||||
self.mode = "ascii"
|
||||
def getstate():
|
||||
"""Implements `IncrementalEncoder.getstate`"""
|
||||
return (self.utf16encoder.getstate(), self.mode, self.pending)
|
||||
def setstate(state):
|
||||
"""Implements `IncrementalEncoder.setstate`"""
|
||||
self.utf16encoder.setstate(state[0])
|
||||
self.mode = state[1]
|
||||
self.pending = state[2]
|
||||
|
||||
class Utf7IncrementalDecoder(IncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for UTF-7, a largely obsolete (and forbidden in HTML5)
|
||||
scheme for mixing ASCII with Base64'd UTF-16BE in e-mail.
|
||||
"""
|
||||
name = "utf-7"
|
||||
html5name = None
|
||||
utf16decoder = None
|
||||
@ -273,6 +346,7 @@ class Utf7IncrementalDecoder(IncrementalDecoder):
|
||||
self.utf16decoder = Utf16BeIncrementalDecoder(errors)
|
||||
IncrementalDecoder.__init__(self, errors)
|
||||
def decode(data_in, final=False):
|
||||
"""Implements `IncrementalDecoder.decode`"""
|
||||
let data = self.pending + data_in
|
||||
self.pending = b""
|
||||
let incoming = list(data)
|
||||
@ -329,12 +403,15 @@ class Utf7IncrementalDecoder(IncrementalDecoder):
|
||||
self.mode = "ascii"
|
||||
return out.getvalue()
|
||||
def reset():
|
||||
"""Implements `IncrementalDecoder.reset`"""
|
||||
self.utf16decoder.reset()
|
||||
self.mode = "ascii"
|
||||
self.pending = b""
|
||||
def getstate():
|
||||
"""Implements `IncrementalDecoder.getstate`"""
|
||||
return (self.utf16decoder.getstate(), self.mode, self.pending)
|
||||
def setstate(state):
|
||||
"""Implements `IncrementalDecoder.setstate`"""
|
||||
self.utf16decoder.setstate(state[0])
|
||||
self.mode = state[1]
|
||||
self.pending = state[2]
|
||||
@ -344,6 +421,9 @@ register_kuroko_codec(["utf-7", "utf7", "u7", "unicode-1-1-utf-7"],
|
||||
|
||||
|
||||
class EucJpFullIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for EUC-JP, including JIS X 0212.
|
||||
"""
|
||||
name = "euc-jp-full"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -355,6 +435,9 @@ register_kuroko_codec(["euc-jp-full"],
|
||||
|
||||
|
||||
class EucJis2004IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for the JIS X 0213 version of EUC-JP.
|
||||
"""
|
||||
name = "euc-jis-2004"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -362,6 +445,9 @@ class EucJis2004IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return data_8bit.encode_euc04
|
||||
|
||||
class EucJis2004IncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for the JIS X 0213 version of EUC-JP.
|
||||
"""
|
||||
name = "euc-jis-2004"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -377,6 +463,9 @@ register_kuroko_codec(["euc-jis-2004", "jisx0213", "eucjis2004", "euc_jis2004",
|
||||
|
||||
|
||||
class ShiftJis2004IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for the JIS X 0213 version of Shift_JIS.
|
||||
"""
|
||||
name = "shift-jis-2004"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -385,6 +474,9 @@ class ShiftJis2004IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
ascii_exceptions = (0x5C, 0x7E)
|
||||
|
||||
class ShiftJis2004IncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for the JIS X 0213 version of Shift_JIS.
|
||||
"""
|
||||
name = "shift-jis-2004"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -403,6 +495,9 @@ register_kuroko_codec(["shift_jis-2004", "shiftjis2004", "sjis_2004", "s_jis_200
|
||||
|
||||
|
||||
class AsciiJohabIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for the PC Johab encoding (code page 1361).
|
||||
"""
|
||||
name = "johab-ascii"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -410,6 +505,9 @@ class AsciiJohabIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return data_8bit.encode_johab_ascii
|
||||
|
||||
class AsciiJohabIncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for the PC Johab encoding (code page 1361).
|
||||
"""
|
||||
name = "johab-ascii"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -424,6 +522,9 @@ register_kuroko_codec(["cp1361", "ms1361", "johab", "x-johab", "johab-ascii"],
|
||||
|
||||
|
||||
class EbcdicJohabIncrementalEncoder(BaseEbcdicIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for code page 1364, a stateful EBCDIC variant of Johab.
|
||||
"""
|
||||
name = "johab-ebcdic"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -434,6 +535,9 @@ class EbcdicJohabIncrementalEncoder(BaseEbcdicIncrementalEncoder):
|
||||
return data_8bit.encode_johab_ebcdic
|
||||
|
||||
class EbcdicJohabIncrementalDecoder(BaseEbcdicIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for code page 1364, a stateful EBCDIC variant of Johab.
|
||||
"""
|
||||
name = "johab-ebcdic"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -443,12 +547,22 @@ class EbcdicJohabIncrementalDecoder(BaseEbcdicIncrementalDecoder):
|
||||
def dbcshost_decode():
|
||||
return data_8bit.decode_johab_ebcdic
|
||||
|
||||
register_kuroko_codec(["cp933", "ibm-933", "933", "x-IBM933", "ibm-1364", "x-IBM1364",
|
||||
register_kuroko_codec(["cp933", "ibm-933", "933", "x-IBM933", "cp1364", "ibm-1364", "x-IBM1364",
|
||||
"johab-ebcdic"],
|
||||
EbcdicJohabIncrementalEncoder, EbcdicJohabIncrementalDecoder)
|
||||
|
||||
|
||||
class JisEncodingIncrementalEncoder(Iso2022JpIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for 7-bit stateful Japanese with all features.
|
||||
|
||||
This differs from the ISO-2022-JP encoder in that it will:
|
||||
|
||||
- Encode forms present in 1978 JIS but simplified by (and absent in) 1983 JIS to 1978 JIS.
|
||||
- For characters not present in either table, try JIS X 0212, 2000 JIS and 2004 JIS in that order.
|
||||
- For characters not present in any JIS set, try GB 2312 and Wansung.
|
||||
- Preserve width of katakana.
|
||||
"""
|
||||
name = "jis_encoding"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -477,6 +591,18 @@ class JisEncodingIncrementalEncoder(Iso2022JpIncrementalEncoder):
|
||||
attitude = "eager"
|
||||
|
||||
class JisEncodingIncrementalDecoder(Iso2022JpIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for 7-bit stateful Japanese.
|
||||
|
||||
This differs from the ISO-2022-JP decoder in that it will:
|
||||
|
||||
- Decode 1978 JIS with a separate table, including 1978 JIS, NEC extensions and IBM backports.
|
||||
- Accept and decode extensions from ISO-2022-JP-2 (and -1), ISO-2022-JP-3 and ISO-2022-JP-2004.
|
||||
- Not generate an error for immediately concatenated JIS-Kanji→ASCII→JIS-Kanji designations.
|
||||
- Accept katakana via Shift Out / Shift In.
|
||||
|
||||
This is used as the decoder for all ISO-2022-JP variants other than plain ISO-2022-JP.
|
||||
"""
|
||||
name = "jis_encoding"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -518,6 +644,12 @@ register_kuroko_codec(["jis_encoding", "csjisencoding", "jis", "jis7"],
|
||||
|
||||
|
||||
class Iso2022Jp1IncrementalEncoder(Iso2022JpIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for 7-bit stateful Japanese with JIS X 0212.
|
||||
|
||||
This differs from the ISO-2022-JP encoder in that it will encode to JIS X 0212, and does so
|
||||
whenever possible (i.e. it will favour it over any web extensions to JIS X 0208).
|
||||
"""
|
||||
name = "iso-2022-jp-1"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -525,6 +657,7 @@ class Iso2022Jp1IncrementalEncoder(Iso2022JpIncrementalEncoder):
|
||||
return [None, None]
|
||||
@lazy_property
|
||||
def encodes_dbcs():
|
||||
# Favour JIS X 0212 over any extensions in the web JIS X 0208 table.
|
||||
return [None, None, data_7bit.encode_jis90p2, more_dbdata.encode_jis7]
|
||||
escs_onebyte = {0: 0x42, 1: 0x4A}
|
||||
escs_twobyte = {3: 0x42, 2: 0x44}
|
||||
@ -535,6 +668,11 @@ register_kuroko_codec(["iso-2022-jp-1", "iso2022-jp-1", "iso2022jp-1"],
|
||||
|
||||
|
||||
class Iso2022JpExtIncrementalEncoder(Iso2022JpIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for 7-bit stateful Japanese.
|
||||
|
||||
This differs from the ISO-2022-JP encoder in that it preserves katakana width.
|
||||
"""
|
||||
name = "iso-2022-jp-ext"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -552,6 +690,9 @@ register_kuroko_codec(["iso-2022-jp-ext", "iso2022-jp-ext", "iso2022jp-ext"],
|
||||
|
||||
|
||||
class Iso2022Jp2IncrementalEncoder(Iso2022JpIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for 7-bit stateful Japanese with multilingual extensions.
|
||||
"""
|
||||
name = "iso-2022-jp-2"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -559,6 +700,7 @@ class Iso2022Jp2IncrementalEncoder(Iso2022JpIncrementalEncoder):
|
||||
return [None, None]
|
||||
@lazy_property
|
||||
def encodes_dbcs():
|
||||
# Favour JIS X 0212 over any extensions in the web JIS X 0208 table.
|
||||
return [None, None,
|
||||
data_7bit.encode_jis90p2,
|
||||
more_dbdata.encode_jis7,
|
||||
@ -579,19 +721,10 @@ register_kuroko_codec(["iso-2022-jp-2", "iso2022-jp-2", "iso2022jp-2", "csISO202
|
||||
Iso2022Jp2IncrementalEncoder, JisEncodingIncrementalDecoder)
|
||||
|
||||
|
||||
# Bit confusing to explain what this bit is doing, so let me explain:
|
||||
# The JIS X 0213 variants of ISO-2022-JP should encode to JIS X 0213 before encoding to any
|
||||
# extension to JIS X 0208 (assuming they "should" encode to extensions at all). So we remove any
|
||||
# characters that are encoded to different locations in JIS X 0213.
|
||||
# Since NEC Row 13 is retained in JIS X 0213, but should be encoded in the JIS X 0213 state not
|
||||
# the JIS X 0208 state, it is also excluded.
|
||||
# This also removes certain Unicode characters that are mapped differently by Microsoft/WHATWG
|
||||
# versus by JIS X 0213, e.g. the fullwidth tilde, in the hope that the JIS X 0213 tables would
|
||||
# more dependably round trip.
|
||||
let encode_jis7_reduced = xraydict(more_dbdata.encode_jis7, {}, [33537, 33634, 33663, 33735, 33864, 33972, 34012, 34131, 34137, 34224, 35061, 35100, 35346, 35383, 35449, 35495, 35518, 35551, 35574, 35711, 36080, 36084, 36114, 20008, 20193, 20224, 20227, 20310, 20362, 20370, 20372, 20378, 20425, 20544, 20514, 20510, 20550, 20546, 20592, 20628, 37086, 37141, 37159, 20810, 20893, 37335, 37338, 37357, 37358, 37348, 37349, 37386, 37392, 21013, 37434, 37436, 37440, 37433, 37454, 37457, 37465, 37479, 37496, 37512, 21158, 37543, 21167, 37584, 37587, 37591, 21211, 37593, 37600, 37607, 37625, 21248, 37627, 21255, 37631, 37634, 37662, 37661, 21284, 37669, 37665, 37704, 37719, 37744, 21395, 21426, 37830, 37854, 37957, 21642, 21660, 21673, 21759, 21894, 38557, 38575, 38707, 38715, 38733, 38735, 38741, 22444, 22472, 22471, 38999, 39013, 22686, 22795, 22875, 22877, 39326, 22948, 39502, 39644, 23382, 39794, 39797, 39823, 39857, 23488, 23512, 23532, 39936, 23582, 23718, 23738, 23847, 23874, 23891, 40299, 23917, 40304, 23992, 23993, 40473, 40657, 24372, 24389, 24423, 24503, 24542, 24714, 24789, 24818, 8470, 8481, 24880, 24887, 8544, 8545, 8546, 8547, 8548, 8549, 8550, 8551, 8552, 8553, 8560, 8561, 8562, 8563, 8564, 8565, 8566, 8567, 8568, 8569, 24984, 8721, 8730, 8735, 8736, 8741, 8745, 8746, 8747, 8750, 8757, 8786, 8801, 8869, 25254, 8895, 25589, 9312, 9313, 9314, 9315, 9316, 9317, 9318, 9319, 9320, 9321, 9322, 9323, 9324, 9325, 9326, 9327, 9328, 9329, 9330, 9331, 25696, 25757, 25806, 26112, 26121, 26133, 26142, 26148, 26161, 26199, 26201, 26213, 26227, 26265, 26272, 26290, 26303, 26362, 26363, 26470, 26555, 26560, 26625, 26692, 26706, 26824, 26831, 26984, 27032, 27106, 27184, 27206, 27243, 27251, 27262, 27364, 27606, 27711, 27740, 27782, 27866, 27908, 28039, 28076, 28111, 28156, 28199, 28220, 28252, 28351, 28552, 28597, 28661, 12317, 12319, 28677, 28679, 28712, 28859, 28805, 28843, 28943, 28932, 28998, 28999, 29020, 29121, 29182, 12849, 12850, 12857, 12964, 12965, 12966, 12967, 
12968, 29361, 29374, 13059, 13069, 13076, 13080, 13090, 13091, 13094, 13095, 13099, 13110, 13115, 13129, 13130, 13133, 13137, 13143, 29559, 13179, 13180, 13181, 13182, 13198, 13199, 13212, 13213, 13214, 13217, 13252, 29641, 13261, 29654, 29667, 29703, 29734, 29738, 29742, 29794, 29833, 29855, 29953, 29999, 30063, 30363, 30364, 30366, 30374, 30534, 30753, 30798, 30820, 63785, 31024, 31124, 31131, 63964, 64015, 64016, 64017, 64019, 64020, 64021, 64022, 64025, 64026, 64027, 64031, 64032, 64033, 64034, 64036, 64038, 31441, 31463, 31467, 31646, 32072, 32092, 32160, 32183, 32214, 32338, 32394, 65282, 65287, 65293, 65374, 32583, 65508])
|
||||
|
||||
|
||||
class Iso2022Jp3IncrementalEncoder(Iso2022JpIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for 7-bit stateful Japanese with JIS X 0213-2000.
|
||||
"""
|
||||
name = "iso-2022-jp-3"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -600,7 +733,7 @@ class Iso2022Jp3IncrementalEncoder(Iso2022JpIncrementalEncoder):
|
||||
@lazy_property
|
||||
def encodes_dbcs():
|
||||
return [None, None, None,
|
||||
encode_jis7_reduced,
|
||||
data_7bit.encode_jis7_reduced,
|
||||
data_7bit.encode_jis00,
|
||||
data_7bit.encode_jis00p2]
|
||||
escs_onebyte = {0: 0x42, 1: 0x4A, 2: 0x49}
|
||||
@ -612,6 +745,9 @@ register_kuroko_codec(["iso-2022-jp-3", "iso2022-jp-3", "iso2022jp-3"],
|
||||
|
||||
|
||||
class Iso2022Jp2004IncrementalEncoder(Iso2022JpIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for 7-bit stateful Japanese with JIS X 0213-2004.
|
||||
"""
|
||||
name = "iso-2022-jp-2004"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -620,7 +756,7 @@ class Iso2022Jp2004IncrementalEncoder(Iso2022JpIncrementalEncoder):
|
||||
@lazy_property
|
||||
def encodes_dbcs():
|
||||
return [None, None, None,
|
||||
encode_jis7_reduced,
|
||||
data_7bit.encode_jis7_reduced,
|
||||
data_7bit.encode_jis00p2,
|
||||
data_7bit.encode_jis04]
|
||||
escs_onebyte = {0: 0x42, 1: 0x4A, 2: 0x49}
|
||||
@ -632,6 +768,9 @@ register_kuroko_codec(["iso-2022-jp-2004", "iso2022-jp-2004", "iso2022jp-2004"],
|
||||
|
||||
|
||||
class Utf32IncrementalEncoder(IncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for UTF-32 with byte order mark.
|
||||
"""
|
||||
name = "utf-32"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -650,6 +789,7 @@ class Utf32IncrementalEncoder(IncrementalEncoder):
|
||||
else:
|
||||
raise ValueError("unexpected endian value: " + repr(self.endian))
|
||||
def encode(string, final = False):
|
||||
"""Implements `IncrementalEncoder.encode`"""
|
||||
let out = ByteCatenator()
|
||||
let offset = 0
|
||||
if self.include_bom and self.state == -1:
|
||||
@ -672,13 +812,19 @@ class Utf32IncrementalEncoder(IncrementalEncoder):
|
||||
if offset < 0:
|
||||
offset += len(string)
|
||||
def reset():
|
||||
"""Implements `IncrementalEncoder.reset`"""
|
||||
self.state = -1
|
||||
def getstate():
|
||||
"""Implements `IncrementalEncoder.getstate`"""
|
||||
return self.state
|
||||
def setstate(state):
|
||||
"""Implements `IncrementalEncoder.setstate`"""
|
||||
self.state = state
|
||||
|
||||
class Utf32IncrementalDecoder(IncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for UTF-32 with detected byte order, removing any byte order mark.
|
||||
"""
|
||||
name = "utf-32"
|
||||
html5name = None
|
||||
force_endian = None # subclass may set to "little" or "big"
|
||||
@ -688,6 +834,7 @@ class Utf32IncrementalDecoder(IncrementalDecoder):
|
||||
state = None
|
||||
pending = b""
|
||||
def decode(data_in, final = False):
|
||||
"""Implements `IncrementalDecoder.decode`"""
|
||||
let data = self.pending + data_in
|
||||
self.pending = b""
|
||||
let out = StringCatenator()
|
||||
@ -754,34 +901,49 @@ class Utf32IncrementalDecoder(IncrementalDecoder):
|
||||
out.add(errorret[0])
|
||||
offset = errorret[1]
|
||||
if offset < 0:
|
||||
offset += len(string)
|
||||
offset += len(data)
|
||||
def reset():
|
||||
"""Implements `IncrementalDecoder.reset`"""
|
||||
self.pending = b""
|
||||
self.state = -1
|
||||
def getstate():
|
||||
"""Implements `IncrementalDecoder.getstate`"""
|
||||
return (self.pending, self.state)
|
||||
def setstate(state):
|
||||
"""Implements `IncrementalDecoder.setstate`"""
|
||||
self.pending = state[0]
|
||||
self.state = state[1]
|
||||
|
||||
class Utf32BeIncrementalEncoder(Utf32IncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for UTF-32, big endian, without a byte order mark.
|
||||
"""
|
||||
name = "utf-32be"
|
||||
html5name = None
|
||||
endian = "big"
|
||||
include_bom = False
|
||||
|
||||
class Utf32BeIncrementalDecoder(Utf32IncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for UTF-32, big endian, without a byte order mark.
|
||||
"""
|
||||
name = "utf-32be"
|
||||
html5name = None
|
||||
force_endian = "big"
|
||||
|
||||
class Utf32LeIncrementalEncoder(Utf32IncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for UTF-32, little endian, without a byte order mark.
|
||||
"""
|
||||
name = "utf-32le"
|
||||
html5name = None
|
||||
endian = "little"
|
||||
include_bom = False
|
||||
|
||||
class Utf32LeIncrementalDecoder(Utf32IncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for UTF-32, little endian, without a byte order mark.
|
||||
"""
|
||||
name = "utf-32le"
|
||||
html5name = None
|
||||
force_endian = "little"
|
||||
@ -795,6 +957,11 @@ register_kuroko_codec(["utf-32be", "utf-32-be"],
|
||||
|
||||
|
||||
class HzIncrementalEncoder(IncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for HZ-GB-2312 (Usenet simplified Chinese).
|
||||
|
||||
This is an old scheme for embedding GB 2312 data into a pure ASCII stream.
|
||||
"""
|
||||
name = "hz-gb-2312"
|
||||
html5name = None
|
||||
def ensure_state_number(state, out):
|
||||
@ -818,6 +985,7 @@ class HzIncrementalEncoder(IncrementalEncoder):
|
||||
raise ValueError("set to invalid state: " + repr(state))
|
||||
self.state = state
|
||||
def encode(string, final = False):
|
||||
"""Implements `IncrementalEncoder.encode`"""
|
||||
let out = ByteCatenator()
|
||||
let offset = 0
|
||||
while 1: # offset can be arbitrarily changed by the error handler, so not a for
|
||||
@ -850,18 +1018,27 @@ class HzIncrementalEncoder(IncrementalEncoder):
|
||||
if offset < 0:
|
||||
offset += len(string)
|
||||
def reset():
|
||||
"""Implements `IncrementalEncoder.reset`"""
|
||||
self.state = 0
|
||||
self.linelength = 0
|
||||
def getstate():
|
||||
"""Implements `IncrementalEncoder.getstate`"""
|
||||
return (self.state, self.linelength)
|
||||
def setstate(state):
|
||||
"""Implements `IncrementalEncoder.setstate`"""
|
||||
self.state = state[0]
|
||||
self.linelength = state[1]
|
||||
|
||||
class HzIncrementalDecoder(IncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for HZ-GB-2312 (Usenet simplified Chinese).
|
||||
|
||||
This is an old scheme for embedding GB 2312 data into a pure ASCII stream.
|
||||
"""
|
||||
name = "hz-gb-2312"
|
||||
html5name = None
|
||||
def decode(data_in, final = False):
|
||||
"""Implements `IncrementalDecoder.decode`"""
|
||||
let data = self.pending + data_in
|
||||
self.pending = b""
|
||||
let out = StringCatenator()
|
||||
@ -920,13 +1097,16 @@ class HzIncrementalDecoder(IncrementalDecoder):
|
||||
out.add(errorret[0])
|
||||
offset = errorret[1]
|
||||
if offset < 0:
|
||||
offset += len(string)
|
||||
offset += len(data)
|
||||
def reset():
|
||||
"""Implements `IncrementalDecoder.reset`"""
|
||||
self.pending = b""
|
||||
self.state_set = 0
|
||||
def getstate():
|
||||
"""Implements `IncrementalDecoder.getstate`"""
|
||||
return (self.pending, self.state_set)
|
||||
def setstate(state):
|
||||
"""Implements `IncrementalDecoder.setstate`"""
|
||||
self.pending = state[0]
|
||||
self.state_set = state[1]
|
||||
|
||||
@ -935,6 +1115,14 @@ register_kuroko_codec(["hz-gb-2312", "hz", "hzgb", "hz_gb"],
|
||||
|
||||
|
||||
class JapaneseAutodetectIncrementalDecoder(IncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for the automatic "Japanese" character encoding option.
|
||||
|
||||
This will attempt to interpret the stream as the web versions of ISO-2022-JP, Shift_JIS and
|
||||
EUC-JP, as well as UTF-8, at once, and start returning the data once it has narrowed it down
|
||||
to one. If it fails to narrow it down conclusively, it will wait until the final call before
|
||||
making an educated guess. If it doesn't seem to be any of them, it will raise `ValueError`.
|
||||
"""
|
||||
name = "japanese"
|
||||
html5name = None
|
||||
# State flags:
|
||||
@ -951,11 +1139,14 @@ class JapaneseAutodetectIncrementalDecoder(IncrementalDecoder):
|
||||
self.utf = lookup("utf-8-sig").incrementaldecoder("strict")
|
||||
self.reset()
|
||||
def decode(data, final = False):
|
||||
"""Implements `IncrementalDecoder.decode`"""
|
||||
if not (self.state & 0x01):
|
||||
try:
|
||||
self.pendingjis.add(self.jis.decode(data, final))
|
||||
except UnicodeDecodeError:
|
||||
self.state |= 0x01
|
||||
if self.jis.state_set != 0:
|
||||
self.state |= 0x0E
|
||||
#
|
||||
if not (self.state & 0x02):
|
||||
try:
|
||||
@ -1029,6 +1220,7 @@ class JapaneseAutodetectIncrementalDecoder(IncrementalDecoder):
|
||||
return ret
|
||||
return ""
|
||||
def reset():
|
||||
"""Implements `IncrementalDecoder.reset`"""
|
||||
self.state = 0
|
||||
self.pending = b""
|
||||
self.jis.reset()
|
||||
@ -1040,12 +1232,14 @@ class JapaneseAutodetectIncrementalDecoder(IncrementalDecoder):
|
||||
self.utf.reset()
|
||||
self.pendingutf = StringCatenator()
|
||||
def getstate():
|
||||
"""Implements `IncrementalDecoder.getstate`"""
|
||||
return (self.jis.getstate(), self.pendingjis.getvalue(),
|
||||
self.sjis.getstate(), self.pendingsjis.getvalue(),
|
||||
self.ujis.getstate(), self.pendingujis.getvalue(),
|
||||
self.utf.getstate(), self.pendingutf.getvalue(),
|
||||
self.state)
|
||||
def setstate(state):
|
||||
"""Implements `IncrementalDecoder.setstate`"""
|
||||
self.jis.setstate(state[0])
|
||||
self.pendingjis = StringCatenator()
|
||||
self.pendingjis.add(state[1])
|
||||
@ -1067,6 +1261,9 @@ register_kuroko_codec(["japanese"], Utf8SigIncrementalEncoder,
|
||||
|
||||
|
||||
class Iso2022NonJpIncrementalEncoder(IncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder subclass, base class for ISO-2022-KR and ISO-2022-CN. Not used directly.
|
||||
"""
|
||||
name = None
|
||||
html5name = None
|
||||
encodes = []
|
||||
@ -1095,6 +1292,7 @@ class Iso2022NonJpIncrementalEncoder(IncrementalEncoder):
|
||||
self.super3_desig = state
|
||||
def run_prelude(out):
|
||||
def encode(string, final = False):
|
||||
"""Implements `IncrementalEncoder.encode`"""
|
||||
let out = ByteCatenator()
|
||||
let offset = 0
|
||||
if self.shift_desig == None and self.super_desig == None and self.super3_desig == None:
|
||||
@ -1156,19 +1354,25 @@ class Iso2022NonJpIncrementalEncoder(IncrementalEncoder):
|
||||
if offset < 0:
|
||||
offset += len(string)
|
||||
def reset():
|
||||
"""Implements `IncrementalEncoder.reset`"""
|
||||
self.shift = False
|
||||
self.shift_desig = None
|
||||
self.super_desig = None
|
||||
self.super3_desig = None
|
||||
def getstate():
|
||||
"""Implements `IncrementalEncoder.getstate`"""
|
||||
return (self.shift, self.shift_desig, self.super_desig, self.super3_desig)
|
||||
def setstate(state):
|
||||
"""Implements `IncrementalEncoder.setstate`"""
|
||||
self.shift = state[0]
|
||||
self.shift_desig = state[1]
|
||||
self.super_desig = state[2]
|
||||
self.super3_desig = state[3]
|
||||
|
||||
class Iso2022NonJpIncrementalDecoder(IncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder subclass, base class for ISO-2022-KR and ISO-2022-CN. Not used directly.
|
||||
"""
|
||||
name = None
|
||||
html5name = None
|
||||
decodes = []
|
||||
@ -1176,6 +1380,7 @@ class Iso2022NonJpIncrementalDecoder(IncrementalDecoder):
|
||||
escs_super = {}
|
||||
escs_super3 = {}
|
||||
def decode(data_in, final = False):
|
||||
"""Implements `IncrementalDecoder.decode`"""
|
||||
let data = self.pending + data_in
|
||||
self.pending = b""
|
||||
let out = StringCatenator()
|
||||
@ -1285,16 +1490,19 @@ class Iso2022NonJpIncrementalDecoder(IncrementalDecoder):
|
||||
out.add(errorret[0])
|
||||
offset = errorret[1]
|
||||
if offset < 0:
|
||||
-offset += len(string)
+offset += len(data)
|
||||
def reset():
|
||||
"""Implements `IncrementalDecoder.reset`"""
|
||||
self.pending = b""
|
||||
self.shift = False
|
||||
self.shift_desig = None
|
||||
self.super_desig = None
|
||||
self.super3_desig = None
|
||||
def getstate():
|
||||
"""Implements `IncrementalDecoder.getstate`"""
|
||||
return (self.pending, self.shift, self.shift_desig, self.super_desig, self.super3_desig)
|
||||
def setstate(state):
|
||||
"""Implements `IncrementalDecoder.setstate`"""
|
||||
self.pending = state[0]
|
||||
self.shift = state[1]
|
||||
self.shift_desig = state[2]
|
||||
@ -1302,6 +1510,9 @@ class Iso2022NonJpIncrementalDecoder(IncrementalDecoder):
|
||||
self.super3_desig = state[4]
|
||||
|
||||
class Iso2022KrIncrementalEncoder(Iso2022NonJpIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for ISO-2022-KR (7-bit stateful Korean, South).
|
||||
"""
|
||||
name = "iso-2022-kr"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -1315,6 +1526,9 @@ class Iso2022KrIncrementalEncoder(Iso2022NonJpIncrementalEncoder):
|
||||
self.ensure_shift_designation(0, out)
|
||||
|
||||
class Iso2022KrIncrementalDecoder(Iso2022NonJpIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for ISO-2022-KR (7-bit stateful Korean, South).
|
||||
"""
|
||||
name = "iso-2022-kr"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -1326,6 +1540,11 @@ register_kuroko_codec(["iso-2022-kr", "iso2022-kr", "iso2022kr", "csiso2022kr"],
|
||||
Iso2022KrIncrementalEncoder, Iso2022KrIncrementalDecoder)
|
||||
|
||||
class Iso2022CnIncrementalEncoder(Iso2022NonJpIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for ISO-2022-CN (7-bit stateful Chinese).
|
||||
|
||||
ISO-2022-CN-Ext is not included (it requires a much larger set of tables and is very rare).
|
||||
"""
|
||||
name = "iso-2022-cn"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -1335,6 +1554,11 @@ class Iso2022CnIncrementalEncoder(Iso2022NonJpIncrementalEncoder):
|
||||
escs_super = {2: 0x48}
|
||||
|
||||
class Iso2022CnIncrementalDecoder(Iso2022NonJpIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for ISO-2022-CN (7-bit stateful Chinese).
|
||||
|
||||
ISO-2022-CN-Ext is not included (it requires a much larger set of tables and is very rare).
|
||||
"""
|
||||
name = "iso-2022-cn"
|
||||
html5name = None
|
||||
@lazy_property
|
||||
|
@ -1,3 +1,6 @@
|
||||
"""
|
||||
Defines 7-bit mapping data for `codecs.dbextra`.
|
||||
"""
|
||||
from collections import xraydict
|
||||
from codecs.infrastructure import encodesto7bit, decodesto7bit, lazy_property
|
||||
from codecs.dbdata import more_dbdata, Windows949IncrementalEncoder, Windows949IncrementalDecoder
|
||||
@ -29,6 +32,20 @@ class _DBExtraData7Bit:
|
||||
return {65408: 64, 65409: 65, 65410: 66, 65411: 67, 65412: 68, 65413: 69, 65414: 70, 65415: 71, 65416: 72, 65417: 73, 65418: 74, 65419: 75, 65420: 76, 65421: 77, 65422: 78, 65423: 79, 65424: 80, 65425: 81, 65426: 82, 65427: 83, 65428: 84, 65429: 85, 65430: 86, 65431: 87, 65432: 88, 65433: 89, 65434: 90, 65435: 91, 65436: 92, 65437: 93, 65438: 94, 65439: 95, 65377: 33, 65378: 34, 65379: 35, 65380: 36, 65381: 37, 65382: 38, 65383: 39, 65384: 40, 65385: 41, 65386: 42, 65387: 43, 65388: 44, 65389: 45, 65390: 46, 65391: 47, 65392: 48, 65393: 49, 65394: 50, 65395: 51, 65396: 52, 65397: 53, 65398: 54, 65399: 55, 65400: 56, 65401: 57, 65402: 58, 65403: 59, 65404: 60, 65405: 61, 65406: 62, 65407: 63, }
|
||||
|
||||
|
||||
# What this does is a bit confusing, so to explain:
|
||||
# The JIS X 0213 variants of ISO-2022-JP should encode to JIS X 0213 before encoding to any
|
||||
# extension to JIS X 0208 (assuming they "should" encode to extensions at all). So we remove any
|
||||
# characters that are encoded to different locations in JIS X 0213.
|
||||
# Since NEC Row 13 is retained in JIS X 0213, but should be encoded in the JIS X 0213 state not
|
||||
# the JIS X 0208 state, it is also excluded.
|
||||
# This also removes certain Unicode characters that are mapped differently by Microsoft/WHATWG
|
||||
# versus by JIS X 0213, e.g. the fullwidth tilde, in the hope that the JIS X 0213 tables would
|
||||
# more dependably round trip.
|
||||
@lazy_property
|
||||
def encode_jis7_reduced():
|
||||
return xraydict(more_dbdata.encode_jis7, {}, [33537, 33634, 33663, 33735, 33864, 33972, 34012, 34131, 34137, 34224, 35061, 35100, 35346, 35383, 35449, 35495, 35518, 35551, 35574, 35711, 36080, 36084, 36114, 20008, 20193, 20224, 20227, 20310, 20362, 20370, 20372, 20378, 20425, 20544, 20514, 20510, 20550, 20546, 20592, 20628, 37086, 37141, 37159, 20810, 20893, 37335, 37338, 37357, 37358, 37348, 37349, 37386, 37392, 21013, 37434, 37436, 37440, 37433, 37454, 37457, 37465, 37479, 37496, 37512, 21158, 37543, 21167, 37584, 37587, 37591, 21211, 37593, 37600, 37607, 37625, 21248, 37627, 21255, 37631, 37634, 37662, 37661, 21284, 37669, 37665, 37704, 37719, 37744, 21395, 21426, 37830, 37854, 37957, 21642, 21660, 21673, 21759, 21894, 38557, 38575, 38707, 38715, 38733, 38735, 38741, 22444, 22472, 22471, 38999, 39013, 22686, 22795, 22875, 22877, 39326, 22948, 39502, 39644, 23382, 39794, 39797, 39823, 39857, 23488, 23512, 23532, 39936, 23582, 23718, 23738, 23847, 23874, 23891, 40299, 23917, 40304, 23992, 23993, 40473, 40657, 24372, 24389, 24423, 24503, 24542, 24714, 24789, 24818, 8470, 8481, 24880, 24887, 8544, 8545, 8546, 8547, 8548, 8549, 8550, 8551, 8552, 8553, 8560, 8561, 8562, 8563, 8564, 8565, 8566, 8567, 8568, 8569, 24984, 8721, 8730, 8735, 8736, 8741, 8745, 8746, 8747, 8750, 8757, 8786, 8801, 8869, 25254, 8895, 25589, 9312, 9313, 9314, 9315, 9316, 9317, 9318, 9319, 9320, 9321, 9322, 9323, 9324, 9325, 9326, 9327, 9328, 9329, 9330, 9331, 25696, 25757, 25806, 26112, 26121, 26133, 26142, 26148, 26161, 26199, 26201, 26213, 26227, 26265, 26272, 26290, 26303, 26362, 26363, 26470, 26555, 26560, 26625, 26692, 26706, 26824, 26831, 26984, 27032, 27106, 27184, 27206, 27243, 27251, 27262, 27364, 27606, 27711, 27740, 27782, 27866, 27908, 28039, 28076, 28111, 28156, 28199, 28220, 28252, 28351, 28552, 28597, 28661, 12317, 12319, 28677, 28679, 28712, 28859, 28805, 28843, 28943, 28932, 28998, 28999, 29020, 29121, 29182, 12849, 12850, 12857, 12964, 12965, 12966, 12967, 12968, 29361, 29374, 
13059, 13069, 13076, 13080, 13090, 13091, 13094, 13095, 13099, 13110, 13115, 13129, 13130, 13133, 13137, 13143, 29559, 13179, 13180, 13181, 13182, 13198, 13199, 13212, 13213, 13214, 13217, 13252, 29641, 13261, 29654, 29667, 29703, 29734, 29738, 29742, 29794, 29833, 29855, 29953, 29999, 30063, 30363, 30364, 30366, 30374, 30534, 30753, 30798, 30820, 63785, 31024, 31124, 31131, 63964, 64015, 64016, 64017, 64019, 64020, 64021, 64022, 64025, 64026, 64027, 64031, 64032, 64033, 64034, 64036, 64038, 31441, 31463, 31467, 31646, 32072, 32092, 32160, 32183, 32214, 32338, 32394, 65282, 65287, 65293, 65374, 32583, 65508])
|
||||
|
||||
|
||||
@lazy_property
|
||||
def decode_jis78():
|
||||
return xraydict(more_dbdata.decode_jis7, {(116, 33): 23597, (48, 34): 21854, (116, 34): 27097, (116, 35): 36965, (116, 36): 29814, (44, 36): 9472, (44, 37): 9473, (44, 38): 9474, (44, 39): 9475, (44, 40): 9476, (44, 41): 9477, (44, 42): 9478, (44, 43): 9479, (72, 46): 28497, (44, 44): 9480, (72, 48): 37297, (44, 45): 9481, (44, 50): 9486, (48, 51): 39994, (56, 52): 40572, (44, 46): 9482, (44, 51): 9487, (44, 47): 9483, (44, 48): 9484, (44, 49): 9485, (44, 58): 9494, (44, 59): 9495, (44, 52): 9488, (44, 53): 9489, (44, 62): 9498, (44, 63): 9499, (44, 64): 9500, (44, 65): 9501, (44, 66): 9502, (52, 67): 28748, (44, 67): 9503, (44, 68): 9504, (100, 70): 31725, (44, 69): 9505, (60, 72): 23650, (60, 73): 34306, (44, 54): 9490, (44, 55): 9491, (44, 56): 9492, (76, 77): 40629, (108, 77): 36046, (68, 79): 25681, (44, 57): 9493, (44, 70): 9506, (52, 82): 35563, (44, 71): 9507, (44, 72): 9508, (44, 73): 9509, (80, 86): 20397, (112, 87): 38765, (44, 74): 9510, (44, 75): 9511, (44, 60): 9496, (68, 91): 22778, (44, 61): 9497, (44, 76): 9512, (44, 77): 9513, (44, 78): 9514, (44, 79): 9515, (44, 80): 9516, (44, 81): 9517, (44, 82): 9518, (84, 100): 22775, (44, 83): 9519, (44, 84): 9520, (44, 85): 9521, (44, 86): 9522, (44, 87): 9523, (44, 88): 9524, (44, 107): 9543, (44, 89): 9525, (44, 90): 9526, (44, 91): 9527, (44, 92): 9528, (44, 93): 9529, (44, 94): 9530, (44, 95): 9531, (44, 96): 9532, (112, 116): 38938, (44, 97): 9533, (96, 118): 29796, (44, 98): 9534, (44, 99): 9535, (76, 121): 34282, (44, 100): 9536, (44, 101): 9537, (44, 102): 9538, (44, 103): 9539, (44, 104): 9540, (44, 105): 9541, (44, 106): 9542, (44, 108): 9544, (44, 109): 9545, (44, 110): 9546, (44, 111): 9547, (41, 33): 33, (105, 34): 34122, (41, 34): 34, (41, 35): 35, (41, 36): 36, (41, 37): 37, (41, 38): 38, (65, 40): 36068, (41, 39): 39, (41, 40): 40, (61, 43): 32353, (41, 41): 41, (41, 42): 42, (105, 46): 34222, (41, 43): 43, (73, 48): 27292, (41, 44): 44, (41, 45): 45, (41, 46): 46, (41, 47): 47, (41, 48): 
48, (69, 54): 22625, (41, 49): 49, (41, 50): 50, (41, 51): 51, (41, 52): 52, (41, 53): 53, (41, 54): 54, (41, 55): 55, (41, 56): 56, (69, 63): 39002, (41, 57): 57, (41, 58): 58, (41, 59): 59, (41, 60): 60, (41, 61): 61, (41, 62): 62, (41, 63): 63, (41, 64): 64, (41, 65): 65, (41, 66): 66, (41, 67): 67, (41, 68): 68, (41, 69): 69, (41, 70): 70, (41, 71): 71, (41, 72): 72, (41, 73): 73, (41, 74): 74, (41, 75): 75, (41, 76): 76, (41, 77): 77, (41, 78): 78, (41, 79): 79, (69, 87): 31018, (41, 80): 80, (41, 81): 81, (77, 90): 36953, (105, 90): 34510, (57, 92): 31014, (41, 82): 82, (41, 88): 88, (65, 95): 25620, (41, 83): 83, (41, 84): 84, (41, 85): 85, (41, 86): 86, (41, 87): 87, (41, 89): 89, (41, 90): 90, (41, 91): 91, (41, 92): 165, (65, 105): 30246, (77, 105): 33802, (49, 107): 28976, (41, 93): 93, (57, 109): 40628, (109, 110): 36841, (69, 110): 27310, (41, 94): 94, (41, 95): 95, (41, 96): 96, (69, 115): 28644, (41, 97): 97, (41, 98): 98, (41, 99): 99, (41, 100): 100, (69, 120): 31153, (89, 120): 25785, (41, 101): 101, (41, 102): 102, (41, 103): 103, (41, 104): 104, (41, 105): 105, (41, 106): 106, (41, 107): 107, (41, 108): 108, (41, 109): 109, (41, 110): 110, (41, 111): 111, (41, 112): 112, (41, 113): 113, (41, 114): 114, (41, 115): 115, (41, 116): 116, (41, 117): 117, (41, 118): 118, (41, 119): 119, (41, 120): 120, (41, 121): 121, (41, 122): 122, (41, 123): 123, (41, 124): 124, (41, 125): 125, (41, 126): 8254, (42, 33): 65377, (54, 34): 20448, (42, 34): 65378, (106, 36): 34687, (42, 35): 65379, (42, 36): 65380, (42, 37): 65381, (42, 38): 65382, (50, 41): 40367, (50, 42): 40407, (42, 39): 65383, (42, 40): 65384, (42, 41): 65385, (42, 42): 65386, (42, 43): 65387, (42, 44): 65388, (42, 45): 65389, (42, 46): 65390, (42, 47): 65391, (42, 48): 65392, (42, 49): 65393, (42, 50): 65394, (42, 51): 65395, (42, 52): 65396, (90, 57): 25890, (94, 57): 28059, (42, 53): 65397, (42, 54): 65398, (42, 55): 65399, (42, 56): 65400, (42, 57): 65401, (42, 58): 65402, (42, 59): 65403, 
(70, 66): 28678, (42, 60): 65404, (42, 61): 65405,
|
||||
|
@ -1,3 +1,6 @@
|
||||
"""
|
||||
Defines 8-bit mapping data for `codecs.dbextra`.
|
||||
"""
|
||||
from collections import xraydict
|
||||
from codecs.infrastructure import lazy_property
|
||||
from codecs.dbdata import more_dbdata, XEucJpIncrementalEncoder, XEucJpIncrementalDecoder, Windows31JIncrementalEncoder, Windows31JIncrementalDecoder, Big5EtenIncrementalEncoder, Big5HkscsIncrementalDecoder
|
||||
|
@ -1,11 +1,19 @@
|
||||
"""Underpinning infrastructure for the codecs module."""
|
||||
|
||||
from codecs.isweblabel import map_weblabel
|
||||
-def idstr(obj):
+def _idstr(obj):
|
||||
let reprd = object.__repr__(obj)
|
||||
return reprd.split(" at 0x")[1].split(">")[0]
|
||||
|
||||
let _encoder_registry = {}
|
||||
let _decoder_registry = {}
|
||||
def register_kuroko_codec(labels, incremental_encoder_class, incremental_decoder_class):
|
||||
"""
|
||||
Register a given `IncrementalEncoder` subclass and a given `IncrementalDecoder` subclass
|
||||
with a given list of labels. Usually, this is expected to include the encoding name, along
|
||||
with a list of labels for aliases and/or subsets of the encoding. Either coder class may be `None`,
|
||||
if the encoder/decoder labels are being registered asymmetrically.
|
||||
"""
|
||||
for label in labels:
|
||||
let norm = label.replace("_", "-").lower()
|
||||
if incremental_encoder_class:
|
||||
@ -28,15 +36,32 @@ def register_kuroko_codec(labels, incremental_encoder_class, incremental_decoder
|
||||
_decoder_registry[norm] = incremental_decoder_class
|
||||
|
||||
class KurokoCodecInfo:
|
||||
"""
|
||||
Descriptor for the registered encoder and decoder for a given label. Has five members:
|
||||
|
||||
- `name`: the label covered by this descriptor.
|
||||
- `encode`: encode a complete Unicode sequence.
|
||||
- `decode`: decode a complete byte sequence.
|
||||
- `incrementalencoder`: IncrementalEncoder subclass.
|
||||
- `incrementaldecoder`: IncrementalDecoder subclass.
|
||||
"""
|
||||
def __init__(label, encoder, decoder):
|
||||
self.name = label
|
||||
self.incrementalencoder = encoder
|
||||
self.incrementaldecoder = decoder
|
||||
def encode(string, errors="strict"):
|
||||
"""
|
||||
Encode a complete Unicode sequence to a complete byte string.
|
||||
The name passed to `errors=` is interpreted as documented for `lookup_error()`.
|
||||
"""
|
||||
if self.incrementalencoder:
|
||||
return self.incrementalencoder(errors).encode(string, True)
|
||||
raise ValueError(f"unrecognised encoding or decode-only encoding: {self.name!r}")
|
||||
def decode(data, errors="strict"):
|
||||
"""
|
||||
Decode a complete byte sequence to a complete Unicode stream.
|
||||
The name passed to `errors=` is interpreted as documented for `lookup_error()`.
|
||||
"""
|
||||
if self.incrementaldecoder:
|
||||
return self.incrementaldecoder(errors).decode(data, True)
|
||||
raise ValueError(f"unrecognised encoding or encode-only encoding: {self.name!r}")
|
||||
@ -66,9 +91,17 @@ class KurokoCodecInfo:
|
||||
ret += " (HTML5 " + repr(dec.html5name) + ")"
|
||||
else:
|
||||
ret += "; no decoder"
|
||||
-return ret + "; at 0x" + idstr(self) + ">"
+return ret + "; at 0x" + _idstr(self) + ">"
|
||||
|
||||
def lookup(label, web=False):
|
||||
"""
|
||||
Obtain a `KurokoCodecInfo` for a given label. If `web=False` (the default), will always succeed,
|
||||
but the resulting `KurokoCodecInfo` might be unable to encode and/or unable to decode if the
|
||||
label is not recognised in that direction. If `web=True`, will raise KeyError if the label is
|
||||
not a WHATWG-permitted label, and will map certain labels to undefined per the WHATWG spec.
|
||||
|
||||
Can be simply accessed as `codecs.lookup`.
|
||||
"""
|
||||
let proclabel = label.lower()
|
||||
if web:
|
||||
proclabel = map_weblabel(label)
|
||||
@ -85,15 +118,34 @@ def lookup(label, web=False):
|
||||
return KurokoCodecInfo(proclabel, enc, dec)
|
||||
|
||||
def encode(string, label, web=False, errors="strict"):
|
||||
"""
|
||||
Encode a complete Unicode sequence to a complete byte string in the given encoding. The `web=`
argument behaves as it does for `lookup()`, and the name passed to `errors=` is interpreted as
documented for `lookup_error()`.
|
||||
|
||||
Can be simply accessed as `codecs.encode`.
|
||||
"""
|
||||
return lookup(label, web = web).encode(string, errors=errors)
|
||||
|
||||
def decode(data, label, web=False, errors="strict"):
|
||||
"""
|
||||
Decode a complete byte sequence in the given encoding to a complete Unicode stream. The `web=`
argument behaves as it does for `lookup()`, and the name passed to `errors=` is interpreted as
documented for `lookup_error()`.
|
||||
|
||||
Can be simply accessed as `codecs.decode`.
|
||||
"""
|
||||
return lookup(label, web = web).decode(data, errors=errors)
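For comparison, a minimal sketch of the same lookup / encode / decode pattern using Python's standard `codecs` module, which these functions appear to be modelled on. This is Python rather than Kuroko, and it assumes the two APIs correspond. Known differences: Python's `lookup` raises `LookupError` for an unknown label instead of always succeeding, and Python has no `web=` argument. The sample string is arbitrary.

```python
import codecs

text = "한국어"  # arbitrary sample text, chosen to exercise a stateful 7-bit codec

# One-shot helpers: pass a label, get a complete result back.
encoded = codecs.encode(text, "iso-2022-kr")
decoded = codecs.decode(encoded, "iso-2022-kr")

# The descriptor returned by lookup() also exposes the incremental classes.
info = codecs.lookup("iso-2022-kr")
encoder = info.incrementalencoder("strict")
also_encoded = encoder.encode(text, final=True)

print(info.name)
print(decoded == text)
```

Calling the incremental encoder once with `final=True` is the same pattern `KurokoCodecInfo.encode` uses above.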
|
||||
|
||||
# Constructor is e.g. UnicodeEncodeError(encoding, object, start, end, reason)
|
||||
# Wouldn't it be wonderful if Python bloody documented that anywhere (e.g. manual or docstring)?
|
||||
# -- Har.
|
||||
class UnicodeError(ValueError):
|
||||
"""
|
||||
Exception raised when an error is encountered or detected in the process of encoding or
|
||||
decoding. May instead be passed to a handler when not in strict mode. Contains machine-readable
|
||||
information about the error encountered, allowing approaches to respond to it.
|
||||
"""
|
||||
def __init__(encoding, object, start, end, reason):
|
||||
self.encoding = encoding
|
||||
self.object = object
|
||||
@ -113,27 +165,66 @@ class UnicodeError(ValueError):
|
||||
return f"codec for {self.encoding!r} cannot process sequence {slice!r}: {self.reason}"
|
||||
|
||||
class UnicodeEncodeError(UnicodeError):
|
||||
"""
|
||||
UnicodeError subclass raised when an error is encountered in the process of encoding.
|
||||
"""
|
||||
class UnicodeDecodeError(UnicodeError):
|
||||
"""
|
||||
UnicodeError subclass raised when an error is encountered in the process of decoding.
|
||||
"""
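The constructor order noted in the comment above can be checked against Python's built-in exceptions, which these classes mirror. A small sketch in Python (not Kuroko), with arbitrary sample values:

```python
# Constructor order: (encoding, object, start, end, reason).
err = UnicodeEncodeError("ascii", "caf\u00e9", 3, 4, "ordinal not in range(128)")

# start and end delimit the slice of the input that could not be processed.
bad = err.object[err.start:err.end]
print(err.encoding, repr(bad), err.reason)

# A decode error carries a bytes object rather than a str.
derr = UnicodeDecodeError("ascii", b"ab\xff", 2, 3, "ordinal not in range(128)")
print(isinstance(derr, UnicodeError), isinstance(derr, ValueError))
```

These attributes are what an error handler inspects to choose a substitute and a resume index.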
|
||||
|
||||
let _error_registry = {}
|
||||
|
||||
def register_error(name, handler):
|
||||
"""
|
||||
Register a new error handler. The handler should be a function taking a `UnicodeError` and
|
||||
either raising an exception or returning a tuple of (substitute, resume_index). The substitute
|
||||
should be bytes (usually expected to be in ASCII) for a `UnicodeEncodeError`, str otherwise.
|
||||
"""
|
||||
_error_registry[name] = handler
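A minimal sketch of the handler contract described above, written against Python's `codecs.register_error`, which uses the same `(substitute, resume_index)` convention. This is Python rather than Kuroko; the handler name `demo-underscore` and the function `underscore_errors` are made up for illustration.

```python
import codecs

def underscore_errors(exc):
    """Substitute one underscore per unencodable character; re-raise anything else."""
    if isinstance(exc, UnicodeEncodeError):
        # bytes substitute for an encode error, then resume just past the bad span
        return (b"_" * (exc.end - exc.start), exc.end)
    raise exc

codecs.register_error("demo-underscore", underscore_errors)

encoded = "caf\u00e9 au lait".encode("ascii", errors="demo-underscore")
print(encoded)
```

Returning `exc.end` as the resume index skips the offending span. A handler may also return a negative index, which the codecs in this diff treat as relative to the end of the input.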
|
||||
|
||||
def lookup_error(name):
|
||||
"""
|
||||
Look up an error handler function registered with a certain name. By default, the following
|
||||
are registered. It is important to note that nothing obligates a codec to actually *use* the
|
||||
error handler if it is not deemed possible or appropriate, and so specifying a non-strict
|
||||
error handler will not guarantee an exception will not be raised, especially when working with
|
||||
a codec which is not a "normal" text encoding (e.g. `undefined` or `inverse-base64`).
|
||||
|
||||
- `strict`: raise an exception.
|
||||
- `ignore`: skip invalid substrings. Not always recommended: can facilitate masked injection.
|
||||
- `replace`: insert a replacement character (decoding) or question mark (encoding).
|
||||
- `warnreplace`: like `replace` but prints a message to stderr; good for debugging.
|
||||
- `backslashreplace`: replace with Python/Kuroko style Unicode escapes. Note that this only
|
||||
matches JavaScript escape syntax for Basic Multilingual Plane characters. Encoding only.
|
||||
- `xmlcharrefreplace`: replace with HTML/XML numerical entities. Note that this will, per
|
||||
WHATWG, never generate entities for Shift Out, Shift In and Escape (i.e. when encoding to a
|
||||
stateful encoding which uses them, e.g. ISO-2022-JP), instead generating an entity for the
|
||||
replacement character. Encoding only.
|
||||
"""
|
||||
return _error_registry[name]
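The handlers listed above can be compared side by side using Python's built-in handlers of the same names. This sketch is Python, not Kuroko: Python has no `warnreplace` handler, and its `backslashreplace` is not limited to encoding, so only the encoding direction is shown for the escape-style handlers.

```python
text = "a\u20acb"        # U+20AC EURO SIGN cannot be encoded as ASCII
astral = "\U0001f600"    # a character outside the Basic Multilingual Plane

results = {
    "ignore": text.encode("ascii", "ignore"),
    "replace": text.encode("ascii", "replace"),
    "backslashreplace": text.encode("ascii", "backslashreplace"),
    "xmlcharrefreplace": text.encode("ascii", "xmlcharrefreplace"),
}

# Beyond the BMP the escape takes the \UXXXXXXXX form, which JavaScript does not accept.
astral_escape = astral.encode("ascii", "backslashreplace")

# When decoding, "replace" inserts U+FFFD instead of a question mark.
decoded = b"a\xffb".decode("ascii", "replace")

for name, value in results.items():
    print(name, value)
```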
|
||||
|
||||
def strict_errors(exc):
|
||||
"""
|
||||
Handler for `strict` errors: raise the exception.
|
||||
"""
|
||||
raise exc
|
||||
register_error("strict", strict_errors)
|
||||
|
||||
def ignore_errors(exc):
|
||||
"""
|
||||
Handler for `ignore` errors: skip invalid sequences.
|
||||
"""
|
||||
if isinstance(exc, UnicodeEncodeError):
|
||||
return (b"", exc.end)
|
||||
return ("", exc.end)
|
||||
register_error("ignore", ignore_errors)
|
||||
|
||||
def replace_errors(exc):
|
||||
"""
|
||||
Handler for `replace` errors: insert replacement character (if decoding) or
|
||||
question mark (if encoding).
|
||||
"""
|
||||
if isinstance(exc, UnicodeEncodeError):
|
||||
return (b"?", exc.end)
|
||||
else if isinstance(exc, UnicodeDecodeError):
|
||||
@ -143,6 +234,10 @@ def replace_errors(exc):
|
||||
register_error("replace", replace_errors)
|
||||
|
||||
def warnreplace_errors(exc):
|
||||
"""
|
||||
Handler for `warnreplace` errors: insert replacement character (if decoding) or question mark
|
||||
(if encoding) and print a warning to `stderr`.
|
||||
"""
|
||||
import fileio
|
||||
fileio.stderr.write(type(exc).__name__ + ": " + str(exc) + "\n")
|
||||
if isinstance(exc, UnicodeEncodeError):
|
||||
@ -154,6 +249,11 @@ def warnreplace_errors(exc):
|
||||
register_error("warnreplace", warnreplace_errors)
|
||||
|
||||
def backslashreplace_errors(exc):
|
||||
"""
|
||||
Handler for `backslashreplace` errors: replace unencodable character with Python/Kuroko style
|
||||
escape sequence. For Basic Multilingual Plane characters, this also matches JavaScript; beyond
|
||||
that, they differ.
|
||||
"""
|
||||
if isinstance(exc, UnicodeEncodeError):
|
||||
# Work around str.format not supporting format specifiers
|
||||
let myhex = hex(ord(exc.object[exc.start])).split("x", 1)[1]
|
||||
@ -170,6 +270,11 @@ def backslashreplace_errors(exc):
|
||||
register_error("backslashreplace", backslashreplace_errors)
|
||||
|
||||
def xmlcharrefreplace_errors(exc):
|
||||
"""
|
||||
Handler for `xmlcharrefreplace` errors: replace unencodable character with XML numeric entity
|
||||
for the character unless it is Shift Out, Shift In or Escape, in which case insert the XML
|
||||
numeric entity for the replacement character (as stipulated by WHATWG for ISO-2022-JP).
|
||||
"""
|
||||
if isinstance(exc, UnicodeEncodeError):
|
||||
let codepoint = ord(exc.object[exc.start])
|
||||
# Per WHATWG (specified in its ISO-2022-JP encoder, the only one that
|
||||
@ -181,6 +286,10 @@ def xmlcharrefreplace_errors(exc):
|
||||
register_error("xmlcharrefreplace", xmlcharrefreplace_errors)
|
||||
|
||||
class ByteCatenator:
|
||||
"""
|
||||
Helper class for maintaining a stream to which `bytes` objects will be repeatedly catenated
|
||||
in place.
|
||||
"""
|
||||
def __init__():
|
||||
self.list = []
|
||||
def add(data):
|
||||
@ -189,6 +298,10 @@ class ByteCatenator:
|
||||
return b"".join(self.list)
|
||||
|
||||
class StringCatenator:
|
||||
"""
|
||||
Helper class for maintaining a stream to which `str` objects will be repeatedly catenated
|
||||
in place.
|
||||
"""
|
||||
def __init__():
|
||||
self.list = []
|
||||
def add(string):
|
||||
@ -197,6 +310,13 @@ class StringCatenator:
|
||||
return "".join(self.list)
|
||||
|
||||
class IncrementalEncoder:
|
||||
"""
|
||||
Incremental encoder, allowing more encoded data to be generated as more Unicode data is
|
||||
obtained. Note that the return values from `encode` are not guaranteed to encompass all data
|
||||
which has been passed in, until it is called with `final=True`.
|
||||
|
||||
This is the base class and should not be instantiated directly.
|
||||
"""
|
||||
name = None
|
||||
html5name = None
|
||||
def __init__(errors):
|
||||
@ -207,15 +327,41 @@ class IncrementalEncoder:
|
||||
let w = "(non-HTML5)"
|
||||
if self.html5name:
|
||||
w = f"(HTML5 {self.html5name!r})"
|
||||
-let addr = idstr(self)
+let addr = _idstr(self)
|
||||
return f"<{c.__name__} instance: encoder for {self.name!r} {w} at 0x{addr}>"
|
||||
def encode(string, final = False):
|
||||
"""
|
||||
Passes the given string in to the encoder, and returns a sequence of bytes. When
|
||||
final=False, the return value might not represent the entire input (some of which may
|
||||
become represented at the start of the value returned by the next call). When final=True,
|
||||
all of the input will be represented, and any final state change sequence required by the
|
||||
encoding will be outputted.
|
||||
"""
|
||||
raise NotImplementedError("must be implemented by subclass")
|
||||
def reset():
|
||||
"""
|
||||
Reset encoder to initial state, without outputting, discarding any pending data.
|
||||
"""
|
||||
pass
|
||||
def getstate():
|
||||
"""
|
||||
Returns an arbitrary object encapsulating encoder state.
|
||||
"""
|
||||
pass
|
||||
def setstate(state):
|
||||
"""
|
||||
Sets encoder state to one previously returned by getstate().
|
||||
"""
|
||||
pass
|
||||
|
||||
class IncrementalDecoder:
|
||||
"""
|
||||
Incremental decoder, allowing more Unicode data to be generated as more encoded data is
|
||||
obtained. Note that the return values from `decode` are not guaranteed to encompass all data
|
||||
which has been passed in, until it is called with `final=True`.
|
||||
|
||||
This is the base class and should not be instantiated directly.
|
||||
"""
|
||||
name = None
|
||||
html5name = None
|
||||
def __init__(errors):
|
||||
@ -226,11 +372,20 @@ class IncrementalDecoder:
|
||||
let w = "(non-HTML5)"
|
||||
if self.html5name:
|
||||
w = f"(HTML5 {self.html5name!r})"
|
||||
-let addr = idstr(self)
+let addr = _idstr(self)
|
||||
return f"<{c.__name__} instance: decoder for {self.name!r} {w} at 0x{addr}>"
|
||||
def decode(data_in, final = False):
|
||||
"""
|
||||
Passes the given bytes in to the decoder, and returns a Unicode string. When
|
||||
final=False, the return value might not represent the entire input (some of which may
|
||||
become represented at the start of the value returned by the next call). When final=True,
|
||||
all of the input will be represented, and an error will be generated if it is truncated.
|
||||
"""
|
||||
raise NotImplementedError("must be implemented by subclass")
|
||||
def _handle_truncation(out, unused, final, data, offset, leader):
|
||||
"""
|
||||
Helper function used by subclasses to handle any pending data when returning from `decode`.
|
||||
"""
|
||||
if len(leader) == 0:
|
||||
return out.getvalue()
|
||||
else if final:
|
||||
@ -242,13 +397,30 @@ class IncrementalDecoder:
|
||||
self.pending = bytes(leader)
|
||||
return out.getvalue()
|
||||
def reset():
|
||||
"""
|
||||
Reset decoder to initial state, without outputting, discarding any pending data.
|
||||
"""
|
||||
self.pending = b""
|
||||
def getstate():
|
||||
"""
|
||||
Returns an arbitrary object encapsulating decoder state.
|
||||
"""
|
||||
return self.pending
|
||||
def setstate(state):
|
||||
"""
|
||||
Sets decoder state to one previously returned by getstate().
|
||||
"""
|
||||
self.pending = state
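The pending-data behaviour documented above can be seen with Python's incremental UTF-8 decoder, used here as a stand-in on the assumption that the Kuroko classes behave analogously. This is Python, not Kuroko; the state tuple shown is specific to Python's implementation.

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")

# U+3042 is E3 81 82 in UTF-8; feed it split across two calls.
first = decoder.decode(b"\xe3\x81")   # incomplete sequence: nothing is emitted yet
pending_state = decoder.getstate()    # the buffered bytes are part of the state
second = decoder.decode(b"\x82")      # completes the character

# With final=True, a truncated sequence is an error instead of pending data.
decoder.reset()
try:
    decoder.decode(b"\xe3\x81", final=True)
    truncated_error = None
except UnicodeDecodeError as exc:
    truncated_error = exc

print(repr(first), repr(second), pending_state)
```

The output of the first call is empty because the decoder holds the incomplete lead bytes until the next call, which is the role `self.pending` plays in the base class above.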
|
||||
|
||||
class AsciiIncrementalEncoder(IncrementalEncoder):
|
||||
"""
|
||||
Encoder for ISO/IEC 4873-DV, and base class for simple _sensu lato_ extended ASCII encoders.
|
||||
Encoders for more complex cases, such as ISO-2022-JP, do not inherit from this class.
|
||||
|
||||
ISO/IEC 4873-DV is, as of the current (third) edition of ISO/IEC 4873, the same as what
|
||||
people usually mean when they say "ASCII" (i.e. an eighth bit exists but is never used, and
|
||||
backspace composition is not a thing which exists for encoding characters).
|
||||
"""
|
||||
# The obvious labels for ASCII are all Windows-1252 per WHATWG. Also, what people call
|
||||
# "ASCII" in 8-bit-byte contexts (without backspace combining) is properly ISO-4873-DV.
|
||||
name = "ecma-43-dv"
|
||||
@ -266,6 +438,7 @@ class AsciiIncrementalEncoder(IncrementalEncoder):
|
||||
if isinstance(i, tuple):
|
||||
self._lead_codes.setdefault(i[0], []).append(i)
|
||||
def encode(string_in, final = False):
|
||||
"""Implements `IncrementalEncoder.encode`"""
|
||||
let string = self.pending_lead + string_in
|
||||
self.pending_lead = ""
|
||||
let out = ByteCatenator()
|
||||
@ -313,13 +486,24 @@ class AsciiIncrementalEncoder(IncrementalEncoder):
|
||||
if offset < 0:
|
||||
offset += len(string)
|
||||
def reset():
|
||||
"""Implements `IncrementalEncoder.reset`"""
|
||||
self.pending_lead = ""
|
||||
def getstate():
|
||||
"""Implements `IncrementalEncoder.getstate`"""
|
||||
return self.pending_lead
|
||||
def setstate(state):
|
||||
"""Implements `IncrementalEncoder.setstate`"""
|
||||
self.pending_lead = state
|
||||
|
||||
class AsciiIncrementalDecoder(IncrementalDecoder):
|
||||
"""
|
||||
Decoder for ISO/IEC 4873-DV, and base class for simple _sensu lato_ extended ASCII decoders.
|
||||
Decoders for more complex cases, such as ISO-2022-JP, do not inherit from this class.
|
||||
|
||||
ISO/IEC 4873-DV is, as of the current (third) edition of ISO/IEC 4873, the same as what
|
||||
people usually mean when they say "ASCII" (i.e. an eighth bit exists but is never used, and
|
||||
backspace composition is not a thing which exists for encoding characters).
|
||||
"""
|
||||
name = "ecma-43-dv"
|
||||
html5name = None
|
||||
# For non-ASCII characters (this should work as a base class)
|
||||
@ -329,6 +513,7 @@ class AsciiIncrementalDecoder(IncrementalDecoder):
|
||||
trailrange = ()
|
||||
ascii_exceptions = ()
|
||||
def decode(data_in, final = False):
|
||||
"""Implements `IncrementalDecoder.decode`"""
|
||||
let data = self.pending + data_in
|
||||
self.pending = b""
|
||||
let out = StringCatenator()
|
||||
@ -392,13 +577,19 @@ class AsciiIncrementalDecoder(IncrementalDecoder):
|
||||
out.add(errorret[0])
|
||||
offset = errorret[1]
|
||||
if offset < 0:
|
||||
-offset += len(string)
+offset += len(data)
|
||||
|
||||
register_kuroko_codec(["ecma-43-dv", "iso-4873-dv", "646", "cp367", "ibm367", "iso646-us",
|
||||
"iso-646.irv-1991", "iso-ir-6", "us", "csascii"],
|
||||
AsciiIncrementalEncoder, AsciiIncrementalDecoder)
|
||||
|
||||
class BaseEbcdicIncrementalEncoder(IncrementalEncoder):
|
||||
"""
|
||||
Base class for EBCDIC encoders.
|
||||
|
||||
On its own, it is only capable of encoding `U+3000` (as ``x'0E', x'40', x'40', x'0F'``); hence,
|
||||
it should not, generally speaking, be used directly.
|
||||
"""
|
||||
name = None
|
||||
html5name = None
|
||||
sbcs_encode = {}
|
||||
@ -407,6 +598,7 @@ class BaseEbcdicIncrementalEncoder(IncrementalEncoder):
|
||||
shift_to_dbcs = 0x0E
|
||||
shift_to_sbcs = 0x0F
|
||||
def encode(string, final = False):
|
||||
"""Implements `IncrementalEncoder.encode`"""
|
||||
let out = ByteCatenator()
|
||||
let offset = 0
|
||||
while 1: # offset can be arbitrarily changed by the error handler, so not a for
|
||||
@ -450,13 +642,22 @@ class BaseEbcdicIncrementalEncoder(IncrementalEncoder):
|
||||
if offset < 0:
|
||||
offset += len(string)
|
||||
def reset():
|
||||
"""Implements `IncrementalEncoder.reset`"""
|
||||
self.in_dbcshost = False
|
||||
def getstate():
|
||||
"""Implements `IncrementalEncoder.getstate`"""
|
||||
return self.in_dbcshost
|
||||
def setstate(state):
|
||||
"""Implements `IncrementalEncoder.setstate`"""
|
||||
self.in_dbcshost = state
|
||||
|
||||
class BaseEbcdicIncrementalDecoder(IncrementalDecoder):
|
||||
"""
|
||||
Base class for EBCDIC decoders.
|
||||
|
||||
On its own, it is only capable of decoding `U+3000` (from ``x'0E', x'40', x'40', x'0F'``); hence,
|
||||
it should not, generally speaking, be used directly.
|
||||
"""
|
||||
name = None
|
||||
html5name = None
|
||||
sbcs_decode = {}
|
||||
@ -465,6 +666,7 @@ class BaseEbcdicIncrementalDecoder(IncrementalDecoder):
|
||||
shift_to_dbcs = 0x0E
|
||||
shift_to_sbcs = 0x0F
|
||||
def decode(data_in, final = False):
|
||||
"""Implements `IncrementalDecoder.decode`"""
|
||||
let data = self.pending + data_in
|
||||
self.pending = b""
|
||||
let out = StringCatenator()
|
||||
@ -537,17 +739,24 @@ class BaseEbcdicIncrementalDecoder(IncrementalDecoder):
|
||||
out.add(errorret[0])
|
||||
offset = errorret[1]
|
||||
if offset < 0:
|
||||
-offset += len(string)
+offset += len(data)
|
||||
    def reset():
        """Implements `IncrementalDecoder.reset`"""
        self.pending = b""
        self.in_dbcshost = False
    def getstate():
        """Implements `IncrementalDecoder.getstate`"""
        return (self.pending, self.in_dbcshost)
    def setstate(state):
        """Implements `IncrementalDecoder.setstate`"""
        self.pending = state[0]
        self.in_dbcshost = state[1]
|
||||
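The `shift_to_dbcs` (0x0E) and `shift_to_sbcs` (0x0F) bytes and the `in_dbcshost` flag above describe a stateful encoding in which double-byte runs are bracketed by shift codes. The following Python sketch illustrates that idea under stated assumptions: it encodes a whole string in one call rather than incrementally, it assumes the encoder shifts back to single-byte mode at the end of input, and `ebcdic_stateful_encode` with its two map parameters is a hypothetical helper, not the module's API.

```python
SHIFT_TO_DBCS = 0x0E  # value of shift_to_dbcs in the classes above
SHIFT_TO_SBCS = 0x0F  # value of shift_to_sbcs in the classes above


def ebcdic_stateful_encode(string, sbcs_encode, dbcs_encode):
    """One-shot sketch: maps are {code point: byte} and {code point: (byte, byte)}."""
    out = bytearray()
    in_dbcs = False
    for index, ch in enumerate(string):
        code = ord(ch)
        if code in sbcs_encode:
            if in_dbcs:
                # Leave double-byte mode before emitting a single-byte code.
                out.append(SHIFT_TO_SBCS)
                in_dbcs = False
            out.append(sbcs_encode[code])
        elif code in dbcs_encode:
            if not in_dbcs:
                # Enter double-byte mode before emitting a double-byte code.
                out.append(SHIFT_TO_DBCS)
                in_dbcs = True
            out += bytes(dbcs_encode[code])
        else:
            raise UnicodeEncodeError("ebcdic-sketch", string, index, index + 1,
                                     "character not supported by target encoding")
    if in_dbcs:
        # Assumption: the stream is returned to single-byte mode at the end.
        out.append(SHIFT_TO_SBCS)
    return bytes(out)
```

With an empty single-byte map and only `U+3000` in the double-byte map, this reproduces the `x'0E', x'40', x'40', x'0F'` sequence quoted in the base class docstrings.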
class UndefinedIncrementalEncoder(IncrementalEncoder):
    """
    Encoder which errors out on all input. For use on input for which encoding should not be
    attempted. Error handler is ignored.
    """
    name = "undefined"
    html5name = "replacement"
    # WHATWG doesn't specify an encoder for "replacement" so follow Python "undefined" here.
@ -558,6 +767,10 @@ class UndefinedIncrementalEncoder(IncrementalEncoder):
|
||||
strict_errors(error)
|
||||
|
||||
class UndefinedIncrementalDecoder(IncrementalDecoder):
    """
    Decoder which errors out on all input. For use on input for which decoding should not be
    attempted. Error handler is honoured, and called once per non-empty `decode` method call.
    """
    name = "undefined"
    html5name = "replacement"
    def decode(data, final = False):
@ -574,6 +787,10 @@ register_kuroko_codec(
def lazy_property(method):
    """
    Like property(…), but memoises the value returned. The return value is assumed to be
    constant at the class level, i.e. the same for all instances.
    """
    let memo = None
    def retriever(this):
        if memo == None:
@ -583,6 +800,9 @@ def lazy_property(method):
|
||||
|
||||
|
||||
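The `lazy_property` docstring above describes a memoising property whose value is shared by every instance. Since the hunk elides the rest of the function, the following is only a Python sketch of that behaviour, assuming the elided part calls `method` once and wraps `retriever` in a property; the `Example` class is hypothetical.

```python
def lazy_property(method):
    """Like property(...), but computes the value once and shares it across instances."""
    memo = None

    def retriever(this):
        nonlocal memo
        if memo is None:
            # First access: build the value and keep it for all later accesses.
            memo = method(this)
        return memo

    return property(retriever)


class Example:
    calls = 0  # counts how often the table is actually built

    @lazy_property
    def table(self):
        Example.calls += 1
        return {0x41: 0xC1}
```

This is what lets the large mapping tables in the codec classes be built on first use rather than at import time.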
class encodesto7bit:
    """
    Encoding map for a 7-bit set, wrapping an encoding map for an 8-bit EUC or EUC-superset encoding.
    """
    def __init__(base):
        self.base = base
    def __contains__(key):
@ -613,6 +833,9 @@ class encodesto7bit:
class decodesto7bit:
    """
    Decoding map for a 7-bit set, wrapping a decoding map for an 8-bit EUC or EUC-superset encoding.
    """
    def __init__(base):
        self.base = base
    def __contains__(key):
81
modules/codecs/pifonts.krk
Normal file
@ -0,0 +1,81 @@
"""
This module includes codecs implementing special handling for symbol fonts.
"""

from codecs.infrastructure import register_kuroko_codec, ByteCatenator, StringCatenator, UnicodeEncodeError, UnicodeDecodeError, lookup_error, lookup, IncrementalEncoder, IncrementalDecoder, lazy_property
from collections import xraydict

class Cp042IncrementalEncoder(IncrementalEncoder):
    """
    Encoder for Windows code page 42 (GDI Symbol), and base class for symbol font encoders.

    This maps characters to PUA with the low 8 bits matching the original byte encoding, similarly
    to `x-user-defined`, but using a different PUA range and including all non-C0 bytes, not
    only non-ASCII bytes.
    """
    name = "cp042"
    html5name = None
    encoding_map = {}
    def encode(string, final = False):
        """Implements `IncrementalEncoder.encode`"""
        let out = ByteCatenator()
        let offset = 0
        while 1: # offset can be arbitrarily changed by the error handler, so not a for
            if offset >= len(string):
                return out.getvalue()
            let i = string[offset]
            if ord(i) in self.encoding_map:
                let target = self.encoding_map[ord(i)]
                out.add(bytes([target]))
                offset += 1
            else if ord(i) < 0x100:
                # U+0020 thru U+00FF are accepted by GDI itself, but not by Code page 42
                # as implemented by Microsoft, which has caused problems:
                # http://archives.miloush.net/michkap/archive/2005/11/08/490495.html
                out.add(bytes([ord(i)]))
                offset += 1
            else if (0xF020 <= ord(i)) and (ord(i) < 0xF100):
                out.add(bytes([ord(i) - 0xF000]))
                offset += 1
            else if (0xF780 <= ord(i)) and (ord(i) < 0xF800):
                # Accept (not generate) the x-user-defined range as well, because why not?
                out.add(bytes([ord(i) - 0xF700]))
                offset += 1
            else:
                let error = UnicodeEncodeError(self.name, string, offset, offset + 1,
                            "character not supported by target encoding")
                let errorret = lookup_error(self.errors)(error)
                out.add(errorret[0])
                offset = errorret[1]
                if offset < 0:
                    offset += len(string)

class Cp042IncrementalDecoder(IncrementalDecoder):
    """
    Decoder for Windows code page 42 (GDI Symbol), and base class for symbol font decoders.

    This maps characters to PUA with the low 8 bits matching the original byte encoding, similarly
    to `x-user-defined`, but using a different PUA range and including all non-C0 bytes, not
    only non-ASCII bytes.
    """
    name = "cp042"
    html5name = None
    decoding_map = {}
    def decode(data, final = False):
        """Implements `IncrementalDecoder.decode`"""
        self.pending = b""
        let out = StringCatenator()
        let offset = 0
        for i in data:
            if i in self.decoding_map:
                out.add(chr(self.decoding_map[i]))
            else if i < 0x20:
                out.add(chr(i))
            else:
                out.add(chr(i + 0xF000))
        return out.getvalue()

register_kuroko_codec(["cp042"], Cp042IncrementalEncoder, Cp042IncrementalDecoder)
|
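The byte-to-PUA arithmetic in the cp042 codec above can be checked in isolation. The following Python sketch mirrors the branches of `encode` and `decode` for the base class, where `encoding_map` and `decoding_map` are empty; it raises directly instead of calling an error handler, and the function names `cp042_decode` and `cp042_encode` are hypothetical.

```python
def cp042_decode(data):
    """C0 bytes map to themselves; every other byte maps to U+F000 plus the byte."""
    out = []
    for byte in data:
        if byte < 0x20:
            out.append(chr(byte))
        else:
            out.append(chr(byte + 0xF000))
    return "".join(out)


def cp042_encode(string):
    """Accept U+0000..U+00FF, the U+F020..U+F0FF range and the x-user-defined range."""
    out = bytearray()
    for index, ch in enumerate(string):
        code = ord(ch)
        if code < 0x100:
            out.append(code)
        elif 0xF020 <= code < 0xF100:
            out.append(code - 0xF000)
        elif 0xF780 <= code < 0xF800:
            # Accepted on input only; cp042_decode never produces this range.
            out.append(code - 0xF700)
        else:
            raise UnicodeEncodeError("cp042", string, index, index + 1,
                                     "character not supported by target encoding")
    return bytes(out)
```

Decoding followed by encoding round-trips every byte value, while encoding is deliberately more permissive than decoding.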
@ -1,11 +1,18 @@
|
||||
"""
|
||||
This module includes some additional single-byte encodings not specified by WHATWG. As such, none
|
||||
of the codecs in this module should be used in HTML.
|
||||
This module includes some additional single-byte encodings not specified by WHATWG.
|
||||
|
||||
As such, none of the codecs in this module should be used in HTML.
|
||||
"""
|
||||
|
||||
from codecs.infrastructure import AsciiIncrementalEncoder, AsciiIncrementalDecoder, register_kuroko_codec, BaseEbcdicIncrementalEncoder, BaseEbcdicIncrementalDecoder, lazy_property

class Cp037IncrementalEncoder(BaseEbcdicIncrementalEncoder):
    """
    IncrementalEncoder implementation for EBCDIC-037.

    This is what might be considered the "default" EBCDIC set, and is used in the United States,
    the Netherlands, Portugal, Brazil, Australia and New Zealand, and on the ESA/390 in Canada.
    """
    name = 'cp037'
    html5name = None
    @lazy_property
@ -13,6 +20,12 @@ class Cp037IncrementalEncoder(BaseEbcdicIncrementalEncoder):
|
||||
return {0: 0, 1: 1, 2: 2, 3: 3, 156: 4, 9: 5, 134: 6, 127: 7, 151: 8, 141: 9, 142: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19, 157: 20, 133: 21, 8: 22, 135: 23, 24: 24, 25: 25, 146: 26, 143: 27, 28: 28, 29: 29, 30: 30, 31: 31, 128: 32, 129: 33, 130: 34, 131: 35, 132: 36, 10: 37, 23: 38, 27: 39, 136: 40, 137: 41, 138: 42, 139: 43, 140: 44, 5: 45, 6: 46, 7: 47, 144: 48, 145: 49, 22: 50, 147: 51, 148: 52, 149: 53, 150: 54, 4: 55, 152: 56, 153: 57, 154: 58, 155: 59, 20: 60, 21: 61, 158: 62, 26: 63, 32: 64, 160: 65, 226: 66, 228: 67, 224: 68, 225: 69, 227: 70, 229: 71, 231: 72, 241: 73, 162: 74, 46: 75, 60: 76, 40: 77, 43: 78, 124: 79, 38: 80, 233: 81, 234: 82, 235: 83, 232: 84, 237: 85, 238: 86, 239: 87, 236: 88, 223: 89, 33: 90, 36: 91, 42: 92, 41: 93, 59: 94, 172: 95, 45: 96, 47: 97, 194: 98, 196: 99, 192: 100, 193: 101, 195: 102, 197: 103, 199: 104, 209: 105, 166: 106, 44: 107, 37: 108, 95: 109, 62: 110, 63: 111, 248: 112, 201: 113, 202: 114, 203: 115, 200: 116, 205: 117, 206: 118, 207: 119, 204: 120, 96: 121, 58: 122, 35: 123, 64: 124, 39: 125, 61: 126, 34: 127, 216: 128, 97: 129, 98: 130, 99: 131, 100: 132, 101: 133, 102: 134, 103: 135, 104: 136, 105: 137, 171: 138, 187: 139, 240: 140, 253: 141, 254: 142, 177: 143, 176: 144, 106: 145, 107: 146, 108: 147, 109: 148, 110: 149, 111: 150, 112: 151, 113: 152, 114: 153, 170: 154, 186: 155, 230: 156, 184: 157, 198: 158, 164: 159, 181: 160, 126: 161, 115: 162, 116: 163, 117: 164, 118: 165, 119: 166, 120: 167, 121: 168, 122: 169, 161: 170, 191: 171, 208: 172, 221: 173, 222: 174, 174: 175, 94: 176, 163: 177, 165: 178, 183: 179, 169: 180, 167: 181, 182: 182, 188: 183, 189: 184, 190: 185, 91: 186, 93: 187, 175: 188, 168: 189, 180: 190, 215: 191, 123: 192, 65: 193, 66: 194, 67: 195, 68: 196, 69: 197, 70: 198, 71: 199, 72: 200, 73: 201, 173: 202, 244: 203, 246: 204, 242: 205, 243: 206, 245: 207, 125: 208, 74: 209, 75: 210, 76: 211, 77: 212, 78: 213, 79: 214, 80: 215, 81: 216, 82: 217, 185: 218, 
251: 219, 252: 220, 249: 221, 250: 222, 255: 223, 92: 224, 247: 225, 83: 226, 84: 227, 85: 228, 86: 229, 87: 230, 88: 231, 89: 232, 90: 233, 178: 234, 212: 235, 214: 236, 210: 237, 211: 238, 213: 239, 48: 240, 49: 241, 50: 242, 51: 243, 52: 244, 53: 245, 54: 246, 55: 247, 56: 248, 57: 249, 179: 250, 219: 251, 220: 252, 217: 253, 218: 254, 159: 255}

class Cp037IncrementalDecoder(BaseEbcdicIncrementalDecoder):
    """
    IncrementalDecoder implementation for EBCDIC-037.

    This is what might be considered the "default" EBCDIC set, and is used in the United States,
    the Netherlands, Portugal, Brazil, Australia and New Zealand, and on the ESA/390 in Canada.
    """
    name = 'cp037'
    html5name = None
    @lazy_property
@ -23,6 +36,9 @@ register_kuroko_codec(['cp037', '037', 'csibm037', 'ebcdic-cp-ca', 'ebcdic-cp-nl
class Cp273IncrementalEncoder(BaseEbcdicIncrementalEncoder):
    """
    IncrementalEncoder implementation for EBCDIC-273 (used in German-speaking locales).
    """
    name = 'cp273'
    html5name = None
    @lazy_property
@ -30,6 +46,9 @@ class Cp273IncrementalEncoder(BaseEbcdicIncrementalEncoder):
|
||||
return {0: 0, 1: 1, 2: 2, 3: 3, 156: 4, 9: 5, 134: 6, 127: 7, 151: 8, 141: 9, 142: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19, 157: 20, 133: 21, 8: 22, 135: 23, 24: 24, 25: 25, 146: 26, 143: 27, 28: 28, 29: 29, 30: 30, 31: 31, 128: 32, 129: 33, 130: 34, 131: 35, 132: 36, 10: 37, 23: 38, 27: 39, 136: 40, 137: 41, 138: 42, 139: 43, 140: 44, 5: 45, 6: 46, 7: 47, 144: 48, 145: 49, 22: 50, 147: 51, 148: 52, 149: 53, 150: 54, 4: 55, 152: 56, 153: 57, 154: 58, 155: 59, 20: 60, 21: 61, 158: 62, 26: 63, 32: 64, 160: 65, 226: 66, 123: 67, 224: 68, 225: 69, 227: 70, 229: 71, 231: 72, 241: 73, 196: 74, 46: 75, 60: 76, 40: 77, 43: 78, 33: 79, 38: 80, 233: 81, 234: 82, 235: 83, 232: 84, 237: 85, 238: 86, 239: 87, 236: 88, 126: 89, 220: 90, 36: 91, 42: 92, 41: 93, 59: 94, 94: 95, 45: 96, 47: 97, 194: 98, 91: 99, 192: 100, 193: 101, 195: 102, 197: 103, 199: 104, 209: 105, 246: 106, 44: 107, 37: 108, 95: 109, 62: 110, 63: 111, 248: 112, 201: 113, 202: 114, 203: 115, 200: 116, 205: 117, 206: 118, 207: 119, 204: 120, 96: 121, 58: 122, 35: 123, 167: 124, 39: 125, 61: 126, 34: 127, 216: 128, 97: 129, 98: 130, 99: 131, 100: 132, 101: 133, 102: 134, 103: 135, 104: 136, 105: 137, 171: 138, 187: 139, 240: 140, 253: 141, 254: 142, 177: 143, 176: 144, 106: 145, 107: 146, 108: 147, 109: 148, 110: 149, 111: 150, 112: 151, 113: 152, 114: 153, 170: 154, 186: 155, 230: 156, 184: 157, 198: 158, 164: 159, 181: 160, 223: 161, 115: 162, 116: 163, 117: 164, 118: 165, 119: 166, 120: 167, 121: 168, 122: 169, 161: 170, 191: 171, 208: 172, 221: 173, 222: 174, 174: 175, 162: 176, 163: 177, 165: 178, 183: 179, 169: 180, 64: 181, 182: 182, 188: 183, 189: 184, 190: 185, 172: 186, 124: 187, 8254: 188, 168: 189, 180: 190, 215: 191, 228: 192, 65: 193, 66: 194, 67: 195, 68: 196, 69: 197, 70: 198, 71: 199, 72: 200, 73: 201, 173: 202, 244: 203, 166: 204, 242: 205, 243: 206, 245: 207, 252: 208, 74: 209, 75: 210, 76: 211, 77: 212, 78: 213, 79: 214, 80: 215, 81: 216, 82: 217, 185: 218, 
251: 219, 125: 220, 249: 221, 250: 222, 255: 223, 214: 224, 247: 225, 83: 226, 84: 227, 85: 228, 86: 229, 87: 230, 88: 231, 89: 232, 90: 233, 178: 234, 212: 235, 92: 236, 210: 237, 211: 238, 213: 239, 48: 240, 49: 241, 50: 242, 51: 243, 52: 244, 53: 245, 54: 246, 55: 247, 56: 248, 57: 249, 179: 250, 219: 251, 93: 252, 217: 253, 218: 254, 159: 255}

class Cp273IncrementalDecoder(BaseEbcdicIncrementalDecoder):
    """
    IncrementalDecoder implementation for EBCDIC-273 (used in German-speaking locales).
    """
    name = 'cp273'
    html5name = None
    @lazy_property
@ -40,6 +59,9 @@ register_kuroko_codec(['cp273', '273', 'ibm273', 'csibm273'], Cp273IncrementalEn
class Cp424IncrementalEncoder(BaseEbcdicIncrementalEncoder):
    """
    IncrementalEncoder implementation for EBCDIC-424 (used in Hebrew-speaking locales).
    """
    name = 'cp424'
    html5name = None
    @lazy_property
@ -47,6 +69,9 @@ class Cp424IncrementalEncoder(BaseEbcdicIncrementalEncoder):
|
||||
return {0: 0, 1: 1, 2: 2, 3: 3, 156: 4, 9: 5, 134: 6, 127: 7, 151: 8, 141: 9, 142: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19, 157: 20, 133: 21, 8: 22, 135: 23, 24: 24, 25: 25, 146: 26, 143: 27, 28: 28, 29: 29, 30: 30, 31: 31, 128: 32, 129: 33, 130: 34, 131: 35, 132: 36, 10: 37, 23: 38, 27: 39, 136: 40, 137: 41, 138: 42, 139: 43, 140: 44, 5: 45, 6: 46, 7: 47, 144: 48, 145: 49, 22: 50, 147: 51, 148: 52, 149: 53, 150: 54, 4: 55, 152: 56, 153: 57, 154: 58, 155: 59, 20: 60, 21: 61, 158: 62, 26: 63, 32: 64, 1488: 65, 1489: 66, 1490: 67, 1491: 68, 1492: 69, 1493: 70, 1494: 71, 1495: 72, 1496: 73, 162: 74, 46: 75, 60: 76, 40: 77, 43: 78, 124: 79, 38: 80, 1497: 81, 1498: 82, 1499: 83, 1500: 84, 1501: 85, 1502: 86, 1503: 87, 1504: 88, 1505: 89, 33: 90, 36: 91, 42: 92, 41: 93, 59: 94, 172: 95, 45: 96, 47: 97, 1506: 98, 1507: 99, 1508: 100, 1509: 101, 1510: 102, 1511: 103, 1512: 104, 1513: 105, 166: 106, 44: 107, 37: 108, 95: 109, 62: 110, 63: 111, 1514: 113, 160: 116, 8215: 120, 96: 121, 58: 122, 35: 123, 64: 124, 39: 125, 61: 126, 34: 127, 97: 129, 98: 130, 99: 131, 100: 132, 101: 133, 102: 134, 103: 135, 104: 136, 105: 137, 171: 138, 187: 139, 177: 143, 176: 144, 106: 145, 107: 146, 108: 147, 109: 148, 110: 149, 111: 150, 112: 151, 113: 152, 114: 153, 184: 157, 164: 159, 181: 160, 126: 161, 115: 162, 116: 163, 117: 164, 118: 165, 119: 166, 120: 167, 121: 168, 122: 169, 174: 175, 94: 176, 163: 177, 165: 178, 183: 179, 169: 180, 167: 181, 182: 182, 188: 183, 189: 184, 190: 185, 91: 186, 93: 187, 175: 188, 168: 189, 180: 190, 215: 191, 123: 192, 65: 193, 66: 194, 67: 195, 68: 196, 69: 197, 70: 198, 71: 199, 72: 200, 73: 201, 173: 202, 125: 208, 74: 209, 75: 210, 76: 211, 77: 212, 78: 213, 79: 214, 80: 215, 81: 216, 82: 217, 185: 218, 92: 224, 247: 225, 83: 226, 84: 227, 85: 228, 86: 229, 87: 230, 88: 231, 89: 232, 90: 233, 178: 234, 48: 240, 49: 241, 50: 242, 51: 243, 52: 244, 53: 245, 54: 246, 55: 247, 56: 248, 57: 249, 179: 250, 159: 255}

class Cp424IncrementalDecoder(BaseEbcdicIncrementalDecoder):
    """
    IncrementalDecoder implementation for EBCDIC-424 (used in Hebrew-speaking locales).
    """
    name = 'cp424'
    html5name = None
    @lazy_property
@ -57,6 +82,9 @@ register_kuroko_codec(['cp424', '424', 'csibm424', 'ebcdic-cp-he', 'ibm424'], Cp
class Cp437IncrementalEncoder(AsciiIncrementalEncoder):
    """
    IncrementalEncoder implementation for OEM-437 (the default, hardware or United States DOS encoding)
    """
    name = 'cp437'
    html5name = None
    @lazy_property
@ -64,16 +92,25 @@ class Cp437IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {199: 128, 252: 129, 233: 130, 226: 131, 228: 132, 224: 133, 229: 134, 231: 135, 234: 136, 235: 137, 232: 138, 239: 139, 238: 140, 236: 141, 196: 142, 197: 143, 201: 144, 230: 145, 198: 146, 244: 147, 246: 148, 242: 149, 251: 150, 249: 151, 255: 152, 214: 153, 220: 154, 162: 155, 163: 156, 165: 157, 8359: 158, 402: 159, 225: 160, 237: 161, 243: 162, 250: 163, 241: 164, 209: 165, 170: 166, 186: 167, 191: 168, 8976: 169, 172: 170, 189: 171, 188: 172, 161: 173, 171: 174, 187: 175, 9617: 176, 9618: 177, 9619: 178, 9474: 179, 9508: 180, 9569: 181, 9570: 182, 9558: 183, 9557: 184, 9571: 185, 9553: 186, 9559: 187, 9565: 188, 9564: 189, 9563: 190, 9488: 191, 9492: 192, 9524: 193, 9516: 194, 9500: 195, 9472: 196, 9532: 197, 9566: 198, 9567: 199, 9562: 200, 9556: 201, 9577: 202, 9574: 203, 9568: 204, 9552: 205, 9580: 206, 9575: 207, 9576: 208, 9572: 209, 9573: 210, 9561: 211, 9560: 212, 9554: 213, 9555: 214, 9579: 215, 9578: 216, 9496: 217, 9484: 218, 9608: 219, 9604: 220, 9612: 221, 9616: 222, 9600: 223, 945: 224, 223: 225, 915: 226, 960: 227, 931: 228, 963: 229, 181: 230, 964: 231, 934: 232, 920: 233, 937: 234, 948: 235, 8734: 236, 966: 237, 949: 238, 8745: 239, 8801: 240, 177: 241, 8805: 242, 8804: 243, 8992: 244, 8993: 245, 247: 246, 8776: 247, 176: 248, 8729: 249, 183: 250, 8730: 251, 8319: 252, 178: 253, 9632: 254, 160: 255}

class Cp437IncrementalDecoder(AsciiIncrementalDecoder):
    """
    IncrementalDecoder implementation for OEM-437 (the default, hardware or United States DOS encoding)
    """
    name = 'cp437'
    html5name = None
    @lazy_property
    def decoding_map():
return {128: 199, 129: 252, 130: 233, 131: 226, 132: 228, 133: 224, 134: 229, 135: 231, 136: 234, 137: 235, 138: 232, 139: 239, 140: 238, 141: 236, 142: 196, 143: 197, 144: 201, 145: 230, 146: 198, 147: 244, 148: 246, 149: 242, 150: 251, 151: 249, 152: 255, 153: 214, 154: 220, 155: 162, 156: 163, 157: 165, 158: 8359, 159: 402, 160: 225, 161: 237, 162: 243, 163: 250, 164: 241, 165: 209, 166: 170, 167: 186, 168: 191, 169: 8976, 170: 172, 171: 189, 172: 188, 173: 161, 174: 171, 175: 187, 176: 9617, 177: 9618, 178: 9619, 179: 9474, 180: 9508, 181: 9569, 182: 9570, 183: 9558, 184: 9557, 185: 9571, 186: 9553, 187: 9559, 188: 9565, 189: 9564, 190: 9563, 191: 9488, 192: 9492, 193: 9524, 194: 9516, 195: 9500, 196: 9472, 197: 9532, 198: 9566, 199: 9567, 200: 9562, 201: 9556, 202: 9577, 203: 9574, 204: 9568, 205: 9552, 206: 9580, 207: 9575, 208: 9576, 209: 9572, 210: 9573, 211: 9561, 212: 9560, 213: 9554, 214: 9555, 215: 9579, 216: 9578, 217: 9496, 218: 9484, 219: 9608, 220: 9604, 221: 9612, 222: 9616, 223: 9600, 224: 945, 225: 223, 226: 915, 227: 960, 228: 931, 229: 963, 230: 181, 231: 964, 232: 934, 233: 920, 234: 937, 235: 948, 236: 8734, 237: 966, 238: 949, 239: 8745, 240: 8801, 241: 177, 242: 8805, 243: 8804, 244: 8992, 245: 8993, 246: 247, 247: 8776, 248: 176, 249: 8729, 250: 183, 251: 8730, 252: 8319, 253: 178, 254: 9632, 255: 160}
|
||||
|
||||
register_kuroko_codec(['cp437', '437', 'cspc8codepage437', 'ibm437'], Cp437IncrementalEncoder, Cp437IncrementalDecoder)
|
||||
register_kuroko_codec(['cp437', '437', 'cspc8codepage437', 'ibm437', 'oem-us'], Cp437IncrementalEncoder, Cp437IncrementalDecoder)
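The cp437 encoder and decoder above each carry a literal table, one per direction. As an illustration of how the two relate, the following Python sketch inverts a small sample of the encoding table and checks for collisions; the module itself stores both tables literally, and `invert_map` and `CP437_ENCODING_SAMPLE` are hypothetical names. The sample entries are copied from the `encoding_map` shown above.

```python
def invert_map(encoding_map):
    """Build a {byte: code point} map from a {code point: byte} map."""
    decoding_map = {}
    for codepoint, byte in encoding_map.items():
        if byte in decoding_map:
            # Two code points encoding to one byte would make decoding ambiguous.
            raise ValueError("byte 0x%02X is mapped twice" % byte)
        decoding_map[byte] = codepoint
    return decoding_map


# A few entries from the cp437 encoding table (code point -> byte).
CP437_ENCODING_SAMPLE = {199: 128, 252: 129, 233: 130, 8359: 158, 160: 255}
```

Inverting the sample yields the matching entries of the decoder's `decoding_map`, for example byte 128 decoding to code point 199.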
|
||||
|
||||
|
||||
class Cp500IncrementalEncoder(BaseEbcdicIncrementalEncoder):
    """
    IncrementalEncoder implementation for EBCDIC-500.

    This is the so-called "International" EBCDIC locale, used in Belgium and Switzerland, as well
    as on the AS/400 in Canada.
    """
    name = 'cp500'
    html5name = None
    @lazy_property
@ -81,6 +118,12 @@ class Cp500IncrementalEncoder(BaseEbcdicIncrementalEncoder):
|
||||
return {0: 0, 1: 1, 2: 2, 3: 3, 156: 4, 9: 5, 134: 6, 127: 7, 151: 8, 141: 9, 142: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19, 157: 20, 133: 21, 8: 22, 135: 23, 24: 24, 25: 25, 146: 26, 143: 27, 28: 28, 29: 29, 30: 30, 31: 31, 128: 32, 129: 33, 130: 34, 131: 35, 132: 36, 10: 37, 23: 38, 27: 39, 136: 40, 137: 41, 138: 42, 139: 43, 140: 44, 5: 45, 6: 46, 7: 47, 144: 48, 145: 49, 22: 50, 147: 51, 148: 52, 149: 53, 150: 54, 4: 55, 152: 56, 153: 57, 154: 58, 155: 59, 20: 60, 21: 61, 158: 62, 26: 63, 32: 64, 160: 65, 226: 66, 228: 67, 224: 68, 225: 69, 227: 70, 229: 71, 231: 72, 241: 73, 91: 74, 46: 75, 60: 76, 40: 77, 43: 78, 33: 79, 38: 80, 233: 81, 234: 82, 235: 83, 232: 84, 237: 85, 238: 86, 239: 87, 236: 88, 223: 89, 93: 90, 36: 91, 42: 92, 41: 93, 59: 94, 94: 95, 45: 96, 47: 97, 194: 98, 196: 99, 192: 100, 193: 101, 195: 102, 197: 103, 199: 104, 209: 105, 166: 106, 44: 107, 37: 108, 95: 109, 62: 110, 63: 111, 248: 112, 201: 113, 202: 114, 203: 115, 200: 116, 205: 117, 206: 118, 207: 119, 204: 120, 96: 121, 58: 122, 35: 123, 64: 124, 39: 125, 61: 126, 34: 127, 216: 128, 97: 129, 98: 130, 99: 131, 100: 132, 101: 133, 102: 134, 103: 135, 104: 136, 105: 137, 171: 138, 187: 139, 240: 140, 253: 141, 254: 142, 177: 143, 176: 144, 106: 145, 107: 146, 108: 147, 109: 148, 110: 149, 111: 150, 112: 151, 113: 152, 114: 153, 170: 154, 186: 155, 230: 156, 184: 157, 198: 158, 164: 159, 181: 160, 126: 161, 115: 162, 116: 163, 117: 164, 118: 165, 119: 166, 120: 167, 121: 168, 122: 169, 161: 170, 191: 171, 208: 172, 221: 173, 222: 174, 174: 175, 162: 176, 163: 177, 165: 178, 183: 179, 169: 180, 167: 181, 182: 182, 188: 183, 189: 184, 190: 185, 172: 186, 124: 187, 175: 188, 168: 189, 180: 190, 215: 191, 123: 192, 65: 193, 66: 194, 67: 195, 68: 196, 69: 197, 70: 198, 71: 199, 72: 200, 73: 201, 173: 202, 244: 203, 246: 204, 242: 205, 243: 206, 245: 207, 125: 208, 74: 209, 75: 210, 76: 211, 77: 212, 78: 213, 79: 214, 80: 215, 81: 216, 82: 217, 185: 218, 
251: 219, 252: 220, 249: 221, 250: 222, 255: 223, 92: 224, 247: 225, 83: 226, 84: 227, 85: 228, 86: 229, 87: 230, 88: 231, 89: 232, 90: 233, 178: 234, 212: 235, 214: 236, 210: 237, 211: 238, 213: 239, 48: 240, 49: 241, 50: 242, 51: 243, 52: 244, 53: 245, 54: 246, 55: 247, 56: 248, 57: 249, 179: 250, 219: 251, 220: 252, 217: 253, 218: 254, 159: 255}

class Cp500IncrementalDecoder(BaseEbcdicIncrementalDecoder):
    """
    IncrementalDecoder implementation for EBCDIC-500.

    This is the so-called "International" EBCDIC locale, used in Belgium and Switzerland, as well
    as on the AS/400 in Canada.
    """
    name = 'cp500'
    html5name = None
    @lazy_property
@ -91,6 +134,13 @@ register_kuroko_codec(['cp500', '500', 'csibm500', 'ebcdic-cp-be', 'ebcdic-cp-ch
class Cp720IncrementalEncoder(AsciiIncrementalEncoder):
    """
    IncrementalEncoder implementation for OEM-720 (Arabic Letters with Box Drawing)

    Note: OEM-720 competed with OEM-864 (which used a different layout, did not include box drawing
    characters, included positional forms rather than general letters for Arabic characters, and
    included separate East Arabic digits).
    """
    name = 'cp720'
    html5name = None
    @lazy_property
@ -98,6 +148,13 @@ class Cp720IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {128: 128, 129: 129, 233: 130, 226: 131, 132: 132, 224: 133, 134: 134, 231: 135, 234: 136, 235: 137, 232: 138, 239: 139, 238: 140, 141: 141, 142: 142, 143: 143, 144: 144, 1617: 145, 1618: 146, 244: 147, 164: 148, 1600: 149, 251: 150, 249: 151, 1569: 152, 1570: 153, 1571: 154, 1572: 155, 163: 156, 1573: 157, 1574: 158, 1575: 159, 1576: 160, 1577: 161, 1578: 162, 1579: 163, 1580: 164, 1581: 165, 1582: 166, 1583: 167, 1584: 168, 1585: 169, 1586: 170, 1587: 171, 1588: 172, 1589: 173, 171: 174, 187: 175, 9617: 176, 9618: 177, 9619: 178, 9474: 179, 9508: 180, 9569: 181, 9570: 182, 9558: 183, 9557: 184, 9571: 185, 9553: 186, 9559: 187, 9565: 188, 9564: 189, 9563: 190, 9488: 191, 9492: 192, 9524: 193, 9516: 194, 9500: 195, 9472: 196, 9532: 197, 9566: 198, 9567: 199, 9562: 200, 9556: 201, 9577: 202, 9574: 203, 9568: 204, 9552: 205, 9580: 206, 9575: 207, 9576: 208, 9572: 209, 9573: 210, 9561: 211, 9560: 212, 9554: 213, 9555: 214, 9579: 215, 9578: 216, 9496: 217, 9484: 218, 9608: 219, 9604: 220, 9612: 221, 9616: 222, 9600: 223, 1590: 224, 1591: 225, 1592: 226, 1593: 227, 1594: 228, 1601: 229, 181: 230, 1602: 231, 1603: 232, 1604: 233, 1605: 234, 1606: 235, 1607: 236, 1608: 237, 1609: 238, 1610: 239, 8801: 240, 1611: 241, 1612: 242, 1613: 243, 1614: 244, 1615: 245, 1616: 246, 8776: 247, 176: 248, 8729: 249, 183: 250, 8730: 251, 8319: 252, 178: 253, 9632: 254, 160: 255}

class Cp720IncrementalDecoder(AsciiIncrementalDecoder):
    """
    IncrementalDecoder implementation for OEM-720 (Arabic Letters with Box Drawing)

    Note: OEM-720 competed with OEM-864 (which used a different layout, did not include box drawing
    characters, included positional forms rather than general letters for Arabic characters, and
    included separate East Arabic digits).
    """
    name = 'cp720'
    html5name = None
    @lazy_property
@ -108,6 +165,12 @@ register_kuroko_codec(['cp720'], Cp720IncrementalEncoder, Cp720IncrementalDecode
class Cp737IncrementalEncoder(AsciiIncrementalEncoder):
    """
    IncrementalEncoder implementation for OEM-737 (Greek with Box Drawing).

    Note: OEM-737 competed with OEM-869 (which used a different Greek layout and preserved only a
    subset of the box drawing characters, but included letters with combined trema/acute).
    """
    name = 'cp737'
    html5name = None
    @lazy_property
@ -115,6 +178,12 @@ class Cp737IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {913: 128, 914: 129, 915: 130, 916: 131, 917: 132, 918: 133, 919: 134, 920: 135, 921: 136, 922: 137, 923: 138, 924: 139, 925: 140, 926: 141, 927: 142, 928: 143, 929: 144, 931: 145, 932: 146, 933: 147, 934: 148, 935: 149, 936: 150, 937: 151, 945: 152, 946: 153, 947: 154, 948: 155, 949: 156, 950: 157, 951: 158, 952: 159, 953: 160, 954: 161, 955: 162, 956: 163, 957: 164, 958: 165, 959: 166, 960: 167, 961: 168, 963: 169, 962: 170, 964: 171, 965: 172, 966: 173, 967: 174, 968: 175, 9617: 176, 9618: 177, 9619: 178, 9474: 179, 9508: 180, 9569: 181, 9570: 182, 9558: 183, 9557: 184, 9571: 185, 9553: 186, 9559: 187, 9565: 188, 9564: 189, 9563: 190, 9488: 191, 9492: 192, 9524: 193, 9516: 194, 9500: 195, 9472: 196, 9532: 197, 9566: 198, 9567: 199, 9562: 200, 9556: 201, 9577: 202, 9574: 203, 9568: 204, 9552: 205, 9580: 206, 9575: 207, 9576: 208, 9572: 209, 9573: 210, 9561: 211, 9560: 212, 9554: 213, 9555: 214, 9579: 215, 9578: 216, 9496: 217, 9484: 218, 9608: 219, 9604: 220, 9612: 221, 9616: 222, 9600: 223, 969: 224, 940: 225, 941: 226, 942: 227, 970: 228, 943: 229, 972: 230, 973: 231, 971: 232, 974: 233, 902: 234, 904: 235, 905: 236, 906: 237, 908: 238, 910: 239, 911: 240, 177: 241, 8805: 242, 8804: 243, 938: 244, 939: 245, 247: 246, 8776: 247, 176: 248, 8729: 249, 183: 250, 8730: 251, 8319: 252, 178: 253, 9632: 254, 160: 255}

class Cp737IncrementalDecoder(AsciiIncrementalDecoder):
    """
    IncrementalDecoder implementation for OEM-737 (Greek with Box Drawing).

    Note: OEM-737 competed with OEM-869 (which used a different Greek layout and preserved only a
    subset of the box drawing characters, but included letters with combined trema/acute).
    """
    name = 'cp737'
    html5name = None
    @lazy_property
@ -125,6 +194,9 @@ register_kuroko_codec(['cp737'], Cp737IncrementalEncoder, Cp737IncrementalDecode
class Cp775IncrementalEncoder(AsciiIncrementalEncoder):
    """
    IncrementalEncoder implementation for OEM-775 (Baltic Rim)
    """
    name = 'cp775'
    html5name = None
    @lazy_property
@ -132,6 +204,9 @@ class Cp775IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {262: 128, 252: 129, 233: 130, 257: 131, 228: 132, 291: 133, 229: 134, 263: 135, 322: 136, 275: 137, 342: 138, 343: 139, 299: 140, 377: 141, 196: 142, 197: 143, 201: 144, 230: 145, 198: 146, 333: 147, 246: 148, 290: 149, 162: 150, 346: 151, 347: 152, 214: 153, 220: 154, 248: 155, 163: 156, 216: 157, 215: 158, 164: 159, 256: 160, 298: 161, 243: 162, 379: 163, 380: 164, 378: 165, 8221: 166, 166: 167, 169: 168, 174: 169, 172: 170, 189: 171, 188: 172, 321: 173, 171: 174, 187: 175, 9617: 176, 9618: 177, 9619: 178, 9474: 179, 9508: 180, 260: 181, 268: 182, 280: 183, 278: 184, 9571: 185, 9553: 186, 9559: 187, 9565: 188, 302: 189, 352: 190, 9488: 191, 9492: 192, 9524: 193, 9516: 194, 9500: 195, 9472: 196, 9532: 197, 370: 198, 362: 199, 9562: 200, 9556: 201, 9577: 202, 9574: 203, 9568: 204, 9552: 205, 9580: 206, 381: 207, 261: 208, 269: 209, 281: 210, 279: 211, 303: 212, 353: 213, 371: 214, 363: 215, 382: 216, 9496: 217, 9484: 218, 9608: 219, 9604: 220, 9612: 221, 9616: 222, 9600: 223, 211: 224, 223: 225, 332: 226, 323: 227, 245: 228, 213: 229, 181: 230, 324: 231, 310: 232, 311: 233, 315: 234, 316: 235, 326: 236, 274: 237, 325: 238, 8217: 239, 173: 240, 177: 241, 8220: 242, 190: 243, 182: 244, 167: 245, 247: 246, 8222: 247, 176: 248, 8729: 249, 183: 250, 185: 251, 179: 252, 178: 253, 9632: 254, 160: 255}

class Cp775IncrementalDecoder(AsciiIncrementalDecoder):
    """
    IncrementalDecoder implementation for OEM-775 (Baltic Rim)
    """
    name = 'cp775'
    html5name = None
    @lazy_property
@ -142,6 +217,9 @@ register_kuroko_codec(['cp775', '775', 'cspc775baltic', 'ibm775'], Cp775Incremen
class Cp850IncrementalEncoder(AsciiIncrementalEncoder):
    """
    IncrementalEncoder implementation for OEM-850 (Western Europe and Canada)
    """
    name = 'cp850'
    html5name = None
    @lazy_property
@ -149,6 +227,9 @@ class Cp850IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {199: 128, 252: 129, 233: 130, 226: 131, 228: 132, 224: 133, 229: 134, 231: 135, 234: 136, 235: 137, 232: 138, 239: 139, 238: 140, 236: 141, 196: 142, 197: 143, 201: 144, 230: 145, 198: 146, 244: 147, 246: 148, 242: 149, 251: 150, 249: 151, 255: 152, 214: 153, 220: 154, 248: 155, 163: 156, 216: 157, 215: 158, 402: 159, 225: 160, 237: 161, 243: 162, 250: 163, 241: 164, 209: 165, 170: 166, 186: 167, 191: 168, 174: 169, 172: 170, 189: 171, 188: 172, 161: 173, 171: 174, 187: 175, 9617: 176, 9618: 177, 9619: 178, 9474: 179, 9508: 180, 193: 181, 194: 182, 192: 183, 169: 184, 9571: 185, 9553: 186, 9559: 187, 9565: 188, 162: 189, 165: 190, 9488: 191, 9492: 192, 9524: 193, 9516: 194, 9500: 195, 9472: 196, 9532: 197, 227: 198, 195: 199, 9562: 200, 9556: 201, 9577: 202, 9574: 203, 9568: 204, 9552: 205, 9580: 206, 164: 207, 240: 208, 208: 209, 202: 210, 203: 211, 200: 212, 305: 213, 205: 214, 206: 215, 207: 216, 9496: 217, 9484: 218, 9608: 219, 9604: 220, 166: 221, 204: 222, 9600: 223, 211: 224, 223: 225, 212: 226, 210: 227, 245: 228, 213: 229, 181: 230, 254: 231, 222: 232, 218: 233, 219: 234, 217: 235, 253: 236, 221: 237, 175: 238, 180: 239, 173: 240, 177: 241, 8215: 242, 190: 243, 182: 244, 167: 245, 247: 246, 184: 247, 176: 248, 168: 249, 183: 250, 185: 251, 179: 252, 178: 253, 9632: 254, 160: 255}

class Cp850IncrementalDecoder(AsciiIncrementalDecoder):
    """
    IncrementalDecoder implementation for OEM-850 (Western Europe and Canada)
    """
    name = 'cp850'
    html5name = None
    @lazy_property
@ -159,6 +240,9 @@ register_kuroko_codec(['cp850', '850', 'cspc850multilingual', 'ibm850'], Cp850In
class Cp852IncrementalEncoder(AsciiIncrementalEncoder):
    """
    IncrementalEncoder implementation for OEM-852 (Central Europe)
    """
    name = 'cp852'
    html5name = None
    @lazy_property
@ -166,6 +250,9 @@ class Cp852IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {199: 128, 252: 129, 233: 130, 226: 131, 228: 132, 367: 133, 263: 134, 231: 135, 322: 136, 235: 137, 336: 138, 337: 139, 238: 140, 377: 141, 196: 142, 262: 143, 201: 144, 313: 145, 314: 146, 244: 147, 246: 148, 317: 149, 318: 150, 346: 151, 347: 152, 214: 153, 220: 154, 356: 155, 357: 156, 321: 157, 215: 158, 269: 159, 225: 160, 237: 161, 243: 162, 250: 163, 260: 164, 261: 165, 381: 166, 382: 167, 280: 168, 281: 169, 172: 170, 378: 171, 268: 172, 351: 173, 171: 174, 187: 175, 9617: 176, 9618: 177, 9619: 178, 9474: 179, 9508: 180, 193: 181, 194: 182, 282: 183, 350: 184, 9571: 185, 9553: 186, 9559: 187, 9565: 188, 379: 189, 380: 190, 9488: 191, 9492: 192, 9524: 193, 9516: 194, 9500: 195, 9472: 196, 9532: 197, 258: 198, 259: 199, 9562: 200, 9556: 201, 9577: 202, 9574: 203, 9568: 204, 9552: 205, 9580: 206, 164: 207, 273: 208, 272: 209, 270: 210, 203: 211, 271: 212, 327: 213, 205: 214, 206: 215, 283: 216, 9496: 217, 9484: 218, 9608: 219, 9604: 220, 354: 221, 366: 222, 9600: 223, 211: 224, 223: 225, 212: 226, 323: 227, 324: 228, 328: 229, 352: 230, 353: 231, 340: 232, 218: 233, 341: 234, 368: 235, 253: 236, 221: 237, 355: 238, 180: 239, 173: 240, 733: 241, 731: 242, 711: 243, 728: 244, 167: 245, 247: 246, 184: 247, 176: 248, 168: 249, 729: 250, 369: 251, 344: 252, 345: 253, 9632: 254, 160: 255}

class Cp852IncrementalDecoder(AsciiIncrementalDecoder):
"""
IncrementalDecoder implementation for OEM-852 (Central Europe)
"""
name = 'cp852'
html5name = None
@lazy_property
@ -176,6 +263,14 @@ register_kuroko_codec(['cp852', '852', 'cspcp852', 'ibm852'], Cp852IncrementalEn


class Cp855IncrementalEncoder(AsciiIncrementalEncoder):
"""
IncrementalEncoder implementation for OEM-855 (Balkan Cyrillic).

Note: OEM-855 competed with OEM-866 for Cyrillic; OEM-866 preserved all box drawing characters
(rather than only a subset) and was more popular for Russian, but did not provide coverage
for all of the different South Slavic Cyrillic orthographies, unlike OEM-855. Their layouts
for Cyrillic are entirely different.
"""
name = 'cp855'
html5name = None
@lazy_property
@ -183,6 +278,14 @@ class Cp855IncrementalEncoder(AsciiIncrementalEncoder):
return {1106: 128, 1026: 129, 1107: 130, 1027: 131, 1105: 132, 1025: 133, 1108: 134, 1028: 135, 1109: 136, 1029: 137, 1110: 138, 1030: 139, 1111: 140, 1031: 141, 1112: 142, 1032: 143, 1113: 144, 1033: 145, 1114: 146, 1034: 147, 1115: 148, 1035: 149, 1116: 150, 1036: 151, 1118: 152, 1038: 153, 1119: 154, 1039: 155, 1102: 156, 1070: 157, 1098: 158, 1066: 159, 1072: 160, 1040: 161, 1073: 162, 1041: 163, 1094: 164, 1062: 165, 1076: 166, 1044: 167, 1077: 168, 1045: 169, 1092: 170, 1060: 171, 1075: 172, 1043: 173, 171: 174, 187: 175, 9617: 176, 9618: 177, 9619: 178, 9474: 179, 9508: 180, 1093: 181, 1061: 182, 1080: 183, 1048: 184, 9571: 185, 9553: 186, 9559: 187, 9565: 188, 1081: 189, 1049: 190, 9488: 191, 9492: 192, 9524: 193, 9516: 194, 9500: 195, 9472: 196, 9532: 197, 1082: 198, 1050: 199, 9562: 200, 9556: 201, 9577: 202, 9574: 203, 9568: 204, 9552: 205, 9580: 206, 164: 207, 1083: 208, 1051: 209, 1084: 210, 1052: 211, 1085: 212, 1053: 213, 1086: 214, 1054: 215, 1087: 216, 9496: 217, 9484: 218, 9608: 219, 9604: 220, 1055: 221, 1103: 222, 9600: 223, 1071: 224, 1088: 225, 1056: 226, 1089: 227, 1057: 228, 1090: 229, 1058: 230, 1091: 231, 1059: 232, 1078: 233, 1046: 234, 1074: 235, 1042: 236, 1100: 237, 1068: 238, 8470: 239, 173: 240, 1099: 241, 1067: 242, 1079: 243, 1047: 244, 1096: 245, 1064: 246, 1101: 247, 1069: 248, 1097: 249, 1065: 250, 1095: 251, 1063: 252, 167: 253, 9632: 254, 160: 255}
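The layout difference described in the OEM-855 docstring can be sketched with Python's built-in `cp855` and `cp866` codecs. This is only a stand-in: it assumes the standard-library tables agree with the Kuroko maps in this diff, and it does not exercise the Kuroko classes themselves. The helper `try_encode` is hypothetical.

```python
# Stand-in comparison using Python's standard codecs, not the Kuroko classes.
def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

# The same Russian letter lands on different bytes in the two code pages.
be_855 = try_encode("\u0431", "cp855")   # CYRILLIC SMALL LETTER BE
be_866 = try_encode("\u0431", "cp866")

# A Serbian letter is covered by OEM-855 only.
dje_855 = try_encode("\u0452", "cp855")  # CYRILLIC SMALL LETTER DJE
dje_866 = try_encode("\u0452", "cp866")

# A mixed single/double box drawing character is covered by OEM-866 only.
box_855 = try_encode("\u2561", "cp855")
box_866 = try_encode("\u2561", "cp866")

print(be_855, be_866, dje_855, dje_866, box_855, box_866)
```

The byte values for OEM-855 match the map above (1073 maps to 162, 1106 maps to 128).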

class Cp855IncrementalDecoder(AsciiIncrementalDecoder):
"""
IncrementalDecoder implementation for OEM-855 (Balkan Cyrillic).

Note: OEM-855 competed with OEM-866 for Cyrillic; OEM-866 preserved all box drawing characters
(rather than only a subset) and was more popular for Russian, but did not provide coverage
for all of the different South Slavic Cyrillic orthographies, unlike OEM-855. Their layouts
for Cyrillic are entirely different.
"""
name = 'cp855'
html5name = None
@lazy_property
@ -193,6 +296,12 @@ register_kuroko_codec(['cp855', '855', 'csibm855', 'ibm855'], Cp855IncrementalEn


class Cp856IncrementalEncoder(AsciiIncrementalEncoder):
"""
IncrementalEncoder implementation for OEM-856 (Hebrew).

Note: OEM-856 competed with OEM-862 for Hebrew, although they encoded the Hebrew letters in the
same layout. OEM-862 preserved all box drawing characters, while OEM-856 preserved a subset only.
"""
name = 'cp856'
html5name = None
@lazy_property
@ -200,6 +309,12 @@ class Cp856IncrementalEncoder(AsciiIncrementalEncoder):
return {1488: 128, 1489: 129, 1490: 130, 1491: 131, 1492: 132, 1493: 133, 1494: 134, 1495: 135, 1496: 136, 1497: 137, 1498: 138, 1499: 139, 1500: 140, 1501: 141, 1502: 142, 1503: 143, 1504: 144, 1505: 145, 1506: 146, 1507: 147, 1508: 148, 1509: 149, 1510: 150, 1511: 151, 1512: 152, 1513: 153, 1514: 154, 163: 156, 215: 158, 174: 169, 172: 170, 189: 171, 188: 172, 171: 174, 187: 175, 9617: 176, 9618: 177, 9619: 178, 9474: 179, 9508: 180, 169: 184, 9571: 185, 9553: 186, 9559: 187, 9565: 188, 162: 189, 165: 190, 9488: 191, 9492: 192, 9524: 193, 9516: 194, 9500: 195, 9472: 196, 9532: 197, 9562: 200, 9556: 201, 9577: 202, 9574: 203, 9568: 204, 9552: 205, 9580: 206, 164: 207, 9496: 217, 9484: 218, 9608: 219, 9604: 220, 166: 221, 9600: 223, 181: 230, 175: 238, 180: 239, 173: 240, 177: 241, 8215: 242, 190: 243, 182: 244, 167: 245, 247: 246, 184: 247, 176: 248, 168: 249, 183: 250, 185: 251, 179: 252, 178: 253, 9632: 254, 160: 255}

class Cp856IncrementalDecoder(AsciiIncrementalDecoder):
"""
IncrementalDecoder implementation for OEM-856 (Hebrew).

Note: OEM-856 competed with OEM-862 for Hebrew, although they encoded the Hebrew letters in the
same layout. OEM-862 preserved all box drawing characters, while OEM-856 preserved a subset only.
"""
name = 'cp856'
html5name = None
@lazy_property
@ -210,6 +325,9 @@ register_kuroko_codec(['cp856'], Cp856IncrementalEncoder, Cp856IncrementalDecode


class Cp857IncrementalEncoder(AsciiIncrementalEncoder):
"""
IncrementalEncoder implementation for OEM-857 (Turkish).
"""
name = 'cp857'
html5name = None
@lazy_property
@ -217,6 +335,9 @@ class Cp857IncrementalEncoder(AsciiIncrementalEncoder):
return {199: 128, 252: 129, 233: 130, 226: 131, 228: 132, 224: 133, 229: 134, 231: 135, 234: 136, 235: 137, 232: 138, 239: 139, 238: 140, 305: 141, 196: 142, 197: 143, 201: 144, 230: 145, 198: 146, 244: 147, 246: 148, 242: 149, 251: 150, 249: 151, 304: 152, 214: 153, 220: 154, 248: 155, 163: 156, 216: 157, 350: 158, 351: 159, 225: 160, 237: 161, 243: 162, 250: 163, 241: 164, 209: 165, 286: 166, 287: 167, 191: 168, 174: 169, 172: 170, 189: 171, 188: 172, 161: 173, 171: 174, 187: 175, 9617: 176, 9618: 177, 9619: 178, 9474: 179, 9508: 180, 193: 181, 194: 182, 192: 183, 169: 184, 9571: 185, 9553: 186, 9559: 187, 9565: 188, 162: 189, 165: 190, 9488: 191, 9492: 192, 9524: 193, 9516: 194, 9500: 195, 9472: 196, 9532: 197, 227: 198, 195: 199, 9562: 200, 9556: 201, 9577: 202, 9574: 203, 9568: 204, 9552: 205, 9580: 206, 164: 207, 186: 208, 170: 209, 202: 210, 203: 211, 200: 212, 205: 214, 206: 215, 207: 216, 9496: 217, 9484: 218, 9608: 219, 9604: 220, 166: 221, 204: 222, 9600: 223, 211: 224, 223: 225, 212: 226, 210: 227, 245: 228, 213: 229, 181: 230, 215: 232, 218: 233, 219: 234, 217: 235, 236: 236, 255: 237, 175: 238, 180: 239, 173: 240, 177: 241, 190: 243, 182: 244, 167: 245, 247: 246, 184: 247, 176: 248, 168: 249, 183: 250, 185: 251, 179: 252, 178: 253, 9632: 254, 160: 255}

class Cp857IncrementalDecoder(AsciiIncrementalDecoder):
"""
IncrementalDecoder implementation for OEM-857 (Turkish).
"""
name = 'cp857'
html5name = None
@lazy_property
@ -227,6 +348,9 @@ register_kuroko_codec(['cp857', '857', 'csibm857', 'ibm857'], Cp857IncrementalEn


class Cp858IncrementalEncoder(AsciiIncrementalEncoder):
"""
IncrementalEncoder implementation for OEM-858 (Western Europe and Canada with the Euro sign).
"""
name = 'cp858'
html5name = None
@lazy_property
@ -234,6 +358,9 @@ class Cp858IncrementalEncoder(AsciiIncrementalEncoder):
return {199: 128, 252: 129, 233: 130, 226: 131, 228: 132, 224: 133, 229: 134, 231: 135, 234: 136, 235: 137, 232: 138, 239: 139, 238: 140, 236: 141, 196: 142, 197: 143, 201: 144, 230: 145, 198: 146, 244: 147, 246: 148, 242: 149, 251: 150, 249: 151, 255: 152, 214: 153, 220: 154, 248: 155, 163: 156, 216: 157, 215: 158, 402: 159, 225: 160, 237: 161, 243: 162, 250: 163, 241: 164, 209: 165, 170: 166, 186: 167, 191: 168, 174: 169, 172: 170, 189: 171, 188: 172, 161: 173, 171: 174, 187: 175, 9617: 176, 9618: 177, 9619: 178, 9474: 179, 9508: 180, 193: 181, 194: 182, 192: 183, 169: 184, 9571: 185, 9553: 186, 9559: 187, 9565: 188, 162: 189, 165: 190, 9488: 191, 9492: 192, 9524: 193, 9516: 194, 9500: 195, 9472: 196, 9532: 197, 227: 198, 195: 199, 9562: 200, 9556: 201, 9577: 202, 9574: 203, 9568: 204, 9552: 205, 9580: 206, 164: 207, 240: 208, 208: 209, 202: 210, 203: 211, 200: 212, 8364: 213, 205: 214, 206: 215, 207: 216, 9496: 217, 9484: 218, 9608: 219, 9604: 220, 166: 221, 204: 222, 9600: 223, 211: 224, 223: 225, 212: 226, 210: 227, 245: 228, 213: 229, 181: 230, 254: 231, 222: 232, 218: 233, 219: 234, 217: 235, 253: 236, 221: 237, 175: 238, 180: 239, 173: 240, 177: 241, 8215: 242, 190: 243, 182: 244, 167: 245, 247: 246, 184: 247, 176: 248, 168: 249, 183: 250, 185: 251, 179: 252, 178: 253, 9632: 254, 160: 255}
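In the maps shown in this diff, OEM-858 assigns byte 213 to U+20AC (8364, the Euro sign) where OEM-850 assigns it to U+0131 (305, dotless i). A quick way to see this is to compare Python's built-in `cp850` and `cp858` codecs byte by byte. This is a sketch that assumes those standard-library tables mirror the Kuroko ones; the variable name `differing` is illustrative only.

```python
# Stand-in comparison using Python's standard codecs, not the Kuroko classes.
# Collect every byte value that the two code pages decode differently.
differing = [
    byte for byte in range(256)
    if bytes([byte]).decode("cp850") != bytes([byte]).decode("cp858")
]

for byte in differing:
    raw = bytes([byte])
    print(hex(byte), repr(raw.decode("cp850")), repr(raw.decode("cp858")))
```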

class Cp858IncrementalDecoder(AsciiIncrementalDecoder):
"""
IncrementalDecoder implementation for OEM-858 (Western Europe and Canada with the Euro sign).
"""
name = 'cp858'
html5name = None
@lazy_property
@ -244,6 +371,9 @@ register_kuroko_codec(['cp858', '858', 'csibm858', 'ibm858'], Cp858IncrementalEn


class Cp860IncrementalEncoder(AsciiIncrementalEncoder):
"""
IncrementalEncoder implementation for OEM-860 (European Portuguese).
"""
name = 'cp860'
html5name = None
@lazy_property
@ -251,6 +381,9 @@ class Cp860IncrementalEncoder(AsciiIncrementalEncoder):
return {199: 128, 252: 129, 233: 130, 226: 131, 227: 132, 224: 133, 193: 134, 231: 135, 234: 136, 202: 137, 232: 138, 205: 139, 212: 140, 236: 141, 195: 142, 194: 143, 201: 144, 192: 145, 200: 146, 244: 147, 245: 148, 242: 149, 218: 150, 249: 151, 204: 152, 213: 153, 220: 154, 162: 155, 163: 156, 217: 157, 8359: 158, 211: 159, 225: 160, 237: 161, 243: 162, 250: 163, 241: 164, 209: 165, 170: 166, 186: 167, 191: 168, 210: 169, 172: 170, 189: 171, 188: 172, 161: 173, 171: 174, 187: 175, 9617: 176, 9618: 177, 9619: 178, 9474: 179, 9508: 180, 9569: 181, 9570: 182, 9558: 183, 9557: 184, 9571: 185, 9553: 186, 9559: 187, 9565: 188, 9564: 189, 9563: 190, 9488: 191, 9492: 192, 9524: 193, 9516: 194, 9500: 195, 9472: 196, 9532: 197, 9566: 198, 9567: 199, 9562: 200, 9556: 201, 9577: 202, 9574: 203, 9568: 204, 9552: 205, 9580: 206, 9575: 207, 9576: 208, 9572: 209, 9573: 210, 9561: 211, 9560: 212, 9554: 213, 9555: 214, 9579: 215, 9578: 216, 9496: 217, 9484: 218, 9608: 219, 9604: 220, 9612: 221, 9616: 222, 9600: 223, 945: 224, 223: 225, 915: 226, 960: 227, 931: 228, 963: 229, 181: 230, 964: 231, 934: 232, 920: 233, 937: 234, 948: 235, 8734: 236, 966: 237, 949: 238, 8745: 239, 8801: 240, 177: 241, 8805: 242, 8804: 243, 8992: 244, 8993: 245, 247: 246, 8776: 247, 176: 248, 8729: 249, 183: 250, 8730: 251, 8319: 252, 178: 253, 9632: 254, 160: 255}

class Cp860IncrementalDecoder(AsciiIncrementalDecoder):
"""
IncrementalDecoder implementation for OEM-860 (European Portuguese).
"""
name = 'cp860'
html5name = None
@lazy_property
@ -261,6 +394,9 @@ register_kuroko_codec(['cp860', '860', 'csibm860', 'ibm860'], Cp860IncrementalEn


class Cp861IncrementalEncoder(AsciiIncrementalEncoder):
"""
IncrementalEncoder implementation for OEM-861 (Icelandic).
"""
name = 'cp861'
html5name = None
@lazy_property
@ -268,6 +404,9 @@ class Cp861IncrementalEncoder(AsciiIncrementalEncoder):
return {199: 128, 252: 129, 233: 130, 226: 131, 228: 132, 224: 133, 229: 134, 231: 135, 234: 136, 235: 137, 232: 138, 208: 139, 240: 140, 222: 141, 196: 142, 197: 143, 201: 144, 230: 145, 198: 146, 244: 147, 246: 148, 254: 149, 251: 150, 221: 151, 253: 152, 214: 153, 220: 154, 248: 155, 163: 156, 216: 157, 8359: 158, 402: 159, 225: 160, 237: 161, 243: 162, 250: 163, 193: 164, 205: 165, 211: 166, 218: 167, 191: 168, 8976: 169, 172: 170, 189: 171, 188: 172, 161: 173, 171: 174, 187: 175, 9617: 176, 9618: 177, 9619: 178, 9474: 179, 9508: 180, 9569: 181, 9570: 182, 9558: 183, 9557: 184, 9571: 185, 9553: 186, 9559: 187, 9565: 188, 9564: 189, 9563: 190, 9488: 191, 9492: 192, 9524: 193, 9516: 194, 9500: 195, 9472: 196, 9532: 197, 9566: 198, 9567: 199, 9562: 200, 9556: 201, 9577: 202, 9574: 203, 9568: 204, 9552: 205, 9580: 206, 9575: 207, 9576: 208, 9572: 209, 9573: 210, 9561: 211, 9560: 212, 9554: 213, 9555: 214, 9579: 215, 9578: 216, 9496: 217, 9484: 218, 9608: 219, 9604: 220, 9612: 221, 9616: 222, 9600: 223, 945: 224, 223: 225, 915: 226, 960: 227, 931: 228, 963: 229, 181: 230, 964: 231, 934: 232, 920: 233, 937: 234, 948: 235, 8734: 236, 966: 237, 949: 238, 8745: 239, 8801: 240, 177: 241, 8805: 242, 8804: 243, 8992: 244, 8993: 245, 247: 246, 8776: 247, 176: 248, 8729: 249, 183: 250, 8730: 251, 8319: 252, 178: 253, 9632: 254, 160: 255}

class Cp861IncrementalDecoder(AsciiIncrementalDecoder):
"""
IncrementalDecoder implementation for OEM-861 (Icelandic).
"""
name = 'cp861'
html5name = None
@lazy_property
@ -278,6 +417,12 @@ register_kuroko_codec(['cp861', '861', 'cp-is', 'csibm861', 'ibm861'], Cp861Incr


class Cp862IncrementalEncoder(AsciiIncrementalEncoder):
"""
IncrementalEncoder implementation for OEM-862 (Hebrew and Box Drawing).

Note: OEM-862 competed with OEM-856 for Hebrew, although they encoded the Hebrew letters in the
same layout. OEM-862 preserved all box drawing characters, while OEM-856 preserved a subset only.
"""
name = 'cp862'
html5name = None
@lazy_property
@ -285,6 +430,12 @@ class Cp862IncrementalEncoder(AsciiIncrementalEncoder):
return {1488: 128, 1489: 129, 1490: 130, 1491: 131, 1492: 132, 1493: 133, 1494: 134, 1495: 135, 1496: 136, 1497: 137, 1498: 138, 1499: 139, 1500: 140, 1501: 141, 1502: 142, 1503: 143, 1504: 144, 1505: 145, 1506: 146, 1507: 147, 1508: 148, 1509: 149, 1510: 150, 1511: 151, 1512: 152, 1513: 153, 1514: 154, 162: 155, 163: 156, 165: 157, 8359: 158, 402: 159, 225: 160, 237: 161, 243: 162, 250: 163, 241: 164, 209: 165, 170: 166, 186: 167, 191: 168, 8976: 169, 172: 170, 189: 171, 188: 172, 161: 173, 171: 174, 187: 175, 9617: 176, 9618: 177, 9619: 178, 9474: 179, 9508: 180, 9569: 181, 9570: 182, 9558: 183, 9557: 184, 9571: 185, 9553: 186, 9559: 187, 9565: 188, 9564: 189, 9563: 190, 9488: 191, 9492: 192, 9524: 193, 9516: 194, 9500: 195, 9472: 196, 9532: 197, 9566: 198, 9567: 199, 9562: 200, 9556: 201, 9577: 202, 9574: 203, 9568: 204, 9552: 205, 9580: 206, 9575: 207, 9576: 208, 9572: 209, 9573: 210, 9561: 211, 9560: 212, 9554: 213, 9555: 214, 9579: 215, 9578: 216, 9496: 217, 9484: 218, 9608: 219, 9604: 220, 9612: 221, 9616: 222, 9600: 223, 945: 224, 223: 225, 915: 226, 960: 227, 931: 228, 963: 229, 181: 230, 964: 231, 934: 232, 920: 233, 937: 234, 948: 235, 8734: 236, 966: 237, 949: 238, 8745: 239, 8801: 240, 177: 241, 8805: 242, 8804: 243, 8992: 244, 8993: 245, 247: 246, 8776: 247, 176: 248, 8729: 249, 183: 250, 8730: 251, 8319: 252, 178: 253, 9632: 254, 160: 255}
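The docstring's claim that OEM-862 and OEM-856 share a Hebrew layout but differ in box drawing coverage can be checked roughly as follows. The sketch uses Python's built-in `cp856` and `cp862` codecs as stand-ins, on the assumption that they match the Kuroko tables in this diff; `can_encode` is a hypothetical helper.

```python
# Stand-in comparison using Python's standard codecs, not the Kuroko classes.
def can_encode(text, codec):
    """Report whether the codec has a byte for every character in text."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Hebrew letters alef (U+05D0) through tav (U+05EA).
hebrew = [chr(cp) for cp in range(0x05D0, 0x05EB)]
same_layout = all(ch.encode("cp856") == ch.encode("cp862") for ch in hebrew)

# U+2561 is one of the mixed single/double box drawing characters.
box = "\u2561"
box_in_862 = can_encode(box, "cp862")
box_in_856 = can_encode(box, "cp856")

print(same_layout, box_in_862, box_in_856)
```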

class Cp862IncrementalDecoder(AsciiIncrementalDecoder):
"""
IncrementalDecoder implementation for OEM-862 (Hebrew and Box Drawing).

Note: OEM-862 competed with OEM-856 for Hebrew, although they encoded the Hebrew letters in the
same layout. OEM-862 preserved all box drawing characters, while OEM-856 preserved a subset only.
"""
name = 'cp862'
html5name = None
@lazy_property
@ -295,6 +446,9 @@ register_kuroko_codec(['cp862', '862', 'cspc862latinhebrew', 'ibm862'], Cp862Inc


class Cp863IncrementalEncoder(AsciiIncrementalEncoder):
"""
IncrementalEncoder implementation for OEM-863 (Canadian French).
"""
name = 'cp863'
html5name = None
@lazy_property
@ -302,6 +456,9 @@ class Cp863IncrementalEncoder(AsciiIncrementalEncoder):
return {199: 128, 252: 129, 233: 130, 226: 131, 194: 132, 224: 133, 182: 134, 231: 135, 234: 136, 235: 137, 232: 138, 239: 139, 238: 140, 8215: 141, 192: 142, 167: 143, 201: 144, 200: 145, 202: 146, 244: 147, 203: 148, 207: 149, 251: 150, 249: 151, 164: 152, 212: 153, 220: 154, 162: 155, 163: 156, 217: 157, 219: 158, 402: 159, 166: 160, 180: 161, 243: 162, 250: 163, 168: 164, 184: 165, 179: 166, 175: 167, 206: 168, 8976: 169, 172: 170, 189: 171, 188: 172, 190: 173, 171: 174, 187: 175, 9617: 176, 9618: 177, 9619: 178, 9474: 179, 9508: 180, 9569: 181, 9570: 182, 9558: 183, 9557: 184, 9571: 185, 9553: 186, 9559: 187, 9565: 188, 9564: 189, 9563: 190, 9488: 191, 9492: 192, 9524: 193, 9516: 194, 9500: 195, 9472: 196, 9532: 197, 9566: 198, 9567: 199, 9562: 200, 9556: 201, 9577: 202, 9574: 203, 9568: 204, 9552: 205, 9580: 206, 9575: 207, 9576: 208, 9572: 209, 9573: 210, 9561: 211, 9560: 212, 9554: 213, 9555: 214, 9579: 215, 9578: 216, 9496: 217, 9484: 218, 9608: 219, 9604: 220, 9612: 221, 9616: 222, 9600: 223, 945: 224, 223: 225, 915: 226, 960: 227, 931: 228, 963: 229, 181: 230, 964: 231, 934: 232, 920: 233, 937: 234, 948: 235, 8734: 236, 966: 237, 949: 238, 8745: 239, 8801: 240, 177: 241, 8805: 242, 8804: 243, 8992: 244, 8993: 245, 247: 246, 8776: 247, 176: 248, 8729: 249, 183: 250, 8730: 251, 8319: 252, 178: 253, 9632: 254, 160: 255}

class Cp863IncrementalDecoder(AsciiIncrementalDecoder):
"""
IncrementalDecoder implementation for OEM-863 (Canadian French).
"""
name = 'cp863'
html5name = None
@lazy_property
@ -312,6 +469,13 @@ register_kuroko_codec(['cp863', '863', 'csibm863', 'ibm863'], Cp863IncrementalEn


class Cp864IncrementalEncoder(AsciiIncrementalEncoder):
"""
IncrementalEncoder implementation for OEM-864 (Arabic Positional Forms)

Note: OEM-864 competed with OEM-720 (which used a different layout, included box drawing
characters, included general letters rather than positional forms of Arabic characters, and
didn't include separate East Arabic digits).
"""
name = 'cp864'
html5name = None
@lazy_property
@ -319,6 +483,13 @@ class Cp864IncrementalEncoder(AsciiIncrementalEncoder):
return {176: 128, 183: 129, 8729: 130, 8730: 131, 9618: 132, 9472: 133, 9474: 134, 9532: 135, 9508: 136, 9516: 137, 9500: 138, 9524: 139, 9488: 140, 9484: 141, 9492: 142, 9496: 143, 946: 144, 8734: 145, 966: 146, 177: 147, 189: 148, 188: 149, 8776: 150, 171: 151, 187: 152, 65271: 153, 65272: 154, 65275: 157, 65276: 158, 160: 160, 173: 161, 65154: 162, 163: 163, 164: 164, 65156: 165, 65166: 168, 65167: 169, 65173: 170, 65177: 171, 1548: 172, 65181: 173, 65185: 174, 65189: 175, 1632: 176, 1633: 177, 1634: 178, 1635: 179, 1636: 180, 1637: 181, 1638: 182, 1639: 183, 1640: 184, 1641: 185, 65233: 186, 1563: 187, 65201: 188, 65205: 189, 65209: 190, 1567: 191, 162: 192, 65152: 193, 65153: 194, 65155: 195, 65157: 196, 65226: 197, 65163: 198, 65165: 199, 65169: 200, 65171: 201, 65175: 202, 65179: 203, 65183: 204, 65187: 205, 65191: 206, 65193: 207, 65195: 208, 65197: 209, 65199: 210, 65203: 211, 65207: 212, 65211: 213, 65215: 214, 65217: 215, 65221: 216, 65227: 217, 65231: 218, 166: 219, 172: 220, 247: 221, 215: 222, 65225: 223, 1600: 224, 65235: 225, 65239: 226, 65243: 227, 65247: 228, 65251: 229, 65255: 230, 65259: 231, 65261: 232, 65263: 233, 65267: 234, 65213: 235, 65228: 236, 65230: 237, 65229: 238, 65249: 239, 65149: 240, 1617: 241, 65253: 242, 65257: 243, 65260: 244, 65264: 245, 65266: 246, 65232: 247, 65237: 248, 65269: 249, 65270: 250, 65245: 251, 65241: 252, 65265: 253, 9632: 254}

class Cp864IncrementalDecoder(AsciiIncrementalDecoder):
"""
IncrementalDecoder implementation for OEM-864 (Arabic Positional Forms)

Note: OEM-864 competed with OEM-720 (which used a different layout, included box drawing
characters, included general letters rather than positional forms of Arabic characters, and
didn't include separate East Arabic digits).
"""
name = 'cp864'
html5name = None
@lazy_property
@ -329,6 +500,9 @@ register_kuroko_codec(['cp864', '864', 'csibm864', 'ibm864'], Cp864IncrementalEn


class Cp865IncrementalEncoder(AsciiIncrementalEncoder):
"""
IncrementalEncoder implementation for OEM-865 (Continental Nordic)
"""
name = 'cp865'
html5name = None
@lazy_property
@ -336,6 +510,9 @@ class Cp865IncrementalEncoder(AsciiIncrementalEncoder):
return {199: 128, 252: 129, 233: 130, 226: 131, 228: 132, 224: 133, 229: 134, 231: 135, 234: 136, 235: 137, 232: 138, 239: 139, 238: 140, 236: 141, 196: 142, 197: 143, 201: 144, 230: 145, 198: 146, 244: 147, 246: 148, 242: 149, 251: 150, 249: 151, 255: 152, 214: 153, 220: 154, 248: 155, 163: 156, 216: 157, 8359: 158, 402: 159, 225: 160, 237: 161, 243: 162, 250: 163, 241: 164, 209: 165, 170: 166, 186: 167, 191: 168, 8976: 169, 172: 170, 189: 171, 188: 172, 161: 173, 171: 174, 164: 175, 9617: 176, 9618: 177, 9619: 178, 9474: 179, 9508: 180, 9569: 181, 9570: 182, 9558: 183, 9557: 184, 9571: 185, 9553: 186, 9559: 187, 9565: 188, 9564: 189, 9563: 190, 9488: 191, 9492: 192, 9524: 193, 9516: 194, 9500: 195, 9472: 196, 9532: 197, 9566: 198, 9567: 199, 9562: 200, 9556: 201, 9577: 202, 9574: 203, 9568: 204, 9552: 205, 9580: 206, 9575: 207, 9576: 208, 9572: 209, 9573: 210, 9561: 211, 9560: 212, 9554: 213, 9555: 214, 9579: 215, 9578: 216, 9496: 217, 9484: 218, 9608: 219, 9604: 220, 9612: 221, 9616: 222, 9600: 223, 945: 224, 223: 225, 915: 226, 960: 227, 931: 228, 963: 229, 181: 230, 964: 231, 934: 232, 920: 233, 937: 234, 948: 235, 8734: 236, 966: 237, 949: 238, 8745: 239, 8801: 240, 177: 241, 8805: 242, 8804: 243, 8992: 244, 8993: 245, 247: 246, 8776: 247, 176: 248, 8729: 249, 183: 250, 8730: 251, 8319: 252, 178: 253, 9632: 254, 160: 255}

class Cp865IncrementalDecoder(AsciiIncrementalDecoder):
"""
IncrementalDecoder implementation for OEM-865 (Continental Nordic)
"""
name = 'cp865'
html5name = None
@lazy_property
@ -346,6 +523,12 @@ register_kuroko_codec(['cp865', '865', 'csibm865', 'ibm865'], Cp865IncrementalEn


class Cp869IncrementalEncoder(AsciiIncrementalEncoder):
"""
IncrementalEncoder implementation for OEM-869 (Greek).

Note: OEM-869 competed with OEM-737 (which used a different Greek layout and preserved all of
the box drawing characters rather than a subset, but omitted letters with combined trema/acute).
"""
name = 'cp869'
html5name = None
@lazy_property
@ -353,6 +536,12 @@ class Cp869IncrementalEncoder(AsciiIncrementalEncoder):
return {902: 134, 183: 136, 172: 137, 166: 138, 8216: 139, 8217: 140, 904: 141, 8213: 142, 905: 143, 906: 144, 938: 145, 908: 146, 910: 149, 939: 150, 169: 151, 911: 152, 178: 153, 179: 154, 940: 155, 163: 156, 941: 157, 942: 158, 943: 159, 970: 160, 912: 161, 972: 162, 973: 163, 913: 164, 914: 165, 915: 166, 916: 167, 917: 168, 918: 169, 919: 170, 189: 171, 920: 172, 921: 173, 171: 174, 187: 175, 9617: 176, 9618: 177, 9619: 178, 9474: 179, 9508: 180, 922: 181, 923: 182, 924: 183, 925: 184, 9571: 185, 9553: 186, 9559: 187, 9565: 188, 926: 189, 927: 190, 9488: 191, 9492: 192, 9524: 193, 9516: 194, 9500: 195, 9472: 196, 9532: 197, 928: 198, 929: 199, 9562: 200, 9556: 201, 9577: 202, 9574: 203, 9568: 204, 9552: 205, 9580: 206, 931: 207, 932: 208, 933: 209, 934: 210, 935: 211, 936: 212, 937: 213, 945: 214, 946: 215, 947: 216, 9496: 217, 9484: 218, 9608: 219, 9604: 220, 948: 221, 949: 222, 9600: 223, 950: 224, 951: 225, 952: 226, 953: 227, 954: 228, 955: 229, 956: 230, 957: 231, 958: 232, 959: 233, 960: 234, 961: 235, 963: 236, 962: 237, 964: 238, 900: 239, 173: 240, 177: 241, 965: 242, 966: 243, 967: 244, 167: 245, 968: 246, 901: 247, 176: 248, 168: 249, 969: 250, 971: 251, 944: 252, 974: 253, 9632: 254, 160: 255}

class Cp869IncrementalDecoder(AsciiIncrementalDecoder):
"""
IncrementalDecoder implementation for OEM-869 (Greek).

Note: OEM-869 competed with OEM-737 (which used a different Greek layout and preserved all of
the box drawing characters rather than a subset, but omitted letters with combined trema/acute).
"""
name = 'cp869'
html5name = None
@lazy_property
@ -362,24 +551,10 @@ class Cp869IncrementalDecoder(AsciiIncrementalDecoder):
register_kuroko_codec(['cp869', '869', 'cp-gr', 'csibm869', 'ibm869'], Cp869IncrementalEncoder, Cp869IncrementalDecoder)


class Cp874IncrementalEncoder(AsciiIncrementalEncoder):
name = 'cp874'
html5name = None
@lazy_property
def encoding_map():
return {8364: 128, 8230: 133, 8216: 145, 8217: 146, 8220: 147, 8221: 148, 8226: 149, 8211: 150, 8212: 151, 160: 160, 3585: 161, 3586: 162, 3587: 163, 3588: 164, 3589: 165, 3590: 166, 3591: 167, 3592: 168, 3593: 169, 3594: 170, 3595: 171, 3596: 172, 3597: 173, 3598: 174, 3599: 175, 3600: 176, 3601: 177, 3602: 178, 3603: 179, 3604: 180, 3605: 181, 3606: 182, 3607: 183, 3608: 184, 3609: 185, 3610: 186, 3611: 187, 3612: 188, 3613: 189, 3614: 190, 3615: 191, 3616: 192, 3617: 193, 3618: 194, 3619: 195, 3620: 196, 3621: 197, 3622: 198, 3623: 199, 3624: 200, 3625: 201, 3626: 202, 3627: 203, 3628: 204, 3629: 205, 3630: 206, 3631: 207, 3632: 208, 3633: 209, 3634: 210, 3635: 211, 3636: 212, 3637: 213, 3638: 214, 3639: 215, 3640: 216, 3641: 217, 3642: 218, 3647: 223, 3648: 224, 3649: 225, 3650: 226, 3651: 227, 3652: 228, 3653: 229, 3654: 230, 3655: 231, 3656: 232, 3657: 233, 3658: 234, 3659: 235, 3660: 236, 3661: 237, 3662: 238, 3663: 239, 3664: 240, 3665: 241, 3666: 242, 3667: 243, 3668: 244, 3669: 245, 3670: 246, 3671: 247, 3672: 248, 3673: 249, 3674: 250, 3675: 251}

class Cp874IncrementalDecoder(AsciiIncrementalDecoder):
name = 'cp874'
html5name = None
@lazy_property
def decoding_map():
return {128: 8364, 133: 8230, 145: 8216, 146: 8217, 147: 8220, 148: 8221, 149: 8226, 150: 8211, 151: 8212, 160: 160, 161: 3585, 162: 3586, 163: 3587, 164: 3588, 165: 3589, 166: 3590, 167: 3591, 168: 3592, 169: 3593, 170: 3594, 171: 3595, 172: 3596, 173: 3597, 174: 3598, 175: 3599, 176: 3600, 177: 3601, 178: 3602, 179: 3603, 180: 3604, 181: 3605, 182: 3606, 183: 3607, 184: 3608, 185: 3609, 186: 3610, 187: 3611, 188: 3612, 189: 3613, 190: 3614, 191: 3615, 192: 3616, 193: 3617, 194: 3618, 195: 3619, 196: 3620, 197: 3621, 198: 3622, 199: 3623, 200: 3624, 201: 3625, 202: 3626, 203: 3627, 204: 3628, 205: 3629, 206: 3630, 207: 3631, 208: 3632, 209: 3633, 210: 3634, 211: 3635, 212: 3636, 213: 3637, 214: 3638, 215: 3639, 216: 3640, 217: 3641, 218: 3642, 223: 3647, 224: 3648, 225: 3649, 226: 3650, 227: 3651, 228: 3652, 229: 3653, 230: 3654, 231: 3655, 232: 3656, 233: 3657, 234: 3658, 235: 3659, 236: 3660, 237: 3661, 238: 3662, 239: 3663, 240: 3664, 241: 3665, 242: 3666, 243: 3667, 244: 3668, 245: 3669, 246: 3670, 247: 3671, 248: 3672, 249: 3673, 250: 3674, 251: 3675}
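The `encoding_map` and `decoding_map` shown for this cp874 codec look like exact inverses of each other (code point to byte, and byte to code point). The sketch below illustrates that relationship on a four-entry excerpt copied from the maps above. Whether the project generates one map from the other is not shown in this diff, and the helper names `encode_char` and `decode_byte` are hypothetical.

```python
# Four entries copied from the cp874 encoding_map above: code point -> byte.
encoding_map = {8364: 128, 8230: 133, 3585: 161, 3647: 223}

# Inverting the dictionary yields the matching byte -> code point map.
decoding_map = {byte: codepoint for codepoint, byte in encoding_map.items()}

def encode_char(char):
    """Look up the single byte for a non-ASCII character in the excerpt."""
    return bytes([encoding_map[ord(char)]])

def decode_byte(byte):
    """Look up the character for a single high byte in the excerpt."""
    return chr(decoding_map[byte])

euro = encode_char("\u20ac")
print(euro, repr(decode_byte(euro[0])))
```

Inverting only works cleanly because each byte appears once as a value; a map with two code points sharing a byte would lose an entry.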

register_kuroko_codec(['cp874'], Cp874IncrementalEncoder, Cp874IncrementalDecoder)


class Cp875IncrementalEncoder(BaseEbcdicIncrementalEncoder):
"""
IncrementalEncoder implementation for EBCDIC-875 (used in Greek-speaking locales).
"""
name = 'cp875'
html5name = None
@lazy_property
@ -387,6 +562,9 @@ class Cp875IncrementalEncoder(BaseEbcdicIncrementalEncoder):
return {0: 0, 1: 1, 2: 2, 3: 3, 156: 4, 9: 5, 134: 6, 127: 7, 151: 8, 141: 9, 142: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19, 157: 20, 133: 21, 8: 22, 135: 23, 24: 24, 25: 25, 146: 26, 143: 27, 28: 28, 29: 29, 30: 30, 31: 31, 128: 32, 129: 33, 130: 34, 131: 35, 132: 36, 10: 37, 23: 38, 27: 39, 136: 40, 137: 41, 138: 42, 139: 43, 140: 44, 5: 45, 6: 46, 7: 47, 144: 48, 145: 49, 22: 50, 147: 51, 148: 52, 149: 53, 150: 54, 4: 55, 152: 56, 153: 57, 154: 58, 155: 59, 20: 60, 21: 61, 158: 62, 26: 253, 32: 64, 913: 65, 914: 66, 915: 67, 916: 68, 917: 69, 918: 70, 919: 71, 920: 72, 921: 73, 91: 74, 46: 75, 60: 76, 40: 77, 43: 78, 33: 79, 38: 80, 922: 81, 923: 82, 924: 83, 925: 84, 926: 85, 927: 86, 928: 87, 929: 88, 931: 89, 93: 90, 36: 91, 42: 92, 41: 93, 59: 94, 94: 95, 45: 96, 47: 97, 932: 98, 933: 99, 934: 100, 935: 101, 936: 102, 937: 103, 938: 104, 939: 105, 124: 106, 44: 107, 37: 108, 95: 109, 62: 110, 63: 111, 168: 112, 902: 113, 904: 114, 905: 115, 160: 116, 906: 117, 908: 118, 910: 119, 911: 120, 96: 121, 58: 122, 35: 123, 64: 124, 39: 125, 61: 126, 34: 127, 901: 128, 97: 129, 98: 130, 99: 131, 100: 132, 101: 133, 102: 134, 103: 135, 104: 136, 105: 137, 945: 138, 946: 139, 947: 140, 948: 141, 949: 142, 950: 143, 176: 144, 106: 145, 107: 146, 108: 147, 109: 148, 110: 149, 111: 150, 112: 151, 113: 152, 114: 153, 951: 154, 952: 155, 953: 156, 954: 157, 955: 158, 956: 159, 180: 160, 126: 161, 115: 162, 116: 163, 117: 164, 118: 165, 119: 166, 120: 167, 121: 168, 122: 169, 957: 170, 958: 171, 959: 172, 960: 173, 961: 174, 963: 175, 163: 176, 940: 177, 941: 178, 942: 179, 970: 180, 943: 181, 972: 182, 973: 183, 971: 184, 974: 185, 962: 186, 964: 187, 965: 188, 966: 189, 967: 190, 968: 191, 123: 192, 65: 193, 66: 194, 67: 195, 68: 196, 69: 197, 70: 198, 71: 199, 72: 200, 73: 201, 173: 202, 969: 203, 912: 204, 944: 205, 8216: 206, 8213: 207, 125: 208, 74: 209, 75: 210, 76: 211, 77: 212, 78: 213, 79: 214, 80: 215, 81: 216, 82: 217, 177: 218, 
189: 219, 903: 221, 8217: 222, 166: 223, 92: 224, 83: 226, 84: 227, 85: 228, 86: 229, 87: 230, 88: 231, 89: 232, 90: 233, 178: 234, 167: 235, 171: 238, 172: 239, 48: 240, 49: 241, 50: 242, 51: 243, 52: 244, 53: 245, 54: 246, 55: 247, 56: 248, 57: 249, 179: 250, 169: 251, 187: 254, 159: 255}

class Cp875IncrementalDecoder(BaseEbcdicIncrementalDecoder):
"""
IncrementalDecoder implementation for EBCDIC-875 (used in Greek-speaking locales).
"""
name = 'cp875'
html5name = None
@lazy_property
@ -397,6 +575,9 @@ register_kuroko_codec(['cp875'], Cp875IncrementalEncoder, Cp875IncrementalDecode


class Cp1006IncrementalEncoder(AsciiIncrementalEncoder):
"""
IncrementalEncoder implementation for OEM-1006 (Urdu).
"""
name = 'cp1006'
html5name = None
@lazy_property
@ -404,6 +585,9 @@ class Cp1006IncrementalEncoder(AsciiIncrementalEncoder):
return {128: 128, 129: 129, 130: 130, 131: 131, 132: 132, 133: 133, 134: 134, 135: 135, 136: 136, 137: 137, 138: 138, 139: 139, 140: 140, 141: 141, 142: 142, 143: 143, 144: 144, 145: 145, 146: 146, 147: 147, 148: 148, 149: 149, 150: 150, 151: 151, 152: 152, 153: 153, 154: 154, 155: 155, 156: 156, 157: 157, 158: 158, 159: 159, 160: 160, 1776: 161, 1777: 162, 1778: 163, 1779: 164, 1780: 165, 1781: 166, 1782: 167, 1783: 168, 1784: 169, 1785: 170, 1548: 171, 1563: 172, 173: 173, 1567: 174, 65153: 175, 65165: 176, 65166: 178, 65167: 179, 65169: 180, 64342: 181, 64344: 182, 65171: 183, 65173: 184, 65175: 185, 64358: 186, 64360: 187, 65177: 188, 65179: 189, 65181: 190, 65183: 191, 64378: 192, 64380: 193, 65185: 194, 65187: 195, 65189: 196, 65191: 197, 65193: 198, 64388: 199, 65195: 200, 65197: 201, 64396: 202, 65199: 203, 64394: 204, 65201: 205, 65203: 206, 65205: 207, 65207: 208, 65209: 209, 65211: 210, 65213: 211, 65215: 212, 65217: 213, 65221: 214, 65225: 215, 65226: 216, 65227: 217, 65228: 218, 65229: 219, 65230: 220, 65231: 221, 65232: 222, 65233: 223, 65235: 224, 65237: 225, 65239: 226, 65241: 227, 65243: 228, 64402: 229, 64404: 230, 65245: 231, 65247: 232, 65248: 233, 65249: 234, 65251: 235, 64414: 236, 65253: 237, 65255: 238, 65157: 239, 65261: 240, 64422: 241, 64424: 242, 64425: 243, 64426: 244, 65152: 245, 65161: 246, 65162: 247, 65163: 248, 65265: 249, 65266: 250, 65267: 251, 64432: 252, 64430: 253, 65148: 254, 65149: 255}
|
||||
|
||||
class Cp1006IncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for OEM-1006 (Urdu).
|
||||
"""
|
||||
name = 'cp1006'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -414,6 +598,9 @@ register_kuroko_codec(['cp1006'], Cp1006IncrementalEncoder, Cp1006IncrementalDec
|
||||
|
||||
|
||||
class Cp1026IncrementalEncoder(BaseEbcdicIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for EBCDIC-1026 (used in Turkish-speaking locales).
|
||||
"""
|
||||
name = 'cp1026'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -421,6 +608,9 @@ class Cp1026IncrementalEncoder(BaseEbcdicIncrementalEncoder):
|
||||
return {0: 0, 1: 1, 2: 2, 3: 3, 156: 4, 9: 5, 134: 6, 127: 7, 151: 8, 141: 9, 142: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19, 157: 20, 133: 21, 8: 22, 135: 23, 24: 24, 25: 25, 146: 26, 143: 27, 28: 28, 29: 29, 30: 30, 31: 31, 128: 32, 129: 33, 130: 34, 131: 35, 132: 36, 10: 37, 23: 38, 27: 39, 136: 40, 137: 41, 138: 42, 139: 43, 140: 44, 5: 45, 6: 46, 7: 47, 144: 48, 145: 49, 22: 50, 147: 51, 148: 52, 149: 53, 150: 54, 4: 55, 152: 56, 153: 57, 154: 58, 155: 59, 20: 60, 21: 61, 158: 62, 26: 63, 32: 64, 160: 65, 226: 66, 228: 67, 224: 68, 225: 69, 227: 70, 229: 71, 123: 72, 241: 73, 199: 74, 46: 75, 60: 76, 40: 77, 43: 78, 33: 79, 38: 80, 233: 81, 234: 82, 235: 83, 232: 84, 237: 85, 238: 86, 239: 87, 236: 88, 223: 89, 286: 90, 304: 91, 42: 92, 41: 93, 59: 94, 94: 95, 45: 96, 47: 97, 194: 98, 196: 99, 192: 100, 193: 101, 195: 102, 197: 103, 91: 104, 209: 105, 351: 106, 44: 107, 37: 108, 95: 109, 62: 110, 63: 111, 248: 112, 201: 113, 202: 114, 203: 115, 200: 116, 205: 117, 206: 118, 207: 119, 204: 120, 305: 121, 58: 122, 214: 123, 350: 124, 39: 125, 61: 126, 220: 127, 216: 128, 97: 129, 98: 130, 99: 131, 100: 132, 101: 133, 102: 134, 103: 135, 104: 136, 105: 137, 171: 138, 187: 139, 125: 140, 96: 141, 166: 142, 177: 143, 176: 144, 106: 145, 107: 146, 108: 147, 109: 148, 110: 149, 111: 150, 112: 151, 113: 152, 114: 153, 170: 154, 186: 155, 230: 156, 184: 157, 198: 158, 164: 159, 181: 160, 246: 161, 115: 162, 116: 163, 117: 164, 118: 165, 119: 166, 120: 167, 121: 168, 122: 169, 161: 170, 191: 171, 93: 172, 36: 173, 64: 174, 174: 175, 162: 176, 163: 177, 165: 178, 183: 179, 169: 180, 167: 181, 182: 182, 188: 183, 189: 184, 190: 185, 172: 186, 124: 187, 175: 188, 168: 189, 180: 190, 215: 191, 231: 192, 65: 193, 66: 194, 67: 195, 68: 196, 69: 197, 70: 198, 71: 199, 72: 200, 73: 201, 173: 202, 244: 203, 126: 204, 242: 205, 243: 206, 245: 207, 287: 208, 74: 209, 75: 210, 76: 211, 77: 212, 78: 213, 79: 214, 80: 215, 81: 216, 82: 217, 185: 218, 
251: 219, 92: 220, 249: 221, 250: 222, 255: 223, 252: 224, 247: 225, 83: 226, 84: 227, 85: 228, 86: 229, 87: 230, 88: 231, 89: 232, 90: 233, 178: 234, 212: 235, 35: 236, 210: 237, 211: 238, 213: 239, 48: 240, 49: 241, 50: 242, 51: 243, 52: 244, 53: 245, 54: 246, 55: 247, 56: 248, 57: 249, 179: 250, 219: 251, 34: 252, 217: 253, 218: 254, 159: 255}
|
||||
|
||||
class Cp1026IncrementalDecoder(BaseEbcdicIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for EBCDIC-1026 (used in Turkish-speaking locales).
|
||||
"""
|
||||
name = 'cp1026'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -431,6 +621,12 @@ register_kuroko_codec(['cp1026', '1026', 'csibm1026', 'ibm1026'], Cp1026Incremen
|
||||
|
||||
|
||||
class Cp1125IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for OEM-1125 (Ukrainian Cyrillic).
|
||||
|
||||
OEM-1125 is the Ukrainian standard RST 2018-91; due to both being modifications of the so-called
|
||||
Alternative Code Page, OEM-1125 and OEM-866 are compatible for the Russian/Bulgarian letters.
|
||||
"""
|
||||
name = 'cp1125'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -438,6 +634,12 @@ class Cp1125IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {1040: 128, 1041: 129, 1042: 130, 1043: 131, 1044: 132, 1045: 133, 1046: 134, 1047: 135, 1048: 136, 1049: 137, 1050: 138, 1051: 139, 1052: 140, 1053: 141, 1054: 142, 1055: 143, 1056: 144, 1057: 145, 1058: 146, 1059: 147, 1060: 148, 1061: 149, 1062: 150, 1063: 151, 1064: 152, 1065: 153, 1066: 154, 1067: 155, 1068: 156, 1069: 157, 1070: 158, 1071: 159, 1072: 160, 1073: 161, 1074: 162, 1075: 163, 1076: 164, 1077: 165, 1078: 166, 1079: 167, 1080: 168, 1081: 169, 1082: 170, 1083: 171, 1084: 172, 1085: 173, 1086: 174, 1087: 175, 9617: 176, 9618: 177, 9619: 178, 9474: 179, 9508: 180, 9569: 181, 9570: 182, 9558: 183, 9557: 184, 9571: 185, 9553: 186, 9559: 187, 9565: 188, 9564: 189, 9563: 190, 9488: 191, 9492: 192, 9524: 193, 9516: 194, 9500: 195, 9472: 196, 9532: 197, 9566: 198, 9567: 199, 9562: 200, 9556: 201, 9577: 202, 9574: 203, 9568: 204, 9552: 205, 9580: 206, 9575: 207, 9576: 208, 9572: 209, 9573: 210, 9561: 211, 9560: 212, 9554: 213, 9555: 214, 9579: 215, 9578: 216, 9496: 217, 9484: 218, 9608: 219, 9604: 220, 9612: 221, 9616: 222, 9600: 223, 1088: 224, 1089: 225, 1090: 226, 1091: 227, 1092: 228, 1093: 229, 1094: 230, 1095: 231, 1096: 232, 1097: 233, 1098: 234, 1099: 235, 1100: 236, 1101: 237, 1102: 238, 1103: 239, 1025: 240, 1105: 241, 1168: 242, 1169: 243, 1028: 244, 1108: 245, 1030: 246, 1110: 247, 1031: 248, 1111: 249, 183: 250, 8730: 251, 8470: 252, 164: 253, 9632: 254, 160: 255}
|
||||
|
||||
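The compatibility claim in the OEM-1125 docstring can be spot-checked from outside this codebase. The following is a minimal sketch in Python, assuming CPython's standard-library `cp1125` and `cp866` codecs agree with the mappings in this diff (the helper names `same_bytes` and `encodable` are made up for illustration and are not part of this package):

```python
# Illustrative check with CPython's built-in codecs, not the Kuroko classes above.
RUSSIAN = "".join(chr(c) for c in range(0x0410, 0x0450)) + "Ёё"  # А..я plus Ё and ё
UKRAINIAN_EXTRA = "ҐґЄєІіЇї"


def same_bytes(text, first="cp1125", second="cp866"):
    """Return True if both codecs encode `text` to identical bytes."""
    return text.encode(first) == text.encode(second)


def encodable(ch, codec):
    """Return True if `codec` has a byte for the single character `ch`."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# The Russian alphabet occupies the same byte values in both code pages.
russian_matches = same_bytes(RUSSIAN)

# Letters present in both code pages but at different byte values.
shared_but_moved = [ch for ch in UKRAINIAN_EXTRA
                    if encodable(ch, "cp866") and not same_bytes(ch)]

# Letters that only OEM-1125 can represent.
only_in_1125 = [ch for ch in UKRAINIAN_EXTRA if not encodable(ch, "cp866")]
```

Here `russian_matches` is true, while the Ukrainian letters either sit at different byte values or are missing from OEM-866 altogether, so the two code pages are interchangeable only for text limited to the shared letters.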
class Cp1125IncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for OEM-1125 (Ukrainian Cyrillic).
|
||||
|
||||
OEM-1125 is the Ukrainian standard RST 2018-91; due to both being modifications of the so-called
|
||||
Alternative Code Page, OEM-1125 and OEM-866 are compatible for the Russian/Bulgarian letters.
|
||||
"""
|
||||
name = 'cp1125'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -448,6 +650,9 @@ register_kuroko_codec(['cp1125', '1125', 'ibm1125', 'cp866u', 'ruscii'], Cp1125I
|
||||
|
||||
|
||||
class Cp1140IncrementalEncoder(BaseEbcdicIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for EBCDIC-1140 (EBCDIC with Euro sign).
|
||||
"""
|
||||
name = 'cp1140'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -455,6 +660,9 @@ class Cp1140IncrementalEncoder(BaseEbcdicIncrementalEncoder):
|
||||
return {0: 0, 1: 1, 2: 2, 3: 3, 156: 4, 9: 5, 134: 6, 127: 7, 151: 8, 141: 9, 142: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19, 157: 20, 133: 21, 8: 22, 135: 23, 24: 24, 25: 25, 146: 26, 143: 27, 28: 28, 29: 29, 30: 30, 31: 31, 128: 32, 129: 33, 130: 34, 131: 35, 132: 36, 10: 37, 23: 38, 27: 39, 136: 40, 137: 41, 138: 42, 139: 43, 140: 44, 5: 45, 6: 46, 7: 47, 144: 48, 145: 49, 22: 50, 147: 51, 148: 52, 149: 53, 150: 54, 4: 55, 152: 56, 153: 57, 154: 58, 155: 59, 20: 60, 21: 61, 158: 62, 26: 63, 32: 64, 160: 65, 226: 66, 228: 67, 224: 68, 225: 69, 227: 70, 229: 71, 231: 72, 241: 73, 162: 74, 46: 75, 60: 76, 40: 77, 43: 78, 124: 79, 38: 80, 233: 81, 234: 82, 235: 83, 232: 84, 237: 85, 238: 86, 239: 87, 236: 88, 223: 89, 33: 90, 36: 91, 42: 92, 41: 93, 59: 94, 172: 95, 45: 96, 47: 97, 194: 98, 196: 99, 192: 100, 193: 101, 195: 102, 197: 103, 199: 104, 209: 105, 166: 106, 44: 107, 37: 108, 95: 109, 62: 110, 63: 111, 248: 112, 201: 113, 202: 114, 203: 115, 200: 116, 205: 117, 206: 118, 207: 119, 204: 120, 96: 121, 58: 122, 35: 123, 64: 124, 39: 125, 61: 126, 34: 127, 216: 128, 97: 129, 98: 130, 99: 131, 100: 132, 101: 133, 102: 134, 103: 135, 104: 136, 105: 137, 171: 138, 187: 139, 240: 140, 253: 141, 254: 142, 177: 143, 176: 144, 106: 145, 107: 146, 108: 147, 109: 148, 110: 149, 111: 150, 112: 151, 113: 152, 114: 153, 170: 154, 186: 155, 230: 156, 184: 157, 198: 158, 8364: 159, 181: 160, 126: 161, 115: 162, 116: 163, 117: 164, 118: 165, 119: 166, 120: 167, 121: 168, 122: 169, 161: 170, 191: 171, 208: 172, 221: 173, 222: 174, 174: 175, 94: 176, 163: 177, 165: 178, 183: 179, 169: 180, 167: 181, 182: 182, 188: 183, 189: 184, 190: 185, 91: 186, 93: 187, 175: 188, 168: 189, 180: 190, 215: 191, 123: 192, 65: 193, 66: 194, 67: 195, 68: 196, 69: 197, 70: 198, 71: 199, 72: 200, 73: 201, 173: 202, 244: 203, 246: 204, 242: 205, 243: 206, 245: 207, 125: 208, 74: 209, 75: 210, 76: 211, 77: 212, 78: 213, 79: 214, 80: 215, 81: 216, 82: 217, 185: 218, 
251: 219, 252: 220, 249: 221, 250: 222, 255: 223, 92: 224, 247: 225, 83: 226, 84: 227, 85: 228, 86: 229, 87: 230, 88: 231, 89: 232, 90: 233, 178: 234, 212: 235, 214: 236, 210: 237, 211: 238, 213: 239, 48: 240, 49: 241, 50: 242, 51: 243, 52: 244, 53: 245, 54: 246, 55: 247, 56: 248, 57: 249, 179: 250, 219: 251, 220: 252, 217: 253, 218: 254, 159: 255}
|
||||
|
||||
class Cp1140IncrementalDecoder(BaseEbcdicIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for EBCDIC-1140 (EBCDIC with Euro sign).
|
||||
"""
|
||||
name = 'cp1140'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -465,6 +673,9 @@ register_kuroko_codec(['cp1140', '1140', 'ibm1140'], Cp1140IncrementalEncoder, C
|
||||
|
||||
|
||||
class HpRoman8IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for the HP 8-bit Roman encoding.
|
||||
"""
|
||||
name = 'hp-roman8'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -472,6 +683,9 @@ class HpRoman8IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {128: 128, 129: 129, 130: 130, 131: 131, 132: 132, 133: 133, 134: 134, 135: 135, 136: 136, 137: 137, 138: 138, 139: 139, 140: 140, 141: 141, 142: 142, 143: 143, 144: 144, 145: 145, 146: 146, 147: 147, 148: 148, 149: 149, 150: 150, 151: 151, 152: 152, 153: 153, 154: 154, 155: 155, 156: 156, 157: 157, 158: 158, 159: 159, 160: 160, 192: 161, 194: 162, 200: 163, 202: 164, 203: 165, 206: 166, 207: 167, 180: 168, 715: 169, 710: 170, 168: 171, 732: 172, 217: 173, 219: 174, 8356: 175, 175: 176, 221: 177, 253: 178, 176: 179, 199: 180, 231: 181, 209: 182, 241: 183, 161: 184, 191: 185, 164: 186, 163: 187, 165: 188, 167: 189, 402: 190, 162: 191, 226: 192, 234: 193, 244: 194, 251: 195, 225: 196, 233: 197, 243: 198, 250: 199, 224: 200, 232: 201, 242: 202, 249: 203, 228: 204, 235: 205, 246: 206, 252: 207, 197: 208, 238: 209, 216: 210, 198: 211, 229: 212, 237: 213, 248: 214, 230: 215, 196: 216, 236: 217, 214: 218, 220: 219, 201: 220, 239: 221, 223: 222, 212: 223, 193: 224, 195: 225, 227: 226, 208: 227, 240: 228, 205: 229, 204: 230, 211: 231, 210: 232, 213: 233, 245: 234, 352: 235, 353: 236, 218: 237, 376: 238, 255: 239, 222: 240, 254: 241, 183: 242, 181: 243, 182: 244, 190: 245, 8212: 246, 188: 247, 189: 248, 170: 249, 186: 250, 171: 251, 9632: 252, 187: 253, 177: 254}
|
||||
|
||||
class HpRoman8IncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for the HP 8-bit Roman encoding.
|
||||
"""
|
||||
name = 'hp-roman8'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -482,6 +696,9 @@ register_kuroko_codec(['hp-roman8', 'roman8', 'r8', 'csHPRoman8', 'cp1051', 'ibm
|
||||
|
||||
|
||||
class Koi8TIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for the KOI8-T (KOI-8 Cyrillic for Tajik) encoding.
|
||||
"""
|
||||
name = 'koi8-t'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -489,6 +706,9 @@ class Koi8TIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {1179: 128, 1171: 129, 8218: 130, 1170: 131, 8222: 132, 8230: 133, 8224: 134, 8225: 135, 8240: 137, 1203: 138, 8249: 139, 1202: 140, 1207: 141, 1206: 142, 1178: 144, 8216: 145, 8217: 146, 8220: 147, 8221: 148, 8226: 149, 8211: 150, 8212: 151, 8482: 153, 8250: 155, 1263: 161, 1262: 162, 1105: 163, 164: 164, 1251: 165, 166: 166, 167: 167, 171: 171, 172: 172, 173: 173, 174: 174, 176: 176, 177: 177, 178: 178, 1025: 179, 1250: 181, 182: 182, 183: 183, 8470: 185, 187: 187, 169: 191, 1102: 192, 1072: 193, 1073: 194, 1094: 195, 1076: 196, 1077: 197, 1092: 198, 1075: 199, 1093: 200, 1080: 201, 1081: 202, 1082: 203, 1083: 204, 1084: 205, 1085: 206, 1086: 207, 1087: 208, 1103: 209, 1088: 210, 1089: 211, 1090: 212, 1091: 213, 1078: 214, 1074: 215, 1100: 216, 1099: 217, 1079: 218, 1096: 219, 1101: 220, 1097: 221, 1095: 222, 1098: 223, 1070: 224, 1040: 225, 1041: 226, 1062: 227, 1044: 228, 1045: 229, 1060: 230, 1043: 231, 1061: 232, 1048: 233, 1049: 234, 1050: 235, 1051: 236, 1052: 237, 1053: 238, 1054: 239, 1055: 240, 1071: 241, 1056: 242, 1057: 243, 1058: 244, 1059: 245, 1046: 246, 1042: 247, 1068: 248, 1067: 249, 1047: 250, 1064: 251, 1069: 252, 1065: 253, 1063: 254, 1066: 255}
|
||||
|
||||
class Koi8TIncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for the KOI8-T (KOI-8 Cyrillic for Tajik) encoding.
|
||||
"""
|
||||
name = 'koi8-t'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -499,6 +719,11 @@ register_kuroko_codec(['koi8-t'], Koi8TIncrementalEncoder, Koi8TIncrementalDecod
|
||||
|
||||
|
||||
class Kz1048IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for Kazakh standard KZ-1048.
|
||||
|
||||
This is a modification of Windows-1251 to add support for Kazakh.
|
||||
"""
|
||||
name = 'kz1048'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -506,6 +731,11 @@ class Kz1048IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {1026: 128, 1027: 129, 8218: 130, 1107: 131, 8222: 132, 8230: 133, 8224: 134, 8225: 135, 8364: 136, 8240: 137, 1033: 138, 8249: 139, 1034: 140, 1178: 141, 1210: 142, 1039: 143, 1106: 144, 8216: 145, 8217: 146, 8220: 147, 8221: 148, 8226: 149, 8211: 150, 8212: 151, 8482: 153, 1113: 154, 8250: 155, 1114: 156, 1179: 157, 1211: 158, 1119: 159, 160: 160, 1200: 161, 1201: 162, 1240: 163, 164: 164, 1256: 165, 166: 166, 167: 167, 1025: 168, 169: 169, 1170: 170, 171: 171, 172: 172, 173: 173, 174: 174, 1198: 175, 176: 176, 177: 177, 1030: 178, 1110: 179, 1257: 180, 181: 181, 182: 182, 183: 183, 1105: 184, 8470: 185, 1171: 186, 187: 187, 1241: 188, 1186: 189, 1187: 190, 1199: 191, 1040: 192, 1041: 193, 1042: 194, 1043: 195, 1044: 196, 1045: 197, 1046: 198, 1047: 199, 1048: 200, 1049: 201, 1050: 202, 1051: 203, 1052: 204, 1053: 205, 1054: 206, 1055: 207, 1056: 208, 1057: 209, 1058: 210, 1059: 211, 1060: 212, 1061: 213, 1062: 214, 1063: 215, 1064: 216, 1065: 217, 1066: 218, 1067: 219, 1068: 220, 1069: 221, 1070: 222, 1071: 223, 1072: 224, 1073: 225, 1074: 226, 1075: 227, 1076: 228, 1077: 229, 1078: 230, 1079: 231, 1080: 232, 1081: 233, 1082: 234, 1083: 235, 1084: 236, 1085: 237, 1086: 238, 1087: 239, 1088: 240, 1089: 241, 1090: 242, 1091: 243, 1092: 244, 1093: 245, 1094: 246, 1095: 247, 1096: 248, 1097: 249, 1098: 250, 1099: 251, 1100: 252, 1101: 253, 1102: 254, 1103: 255}
|
||||
|
||||
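To see what "a modification of Windows-1251" amounts to in practice, the two code pages can be compared byte by byte. This is a rough sketch in Python, assuming CPython's standard-library `kz1048` and `cp1251` codecs match the mappings in this diff; `differing_bytes` is a hypothetical helper, not something this package provides:

```python
def differing_bytes(first, second):
    """List byte values in 0x80-0xFF that the two codecs decode differently."""
    diffs = []
    for value in range(0x80, 0x100):
        raw = bytes([value])
        # errors="replace" maps unassigned bytes to U+FFFD instead of raising,
        # so a byte that is unassigned in both codecs counts as identical.
        if raw.decode(first, errors="replace") != raw.decode(second, errors="replace"):
            diffs.append(value)
    return diffs


diffs = differing_bytes("kz1048", "cp1251")

# The basic Russian alphabet (0xC0-0xFF) is left where Windows-1251 put it.
basic_cyrillic_untouched = all(value < 0xC0 for value in diffs)

# U+049A (Қ) takes over a slot that Windows-1251 uses for another letter.
kazakh_ka = "Қ".encode("kz1048")
```

All of the differences fall below 0xC0, which is where Windows-1251 keeps letters for other Cyrillic-script languages; KZ-1048 reuses some of those positions for Kazakh letters.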
class Kz1048IncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for Kazakh standard KZ-1048.
|
||||
|
||||
This is a modification of Windows-1251 to add support for Kazakh.
|
||||
"""
|
||||
name = 'kz1048'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -516,6 +746,11 @@ register_kuroko_codec(['kz1048', 'kz-1048', 'rk1048', 'strk1048-2002'], Kz1048In
|
||||
|
||||
|
||||
class PalmosIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for the PalmOS encoding.
|
||||
|
||||
This is a modification of ISO-8859-1 along similar lines to Windows-1252.
|
||||
"""
|
||||
name = 'palmos'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -523,6 +758,11 @@ class PalmosIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {8364: 128, 129: 129, 8218: 130, 402: 131, 8222: 132, 8230: 133, 8224: 134, 8225: 135, 710: 136, 8240: 137, 352: 138, 8249: 139, 338: 140, 9830: 141, 9827: 142, 9829: 143, 9824: 144, 8216: 145, 8217: 146, 8220: 147, 8221: 148, 8226: 149, 8211: 150, 8212: 151, 732: 152, 8482: 153, 353: 154, 155: 155, 339: 156, 157: 157, 158: 158, 376: 159, 160: 160, 161: 161, 162: 162, 163: 163, 164: 164, 165: 165, 166: 166, 167: 167, 168: 168, 169: 169, 170: 170, 171: 171, 172: 172, 173: 173, 174: 174, 175: 175, 176: 176, 177: 177, 178: 178, 179: 179, 180: 180, 181: 181, 182: 182, 183: 183, 184: 184, 185: 185, 186: 186, 187: 187, 188: 188, 189: 189, 190: 190, 191: 191, 192: 192, 193: 193, 194: 194, 195: 195, 196: 196, 197: 197, 198: 198, 199: 199, 200: 200, 201: 201, 202: 202, 203: 203, 204: 204, 205: 205, 206: 206, 207: 207, 208: 208, 209: 209, 210: 210, 211: 211, 212: 212, 213: 213, 214: 214, 215: 215, 216: 216, 217: 217, 218: 218, 219: 219, 220: 220, 221: 221, 222: 222, 223: 223, 224: 224, 225: 225, 226: 226, 227: 227, 228: 228, 229: 229, 230: 230, 231: 231, 232: 232, 233: 233, 234: 234, 235: 235, 236: 236, 237: 237, 238: 238, 239: 239, 240: 240, 241: 241, 242: 242, 243: 243, 244: 244, 245: 245, 246: 246, 247: 247, 248: 248, 249: 249, 250: 250, 251: 251, 252: 252, 253: 253, 254: 254, 255: 255}
|
||||
|
||||
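One place where the PalmOS encoding departs from Windows-1252 is the card-suit symbols at bytes 0x8D to 0x90. The sketch below is Python rather than Kuroko; `PALMOS_SUITS` is an excerpt copied from the encoder mapping above, and `cp1252_char` is an illustrative helper built on CPython's standard `cp1252` codec:

```python
# Excerpt of the PalmOS encoder mapping above (code point -> byte).
PALMOS_SUITS = {9830: 141, 9827: 142, 9829: 143, 9824: 144}


def cp1252_char(byte_value):
    """Decode one byte as Windows-1252, or return None if it is unassigned."""
    try:
        return bytes([byte_value]).decode("cp1252")
    except UnicodeDecodeError:
        return None


# byte -> (PalmOS character, Windows-1252 character or None)
comparison = {byte: (chr(codepoint), cp1252_char(byte))
              for codepoint, byte in PALMOS_SUITS.items()}
```

Three of these four bytes are unassigned in Windows-1252 as CPython implements it, and the fourth (0x8E) holds a different character there, so PalmOS text that uses the suit symbols does not survive being read as Windows-1252.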
class PalmosIncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for the PalmOS encoding.
|
||||
|
||||
This is a modification of ISO-8859-1 along similar lines to Windows-1252.
|
||||
"""
|
||||
name = 'palmos'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -533,6 +773,11 @@ register_kuroko_codec(['palmos'], PalmosIncrementalEncoder, PalmosIncrementalDec
|
||||
|
||||
|
||||
class Ptcp154IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for Paratype PTCP-154 (Asian Cyrillic).
|
||||
|
||||
This is a modification of Windows-1251 to add support for Asian Cyrillic orthographies.
|
||||
"""
|
||||
name = 'ptcp154'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -540,6 +785,11 @@ class Ptcp154IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {1174: 128, 1170: 129, 1262: 130, 1171: 131, 8222: 132, 8230: 133, 1206: 134, 1198: 135, 1202: 136, 1199: 137, 1184: 138, 1250: 139, 1186: 140, 1178: 141, 1210: 142, 1208: 143, 1175: 144, 8216: 145, 8217: 146, 8220: 147, 8221: 148, 8226: 149, 8211: 150, 8212: 151, 1203: 152, 1207: 153, 1185: 154, 1251: 155, 1187: 156, 1179: 157, 1211: 158, 1209: 159, 160: 160, 1038: 161, 1118: 162, 1032: 163, 1256: 164, 1176: 165, 1200: 166, 167: 167, 1025: 168, 169: 169, 1240: 170, 171: 171, 172: 172, 1263: 173, 174: 174, 1180: 175, 176: 176, 1201: 177, 1030: 178, 1110: 179, 1177: 180, 1257: 181, 182: 182, 183: 183, 1105: 184, 8470: 185, 1241: 186, 187: 187, 1112: 188, 1194: 189, 1195: 190, 1181: 191, 1040: 192, 1041: 193, 1042: 194, 1043: 195, 1044: 196, 1045: 197, 1046: 198, 1047: 199, 1048: 200, 1049: 201, 1050: 202, 1051: 203, 1052: 204, 1053: 205, 1054: 206, 1055: 207, 1056: 208, 1057: 209, 1058: 210, 1059: 211, 1060: 212, 1061: 213, 1062: 214, 1063: 215, 1064: 216, 1065: 217, 1066: 218, 1067: 219, 1068: 220, 1069: 221, 1070: 222, 1071: 223, 1072: 224, 1073: 225, 1074: 226, 1075: 227, 1076: 228, 1077: 229, 1078: 230, 1079: 231, 1080: 232, 1081: 233, 1082: 234, 1083: 235, 1084: 236, 1085: 237, 1086: 238, 1087: 239, 1088: 240, 1089: 241, 1090: 242, 1091: 243, 1092: 244, 1093: 245, 1094: 246, 1095: 247, 1096: 248, 1097: 249, 1098: 250, 1099: 251, 1100: 252, 1101: 253, 1102: 254, 1103: 255}
|
||||
|
||||
class Ptcp154IncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for Paratype PTCP-154 (Asian Cyrillic).
|
||||
|
||||
This is a modification of Windows-1251 to add support for Asian Cyrillic orthographies.
|
||||
"""
|
||||
name = 'ptcp154'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -550,6 +800,9 @@ register_kuroko_codec(['ptcp154', 'csptcp154', 'pt154', 'cp154', 'cyrillic-asian
|
||||
|
||||
|
||||
class XMacArabicIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for the Macintosh Arabic encoding.
|
||||
"""
|
||||
name = 'x-mac-arabic'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -557,6 +810,9 @@ class XMacArabicIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {196: 128, 160: 129, 199: 130, 201: 131, 209: 132, 214: 133, 220: 134, 225: 135, 224: 136, 226: 137, 228: 138, 1722: 139, 171: 140, 231: 141, 233: 142, 232: 143, 234: 144, 235: 145, 237: 146, 8230: 147, 238: 148, 239: 149, 241: 150, 243: 151, 187: 152, 244: 153, 246: 154, 247: 155, 250: 156, 249: 157, 251: 158, 252: 159, 32: 160, 33: 161, 34: 162, 35: 163, 36: 164, 1642: 165, 38: 166, 39: 167, 40: 168, 41: 169, 42: 170, 43: 171, 1548: 172, 45: 173, 46: 174, 47: 175, 1632: 176, 1633: 177, 1634: 178, 1635: 179, 1636: 180, 1637: 181, 1638: 182, 1639: 183, 1640: 184, 1641: 185, 58: 186, 1563: 187, 60: 188, 61: 189, 62: 190, 1567: 191, 10058: 192, 1569: 193, 1570: 194, 1571: 195, 1572: 196, 1573: 197, 1574: 198, 1575: 199, 1576: 200, 1577: 201, 1578: 202, 1579: 203, 1580: 204, 1581: 205, 1582: 206, 1583: 207, 1584: 208, 1585: 209, 1586: 210, 1587: 211, 1588: 212, 1589: 213, 1590: 214, 1591: 215, 1592: 216, 1593: 217, 1594: 218, 91: 219, 92: 220, 93: 221, 94: 222, 95: 223, 1600: 224, 1601: 225, 1602: 226, 1603: 227, 1604: 228, 1605: 229, 1606: 230, 1607: 231, 1608: 232, 1609: 233, 1610: 234, 1611: 235, 1612: 236, 1613: 237, 1614: 238, 1615: 239, 1616: 240, 1617: 241, 1618: 242, 1662: 243, 1657: 244, 1670: 245, 1749: 246, 1700: 247, 1711: 248, 1672: 249, 1681: 250, 123: 251, 124: 252, 125: 253, 1688: 254, 1746: 255}
|
||||
|
||||
class XMacArabicIncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for the Macintosh Arabic encoding.
|
||||
"""
|
||||
name = 'x-mac-arabic'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -567,6 +823,9 @@ register_kuroko_codec(['mac-arabic', 'x-mac-arabic'], XMacArabicIncrementalEncod
|
||||
|
||||
|
||||
class XMacCeIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for the Macintosh Central European encoding.
|
||||
"""
|
||||
name = 'x-mac-ce'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -574,6 +833,9 @@ class XMacCeIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {196: 128, 256: 129, 257: 130, 201: 131, 260: 132, 214: 133, 220: 134, 225: 135, 261: 136, 268: 137, 228: 138, 269: 139, 262: 140, 263: 141, 233: 142, 377: 143, 378: 144, 270: 145, 237: 146, 271: 147, 274: 148, 275: 149, 278: 150, 243: 151, 279: 152, 244: 153, 246: 154, 245: 155, 250: 156, 282: 157, 283: 158, 252: 159, 8224: 160, 176: 161, 280: 162, 163: 163, 167: 164, 8226: 165, 182: 166, 223: 167, 174: 168, 169: 169, 8482: 170, 281: 171, 168: 172, 8800: 173, 291: 174, 302: 175, 303: 176, 298: 177, 8804: 178, 8805: 179, 299: 180, 310: 181, 8706: 182, 8721: 183, 322: 184, 315: 185, 316: 186, 317: 187, 318: 188, 313: 189, 314: 190, 325: 191, 326: 192, 323: 193, 172: 194, 8730: 195, 324: 196, 327: 197, 8710: 198, 171: 199, 187: 200, 8230: 201, 160: 202, 328: 203, 336: 204, 213: 205, 337: 206, 332: 207, 8211: 208, 8212: 209, 8220: 210, 8221: 211, 8216: 212, 8217: 213, 247: 214, 9674: 215, 333: 216, 340: 217, 341: 218, 344: 219, 8249: 220, 8250: 221, 345: 222, 342: 223, 343: 224, 352: 225, 8218: 226, 8222: 227, 353: 228, 346: 229, 347: 230, 193: 231, 356: 232, 357: 233, 205: 234, 381: 235, 382: 236, 362: 237, 211: 238, 212: 239, 363: 240, 366: 241, 218: 242, 367: 243, 368: 244, 369: 245, 370: 246, 371: 247, 221: 248, 253: 249, 311: 250, 379: 251, 321: 252, 380: 253, 290: 254, 711: 255}
|
||||
|
||||
class XMacCeIncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for the Macintosh Central European encoding.
|
||||
"""
|
||||
name = 'x-mac-ce'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -584,6 +846,13 @@ register_kuroko_codec(['mac-centeuro', 'x-mac-ce', 'mac-latin2', 'maccentraleuro
|
||||
|
||||
|
||||
class XMacCroatianIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for the Macintosh Croatian encoding.
|
||||
|
||||
In contrast to the Windows and ISO Central European encodings, the Macintosh Central European
|
||||
encoding did not include complete coverage for Gajica, hence a separate encoding was used.
|
||||
The two do not resemble one another except insofar as both derive from Macintosh Roman.
|
||||
"""
|
||||
name = 'x-mac-croatian'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -591,6 +860,13 @@ class XMacCroatianIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {196: 128, 197: 129, 199: 130, 201: 131, 209: 132, 214: 133, 220: 134, 225: 135, 224: 136, 226: 137, 228: 138, 227: 139, 229: 140, 231: 141, 233: 142, 232: 143, 234: 144, 235: 145, 237: 146, 236: 147, 238: 148, 239: 149, 241: 150, 243: 151, 242: 152, 244: 153, 246: 154, 245: 155, 250: 156, 249: 157, 251: 158, 252: 159, 8224: 160, 176: 161, 162: 162, 163: 163, 167: 164, 8226: 165, 182: 166, 223: 167, 174: 168, 352: 169, 8482: 170, 180: 171, 168: 172, 8800: 173, 381: 174, 216: 175, 8734: 176, 177: 177, 8804: 178, 8805: 179, 8710: 180, 181: 181, 8706: 182, 8721: 183, 8719: 184, 353: 185, 8747: 186, 170: 187, 186: 188, 937: 189, 382: 190, 248: 191, 191: 192, 161: 193, 172: 194, 8730: 195, 402: 196, 8776: 197, 262: 198, 171: 199, 268: 200, 8230: 201, 160: 202, 192: 203, 195: 204, 213: 205, 338: 206, 339: 207, 272: 208, 8212: 209, 8220: 210, 8221: 211, 8216: 212, 8217: 213, 247: 214, 9674: 215, 63743: 216, 169: 217, 8260: 218, 8364: 219, 8249: 220, 8250: 221, 198: 222, 187: 223, 8211: 224, 183: 225, 8218: 226, 8222: 227, 8240: 228, 194: 229, 263: 230, 193: 231, 269: 232, 200: 233, 205: 234, 206: 235, 207: 236, 204: 237, 211: 238, 212: 239, 273: 240, 210: 241, 218: 242, 219: 243, 217: 244, 305: 245, 710: 246, 732: 247, 175: 248, 960: 249, 203: 250, 730: 251, 184: 252, 202: 253, 230: 254, 711: 255}
|
||||
|
||||
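The remark that the Croatian and Central European Macintosh encodings overlap only where both follow Macintosh Roman can be illustrated on a small slice of the two mappings. This is a sketch in Python, using only the first eight entries (bytes 0x80 to 0x87) copied from the two encoder mappings above; `invert` and `compare` are hypothetical helpers, and the Macintosh Roman row comes from CPython's standard `mac_roman` codec:

```python
# First eight entries of each encoder mapping above (code point -> byte).
MAC_CE = {196: 128, 256: 129, 257: 130, 201: 131, 260: 132, 214: 133, 220: 134, 225: 135}
MAC_CROATIAN = {196: 128, 197: 129, 199: 130, 201: 131, 209: 132, 214: 133, 220: 134, 225: 135}


def invert(encode_map):
    """Turn a code point -> byte mapping into a byte -> code point mapping."""
    return {byte: codepoint for codepoint, byte in encode_map.items()}


def compare(first, second):
    """Split the bytes common to both mappings into agreeing and disagreeing sets."""
    a, b = invert(first), invert(second)
    shared = {byte: chr(a[byte]) for byte in sorted(a) if b.get(byte) == a[byte]}
    differ = {byte: (chr(a[byte]), chr(b[byte]))
              for byte in sorted(a) if byte in b and b[byte] != a[byte]}
    return shared, differ


shared, differ = compare(MAC_CE, MAC_CROATIAN)

# The same eight bytes in Macintosh Roman, for reference.
roman = bytes(range(0x80, 0x88)).decode("mac_roman")
croatian = "".join(chr(invert(MAC_CROATIAN)[byte]) for byte in range(0x80, 0x88))
```

In this slice the Croatian row is identical to Macintosh Roman, while the Central European row keeps only the letters it also needs and replaces the rest, which is the pattern the docstring describes.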
class XMacCroatianIncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for the Macintosh Croatian encoding.
|
||||
|
||||
In contrast to the Windows and ISO Central European encodings, the Macintosh Central European
|
||||
encoding did not include complete coverage for Gajica, hence a separate encoding was used.
|
||||
The two do not resemble one another except insofar as both derive from Macintosh Roman.
|
||||
"""
|
||||
name = 'x-mac-croatian'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -601,6 +877,9 @@ register_kuroko_codec(['mac-croatian', 'x-mac-croatian'], XMacCroatianIncrementa
|
||||
|
||||
|
||||
class XMacFarsiIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for the Macintosh Farsi encoding.
|
||||
"""
|
||||
name = 'x-mac-farsi'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -608,6 +887,9 @@ class XMacFarsiIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {196: 128, 160: 129, 199: 130, 201: 131, 209: 132, 214: 133, 220: 134, 225: 135, 224: 136, 226: 137, 228: 138, 1722: 139, 171: 140, 231: 141, 233: 142, 232: 143, 234: 144, 235: 145, 237: 146, 8230: 147, 238: 148, 239: 149, 241: 150, 243: 151, 187: 152, 244: 153, 246: 154, 247: 155, 250: 156, 249: 157, 251: 158, 252: 159, 32: 160, 33: 161, 34: 162, 35: 163, 36: 164, 1642: 165, 38: 166, 39: 167, 40: 168, 41: 169, 42: 170, 43: 171, 1548: 172, 45: 173, 46: 174, 47: 175, 1776: 176, 1777: 177, 1778: 178, 1779: 179, 1780: 180, 1781: 181, 1782: 182, 1783: 183, 1784: 184, 1785: 185, 58: 186, 1563: 187, 60: 188, 61: 189, 62: 190, 1567: 191, 10058: 192, 1569: 193, 1570: 194, 1571: 195, 1572: 196, 1573: 197, 1574: 198, 1575: 199, 1576: 200, 1577: 201, 1578: 202, 1579: 203, 1580: 204, 1581: 205, 1582: 206, 1583: 207, 1584: 208, 1585: 209, 1586: 210, 1587: 211, 1588: 212, 1589: 213, 1590: 214, 1591: 215, 1592: 216, 1593: 217, 1594: 218, 91: 219, 92: 220, 93: 221, 94: 222, 95: 223, 1600: 224, 1601: 225, 1602: 226, 1603: 227, 1604: 228, 1605: 229, 1606: 230, 1607: 231, 1608: 232, 1609: 233, 1610: 234, 1611: 235, 1612: 236, 1613: 237, 1614: 238, 1615: 239, 1616: 240, 1617: 241, 1618: 242, 1662: 243, 1657: 244, 1670: 245, 1749: 246, 1700: 247, 1711: 248, 1672: 249, 1681: 250, 123: 251, 124: 252, 125: 253, 1688: 254, 1746: 255}
|
||||
|
||||
class XMacFarsiIncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for the Macintosh Farsi encoding.
|
||||
"""
|
||||
name = 'x-mac-farsi'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -618,6 +900,9 @@ register_kuroko_codec(['mac-farsi', 'x-mac-farsi'], XMacFarsiIncrementalEncoder,
|
||||
|
||||
|
||||
class XMacGreekIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for the Macintosh Greek encoding.
|
||||
"""
|
||||
name = 'x-mac-greek'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -625,6 +910,9 @@ class XMacGreekIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {196: 128, 185: 129, 178: 130, 201: 131, 179: 132, 214: 133, 220: 134, 901: 135, 224: 136, 226: 137, 228: 138, 900: 139, 168: 140, 231: 141, 233: 142, 232: 143, 234: 144, 235: 145, 163: 146, 8482: 147, 238: 148, 239: 149, 8226: 150, 189: 151, 8240: 152, 244: 153, 246: 154, 166: 155, 8364: 156, 249: 157, 251: 158, 252: 159, 8224: 160, 915: 161, 916: 162, 920: 163, 923: 164, 926: 165, 928: 166, 223: 167, 174: 168, 169: 169, 931: 170, 938: 171, 167: 172, 8800: 173, 176: 174, 183: 175, 913: 176, 177: 177, 8804: 178, 8805: 179, 165: 180, 914: 181, 917: 182, 918: 183, 919: 184, 921: 185, 922: 186, 924: 187, 934: 188, 939: 189, 936: 190, 937: 191, 940: 192, 925: 193, 172: 194, 927: 195, 929: 196, 8776: 197, 932: 198, 171: 199, 187: 200, 8230: 201, 160: 202, 933: 203, 935: 204, 902: 205, 904: 206, 339: 207, 8211: 208, 8213: 209, 8220: 210, 8221: 211, 8216: 212, 8217: 213, 247: 214, 905: 215, 906: 216, 908: 217, 910: 218, 941: 219, 942: 220, 943: 221, 972: 222, 911: 223, 973: 224, 945: 225, 946: 226, 968: 227, 948: 228, 949: 229, 966: 230, 947: 231, 951: 232, 953: 233, 958: 234, 954: 235, 955: 236, 956: 237, 957: 238, 959: 239, 960: 240, 974: 241, 961: 242, 963: 243, 964: 244, 952: 245, 969: 246, 962: 247, 967: 248, 965: 249, 950: 250, 970: 251, 971: 252, 912: 253, 944: 254, 173: 255}
|
||||
|
||||
class XMacGreekIncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for the Macintosh Greek encoding.
|
||||
"""
|
||||
name = 'x-mac-greek'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -635,6 +923,9 @@ register_kuroko_codec(['mac-greek', 'macgreek', 'x-mac-greek'], XMacGreekIncreme
|
||||
|
||||
|
||||
class XMacIcelandicIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for the Macintosh Icelandic encoding.
|
||||
"""
|
||||
name = 'x-mac-icelandic'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -642,6 +933,9 @@ class XMacIcelandicIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {196: 128, 197: 129, 199: 130, 201: 131, 209: 132, 214: 133, 220: 134, 225: 135, 224: 136, 226: 137, 228: 138, 227: 139, 229: 140, 231: 141, 233: 142, 232: 143, 234: 144, 235: 145, 237: 146, 236: 147, 238: 148, 239: 149, 241: 150, 243: 151, 242: 152, 244: 153, 246: 154, 245: 155, 250: 156, 249: 157, 251: 158, 252: 159, 221: 160, 176: 161, 162: 162, 163: 163, 167: 164, 8226: 165, 182: 166, 223: 167, 174: 168, 169: 169, 8482: 170, 180: 171, 168: 172, 8800: 173, 198: 174, 216: 175, 8734: 176, 177: 177, 8804: 178, 8805: 179, 165: 180, 181: 181, 8706: 182, 8721: 183, 8719: 184, 960: 185, 8747: 186, 170: 187, 186: 188, 937: 189, 230: 190, 248: 191, 191: 192, 161: 193, 172: 194, 8730: 195, 402: 196, 8776: 197, 8710: 198, 171: 199, 187: 200, 8230: 201, 160: 202, 192: 203, 195: 204, 213: 205, 338: 206, 339: 207, 8211: 208, 8212: 209, 8220: 210, 8221: 211, 8216: 212, 8217: 213, 247: 214, 9674: 215, 255: 216, 376: 217, 8260: 218, 8364: 219, 208: 220, 240: 221, 222: 222, 254: 223, 253: 224, 183: 225, 8218: 226, 8222: 227, 8240: 228, 194: 229, 202: 230, 193: 231, 203: 232, 200: 233, 205: 234, 206: 235, 207: 236, 204: 237, 211: 238, 212: 239, 63743: 240, 210: 241, 218: 242, 219: 243, 217: 244, 305: 245, 710: 246, 732: 247, 175: 248, 728: 249, 729: 250, 730: 251, 184: 252, 733: 253, 731: 254, 711: 255}
|
||||
|
||||
class XMacIcelandicIncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for the Macintosh Icelandic encoding.
|
||||
"""
|
||||
name = 'x-mac-icelandic'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -652,6 +946,9 @@ register_kuroko_codec(['mac-iceland', 'maciceland', 'x-mac-icelandic'], XMacIcel
|
||||
|
||||
|
||||
class XMacRomanianIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for the Macintosh Romanian encoding.
|
||||
"""
|
||||
name = 'x-mac-romanian'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -659,6 +956,9 @@ class XMacRomanianIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {196: 128, 197: 129, 199: 130, 201: 131, 209: 132, 214: 133, 220: 134, 225: 135, 224: 136, 226: 137, 228: 138, 227: 139, 229: 140, 231: 141, 233: 142, 232: 143, 234: 144, 235: 145, 237: 146, 236: 147, 238: 148, 239: 149, 241: 150, 243: 151, 242: 152, 244: 153, 246: 154, 245: 155, 250: 156, 249: 157, 251: 158, 252: 159, 8224: 160, 176: 161, 162: 162, 163: 163, 167: 164, 8226: 165, 182: 166, 223: 167, 174: 168, 169: 169, 8482: 170, 180: 171, 168: 172, 8800: 173, 258: 174, 536: 175, 8734: 176, 177: 177, 8804: 178, 8805: 179, 165: 180, 181: 181, 8706: 182, 8721: 183, 8719: 184, 960: 185, 8747: 186, 170: 187, 186: 188, 937: 189, 259: 190, 537: 191, 191: 192, 161: 193, 172: 194, 8730: 195, 402: 196, 8776: 197, 8710: 198, 171: 199, 187: 200, 8230: 201, 160: 202, 192: 203, 195: 204, 213: 205, 338: 206, 339: 207, 8211: 208, 8212: 209, 8220: 210, 8221: 211, 8216: 212, 8217: 213, 247: 214, 9674: 215, 255: 216, 376: 217, 8260: 218, 8364: 219, 8249: 220, 8250: 221, 538: 222, 539: 223, 8225: 224, 183: 225, 8218: 226, 8222: 227, 8240: 228, 194: 229, 202: 230, 193: 231, 203: 232, 200: 233, 205: 234, 206: 235, 207: 236, 204: 237, 211: 238, 212: 239, 63743: 240, 210: 241, 218: 242, 219: 243, 217: 244, 305: 245, 710: 246, 732: 247, 175: 248, 728: 249, 729: 250, 730: 251, 184: 252, 733: 253, 731: 254, 711: 255}
|
||||
|
||||
class XMacRomanianIncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for the Macintosh Romanian encoding.
|
||||
"""
|
||||
name = 'x-mac-romanian'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -669,6 +969,9 @@ register_kuroko_codec(['mac-romanian', 'x-mac-romanian'], XMacRomanianIncrementa
|
||||
|
||||
|
||||
class XMacTurkishIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""
|
||||
IncrementalEncoder implementation for the Macintosh Turkish encoding.
|
||||
"""
|
||||
name = 'x-mac-turkish'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -676,6 +979,9 @@ class XMacTurkishIncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {196: 128, 197: 129, 199: 130, 201: 131, 209: 132, 214: 133, 220: 134, 225: 135, 224: 136, 226: 137, 228: 138, 227: 139, 229: 140, 231: 141, 233: 142, 232: 143, 234: 144, 235: 145, 237: 146, 236: 147, 238: 148, 239: 149, 241: 150, 243: 151, 242: 152, 244: 153, 246: 154, 245: 155, 250: 156, 249: 157, 251: 158, 252: 159, 8224: 160, 176: 161, 162: 162, 163: 163, 167: 164, 8226: 165, 182: 166, 223: 167, 174: 168, 169: 169, 8482: 170, 180: 171, 168: 172, 8800: 173, 198: 174, 216: 175, 8734: 176, 177: 177, 8804: 178, 8805: 179, 165: 180, 181: 181, 8706: 182, 8721: 183, 8719: 184, 960: 185, 8747: 186, 170: 187, 186: 188, 937: 189, 230: 190, 248: 191, 191: 192, 161: 193, 172: 194, 8730: 195, 402: 196, 8776: 197, 8710: 198, 171: 199, 187: 200, 8230: 201, 160: 202, 192: 203, 195: 204, 213: 205, 338: 206, 339: 207, 8211: 208, 8212: 209, 8220: 210, 8221: 211, 8216: 212, 8217: 213, 247: 214, 9674: 215, 255: 216, 376: 217, 286: 218, 287: 219, 304: 220, 305: 221, 350: 222, 351: 223, 8225: 224, 183: 225, 8218: 226, 8222: 227, 8240: 228, 194: 229, 202: 230, 193: 231, 203: 232, 200: 233, 205: 234, 206: 235, 207: 236, 204: 237, 211: 238, 212: 239, 63743: 240, 210: 241, 218: 242, 219: 243, 217: 244, 63648: 245, 710: 246, 732: 247, 175: 248, 728: 249, 729: 250, 730: 251, 184: 252, 733: 253, 731: 254, 711: 255}
|
||||
|
||||
class XMacTurkishIncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""
|
||||
IncrementalDecoder implementation for the Macintosh Turkish encoding.
|
||||
"""
|
||||
name = 'x-mac-turkish'
|
||||
html5name = None
|
||||
@lazy_property
|
||||
|
@ -14,7 +14,11 @@ with fileio.open('tools/codectools/encodings.json') as f:
|
||||
for enc in i['encodings']:
|
||||
aliases[enc['name'].lower()] = enc['labels']
|
||||
|
||||
let boilerplate = '''# Generated by tools/codectools/gen_dbdata.krk from WHATWG encodings.json and indexes.json
|
||||
let boilerplate = '''"""
|
||||
Defines WHATWG-specified double-byte encodings which do not require dedicated implementations, and
|
||||
supplies data used by those (in `codecs.bespokecodecs`) which do.
|
||||
"""
|
||||
# Generated by tools/codectools/gen_dbdata.krk from WHATWG encodings.json and indexes.json
|
||||
|
||||
from collections import xraydict
|
||||
from codecs.infrastructure import AsciiIncrementalEncoder, AsciiIncrementalDecoder, register_kuroko_codec, encodesto7bit, decodesto7bit, lazy_property
|
||||
@ -22,6 +26,7 @@ from codecs.infrastructure import AsciiIncrementalEncoder, AsciiIncrementalDecod
|
||||
|
||||
let template = '''
|
||||
class {idname}IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""IncrementalEncoder implementation for {description}"""
|
||||
name = {mainlabel}
|
||||
html5name = {weblabel}
|
||||
@lazy_property
|
||||
@ -29,6 +34,7 @@ class {idname}IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {encode}
|
||||
|
||||
class {idname}IncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""IncrementalDecoder implementation for {description}"""
|
||||
name = {mainlabel}
|
||||
html5name = {weblabel}
|
||||
@lazy_property
|
||||
@ -43,6 +49,7 @@ register_kuroko_codec({labels}, {idname}IncrementalEncoder, {idname}IncrementalD
|
||||
|
||||
let template_big5 = '''
|
||||
class {idnameenc}IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""IncrementalEncoder implementation for {description}"""
|
||||
name = {mainlabelenc}
|
||||
html5name = {weblabel}
|
||||
@lazy_property
|
||||
@ -50,6 +57,7 @@ class {idnameenc}IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {encode}
|
||||
|
||||
class {idnameenc2}IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
"""IncrementalEncoder implementation for {description2}"""
|
||||
name = {mainlabelenc2}
|
||||
html5name = None
|
||||
@lazy_property
|
||||
@ -57,6 +65,7 @@ class {idnameenc2}IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return xraydict({idnameenc}IncrementalEncoder("strict").encoding_map, {encode2})
|
||||
|
||||
class {idnamedec}IncrementalDecoder(AsciiIncrementalDecoder):
|
||||
"""IncrementalDecoder implementation for {descriptiondec}"""
|
||||
name = {mainlabeldec}
|
||||
html5name = {weblabel}
|
||||
@lazy_property
|
||||
@ -226,6 +235,7 @@ with fileio.open('modules/codecs/dbdata.krk', 'w') as f:
|
||||
mainlabel=repr('windows-31j'),
|
||||
weblabel=repr('shift_jis'),
|
||||
labels=repr(aliases['shift_jis'] + ["cp932", "932", "mskanji", "shiftjis", "s_jis"]),
|
||||
description="Windows-31J (Shift_JIS as implemented by Microsoft).",
|
||||
encode=smartrepr(encode_shiftjis), decode=smartrepr(decode_shiftjis), idname='Windows31J',
|
||||
dbrange=repr(dbrange_shiftjis), tbrange=repr(tbrange_shiftjis),
|
||||
trailrange=repr(trailrange_shiftjis)))
|
||||
@ -233,6 +243,7 @@ with fileio.open('modules/codecs/dbdata.krk', 'w') as f:
|
||||
mainlabel=repr("x-euc-jp"),
|
||||
weblabel=repr("euc-jp"),
|
||||
labels=repr(aliases["euc-jp"] + ["eucjp", "ujis", "u_jis"]),
|
||||
description="EUC-JP (web version).",
|
||||
encode=smartrepr(encode_eucjp), decode=smartrepr(decode_eucjp), idname="XEucJp",
|
||||
dbrange=repr(dbrange_eucjp), tbrange=repr(tbrange_eucjp),
|
||||
trailrange=repr(trailrange_eucjp)))
|
||||
@ -241,6 +252,7 @@ with fileio.open('modules/codecs/dbdata.krk', 'w') as f:
|
||||
weblabel=repr("euc-kr"),
|
||||
labels=repr(aliases["euc-kr"] + ["cp949", "949", "ms949", "uhc", "euckr",
|
||||
"ks_c_5601", "ksx1001", "ks_x_1001"]),
|
||||
description="Unified Hangul Code (extended EUC-KR Wansung, Microsoft's KS C 5601 encoding).",
|
||||
encode=smartrepr(encode_uhc), decode=smartrepr(decode_uhc), idname="Windows949",
|
||||
dbrange=repr(dbrange_uhc), tbrange=repr(tbrange_uhc),
|
||||
trailrange=repr(trailrange_uhc)))
|
||||
@ -249,6 +261,9 @@ with fileio.open('modules/codecs/dbdata.krk', 'w') as f:
|
||||
mainlabelenc2=repr("big5-hkscs"),
|
||||
mainlabeldec=repr("big5-hkscs"),
|
||||
weblabel=repr("big5"),
|
||||
description="Big-5 (ETen version).",
|
||||
description2="Big-5 (HKSCS version).",
|
||||
descriptiondec="Big-5 (HKSCS version).",
|
||||
labels=repr(["big5", "cn-big5", "csbig5", "x-x-big5", "big5-eten", "cp950", "950", "ms950"]),
|
||||
labels2=repr(["big5-hkscs", "big5hkscs", "hkscs"]),
|
||||
encode=smartrepr(encode_big5eten), idnameenc="Big5Eten",
|
||||
|
@ -17,6 +17,9 @@ def build_sbmap(name):
|
||||
|
||||
let template = """
|
||||
class {idname}IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
'''
|
||||
IncrementalEncoder implementation for {description}
|
||||
'''
|
||||
name = {mainlabel}
|
||||
html5name = {weblabel}
|
||||
@lazy_property
|
||||
@ -24,6 +27,9 @@ class {idname}IncrementalEncoder(AsciiIncrementalEncoder):
|
||||
return {encode}
|
||||
|
||||
class {idname}IncrementalDecoder(AsciiIncrementalDecoder):
|
||||
'''
|
||||
IncrementalDecoder implementation for {description}
|
||||
'''
|
||||
name = {mainlabel}
|
||||
html5name = {weblabel}
|
||||
@lazy_property
|
||||
@ -36,7 +42,10 @@ register_kuroko_codec(
|
||||
{idname}IncrementalDecoder)
|
||||
"""
|
||||
|
||||
let boilerplate = """# Generated by tools/codectools/gen_sbencs.krk from WHATWG encodings.json and indexes.json
|
||||
let boilerplate = """'''
|
||||
Defines WHATWG-specified single-byte encodings.
|
||||
'''
|
||||
# Generated by tools/codectools/gen_sbencs.krk from WHATWG encodings.json and indexes.json
|
||||
|
||||
from codecs.infrastructure import AsciiIncrementalEncoder, AsciiIncrementalDecoder, register_kuroko_codec, lazy_property
|
||||
"""
|
||||
@ -73,7 +82,7 @@ let parity_labels = {
|
||||
# do). Also, it aliases "iso-ir-166" to the former (since it cites TIS-620) despite it having
|
||||
# an NBSP in the registration document (case in point).
|
||||
"windows-874": ["iso-8859-11-2001", "tis620", "tis-620-0", "tis-620-2529-0",
|
||||
"tis-620-2529-1", "iso-ir-166", "thai"],
|
||||
"tis-620-2529-1", "iso-ir-166", "thai", "cp874"],
|
||||
"iso-8859-13": ["l7", "latin7"],
|
||||
"iso-8859-14": ["iso-8859-14-1998", "l8", "latin8", "iso-ir-199", "iso_celtic"],
|
||||
"iso-8859-15": ["latin9"],
|
||||
@ -82,6 +91,62 @@ let parity_labels = {
|
||||
"x-mac-cyrillic": ["mac-cyrillic", "maccyrillic"],
|
||||
}
|
||||
|
||||
let descriptions = {
|
||||
"windows-1250": "Windows-1250 (Central Europe)",
|
||||
"windows-1251": "Windows-1251 (Cyrillic)",
|
||||
"windows-1252": "Windows-1252 (Western Europe), ISO-8859-1 modification/extension",
|
||||
"windows-1253": "Windows-1253 (Greek)",
|
||||
"windows-1254": "Windows-1254 (Turkish), ISO-8859-9 modification/extension",
|
||||
"windows-1255": "Windows-1255 (Logical order Hebrew with vowel points)",
|
||||
"windows-1256": "Windows-1256 (Arabic)",
|
||||
"windows-1257": "Windows-1257 (Baltic Rim)",
|
||||
"windows-1258": """Windows-1258 (Vietnam), basic implementation
|
||||
|
||||
Note that Windows-1258 includes a mixture of composed forms and combining characters.
|
||||
Some grapheme clusters must be represented as a composed form followed by a combining
|
||||
character, even though Unicode has a fully composed form (taken from other encodings
|
||||
such as VISCII): Windows-1258 itself does not include that fully composed form, and it
|
||||
includes a combining form for only one of the diacritics.
|
||||
|
||||
The encoder is a simple mapping which will accept text in the form generated by the decoder
|
||||
but, due to the above, some grapheme clusters will not be accepted in either NFC or NFD
|
||||
normalised form. The decoder does not convert its output to any normalised form. This follows
|
||||
both Python and WHATWG behaviour. Conversion of text between encodable form and either
|
||||
normalised form may need to be handled in a separate step by any code using this codec.""",
|
||||
"ibm866": """OEM-866 (Russian Cyrillic).
|
||||
|
||||
Note: OEM-866 competed with OEM-855 for Cyrillic; OEM-866 preserved all box drawing characters
|
||||
(rather than only a subset) and was more popular for Russian, but did not provide coverage
|
||||
for all of the different South Slavic Cyrillic orthographies, unlike OEM-855. Their layouts
|
||||
for Cyrillic are entirely different.""",
|
||||
"iso-8859-2": "ISO/IEC 8859-2 (Central European)",
|
||||
"iso-8859-3": "ISO/IEC 8859-3 (Maltese and Esperanto)",
|
||||
"iso-8859-4": "ISO/IEC 8859-4 (North European)",
|
||||
"iso-8859-5": "ISO/IEC 8859-5 (Cyrillic)",
|
||||
"iso-8859-6": "ISO/IEC 8859-6 (Arabic ASMO 708)",
|
||||
"iso-8859-7": "ISO/IEC 8859-7 (Greek ELOT 928)",
|
||||
"iso-8859-8": "ISO/IEC 8859-8 (Hebrew)",
|
||||
"iso-8859-8-i": "ISO/IEC 8859-8 (Hebrew)", # Artifact: they do the same thing inside codecs.
|
||||
"iso-8859-10": "ISO/IEC 8859-10 (Nordic)",
|
||||
"windows-874": "Windows-874 (Thai), TIS-620 / ISO-8859-11 modification/extension",
|
||||
"iso-8859-13": "ISO/IEC 8859-13 (Baltic Rim)",
|
||||
"iso-8859-14": "ISO/IEC 8859-14 (Celtic)",
|
||||
"iso-8859-15": "ISO/IEC 8859-15 (New Western European)",
|
||||
"iso-8859-16": "ISO/IEC 8859-16 (South-Eastern European; Romanian SR 14111)",
|
||||
"koi8-r": "the KOI8-R (KOI-8 Cyrillic for Russian) encoding.",
|
||||
"koi8-ru": "the KOI8-RU (KOI-8 Cyrillic for Belarusian, Ukrainian and Ruthenian) encoding.",
|
||||
"macintosh": "the Macintosh Roman encoding.",
|
||||
"x-mac-cyrillic": "the Macintosh Cyrillic encoding.",
|
||||
"x-user-defined": """the user-defined extended ASCII encoding.
|
||||
|
||||
This maps ASCII bytes as ASCII characters, and non-ASCII bytes to the private use
|
||||
range U+F780–F7FF, such that the low 8 bits always match the original byte.
|
||||
|
||||
This is sometimes useful for round-tripping arbitrary _sensu stricto_ extended
|
||||
ASCII data without caring about the non-ASCII part. Note however, that _sensu lato_
|
||||
extended ASCII may for example use ASCII bytes as trail bytes in a multi-byte code.""",
|
||||
}
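The Windows-1258 note in the descriptions above can be illustrated with a small sketch. It is written in Python rather than Kuroko and uses CPython's own `cp1258` codec as a stand-in, on the assumption (stated in the docstring itself) that this codec follows the same simple-mapping behaviour. The helper name `can_encode` is hypothetical.

```python
import unicodedata

# "ế" in the form a Windows-1258 decoder produces:
# composed ê (U+00EA) followed by a combining acute accent (U+0301).
encodable = "\u00ea\u0301"

# The two standard normalised forms of the same grapheme cluster.
nfc = unicodedata.normalize("NFC", encodable)  # single code point U+1EBF
nfd = unicodedata.normalize("NFD", encodable)  # e + U+0302 + U+0301


def can_encode(text, encoding="cp1258"):
    """Return True if `text` encodes without error (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for label, form in (("decoder form", encodable), ("NFC", nfc), ("NFD", nfd)):
    print(label, [hex(ord(c)) for c in form], can_encode(form))
```

Only the mixed form is accepted, which is why a caller may need a separate conversion step between normalised text and the encodable form.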
|
||||
|
||||
let encode_xudef = {}
|
||||
let decode_xudef = {}
|
||||
for i in range(128):
|
||||
@ -110,7 +175,7 @@ with fileio.open("modules/codecs/sbencs.krk", "w") as outf:
|
||||
let decoding_map = built[1]
|
||||
let idname = name.title().replace("-", "")
|
||||
outf.write(template.format(mainlabel=repr(name), encode=repr(encoding_map),
|
||||
weblabel=repr(whatwgname),
|
||||
weblabel=repr(whatwgname), description=descriptions.get(name, "TODO"),
|
||||
decode=repr(decoding_map), labels=repr(labels), idname=idname))
|
||||
else:
|
||||
for enc in i["encodings"]:
|
||||
@ -119,18 +184,24 @@ with fileio.open("modules/codecs/sbencs.krk", "w") as outf:
|
||||
else:
|
||||
mapped_to_replacement.extend(enc["labels"])
|
||||
outf.write(template.format(mainlabel=repr("x-user-defined"), encode=repr(encode_xudef),
|
||||
weblabel=repr("x-user-defined"),
|
||||
weblabel=repr("x-user-defined"), description=descriptions.get("x-user-defined", "TODO"),
|
||||
decode=repr(decode_xudef), labels=repr(["x-user-defined"]), idname="XUserDefined"))
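The x-user-defined mapping described in the docstring (ASCII unchanged, bytes 0x80 to 0xFF mapped to U+F780 to U+F7FF so that the low 8 bits match the byte) can be sketched as follows. This is an illustrative Python version, not the generated Kuroko codec; the function names are hypothetical.

```python
def decode_x_user_defined(data: bytes) -> str:
    """Map ASCII bytes to themselves and 0x80-0xFF to U+F780-U+F7FF."""
    return "".join(
        chr(b) if b < 0x80 else chr(0xF780 + (b - 0x80)) for b in data
    )


def encode_x_user_defined(text: str) -> bytes:
    """Inverse mapping; rejects anything outside ASCII and U+F780-U+F7FF."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif 0xF780 <= cp <= 0xF7FF:
            # The low 8 bits of the code point are the original byte.
            out.append(cp & 0xFF)
        else:
            raise ValueError(f"U+{cp:04X} is not encodable in x-user-defined")
    return bytes(out)


every_byte = bytes(range(256))
round_tripped = encode_x_user_defined(decode_x_user_defined(every_byte))
print(round_tripped == every_byte)
```

Because every byte value has exactly one image, arbitrary byte strings survive a decode and re-encode unchanged.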
|
||||
|
||||
with fileio.open("modules/codecs/isweblabel.krk", "w") as outf:
|
||||
outf.write(f"""
|
||||
outf.write(f"""'''
|
||||
Allows checking the WHATWG status of a given label (listed, not listed, or mapped to undefined).
|
||||
'''
|
||||
# Generated by tools/codectools/gen_sbencs.krk from WHATWG encodings.json
|
||||
let weblabels = {all_weblabels!r}
|
||||
let mapped_to_replacement = {mapped_to_replacement!r}
|
||||
|
||||
def map_weblabel(label):
|
||||
'''
|
||||
If `label` is a regular WHATWG label, returns it; if it is a label mapped to Replacement,
|
||||
returns `"undefined"`; otherwise, returns `None`.
|
||||
'''
|
||||
if label in mapped_to_replacement:
|
||||
# WHATWG aliases these following to replacement to prevent their use in injection/XSS attacks.
|
||||
# WHATWG aliases these to replacement to prevent their use in injection/XSS attacks.
|
||||
return "undefined"
|
||||
else if label in weblabels:
|
||||
return label
|
||||
|
@ -81,14 +81,16 @@ let modules = [
|
||||
'tools.gendoc',
|
||||
|
||||
# Codecs module
|
||||
'codecs',
|
||||
'codecs.bespokecodecs',
|
||||
'codecs.binascii',
|
||||
'codecs.dbdata',
|
||||
'codecs.dbextra',
|
||||
'codecs.dbextra_data_7bit',
|
||||
'codecs.dbextra_data_8bit',
|
||||
'codecs.dbextra',
|
||||
'codecs.infrastructure',
|
||||
'codecs.isweblabel',
|
||||
'codecs.pifonts',
|
||||
'codecs.sbencs',
|
||||
'codecs.sbextra',
|
||||
]
|
||||
@ -156,7 +158,7 @@ def functionDoc(func):
|
||||
let doc = func.__doc__ if ('__doc__' in dir(func) and func.__doc__) else ''
|
||||
if '@arguments ' in doc:
|
||||
doc = '\n'.join([x for x in doc.split('\n') if '@arguments' not in x])
|
||||
return doc
|
||||
return "<p>" + doc + "</p>"
|
||||
|
||||
def processModules(modules):
|
||||
|
||||
@ -176,6 +178,13 @@ def processModules(modules):
|
||||
output.write('\n')
|
||||
|
||||
print('## ' + fixup(modulepath) + ' {#mod_' + modulepath.replace('.','_') + '}')
|
||||
let rsplit = lambda s,d,l: reversed("".join(reversed(i)) for i in "".join(reversed(s)).split(d, l))
|
||||
if "." in modulepath:
|
||||
let parent = rsplit(modulepath, ".", 1)[0]
|
||||
if parent in modules:
|
||||
let parentpath = fixup(parent).replace('.','_')
|
||||
print(f"\n<a href='mod_{parentpath}.html'>← {parent}</a>\n")
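The `rsplit` lambda above finds the parent package by reversing the string, splitting from the left, and reversing the pieces back. A Python sketch of the same idea follows, for comparison with the built-in `str.rsplit`; the names `rsplit_via_reverse` and `parent_of` are hypothetical, and unlike the lambda this version also reverses the delimiter so that multi-character delimiters behave correctly.

```python
def rsplit_via_reverse(s, sep, limit):
    """Split `s` from the right at most `limit` times, using only a left split."""
    # Reverse the string (and delimiter), split from the left, then
    # un-reverse each piece and restore the original left-to-right order.
    pieces = s[::-1].split(sep[::-1], limit)
    return [piece[::-1] for piece in reversed(pieces)]


def parent_of(modulepath):
    """Return the parent package path, or None for a top-level module."""
    if "." not in modulepath:
        return None
    return rsplit_via_reverse(modulepath, ".", 1)[0]


print(rsplit_via_reverse("codecs.dbextra_data_7bit", ".", 1))
print(parent_of("codecs.sbencs"), parent_of("codecs"))
```

In Python itself, `modulepath.rsplit(".", 1)[0]` gives the same result directly.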
|
||||
|
||||
if '__doc__' in dir(module) and module.__doc__:
|
||||
print(module.__doc__.strip())
|
||||
docString[modulepath] = truncateString(module.__doc__)
|
||||
@ -279,6 +288,17 @@ def processModules(modules):
|
||||
else:
|
||||
other.append(Pair(member,obj))
|
||||
|
||||
if hasattr(module, "__ispackage__") and module.__ispackage__:
|
||||
print("\n### Package contents\n")
|
||||
print('\htmlonly<div class="krk-class-index"><ul>\n')
|
||||
for i in modules:
|
||||
if not i.startswith(name + "."):
|
||||
continue
|
||||
let uscored = fixup(i).replace('.','_')
|
||||
let relative = i[len(name) + 1:]
|
||||
print(f'<li><a class="el" href="mod_{uscored}.html">{relative}</a></li>\n')
|
||||
print('</ul></div>\endhtmlonly\n')
|
||||
|
||||
if classes:
|
||||
print('\n### Classes\n')
|
||||
classes.sort()
|
||||
|