We used to convert the unicode object directly to a string in the server
encoding by calling Python's PyUnicode_AsEncodedString function. In other
words, we used Python's routines to do the encoding. However, that has a
few problems. First of all, it required keeping a mapping table of Python
encoding names and PostgreSQL encodings. But the real killer was that Python
doesn't support EUC_TW and MULE_INTERNAL encodings at all.
Instead, convert the Python unicode object to UTF-8, and use PostgreSQL's
encoding conversion functions to convert from UTF-8 to server encoding. We
were already doing the same in the other direction in PLyUnicode_FromString,
so this is more consistent, too.
Note: This makes SQL_ASCII to behave more leniently. We used to map
SQL_ASCII to Python's 'ascii', which on Python means strict 7-bit ASCII
only, so you got an error if the python string contained anything but pure
ASCII. You no longer get an error; you get the UTF-8 representation of the
string instead.
Backpatch to 9.0, where these conversions were introduced.
Jan Urbański
That caused the plpython_unicode regression test to fail on SQL_ASCII
encoding, as evidenced by the buildfarm. The reason is that with the patch,
you don't get the detail in the error message that you got before. That
detail is actually very informative, so rather than just adjust the expected
output, let's revert that part of the patch for now to make the buildfarm
green again, and figure out some other way to avoid the recursion of
PLy_elog() that doesn't lose the detail.
Windows encodings, "win1252" and so forth, are named differently in Python,
like "cp1252". Also, if the PyUnicode_AsEncodedString() function call fails
for some reason, use a plain ereport(), not a PLy_elog(), to report that
error. That avoids recursion and crash, if PLy_elog() tries to call
PLyUnicode_Bytes() again.
This fixes bug reported by Asif Naeem. Backpatch down to 9.0, before that
plpython didn't even try these conversions.
Jan Urbański, with minor comment improvements by me.
This moves the code around from one huge file into hopefully logical
and more manageable modules. For the most part, the code itself was
not touched, except: PLy_function_handler and PLy_trigger_handler were
renamed to PLy_exec_function and PLy_exec_trigger, because they were
not actually handlers in the PL handler sense, and it makes the naming
more similar to the way PL/pgSQL is organized. The initialization of
the procedure caches was separated into a new function
init_procedure_caches to keep the hash tables private to
plpy_procedures.c.
Jan Urbański and Peter Eisentraut