78 Commits

Author SHA1 Message Date
Magnus Hagander
5841aa86eb Explicitly bind gettext to the correct encoding on Windows.
Original patch from Hiroshi Inoue.
2009-01-22 10:09:48 +00:00
Magnus Hagander
cfb9c7f8b5 Use the new text domain names ("postgres-8.4" instead of "postgres")
Hiroshi Inoue
2009-01-19 15:34:23 +00:00
Tom Lane
1efd5ff89b Add a pg_encoding_mbcliplen() function that is just like pg_mbcliplen()
except the caller can specify the encoding to work in; this will be needed
for pg_stat_statements.  In passing, do some marginal efficiency hacking
and clean up some comments.  Also, prevent the single-byte-encoding code
path from fetching one byte past the stated length of the string (this
last is a bug that might need to be back-patched at some point).
2009-01-04 18:37:36 +00:00
Tom Lane
3be2448525 Add an explicit caution about how to use pg_do_encoding_conversion with
non-null-terminated input.  Per discussion with ITAGAKI Takahiro.
2008-11-11 03:01:20 +00:00
Tom Lane
2b74d45c1b pg_do_encoding_conversion cannot return NULL (at least not unless the input
is NULL), so remove some useless tests for the case.
2008-11-10 15:18:40 +00:00
Tom Lane
d1da215d32 Fix compiler warning introduced by recent patch. Tsk tsk. 2008-06-18 23:08:47 +00:00
Bruce Momjian
9de09c087d Move wchar2char() and char2wchar() from tsearch into /mb to be easier to
use for other modules;  also move pnstrdup().

Clean up code slightly.
2008-06-18 18:42:54 +00:00
Magnus Hagander
ea7f9648fe Explicitly bind gettext() to the UTF8 locale when in use.
This is required on Windows due to the special locale
handling for UTF8 that doesn't change the full environment.

Fixes crash with translated error messages per bugs 4180
and 4196.

Tom Lane
2008-05-27 12:24:42 +00:00
Tom Lane
ba1c463096 Clean up a few places where Datums were being treated as pointers without
going through DatumGetPointer or some other "official" conversion macro.
Not actually a bug, since Datum the same size as pointer is the only
supported case at the moment, but good cleanup for the future.

Gavin Sherry
2008-04-12 23:21:04 +00:00
Tom Lane
a9742f123c Remove incorrect (and ill-advised anyway) pfree's in pg_convert_from and
pg_convert_to.  Per bug #3866 from Andrew Gilligan.
2008-01-09 23:43:54 +00:00
Bruce Momjian
fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
Tom Lane
8468146b03 Fix the inadvertent libpq ABI breakage discovered by Martin Pitt: the
renumbering of encoding IDs done between 8.2 and 8.3 turns out to break 8.2
initdb and psql if they are run with an 8.3beta1 libpq.so.  For the moment
we can rearrange the order of enum pg_enc to keep the same number for
everything except PG_JOHAB, which isn't a problem since there are no direct
references to it in the 8.2 programs anyway.  (This does force initdb
unfortunately.)

Going forward, we want to fix things so that encoding IDs can be changed
without an ABI break, and this commit includes the changes needed to allow
libpq's encoding IDs to be treated as fully independent of the backend's.
The main issue is that libpq clients should not include pg_wchar.h or
otherwise assume they know the specific values of libpq's encoding IDs,
since they might encounter version skew between pg_wchar.h and the libpq.so
they are using.  To fix, have libpq officially export functions needed for
encoding name<=>ID conversion and validity checking; it was doing this
anyway unofficially.

It's still the case that we can't renumber backend encoding IDs until the
next bump in libpq's major version number, since doing so will break the
8.2-era client programs.  However the code is now prepared to avoid this
type of problem in future.

Note that initdb is no longer a libpq client: we just pull in the two
source files we need directly.  The patch also fixes a few places that
were being sloppy about checking for an unrecognized encoding name.
2007-10-13 20:18:42 +00:00
Andrew Dunstan
a1b14ae1dd Add comments re text <-> bytea internal equivalence in convert routines. 2007-09-24 16:38:24 +00:00
Andrew Dunstan
82467e4e70 Use correct PG_GETARG macro in pg_convert 2007-09-24 14:59:37 +00:00
Andrew Dunstan
55613bf9cd Close previously open holes for invalidly encoded data to enter the
database via builtin functions, as recently discussed on -hackers.

chr() now returns a character in the database encoding. For UTF8 encoded databases
the argument is treated as a Unicode code point. For other multi-byte encodings
the argument must designate a strict ascii character, or an error is raised,
as is also the case if the argument is 0.

ascii() is adjusted so that it remains the inverse of chr().

The two argument form of convert() is gone, and the three argument form now
takes a bytea first argument and returns a bytea. To cover this loss three new
functions are introduced:
. convert_from(bytea, name) returns text - converts the first argument from the
  named encoding to the database encoding
. convert_to(text, name) returns bytea - converts the first argument from the
  database encoding to the named encoding
. length(bytea, name) returns int - gives the length of the first argument in
  characters in the named encoding
2007-09-18 17:41:17 +00:00
Tom Lane
fa98a86f65 Tweak the code in a couple of places to try to deliver more user-friendly
error messages when a single COPY line is too long for us to handle.  Per
example from Johann Spies.
2007-05-28 16:43:24 +00:00
Tom Lane
234a02b2a8 Replace direct assignments to VARATT_SIZEP(x) with SET_VARSIZE(x, len).
Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with
VARSIZE and VARDATA, and as a consequence almost no code was using the
longer names.  Rename the length fields of struct varlena and various
derived structures to catch anyplace that was accessing them directly;
and clean up various places so caught.  In itself this patch doesn't
change any behavior at all, but it is necessary infrastructure if we hope
to play any games with the representation of varlena headers.
Greg Stark and Tom Lane
2007-02-27 23:48:10 +00:00
Tom Lane
e9da20ab4d Fix machine-dependent crash in sqlchar_to_unicode(). Get rid of
bletcherous and unsafe manipulation of global encoding setting.
Clean up libxml reporting mechanism a bit (it still looks like a
dangling-pointer crash waiting to happen, though, not to mention
being far less than sane from a localization standpoint).
2006-12-24 00:57:48 +00:00
Peter Eisentraut
8c1de5fb00 Initial SQL/XML support: xml data type and initial set of functions. 2006-12-21 16:05:16 +00:00
Bruce Momjian
f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Bruce Momjian
e0522505bd Remove 576 references of include files that were not needed. 2006-07-14 14:52:27 +00:00
Bruce Momjian
3a534ade39 Alphabetically order reference to include files, "G" - "M". 2006-07-11 17:04:13 +00:00
Tom Lane
c61a2f5841 Change the backend to reject strings containing invalidly-encoded multibyte
characters in all cases.  Formerly we mostly just threw warnings for invalid
input, and failed to detect it at all if no encoding conversion was required.
The tighter check is needed to defend against SQL-injection attacks as per
CVE-2006-2313 (further details will be published after release).  Embedded
zero (null) bytes will be rejected as well.  The checks are applied during
input to the backend (receipt from client or COPY IN), so it no longer seems
necessary to check in textin() and related routines; any string arriving at
those functions will already have been validated.  Conversion failure
reporting (for characters with no equivalent in the destination encoding)
has been cleaned up and made consistent while at it.

Also, fix a few longstanding errors in little-used encoding conversion
routines: win1251_to_iso, win866_to_iso, euc_tw_to_big5, euc_tw_to_mic,
mic_to_euc_tw were all broken to varying extents.

Patches by Tatsuo Ishii and Tom Lane.  Thanks to Akio Ishida and Yasuo Ohgaki
for identifying the security issues.
2006-05-21 20:05:21 +00:00
Neil Conway
d3a4d63387 mbutils was previously doing some allocations, including invoking
fmgr_info(), in the TopMemoryContext. I couldn't see that the code
actually leaked, but in general I think it's fragile to assume that
pfree'ing an FmgrInfo along with its fn_extra field is enough to
reclaim all the resources allocated by fmgr_info().  I changed the
code to do its allocations in a new child context of
TopMemoryContext, MbProcContext. When we want to release the
allocations we can just reset the context, which is cleaner.
2006-01-12 22:04:02 +00:00
Neil Conway
fb627b76cc Cosmetic code cleanup: fix a bunch of places that used "return (expr);"
rather than "return expr;" -- the latter style is used in most of the
tree. I kept the parentheses when they were necessary or useful because
the return expression was complex.
2006-01-11 08:43:13 +00:00
Neil Conway
762bcbdba2 Remove a confusing pair of parentheses. 2006-01-11 06:59:22 +00:00
Bruce Momjian
1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane
8889685555 Suppress signed-vs-unsigned-char warnings. 2005-09-24 17:53:28 +00:00
Tom Lane
d78397d301 Change typreceive function API so that receive functions get the same
optional arguments as text input functions, ie, typioparam OID and
atttypmod.  Make all the datatypes that use typmod enforce it the same
way in typreceive as they do in typinput.  This fixes a problem with
failure to enforce length restrictions during COPY FROM BINARY.
2005-07-10 21:14:00 +00:00
Bruce Momjian
e3d7de6b99 Rename canonical encodings, per Peter:
UNICODE => UTF8
	ALT => WIN866
	WIN => WIN1251
	TCVN => WIN1258

The old codes continue to work.
2005-03-07 04:30:55 +00:00
Neil Conway
7069dbcc31 More minor cosmetic improvements:
- remove another senseless "extern" keyword that was applied to a
function definition
- change a foo more function signatures from "some_type foo()" to
"some_type foo(void)"
- rewrite another K&R style function definition
- make the type of the "action" function pointer in the KeyWord struct
in src/backend/utils/adt/formatting.c more precise
2004-10-13 01:25:13 +00:00
Bruce Momjian
b6b71b85bc Pgindent run for 8.0. 2004-08-29 05:07:03 +00:00
Tatsuo Ishii
e8c3205037 Add PQmbdsplen() which returns the "display length" of a character.
Still some works needed:
- UTF-8, MULE_INTERNAL always returns 1
2004-03-15 10:41:26 +00:00
PostgreSQL Daemon
969685ad44 $Header: -> $PostgreSQL Changes ... 2003-11-29 19:52:15 +00:00
Peter Eisentraut
feb4f44d29 Message editing: remove gratuitous variations in message wording, standardize
terms, add some clarifications, fix some untranslatable attempts at dynamic
message building.
2003-09-25 06:58:07 +00:00
Bruce Momjian
089003fb46 pgindent run. 2003-08-04 00:43:34 +00:00
Tom Lane
689eb53e47 Error message editing in backend/utils (except /adt). 2003-07-25 20:18:01 +00:00
Tom Lane
351372e585 Department of second thoughts: probably still need an IsTransactionState
test in there...
2003-04-27 18:01:46 +00:00
Tom Lane
5f15fa8d06 Clean up some problems in SetClientEncoding: failed to honor doit flag
in all cases, leaked TopMemoryContext memory in others.  Make the
interaction between SetClientEncoding and InitializeClientEncoding
cleaner and better documented.  I suspect these changes should be
back-patched into 7.3, but will wait on Tatsuo's verification.
2003-04-27 17:31:25 +00:00
Tom Lane
e4704001ea This patch fixes a bunch of spelling mistakes in comments throughout the
PostgreSQL source code.

Neil Conway
2003-03-10 22:28:22 +00:00
Tatsuo Ishii
e2a618fe25 Fix for GUC client_encoding variable not being handled
correctly. See following thread for more details.

Subject: [HACKERS] client_encoding directive is ignored in postgresql.conf
From: Tatsuo Ishii <t-ishii@sra.co.jp>
Date: Wed, 29 Jan 2003 22:24:04 +0900 (JST)
2003-02-19 14:31:26 +00:00
Tatsuo Ishii
ac47950238 Guard against 0 length string encoding conversion case. 2002-11-26 02:22:29 +00:00
Tom Lane
5123139210 Remove encoding lookups from grammar stage, push them back to places
where it's safe to do database access.  Along the way, fix core dump
for 'DEFAULT' parameters to CREATE DATABASE.  initdb forced due to
change in pg_proc entry.
2002-11-02 18:41:22 +00:00
Bruce Momjian
e50f52a074 pgindent run. 2002-09-04 20:31:48 +00:00
Peter Eisentraut
77f7763b55 Remove all traces of multibyte and locale options. Clean up comments
referring to "multibyte" where it really means character encoding.
2002-09-03 21:45:44 +00:00
Tatsuo Ishii
ed7baeaf4d Remove #ifdef MULTIBYTE per hackers list discussion. 2002-08-29 07:22:30 +00:00
Tatsuo Ishii
10b374aecf Fix bug in pg_convert() per report from MaC.Yui.
It pfree() wrong pointer.
2002-08-19 04:08:08 +00:00
Tatsuo Ishii
538b101595 Fix memory leak in SetClientEncoding(). 2002-08-14 05:33:34 +00:00
Tatsuo Ishii
3c63578a7e Load and keep conversion function info when SET CLIENT_ENCODING TO is
executed to prevent database access while performing encoding
conversion.
2002-08-08 06:35:26 +00:00
Tatsuo Ishii
0345f58496 Implement DROP CONVERSION
Add regression test
2002-07-25 10:07:13 +00:00