mirror of https://github.com/postgres/postgres
Hi, here are patches I promised (against 6.3.2):

* character_length(), position(), substring() are now aware of
  multi-byte characters
* add octet_length()
* add --with-mb option to configure
* new regression tests for EUC_KR
  (contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
* add some test cases to the EUC_JP regression test
* fix problem in regress/regress.sh in case of System V
* fix toupper(), tolower() to handle 8bit chars

note that:

o patches for both configure.in and configure are included.
  maybe the one for configure is not necessary.
o pg_proc.h was modified to add octet_length(). I used OIDs
  (1374-1379) for that. Please let me know if these numbers are not
  appropriate.
parent 2cbcf46102
commit f554af0a9f
@ -1,4 +1,4 @@
|
|||
postgresql 6.3 multi-byte(MB) patch PL2 README Mar 10 1998
|
||||
postgresql 6.3 multi-byte (MB) support README April 21 1998
|
||||
|
||||
Tatsuo Ishii
|
||||
t-ishii@sra.co.jp
|
||||
|
@ -6,13 +6,13 @@ postgresql 6.3 multi-byte(MB) patch PL2 README Mar 10 1998
|
|||
|
||||
Introduction
|
||||
|
||||
MB patch is intended for allowing PostgreSQL to handle multi-byte
|
||||
charachter sets such as EUC(Extende Unix Code), Unicode and Mule
|
||||
internal code. With the MB patch you can use multi-byte character sets
|
||||
in regexp and LIKE. The encoding system chosen is determined at the
|
||||
compile time.
|
||||
MB support is intended to allow PostgreSQL to handle
|
||||
multi-byte character sets such as EUC (Extended Unix Code), Unicode and
|
||||
Mule internal code. With MB enabled you can use multi-byte
|
||||
character sets in regexp, LIKE and some functions. The encoding system
|
||||
is chosen at compile time.
|
||||
|
||||
The patch also fixes some problems concerning with 8-bit single byte
|
||||
MB support also fixes some problems with 8-bit single-byte
|
||||
character sets including ISO8859. (I would not say all of problems
|
||||
have been fixed. I just confirmed that the regression test ran fine
|
||||
and a few French characters could be used with the patch. Please let
|
||||
|
@ -20,26 +20,33 @@ me know if you find any problem while using 8-bit characters)
|
|||
|
||||
How to use
|
||||
|
||||
After applying the MB patch, create src/Makefile.custom with a line
|
||||
including:
|
||||
create src/Makefile.custom with a line including:
|
||||
|
||||
MB=encoding_system
|
||||
MB=encoding_system
|
||||
|
||||
or run configure with the mb option:
|
||||
|
||||
% configure --with-mb=encoding_system
|
||||
|
||||
where encoding_system is one of:
|
||||
|
||||
EUC_JP Japanese EUC
|
||||
EUC_CN Chinese EUC
|
||||
EUC_KR Korean EUC
|
||||
EUC_TW Taiwan EUC
|
||||
UNICODE Unicode(UTF-8)
|
||||
MULE_INTERNAL Mule internal
|
||||
EUC_JP Japanese EUC
|
||||
EUC_CN Chinese EUC
|
||||
EUC_KR Korean EUC
|
||||
EUC_TW Taiwan EUC
|
||||
UNICODE Unicode(UTF-8)
|
||||
MULE_INTERNAL Mule internal
|
||||
|
||||
Example:
|
||||
|
||||
% cat Makefile.custom
|
||||
MB=EUC_JP
|
||||
% cat Makefile.custom
|
||||
MB=EUC_JP
|
||||
|
||||
If MB is not defined, nothing is changed except better supporting for
|
||||
or
|
||||
|
||||
% configure --with-mb=EUC_JP
|
||||
|
||||
If MB is disabled, nothing is changed except better support for
|
||||
8-bit single byte character sets.
|
||||
|
||||
References
|
||||
|
@ -59,6 +66,19 @@ Unicode: http://www.unicode.org/
|
|||
|
||||
History
|
||||
|
||||
April 21, 1998 some enhancements/fixes
|
||||
* character_length(), position(), substring() are now aware of
|
||||
multi-byte characters
|
||||
* add octet_length()
|
||||
* add --with-mb option to configure
|
||||
* new regression tests for EUC_KR
|
||||
(contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
|
||||
* add some test cases to the EUC_JP regression test
|
||||
* fix problem in regress/regress.sh in case of System V
|
||||
* fix toupper(), tolower() to handle 8bit chars
|
||||
|
||||
Mar 25, 1998 MB PL2 is incorporated into PostgreSQL 6.3.1
|
||||
|
||||
Mar 10, 1998 PL2 released
|
||||
* add regression test for EUC_JP, EUC_CN and MULE_INTERNAL
|
||||
* add an English document (this file)
|
||||
|
|
|
@ -1,14 +1,12 @@
postgresql 6.3 multi-byte (MB) patch PL2 README, created 1998/3/10
postgresql 6.3.2 multi-byte (MB) support README, created 1998/4/21

Tatsuo Ishii
t-ishii@sra.co.jp
http://www.sra.co.jp/people/t-ishii/PostgreSQL/

Introduction:
This patch makes it possible for the latest version, 6.3, of PostgreSQL
(http://www.postgresql.org/), a free RDBMS (Relational Database Management
System), to handle multi-byte characters such as Japanese EUC. Applying
this patch enables the following.

Multi-byte support in PostgreSQL has the following features.

1. As the multi-byte character set, the EUC of each language (Japanese, Chinese, etc.), Unicode, or
mule internal code can be selected at compile time. In the database,
@ -19,45 +17,24 @@ postgresql 6.3 multi-byte (MB) patch PL2 README, created 1998/3/10
4. Multi-byte characters can also be used in the data itself
5. Regular-expression searches on multi-byte characters are available
6. LIKE searches on multi-byte characters are available
7. Multi-byte support in character_length(), position() and
substring()

(Items 2, 3 and 4, however, are possible even without applying the patch.)
Installation:
By default, PostgreSQL does not support multi-byte characters.
This section explains how to enable multi-byte support.

How to obtain postgresql-6.3:
postgresql-6.3.tar.gz is available from the official PostgreSQL mirror
site in Japan, ftp://ftp.jaist.ac.jp/pub/dbms/PostgreSQL/.
If for some reason you cannot obtain it from there,
ftp://ftp.sra.co.jp/pub/cmd/postgres/6.3/ can also be used.
The original postgresql ftp site is ftp://ftp.postgresql.org.

How to obtain this patch:

Please download
ftp://ftp.sra.co.jp/pub/cmd/postgres/6.3/patches/6.3mbPL2.patch.gz

How to apply the patch:
Unpack the patch file you obtained.

% gunzip 6.3mbPL2.patch.gz

Unpack the postgresql-6.3 sources.

% gtar xfz postgresql-6.3.tar.gz

This creates a directory named postgresql-6.3; cd into it.

% cd postgresql-6.3

Apply the patch.

% patch -p1 < 6.3mbPL2.patch

Apply it as shown above. Next, create a file named src/Makefile.custom containing the single line
Create a file named src/Makefile.custom containing the single line

MB=EUC_JP

The following encodings, including EUC_JP, can be specified.
Alternatively, specify the encoding when running configure, as follows.

% configure --with-mb=EUC_JP

The following character encodings, including EUC_JP, can be specified.
(In the current implementation the encoding is fixed at compile time and
cannot be changed dynamically at run time.)

EUC_JP Japanese EUC
EUC_CN Chinese EUC based on GB. Code set 2 is
@ -93,6 +70,22 @@ How to obtain postgresql-6.3:

Revision history:

1998/4/21 feature additions / bug fixes
* multi-byte support for character_length(), position(),
substring()
* octet_length() added -> initdb must be rerun
* MB support added to the configure options
(ex. configure --with-mb=EUC_JP)
* regression test for EUC_KR added
(contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
* character_length(), position(), substring() and octet_length()
added to the EUC_JP regression test
* fixed an incompatibility in regress.sh on System V
* fixed occasional crashes when 8-bit characters were passed to
toupper(), tolower()

1998/3/25 PostgreSQL 6.3.1 released, incorporating MB PL2

1998/3/10 PL2 released
* added regression tests for EUC_JP, EUC_CN and MULE_INTERNAL
(EUC_CN data contributed by he@sra.co.jp)
|
|
|
@ -7,7 +7,7 @@
|
|||
#
|
||||
#
|
||||
# IDENTIFICATION
|
||||
# $Header: /cvsroot/pgsql/src/Makefile.global.in,v 1.40 1998/04/27 14:54:05 scrappy Exp $
|
||||
# $Header: /cvsroot/pgsql/src/Makefile.global.in,v 1.41 1998/04/27 17:07:22 scrappy Exp $
|
||||
#
|
||||
# NOTES
|
||||
# Essentially all Postgres make files include this file and use the
|
||||
|
@ -147,6 +147,11 @@ X_CFLAGS= @X_CFLAGS@
|
|||
X_LIBS= @X_LIBS@
|
||||
X11_LIBS= -lX11 @X_EXTRA_LIBS@
|
||||
|
||||
#
|
||||
# enable multi-byte support
|
||||
# choose one of:
|
||||
# EUC_JP,EUC_CN,EUC_KR,EUC_TW,UNICODE,MULE_INTERNAL
|
||||
MB=@MB@
|
||||
|
||||
##############################################################################
|
||||
#
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
/*
|
||||
* misc conversion functions between pg_wchar and other encodings.
|
||||
* Tatsuo Ishii
|
||||
* $Id: utils.c,v 1.1 1998/03/15 07:38:39 scrappy Exp $
|
||||
* $Id: utils.c,v 1.2 1998/04/27 17:07:53 scrappy Exp $
|
||||
*/
|
||||
#include <regex/pg_wchar.h>
|
||||
/*
|
||||
|
@ -324,25 +324,151 @@ static void pg_mule2wchar_with_len(const unsigned char *from, pg_wchar *to, int
|
|||
*to = 0;
|
||||
}
|
||||
|
||||
static int pg_euc_mblen(const unsigned char *s)
|
||||
{
|
||||
int len;
|
||||
|
||||
if (*s == SS2) {
|
||||
len = 2;
|
||||
} else if (*s == SS3) {
|
||||
len = 3;
|
||||
} else if (*s & 0x80) {
|
||||
len = 2;
|
||||
} else {
|
||||
len = 1;
|
||||
}
|
||||
return(len);
|
||||
}
|
||||
|
||||
static int pg_eucjp_mblen(const unsigned char *s)
|
||||
{
|
||||
return(pg_euc_mblen(s));
|
||||
}
|
||||
|
||||
static int pg_euckr_mblen(const unsigned char *s)
|
||||
{
|
||||
return(pg_euc_mblen(s));
|
||||
}
|
||||
|
||||
static int pg_eucch_mblen(const unsigned char *s)
|
||||
{
|
||||
int len;
|
||||
|
||||
if (*s == SS2) {
|
||||
len = 3;
|
||||
} else if (*s == SS3) {
|
||||
len = 3;
|
||||
} else if (*s & 0x80) {
|
||||
len = 2;
|
||||
} else {
|
||||
len = 1;
|
||||
}
|
||||
return(len);
|
||||
}
|
||||
|
||||
static int pg_euccn_mblen(const unsigned char *s)
|
||||
{
|
||||
int len;
|
||||
|
||||
if (*s == SS2) {
|
||||
len = 4;
|
||||
} else if (*s == SS3) {
|
||||
len = 3;
|
||||
} else if (*s & 0x80) {
|
||||
len = 2;
|
||||
} else {
|
||||
len = 1;
|
||||
}
|
||||
return(len);
|
||||
}
|
||||
|
||||
static int pg_utf_mblen(const unsigned char *s)
|
||||
{
|
||||
int len = 1;
|
||||
|
||||
if ((*s & 0x80) == 0) {
|
||||
len = 1;
|
||||
} else if ((*s & 0xe0) == 0xc0) {
|
||||
len = 2;
|
||||
} else if ((*s & 0xe0) == 0xe0) {
|
||||
len = 3;
|
||||
}
|
||||
return(len);
|
||||
}
|
||||
|
||||
static int pg_mule_mblen(const unsigned char *s)
|
||||
{
|
||||
int len;
|
||||
|
||||
if (IS_LC1(*s)) {
|
||||
len = 2;
|
||||
} else if (IS_LCPRV1(*s)) {
|
||||
len = 3;
|
||||
} else if (IS_LC2(*s)) {
|
||||
len = 3;
|
||||
} else if (IS_LCPRV2(*s)) {
|
||||
len = 4;
|
||||
} else { /* assume ASCII */
|
||||
len = 1;
|
||||
}
|
||||
return(len);
|
||||
}
|
||||
|
||||
typedef struct {
|
||||
void (*mb2wchar)();
|
||||
void (*mb2wchar_with_len)();
|
||||
void (*mb2wchar)(); /* convert a multi-byte string to a wchar */
|
||||
void (*mb2wchar_with_len)(); /* convert a multi-byte string to a wchar
|
||||
with a limited length */
|
||||
int (*mblen)(); /* returns the length of a multi-byte word */
|
||||
} pg_wchar_tbl;
|
||||
|
||||
static pg_wchar_tbl pg_wchar_table[] = {
|
||||
{pg_eucjp2wchar, pg_eucjp2wchar_with_len},
|
||||
{pg_eucch2wchar, pg_eucch2wchar_with_len},
|
||||
{pg_euckr2wchar, pg_euckr2wchar_with_len},
|
||||
{pg_euccn2wchar, pg_euccn2wchar_with_len},
|
||||
{pg_utf2wchar, pg_utf2wchar_with_len},
|
||||
{pg_mule2wchar, pg_mule2wchar_with_len}};
|
||||
{pg_eucjp2wchar, pg_eucjp2wchar_with_len, pg_eucjp_mblen},
|
||||
{pg_eucch2wchar, pg_eucch2wchar_with_len, pg_eucch_mblen},
|
||||
{pg_euckr2wchar, pg_euckr2wchar_with_len, pg_euckr_mblen},
|
||||
{pg_euccn2wchar, pg_euccn2wchar_with_len, pg_euccn_mblen},
|
||||
{pg_utf2wchar, pg_utf2wchar_with_len, pg_utf_mblen},
|
||||
{pg_mule2wchar, pg_mule2wchar_with_len, pg_mule_mblen}};
|
||||
|
||||
/* convert a multi-byte string to a wchar */
|
||||
void pg_mb2wchar(const unsigned char *from, pg_wchar *to)
|
||||
{
|
||||
(*pg_wchar_table[MB].mb2wchar)(from,to);
|
||||
}
|
||||
|
||||
/* convert a multi-byte string to a wchar with a limited length */
|
||||
void pg_mb2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
|
||||
{
|
||||
(*pg_wchar_table[MB].mb2wchar_with_len)(from,to,len);
|
||||
}
|
||||
|
||||
/* returns the byte length of a multi-byte word */
|
||||
int pg_mblen(const unsigned char *mbstr)
|
||||
{
|
||||
return((*pg_wchar_table[MB].mblen)(mbstr));
|
||||
}
|
||||
|
||||
/* returns the length (counted as a wchar) of a multi-byte string */
|
||||
int pg_mbstrlen(const unsigned char *mbstr)
|
||||
{
|
||||
int len = 0;
|
||||
while (*mbstr) {
|
||||
mbstr += pg_mblen(mbstr);
|
||||
len++;
|
||||
}
|
||||
return(len);
|
||||
}
|
||||
|
||||
/* returns the length (counted as a wchar) of a multi-byte string
|
||||
(not necessarily NULL terminated) */
|
||||
int pg_mbstrlen_with_len(const unsigned char *mbstr, int limit)
|
||||
{
|
||||
int len = 0;
|
||||
int l;
|
||||
while (*mbstr && limit > 0) {
|
||||
l = pg_mblen(mbstr);
|
||||
limit -= l;
|
||||
mbstr += l;
|
||||
len++;
|
||||
}
|
||||
return(len);
|
||||
}
|
||||
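The length helpers added to utils.c above share one pattern: inspect the lead byte to get the byte length of one character, then step through the string one character at a time. The following is a minimal, self-contained sketch of that pattern for EUC_JP only. The names `euc_jp_mblen` and `euc_jp_strlen_with_len` are hypothetical, the SS2/SS3 values are the standard EUC single-shift bytes rather than something shown in this diff, and the extra bounds check is an addition that the patch does not have.

```c
#include <stddef.h>

/* EUC single-shift bytes (assumed values; not shown in this diff). */
#define SS2 0x8e
#define SS3 0x8f

/* Byte length of the EUC_JP character starting at s; same rule as pg_euc_mblen. */
static int euc_jp_mblen(const unsigned char *s)
{
    if (*s == SS2)
        return 2;   /* SS2 followed by one byte */
    if (*s == SS3)
        return 3;   /* SS3 followed by two bytes */
    if (*s & 0x80)
        return 2;   /* ordinary two-byte character */
    return 1;       /* ASCII */
}

/*
 * Count the characters in the first `limit` bytes of s.
 * Like the loop in the patch it stops at a zero byte or when limit is
 * used up; in addition (hypothetical hardening) it stops when the last
 * character would extend past the limit.
 */
static int euc_jp_strlen_with_len(const unsigned char *s, int limit)
{
    int chars = 0;

    while (limit > 0 && *s != '\0') {
        int l = euc_jp_mblen(s);

        if (l > limit)
            break;      /* truncated trailing sequence */
        s += l;
        limit -= l;
        chars++;
    }
    return chars;
}
```

In the EUC_JP regression data, where each katakana character occupies two bytes, a walk of this kind is what turns an octet_length of 24 into a character_length of 12.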
|
|
|
@ -1,7 +1,7 @@
|
|||
/*
|
||||
* Edmund Mergl <E.Mergl@bawue.de>
|
||||
*
|
||||
* $Id: oracle_compat.c,v 1.12 1998/02/26 04:37:19 momjian Exp $
|
||||
* $Id: oracle_compat.c,v 1.13 1998/04/27 17:08:19 scrappy Exp $
|
||||
*
|
||||
*/
|
||||
|
||||
|
@ -55,7 +55,7 @@ lower(text *string)
|
|||
|
||||
while (m--)
|
||||
{
|
||||
*ptr_ret++ = tolower(*ptr++);
|
||||
*ptr_ret++ = tolower((unsigned char)*ptr++);
|
||||
}
|
||||
|
||||
return ret;
|
||||
|
@ -95,7 +95,7 @@ upper(text *string)
|
|||
|
||||
while (m--)
|
||||
{
|
||||
*ptr_ret++ = toupper(*ptr++);
|
||||
*ptr_ret++ = toupper((unsigned char)*ptr++);
|
||||
}
|
||||
|
||||
return ret;
|
||||
|
@ -135,18 +135,18 @@ initcap(text *string)
|
|||
ptr = VARDATA(string);
|
||||
ptr_ret = VARDATA(ret);
|
||||
|
||||
*ptr_ret++ = toupper(*ptr++);
|
||||
*ptr_ret++ = toupper((unsigned char)*ptr++);
|
||||
--m;
|
||||
|
||||
while (m--)
|
||||
{
|
||||
if (*(ptr_ret - 1) == ' ' || *(ptr_ret - 1) == '\t')
|
||||
{
|
||||
*ptr_ret++ = toupper(*ptr++);
|
||||
*ptr_ret++ = toupper((unsigned char)*ptr++);
|
||||
}
|
||||
else
|
||||
{
|
||||
*ptr_ret++ = tolower(*ptr++);
|
||||
*ptr_ret++ = tolower((unsigned char)*ptr++);
|
||||
}
|
||||
}
|
||||
|
||||
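The casts added to lower(), upper() and initcap() above matter on platforms where plain char is signed: a byte of 0x80 or above would otherwise reach tolower() or toupper() as a negative int, which the C standard leaves undefined unless the value equals EOF. A minimal sketch of the corrected pattern, using a hypothetical helper `lower_bytes` that is not part of the patch:

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Lower-case len bytes from src into dst (hypothetical helper).
 * Each byte is converted to unsigned char before it is handed to
 * tolower(), so 8-bit values are passed as 128..255, never as
 * negative numbers.
 */
static void lower_bytes(const char *src, char *dst, size_t len)
{
    while (len--)
        *dst++ = (char) tolower((unsigned char) *src++);
}
```

Whether an 8-bit byte is actually mapped to another value depends on the current locale; in the default "C" locale only A to Z are changed.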
|
|
|
@ -7,7 +7,7 @@
|
|||
*
|
||||
*
|
||||
* IDENTIFICATION
|
||||
* $Header: /cvsroot/pgsql/src/backend/utils/adt/varchar.c,v 1.29 1998/02/26 04:37:24 momjian Exp $
|
||||
* $Header: /cvsroot/pgsql/src/backend/utils/adt/varchar.c,v 1.30 1998/04/27 17:08:26 scrappy Exp $
|
||||
*
|
||||
*-------------------------------------------------------------------------
|
||||
*/
|
||||
|
@ -21,6 +21,8 @@ char *convertstr(char *, int, int);
|
|||
|
||||
#endif
|
||||
|
||||
#include "regex/pg_wchar.h"
|
||||
|
||||
/*
|
||||
* CHAR() and VARCHAR() types are part of the ANSI SQL standard. CHAR()
|
||||
* is for blank-padded string whose length is specified in CREATE TABLE.
|
||||
|
@ -213,6 +215,31 @@ bcTruelen(char *arg)
|
|||
|
||||
int32
|
||||
bpcharlen(char *arg)
|
||||
{
|
||||
#ifdef MB
|
||||
unsigned char *s;
|
||||
int len, l, wl;
|
||||
#endif
|
||||
if (!PointerIsValid(arg))
|
||||
elog(ERROR, "Bad (null) char() external representation", NULL);
|
||||
#ifdef MB
|
||||
l = bcTruelen(arg);
|
||||
len = 0;
|
||||
s = VARDATA(arg);
|
||||
while (l > 0) {
|
||||
wl = pg_mblen(s);
|
||||
l -= wl;
|
||||
s += wl;
|
||||
len++;
|
||||
}
|
||||
return(len);
|
||||
#else
|
||||
return (bcTruelen(arg));
|
||||
#endif
|
||||
}
|
||||
|
||||
int32
|
||||
bpcharoctetlen(char *arg)
|
||||
{
|
||||
if (!PointerIsValid(arg))
|
||||
elog(ERROR, "Bad (null) char() external representation", NULL);
|
||||
|
@ -354,9 +381,34 @@ bpcharcmp(char *arg1, char *arg2)
|
|||
int32
|
||||
varcharlen(char *arg)
|
||||
{
|
||||
#ifdef MB
|
||||
unsigned char *s;
|
||||
int len, l, wl;
|
||||
#endif
|
||||
if (!PointerIsValid(arg))
|
||||
elog(ERROR, "Bad (null) varchar() external representation", NULL);
|
||||
|
||||
#ifdef MB
|
||||
len = 0;
|
||||
s = VARDATA(arg);
|
||||
l = VARSIZE(arg) - VARHDRSZ;
|
||||
while (l > 0) {
|
||||
wl = pg_mblen(s);
|
||||
l -= wl;
|
||||
s += wl;
|
||||
len++;
|
||||
}
|
||||
return(len);
|
||||
#else
|
||||
return (VARSIZE(arg) - VARHDRSZ);
|
||||
#endif
|
||||
}
|
||||
|
||||
int32
|
||||
varcharoctetlen(char *arg)
|
||||
{
|
||||
if (!PointerIsValid(arg))
|
||||
elog(ERROR, "Bad (null) varchar() external representation", NULL);
|
||||
return (VARSIZE(arg) - VARHDRSZ);
|
||||
}
|
||||
|
||||
|
|
|
@ -7,7 +7,7 @@
|
|||
*
|
||||
*
|
||||
* IDENTIFICATION
|
||||
* $Header: /cvsroot/pgsql/src/backend/utils/adt/varlena.c,v 1.32 1998/03/15 08:07:01 scrappy Exp $
|
||||
* $Header: /cvsroot/pgsql/src/backend/utils/adt/varlena.c,v 1.33 1998/04/27 17:08:28 scrappy Exp $
|
||||
*
|
||||
*-------------------------------------------------------------------------
|
||||
*/
|
||||
|
@ -18,6 +18,8 @@
|
|||
#include "utils/palloc.h"
|
||||
#include "utils/builtins.h" /* where function declarations go */
|
||||
|
||||
#include "regex/pg_wchar.h"
|
||||
|
||||
/*****************************************************************************
|
||||
* USER I/O ROUTINES *
|
||||
*****************************************************************************/
|
||||
|
@ -198,18 +200,52 @@ textout(text *vlena)
|
|||
|
||||
/*
|
||||
* textlen -
|
||||
* returns the actual length of a text*
|
||||
* returns the logical length of a text*
|
||||
* (which is less than the VARSIZE of the text*)
|
||||
*/
|
||||
int32
|
||||
textlen(text *t)
|
||||
{
|
||||
#ifdef MB
|
||||
unsigned char *s;
|
||||
int len, l, wl;
|
||||
#endif
|
||||
|
||||
if (!PointerIsValid(t))
|
||||
elog(ERROR, "Null input to textlen");
|
||||
|
||||
#ifdef MB
|
||||
len = 0;
|
||||
s = VARDATA(t);
|
||||
l = VARSIZE(t) - VARHDRSZ;
|
||||
while (l > 0) {
|
||||
wl = pg_mblen(s);
|
||||
l -= wl;
|
||||
s += wl;
|
||||
len++;
|
||||
}
|
||||
return(len);
|
||||
#else
|
||||
return (VARSIZE(t) - VARHDRSZ);
|
||||
#endif
|
||||
|
||||
} /* textlen() */
|
||||
|
||||
/*
|
||||
* textoctetlen -
|
||||
* returns the physical length of a text*
|
||||
* (which is less than the VARSIZE of the text*)
|
||||
*/
|
||||
int32
|
||||
textoctetlen(text *t)
|
||||
{
|
||||
if (!PointerIsValid(t))
|
||||
elog(ERROR, "Null input to textoctetlen");
|
||||
|
||||
return (VARSIZE(t) - VARHDRSZ);
|
||||
|
||||
} /* textoctetlen() */
|
||||
|
||||
/*
|
||||
* textcat -
|
||||
* takes two text* and returns a text* that is the concatentation of
|
||||
|
@ -278,17 +314,27 @@ textcat(text *t1, text *t2)
|
|||
*
|
||||
* Note that the arguments operate on octet length,
|
||||
* so not aware of multi-byte character sets.
|
||||
*
|
||||
* Added multi-byte support.
|
||||
* - Tatsuo Ishii 1998-4-21
|
||||
*/
|
||||
text *
|
||||
text_substr(text *string, int32 m, int32 n)
|
||||
{
|
||||
text *ret;
|
||||
int len;
|
||||
#ifdef MB
|
||||
int i;
|
||||
char *p;
|
||||
#endif
|
||||
|
||||
if ((string == (text *) NULL) || (m <= 0))
|
||||
return string;
|
||||
|
||||
len = VARSIZE(string) - VARHDRSZ;
|
||||
#ifdef MB
|
||||
len = pg_mbstrlen_with_len(VARDATA(string),len);
|
||||
#endif
|
||||
|
||||
/* m will now become a zero-based starting position */
|
||||
if (m > len)
|
||||
|
@ -303,6 +349,17 @@ text_substr(text *string, int32 m, int32 n)
|
|||
n = (len - m);
|
||||
}
|
||||
|
||||
#ifdef MB
|
||||
p = VARDATA(string);
|
||||
for (i=0;i<m;i++) {
|
||||
p += pg_mblen(p);
|
||||
}
|
||||
m = p - VARDATA(string);
|
||||
for (i=0;i<n;i++) {
|
||||
p += pg_mblen(p);
|
||||
}
|
||||
n = p - (VARDATA(string) + m);
|
||||
#endif
|
||||
ret = (text *) palloc(VARHDRSZ + n);
|
||||
VARSIZE(ret) = VARHDRSZ + n;
|
||||
|
||||
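The MB branch of text_substr() above rewrites m and n from character counts into byte counts by stepping through the string with pg_mblen(). The sketch below shows the same idea in isolation. It is an illustration under assumptions, not the backend code: `utf8_mblen` copies the 1 to 3 byte rule of pg_utf_mblen, `char_range_to_bytes` is a hypothetical name, and the explicit end-of-buffer checks are additions.

```c
#include <stddef.h>

/* Byte length of a UTF-8 character from its lead byte (1 to 3 bytes only). */
static int utf8_mblen(const unsigned char *s)
{
    if ((*s & 0x80) == 0)
        return 1;
    if ((*s & 0xe0) == 0xc0)
        return 2;
    if ((*s & 0xe0) == 0xe0)
        return 3;
    return 1;   /* stray continuation byte: treat as one byte */
}

/*
 * Translate a 1-based character position m and a character count n into
 * a byte offset and byte length within buf (nbytes long).
 * Returns 1 on success, 0 when m lies outside the string.
 */
static int char_range_to_bytes(const unsigned char *buf, int nbytes,
                               int m, int n, int *byte_off, int *byte_len)
{
    const unsigned char *p = buf;
    const unsigned char *end = buf + nbytes;
    const unsigned char *start;
    int i;

    if (m <= 0 || n < 0)
        return 0;

    /* skip the m - 1 characters in front of the start position */
    for (i = 1; i < m && p < end; i++)
        p += utf8_mblen(p);
    if (p >= end)
        return 0;
    start = p;

    /* take up to n characters, clipping at the end of the buffer */
    for (i = 0; i < n && p < end; i++)
        p += utf8_mblen(p);
    if (p > end)
        p = end;

    *byte_off = (int) (start - buf);
    *byte_len = (int) (p - start);
    return 1;
}
```

Asking for more characters than remain simply clips the result, which matches the regression output in this commit where substring(... from 10 for 4) on a 12-character value yields three characters.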
|
@ -317,6 +374,9 @@ text_substr(text *string, int32 m, int32 n)
|
|||
* Implements the SQL92 POSITION() function.
|
||||
* Ref: A Guide To The SQL Standard, Date & Darwen, 1997
|
||||
* - thomas 1997-07-27
|
||||
*
|
||||
* Added multi-byte support.
|
||||
* - Tatsuo Ishii 1998-4-21
|
||||
*/
|
||||
int32
|
||||
textpos(text *t1, text *t2)
|
||||
|
@ -326,8 +386,11 @@ textpos(text *t1, text *t2)
|
|||
p;
|
||||
int len1,
|
||||
len2;
|
||||
char *p1,
|
||||
pg_wchar *p1,
|
||||
*p2;
|
||||
#ifdef MB
|
||||
pg_wchar *ps1, *ps2;
|
||||
#endif
|
||||
|
||||
if (!PointerIsValid(t1) || !PointerIsValid(t2))
|
||||
return (0);
|
||||
|
@ -337,19 +400,36 @@ textpos(text *t1, text *t2)
|
|||
|
||||
len1 = (VARSIZE(t1) - VARHDRSZ);
|
||||
len2 = (VARSIZE(t2) - VARHDRSZ);
|
||||
#ifdef MB
|
||||
ps1 = p1 = (pg_wchar *) palloc((len1 + 1)*sizeof(pg_wchar));
|
||||
(void)pg_mb2wchar_with_len((unsigned char *)VARDATA(t1),p1,len1);
|
||||
len1 = pg_wchar_strlen(p1);
|
||||
ps2 = p2 = (pg_wchar *) palloc((len2 + 1)*sizeof(pg_wchar));
|
||||
(void)pg_mb2wchar_with_len((unsigned char *)VARDATA(t2),p2,len2);
|
||||
len2 = pg_wchar_strlen(p2);
|
||||
#else
|
||||
p1 = VARDATA(t1);
|
||||
p2 = VARDATA(t2);
|
||||
#endif
|
||||
pos = 0;
|
||||
px = (len1 - len2);
|
||||
for (p = 0; p <= px; p++)
|
||||
{
|
||||
#ifdef MB
|
||||
if ((*p2 == *p1) && (pg_wchar_strncmp(p1, p2, len2) == 0))
|
||||
#else
|
||||
if ((*p2 == *p1) && (strncmp(p1, p2, len2) == 0))
|
||||
#endif
|
||||
{
|
||||
pos = p + 1;
|
||||
break;
|
||||
};
|
||||
p1++;
|
||||
};
|
||||
#ifdef MB
|
||||
pfree(ps1);
|
||||
pfree(ps2);
|
||||
#endif
|
||||
return (pos);
|
||||
} /* textpos() */
|
||||
|
||||
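With MB defined, textpos() above converts both arguments to pg_wchar arrays before searching, so the returned position counts characters rather than bytes and a match cannot start in the middle of a character. A rough standalone sketch of that approach follows. `wide_char`, `to_wide` and `char_position` are hypothetical stand-ins for pg_wchar, pg_mb2wchar_with_len() and textpos(), and the decoder assumes a simplified encoding in which any byte with the high bit set begins a two-byte character.

```c
#include <stdlib.h>

typedef unsigned int wide_char;   /* stand-in for pg_wchar (assumption) */

/*
 * Decode nbytes of a simplified two-byte encoding into fixed-width units.
 * out must have room for nbytes elements. Returns the character count.
 */
static int to_wide(const unsigned char *s, int nbytes, wide_char *out)
{
    int n = 0;
    int i = 0;

    while (i < nbytes) {
        if ((s[i] & 0x80) && i + 1 < nbytes) {
            out[n++] = ((wide_char) s[i] << 8) | s[i + 1];
            i += 2;
        } else {
            out[n++] = s[i];
            i += 1;
        }
    }
    return n;
}

/* 1-based character position of needle in hay, or 0 if it does not occur. */
static int char_position(const unsigned char *hay, int hay_bytes,
                         const unsigned char *needle, int needle_bytes)
{
    wide_char *h = malloc((hay_bytes + 1) * sizeof(wide_char));
    wide_char *n = malloc((needle_bytes + 1) * sizeof(wide_char));
    int hlen, nlen, p, k, pos = 0;

    if (h == NULL || n == NULL) {
        free(h);
        free(n);
        return 0;
    }
    hlen = to_wide(hay, hay_bytes, h);
    nlen = to_wide(needle, needle_bytes, n);

    /* compare whole characters at every candidate start position */
    for (p = 0; p + nlen <= hlen; p++) {
        for (k = 0; k < nlen && h[p + k] == n[k]; k++)
            ;
        if (k == nlen) {
            pos = p + 1;
            break;
        }
    }
    free(h);
    free(n);
    return pos;
}
```

A plain byte search for A2 A4 inside A4 A2 A4 A4 would report a hit at byte offset 1, straddling two characters; the character-wise search above reports no match.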
|
|
File diff suppressed because it is too large
|
@ -199,6 +199,24 @@ AC_ARG_ENABLE(
|
|||
AC_MSG_RESULT(disabled)
|
||||
)
|
||||
|
||||
AC_MSG_CHECKING(setting MB)
|
||||
AC_ARG_WITH(mb,
|
||||
[ --with-mb=<encoding> enable multi-byte support ],
|
||||
[
|
||||
case "$withval" in
|
||||
EUC_JP|EUC_CN|EUC_KR|EUC_TW|UNICODE|MULE_INTERNAL)
|
||||
MB="$withval";
|
||||
AC_MSG_RESULT("enabled with $withval")
|
||||
;;
|
||||
*)
|
||||
AC_MSG_ERROR([*** You must supply an argument to the --with-mb option, one of EUC_JP,EUC_CN,EUC_KR,EUC_TW,UNICODE,MULE_INTERNAL])
|
||||
;;
|
||||
esac
|
||||
MB="$withval"
|
||||
],
|
||||
AC_MSG_RESULT("disabled")
|
||||
)
|
||||
|
||||
dnl We use the default value of 5432 for the DEF_PGPORT value. If
|
||||
dnl we over-ride it with --with-pgport=port then we bypass this piece
|
||||
AC_MSG_CHECKING(setting DEF_PGPORT)
|
||||
|
@ -305,6 +323,7 @@ AC_SUBST(DLSUFFIX)
|
|||
AC_SUBST(DL_LIB)
|
||||
AC_SUBST(USE_TCL)
|
||||
AC_SUBST(USE_PERL)
|
||||
AC_SUBST(MB)
|
||||
|
||||
dnl ****************************************************************
|
||||
dnl Hold off on the C++ stuff until we can figure out why it doesn't
|
||||
|
|
|
@ -6,7 +6,7 @@
|
|||
*
|
||||
* Copyright (c) 1994, Regents of the University of California
|
||||
*
|
||||
* $Id: pg_proc.h,v 1.53 1998/04/27 04:08:07 momjian Exp $
|
||||
* $Id: pg_proc.h,v 1.54 1998/04/27 17:08:41 scrappy Exp $
|
||||
*
|
||||
* NOTES
|
||||
* The script catalog/genbki.sh reads this file and generates .bki
|
||||
|
@ -201,6 +201,8 @@ DATA(insert OID = 1257 ( textlen PGUID 11 f t f 1 f 23 "25" 100 0 1 0 foo
|
|||
DESCR("length");
|
||||
DATA(insert OID = 1258 ( textcat PGUID 11 f t f 2 f 25 "25 25" 100 0 1 0 foo bar ));
|
||||
DESCR("concat");
|
||||
DATA(insert OID = 1377 ( textoctetlen PGUID 11 f t f 1 f 23 "25" 100 0 1 0 foo bar ));
|
||||
DESCR("octet length");
|
||||
DATA(insert OID = 84 ( boolne PGUID 11 f t f 2 f 16 "16 16" 100 0 0 100 foo bar ));
|
||||
DESCR("not equal");
|
||||
|
||||
|
@ -1444,7 +1446,11 @@ DESCR("does not match regex., case-insensitive");
|
|||
|
||||
DATA(insert OID = 1251 ( bpcharlen PGUID 11 f t f 1 f 23 "1042" 100 0 0 100 foo bar ));
|
||||
DESCR("octet length");
|
||||
DATA(insert OID = 1378 ( bpcharoctetlen PGUID 11 f t f 1 f 23 "1042" 100 0 0 100 foo bar ));
|
||||
DESCR("octet length");
|
||||
DATA(insert OID = 1253 ( varcharlen PGUID 11 f t f 1 f 23 "1043" 100 0 0 100 foo bar ));
|
||||
DESCR("character length");
|
||||
DATA(insert OID = 1379 ( varcharoctetlen PGUID 11 f t f 1 f 23 "1043" 100 0 0 100 foo bar ));
|
||||
DESCR("octet length");
|
||||
|
||||
DATA(insert OID = 1263 ( text_timespan PGUID 11 f t f 1 f 1186 "25" 100 0 0 100 foo bar ));
|
||||
|
@ -1550,10 +1556,17 @@ DESCR("convert");
|
|||
DATA(insert OID = 1370 ( timestamp PGUID 14 f t f 1 f 1296 "1184" 100 0 0 100 "select datetime_stamp($1)" - ));
|
||||
DESCR("convert");
|
||||
DATA(insert OID = 1371 ( length PGUID 14 f t f 1 f 23 "25" 100 0 0 100 "select textlen($1)" - ));
|
||||
DESCR("octet length");
|
||||
DESCR("character length");
|
||||
DATA(insert OID = 1372 ( length PGUID 14 f t f 1 f 23 "1042" 100 0 0 100 "select bpcharlen($1)" - ));
|
||||
DESCR("octet length");
|
||||
DESCR("character length");
|
||||
DATA(insert OID = 1373 ( length PGUID 14 f t f 1 f 23 "1043" 100 0 0 100 "select varcharlen($1)" - ));
|
||||
DESCR("character length");
|
||||
|
||||
DATA(insert OID = 1374 ( octet_length PGUID 14 f t f 1 f 23 "25" 100 0 0 100 "select textoctetlen($1)" - ));
|
||||
DESCR("octet length");
|
||||
DATA(insert OID = 1375 ( octet_length PGUID 14 f t f 1 f 23 "1042" 100 0 0 100 "select bpcharoctetlen($1)" - ));
|
||||
DESCR("octet length");
|
||||
DATA(insert OID = 1376 ( octet_length PGUID 14 f t f 1 f 23 "1043" 100 0 0 100 "select varcharoctetlen($1)" - ));
|
||||
DESCR("octet length");
|
||||
|
||||
DATA(insert OID = 1380 ( date_part PGUID 14 f t f 2 f 701 "25 1184" 100 0 0 100 "select datetime_part($1, $2)" - ));
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
/* $Id: pg_wchar.h,v 1.1 1998/03/15 07:38:47 scrappy Exp $ */
|
||||
/* $Id: pg_wchar.h,v 1.2 1998/04/27 17:09:12 scrappy Exp $ */
|
||||
|
||||
#ifndef PG_WCHAR_H
|
||||
#define PG_WCHAR_H
|
||||
|
@ -39,6 +39,9 @@ extern int pg_char_and_wchar_strcmp(const char *, const pg_wchar *);
|
|||
extern int pg_wchar_strncmp(const pg_wchar *, const pg_wchar *, size_t);
|
||||
extern int pg_char_and_wchar_strncmp(const char *, const pg_wchar *, size_t);
|
||||
extern size_t pg_wchar_strlen(const pg_wchar *);
|
||||
extern int pg_mblen(const unsigned char *);
|
||||
extern int pg_mbstrlen(const unsigned char *);
|
||||
extern int pg_mbstrlen_with_len(const unsigned char *, int);
|
||||
#endif
|
||||
|
||||
#endif
|
||||
|
|
|
@ -6,7 +6,7 @@
|
|||
*
|
||||
* Copyright (c) 1994, Regents of the University of California
|
||||
*
|
||||
* $Id: builtins.h,v 1.40 1998/04/26 04:09:25 momjian Exp $
|
||||
* $Id: builtins.h,v 1.41 1998/04/27 17:09:28 scrappy Exp $
|
||||
*
|
||||
* NOTES
|
||||
* This should normally only be included by fmgr.h.
|
||||
|
@ -400,6 +400,7 @@ extern bool bpchargt(char *arg1, char *arg2);
|
|||
extern bool bpcharge(char *arg1, char *arg2);
|
||||
extern int32 bpcharcmp(char *arg1, char *arg2);
|
||||
extern int32 bpcharlen(char *arg);
|
||||
extern int32 bpcharoctetlen(char *arg);
|
||||
extern uint32 hashbpchar(struct varlena * key);
|
||||
|
||||
extern char *varcharin(char *s, int dummy, int16 atttypmod);
|
||||
|
@ -412,6 +413,7 @@ extern bool varchargt(char *arg1, char *arg2);
|
|||
extern bool varcharge(char *arg1, char *arg2);
|
||||
extern int32 varcharcmp(char *arg1, char *arg2);
|
||||
extern int32 varcharlen(char *arg);
|
||||
extern int32 varcharoctetlen(char *arg);
|
||||
extern uint32 hashvarchar(struct varlena * key);
|
||||
|
||||
/* varlena.c */
|
||||
|
@ -425,6 +427,7 @@ extern bool text_le(text *arg1, text *arg2);
|
|||
extern bool text_gt(text *arg1, text *arg2);
|
||||
extern bool text_ge(text *arg1, text *arg2);
|
||||
extern int32 textlen(text *arg);
|
||||
extern int32 textoctetlen(text *arg);
|
||||
extern int32 textpos(text *arg1, text *arg2);
|
||||
extern text *text_substr(text *string, int32 m, int32 n);
|
||||
|
||||
|
|
|
@ -53,3 +53,35 @@ QUERY: select * from
|
|||
コンピュータグラフィックス|分B10中 |
|
||||
(2 rows)
|
||||
|
||||
QUERY: select *,character_length(用語) from 計算機用語;
|
||||
用語 |分類コード|備考1aだよ|length
|
||||
--------------------------+----------+----------+------
|
||||
コンピュータディスプレイ |機A01上 | | 12
|
||||
コンピュータグラフィックス|分B10中 | | 13
|
||||
コンピュータプログラマー |人Z01下 | | 12
|
||||
(3 rows)
|
||||
|
||||
QUERY: select *,octet_length(用語) from 計算機用語;
|
||||
用語 |分類コード|備考1aだよ|octet_length
|
||||
--------------------------+----------+----------+------------
|
||||
コンピュータディスプレイ |機A01上 | | 24
|
||||
コンピュータグラフィックス|分B10中 | | 26
|
||||
コンピュータプログラマー |人Z01下 | | 24
|
||||
(3 rows)
|
||||
|
||||
QUERY: select *,position('デ' in 用語) from 計算機用語;
|
||||
用語 |分類コード|備考1aだよ|strpos
|
||||
--------------------------+----------+----------+------
|
||||
コンピュータディスプレイ |機A01上 | | 7
|
||||
コンピュータグラフィックス|分B10中 | | 0
|
||||
コンピュータプログラマー |人Z01下 | | 0
|
||||
(3 rows)
|
||||
|
||||
QUERY: select *,substring(用語 from 10 for 4) from 計算機用語;
|
||||
用語 |分類コード|備考1aだよ|substr
|
||||
--------------------------+----------+----------+--------
|
||||
コンピュータディスプレイ |機A01上 | |プレイ
|
||||
コンピュータグラフィックス|分B10中 | |ィックス
|
||||
コンピュータプログラマー |人Z01下 | |ラマー
|
||||
(3 rows)
|
||||
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
#!/bin/sh
|
||||
# $Header: /cvsroot/pgsql/src/test/regress/Attic/regress.sh,v 1.18 1998/03/15 07:39:04 scrappy Exp $
|
||||
# $Header: /cvsroot/pgsql/src/test/regress/Attic/regress.sh,v 1.19 1998/04/27 17:10:17 scrappy Exp $
|
||||
#
|
||||
if echo '\c' | grep -s c >/dev/null 2>&1
|
||||
then
|
||||
|
@ -43,7 +43,7 @@ fi
|
|||
echo "=============== running regression queries... ================="
|
||||
echo "" > regression.diffs
|
||||
if [ a$MB != a ];then
|
||||
mbtests=`echo $MB|tr A-Z a-z`
|
||||
mbtests=`echo $MB|tr "[A-Z]" "[a-z]"`
|
||||
else
|
||||
mbtests=""
|
||||
fi
|
||||
|
|
|
@ -13,3 +13,7 @@ select * from
|
|||
select * from 計算機用語 where 分類コード like '_Z%';
|
||||
select * from 計算機用語 where 用語 ~ 'コンピュータ[デグ]';
|
||||
select * from 計算機用語 where 用語 ~* 'コンピュータ[デグ]';
|
||||
select *,character_length(用語) from 計算機用語;
|
||||
select *,octet_length(用語) from 計算機用語;
|
||||
select *,position('デ' in 用語) from 計算機用語;
|
||||
select *,substring(用語 from 10 for 4) from 計算機用語;
|
||||
|
|