postgres/doc/FAQ_AIX
Bruce Momjian aadd8a23ce Update AIX FAQ.
Chris Browne
2006-05-26 19:48:32 +00:00

399 lines
14 KiB
Plaintext

From: Zeugswetter Andreas <ZeugswetterA@spardat.at>
$Date: 2006/05/26 19:48:32 $
On AIX 4.3.2 PostgreSQL compiled with the native IBM compiler xlc
(vac.C 5.0.1) passes all regression tests. Other versions of OS and
compiler should also work. If you don't have a powerpc or use gcc you
might see rounding differences in the geometry regression test.
Use the following configure flags in addition to your own
if you have readline or libz there:
--with-includes=/usr/local/include --with-libraries=/usr/local/lib
There will probably be warnings about 0.0/0.0 division and duplicate
symbols which you can safely ignore.
Compiling PostgreSQL with gcc (2.95.3) on AIX also works.
You need libm.a that is in the fileset bos.adt.libm. (Try the
following command.)
$ lslpp -l bos.adt.libm
---
From: Christopher Browne <cbbrowne@ca.afilias.info>
Date: 2005-07-15
On AIX 5.3, there have been some problems getting PostgreSQL to
compile and run using GCC.
1. You will want to use a version of GCC subsequent to 3.3.2,
particularly if you use a prepackaged version. We had good
success with 4.0.1.
Problems with earlier versions seem to have more to do with the
way IBM packaged GCC than with actual issues with GCC, so that if
you compile GCC yourself, you might well have success with an
earlier version of GCC.
2. AIX 5.3 has a problem where sockadr_storage is not defined to be
large enough. In version 5.3, IBM increased the size of
sockaddr_un, the address structure for UNIX Domain Sockets, but
did not correspondingly increase the size of sockadr_storage.
The result of this is that attempts to use UDS with PostgreSQL
lead to libpq overflowing the data structure. TCP/IP connections
work OK, but not UDS, which prevents the regression tests from
working.
The nonconformance may be readily demonstrated by compiling and
running the following C program which calculates and compares the
sizes of the various structures:
test_size.c
------------
---------- snip here - test_size.c ----------------------------
#include <stdio.h>
#include <sys/un.h>
#include <sys/socket.h>
int main (int argc, char *argv[]) {
struct sockaddr_storage a;
struct sockaddr_un b;
printf("Size of sockadr_storage: %d\n", sizeof(a));
printf ("Size of sockaddr_un:%d\n", sizeof(b));
if (sizeof(a) >= sizeof(b))
printf ("Conformant to RFC 3493\n");
else
printf ("Non-conformant to RFC 3493\n");
}
---------- snip here - test_size.c ----------------------------
The problem was reported to IBM, and is recorded as bug report
PMR29657.
An immediate resolution is to alter _SS_MAXSIZE to = 1025 in
/usr/include/sys/socket.h, which will resolve the immediate problem.
It appears that the "final" resolution will be to alter _SS_MAXSIZE to
1280, making the size nicely align with page boundaries.
IBM will be providing a fix in the next maintenance release (expected
in October 2005) with an updated socket.h.
---
PMR29657 was resolved in APAR IY74147: INCOMPATIBILITY BETWEEN
SOCKADDR_UN AND SOCKADDR_STORAGE STRUCT
APAR information
APAR number IY74147
Reported component name AIX 5.3
Reported component ID 5765G0300
Reported release 530
Status CLOSED PER
PE NoPE
HIPER NoHIPER
Submitted date 2005-07-18
Closed date 2005-07-18
Last modified date 2005-09-06
If you upgrade to maintenance level 5300-03, that will include this
fix. Use the command "oslevel -r" to determine what maintenance level
you are at.
---
From: Christopher Browne <cbbrowne@ca.afilias.info>
Date: 2005-07-15
Some of the AIX tools may be "a little different" from what you may be
accustomed to on other platforms. If you are looking for a version of
ldd, useful for determining what object code depends on what
libraries, the following URLs may help you...
http://www.faqs.org/faqs/aix-faq/part4/section-22.html
http://www.han.de/~jum/aix/ldd.c
---
From: Christopher Browne <cbbrowne@ca.afilias.info>
Date: 2005-11-02
On AIX 5.3 ML3 (e.g. maintenance level 5300-03), there is some problem
with the handling of the pointer to memcpy. It is speculated that
this relates to some linker bug that may have been introduced between
5300-02 and 5300-03, but we have so far been unable to track down the
cause.
At any rate, the following patch, which "unwraps" the function
reference, has been observed to allow PG 8.1 pre-releases to pass
regression tests.
The same behaviour (albeit with varying underlying functions to
"blame") has been observed when compiling with either GCC 4.0 or IBM
XLC.
------------ per Seneca Cunningham -------------------
The following patch works on the AIX 5.3 ML3 box here and didn't cause
any problems with postgres on the x86 desktop. It's just a cleaner
version of what I tried earlier.
*** dynahash.c.orig Tue Nov 1 19:41:42 2005
--- dynahash.c Tue Nov 1 20:30:33 2005
***************
*** 670,676 ****
/* copy key into record */
currBucket->hashvalue = hashvalue;
! hashp->keycopy(ELEMENTKEY(currBucket), keyPtr, keysize);
/* caller is expected to fill the data field on return */
--- 670,687 ----
/* copy key into record */
currBucket->hashvalue = hashvalue;
! if (hashp->keycopy == memcpy)
! {
! memcpy(ELEMENTKEY(currBucket), keyPtr, keysize);
! }
! else if (hashp->keycopy == strncpy)
! {
! strncpy(ELEMENTKEY(currBucket), keyPtr, keysize);
! }
! else
! {
! hashp->keycopy(ELEMENTKEY(currBucket), keyPtr, keysize);
! }
/* caller is expected to fill the data field on return */
------------ per Seneca Cunningham -------------------
---
AIX, readline, and postgres 8.1.x:
----------------------------------
If make check doesn't work on AIX with initdb going into an infinite
loop or failing with child processes terminated with signal 11, the
problem could be the installed copy of readline. Previously a patch to
dynahash.c was suggested to get around this, don't use it, better ways
to get postgres working exist.
See <http://archives.postgresql.org/pgsql-patches/2005-11/msg00139.php>
for details about the problem.
Working around the problem:
---------------------------
Try one of the following:
o Use the new 8.2devel backend Makefile:
After the matter of readline's export list and the problems that were
occurring on AIX because of it being linked to the backend, a filter
to exclude unneeded libraries from being linked against the backend was
added. Get revision 1.112 of src/backend/Makefile from CVS and replace
the copy that came with postgres with it. Build normally.
o Use libedit
There are a few libedit ports available online. Build and install the
desired port. If libreadline.a can be found in /lib, /usr/lib, or in
any location passed to postgres' configure via "--with-libraries=",
readline will be detected and used by postgres. IBM's rpm of readline
creates a symlink to /opt/freeware/lib/libreadline.a in /lib, so merely
excluding /opt/freeware/lib from the passed library path does not stop
readline from being used.
If the linker cannot avoid finding libreadline.a, use revision 1.433
configure.in and 1.19 config/programs.m4 from CVS, change 8.2devel to
the appropriate 8.1.x in configure.in and run autoconf. Add the
configure flag "--with-libedit-preferred".
If the version of libedit used calls its "history.h" something other
than history.h, place a symlink called history.h to it somewhere that
the C preprocessor will check.
o Configure with "--without-readline"
postgres can be configured with the option "--without-readline". When
this is enabled, postgres will not link against libreadline or libedit.
psql will not have history, tab completion, or any of the other niceties
that readline and libedit bring, but external readline wrappers exist
that add that functionality.
o Use readline 5.0
Readline 5.0 does not induce the problems, however it does export
memcpy and strncpy when built using the easy method of "-bexpall". Like
4.3, it is possible to do a build that does not export these symbols,
but it does take considerable manual effort and the creation of export
files.
References
----------
"AIX 5L Porting Guide"
IBM Redbook
http://www.redbooks.ibm.com/redbooks/pdfs/sg246034.pdf
http://www.redbooks.ibm.com/abstracts/sg246034.html?Open
"Developing and Porting C and C++ Applications on AIX"
IBM Redbook
http://www.redbooks.ibm.com/redbooks/pdfs/sg245674.pdf
http://www.redbooks.ibm.com/abstracts/sg245674.html?Open
-----
AIX Memory Management: An Overview
==================================
by Seneca Cunningham...
AIX can be somewhat peculiar with regards to the way it does memory
management. You can have a server with many multiples of gigabytes of
RAM free, but still get out of memory or address space errors when
running applications.
Two examples of AIX-specific memory problems
--------------------------------------------
Both examples were from systems with gigabytes of free RAM.
a) createlang failing with unusual errors
Running as the owner of the postgres install:
-bash-3.00$ createlang plpgsql template1
createlang: language installation failed: ERROR: could not load library
"/opt/dbs/pgsql748/lib/plpgsql.so": A memory address is not in the
address space for the process.
Running as a non-owner in the group posessing the postgres install:
-bash-3.00$ createlang plpgsql template1
createlang: language installation failed: ERROR: could not load library
"/opt/dbs/pgsql748/lib/plpgsql.so": Bad address
b) out of memory errors in the postgres logs
Every memory allocation near or greater than 256MB failing.
The cause of these problems
----------------------------
The overall cause of all these problems is the default bittedness and
memory model used by the postmaster process.
By default, all binaries built on AIX are 32-bit. This does not
depend upon hardware type or kernel in use. These 32-bit processes
are limited to 4GB of memory laid out in 256MB segments using one of a
few models. The default allows for less than 256MB in the heap as it
shares a single segment with the stack.
In the case of example a), above, check your umask and the permissions
of the binaries in your postgres install. The binaries involved in
that example were 32-bit and installed as mode 750 instead of 755.
Due to the permissions being set in this fashion, only the owner or a
member of the possessing group can load the library. Since it isn't
world-readable, the loader places the object into the process' heap
instead of the shared library segments where it would otherwise be
placed.
Solutions and workarounds
-------------------------
In this section, all build flag syntax is presented for gcc.
The "ideal" solution for this is to use a 64-bit build of postgres,
but that's not always practical. Systems with 32-bit processors can
build, but not run, 64-bit binaries.
If a 32-bit binary is desired, set LDR_CNTRL to "MAXDATA=0xn0000000",
where 1 <= n <= 8, before starting the postmaster and try different
values and postgresql.conf settings to find a configuration that works
satisfactorily. This use of LDR_CNTRL tells AIX that you want the
postmaster to have $MAXDATA bytes set aside for the heap, allocated in
256MB segments.
When you find a workable configuration, ldedit can be used to modify
the binaries so that they default to using the desired heap size.
PostgreSQL might also be rebuilt, passing configure
LDFLAGS="-Wl,-bmaxdata:0xn0000000" to achieve the same effect.
For a 64-bit build, set OBJECT_MODE to 64 and pass CC="gcc -maix64"
and LDFLAGS="-Wl,-bbigtoc" to configure. If you omit the export of
OBJECT_MODE, your build may fail with linker errors. When OBJECT_MODE
is set, it tells AIX's build utilities such as ar, as, and ld what
type of objects to default to handling.
Overcommit
----------
By default, overcommit of paging space can happen. While I have not
seen this occur, AIX will kill processes when it runs out of memory
and the overcommit is accessed. The closest to this that I have seen
is fork failing because the system decided that there was not enough
memory for another process. Like many other parts of AIX, the paging
space allocation method and out-of-memory kill is configurable on a
system- or process-wide basis if this becomes a problem.
References and resources
------------------------
"Large Program Support"
AIX Documentation: General Programming Concepts: Writing and Debugging Programs
http://publib.boulder.ibm.com/infocenter/pseries/topic/com.ibm.aix.doc/aixprggd/genprogc/lrg_prg_support.htm
"Program Address Space Overview"
AIX Documentation: General Programming Concepts: Writing and Debugging Programs
http://publib.boulder.ibm.com/infocenter/pseries/topic/com.ibm.aix.doc/aixprggd/genprogc/address_space.htm
"Performance Overview of the Virtual Memory Manager (VMM)"
AIX Documentation: Performance Management Guide
http://publib.boulder.ibm.com/infocenter/pseries/v5r3/topic/com.ibm.aix.doc/aixbman/prftungd/resmgmt2.htm
"Page Space Allocation"
AIX Documentation: Performance Management Guide
http://publib.boulder.ibm.com/infocenter/pseries/v5r3/topic/com.ibm.aix.doc/aixbman/prftungd/memperf7.htm
"Paging-space thresholds tuning"
AIX Documentation: Performance Management Guide
http://publib.boulder.ibm.com/infocenter/pseries/v5r3/topic/com.ibm.aix.doc/aixbman/prftungd/memperf6.htm
"Developing and Porting C and C++ Applications on AIX"
IBM Redbook
http://www.redbooks.ibm.com/redbooks/pdfs/sg245674.pdf
http://www.redbooks.ibm.com/abstracts/sg245674.html?Open
Statistics Collector Fun on AIX
--------------------------------
When implementing PostgreSQL version 8.1 on AIX 5.3, we periodically
ran into problems where the statistics collector would "mysteriously"
not come up successfully.
This appears to be the result of unexpected behaviour in the IPv6
implementation. It looks like PostgreSQL and IPv6 do not play very
well together at this time on AIX.
Any of the following actions "fix" the problem.
1. Delete the localhost ipv6 address
(as root)
# ifconfig lo0 inet6 ::1/0 delete
2. Remove IPv6 from net services. The file /etc/netsvc.conf, on AIX,
is roughly equivalent to /etc/nsswitch.conf on Solaris/Linux.
The default, on AIX, is thus:
hosts=local,bind
Replace this with:
hosts=local4,bind4
to deactivate searching for IPv6 addresses.