1 - Purpose of this document
============================

This document describes how to debug parts of the Postfix mail
system, either by making the software log a lot of detail to the
syslog daemon, or by running some daemon processes under control
of an interactive debugger.

2 - Verbose logging for specific SMTP connections
=================================================

In /etc/postfix/main.cf, list the remote site name or address in
the "debug_peer_list" parameter. For example, in order to make the
software log a lot of information to the syslog daemon for connections
from or to the loopback interface:

    debug_peer_list = 127.0.0.1

You can specify one or more hosts, domains, addresses or net/masks.

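As a sketch of how several peers could be listed at once, the shell
function below sets debug_peer_list with "postconf -e" and then
reloads Postfix. The function name enable_peer_debug is hypothetical
and not part of Postfix; the example hosts are placeholders. It
assumes that postconf and postfix are in the command search path and
that it is run with sufficient privileges.

```shell
# Sketch: enable verbose logging for selected SMTP peers.
# enable_peer_debug is a hypothetical helper, not a Postfix command.
# Usage: enable_peer_debug PEER [PEER ...]
enable_peer_debug() {
    if [ "$#" -eq 0 ]; then
        echo "usage: enable_peer_debug peer [peer ...]" >&2
        return 1
    fi
    # Join the arguments with ", " to form the parameter value.
    peers=$1
    shift
    for p in "$@"; do
        peers="$peers, $p"
    done
    # Update main.cf, then tell the running Postfix to re-read it.
    postconf -e "debug_peer_list = $peers" &&
    postfix reload
}

# Example (placeholder peers):
#   enable_peer_debug 127.0.0.1 example.com 192.168.0.0/24
```

The amount of extra detail logged for matching peers is governed by
the separate debug_peer_level parameter.
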
2b - Record the SMTP connection with a sniffer
==============================================

This example uses tcpdump. In order to record a conversation you
need to specify a large enough buffer or else you will miss some
or all of the packet payload.

    tcpdump -w /file/name -s 2000 host hostname and port 25

Run this for a while, stop with Ctrl-C when done. To view the data
use a binary viewer, or use my tcpdumpx utility that is available
from ftp://ftp.porcupine.org/pub/debugging.

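If tcpdumpx is not at hand, tcpdump itself can replay a saved
capture. The two functions below are a minimal sketch; the names
record_smtp and show_smtp are hypothetical wrappers, and the file
and host arguments are whatever you choose. Capturing normally
requires root privileges.

```shell
# Sketch: record and replay an SMTP conversation with tcpdump.
# record_smtp and show_smtp are hypothetical wrapper names.
#   record_smtp FILE HOST  - capture port 25 traffic for HOST into FILE
#   show_smtp FILE         - print the captured packets in hex and ASCII
record_smtp() {
    # -s 2000 is the buffer size used in the example above.
    tcpdump -w "$1" -s 2000 host "$2" and port 25
}

show_smtp() {
    # -r reads a saved capture, -n skips name lookups,
    # -X prints the data of each packet in hex and ASCII.
    tcpdump -r "$1" -n -X
}
```
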
3 - Making Postfix daemon programs more verbose
===============================================

Append one or more -v options to selected daemon definitions in
/etc/postfix/master.cf and type "postfix reload". This will cause
a lot of activity to be logged to the syslog daemon.

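Editing master.cf by hand is the usual way to do this. Purely as an
illustration, the filter below appends an option to one service
definition. The name add_daemon_flag is hypothetical and not part of
Postfix. The sketch assumes that the daemon command name is on the
first line of the service entry, and it does not check whether the
option is already present.

```shell
# Sketch: append an option to one service entry of a master.cf file.
# add_daemon_flag is a hypothetical helper, not a Postfix command.
# Reads master.cf text on stdin and writes the result to stdout.
# Usage: add_daemon_flag SERVICE TYPE OPTION
add_daemon_flag() {
    awk -v svc="$1" -v typ="$2" -v flag="$3" '
        # Match the first line of the entry: service name and type in
        # the first two fields, and not a comment or continuation line.
        $1 == svc && $2 == typ && !/^[ \t#]/ { print $0 " " flag; next }
        # Copy every other line unchanged.
        { print }
    '
}

# Example: write a modified copy, review it, then install it by hand.
#   add_daemon_flag smtp inet -v </etc/postfix/master.cf >/tmp/master.cf.new
```

Writing to a separate file instead of editing in place leaves the
original master.cf untouched until the result has been reviewed.
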
4 - Manually tracing a Postfix daemon process
=============================================

Some systems allow you to inspect a running process with a system
call tracer. For example:

    # trace -p process-id   (SunOS 4)
    # strace -p process-id  (Linux and many others)
    # truss -p process-id   (Solaris, FreeBSD)
    # ktrace -p process-id  (generic 4.4BSD)

Even more informative are traces of system library calls. Examples:

    # ltrace -p process-id  (Linux, also ported to FreeBSD and BSD/OS)
    # sotruss -p process-id (Solaris)

See your system documentation for details.

Tracing a running process can give valuable information about what
a process is attempting to do. This is as much information as you
can get without running an interactive debugger program, as described
in a later section.

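On a Linux system, looking up the process ID and attaching the
tracer can be combined. The function below is a sketch under these
assumptions: pgrep and strace are installed, the daemon runs under
its program name (for example smtpd), and the function is run as
root. The name trace_daemon and the default output directory /tmp
are choices made for this example only.

```shell
# Sketch (Linux): attach strace to the oldest running instance of a daemon.
# trace_daemon is a hypothetical helper name.
# Usage: trace_daemon NAME [OUTPUT-DIRECTORY]
trace_daemon() {
    name=$1
    outdir=${2:-/tmp}
    # pgrep -o selects the oldest matching process,
    # -x requires an exact match on the process name.
    pid=$(pgrep -o -x "$name") || {
        echo "no running $name process found" >&2
        return 1
    }
    # -f follows child processes, -tt adds timestamps,
    # -o writes the trace to a file instead of stderr.
    strace -f -tt -o "$outdir/$name.$pid.strace" -p "$pid"
}

# Example: trace_daemon smtpd /var/tmp    (stop with Ctrl-C)
```
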
5 - Automatically tracing a Postfix daemon process
==================================================

Postfix can attach a call tracer whenever a daemon process starts.
Call tracers come in several kinds.

1) System call tracers such as trace, truss, strace, or ktrace.
   These show the communication between the process and the kernel.

2) Library call tracers such as sotruss and ltrace. These show
   calls of library routines, and give a better idea of what is
   going on within the process.

Append a -D option to the suspect command in /etc/postfix/master.cf,
for example:

    smtp      inet  n       -       n       -       -       smtpd -D

Edit the debugger_command definition in /etc/postfix/main.cf so
that it invokes the call tracer of your choice, for example:

    debugger_command =
         PATH=/bin:/usr/bin:/usr/local/bin;
         (truss -p $process_id 2>&1 | logger -p mail.info) & sleep 5

Type "postfix reload" and watch the logfile.

6 - Running daemon programs under a debugger
============================================

Append a -D option to the suspect command in /etc/postfix/master.cf,
for example:

    smtp      inet  n       -       n       -       -       smtpd -D

Edit the debugger_command definition in /etc/postfix/main.cf so
that it invokes the debugger of your choice.

Two choices are described in detail:

1) If you do not have X Windows installed on the Postfix machine,
   or if you are not familiar with interactive debuggers, then you
   can try to run gdb in non-interactive mode:

   /etc/postfix/main.cf:
   --------------------
       debugger_command =
           PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont;
           echo where) | gdb $daemon_directory/$process_name $process_id 2>&1
           >$config_directory/$process_name.$process_id.log & sleep 5

   Type "postfix reload" to make the configuration changes effective.

   Whenever a suspect daemon process is started, an output file
   is created, named after the daemon and process ID (for example,
   smtpd.12345.log). When the process crashes, a stack trace (with
   output from the "where" command) is written to its logfile.

2) If you have X Windows installed on the Postfix machine, then
   an interactive debugger such as xxgdb can be convenient.

   /etc/postfix/main.cf:
   --------------------
       debugger_command =
           PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin
           xxgdb $daemon_directory/$process_name $process_id & sleep 5

   Be sure that gdb is in the command search path, and export
   XAUTHORITY so that X access control works, for example:

       % setenv XAUTHORITY ~/.Xauthority

   Stop and start the Postfix system. This is necessary so that
   Postfix runs with the proper XAUTHORITY and DISPLAY settings.

Whenever the suspect daemon process is started, a debugger window
pops up and you can watch in detail what happens (when using
xxgdb) or a file is created (if using gdb in non-interactive
mode).

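With gdb in non-interactive mode, a busy system produces one log
file per daemon process, and most of them never record a crash. The
sketch below lists only the files that contain a stack trace. It
assumes that a trace can be recognized by gdb's frame label "#0",
possibly preceded by a "(gdb)" prompt on the same line. The function
name find_crash_logs is hypothetical.

```shell
# Sketch: list gdb log files that contain a stack trace.
# find_crash_logs is a hypothetical helper name.
# Usage: find_crash_logs DIRECTORY
find_crash_logs() {
    dir=$1
    for f in "$dir"/*.log; do
        # Skip the unexpanded pattern when no log file exists.
        [ -f "$f" ] || continue
        # gdb labels the innermost stack frame "#0"; accept it at the
        # start of a line or after whitespace (such as a prompt).
        if grep -Eq '(^|[[:space:]])#0[[:space:]]' "$f"; then
            echo "$f"
        fi
    done
}

# Example: find_crash_logs "$(postconf -h config_directory)"
```
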
7 - Unreasonable behavior
=========================

Sometimes the behavior exhibited by Postfix just does not match the
source code. Why can a program deviate from the instructions given
by its author? There are two possibilities.

1 - The compiler has messed up.

2 - The hardware has messed up.

In both cases, the program being executed is not the program that
was supposed to be executed, so anything can happen.

There is a third possibility:

3 - Bugs in system software (kernel or libraries).

Hardware-related failures happen erratically, and they usually do
not reproduce after power cycling and rebooting the system. There's
little I can do about bad hardware. Be sure to use hardware that
at the very least can detect memory errors. Otherwise, Postfix will
just be a sitting duck waiting to be hit by a bit error. Critical
systems deserve real hardware.

When a compiler messes up, the problem can be reproduced whenever
the resulting program is run. Compiler errors are most likely to
happen in the code optimizer. If a problem is reproducible across
power cycles and system reboots, it can be worthwhile to rebuild
Postfix with optimization disabled, and to see if optimization
makes a difference.

In order to compile Postfix with optimizations turned off:

    % make tidy
    % make makefiles OPT=

This produces a set of Makefiles that do not request compiler
optimization.

Once the Makefiles are set up, build the software:

    % make
    % su
    # make install

Then see if the problem reproduces. If the problem goes away, talk
to your vendor.