242 lines
8.4 KiB
Groff
242 lines
8.4 KiB
Groff
.\" $NetBSD: crash.8,v 1.5 1997/10/19 13:04:31 mrg Exp $
|
|
.\"
|
|
.\" Copyright (c) 1980, 1991, 1993
|
|
.\" The Regents of the University of California. All rights reserved.
|
|
.\"
|
|
.\" Redistribution and use in source and binary forms, with or without
|
|
.\" modification, are permitted provided that the following conditions
|
|
.\" are met:
|
|
.\" 1. Redistributions of source code must retain the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer.
|
|
.\" 2. Redistributions in binary form must reproduce the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer in the
|
|
.\" documentation and/or other materials provided with the distribution.
|
|
.\" 3. All advertising materials mentioning features or use of this software
|
|
.\" must display the following acknowledgement:
|
|
.\" This product includes software developed by the University of
|
|
.\" California, Berkeley and its contributors.
|
|
.\" 4. Neither the name of the University nor the names of its contributors
|
|
.\" may be used to endorse or promote products derived from this software
|
|
.\" without specific prior written permission.
|
|
.\"
|
|
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
|
|
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
|
|
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
.\" SUCH DAMAGE.
|
|
.\"
|
|
.\" from: @(#)crash.8 8.1 (Berkeley) 6/5/93
|
|
.\"
|
|
.TH CRASH 8 "June 5, 1993"
|
|
.UC 4
|
|
.SH NAME
|
|
crash \- UNIX system failures
|
|
.SH DESCRIPTION
|
|
This section explains what happens when the system crashes
|
|
and (very briefly) how to analyze crash dumps.
|
|
.PP
|
|
When the system crashes voluntarily it prints a message of the form
|
|
.IP
|
|
panic: why i gave up the ghost
|
|
.LP
|
|
on the console, takes a dump on a mass storage peripheral,
|
|
and then invokes an automatic reboot procedure as
|
|
described in
|
|
.IR reboot (8).
|
|
(If auto-reboot is disabled on the front panel of the machine the system
|
|
will simply halt at this point.)
|
|
Unless some unexpected inconsistency is encountered in the state
|
|
of the file systems due to hardware or software failure, the system
|
|
will then resume multi-user operations.
|
|
.PP
|
|
The system has a large number of internal consistency checks; if one
|
|
of these fails, then it will panic with a very short message indicating
|
|
which one failed.
|
|
In many instances, this will be the name of the routine which detected
|
|
the error, or a two-word description of the inconsistency.
|
|
A full understanding of most panic messages requires perusal of the
|
|
source code for the system.
|
|
.PP
|
|
The most common cause of system failures is hardware failure, which
|
|
can reflect itself in different ways. Here are the messages which
|
|
are most likely, with some hints as to causes.
|
|
Left unstated in all cases is the possibility that hardware or software
|
|
error produced the message in some unexpected way.
|
|
.TP
|
|
.B iinit
|
|
This cryptic panic message results from a failure to mount the root filesystem
|
|
during the bootstrap process.
|
|
Either the root filesystem has been corrupted,
|
|
or the system is attempting to use the wrong device as root filesystem.
|
|
Usually, an alternative copy of the system binary or an alternative root
|
|
filesystem can be used to bring up the system to investigate.
|
|
.TP
|
|
.B Can't exec /sbin/init
|
|
This is not a panic message, as reboots are likely to be futile.
|
|
Late in the bootstrap procedure, the system was unable to locate
|
|
and execute the initialization process,
|
|
.IR init (8).
|
|
The root filesystem is incorrect or has been corrupted, or the mode
|
|
or type of /sbin/init forbids execution.
|
|
.TP
|
|
.B IO err in push
|
|
.ns
|
|
.TP
|
|
.B hard IO err in swap
|
|
The system encountered an error trying to write to the paging device
|
|
or an error in reading critical information from a disk drive.
|
|
The offending disk should be fixed if it is broken or unreliable.
|
|
.TP
|
|
.B realloccg: bad optim
|
|
.ns
|
|
.TP
|
|
.B ialloc: dup alloc
|
|
.ns
|
|
.TP
|
|
.B alloccgblk: cyl groups corrupted
|
|
.ns
|
|
.TP
|
|
.B ialloccg: map corrupted
|
|
.ns
|
|
.TP
|
|
.B free: freeing free block
|
|
.ns
|
|
.TP
|
|
.B free: freeing free frag
|
|
.ns
|
|
.TP
|
|
.B ifree: freeing free inode
|
|
.ns
|
|
.TP
|
|
.B alloccg: map corrupted
|
|
These panic messages are among those that may be produced
|
|
when filesystem inconsistencies are detected.
|
|
The problem generally results from a failure to repair damaged filesystems
|
|
after a crash, hardware failures, or other condition that should not
|
|
normally occur.
|
|
A filesystem check will normally correct the problem.
|
|
.TP
|
|
.B timeout table overflow
|
|
.ns
|
|
This really shouldn't be a panic, but until the data structure
|
|
involved is made to be extensible, running out of entries causes a crash.
|
|
If this happens, make the timeout table bigger.
|
|
.TP
|
|
.B KSP not valid
|
|
.ns
|
|
.TP
|
|
.B SBI fault
|
|
.ns
|
|
.TP
|
|
.B CHM? in kernel
|
|
These indicate either a serious bug in the system or, more often,
|
|
a glitch or failing hardware.
|
|
If SBI faults recur, check out the hardware or call
|
|
field service. If the other faults recur, there is likely a bug somewhere
|
|
in the system, although these can be caused by a flakey processor.
|
|
Run processor microdiagnostics.
|
|
.TP
|
|
.B machine check %x:
|
|
.I description
|
|
.ns
|
|
.TP
|
|
.I \0\0\0machine dependent machine-check information
|
|
.ns
|
|
Machine checks are different on each type of CPU.
|
|
Most of the internal processor registers are saved at the time of the fault
|
|
and are printed on the console.
|
|
For most processors, there is one line that summarizes the type of machine
|
|
check.
|
|
Often, the nature of the problem is apparent from this messaage
|
|
and/or the contents of key registers.
|
|
The VAX Hardware Handbook should be consulted,
|
|
and, if necessary, your friendly field service people should be informed
|
|
of the problem.
|
|
.TP
|
|
.B trap type %d, code=%x, pc=%x
|
|
A unexpected trap has occurred within the system; the trap types are:
|
|
.sp
|
|
.nf
|
|
0 reserved addressing fault
|
|
1 privileged instruction fault
|
|
2 reserved operand fault
|
|
3 bpt instruction fault
|
|
4 xfc instruction fault
|
|
5 system call trap
|
|
6 arithmetic trap
|
|
7 ast delivery trap
|
|
8 segmentation fault
|
|
9 protection fault
|
|
10 trace trap
|
|
11 compatibility mode fault
|
|
12 page fault
|
|
13 page table fault
|
|
.fi
|
|
.sp
|
|
The favorite trap types in system crashes are trap types 8 and 9,
|
|
indicating
|
|
a wild reference. The code is the referenced address, and the pc at the
|
|
time of the fault is printed. These problems tend to be easy to track
|
|
down if they are kernel bugs since the processor stops cold, but random
|
|
flakiness seems to cause this sometimes.
|
|
The debugger can be used to locate the instruction and subroutine
|
|
corresponding to the PC value.
|
|
If that is insufficient to suggest the nature of the problem,
|
|
more detailed examination of the system status at the time of the trap
|
|
usually can produce an explanation.
|
|
.TP
|
|
.B init died
|
|
The system initialization process has exited. This is bad news, as no new
|
|
users will then be able to log in. Rebooting is the only fix, so the
|
|
system just does it right away.
|
|
.TP
|
|
.B out of mbufs: map full
|
|
The network has exhausted its private page map for network buffers.
|
|
This usually indicates that buffers are being lost, and rather than
|
|
allow the system to slowly degrade, it reboots immediately.
|
|
The map may be made larger if necessary.
|
|
.PP
|
|
That completes the list of panic types you are likely to see.
|
|
.PP
|
|
When the system crashes it writes (or at least attempts to write)
|
|
an image of memory into the back end of the dump device,
|
|
usually the same as the primary swap
|
|
area. After the system is rebooted, the program
|
|
.IR savecore (8)
|
|
runs and preserves a copy of this core image and the current
|
|
system in a specified directory for later perusal. See
|
|
.IR savecore (8)
|
|
for details.
|
|
.PP
|
|
To analyze a dump you should begin by running
|
|
.IR adb (1)
|
|
with the
|
|
.B \-k
|
|
flag on the system load image and core dump.
|
|
If the core image is the result of a panic,
|
|
the panic message is printed.
|
|
Normally the command
|
|
``$c''
|
|
will provide a stack trace from the point of
|
|
the crash and this will provide a clue as to
|
|
what went wrong.
|
|
For more detail
|
|
see
|
|
``Using ADB to Debug the UNIX Kernel''.
|
|
.SH "SEE ALSO"
|
|
adb(1),
|
|
reboot(8)
|
|
.br
|
|
.I "VAX 11/780 System Maintenance Guide"
|
|
and
|
|
.I "VAX Hardware Handbook"
|
|
for more information about machine checks.
|
|
.br
|
|
.I "Using ADB to Debug the UNIX Kernel"
|