1094 lines
42 KiB
Plaintext
1094 lines
42 KiB
Plaintext
This is gdbint.info, produced by Makeinfo version 3.12f from
|
||
./gdbint.texinfo.
|
||
|
||
START-INFO-DIR-ENTRY
|
||
* Gdb-Internals: (gdbint). The GNU debugger's internals.
|
||
END-INFO-DIR-ENTRY
|
||
|
||
This file documents the internals of the GNU debugger GDB.
|
||
|
||
Copyright 1990-1999 Free Software Foundation, Inc. Contributed by
|
||
Cygnus Solutions. Written by John Gilmore. Second Edition by Stan
|
||
Shebs.
|
||
|
||
Permission is granted to make and distribute verbatim copies of this
|
||
manual provided the copyright notice and this permission notice are
|
||
preserved on all copies.
|
||
|
||
Permission is granted to copy or distribute modified versions of this
|
||
manual under the terms of the GPL (for which purpose this text may be
|
||
regarded as a program in the language TeX).
|
||
|
||
|
||
File: gdbint.info, Node: Top, Next: Requirements, Up: (dir)
|
||
|
||
Scope of this Document
|
||
**********************
|
||
|
||
This document documents the internals of the GNU debugger, GDB. It
|
||
includes description of GDB's key algorithms and operations, as well as
|
||
the mechanisms that adapt GDB to specific hosts and targets.
|
||
|
||
* Menu:
|
||
|
||
* Requirements::
|
||
* Overall Structure::
|
||
* Algorithms::
|
||
* User Interface::
|
||
* Symbol Handling::
|
||
* Language Support::
|
||
* Host Definition::
|
||
* Target Architecture Definition::
|
||
* Target Vector Definition::
|
||
* Native Debugging::
|
||
* Support Libraries::
|
||
* Coding::
|
||
* Porting GDB::
|
||
* Testsuite::
|
||
* Hints::
|
||
|
||
|
||
File: gdbint.info, Node: Requirements, Next: Overall Structure, Prev: Top, Up: Top
|
||
|
||
Requirements
|
||
************
|
||
|
||
Before diving into the internals, you should understand the formal
|
||
requirements and other expectations for GDB. Although some of these may
|
||
seem obvious, there have been proposals for GDB that have run counter to
|
||
these requirements.
|
||
|
||
First of all, GDB is a debugger. It's not designed to be a front
|
||
panel for embedded systems. It's not a text editor. It's not a shell.
|
||
It's not a programming environment.
|
||
|
||
GDB is an interactive tool. Although a batch mode is available,
|
||
GDB's primary role is to interact with a human programmer.
|
||
|
||
GDB should be responsive to the user. A programmer hot on the trail
|
||
of a nasty bug, and operating under a looming deadline, is going to be
|
||
very impatient of everything, including the response time to debugger
|
||
commands.
|
||
|
||
GDB should be relatively permissive, such as for expressions. While
|
||
the compiler should be picky (or have the option to be made picky),
|
||
since source code lives for a long time usually, the programmer doing
|
||
debugging shouldn't be spending time figuring out to mollify the
|
||
debugger.
|
||
|
||
GDB will be called upon to deal with really large programs.
|
||
Executable sizes of 50 to 100 megabytes occur regularly, and we've
|
||
heard reports of programs approaching 1 gigabyte in size.
|
||
|
||
GDB should be able to run everywhere. No other debugger is available
|
||
for even half as many configurations as GDB supports.
|
||
|
||
|
||
File: gdbint.info, Node: Overall Structure, Next: Algorithms, Prev: Requirements, Up: Top
|
||
|
||
Overall Structure
|
||
*****************
|
||
|
||
GDB consists of three major subsystems: user interface, symbol
|
||
handling (the "symbol side"), and target system handling (the "target
|
||
side").
|
||
|
||
Ther user interface consists of several actual interfaces, plus
|
||
supporting code.
|
||
|
||
The symbol side consists of object file readers, debugging info
|
||
interpreters, symbol table management, source language expression
|
||
parsing, type and value printing.
|
||
|
||
The target side consists of execution control, stack frame analysis,
|
||
and physical target manipulation.
|
||
|
||
The target side/symbol side division is not formal, and there are a
|
||
number of exceptions. For instance, core file support involves symbolic
|
||
elements (the basic core file reader is in BFD) and target elements (it
|
||
supplies the contents of memory and the values of registers). Instead,
|
||
this division is useful for understanding how the minor subsystems
|
||
should fit together.
|
||
|
||
The Symbol Side
|
||
===============
|
||
|
||
The symbolic side of GDB can be thought of as "everything you can do
|
||
in GDB without having a live program running". For instance, you can
|
||
look at the types of variables, and evaluate many kinds of expressions.
|
||
|
||
The Target Side
|
||
===============
|
||
|
||
The target side of GDB is the "bits and bytes manipulator". Although
|
||
it may make reference to symbolic info here and there, most of the
|
||
target side will run with only a stripped executable available - or
|
||
even no executable at all, in remote debugging cases.
|
||
|
||
Operations such as disassembly, stack frame crawls, and register
|
||
display, are able to work with no symbolic info at all. In some cases,
|
||
such as disassembly, GDB will use symbolic info to present addresses
|
||
relative to symbols rather than as raw numbers, but it will work either
|
||
way.
|
||
|
||
Configurations
|
||
==============
|
||
|
||
"Host" refers to attributes of the system where GDB runs. "Target"
|
||
refers to the system where the program being debugged executes. In
|
||
most cases they are the same machine, in which case a third type of
|
||
"Native" attributes come into play.
|
||
|
||
Defines and include files needed to build on the host are host
|
||
support. Examples are tty support, system defined types, host byte
|
||
order, host float format.
|
||
|
||
Defines and information needed to handle the target format are target
|
||
dependent. Examples are the stack frame format, instruction set,
|
||
breakpoint instruction, registers, and how to set up and tear down the
|
||
stack to call a function.
|
||
|
||
Information that is only needed when the host and target are the
|
||
same, is native dependent. One example is Unix child process support;
|
||
if the host and target are not the same, doing a fork to start the
|
||
target process is a bad idea. The various macros needed for finding the
|
||
registers in the `upage', running `ptrace', and such are all in the
|
||
native-dependent files.
|
||
|
||
Another example of native-dependent code is support for features that
|
||
are really part of the target environment, but which require `#include'
|
||
files that are only available on the host system. Core file handling
|
||
and `setjmp' handling are two common cases.
|
||
|
||
When you want to make GDB work "native" on a particular machine, you
|
||
have to include all three kinds of information.
|
||
|
||
|
||
File: gdbint.info, Node: Algorithms, Next: User Interface, Prev: Overall Structure, Up: Top
|
||
|
||
Algorithms
|
||
**********
|
||
|
||
GDB uses a number of debugging-specific algorithms. They are often
|
||
not very complicated, but get lost in the thicket of special cases and
|
||
real-world issues. This chapter describes the basic algorithms and
|
||
mentions some of the specific target definitions that they use.
|
||
|
||
Frames
|
||
======
|
||
|
||
A frame is a construct that GDB uses to keep track of calling and
|
||
called functions.
|
||
|
||
`FRAME_FP' in the machine description has no meaning to the
|
||
machine-independent part of GDB, except that it is used when setting up
|
||
a new frame from scratch, as follows:
|
||
|
||
create_new_frame (read_register (FP_REGNUM), read_pc ()));
|
||
|
||
Other than that, all the meaning imparted to `FP_REGNUM' is imparted
|
||
by the machine-dependent code. So, `FP_REGNUM' can have any value that
|
||
is convenient for the code that creates new frames.
|
||
(`create_new_frame' calls `INIT_EXTRA_FRAME_INFO' if it is defined;
|
||
that is where you should use the `FP_REGNUM' value, if your frames are
|
||
nonstandard.)
|
||
|
||
Given a GDB frame, define `FRAME_CHAIN' to determine the address of
|
||
the calling function's frame. This will be used to create a new GDB
|
||
frame struct, and then `INIT_EXTRA_FRAME_INFO' and `INIT_FRAME_PC' will
|
||
be called for the new frame.
|
||
|
||
Breakpoint Handling
|
||
===================
|
||
|
||
In general, a breakpoint is a user-designated location in the program
|
||
where the user wants to regain control if program execution ever reaches
|
||
that location.
|
||
|
||
There are two main ways to implement breakpoints; either as
|
||
"hardware" breakpoints or as "software" breakpoints.
|
||
|
||
Hardware breakpoints are sometimes available as a builtin debugging
|
||
features with some chips. Typically these work by having dedicated
|
||
register into which the breakpoint address may be stored. If the PC
|
||
ever matches a value in a breakpoint registers, the CPU raises an
|
||
exception and reports it to GDB. Another possibility is when an
|
||
emulator is in use; many emulators include circuitry that watches the
|
||
address lines coming out from the processor, and force it to stop if the
|
||
address matches a breakpoint's address. A third possibility is that the
|
||
target already has the ability to do breakpoints somehow; for instance,
|
||
a ROM monitor may do its own software breakpoints. So although these
|
||
are not literally "hardware breakpoints", from GDB's point of view they
|
||
work the same; GDB need not do nothing more than set the breakpoint and
|
||
wait for something to happen.
|
||
|
||
Since they depend on hardware resources, hardware breakpoints may be
|
||
limited in number; when the user asks for more, GDB will start trying to
|
||
set software breakpoints.
|
||
|
||
Software breakpoints require GDB to do somewhat more work. The basic
|
||
theory is that GDB will replace a program instruction with a trap,
|
||
illegal divide, or some other instruction that will cause an exception,
|
||
and then when it's encountered, GDB will take the exception and stop the
|
||
program. When the user says to continue, GDB will restore the original
|
||
instruction, single-step, re-insert the trap, and continue on.
|
||
|
||
Since it literally overwrites the program being tested, the program
|
||
area must be writeable, so this technique won't work on programs in
|
||
ROM. It can also distort the behavior of programs that examine
|
||
themselves, although the situation would be highly unusual.
|
||
|
||
Also, the software breakpoint instruction should be the smallest
|
||
size of instruction, so it doesn't overwrite an instruction that might
|
||
be a jump target, and cause disaster when the program jumps into the
|
||
middle of the breakpoint instruction. (Strictly speaking, the
|
||
breakpoint must be no larger than the smallest interval between
|
||
instructions that may be jump targets; perhaps there is an architecture
|
||
where only even-numbered instructions may jumped to.) Note that it's
|
||
possible for an instruction set not to have any instructions usable for
|
||
a software breakpoint, although in practice only the ARC has failed to
|
||
define such an instruction.
|
||
|
||
The basic definition of the software breakpoint is the macro
|
||
`BREAKPOINT'.
|
||
|
||
Basic breakpoint object handling is in `breakpoint.c'. However,
|
||
much of the interesting breakpoint action is in `infrun.c'.
|
||
|
||
Single Stepping
|
||
===============
|
||
|
||
Signal Handling
|
||
===============
|
||
|
||
Thread Handling
|
||
===============
|
||
|
||
Inferior Function Calls
|
||
=======================
|
||
|
||
Longjmp Support
|
||
===============
|
||
|
||
GDB has support for figuring out that the target is doing a
|
||
`longjmp' and for stopping at the target of the jump, if we are
|
||
stepping. This is done with a few specialized internal breakpoints,
|
||
which are visible in the `maint info breakpoint' command.
|
||
|
||
To make this work, you need to define a macro called
|
||
`GET_LONGJMP_TARGET', which will examine the `jmp_buf' structure and
|
||
extract the longjmp target address. Since `jmp_buf' is target
|
||
specific, you will need to define it in the appropriate `tm-XYZ.h'
|
||
file. Look in `tm-sun4os4.h' and `sparc-tdep.c' for examples of how to
|
||
do this.
|
||
|
||
|
||
File: gdbint.info, Node: User Interface, Next: Symbol Handling, Prev: Algorithms, Up: Top
|
||
|
||
User Interface
|
||
**************
|
||
|
||
GDB has several user interfaces. Although the command-line interface
|
||
is the most common and most familiar, there are others.
|
||
|
||
Command Interpreter
|
||
===================
|
||
|
||
The command interpreter in GDB is fairly simple. It is designed to
|
||
allow for the set of commands to be augmented dynamically, and also has
|
||
a recursive subcommand capability, where the first argument to a
|
||
command may itself direct a lookup on a different command list.
|
||
|
||
For instance, the `set' command just starts a lookup on the
|
||
`setlist' command list, while `set thread' recurses to the
|
||
`set_thread_cmd_list'.
|
||
|
||
To add commands in general, use `add_cmd'. `add_com' adds to the
|
||
main command list, and should be used for those commands. The usual
|
||
place to add commands is in the `_initialize_XYZ' routines at the ends
|
||
of most source files.
|
||
|
||
Before removing commands from the command set it is a good idea to
|
||
deprecate them for some time. Use `deprecate_cmd' on commands or
|
||
aliases to set the deprecated flag. `deprecate_cmd' takes a `struct
|
||
cmd_list_element' as it's first argument. You can use the return value
|
||
from `add_com' or `add_cmd' to deprecate the command immediately after
|
||
it is created.
|
||
|
||
The first time a comamnd is used the user will be warned and offered
|
||
a replacement (if one exists). Note that the replacement string passed
|
||
to `deprecate_cmd' should be the full name of the command, i.e. the
|
||
entire string the user should type at the command line.
|
||
|
||
Console Printing
|
||
================
|
||
|
||
TUI
|
||
===
|
||
|
||
libgdb
|
||
======
|
||
|
||
`libgdb' was an abortive project of years ago. The theory was to
|
||
provide an API to GDB's functionality.
|
||
|
||
|
||
File: gdbint.info, Node: Symbol Handling, Next: Language Support, Prev: User Interface, Up: Top
|
||
|
||
Symbol Handling
|
||
***************
|
||
|
||
Symbols are a key part of GDB's operation. Symbols include
|
||
variables, functions, and types.
|
||
|
||
Symbol Reading
|
||
==============
|
||
|
||
GDB reads symbols from "symbol files". The usual symbol file is the
|
||
file containing the program which GDB is debugging. GDB can be directed
|
||
to use a different file for symbols (with the `symbol-file' command),
|
||
and it can also read more symbols via the "add-file" and "load"
|
||
commands, or while reading symbols from shared libraries.
|
||
|
||
Symbol files are initially opened by code in `symfile.c' using the
|
||
BFD library. BFD identifies the type of the file by examining its
|
||
header. `find_sym_fns' then uses this identification to locate a set
|
||
of symbol-reading functions.
|
||
|
||
Symbol reading modules identify themselves to GDB by calling
|
||
`add_symtab_fns' during their module initialization. The argument to
|
||
`add_symtab_fns' is a `struct sym_fns' which contains the name (or name
|
||
prefix) of the symbol format, the length of the prefix, and pointers to
|
||
four functions. These functions are called at various times to process
|
||
symbol-files whose identification matches the specified prefix.
|
||
|
||
The functions supplied by each module are:
|
||
|
||
`XYZ_symfile_init(struct sym_fns *sf)'
|
||
Called from `symbol_file_add' when we are about to read a new
|
||
symbol file. This function should clean up any internal state
|
||
(possibly resulting from half-read previous files, for example)
|
||
and prepare to read a new symbol file. Note that the symbol file
|
||
which we are reading might be a new "main" symbol file, or might
|
||
be a secondary symbol file whose symbols are being added to the
|
||
existing symbol table.
|
||
|
||
The argument to `XYZ_symfile_init' is a newly allocated `struct
|
||
sym_fns' whose `bfd' field contains the BFD for the new symbol
|
||
file being read. Its `private' field has been zeroed, and can be
|
||
modified as desired. Typically, a struct of private information
|
||
will be `malloc''d, and a pointer to it will be placed in the
|
||
`private' field.
|
||
|
||
There is no result from `XYZ_symfile_init', but it can call
|
||
`error' if it detects an unavoidable problem.
|
||
|
||
`XYZ_new_init()'
|
||
Called from `symbol_file_add' when discarding existing symbols.
|
||
This function need only handle the symbol-reading module's internal
|
||
state; the symbol table data structures visible to the rest of GDB
|
||
will be discarded by `symbol_file_add'. It has no arguments and no
|
||
result. It may be called after `XYZ_symfile_init', if a new
|
||
symbol table is being read, or may be called alone if all symbols
|
||
are simply being discarded.
|
||
|
||
`XYZ_symfile_read(struct sym_fns *sf, CORE_ADDR addr, int mainline)'
|
||
Called from `symbol_file_add' to actually read the symbols from a
|
||
symbol-file into a set of psymtabs or symtabs.
|
||
|
||
`sf' points to the struct sym_fns originally passed to
|
||
`XYZ_sym_init' for possible initialization. `addr' is the offset
|
||
between the file's specified start address and its true address in
|
||
memory. `mainline' is 1 if this is the main symbol table being
|
||
read, and 0 if a secondary symbol file (e.g. shared library or
|
||
dynamically loaded file) is being read.
|
||
|
||
In addition, if a symbol-reading module creates psymtabs when
|
||
XYZ_symfile_read is called, these psymtabs will contain a pointer to a
|
||
function `XYZ_psymtab_to_symtab', which can be called from any point in
|
||
the GDB symbol-handling code.
|
||
|
||
`XYZ_psymtab_to_symtab (struct partial_symtab *pst)'
|
||
Called from `psymtab_to_symtab' (or the PSYMTAB_TO_SYMTAB macro) if
|
||
the psymtab has not already been read in and had its `pst->symtab'
|
||
pointer set. The argument is the psymtab to be fleshed-out into a
|
||
symtab. Upon return, pst->readin should have been set to 1, and
|
||
pst->symtab should contain a pointer to the new corresponding
|
||
symtab, or zero if there were no symbols in that part of the
|
||
symbol file.
|
||
|
||
Partial Symbol Tables
|
||
=====================
|
||
|
||
GDB has three types of symbol tables.
|
||
|
||
* full symbol tables (symtabs). These contain the main information
|
||
about symbols and addresses.
|
||
|
||
* partial symbol tables (psymtabs). These contain enough
|
||
information to know when to read the corresponding part of the full
|
||
symbol table.
|
||
|
||
* minimal symbol tables (msymtabs). These contain information
|
||
gleaned from non-debugging symbols.
|
||
|
||
|
||
This section describes partial symbol tables.
|
||
|
||
A psymtab is constructed by doing a very quick pass over an
|
||
executable file's debugging information. Small amounts of information
|
||
are extracted - enough to identify which parts of the symbol table will
|
||
need to be re-read and fully digested later, when the user needs the
|
||
information. The speed of this pass causes GDB to start up very
|
||
quickly. Later, as the detailed rereading occurs, it occurs in small
|
||
pieces, at various times, and the delay therefrom is mostly invisible to
|
||
the user.
|
||
|
||
The symbols that show up in a file's psymtab should be, roughly,
|
||
those visible to the debugger's user when the program is not running
|
||
code from that file. These include external symbols and types, static
|
||
symbols and types, and enum values declared at file scope.
|
||
|
||
The psymtab also contains the range of instruction addresses that the
|
||
full symbol table would represent.
|
||
|
||
The idea is that there are only two ways for the user (or much of the
|
||
code in the debugger) to reference a symbol:
|
||
|
||
* by its address (e.g. execution stops at some address which is
|
||
inside a function in this file). The address will be noticed to
|
||
be in the range of this psymtab, and the full symtab will be read
|
||
in. `find_pc_function', `find_pc_line', and other `find_pc_...'
|
||
functions handle this.
|
||
|
||
* by its name (e.g. the user asks to print a variable, or set a
|
||
breakpoint on a function). Global names and file-scope names will
|
||
be found in the psymtab, which will cause the symtab to be pulled
|
||
in. Local names will have to be qualified by a global name, or a
|
||
file-scope name, in which case we will have already read in the
|
||
symtab as we evaluated the qualifier. Or, a local symbol can be
|
||
referenced when we are "in" a local scope, in which case the first
|
||
case applies. `lookup_symbol' does most of the work here.
|
||
|
||
|
||
The only reason that psymtabs exist is to cause a symtab to be read
|
||
in at the right moment. Any symbol that can be elided from a psymtab,
|
||
while still causing that to happen, should not appear in it. Since
|
||
psymtabs don't have the idea of scope, you can't put local symbols in
|
||
them anyway. Psymtabs don't have the idea of the type of a symbol,
|
||
either, so types need not appear, unless they will be referenced by
|
||
name.
|
||
|
||
It is a bug for GDB to behave one way when only a psymtab has been
|
||
read, and another way if the corresponding symtab has been read in.
|
||
Such bugs are typically caused by a psymtab that does not contain all
|
||
the visible symbols, or which has the wrong instruction address ranges.
|
||
|
||
The psymtab for a particular section of a symbol-file (objfile)
|
||
could be thrown away after the symtab has been read in. The symtab
|
||
should always be searched before the psymtab, so the psymtab will never
|
||
be used (in a bug-free environment). Currently, psymtabs are allocated
|
||
on an obstack, and all the psymbols themselves are allocated in a pair
|
||
of large arrays on an obstack, so there is little to be gained by
|
||
trying to free them unless you want to do a lot more work.
|
||
|
||
Types
|
||
=====
|
||
|
||
Fundamental Types (e.g., FT_VOID, FT_BOOLEAN).
|
||
|
||
These are the fundamental types that GDB uses internally.
|
||
Fundamental types from the various debugging formats (stabs, ELF, etc)
|
||
are mapped into one of these. They are basically a union of all
|
||
fundamental types that gdb knows about for all the languages that GDB
|
||
knows about.
|
||
|
||
Type Codes (e.g., TYPE_CODE_PTR, TYPE_CODE_ARRAY).
|
||
|
||
Each time GDB builds an internal type, it marks it with one of these
|
||
types. The type may be a fundamental type, such as TYPE_CODE_INT, or a
|
||
derived type, such as TYPE_CODE_PTR which is a pointer to another type.
|
||
Typically, several FT_* types map to one TYPE_CODE_* type, and are
|
||
distinguished by other members of the type struct, such as whether the
|
||
type is signed or unsigned, and how many bits it uses.
|
||
|
||
Builtin Types (e.g., builtin_type_void, builtin_type_char).
|
||
|
||
These are instances of type structs that roughly correspond to
|
||
fundamental types and are created as global types for GDB to use for
|
||
various ugly historical reasons. We eventually want to eliminate these.
|
||
Note for example that builtin_type_int initialized in gdbtypes.c is
|
||
basically the same as a TYPE_CODE_INT type that is initialized in
|
||
c-lang.c for an FT_INTEGER fundamental type. The difference is that the
|
||
builtin_type is not associated with any particular objfile, and only one
|
||
instance exists, while c-lang.c builds as many TYPE_CODE_INT types as
|
||
needed, with each one associated with some particular objfile.
|
||
|
||
Object File Formats
|
||
===================
|
||
|
||
a.out
|
||
-----
|
||
|
||
The `a.out' format is the original file format for Unix. It
|
||
consists of three sections: text, data, and bss, which are for program
|
||
code, initialized data, and uninitialized data, respectively.
|
||
|
||
The `a.out' format is so simple that it doesn't have any reserved
|
||
place for debugging information. (Hey, the original Unix hackers used
|
||
`adb', which is a machine-language debugger.) The only debugging
|
||
format for `a.out' is stabs, which is encoded as a set of normal
|
||
symbols with distinctive attributes.
|
||
|
||
The basic `a.out' reader is in `dbxread.c'.
|
||
|
||
COFF
|
||
----
|
||
|
||
The COFF format was introduced with System V Release 3 (SVR3) Unix.
|
||
COFF files may have multiple sections, each prefixed by a header. The
|
||
number of sections is limited.
|
||
|
||
The COFF specification includes support for debugging. Although this
|
||
was a step forward, the debugging information was woefully limited. For
|
||
instance, it was not possible to represent code that came from an
|
||
included file.
|
||
|
||
The COFF reader is in `coffread.c'.
|
||
|
||
ECOFF
|
||
-----
|
||
|
||
ECOFF is an extended COFF originally introduced for Mips and Alpha
|
||
workstations.
|
||
|
||
The basic ECOFF reader is in `mipsread.c'.
|
||
|
||
XCOFF
|
||
-----
|
||
|
||
The IBM RS/6000 running AIX uses an object file format called XCOFF.
|
||
The COFF sections, symbols, and line numbers are used, but debugging
|
||
symbols are dbx-style stabs whose strings are located in the `.debug'
|
||
section (rather than the string table). For more information, see
|
||
*Note Top: (stabs)Top.
|
||
|
||
The shared library scheme has a clean interface for figuring out what
|
||
shared libraries are in use, but the catch is that everything which
|
||
refers to addresses (symbol tables and breakpoints at least) needs to be
|
||
relocated for both shared libraries and the main executable. At least
|
||
using the standard mechanism this can only be done once the program has
|
||
been run (or the core file has been read).
|
||
|
||
PE
|
||
--
|
||
|
||
Windows 95 and NT use the PE (Portable Executable) format for their
|
||
executables. PE is basically COFF with additional headers.
|
||
|
||
While BFD includes special PE support, GDB needs only the basic COFF
|
||
reader.
|
||
|
||
ELF
|
||
---
|
||
|
||
The ELF format came with System V Release 4 (SVR4) Unix. ELF is
|
||
similar to COFF in being organized into a number of sections, but it
|
||
removes many of COFF's limitations.
|
||
|
||
The basic ELF reader is in `elfread.c'.
|
||
|
||
SOM
|
||
---
|
||
|
||
SOM is HP's object file and debug format (not to be confused with
|
||
IBM's SOM, which is a cross-language ABI).
|
||
|
||
The SOM reader is in `hpread.c'.
|
||
|
||
Other File Formats
|
||
------------------
|
||
|
||
Other file formats that have been supported by GDB include Netware
|
||
Loadable Modules (`nlmread.c'.
|
||
|
||
Debugging File Formats
|
||
======================
|
||
|
||
This section describes characteristics of debugging information that
|
||
are independent of the object file format.
|
||
|
||
stabs
|
||
-----
|
||
|
||
`stabs' started out as special symbols within the `a.out' format.
|
||
Since then, it has been encapsulated into other file formats, such as
|
||
COFF and ELF.
|
||
|
||
While `dbxread.c' does some of the basic stab processing, including
|
||
for encapsulated versions, `stabsread.c' does the real work.
|
||
|
||
COFF
|
||
----
|
||
|
||
The basic COFF definition includes debugging information. The level
|
||
of support is minimal and non-extensible, and is not often used.
|
||
|
||
Mips debug (Third Eye)
|
||
----------------------
|
||
|
||
ECOFF includes a definition of a special debug format.
|
||
|
||
The file `mdebugread.c' implements reading for this format.
|
||
|
||
DWARF 1
|
||
-------
|
||
|
||
DWARF 1 is a debugging format that was originally designed to be
|
||
used with ELF in SVR4 systems.
|
||
|
||
The DWARF 1 reader is in `dwarfread.c'.
|
||
|
||
DWARF 2
|
||
-------
|
||
|
||
DWARF 2 is an improved but incompatible version of DWARF 1.
|
||
|
||
The DWARF 2 reader is in `dwarf2read.c'.
|
||
|
||
SOM
|
||
---
|
||
|
||
Like COFF, the SOM definition includes debugging information.
|
||
|
||
Adding a New Symbol Reader to GDB
|
||
=================================
|
||
|
||
If you are using an existing object file format (a.out, COFF, ELF,
|
||
etc), there is probably little to be done.
|
||
|
||
If you need to add a new object file format, you must first add it to
|
||
BFD. This is beyond the scope of this document.
|
||
|
||
You must then arrange for the BFD code to provide access to the
|
||
debugging symbols. Generally GDB will have to call swapping routines
|
||
from BFD and a few other BFD internal routines to locate the debugging
|
||
information. As much as possible, GDB should not depend on the BFD
|
||
internal data structures.
|
||
|
||
For some targets (e.g., COFF), there is a special transfer vector
|
||
used to call swapping routines, since the external data structures on
|
||
various platforms have different sizes and layouts. Specialized
|
||
routines that will only ever be implemented by one object file format
|
||
may be called directly. This interface should be described in a file
|
||
`bfd/libxyz.h', which is included by GDB.
|
||
|
||
|
||
File: gdbint.info, Node: Language Support, Next: Host Definition, Prev: Symbol Handling, Up: Top
|
||
|
||
Language Support
|
||
****************
|
||
|
||
GDB's language support is mainly driven by the symbol reader,
|
||
although it is possible for the user to set the source language
|
||
manually.
|
||
|
||
GDB chooses the source language by looking at the extension of the
|
||
file recorded in the debug info; `.c' means C, `.f' means Fortran, etc.
|
||
It may also use a special-purpose language identifier if the debug
|
||
format supports it, such as DWARF.
|
||
|
||
Adding a Source Language to GDB
|
||
===============================
|
||
|
||
To add other languages to GDB's expression parser, follow the
|
||
following steps:
|
||
|
||
_Create the expression parser._
|
||
This should reside in a file `LANG-exp.y'. Routines for building
|
||
parsed expressions into a `union exp_element' list are in
|
||
`parse.c'.
|
||
|
||
Since we can't depend upon everyone having Bison, and YACC produces
|
||
parsers that define a bunch of global names, the following lines
|
||
_must_ be included at the top of the YACC parser, to prevent the
|
||
various parsers from defining the same global names:
|
||
|
||
#define yyparse LANG_parse
|
||
#define yylex LANG_lex
|
||
#define yyerror LANG_error
|
||
#define yylval LANG_lval
|
||
#define yychar LANG_char
|
||
#define yydebug LANG_debug
|
||
#define yypact LANG_pact
|
||
#define yyr1 LANG_r1
|
||
#define yyr2 LANG_r2
|
||
#define yydef LANG_def
|
||
#define yychk LANG_chk
|
||
#define yypgo LANG_pgo
|
||
#define yyact LANG_act
|
||
#define yyexca LANG_exca
|
||
#define yyerrflag LANG_errflag
|
||
#define yynerrs LANG_nerrs
|
||
|
||
At the bottom of your parser, define a `struct language_defn' and
|
||
initialize it with the right values for your language. Define an
|
||
`initialize_LANG' routine and have it call
|
||
`add_language(LANG_language_defn)' to tell the rest of GDB that
|
||
your language exists. You'll need some other supporting variables
|
||
and functions, which will be used via pointers from your
|
||
`LANG_language_defn'. See the declaration of `struct
|
||
language_defn' in `language.h', and the other `*-exp.y' files, for
|
||
more information.
|
||
|
||
_Add any evaluation routines, if necessary_
|
||
If you need new opcodes (that represent the operations of the
|
||
language), add them to the enumerated type in `expression.h'. Add
|
||
support code for these operations in `eval.c:evaluate_subexp()'.
|
||
Add cases for new opcodes in two functions from `parse.c':
|
||
`prefixify_subexp()' and `length_of_subexp()'. These compute the
|
||
number of `exp_element's that a given operation takes up.
|
||
|
||
_Update some existing code_
|
||
Add an enumerated identifier for your language to the enumerated
|
||
type `enum language' in `defs.h'.
|
||
|
||
Update the routines in `language.c' so your language is included.
|
||
These routines include type predicates and such, which (in some
|
||
cases) are language dependent. If your language does not appear
|
||
in the switch statement, an error is reported.
|
||
|
||
Also included in `language.c' is the code that updates the variable
|
||
`current_language', and the routines that translate the
|
||
`language_LANG' enumerated identifier into a printable string.
|
||
|
||
Update the function `_initialize_language' to include your
|
||
language. This function picks the default language upon startup,
|
||
so is dependent upon which languages that GDB is built for.
|
||
|
||
Update `allocate_symtab' in `symfile.c' and/or symbol-reading code
|
||
so that the language of each symtab (source file) is set properly.
|
||
This is used to determine the language to use at each stack frame
|
||
level. Currently, the language is set based upon the extension of
|
||
the source file. If the language can be better inferred from the
|
||
symbol information, please set the language of the symtab in the
|
||
symbol-reading code.
|
||
|
||
Add helper code to `expprint.c:print_subexp()' to handle any new
|
||
expression opcodes you have added to `expression.h'. Also, add the
|
||
printed representations of your operators to `op_print_tab'.
|
||
|
||
_Add a place of call_
|
||
Add a call to `LANG_parse()' and `LANG_error' in
|
||
`parse.c:parse_exp_1()'.
|
||
|
||
_Use macros to trim code_
|
||
The user has the option of building GDB for some or all of the
|
||
languages. If the user decides to build GDB for the language
|
||
LANG, then every file dependent on `language.h' will have the
|
||
macro `_LANG_LANG' defined in it. Use `#ifdef's to leave out
|
||
large routines that the user won't need if he or she is not using
|
||
your language.
|
||
|
||
Note that you do not need to do this in your YACC parser, since if
|
||
GDB is not build for LANG, then `LANG-exp.tab.o' (the compiled
|
||
form of your parser) is not linked into GDB at all.
|
||
|
||
See the file `configure.in' for how GDB is configured for different
|
||
languages.
|
||
|
||
_Edit `Makefile.in'_
|
||
Add dependencies in `Makefile.in'. Make sure you update the macro
|
||
variables such as `HFILES' and `OBJS', otherwise your code may not
|
||
get linked in, or, worse yet, it may not get `tar'red into the
|
||
distribution!
|
||
|
||
|
||
File: gdbint.info, Node: Host Definition, Next: Target Architecture Definition, Prev: Language Support, Up: Top
|
||
|
||
Host Definition
|
||
***************
|
||
|
||
With the advent of autoconf, it's rarely necessary to have host
|
||
definition machinery anymore.
|
||
|
||
Adding a New Host
|
||
=================
|
||
|
||
Most of GDB's host configuration support happens via autoconf. It
|
||
should be rare to need new host-specific definitions. GDB still uses
|
||
the host-specific definitions and files listed below, but these mostly
|
||
exist for historical reasons, and should eventually disappear.
|
||
|
||
Several files control GDB's configuration for host systems:
|
||
|
||
`gdb/config/ARCH/XYZ.mh'
|
||
Specifies Makefile fragments needed when hosting on machine XYZ.
|
||
In particular, this lists the required machine-dependent object
|
||
files, by defining `XDEPFILES=...'. Also specifies the header file
|
||
which describes host XYZ, by defining `XM_FILE= xm-XYZ.h'. You
|
||
can also define `CC', `SYSV_DEFINE', `XM_CFLAGS', `XM_ADD_FILES',
|
||
`XM_CLIBS', `XM_CDEPS', etc.; see `Makefile.in'.
|
||
|
||
`gdb/config/ARCH/xm-XYZ.h'
|
||
(`xm.h' is a link to this file, created by configure). Contains C
|
||
macro definitions describing the host system environment, such as
|
||
byte order, host C compiler and library.
|
||
|
||
`gdb/XYZ-xdep.c'
|
||
Contains any miscellaneous C code required for this machine as a
|
||
host. On most machines it doesn't exist at all. If it does
|
||
exist, put `XYZ-xdep.o' into the `XDEPFILES' line in
|
||
`gdb/config/ARCH/XYZ.mh'.
|
||
|
||
Generic Host Support Files
|
||
--------------------------
|
||
|
||
There are some "generic" versions of routines that can be used by
|
||
various systems. These can be customized in various ways by macros
|
||
defined in your `xm-XYZ.h' file. If these routines work for the XYZ
|
||
host, you can just include the generic file's name (with `.o', not
|
||
`.c') in `XDEPFILES'.
|
||
|
||
Otherwise, if your machine needs custom support routines, you will
|
||
need to write routines that perform the same functions as the generic
|
||
file. Put them into `XYZ-xdep.c', and put `XYZ-xdep.o' into
|
||
`XDEPFILES'.
|
||
|
||
`ser-unix.c'
|
||
This contains serial line support for Unix systems. This is always
|
||
included, via the makefile variable `SER_HARDWIRE'; override this
|
||
variable in the `.mh' file to avoid it.
|
||
|
||
`ser-go32.c'
|
||
This contains serial line support for 32-bit programs running
|
||
under DOS, using the GO32 execution environment.
|
||
|
||
`ser-tcp.c'
|
||
This contains generic TCP support using sockets.
|
||
|
||
Host Conditionals
|
||
=================
|
||
|
||
When GDB is configured and compiled, various macros are defined or
|
||
left undefined, to control compilation based on the attributes of the
|
||
host system. These macros and their meanings (or if the meaning is not
|
||
documented here, then one of the source files where they are used is
|
||
indicated) are:
|
||
|
||
`GDBINIT_FILENAME'
|
||
The default name of GDB's initialization file (normally
|
||
`.gdbinit').
|
||
|
||
`MEM_FNS_DECLARED'
|
||
Your host config file defines this if it includes declarations of
|
||
`memcpy' and `memset'. Define this to avoid conflicts between the
|
||
native include files and the declarations in `defs.h'.
|
||
|
||
`NO_STD_REGS'
|
||
This macro is deprecated.
|
||
|
||
`NO_SYS_FILE'
|
||
Define this if your system does not have a `<sys/file.h>'.
|
||
|
||
`SIGWINCH_HANDLER'
|
||
If your host defines `SIGWINCH', you can define this to be the name
|
||
of a function to be called if `SIGWINCH' is received.
|
||
|
||
`SIGWINCH_HANDLER_BODY'
|
||
Define this to expand into code that will define the function
|
||
named by the expansion of `SIGWINCH_HANDLER'.
|
||
|
||
`ALIGN_STACK_ON_STARTUP'
|
||
Define this if your system is of a sort that will crash in
|
||
`tgetent' if the stack happens not to be longword-aligned when
|
||
`main' is called. This is a rare situation, but is known to occur
|
||
on several different types of systems.
|
||
|
||
`CRLF_SOURCE_FILES'
|
||
Define this if host files use `\r\n' rather than `\n' as a line
|
||
terminator. This will cause source file listings to omit `\r'
|
||
characters when printing and it will allow \r\n line endings of
|
||
files which are "sourced" by gdb. It must be possible to open
|
||
files in binary mode using `O_BINARY' or, for fopen, `"rb"'.
|
||
|
||
`DEFAULT_PROMPT'
|
||
The default value of the prompt string (normally `"(gdb) "').
|
||
|
||
`DEV_TTY'
|
||
The name of the generic TTY device, defaults to `"/dev/tty"'.
|
||
|
||
`FCLOSE_PROVIDED'
|
||
Define this if the system declares `fclose' in the headers included
|
||
in `defs.h'. This isn't needed unless your compiler is unusually
|
||
anal.
|
||
|
||
`FOPEN_RB'
|
||
Define this if binary files are opened the same way as text files.
|
||
|
||
`GETENV_PROVIDED'
|
||
Define this if the system declares `getenv' in its headers included
|
||
in `defs.h'. This isn't needed unless your compiler is unusually
|
||
anal.
|
||
|
||
`HAVE_MMAP'
|
||
In some cases, use the system call `mmap' for reading symbol
|
||
tables. For some machines this allows for sharing and quick
|
||
updates.
|
||
|
||
`HAVE_SIGSETMASK'
|
||
Define this if the host system has job control, but does not define
|
||
`sigsetmask()'. Currently, this is only true of the RS/6000.
|
||
|
||
`HAVE_TERMIO'
|
||
Define this if the host system has `termio.h'.
|
||
|
||
`HOST_BYTE_ORDER'
|
||
The ordering of bytes in the host. This must be defined to be
|
||
either `BIG_ENDIAN' or `LITTLE_ENDIAN'.
|
||
|
||
`INT_MAX'
|
||
|
||
`INT_MIN'
|
||
|
||
`LONG_MAX'
|
||
|
||
`UINT_MAX'
|
||
|
||
`ULONG_MAX'
|
||
Values for host-side constants.
|
||
|
||
`ISATTY'
|
||
Substitute for isatty, if not available.
|
||
|
||
`LONGEST'
|
||
This is the longest integer type available on the host. If not
|
||
defined, it will default to `long long' or `long', depending on
|
||
`CC_HAS_LONG_LONG'.
|
||
|
||
`CC_HAS_LONG_LONG'
|
||
Define this if the host C compiler supports "long long". This is
|
||
set by the configure script.
|
||
|
||
`PRINTF_HAS_LONG_LONG'
|
||
Define this if the host can handle printing of long long integers
|
||
via the printf format directive "ll". This is set by the configure
|
||
script.
|
||
|
||
`HAVE_LONG_DOUBLE'
|
||
Define this if the host C compiler supports "long double". This is
|
||
set by the configure script.
|
||
|
||
`PRINTF_HAS_LONG_DOUBLE'
|
||
Define this if the host can handle printing of long double
|
||
float-point numbers via the printf format directive "Lg". This is
|
||
set by the configure script.
|
||
|
||
`SCANF_HAS_LONG_DOUBLE'
|
||
Define this if the host can handle the parsing of long double
|
||
float-point numbers via the scanf format directive directive "Lg".
|
||
This is set by the configure script.
|
||
|
||
`LSEEK_NOT_LINEAR'
|
||
Define this if `lseek (n)' does not necessarily move to byte number
|
||
`n' in the file. This is only used when reading source files. It
|
||
is normally faster to define `CRLF_SOURCE_FILES' when possible.
|
||
|
||
`L_SET'
|
||
This macro is used as the argument to lseek (or, most commonly,
|
||
bfd_seek). FIXME, should be replaced by SEEK_SET instead, which
|
||
is the POSIX equivalent.
|
||
|
||
`MALLOC_INCOMPATIBLE'
|
||
Define this if the system's prototype for `malloc' differs from the
|
||
ANSI definition.
|
||
|
||
`MMAP_BASE_ADDRESS'
|
||
When using HAVE_MMAP, the first mapping should go at this address.
|
||
|
||
`MMAP_INCREMENT'
|
||
when using HAVE_MMAP, this is the increment between mappings.
|
||
|
||
`NEED_POSIX_SETPGID'
|
||
Define this to use the POSIX version of `setpgid' to determine
|
||
whether job control is available.
|
||
|
||
`NORETURN'
|
||
If defined, this should be one or more tokens, such as `volatile',
|
||
that can be used in both the declaration and definition of
|
||
functions to indicate that they never return. The default is
|
||
already set correctly if compiling with GCC. This will almost
|
||
never need to be defined.
|
||
|
||
`ATTR_NORETURN'
|
||
If defined, this should be one or more tokens, such as
|
||
`__attribute__ ((noreturn))', that can be used in the declarations
|
||
of functions to indicate that they never return. The default is
|
||
already set correctly if compiling with GCC. This will almost
|
||
never need to be defined.
|
||
|
||
`USE_GENERIC_DUMMY_FRAMES'
|
||
Define this to 1 if the target is using the generic inferior
|
||
function call code. See `blockframe.c' for more information.
|
||
|
||
`USE_MMALLOC'
|
||
GDB will use the `mmalloc' library for memory allocation for symbol
|
||
reading if this symbol is defined. Be careful defining it since
|
||
there are systems on which `mmalloc' does not work for some
|
||
reason. One example is the DECstation, where its RPC library
|
||
can't cope with our redefinition of `malloc' to call `mmalloc'.
|
||
When defining `USE_MMALLOC', you will also have to set `MMALLOC'
|
||
in the Makefile, to point to the mmalloc library. This define is
|
||
set when you configure with -with-mmalloc.
|
||
|
||
`NO_MMCHECK'
|
||
Define this if you are using `mmalloc', but don't want the overhead
|
||
of checking the heap with `mmcheck'. Note that on some systems,
|
||
the C runtime makes calls to malloc prior to calling `main', and if
|
||
`free' is ever called with these pointers after calling `mmcheck'
|
||
to enable checking, a memory corruption abort is certain to occur.
|
||
These systems can still use mmalloc, but must define NO_MMCHECK.
|
||
|
||
`MMCHECK_FORCE'
|
||
Define this to 1 if the C runtime allocates memory prior to
|
||
`mmcheck' being called, but that memory is never freed so we don't
|
||
have to worry about it triggering a memory corruption abort. The
|
||
default is 0, which means that `mmcheck' will only install the heap
|
||
checking functions if there has not yet been any memory allocation
|
||
calls, and if it fails to install the functions, gdb will issue a
|
||
warning. This is currently defined if you configure using
|
||
-with-mmalloc.
|
||
|
||
`NO_SIGINTERRUPT'
|
||
Define this to indicate that siginterrupt() is not available.
|
||
|
||
`R_OK'
|
||
Define if this is not in a system .h file.
|
||
|
||
`SEEK_CUR'
|
||
|
||
`SEEK_SET'
|
||
Define these to appropriate value for the system lseek(), if not
|
||
already defined.
|
||
|
||
`STOP_SIGNAL'
|
||
This is the signal for stopping GDB. Defaults to SIGTSTP. (Only
|
||
redefined for the Convex.)
|
||
|
||
`USE_O_NOCTTY'
|
||
Define this if the interior's tty should be opened with the
|
||
O_NOCTTY flag. (FIXME: This should be a native-only flag, but
|
||
`inflow.c' is always linked in.)
|
||
|
||
`USG'
|
||
Means that System V (prior to SVR4) include files are in use.
|
||
(FIXME: This symbol is abused in `infrun.c', `regex.c',
|
||
`remote-nindy.c', and `utils.c' for other things, at the moment.)
|
||
|
||
`lint'
|
||
Define this to help placate lint in some situations.
|
||
|
||
`volatile'
|
||
Define this to override the defaults of `__volatile__' or `/**/'.
|
||
|