\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename g++int.info
@settitle G++ internals
@setchapternewpage odd
@c %**end of header

@node Top, Limitations of g++, (dir), (dir)
@chapter Internal Architecture of the Compiler

This document is meant to describe the C++ front-end for gcc in detail.
Send questions and comments to Benjamin Kosnik @code{<bkoz@@cygnus.com>}.

@menu
* Limitations of g++::
* Routines::
* Implementation Specifics::
* Glossary::
* Macros::
* Typical Behavior::
* Coding Conventions::
* Templates::
* Access Control::
* Error Reporting::
* Parser::
* Exception Handling::
* Free Store::
* Mangling::                    Function name mangling for C++ and Java
* Vtables::                     Two ways to do virtual functions
* Concept Index::
@end menu

@node Limitations of g++, Routines, Top, Top
@section Limitations of g++

@itemize @bullet
@item
Limitations on input source code: the parser handles about 240 nesting
levels when its stack size (YYSTACKSIZE) is set to 500 (the default),
and it requires around 16.4k of swap space per nesting level. The
parser needs about 2.09 * number of nesting levels worth of stack space.

@cindex pushdecl_class_level
@item
I suspect there are other uses of pushdecl_class_level that do not call
set_identifier_type_value in tandem with the call to
pushdecl_class_level. It would seem to be an omission.

@cindex access checking
@item
Access checking is unimplemented for nested types.

@cindex @code{volatile}
@item
@code{volatile} is not implemented in general.

@end itemize

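
The parser stack figures in the first item above can be turned into a
small calculation. The following is only a sketch: the function and
constant names are hypothetical, and the two factors are the approximate
values quoted above, so the results are estimates. With a stack size of
500 the estimate comes out at 239 levels, in line with the roughly 240
levels quoted.

@verbatim
```cpp
#include <cmath>

// Figures quoted in the text above; both are approximations.
constexpr double kStackSlotsPerLevel = 2.09;  // parser stack slots per nesting level
constexpr double kSwapKPerLevel = 16.4;       // swap space (in k) per nesting level

// Parser stack slots needed for a given nesting depth, rounded up.
inline int stack_slots_needed(int nesting_levels) {
  return static_cast<int>(std::ceil(kStackSlotsPerLevel * nesting_levels));
}

// Deepest nesting that fits in a parser stack of the given size.
inline int max_nesting_levels(int stack_size) {
  return static_cast<int>(std::floor(stack_size / kStackSlotsPerLevel));
}

// Approximate swap space (in k) consumed at a given nesting depth.
inline double swap_needed_k(int nesting_levels) {
  return kSwapKPerLevel * nesting_levels;
}
```
@end verbatim
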
@node Routines, Implementation Specifics, Limitations of g++, Top
@section Routines

This section describes some of the routines used in the C++ front-end.

@code{build_vtable} and @code{prepare_fresh_vtable} are used only within
the @file{cp-class.c} file, and only in @code{finish_struct} and
@code{modify_vtable_entries}.

@code{build_vtable}, @code{prepare_fresh_vtable}, and
@code{finish_struct} are the only routines that set @code{DECL_VPARENT}.

@code{finish_struct} can steal the virtual function table from parents;
this prohibits @code{related_vslot} from working. When
@code{finish_struct} steals, we know that

@example
get_binfo (DECL_FIELD_CONTEXT (CLASSTYPE_VFIELD (t)), t, 0)
@end example

@noindent
will get the related binfo.

@code{layout_basetypes} does something with the VIRTUALS.

Supposedly (according to Tiemann) most of the breadth first searching
done, like in @code{get_base_distance} and in @code{get_binfo}, was not
because of any design decision. I have since found out that at least
one part of the compiler needs the notion of depth first binfo
searching. I am going to try to convert the whole thing; it should
just work. The term left-most refers to the depth first left-most
node. It uses @code{MAIN_VARIANT == type} as the condition to get
left-most, because the things that have @code{BINFO_OFFSET}s of zero are
shared and will have themselves as their own @code{MAIN_VARIANT}s. The
non-shared right ones are copies of the left-most one; hence if a node
is its own @code{MAIN_VARIANT}, we know it IS a left-most one, and if it
is not, it is a non-left-most one.


@code{get_base_distance}'s path and distance matter in its use in:

@itemize @bullet
@item
@code{prepare_fresh_vtable} (the code is probably wrong)
@item
@code{init_vfields}. Depends upon distance, probably in a safe way;
@code{build_offset_ref} might use partial paths to do further lookups;
@code{hack_identifier} is probably not properly checking access.

@item
@code{get_first_matching_virtual} probably should check for
@code{get_base_distance} returning -2.

@item
@code{resolve_offset_ref} should be called in a more deterministic
manner. Right now, it is called in some random contexts, like for
arguments at @code{build_method_call} time, @code{default_conversion}
time, @code{convert_arguments} time, @code{build_unary_op} time,
@code{build_c_cast} time, @code{build_modify_expr} time,
@code{convert_for_assignment} time, and
@code{convert_for_initialization} time.

But there are still more contexts it needs to be called in; one was the
ever simple:

@example
if (obj.*pmi != 7)
  @dots{}
@end example

It seems that the problems were due to the fact that @code{TREE_TYPE} of
the @code{OFFSET_REF} was not an @code{OFFSET_TYPE}, but rather the type
of the referent (like @code{INTEGER_TYPE}). This problem was fixed by
changing @code{default_conversion} to check @code{TREE_CODE (x)},
instead of only checking @code{TREE_CODE (TREE_TYPE (x))} to see if it
was @code{OFFSET_TYPE}.

@end itemize

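
For readers less used to pointers to data members, the
@code{obj.*pmi != 7} test above can be written out as a complete
function. This is a minimal sketch; the @code{Widget} type and the
function name are hypothetical and are not taken from g++.

@verbatim
```cpp
// Hypothetical type for illustration; not part of g++.
struct Widget {
  int count;
};

// `obj.*pmi` selects the int member that `pmi` designates inside `obj`;
// the comparison then needs that int value, not the member pointer itself.
inline bool differs_from_seven(const Widget& obj, int Widget::*pmi) {
  if (obj.*pmi != 7)
    return true;
  return false;
}
```
@end verbatim

The expression @code{obj.*pmi} has the type of the member it refers to
(here @code{int}), which is the situation the last item above describes.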
@node Implementation Specifics, Glossary, Routines, Top
@section Implementation Specifics

@itemize @bullet
@item Explicit Initialization

The global list @code{current_member_init_list} contains the list of
mem-initializers specified in a constructor declaration. For example:

@example
foo::foo() : a(1), b(2) @{@}
@end example

@noindent
will initialize @samp{a} with 1 and @samp{b} with 2.
@code{expand_member_init} places each initialization (a with 1) on the
global list. Then, when the fndecl is being processed,
@code{emit_base_init} runs down the list, initializing them. It used to
be the case that g++ first ran down @code{current_member_init_list},
then ran down the list of members initializing the ones that weren't
explicitly initialized. Things were rewritten to perform the
initializations in order of declaration in the class. So, for the above
example, @samp{a} and @samp{b} will be initialized in the order that
they were declared:

@example
class foo @{ public: int b; int a; foo (); @};
@end example

@noindent
Thus, @samp{b} will be initialized with 2 first, then @samp{a} will be
initialized with 1, regardless of how they're listed in the mem-initializer.

@item The Explicit Keyword

When @code{explicit} appears on a constructor, @code{grokdeclarator}
sets the field @code{DECL_NONCONVERTING_P}. That value is used by
@code{build_method_call} and @code{build_user_type_conversion_1} to decide
if a particular constructor should be used as a candidate for conversions.

@end itemize
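
The declaration-order rule for member initialization can be observed
from user code. The sketch below reuses the @code{foo} class from the
text; the @code{init_log} and @code{note} helpers are hypothetical and
exist only to record the order. The pragma assumes a GCC-compatible
compiler and merely silences the warning about the deliberately
mismatched order.

@verbatim
```cpp
#include <string>
#include <vector>

// The mismatch between declaration order and mem-initializer order below is
// deliberate, so silence the GCC/Clang warning about it (an assumption about
// the compiler in use).
#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Wreorder"
#endif

// Hypothetical helper: the log of member names in initialization order.
inline std::vector<std::string>& init_log() {
  static std::vector<std::string> log;
  return log;
}

// Hypothetical helper: records which member is being initialized.
inline int note(const char* member, int value) {
  init_log().push_back(member);
  return value;
}

class foo {
public:
  int b;  // declared first, so initialized first
  int a;
  // Listed as a, b; the members are still initialized as b, a.
  foo() : a(note("a", 1)), b(note("b", 2)) {}
};
```
@end verbatim

After constructing one @code{foo}, the log holds @samp{b} followed by
@samp{a}, while @code{a} is 1 and @code{b} is 2.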
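
The user-visible effect of @code{explicit}, which the compiler tracks
through @code{DECL_NONCONVERTING_P}, can be checked with standard type
traits. This is a sketch only; @code{Meters}, @code{Feet} and
@code{in_feet} are hypothetical names, and the example says nothing
about how the front-end itself performs the check.

@verbatim
```cpp
#include <type_traits>

struct Meters {
  explicit Meters(double v) : value(v) {}  // not a candidate for implicit conversion
  double value;
};

struct Feet {
  Feet(double v) : value(v) {}  // ordinary converting constructor
  double value;
};

static_assert(std::is_constructible<Meters, double>::value,
              "direct initialization still works");
static_assert(!std::is_convertible<double, Meters>::value,
              "explicit blocks implicit conversion");
static_assert(std::is_convertible<double, Feet>::value,
              "a non-explicit constructor converts implicitly");

// Accepts a double argument through the implicit conversion to Feet.
inline double in_feet(Feet f) { return f.value; }
```
@end verbatim
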
@node Glossary, Macros, Implementation Specifics, Top
@section Glossary

@table @r
@item binfo
The main data structure in the compiler used to represent the
inheritance relationships between classes. The data in the binfo can be
accessed by the BINFO_ accessor macros.

@item vtable
@itemx virtual function table

The virtual function table holds information used in virtual function
dispatching. In the compiler, they are usually referred to as vtables,
or vtbls. The first index is not used in the normal way; I believe it
is probably used for the virtual destructor. There are two forms of
virtual tables, one that has offsets in addition to pointers, and one
using thunks. @xref{Vtables}.

@item vfield

vfields can be thought of as the base information needed to build
vtables. For every vtable that exists for a class, there is a vfield.
See also vtable and virtual function table pointer. When a type is used
as a base class to another type, the virtual function table for the
derived class can be based upon the vtable for the base class, just
extended to include the additional virtual methods declared in the
derived class. The virtual function table from a virtual base class is
never reused in a derived class. @code{is_normal} depends upon this.

@item virtual function table pointer

These are @code{FIELD_DECL}s that are pointer types that point to
vtables. See also vtable and vfield.
@end table

@node Macros, Typical Behavior, Glossary, Top
@section Macros

This section describes some of the macros used on trees. The list
should be alphabetical. Eventually all macros should be documented
here.

@table @code
@item BINFO_BASETYPES
A vector of additional binfos for the types inherited by this basetype.
The binfos are fully unshared (except for virtual bases, in which
case the binfo structure is shared).

If this basetype describes type D as inherited in C,
and if the basetypes of D are E and F,
then this vector contains binfos for inheritance of E and F by C.

Has values of:

TREE_VECs

@item BINFO_INHERITANCE_CHAIN
Temporarily used to represent specific inheritances. It usually points
to the binfo associated with the lesser derived type, but it can be
reversed by reverse_path. For example:

@example
        Z ZbY   least derived
        |
        Y YbX
        |
        X Xb    most derived

TYPE_BINFO (X) == Xb
BINFO_INHERITANCE_CHAIN (Xb) == YbX
BINFO_INHERITANCE_CHAIN (Yb) == ZbY
BINFO_INHERITANCE_CHAIN (Zb) == 0
@end example

Not sure if the above is really true; get_base_distance has it point
towards the most derived type, opposite from above.

Set by build_vbase_path, recursive_bounded_basetype_p,
get_base_distance, lookup_field, lookup_fnfields, and reverse_path.

What things can this be used on:

TREE_VECs that are binfos

@item BINFO_OFFSET
The offset where this basetype appears in its containing type.
BINFO_OFFSET slot holds the offset (in bytes) from the base of the
complete object to the base of the part of the object that is allocated
on behalf of this `type'. This is always 0 except when there is
multiple inheritance.

Used on TREE_VEC_ELTs of the binfos BINFO_BASETYPES (...) for example.

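
The kind of offset that BINFO_OFFSET records can be observed from user
code by comparing the address of a base subobject with the address of
the complete object. The sketch below uses hypothetical classes
(@code{Left}, @code{Right}, @code{Both}) and a hypothetical helper. The
exact values are up to the implementation; typically the first base
sits at offset 0 and the second at a nonzero offset.

@verbatim
```cpp
#include <cstddef>

// Hypothetical classes for illustration.
struct Left  { int l; };
struct Right { int r; };
struct Both : Left, Right { int own; };

// Byte offset of the Base subobject within a complete Both object,
// measured from the start of the whole object.
template <class Base>
std::ptrdiff_t base_offset(const Both& obj) {
  const char* whole = reinterpret_cast<const char*>(&obj);
  const char* part =
      reinterpret_cast<const char*>(static_cast<const Base*>(&obj));
  return part - whole;
}
```
@end verbatim

With single inheritance there is only one base subobject to place, which
is why the text says the offset is 0 unless multiple inheritance is
involved.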
@item BINFO_VIRTUALS
A unique list of functions for the virtual function table. See also
TYPE_BINFO_VIRTUALS.

What things can this be used on:

TREE_VECs that are binfos

@item BINFO_VTABLE
Used to find the VAR_DECL that is the virtual function table associated
with this binfo. See also TYPE_BINFO_VTABLE. To get the virtual
function table pointer, see CLASSTYPE_VFIELD.

What things can this be used on:

TREE_VECs that are binfos

Has values of:

VAR_DECLs that are virtual function tables

@item BLOCK_SUPERCONTEXT
In the outermost scope of each function, it points to the FUNCTION_DECL
node. It aids in better DWARF support of inline functions.

@item CLASSTYPE_TAGS
CLASSTYPE_TAGS is a linked (via TREE_CHAIN) list of member classes of a
class. TREE_PURPOSE is the name, TREE_VALUE is the type (pushclass scans
these and calls pushtag on them.)

finish_struct scans these to produce TYPE_DECLs to add to the
TYPE_FIELDS of the type.

It is expected that the name found in the TREE_PURPOSE slot is unique;
resolve_scope_to_name is one such place that depends upon this
uniqueness.

@item CLASSTYPE_METHOD_VEC
The following is true after finish_struct has been called (on the
class?) but not before. Before finish_struct is called, things are
different to some extent. Contains a TREE_VEC of methods of the class.
The TREE_VEC_LENGTH is the number of differently named methods plus one
for the 0th entry. The 0th entry is always allocated, and reserved for
ctors and dtors. If there are none, TREE_VEC_ELT(N,0) == NULL_TREE.
Each entry of the TREE_VEC is a FUNCTION_DECL. For each FUNCTION_DECL,
there is a DECL_CHAIN slot. If the FUNCTION_DECL is the last one with a
given name, the DECL_CHAIN slot is NULL_TREE. Otherwise it is the next
method that has the same name (but a different signature). It would
seem not to be true that, because the DECL_CHAIN slot is used in
this way, we cannot call pushdecl to put the method in the global scope
(because that would overwrite the TREE_CHAIN slot): the two use
different _CHAINs. finish_struct_methods sets up one version of the
TREE_CHAIN slots on the FUNCTION_DECLs.

friends are kept in TREE_LISTs, so that there's no need to use their
TREE_CHAIN slot for anything.

Has values of:

TREE_VECs

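
The layout described for CLASSTYPE_METHOD_VEC can be pictured with a toy
model. The sketch below is not compiler code: @code{ToyMethod},
@code{ToyMethodVec} and both helper functions are hypothetical stand-ins
for FUNCTION_DECL nodes, the TREE_VEC, and walks over the DECL_CHAIN
slot.

@verbatim
```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a FUNCTION_DECL; the field names are hypothetical.
struct ToyMethod {
  std::string name;
  std::string signature;
  const ToyMethod* next_same_name;  // role of DECL_CHAIN; null for the last one
};

// Toy stand-in for the method vector: slot 0 is reserved for ctors and dtors
// (null if the class has none); every later slot heads one chain per name.
typedef std::vector<const ToyMethod*> ToyMethodVec;

// Count the overloads reachable from one slot by walking the chain.
inline std::size_t overloads_in_slot(const ToyMethodVec& vec, std::size_t slot) {
  std::size_t n = 0;
  for (const ToyMethod* m = vec[slot]; m != nullptr; m = m->next_same_name)
    ++n;
  return n;
}

// Find the slot whose chain carries the given name; 0 means "not found",
// which is unambiguous here because slot 0 is never searched by name.
inline std::size_t slot_for_name(const ToyMethodVec& vec, const std::string& name) {
  for (std::size_t i = 1; i < vec.size(); ++i)
    if (vec[i] != nullptr && vec[i]->name == name)
      return i;
  return 0;
}
```
@end verbatim

A class with two overloads of @code{f} and one @code{g}, and no ctors or
dtors, would give a vector of length 3: an empty slot 0, a two-element
chain for @code{f}, and a one-element chain for @code{g}.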
@item CLASSTYPE_VFIELD
Seems to be in the process of being renamed TYPE_VFIELD. Use on types
to get the main virtual function table pointer. To get the virtual
function table use BINFO_VTABLE (TYPE_BINFO ()).

Has values of:

FIELD_DECLs that are virtual function table pointers

What things can this be used on:

RECORD_TYPEs

@item DECL_CLASS_CONTEXT
Identifies the context that the _DECL was found in. For virtual function
tables, it points to the type associated with the virtual function
table. See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_FCONTEXT.

The difference between this and DECL_CONTEXT shows up for virtual
functions like:

@example
struct A
@{
  virtual int f ();
@};

struct B : A
@{
  int f ();
@};

DECL_CONTEXT (A::f) == A
DECL_CLASS_CONTEXT (A::f) == A

DECL_CONTEXT (B::f) == A
DECL_CLASS_CONTEXT (B::f) == B
@end example

Has values of:

RECORD_TYPEs, or UNION_TYPEs

What things can this be used on:

TYPE_DECLs, _DECLs

@item DECL_CONTEXT
Identifies the context that the _DECL was found in. Can be used on
virtual function tables to find the type associated with the virtual
function table, but since they are FIELD_DECLs, DECL_FIELD_CONTEXT is a
better access method. Internally the same as DECL_FIELD_CONTEXT, so
don't use both. See also DECL_FIELD_CONTEXT, DECL_FCONTEXT and
DECL_CLASS_CONTEXT.

Has values of:

RECORD_TYPEs

What things can this be used on:

@display
VAR_DECLs that are virtual function tables
_DECLs
@end display

@item DECL_FIELD_CONTEXT
Identifies the context that the FIELD_DECL was found in. Internally the
same as DECL_CONTEXT, so don't use both. See also DECL_CONTEXT,
DECL_FCONTEXT and DECL_CLASS_CONTEXT.

Has values of:

RECORD_TYPEs

What things can this be used on:

@display
FIELD_DECLs that are virtual function pointers
FIELD_DECLs
@end display

@item DECL_NAME

Has values of:

@display
0 for things that don't have names
IDENTIFIER_NODEs for TYPE_DECLs
@end display

@item DECL_IGNORED_P
A bit that can be set to inform the debug information output routines in
the back-end that a certain _DECL node should be totally ignored.

Used in cases where it is known that the debugging information will be
output in another file, or where a sub-type is known not to be needed
because the enclosing type is not needed.

A compiler constructed virtual destructor in derived classes that do not
define an explicit destructor that was defined explicitly in a base class
has this bit set as well. Also used on __FUNCTION__ and
__PRETTY_FUNCTION__ to mark that they are ``compiler generated.'' c-decl and
c-lex.c both want DECL_IGNORED_P set for ``internally generated vars,''
and ``user-invisible variable.''

Functions built by the C++ front-end such as default destructors,
virtual destructors and default constructors want to be marked that
they are compiler generated, but it is unclear why.

Currently, it is used in an absolute way in the C++ front-end, as an
optimization, to tell the debug information output routines to not
generate debugging information that will be output by another separately
compiled file.

@item DECL_VIRTUAL_P
A flag used on FIELD_DECLs and VAR_DECLs. (Documentation in tree.h is
wrong.) Used in VAR_DECLs to indicate that the variable is a vtable.
It is also used in FIELD_DECLs for vtable pointers.

What things can this be used on:

FIELD_DECLs and VAR_DECLs

@item DECL_VPARENT
Used to point to the parent type of the vtable if there is one, else it
is just the type associated with the vtable. Because of the sharing of
virtual function tables that goes on, this slot is not very useful, and
is in fact, not used in the compiler at all. It can be removed.

What things can this be used on:

VAR_DECLs that are virtual function tables

Has values of:

RECORD_TYPEs maybe UNION_TYPEs

@item DECL_FCONTEXT
Used to find the first baseclass in which this FIELD_DECL is defined.
See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_CLASS_CONTEXT.

How it is used:

Used when writing out debugging information about vfield and
vbase decls.

What things can this be used on:

@display
FIELD_DECLs that are virtual function pointers
FIELD_DECLs
@end display

@item DECL_REFERENCE_SLOT
Used to hold the initializer for the reference.

What things can this be used on:

PARM_DECLs and VAR_DECLs that have a reference type

@item DECL_VINDEX
Used for FUNCTION_DECLs in two different ways. Before the structure
containing the FUNCTION_DECL is laid out, DECL_VINDEX may point to a
FUNCTION_DECL in a base class which is the FUNCTION_DECL which this
FUNCTION_DECL will replace as a virtual function. When the class is
laid out, this pointer is changed to an INTEGER_CST node which is
suitable to find an index into the virtual function table. See
get_vtable_entry as to how one can find the right index into the virtual
function table. The first index, 0, of a virtual function table is not
used in the normal way, so the first real index is 1.

DECL_VINDEX may be a TREE_LIST, which would seem to be a list of
overridden FUNCTION_DECLs. add_virtual_function has code to deal with
this when it uses the variable base_fndecl_list, but it would seem that
somehow, it is possible for the TREE_LIST to persist until method_call,
and it should not.

What things can this be used on:

FUNCTION_DECLs

@item DECL_SOURCE_FILE
Identifies what source file a particular declaration was found in.

Has values of:

"<built-in>" on TYPE_DECLs to mean the typedef is built in

@item DECL_SOURCE_LINE
Identifies what source line number in the source file the declaration
was found at.

Has values of:

@display
0 for an undefined label

0 for TYPE_DECLs that are internally generated

0 for FUNCTION_DECLs for functions generated by the compiler
        (not yet, but should be)

0 for ``magic'' arguments to functions, that the user has no
        control over
@end display

@item TREE_USED

Has values of:

0 for unused labels

@item TREE_ADDRESSABLE
A flag that is set for any type that has a constructor.

@item TREE_COMPLEXITY
They seem to be a kludgy way to track recursion, popping, and pushing. They only
appear in cp-decl.c and cp-decl2.c, so they are a good candidate for
proper fixing, and removal.

@item TREE_HAS_CONSTRUCTOR
A flag to indicate when a CALL_EXPR represents a call to a constructor.
If set, we know that the type of the object is the complete type of the
object, and that the value returned is nonnull. When used in this
fashion, it is an optimization. Can also be used on SAVE_EXPRs to
indicate when they are of fixed type and nonnull. Can also be used on
INDIRECT_EXPRs on CALL_EXPRs that represent a call to a constructor.

@item TREE_PRIVATE
Set for FIELD_DECLs by finish_struct. But not uniformly set.

The following routines do something with PRIVATE access:
build_method_call, alter_access, finish_struct_methods,
finish_struct, convert_to_aggr, CWriteLanguageDecl, CWriteLanguageType,
CWriteUseObject, compute_access, lookup_field, dfs_pushdecl,
GNU_xref_member, dbxout_type_fields, dbxout_type_method_1

@item TREE_PROTECTED
The following routines do something with PROTECTED access:
build_method_call, alter_access, finish_struct, convert_to_aggr,
CWriteLanguageDecl, CWriteLanguageType, CWriteUseObject,
compute_access, lookup_field, GNU_xref_member, dbxout_type_fields,
dbxout_type_method_1

@item TYPE_BINFO
Used to get the binfo for the type.

Has values of:

TREE_VECs that are binfos

What things can this be used on:

RECORD_TYPEs

@item TYPE_BINFO_BASETYPES
See also BINFO_BASETYPES.

@item TYPE_BINFO_VIRTUALS
A unique list of functions for the virtual function table. See also
BINFO_VIRTUALS.

What things can this be used on:

RECORD_TYPEs

@item TYPE_BINFO_VTABLE
Points to the virtual function table associated with the given type.
See also BINFO_VTABLE.

What things can this be used on:

RECORD_TYPEs

Has values of:

VAR_DECLs that are virtual function tables

@item TYPE_NAME
Names the type.

Has values of:

@display
0 for things that don't have names.
should be IDENTIFIER_NODE for RECORD_TYPEs UNION_TYPEs and
ENUM_TYPEs.
TYPE_DECL for RECORD_TYPEs, UNION_TYPEs and ENUM_TYPEs, but
shouldn't be.
TYPE_DECL for typedefs, unsure why.
@end display

What things can one use this on:

@display
TYPE_DECLs
RECORD_TYPEs
UNION_TYPEs
ENUM_TYPEs
@end display

History:

It currently points to the TYPE_DECL for RECORD_TYPEs,
UNION_TYPEs and ENUM_TYPEs, but it should be history soon.

@item TYPE_METHODS
Synonym for @code{CLASSTYPE_METHOD_VEC}. Chained together with
@code{TREE_CHAIN}. @file{dbxout.c} uses this to get at the methods of a
class.

@item TYPE_DECL
Used to represent typedefs, and used to represent binding layers.

Components:

DECL_NAME is the name of the typedef. For example, foo would
be found in the DECL_NAME slot when @code{typedef int foo;} is
seen.

DECL_SOURCE_LINE identifies what source line number in the
source file the declaration was found at. A value of 0
indicates that this TYPE_DECL is just an internal binding layer
marker, and does not correspond to a user supplied typedef.

DECL_SOURCE_FILE

@item TYPE_FIELDS
A linked list (via @code{TREE_CHAIN}) of member types of a class. The
list can contain @code{TYPE_DECL}s, but there can also be other things
in the list apparently. See also @code{CLASSTYPE_TAGS}.

@item TYPE_VIRTUAL_P
A flag used on a @code{FIELD_DECL} or a @code{VAR_DECL}, indicates it is
a virtual function table or a pointer to one. When used on a
@code{FUNCTION_DECL}, indicates that it is a virtual function. When
used on an @code{IDENTIFIER_NODE}, indicates that a function with this
same name exists and has been declared virtual.

When used on types, it indicates that the type has virtual functions, or
is derived from one that does.

Not sure if the above about virtual function tables is still true. See
also info on @code{DECL_VIRTUAL_P}.

What things can this be used on:

FIELD_DECLs, VAR_DECLs, FUNCTION_DECLs, IDENTIFIER_NODEs

@item VF_BASETYPE_VALUE
Get the associated type from the binfo that caused the given vfield to
exist. This is the least derived class (the most parent class) that
needed a virtual function table. It is probably the case that all uses
of this field are misguided, but they need to be examined on a
case-by-case basis. See history for more information on why the
previous statement was made.

Set at @code{finish_base_struct} time.

What things can this be used on:

TREE_LISTs that are vfields

History:

This field was used to determine if a virtual function table's
slot should be filled in with a certain virtual function, by
checking to see if the type returned by VF_BASETYPE_VALUE was a
parent of the context in which the old virtual function existed.
This incorrectly assumes that a given type _could_ not appear as
a parent twice in a given inheritance lattice. For single
inheritance, this would in fact work, because a type could not
possibly appear more than once in an inheritance lattice, but
with multiple inheritance, a type can appear more than once.

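
The point made in the history note, that one type can appear as a parent
more than once under multiple inheritance, can be seen in a few lines of
C++. This is a sketch with hypothetical class names (@code{Base},
@code{Left}, @code{Right}, @code{Join}); it shows only the language-level
situation, not the compiler's handling of it.

@verbatim
```cpp
// Hypothetical classes for illustration.
struct Base  { int id; };
struct Left  : Base {};
struct Right : Base {};
struct Join  : Left, Right {};  // Base is a parent twice: via Left and via Right

// The two Base subobjects are distinct, so asking "is Base a parent of Join?"
// does not identify a single place in the lattice.
inline bool has_two_distinct_bases(const Join& obj) {
  const Base* via_left  = static_cast<const Left*>(&obj);
  const Base* via_right = static_cast<const Right*>(&obj);
  return via_left != via_right;
}
```
@end verbatim

Because the bases here are not virtual, each path contributes its own
@code{Base} subobject, and each can hold a different value.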
@item VF_BINFO_VALUE
Identifies the binfo that caused this vfield to exist. If this vfield
is from the first direct base class that has a virtual function table,
then VF_BINFO_VALUE is NULL_TREE, otherwise it will be the binfo of the
direct base where the vfield came from. Can use @code{TREE_VIA_VIRTUAL}
on result to find out if it is a virtual base class. Related to the
binfo found by

@example
get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
@end example

@noindent
where @samp{t} is the type that has the given vfield.

@example
get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
@end example

@noindent
will return the binfo for the given vfield.

May or may not be set at @code{modify_vtable_entries} time. Set at
@code{finish_base_struct} time.

What things can this be used on:

TREE_LISTs that are vfields

@item VF_DERIVED_VALUE
Identifies the type of the most derived class of the vfield, excluding
the class this vfield is for.

Set at @code{finish_base_struct} time.

What things can this be used on:

TREE_LISTs that are vfields

@item VF_NORMAL_VALUE
Identifies the type of the most derived class of the vfield, including
the class this vfield is for.

Set at @code{finish_base_struct} time.

What things can this be used on:

TREE_LISTs that are vfields

@item WRITABLE_VTABLES
This is an option that can be defined when building the compiler, that
will cause the compiler to output vtables into the data segment so that
the vtables may be written. This is undefined by default, because
normally the vtables should be unwritable. People who implement object
I/O facilities, or people who want to change the dynamic type of
objects, may want to have the vtables writable. Another way of achieving
this would be to make a copy of the vtable into writable memory, but the
drawback there is that that method only changes the type for one object.

@end table

@node Typical Behavior, Coding Conventions, Macros, Top
@section Typical Behavior

@cindex parse errors

Whenever seemingly normal code fails with errors like
@code{syntax error at `\@{'}, it's highly likely that grokdeclarator is
returning a NULL_TREE for whatever reason.

@node Coding Conventions, Templates, Typical Behavior, Top
@section Coding Conventions

It should never be the case that trees are modified in-place by the
back-end, @emph{unless} it is guaranteed that the semantics are the same
no matter how shared the tree structure is. @file{fold-const.c} still
has some cases where this is not true, but rms hypothesizes that this
will never be a problem.

@node Templates, Access Control, Coding Conventions, Top
@section Templates

A template is represented by a @code{TEMPLATE_DECL}. The specific
fields used are:

@table @code
@item DECL_TEMPLATE_RESULT
The generic decl on which instantiations are based. This looks just
like any other decl.

@item DECL_TEMPLATE_PARMS
The parameters to this template.
@end table

The generic decl is parsed as much like any other decl as possible,
given the parameterization. The template decl is not built up until the
generic decl has been completed. For template classes, a template decl
is generated for each member function and static data member, as well.

Template members of template classes are represented by a TEMPLATE_DECL
for the class' parameters around another TEMPLATE_DECL for the member's
parameters.

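
In source terms, a template member of a template class carries two
parameter lists, which is the nesting that the paragraph above
describes. The following sketch uses hypothetical names (@code{Box},
@code{Pair}, @code{convert_to}, @code{sum_as}) and shows only the
source-level shape, not the tree representation.

@verbatim
```cpp
// Hypothetical class template with a member template.
template <class T>            // outer parameter list: belongs to the class
struct Box {
  T value;

  template <class U>          // inner parameter list: belongs to the member
  U convert_to() const { return static_cast<U>(value); }
};

// An out-of-class definition has to spell out both levels, outermost first.
template <class T>
struct Pair {
  T first, second;
  template <class U> U sum_as() const;
};

template <class T>
template <class U>
U Pair<T>::sum_as() const {
  return static_cast<U>(first) + static_cast<U>(second);
}
```
@end verbatim
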

All declarations that are instantiations or specializations of templates
refer to their template and parameters through DECL_TEMPLATE_INFO.

How should I handle parsing member functions with the proper param
decls? Set them up again or try to use the same ones? Currently we do
the former. We can probably do this without any extra machinery in
store_pending_inline, by deducing the parameters from the decl in
do_pending_inlines. PRE_PARSED_TEMPLATE_DECL?

If a base is a parm, we can't check anything about it. If a base is not
a parm, we need to check it for name binding. Do finish_base_struct if
no bases are parameterized (only if none, including indirect, are
parms). Nah, don't bother trying to do any of this until instantiation
-- we only need to do name binding in advance.

Always set up method vec and fields, inc. synthesized methods. Really?
We can't know the types of the copy folks, or whether we need a
destructor, or can have a default ctor, until we know our bases and
fields. Otherwise, we can assume and fix ourselves later. Hopefully.

@node Access Control, Error Reporting, Templates, Top
@section Access Control
The function compute_access returns one of three values:

@table @code
@item access_public
means that the field can be accessed by the current lexical scope.

@item access_protected
means that the field cannot be accessed by the current lexical scope
because it is protected.

@item access_private
means that the field cannot be accessed by the current lexical scope
because it is private.
@end table
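
The meaning of the three results can be sketched with a toy classifier.
Only the three result names come from the text above; the other
enumerations and @code{toy_compute_access} itself are hypothetical, and
the sketch deliberately ignores friends, access declarations and the
access of the inheritance path, which a real implementation has to
consider.

@verbatim
```cpp
// The three result names come from the text; everything else is hypothetical.
enum access_type { access_public, access_protected, access_private };

enum DeclaredAccess { declared_public, declared_protected, declared_private };
enum ScopeKind { unrelated_scope, derived_class_scope, same_class_scope };

// Toy classifier: ignores friends, access declarations and the access of the
// inheritance path.
inline access_type toy_compute_access(DeclaredAccess declared, ScopeKind scope) {
  if (declared == declared_public || scope == same_class_scope)
    return access_public;      // accessible from the current scope
  if (declared == declared_protected)
    return scope == derived_class_scope ? access_public : access_protected;
  return access_private;       // private member seen from outside its class
}
```
@end verbatim

Note that @code{access_public} here means ``accessible from this
scope'', so a protected member seen from a derived class also yields
@code{access_public} in this sketch.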

DECL_ACCESS is used for access declarations; alter_access creates a list
of types and accesses for a given decl.

Formerly, DECL_@{PUBLIC,PROTECTED,PRIVATE@} corresponded to the return
codes of compute_access and were used as a cache for compute_access.
Now they are not used at all.

TREE_PROTECTED and TREE_PRIVATE are used to record the access levels
granted by the containing class. BEWARE: TREE_PUBLIC means something
completely unrelated to access control!

@node Error Reporting, Parser, Access Control, Top
|
|
@section Error Reporting
|
|
|
|
The C++ front-end uses a call-back mechanism to allow functions to print
|
|
out reasonable strings for types and functions without putting extra
|
|
logic in the functions where errors are found. The interface is through
|
|
the @code{cp_error} function (or @code{cp_warning}, etc.). The
|
|
syntax is exactly like that of @code{error}, except that a few more
|
|
conversions are supported:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
%C indicates a value of `enum tree_code'.
|
|
@item
|
|
%D indicates a *_DECL node.
|
|
@item
|
|
%E indicates a *_EXPR node.
|
|
@item
|
|
%L indicates a value of `enum languages'.
|
|
@item
|
|
%P indicates the name of a parameter (i.e. "this", "1", "2", ...)
|
|
@item
|
|
%T indicates a *_TYPE node.
|
|
@item
|
|
%O indicates the name of an operator (MODIFY_EXPR -> "operator =").
|
|
|
|
@end itemize
|
|
|
|
There is some overlap between these; for instance, any of the node
|
|
options can be used for printing an identifier (though only @code{%D}
|
|
tries to decipher function names).
|
|
|
|
For a more verbose message (@code{class foo} as opposed to just @code{foo},
|
|
including the return type for functions), use @code{%#c}, where @code{c}
is one of the conversion letters listed above.
|
|
To have the line number on the error message indicate the line of the
|
|
DECL, use @code{cp_error_at} and its ilk; to indicate which argument you want,
|
|
use @code{%+D}, or it will default to the first.
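
The call-back idea can be sketched in a few lines of standalone C++.
This is only an illustration under assumptions, not the code in g++:
@code{toy_node}, @code{print_node} and @code{toy_format} are hypothetical
names, and only @code{%D}, @code{%T} and the @code{#} flag are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a tree node; real g++ nodes are far richer.
struct toy_node {
    std::string keyword;  // e.g. "class"; printed only in verbose mode
    std::string name;     // e.g. "foo"
};

// Printer shared by %D and %T in this sketch.
static std::string print_node(const toy_node& n, bool verbose) {
    return verbose && !n.keyword.empty() ? n.keyword + " " + n.name : n.name;
}

// Expand %D, %T, %#D and %#T in fmt, consuming one node per conversion.
// Anything else is copied through unchanged.
std::string toy_format(const std::string& fmt, const std::vector<toy_node>& args) {
    std::string out;
    std::size_t next = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '%' || i + 1 >= fmt.size()) { out += fmt[i]; continue; }
        std::size_t j = i + 1;
        bool verbose = false;
        if (fmt[j] == '#') { verbose = true; ++j; }
        if (j < fmt.size() && (fmt[j] == 'D' || fmt[j] == 'T') && next < args.size()) {
            out += print_node(args[next++], verbose);
            i = j;  // skip past the conversion letter
        } else {
            out += fmt[i];  // not a conversion modelled here; copy the '%'
        }
    }
    return out;
}
```

With a node for @code{class foo}, the format @code{"invalid use of %T"}
yields the short name and @code{"invalid use of %#T"} the verbose one.
The function that found the error never needs to know how a type is
spelled.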
|
|
|
|
@node Parser, Exception Handling, Error Reporting, Top
|
|
@section Parser
|
|
|
|
Some comments on the parser:
|
|
|
|
The @code{after_type_declarator} / @code{notype_declarator} hack is
|
|
necessary in order to allow redeclarations of @code{TYPENAME}s, for
|
|
instance
|
|
|
|
@example
|
|
typedef int foo;
|
|
class A @{
|
|
char *foo;
|
|
@};
|
|
@end example
|
|
|
|
In the above, the first @code{foo} is parsed as a @code{notype_declarator},
|
|
and the second as an @code{after_type_declarator}.
|
|
|
|
Ambiguities:
|
|
|
|
There are currently four reduce/reduce ambiguities in the parser. They are:
|
|
|
|
1) Between @code{template_parm} and
|
|
@code{named_class_head_sans_basetype}, for the tokens @code{aggr
|
|
identifier}. This situation occurs in code looking like
|
|
|
|
@example
|
|
template <class T> class A @{ @};
|
|
@end example
|
|
|
|
It is ambiguous whether @code{class T} should be parsed as the
|
|
declaration of a template type parameter named @code{T} or an unnamed
|
|
constant parameter of type @code{class T}. Section 14.6, paragraph 3 of
|
|
the January '94 working paper states that the first interpretation is
|
|
the correct one. This ambiguity results in two reduce/reduce conflicts.
|
|
|
|
2) Between @code{primary} and @code{type_id} for code like @samp{int()}
|
|
in places where both can be accepted, such as the argument to
|
|
@code{sizeof}. Section 8.1 of the pre-San Diego working paper specifies
|
|
that these ambiguous constructs will be interpreted as @code{typename}s.
|
|
This ambiguity results in six reduce/reduce conflicts between
|
|
@samp{absdcl} and @samp{functional_cast}.
|
|
|
|
3) Between @code{functional_cast} and
|
|
@code{complex_direct_notype_declarator}, for various token strings.
|
|
This situation occurs in code looking like
|
|
|
|
@example
|
|
int (*a);
|
|
@end example
|
|
|
|
This code is ambiguous; it could be a declaration of the variable
|
|
@samp{a} as a pointer to @samp{int}, or it could be a functional cast of
|
|
@samp{*a} to @samp{int}. Section 6.8 specifies that the former
|
|
interpretation is correct. This ambiguity results in 7 reduce/reduce
|
|
conflicts. Another aspect of this ambiguity is code like 'int (x[2]);',
|
|
which is resolved at the '[' and accounts for 6 reduce/reduce conflicts
|
|
between @samp{direct_notype_declarator} and
|
|
@samp{primary}/@samp{overqualified_id}. Finally, there are 4 r/r
|
|
conflicts between @samp{expr_or_declarator} and @samp{primary} over code
|
|
like 'int (a);', which could probably be resolved but would also
|
|
probably be more trouble than it's worth. In all, this situation
|
|
accounts for 17 conflicts. Ack!
|
|
|
|
The second case above is responsible for the failure to parse 'LinppFile
|
|
ppfile (String (argv[1]), &outs, argc, argv);' (from Rogue Wave
|
|
Math.h++) as an object declaration, and must be fixed so that it does
|
|
not resolve until later.
|
|
|
|
4) Indirectly between @code{after_type_declarator} and @code{parm}, for
|
|
type names. This occurs in (as one example) code like
|
|
|
|
@example
|
|
typedef int foo, bar;
|
|
class A @{
|
|
foo (bar);
|
|
@};
|
|
@end example
|
|
|
|
What is @code{bar} inside the class definition? We currently interpret
|
|
it as a @code{parm}, as does Cfront, but IBM xlC interprets it as an
|
|
@code{after_type_declarator}. I believe that xlC is correct, in light
|
|
of 7.1p2, which says "The longest sequence of @i{decl-specifiers} that
|
|
could possibly be a type name is taken as the @i{decl-specifier-seq} of
|
|
a @i{declaration}." However, it seems clear that this rule must be
|
|
violated in the case of constructors. This ambiguity accounts for 8
|
|
conflicts.
|
|
|
|
Unlike the others, this ambiguity is not recognized by the Working Paper.
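
The resolution of ambiguity 3 in favor of declarations can be observed
directly. The following is a small sketch; the function name
@code{parenthesized_declarators} is only an illustration.

```cpp
#include <cassert>

// Each parenthesized form below is a declaration, not a functional cast,
// which is the resolution described for ambiguity 3.
int parenthesized_declarators() {
    int value = 5;
    int (*a) = &value;   // declares a as pointer to int
    int (x[2]);          // declares x as array of two int
    assert(sizeof x == 2 * sizeof(int));  // x really is an array
    x[0] = *a;
    x[1] = 2;
    int (b);             // declares b as int
    b = x[0] + x[1];
    return b;
}
```

If any of the three statements were taken as an expression, the names
@code{a}, @code{x} and @code{b} would be undeclared and the function
would not compile.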
|
|
|
|
@node Exception Handling, Free Store, Parser, Top
|
|
@section Exception Handling
|
|
|
|
Note, exception handling in g++ is still under development.
|
|
|
|
This section describes the mapping of C++ exceptions in the C++
|
|
front-end, into the back-end exception handling framework.
|
|
|
|
The basic mechanism of exception handling in the back-end is
|
|
unwind-protect a la elisp. This is a general, robust, and language
|
|
independent representation for exceptions.
|
|
|
|
The C++ front-end exceptions are mapped into the unwind-protect
semantics by the C++ front-end. The mapping is described below.
|
|
|
|
When -frtti is used, rtti is used to do exception object type checking;
when it isn't used, the encoded name for the type of the object being
thrown is used instead. All code that originates exceptions (even code
that throws exceptions as a side effect, like dynamic casting) and all
code that catches exceptions must be compiled consistently, either all
with -frtti or all with -fno-rtti. It is not possible to mix rtti-based
exception handling objects with code that doesn't use rtti. The
exceptions to this are code that doesn't catch or throw exceptions,
@code{catch (...)}, and code that just rethrows an exception.
|
|
|
|
Currently we use the normal mangling used in building function names
(@code{int} is "i", @code{const char *} is "PCc") to build the
non-rtti-based type descriptors for exception handling. These
descriptors are just plain NUL-terminated strings, and internally they
are passed around as @code{char *}.
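
Matching on such descriptors then comes down to string comparison. The
sketch below is an assumption about how that could look, not the
run-time routine itself: @code{descriptor_match} and
@code{strip_reference} are hypothetical names, and modelling reference
matching by dropping a leading @samp{R} from the handler's descriptor is
my own simplification.

```cpp
#include <cassert>
#include <cstring>

// Strip one leading 'R' (reference) code, if present.
static const char* strip_reference(const char* desc) {
    return desc[0] == 'R' ? desc + 1 : desc;
}

// Return true when a handler's descriptor matches the thrown object's.
// Both arguments are NUL-terminated mangled type strings such as "i" or "PCc".
bool descriptor_match(const char* handler_type, const char* thrown_type) {
    assert(handler_type && thrown_type);
    return std::strcmp(strip_reference(handler_type), thrown_type) == 0;
}
```

Because only the spelling of the type is compared, a handler for a base
class cannot match a thrown derived object in this scheme.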
|
|
|
|
In C++, all cleanups should be protected by exception regions. The
|
|
region starts just after the reason why the cleanup is created has
|
|
ended. For example, with an automatic variable, that has a constructor,
|
|
it would be right after the constructor is run. The region ends just
|
|
before the finalization is expanded. Since the backend may expand the
|
|
cleanup multiple times along different paths, once for normal end of the
|
|
region, once for non-local gotos, once for returns, etc, the backend
|
|
must take special care to protect the finalization expansion, if the
|
|
expansion is for any other reason than normal region end, and it is
|
|
`inline' (it is inside the exception region). The backend can either
|
|
choose to move them out of line, or it can create an exception region
|
|
over the finalization to protect it, and in the handler associated with
|
|
it, it would not run the finalization as it otherwise would have, but
|
|
rather just rethrow to the outer handler, careful to skip the normal
|
|
handler for the original region.
|
|
|
|
In Ada, they will use the more runtime intensive approach of having
|
|
fewer regions, but at the cost of additional work at run time, to keep a
|
|
list of things that need cleanups. When a variable has finished
|
|
construction, they add the cleanup to the list; when they come to the end
of the lifetime of the variable, they run the list down. If they take a
|
|
hit before the section finishes normally, they examine the list for
|
|
actions to perform. I hope they add this logic into the back-end, as it
|
|
would be nice to get that alternative approach in C++.
|
|
|
|
On an rs6000, xlC stores exception objects on the stack, under the try
block. When it unwinds down into a handler, the frame pointer is
|
|
adjusted back to the normal value for the frame in which the handler
|
|
resides, and the stack pointer is left unchanged from the time at which
|
|
the object was thrown. This is so that there is always someplace for
|
|
the exception object, and nothing can overwrite it, once we start
|
|
throwing. The only bad part is that the stack remains large.
|
|
|
|
The following points out some things that work in g++'s exception handling.
|
|
|
|
All completely constructed temps and local variables are cleaned up in
|
|
all unwound scopes. Completely constructed parts of partially
|
|
constructed objects are cleaned up. This includes partially built
|
|
arrays. Exception specifications are now handled. Thrown objects are
|
|
now cleaned up all the time. We can now tell if we have an active
|
|
exception being thrown or not (__eh_type != 0). We use this to call
|
|
terminate if someone does a throw; without there being an active
|
|
exception object. uncaught_exception () works. Exception handling
|
|
should work right if you optimize. Exception handling should work with
|
|
-fpic or -fPIC.
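
The first two claims can be checked with a small program. This is a
sketch with made-up names (@code{Member}, @code{Outer},
@code{count_cleanups}); it counts destructor calls after a constructor
throws.

```cpp
#include <cassert>

static int destroyed = 0;  // counts Member destructor calls

struct Member {
    ~Member() { ++destroyed; }
};

struct Outer {
    Member first;                   // fully constructed before the body runs
    Outer() { throw 1; }            // Outer itself never finishes construction
    ~Outer() { destroyed += 100; }  // must not run
};

// Returns the number of destructor calls seen once the throw is caught.
int count_cleanups() {
    destroyed = 0;
    try {
        Member local;   // completely constructed local variable
        Outer partial;  // throws from its constructor
    } catch (int) {
    }
    return destroyed;   // one for Outer::first, one for local
}
```

A result of 2 means the constructed member and the local were both
cleaned up, and that the destructor of the partially constructed
@code{Outer} was not run.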
|
|
|
|
The following points out some flaws in g++'s exception handling, as it now
|
|
stands.
|
|
|
|
Only exact type matching or reference matching of throw types works when
-fno-rtti is used. Exception handling only works on SPARC (like Suns;
both the -mflat and -mno-flat models work), SPARClite, Hitachi SH, i386,
arm, rs6000, PowerPC, Alpha, mips, VAX, m68k and z8k machines. SPARC v9
may not work. HPPA is mostly done, but throwing between a shared library
and user code doesn't yet work. Some targets have support for
data-driven unwinding. Partial support is in for all other machines, but
a stack unwinder called __unwind_function has to be written, and added
to libgcc2 for them. The new EH code doesn't rely upon the
__unwind_function for C++ code; instead it creates per-function
unwinders right inside the function. Unfortunately, on many platforms
the definition of RETURN_ADDR_RTX in the tm.h file for the machine port
is wrong. See below for details on __unwind_function. RTL_EXPRs for EH
cond variables for && and || exprs should probably be wrapped in
UNSAVE_EXPRs, and RTL_EXPRs tweaked so that they can be unsaved.
|
|
|
|
We only do pointer conversions on exception matching a la 15.3 p2 case
|
|
3: `A handler with type T, const T, T&, or const T& is a match for a
|
|
throw-expression with an object of type E if [3]T is a pointer type and
|
|
E is a pointer type that can be converted to T by a standard pointer
|
|
conversion (_conv.ptr_) not involving conversions to pointers to private
|
|
or protected base classes.' when -frtti is given.
|
|
|
|
We don't call delete on new expressions that die because the ctor threw
|
|
an exception. See except/18 for a test case.
|
|
|
|
15.2 para 13: The exception being handled should be rethrown if control
|
|
reaches the end of a handler of the function-try-block of a constructor
|
|
or destructor; right now, it is not.
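
For reference, this is the behavior that 15.2 para 13 asks for, as a
conforming compiler would show it. The class and function names are
only illustrative.

```cpp
#include <cassert>

struct Base {
    Base() { throw 42; }
};

struct Derived : Base {
    // Function-try-block: the handler covers the base and member initializers.
    Derived() try : Base() {
    } catch (int) {
        // No explicit throw here; reaching the end rethrows automatically.
    }
};

// Returns the value that escapes the constructor, or -1 if nothing escapes.
int value_seen_by_caller() {
    try {
        Derived d;
    } catch (int e) {
        return e;
    }
    return -1;
}
```

A return value of -1 would mean the handler swallowed the exception,
which is the flaw noted above.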
|
|
|
|
15.2 para 12: If a return statement appears in a handler of the
|
|
function-try-block of a constructor, the program is ill-formed, but this
|
|
isn't diagnosed.
|
|
|
|
15.2 para 11: If the handlers of a function-try-block contain a jump
|
|
into the body of a constructor or destructor, the program is ill-formed,
|
|
but this isn't diagnosed.
|
|
|
|
15.2 para 9: Check that the fully constructed base classes and members
|
|
of an object are destroyed before entering the handler of a
|
|
function-try-block of a constructor or destructor for that object.
|
|
|
|
build_exception_variant should sort the incoming list, so that it
|
|
implements set compares, not exact list equality. Type smashing should
|
|
smash exception specifications using set union.
|
|
|
|
Thrown objects are usually allocated on the heap, in the usual way. If
|
|
one runs out of heap space, throwing an object will probably never work.
|
|
This could be relaxed some by passing an __in_chrg parameter to track
|
|
who has control over the exception object. Thrown objects are not
|
|
allocated on the heap when they are pointer to object types. We should
|
|
extend it so that all small (<4*sizeof(void*)) objects are stored
|
|
directly, instead of allocated on the heap.
|
|
|
|
When the backend returns a value, it can create new exception regions
|
|
that need protecting. The new region should rethrow the object in
|
|
context of the last associated cleanup that ran to completion.
|
|
|
|
The structure of the code that is generated for C++ exception handling
is shown below:
|
|
|
|
@example
|
|
Ln: throw value;
|
|
copy value onto heap
|
|
jump throw (Ln, id, address of copy of value on heap)
|
|
|
|
try @{
|
|
+Lstart: the start of the main EH region
|
|
|... ...
|
|
+Lend: the end of the main EH region
|
|
@} catch (T o) @{
|
|
...1
|
|
@}
|
|
Lresume:
|
|
nop used to make sure there is something before
|
|
the next region ends, if there is one
|
|
... ...
|
|
|
|
jump Ldone
|
|
[
|
|
Lmainhandler: handler for the region Lstart-Lend
|
|
cleanup
|
|
] zero or more, depending upon automatic vars with dtors
|
|
+Lpartial:
|
|
| jump Lover
|
|
+Lhere:
|
|
rethrow (Lhere, same id, same obj);
|
|
Lterm: handler for the region Lpartial-Lhere
|
|
call terminate
|
|
Lover:
|
|
[
|
|
[
|
|
call throw_type_match
|
|
if (eq) @{
|
|
] these lines disappear when there is no catch condition
|
|
+Lsregion2:
|
|
| ...1
|
|
| jump Lresume
|
|
|Lhandler: handler for the region Lsregion2-Leregion2
|
|
| rethrow (Lresume, same id, same obj);
|
|
+Leregion2
|
|
@}
|
|
] there are zero or more of these sections, depending upon how many
|
|
catch clauses there are
|
|
----------------------------- expand_end_all_catch --------------------------
|
|
here we have fallen off the end of all catch
|
|
clauses, so we rethrow to outer
|
|
rethrow (Lresume, same id, same obj);
|
|
----------------------------- expand_end_all_catch --------------------------
|
|
[
|
|
L1: maybe throw routine
|
|
] depending upon if we have expanded it or not
|
|
Ldone:
|
|
ret
|
|
|
|
start_all_catch emits labels: Lresume,
|
|
|
|
@end example
|
|
|
|
The __unwind_function takes a pointer to the throw handler, and is
|
|
expected to pop the stack frame that was built to call it, as well as
|
|
the frame underneath and then jump to the throw handler. It must
|
|
restore all registers to their proper values as well as all other
|
|
machine state as determined by the context in which we are unwinding
|
|
into. The way I normally start is to compile:
|
|
|
|
@example
void *g;
foo(void* a) @{ g = a; @}
@end example

@noindent
with -S, and change the thing that alters the PC (return, or ret
|
|
usually) to not alter the PC, making sure to leave all other semantics
|
|
(like adjusting the stack pointer, or frame pointers) in. After that,
|
|
replicate the prologue once more at the end, again, changing the PC
|
|
altering instructions, and finally, at the very end, jump to `g'.
|
|
|
|
It takes about a week to write this routine. If someone wants to
|
|
volunteer to write this routine for any architecture, exception support
|
|
for that architecture will be added to g++. Please send in those code
|
|
donations. One other thing that needs to be done is to double check
|
|
that __builtin_return_address (0) works.
|
|
|
|
@subsection Specific Targets
|
|
|
|
For the alpha, the __unwind_function will be something resembling:
|
|
|
|
@example
|
|
void
|
|
__unwind_function(void *ptr)
|
|
@{
|
|
/* First frame */
|
|
asm ("ldq $15, 8($30)"); /* get the saved frame ptr; 15 is fp, 30 is sp */
|
|
asm ("bis $15, $15, $30"); /* reload sp with the fp we found */
|
|
|
|
/* Second frame */
|
|
asm ("ldq $15, 8($30)"); /* fp */
|
|
asm ("bis $15, $15, $30"); /* reload sp with the fp we found */
|
|
|
|
/* Return */
|
|
asm ("ret $31, ($16), 1"); /* return to PTR, stored in a0 */
|
|
@}
|
|
@end example
|
|
|
|
@noindent
|
|
However, there are a few problems preventing it from working. First of
|
|
all, the gcc-internal function @code{__builtin_return_address} needs to
|
|
work given an argument of 0 for the alpha. As it stands as of August
|
|
30th, 1995, the code for @code{BUILT_IN_RETURN_ADDRESS} in @file{expr.c}
|
|
will definitely not work on the alpha. Instead, we need to define
|
|
the macros @code{DYNAMIC_CHAIN_ADDRESS} (maybe),
|
|
@code{RETURN_ADDR_IN_PREVIOUS_FRAME}, and definitely need a new
|
|
definition for @code{RETURN_ADDR_RTX}.
|
|
|
|
In addition (and more importantly), we need a way to reliably find the
|
|
frame pointer on the alpha. The use of the value 8 above to restore the
|
|
frame pointer (register 15) is incorrect. On many systems, the frame
|
|
pointer is consistently offset to a specific point on the stack. On the
|
|
alpha, however, the frame pointer is pushed last. First the return
|
|
address is stored, then any other registers are saved (e.g., @code{s0}),
|
|
and finally the frame pointer is put in place. So @code{fp} could have
|
|
an offset of 8, but if the calling function saved any registers at all,
|
|
they add to the offset.
|
|
|
|
The only places the frame size is noted are with the @samp{.frame}
|
|
directive, for use by the debugger and the OSF exception handling model
|
|
(useless to us), and in the initial computation of the new value for
|
|
@code{sp}, the stack pointer. For example, the function may start with:
|
|
|
|
@example
|
|
lda $30,-32($30)
|
|
.frame $15,32,$26,0
|
|
@end example
|
|
|
|
@noindent
|
|
The 32 above is exactly the value we need. With this, we can be sure
|
|
that the frame pointer is stored 8 bytes less (in this case, at 24(sp)).
|
|
The drawback is that there is no way that I (Brendan) have found to let
|
|
us discover the size of a previous frame @emph{inside} the definition
|
|
of @code{__unwind_function}.
|
|
|
|
So to accomplish exception handling support on the alpha, we need two
|
|
things: first, a way to figure out where the frame pointer was stored,
|
|
and second, a functional @code{__builtin_return_address} implementation
|
|
that @file{except.c} can use.
|
|
|
|
Or just support DWARF 2 unwind info.
|
|
|
|
@subsection New Backend Exception Support
|
|
|
|
This subsection discusses various aspects of the design of the
|
|
data-driven model being implemented for the exception handling backend.
|
|
|
|
The goal is to generate enough data during the compilation of user code,
|
|
such that we can dynamically unwind through functions at run time with a
|
|
single routine (@code{__throw}) that lives in libgcc.a, built by the
|
|
compiler, and dispatch into associated exception handlers.
|
|
|
|
This information is generated by the DWARF 2 debugging backend, and
|
|
includes all of the information __throw needs to unwind an arbitrary
|
|
frame. It specifies where all of the saved registers and the return
|
|
address can be found at any point in the function.
|
|
|
|
Major disadvantages when enabling exceptions are:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
Code cannot keep values in caller-saved registers across points where
flow can be transferred into that code from an exception handler. In
high performance code this should not usually be true, so the effects
should be minimal.
|
|
|
|
@end itemize
|
|
|
|
@subsection Backend Exception Support
|
|
|
|
The backend must be extended to fully support exceptions. Right now
there are a few hooks from the backend into the alpha exception handling
backend that resides in the C++ frontend, and these hooks allow
exception handling to work in g++. An exception region is a segment of
generated code that has a handler associated with it. The exception
regions are represented in the generated code as address ranges, each
given by a starting PC value and an ending PC value. Some of the
limitations with this scheme are:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
The backend replicates insns for such things as loop unrolling and
|
|
function inlining. Right now, there are no hooks into the frontend's
|
|
exception handling backend to handle the replication of insns. When
|
|
replication happens, a new exception region descriptor needs to be
|
|
generated for the new region.
|
|
|
|
@item
|
|
The backend expects to be able to rearrange code, for things like jump
|
|
optimization. Any rearranging of the code needs to have exception region
|
|
descriptors updated appropriately.
|
|
|
|
@item
|
|
The backend can eliminate dead code. Any associated exception region
|
|
descriptor that refers to fully contained code that has been eliminated
|
|
should also be removed, although not doing this is harmless in terms of
|
|
semantics.
|
|
|
|
@end itemize
|
|
|
|
The above is not meant to be exhaustive, but does include all things I
|
|
have thought of so far. I am sure other limitations exist.
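
To make the address-range scheme concrete, here is a minimal sketch of a
region table and its lookup. It is not the g++ data structure:
@code{eh_region} and @code{find_handler} are hypothetical names, and
treating the ranges as half-open and picking the smallest enclosing
region as the innermost one are assumptions of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical descriptor: a half-open PC range [start, end) and its handler.
struct eh_region {
    std::uintptr_t start;
    std::uintptr_t end;
    std::uintptr_t handler;
};

// Return the handler of the innermost (smallest) region containing pc,
// or 0 when no region covers it.
std::uintptr_t find_handler(const std::vector<eh_region>& table, std::uintptr_t pc) {
    const eh_region* best = nullptr;
    for (const eh_region& r : table) {
        assert(r.start <= r.end);
        if (pc < r.start || pc >= r.end) continue;
        if (!best || (r.end - r.start) < (best->end - best->start)) best = &r;
    }
    return best ? best->handler : 0;
}
```

The limitations listed above follow from this shape. A replicated or
moved insn has a new PC, so it is covered only if a descriptor is added
or updated for it, while a stale descriptor for deleted code simply
never matches.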
|
|
|
|
Below are some notes on the migration of the exception handling code
|
|
backend from the C++ frontend to the backend.
|
|
|
|
NOTEs are to be used to denote the start of an exception region, and the
|
|
end of the region. I presume that the interface used to generate these
|
|
notes in the backend would be two functions, start_exception_region and
|
|
end_exception_region (or something like that). The frontends are
|
|
required to call them in pairs. When marking the end of a region, an
|
|
argument can be passed to indicate the handler for the marked region.
|
|
This can be passed in many ways, currently a tree is used. Another
|
|
possibility would be insns for the handler, or a label that denotes a
|
|
handler. I have a feeling insns might be the best way to pass it.
|
|
Semantics are, if an exception is thrown inside the region, control is
|
|
transferred unconditionally to the handler. If control passes through
|
|
the handler, then the backend is to rethrow the exception, in the
|
|
context of the end of the original region. The handler is protected by
|
|
the conventional mechanisms; it is the frontend's responsibility to
|
|
protect the handler, if special semantics are required.
|
|
|
|
This is a very low level view, and it would be nice if the backend
|
|
supported a somewhat higher level view in addition to this view. This
|
|
higher level could include source line number, name of the source file,
|
|
name of the language that threw the exception and possibly the name of
|
|
the exception. Kenner may want to rope you into doing more than just
|
|
the basics required by C++. You will have to resolve this. He may want
|
|
you to do support for non-local gotos, first scan for exception handler,
|
|
if none is found, allow the debugger to be entered, without any cleanups
|
|
being done. To do this, the backend would have to know the difference
|
|
between a cleanup-rethrower and a real handler; it would also have to
|
|
have a way to know if a handler `matches' a thrown exception, and this
|
|
is frontend specific.
|
|
|
|
The stack unwinder is one of the hardest parts to do. It is highly
|
|
machine dependent. The form that Kenner seemed to like was a couple of
|
|
macros, that would do the machine dependent grunt work. One preexisting
|
|
function that might be of some use is __builtin_return_address (). One
|
|
macro he seemed to want was __builtin_return_address, and the other
|
|
would do the hard work of fixing up the registers, adjusting the stack
|
|
pointer, frame pointer, arg pointer and so on.
|
|
|
|
|
|
@node Free Store, Mangling, Exception Handling, Top
|
|
@section Free Store
|
|
|
|
@code{operator new []} adds a magic cookie to the beginning of arrays
|
|
for which the number of elements will be needed by @code{operator delete
|
|
[]}. These are arrays of objects with destructors and arrays of objects
|
|
that define @code{operator delete []} with the optional size_t argument.
|
|
This cookie can be examined from a program as follows:
|
|
|
|
@example
|
|
typedef unsigned long size_t;
|
|
extern "C" int printf (const char *, ...);
|
|
|
|
size_t nelts (void *p)
|
|
@{
|
|
struct cookie @{
|
|
size_t nelts __attribute__ ((aligned (sizeof (double))));
|
|
@};
|
|
|
|
cookie *cp = (cookie *)p;
|
|
--cp;
|
|
|
|
return cp->nelts;
|
|
@}
|
|
|
|
struct A @{
|
|
~A() @{ @}
|
|
@};
|
|
|
|
int main()
@{
  A *ap = new A[3];
  printf ("%lu\n", nelts (ap));
|
|
@}
|
|
@end example
|
|
|
|
@section Linkage
|
|
The linkage code in g++ is horribly twisted in order to meet two design goals:
|
|
|
|
1) Avoid unnecessary emission of inlines and vtables.
|
|
|
|
2) Support pedantic assemblers like the one in AIX.
|
|
|
|
To meet the first goal, we defer emission of inlines and vtables until
|
|
the end of the translation unit, where we can decide whether or not they
|
|
are needed, and how to emit them if they are.
|
|
|
|
@node Mangling, Vtables, Free Store, Top
|
|
@section Function name mangling for C++ and Java
|
|
|
|
Both C++ and Java provide overloaded functions and methods,
which are methods with the same name but different parameter lists.
|
|
Selecting the correct version is done at compile time.
|
|
Though the overloaded functions have the same name in the source code,
|
|
they need to be translated into different assembler-level names,
|
|
since typical assemblers and linkers cannot handle overloading.
|
|
This process of encoding the parameter types with the method name
|
|
into a unique name is called @dfn{name mangling}. The inverse
|
|
process is called @dfn{demangling}.
|
|
|
|
It is convenient that C++ and Java use compatible mangling schemes,
|
|
since that makes life easier for tools such as gdb, and it eases
|
|
integration between C++ and Java.
|
|
|
|
Note there is also a standard "Java Native Interface" (JNI) which
|
|
implements a different calling convention, and uses a different
|
|
mangling scheme. The JNI is a rather abstract ABI so Java can call methods
|
|
written in C or C++;
|
|
we are concerned here about a lower-level interface primarily
|
|
intended for methods written in Java, but that can also be used for C++
|
|
(and less easily C).
|
|
|
|
Note that on systems that follow BSD tradition, a C identifier @code{var}
|
|
would get "mangled" into the assembler name @samp{_var}. On such
|
|
systems, all other mangled names are also prefixed by a @samp{_}
|
|
which is not shown in the following examples.
|
|
|
|
@subsection Method name mangling
|
|
|
|
C++ mangles a method by emitting the function name, followed by @code{__},
|
|
followed by encodings of any method qualifiers (such as @code{const}),
|
|
followed by the mangling of the method's class,
|
|
followed by the mangling of the parameters, in order.
|
|
|
|
For example @code{Foo::bar(int, long) const} is mangled
|
|
as @samp{bar__C3Fooil}.
|
|
|
|
For a constructor, the method name is left out.
|
|
That is, @code{Foo::Foo(int, long)} is mangled
as @samp{__3Fooil}.
|
|
|
|
GNU Java does the same.
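
The rule composes already-mangled pieces, which a few lines of C++ can
show. This is a sketch, not the mangler in g++; @code{mangle_method} and
its parameters are hypothetical, and @samp{C} for @code{const} is the
only qualifier modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build a mangled method name from already-mangled pieces.
// name:        source-level method name; empty for a constructor
// is_const:    true when the method carries the const qualifier ('C')
// class_code:  mangled class name, e.g. "3Foo"
// param_codes: mangled parameter types in order, e.g. {"i", "l"}
std::string mangle_method(const std::string& name, bool is_const,
                          const std::string& class_code,
                          const std::vector<std::string>& param_codes) {
    assert(!class_code.empty());
    std::string out = name + "__";
    if (is_const) out += 'C';
    out += class_code;
    for (const std::string& p : param_codes) out += p;
    return out;
}
```

Calling it with @code{"bar"}, the @code{const} flag set, @code{"3Foo"}
and the codes @code{"i"} and @code{"l"} reproduces
@samp{bar__C3Fooil}.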
|
|
|
|
@subsection Primitive types
|
|
|
|
The C++ types @code{int}, @code{long}, @code{short}, @code{char},
|
|
and @code{long long} are mangled as @samp{i}, @samp{l},
|
|
@samp{s}, @samp{c}, and @samp{x}, respectively.
|
|
The corresponding unsigned types have @samp{U} prefixed
|
|
to the mangling. The type @code{signed char} is mangled @samp{Sc}.
|
|
|
|
The C++ and Java floating-point types @code{float} and @code{double}
|
|
are mangled as @samp{f} and @samp{d} respectively.
|
|
|
|
The C++ @code{bool} type and the Java @code{boolean} type are
|
|
mangled as @samp{b}.
|
|
|
|
The C++ @code{wchar_t} and the Java @code{char} types are
|
|
mangled as @samp{w}.
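
The C++ codes listed so far fit in a small lookup table. The following
sketch assumes the type is given as a canonical spelling such as
@code{"unsigned long"}; @code{mangle_primitive} is a hypothetical name
and spellings it does not know yield an empty string.

```cpp
#include <cassert>
#include <map>
#include <string>

// Mangle a C++ primitive type spelled in canonical form, e.g. "unsigned long".
// Returns an empty string for spellings this sketch does not know.
std::string mangle_primitive(const std::string& type) {
    static const std::map<std::string, std::string> codes = {
        {"int", "i"}, {"long", "l"}, {"short", "s"}, {"char", "c"},
        {"long long", "x"}, {"float", "f"}, {"double", "d"},
        {"bool", "b"}, {"wchar_t", "w"}, {"signed char", "Sc"},
    };
    const std::string prefix = "unsigned ";
    if (type.compare(0, prefix.size(), prefix) == 0) {
        // Unsigned integer types: 'U' followed by the signed type's code.
        const std::string base = mangle_primitive(type.substr(prefix.size()));
        const bool integral = base == "i" || base == "l" || base == "s" ||
                              base == "c" || base == "x";
        return integral ? "U" + base : "";
    }
    const auto it = codes.find(type);
    return it == codes.end() ? "" : it->second;
}
```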
|
|
|
|
The Java integral types @code{byte}, @code{short}, @code{int}
|
|
and @code{long} are mangled as @samp{c}, @samp{s}, @samp{i},
|
|
and @samp{x}, respectively.
|
|
|
|
C++ code that has included @code{javatypes.h} will mangle
|
|
the typedefs @code{jbyte}, @code{jshort}, @code{jint}
|
|
and @code{jlong} as respectively @samp{c}, @samp{s}, @samp{i},
|
|
and @samp{x}. (This has not been implemented yet.)
|
|
|
|
@subsection Mangling of simple names
|
|
|
|
A simple class, package, template, or namespace name is
|
|
encoded as the number of characters in the name, followed by
|
|
the actual characters. Thus the class @code{Foo}
|
|
is encoded as @samp{3Foo}.
|
|
|
|
If any of the characters in the name are not alphanumeric
|
|
(i.e., not one of the standard ASCII letters, digits, or '_'),
|
|
or the initial character is a digit, then the name is
|
|
mangled as a sequence of encoded Unicode letters.
|
|
A Unicode encoding starts with a @samp{U} to indicate
|
|
that Unicode escapes are used, followed by the number of
|
|
bytes used by the Unicode encoding, followed by the bytes
|
|
representing the encoding. ASCII letters and
|
|
non-initial digits are encoded without change. However, all
|
|
other characters (including underscore and initial digits) are
|
|
translated into a sequence starting with an underscore,
|
|
followed by the big-endian 4-hex-digit lower-case encoding of the character.
|
|
|
|
If a method name contains Unicode-escaped characters, the
|
|
entire mangled method name is followed by a @samp{U}.
|
|
|
|
For example, the method @code{X\u0319::M\u002B(int)} is encoded as
|
|
@samp{M_002b__U6X_0319iU}.
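
The rules for simple names, including the Unicode escapes, can be
written down as follows. This is a sketch under assumptions:
@code{mangle_simple_name} is a hypothetical name, the input is taken as
UTF-16 code units, and the trailing @samp{U} on method names is not
modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

static bool is_ascii_letter(char16_t c) {
    return (c >= u'a' && c <= u'z') || (c >= u'A' && c <= u'Z');
}
static bool is_ascii_digit(char16_t c) { return c >= u'0' && c <= u'9'; }

// Mangle a simple class, package, template, or namespace name.
std::string mangle_simple_name(const std::u16string& name) {
    assert(!name.empty());
    bool plain = !is_ascii_digit(name[0]);
    for (char16_t c : name)
        if (!is_ascii_letter(c) && !is_ascii_digit(c) && c != u'_') plain = false;
    if (plain) {
        // Every character is ASCII here, so narrowing is safe.
        std::string chars;
        for (char16_t c : name) chars += static_cast<char>(c);
        return std::to_string(chars.size()) + chars;
    }
    static const char hex[] = "0123456789abcdef";
    std::string enc;
    for (std::size_t i = 0; i < name.size(); ++i) {
        const char16_t c = name[i];
        if (is_ascii_letter(c) || (is_ascii_digit(c) && i != 0)) {
            enc += static_cast<char>(c);
        } else {
            // Escape as '_' plus four lower-case hex digits, high nibble first.
            enc += '_';
            for (int shift = 12; shift >= 0; shift -= 4) enc += hex[(c >> shift) & 0xF];
        }
    }
    return "U" + std::to_string(enc.size()) + enc;
}
```

For the class in the example above, @code{X\u0319}, this gives
@samp{U6X_0319}: six bytes, made of the unchanged @samp{X} and the
five-byte escape.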
|
|
|
|
|
|
@subsection Pointer and reference types
|
|
|
|
A C++ pointer type is mangled as @samp{P} followed by the
|
|
mangling of the type pointed to.
|
|
|
|
A C++ reference type is mangled as @samp{R} followed by the
|
|
mangling of the type referenced.
|
|
|
|
A Java object reference type is equivalent
|
|
to a C++ pointer parameter, so we mangle such a parameter type
|
|
as @samp{P} followed by the mangling of the class name.
|
|
|
|
@subsection Squangled type compression
|
|
|
|
Squangling (enabled with the @samp{-fsquangle} option) utilizes the
@samp{B} code to indicate reuse of a previously seen type within an
identifier. Types are recognized in a left to right manner and given
increasing values, which are appended to the code in the standard
manner. That is, multiple digit numbers are delimited by @samp{_}
characters. A type is considered to be any non primitive type,
regardless of whether it's a parameter, template parameter, or entire
|
|
template. Certain codes are considered modifiers of a type, and are not
|
|
included as part of the type. These are the @samp{C}, @samp{V},
|
|
@samp{P}, @samp{A}, @samp{R}, @samp{U} and @samp{u} codes, denoting
|
|
constant, volatile, pointer, array, reference, unsigned, and restrict.
|
|
These codes may precede a @samp{B} type in order to make the required
|
|
modifications to the type.
|
|
|
|
For example:
|
|
@example
|
|
template <class T> class class1 @{ @};
|
|
|
|
template <class T> class class2 @{ @};
|
|
|
|
class class3 @{ @};
|
|
|
|
int f(class2<class1<class3> > a ,int b, const class1<class3>&c, class3 *d) @{ @}
|
|
|
|
B0 -> class2<class1<class3> >
|
|
B1 -> class1<class3>
|
|
B2 -> class3
|
|
@end example
|
|
This produces the mangled name @samp{f__FGt6class21Zt6class11Z6class3iRCB1PB2}.
The int parameter is a basic type, and does not receive a B encoding.
|
|
|
|
@subsection Qualified names
|
|
|
|
Both C++ and Java allow a class to be lexically nested inside another
|
|
class. C++ also supports namespaces (not yet implemented by G++).
|
|
Java also supports packages.
|
|
|
|
These are all mangled the same way: First the letter @samp{Q}
|
|
indicates that we are emitting a qualified name.
|
|
That is followed by the number of parts in the qualified name.
|
|
If that number is 9 or less, it is emitted with no delimiters.
|
|
Otherwise, an underscore is written before and after the count.
|
|
Then follows each part of the qualified name, as described above.
|
|
|
|
For example @code{Foo::\u0319::Bar} is encoded as
|
|
@samp{Q33FooU5_03193Bar}.
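
The counting rule is easy to get wrong by hand, so here is a short
sketch of it. @code{mangle_qualified} is a hypothetical name, and the
parts are assumed to be mangled already.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mangle a qualified name from its already-mangled parts, outermost first.
std::string mangle_qualified(const std::vector<std::string>& parts) {
    assert(!parts.empty());
    const std::string count = std::to_string(parts.size());
    // Counts of 10 or more are wrapped in underscores.
    std::string out = parts.size() <= 9 ? "Q" + count : "Q_" + count + "_";
    for (const std::string& p : parts) out += p;
    return out;
}
```

Given the parts @samp{3Foo}, @samp{U5_0319} and @samp{3Bar}, it
reproduces the encoding shown above.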
|
|
|
|
Squangling utilizes the letter @samp{K} to indicate a
|
|
remembered portion of a qualified name. As qualified names are processed
|
|
for an identifier, the names are numbered and remembered in a
|
|
manner similar to the @samp{B} type compression code.
|
|
Names are recognized left to right, and given increasing values, which are
|
|
appended to the code in the standard manner. That is, multiple digit numbers
|
|
are delimited by @samp{_} characters.
|
|
|
|
For example
|
|
@example
|
|
class Andrew
|
|
@{
|
|
class WasHere
|
|
@{
|
|
class AndHereToo
|
|
@{
|
|
@};
|
|
@};
|
|
@};
|
|
|
|
f(Andrew&r1, Andrew::WasHere& r2, Andrew::WasHere::AndHereToo& r3) @{ @}
|
|
|
|
K0 -> Andrew
|
|
K1 -> Andrew::WasHere
|
|
K2 -> Andrew::WasHere::AndHereToo
|
|
@end example
|
|
Function @samp{f()} would be mangled as :
|
|
@samp{f__FR6AndrewRQ2K07WasHereRQ2K110AndHereToo}
|
|
|
|
There are some occasions when either a @samp{B} or @samp{K} code could
|
|
be chosen, preference is always given to the @samp{B} code. Ie, the example
|
|
in the section on @samp{B} mangling could have used a @samp{K} code
|
|
instead of @samp{B2}.
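
The @samp{Q} count rule and the @samp{K} numbering can be illustrated with
a small stand-alone C++ sketch. It is not the code used by the front end:
the class @code{QualifiedNameTable} and its members are hypothetical, the
rule of replacing the longest remembered prefix by its @samp{K} code is
inferred from the example above, and wrapping a multiple-digit @samp{K}
number in underscores is an assumption modeled on the @samp{Q} count.
Only plain ASCII name parts are handled, and repeated whole types (which
would receive a @samp{B} code) are not modeled. Fed the three parameter
types of @samp{f()} in order, each prefixed with @samp{R}, the sketch
reproduces the parameter encoding shown above; on a fresh table,
@code{Foo::Bar} gives @samp{Q23Foo3Bar}.

@example
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, not the data structure used by g++.
class QualifiedNameTable
@{
public:
  // Mangle one name given as its parts, outermost first.
  std::string mangle(const std::vector<std::string>& parts)
  @{
    // Find the longest remembered name that is a proper prefix.
    std::size_t known = 0;  // leading parts covered by that prefix
    std::size_t code = 0;   // its K number
    for (std::size_t k = 0; k < remembered_.size(); ++k)
      @{
        const std::vector<std::string>& r = remembered_[k];
        if (r.size() > known && r.size() < parts.size()
            && std::equal(r.begin(), r.end(), parts.begin()))
          @{
            known = r.size();
            code = k;
          @}
      @}

    std::string body;
    std::size_t pieces = parts.size() - known;
    if (known > 0)
      @{
        body = "K" + number(code);  // the prefix counts as one piece
        ++pieces;
      @}
    for (std::size_t i = known; i < parts.size(); ++i)
      @{
        body += std::to_string(parts[i].size()) + parts[i];
        // Remember each newly seen prefix; it gets the next K number.
        std::vector<std::string> prefix(parts.begin(),
                                        parts.begin() + i + 1);
        if (std::find(remembered_.begin(), remembered_.end(), prefix)
            == remembered_.end())
          remembered_.push_back(prefix);
      @}
    // A single unqualified name needs no Q prefix.
    return pieces > 1 ? "Q" + number(pieces) + body : body;
  @}

private:
  // Numbers of 9 or less are written bare, larger ones between
  // underscores (assumed to apply to K numbers as well).
  static std::string number(std::size_t n)
  @{
    std::string digits = std::to_string(n);
    return n <= 9 ? digits : "_" + digits + "_";
  @}

  std::vector<std::vector<std::string> > remembered_;
@};
@end example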

@subsection Templates

A class template instantiation is encoded as the letter @samp{t},
followed by the encoding of the template name, followed by
the number of template parameters, followed by the encoding of the template
parameters. If a template parameter is a type, it is written
as a @samp{Z} followed by the encoding of the type.

A function template specialization (either an instantiation or an
explicit specialization) is encoded by an @samp{H} followed by the
encoding of the template parameters, as described above, followed by an
@samp{_}, the encoding of the argument types to the template function
(not the specialization), another @samp{_}, and the return type. (Like
the argument types, the return type is the return type of the function
template, not the specialization.) Template parameters in the argument
and return types are encoded by an @samp{X} for type parameters, or a
@samp{Y} for constant parameters, an index indicating their position
in the template parameter list declaration, and their template depth.
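
The class template rule given at the start of this subsection can be
sketched as follows. The @code{ClassType} structure and the
@code{mangle_class} function are hypothetical and handle only class names
and type parameters that are themselves classes; writing a parameter count
above 9 as plain decimal digits is an assumption. For
@code{class1<class3>} the sketch yields @samp{t6class11Z6class3}, the same
sequence that appears inside the mangled name in the @samp{B} example
earlier.

@example
#include <string>
#include <vector>

// Hypothetical model of a class type: a name plus zero or more
// template arguments, all of which are themselves class types.
struct ClassType
@{
  std::string name;
  std::vector<ClassType> args;
@};

// Sketch only: a plain class is its length-prefixed name; an
// instantiation is 't', the name, the parameter count, and then
// 'Z' followed by the encoding of each type parameter.
std::string mangle_class(const ClassType& type)
@{
  std::string name = std::to_string(type.name.size()) + type.name;
  if (type.args.empty())
    return name;
  std::string out = "t" + name + std::to_string(type.args.size());
  for (const ClassType& arg : type.args)
    out += "Z" + mangle_class(arg);
  return out;
@}
@end example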

@subsection Arrays

C++ array types are mangled by emitting @samp{A}, followed by
the length of the array, followed by an @samp{_}, followed by
the mangling of the element type. Of course, normally
array parameter types decay into pointer types, so you
don't see this.

Java arrays are objects. A Java type @code{T[]} is mangled
as if it were the C++ type @code{JArray<T>}.
For example @code{java.lang.String[]} is encoded as
@samp{Pt6JArray1ZPQ34java4lang6String}.

@subsection Static fields

Both C++ and Java classes can have static fields.
These are allocated statically, and are shared among all instances.

The mangling starts with a prefix (@samp{_} in most systems), which is
followed by the mangling
of the class name, followed by the "joiner" and finally the field name.
The joiner (see @code{JOINER} in @code{cp-tree.h}) is a special
separator character. For historical reasons (and idiosyncrasies
of assembler syntax) it can be @samp{$} or @samp{.} (or even
@samp{_} on a few systems). If the joiner is @samp{_} then the prefix
is @samp{__static_} instead of just @samp{_}.

For example @code{Foo::Bar::var} (or @code{Foo.Bar.var} in Java syntax)
would be encoded as @samp{_Q23Foo3Bar$var} or @samp{_Q23Foo3Bar.var}
(or rarely @samp{__static_Q23Foo3Bar_var}).

If the name of a static variable needs Unicode escapes,
the Unicode indicator @samp{U} comes before the "joiner".
Thus @code{\u1234Foo::var\u3445} becomes @code{_U8_1234FooU.var_3445}.
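
The prefix and joiner rule for static fields can be written down
directly. This is an illustrative sketch with a hypothetical function
name, not the front end's code; it takes the already mangled class name
as input and does not handle Unicode escapes. Called with
@samp{Q23Foo3Bar}, @samp{var} and each of the three joiners, it gives the
three encodings listed in the example above.

@example
#include <string>

// Hypothetical helper.  class_mangling is the already mangled class
// name; joiner is the target's JOINER character ('$', '.' or '_').
std::string mangle_static_field(const std::string& class_mangling,
                                const std::string& field, char joiner)
@{
  // When the joiner is '_', the longer prefix is used.
  std::string prefix = joiner == '_' ? "__static_" : "_";
  return prefix + class_mangling + joiner + field;
@}
@end example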

@subsection Table of demangling code characters

The following special characters are used in mangling:

@table @samp
@item A
Indicates a C++ array type.

@item b
Encodes the C++ @code{bool} type,
and the Java @code{boolean} type.

@item B
Used for squangling. Similar in concept to the non-squangled @samp{T} code.

@item c
Encodes the C++ @code{char} type, and the Java @code{byte} type.

@item C
A modifier to indicate a @code{const} type.
Also used to indicate a @code{const} member function
(in which case it precedes the encoding of the method's class).

@item d
Encodes the C++ and Java @code{double} types.

@item e
Indicates extra unknown arguments @code{...}.

@item E
Indicates the opening parenthesis of an expression.

@item f
Encodes the C++ and Java @code{float} types.

@item F
Used to indicate a function type.

@item H
Used to indicate a template function.

@item i
Encodes the C++ and Java @code{int} types.

@item I
Encodes typedef names of the form @code{int@var{n}_t}, where @var{n} is a
positive decimal number. The @samp{I} is followed by either two
hexadecimal digits, which encode the value of @var{n}, or by an
arbitrary number of hexadecimal digits between underscores. For
example, @samp{I40} encodes the type @code{int64_t}, and @samp{I_200_}
encodes the type @code{int512_t}.

@item J
Indicates a complex type.

@item K
Used by squangling to compress qualified names.

@item l
Encodes the C++ @code{long} type.

@item n
Immediate repeated type. Followed by the repeat count.

@item N
Repeated type. Followed by the repeat count of the repeated type,
followed by the type index of the repeated type. Due to a bug in
g++ 2.7.2, this is only generated if the index is 0. Superseded by
@samp{n} when squangling.

@item P
Indicates a pointer type. Followed by the type pointed to.

@item Q
Used to mangle qualified names, which arise from nested classes.
Also used for namespaces.
In Java used to mangle package-qualified names, and inner classes.

@item r
Encodes the GNU C++ @code{long double} type.

@item R
Indicates a reference type. Followed by the referenced type.

@item s
Encodes the C++ and Java @code{short} types.

@item S
A modifier that indicates that the following integer type is signed.
Only used with @code{char}.

Also used as a modifier to indicate a static member function.

@item t
Indicates a template instantiation.

@item T
A back reference to a previously seen type.

@item U
A modifier that indicates that the following integer type is unsigned.
Also used to indicate that the following class or namespace name
is encoded using Unicode-mangling.

@item u
The @code{restrict} type qualifier.

@item v
Encodes the C++ and Java @code{void} types.

@item V
A modifier for a @code{volatile} type or method.

@item w
Encodes the C++ @code{wchar_t} type, and the Java @code{char} type.

@item W
Indicates the closing parenthesis of an expression.

@item x
Encodes the GNU C++ @code{long long} type, and the Java @code{long} type.

@item X
Encodes a template type parameter, when part of a function type.

@item Y
Encodes a template constant parameter, when part of a function type.

@item Z
Used for template type parameters.

@end table
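
As an illustration of one table entry, the two forms of the @samp{I} code
can be sketched as follows. The function name is hypothetical, and the
use of lower-case hexadecimal letters is an assumption, since the
examples in the table contain only digits. With this sketch, 64 gives
@samp{I40} and 512 gives @samp{I_200_}, matching the table.

@example
#include <sstream>
#include <string>

// Hypothetical helper: encode the typedef name int<n>_t.
std::string mangle_intn(unsigned n)
@{
  std::ostringstream out;
  out << std::hex << n;  // lower-case hex digits (assumed case)
  std::string digits = out.str();
  if (digits.size() <= 2)
    // Short form: exactly two hexadecimal digits, zero padded.
    return "I" + std::string(2 - digits.size(), '0') + digits;
  // Long form: any number of hexadecimal digits between underscores.
  return "I_" + digits + "_";
@}
@end example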

The letters @samp{G}, @samp{M}, @samp{O}, and @samp{p}
also seem to be used for obscure purposes ...

@node Vtables, Concept Index, Mangling, Top
@section Virtual Tables

In order to invoke virtual functions, GNU C++ uses virtual tables. Each
virtual function gets an index, and the table entry points to the
overriding function to call. Sometimes, an adjustment to the @code{this}
pointer has to be made before calling a virtual function:

@example
struct A@{
  int i;
  virtual void foo();
@};

struct B@{
  int j;
  virtual void bar();
@};

struct C:A,B@{
  virtual void bar();
@};

void C::bar()
@{
  i++;
@}

int main()
@{
  C *c = new C;
  B *b = c;
  c->bar();
@}
@end example

Here, casting from @samp{c} to @samp{b} adds an offset. When @samp{bar}
is called, this offset needs to be subtracted, so that @samp{C::bar} can
properly access @samp{i}. One approach to achieving this is to use
@emph{thunks}, which are small half-functions put into the virtual
table. They modify the first argument (the @samp{this} pointer), and then
jump into the real function.

The other (traditional) approach is to have an additional integer in the
virtual table which is added to @code{this}. This is an additional overhead
both at the function call, and in the size of virtual tables: In the
case of single inheritance (or for the first base class), these integers
will always be zero.
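
The effect of a thunk can be imitated in ordinary C++. In the sketch
below, @code{bar_thunk} is a hypothetical, hand-written stand-in for what
the compiler would place in the virtual table; a real thunk adjusts the
pointer and jumps to the target without creating a stack frame. The
helper @code{offset_of_b_in_c} merely exposes the offset that the
conversion from @samp{C*} to @samp{B*} adds, which is nonzero on typical
implementations. The classes mirror the example above, with function
bodies and initializers added so that the code is complete.

@example
#include <cstddef>

struct A @{ int i = 0; virtual void foo() @{@} @};
struct B @{ int j = 0; virtual void bar() @{@} @};
struct C : A, B @{ void bar() override @{ i++; @} @};

// Byte distance from the start of a C object to its B subobject.
std::ptrdiff_t offset_of_b_in_c(C *c)
@{
  B *b = c;  // this conversion adds the offset
  return reinterpret_cast<char *>(b) - reinterpret_cast<char *>(c);
@}

// Hand-written stand-in for a thunk: adjust the incoming B* back to
// the enclosing C object, then transfer control to the real function.
void bar_thunk(B *self)
@{
  C *adjusted = static_cast<C *>(self);  // subtracts the offset
  adjusted->C::bar();                    // direct, non-virtual call
@}
@end example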

@subsection Virtual Base Classes with Virtual Tables

In the case of virtual bases, the code is even more
complicated. Constructors and destructors need to know whether they are
"in charge" of the virtual bases, and an implicit integer argument
@samp{__in_chrg} is passed for that purpose.

@example
struct A@{
  int i;
  virtual void bar();
  void call_bar()@{bar();@}
@};

struct B:virtual A@{
  B();
  int j;
  virtual void bar();
@};

B::B()@{
  call_bar();
@}

struct C@{
  int k;
@};

struct D:C,B@{
  int l;
  virtual void bar();
@};
@end example

When constructing an instance of B, it will have the following layout:
@samp{vbase pointer to A}, @samp{j}, @samp{A virtual table}, @samp{i}.
On a 32-bit machine, downcasting from @samp{A*} to @samp{B*} would need
to subtract 8, which would be the thunk executed when calling
@samp{B::bar} inside @samp{call_bar}.

When constructing an instance of D, it will have a different layout:
@samp{k}, @samp{vbase pointer to A}, @samp{j}, @samp{l}, @samp{A virtual
table}, @samp{i}. So, when downcasting from @samp{A*} to @samp{B*} in a
@samp{D} object, the offset would be @samp{12}.

This means that during construction of the @samp{B} base of a @samp{D}
object, a virtual table is needed which has a @samp{-12} thunk to
@samp{B::bar}. This is @emph{only} needed during construction and
destruction, as the full object will use a @samp{-16} thunk to
@samp{D::bar}.

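
The offsets 8, 12 and 16 follow from the two layouts by simple
arithmetic. The sketch below assumes, as the text does, a 32-bit machine
on which every listed slot occupies 4 bytes; the function and variable
names are hypothetical. It yields an adjustment of -8 from the @samp{A}
part to the start of a complete @samp{B}, and -12 and -16 from the
@samp{A} part to the start of the @samp{B} base and of the whole object
in a @samp{D}.

@example
#include <cstddef>
#include <string>
#include <vector>

// Byte offset of a named slot in a layout made of 4-byte slots,
// or -1 if the slot is not present.
long slot_offset(const std::vector<std::string>& layout,
                 const std::string& slot)
@{
  for (std::size_t i = 0; i < layout.size(); ++i)
    if (layout[i] == slot)
      return static_cast<long>(4 * i);
  return -1;
@}

// Value a thunk adds to a pointer to slot `from' so that it points
// at slot `to' (both slots must be present in the layout).
long this_adjustment(const std::vector<std::string>& layout,
                     const std::string& from, const std::string& to)
@{
  return slot_offset(layout, to) - slot_offset(layout, from);
@}

// Layouts as listed in the text.  B starts at its vbase pointer,
// D starts at k, and the A part starts at its virtual table pointer.
const std::vector<std::string> b_layout =
  @{ "vbase pointer to A", "j", "A virtual table", "i" @};
const std::vector<std::string> d_layout =
  @{ "k", "vbase pointer to A", "j", "l", "A virtual table", "i" @};
@end example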

In order to implement this, the compiler generates an implicit argument
(in addition to @code{__in_chrg}): the virtual list argument
@code{__vlist}. This is a list of virtual tables needed during
construction and destruction. The virtual pointers are ordered in the
way they are used during construction; the destructors will process the
array in reverse order. The ordering is as follows:
@itemize @bullet
@item
If the class is in charge, the vlist starts with virtual table pointers
for the virtual bases that have virtual bases themselves. Here, only
@emph{polymorphic} virtual bases (pvbases) are interesting: if a vbase
has no virtual functions, it doesn't have a virtual table.

@item
Next, the vlist has virtual tables for the initialization of the
non-virtual bases. These bases are not in charge, so the layout is
recursive, but ignores virtual bases during recursion.

@item
Next, there are a number of virtual tables for each virtual base. These
are sorted in the order in which virtual bases are constructed. Each
virtual base may have more than one @code{vfield}, and therefore require
more than one @code{vtable}. The order of vtables is the same as used
when initializing vfields of non-virtual bases in a constructor.
@end itemize

The compiler emits a virtual table list in a variable mangled as
@code{__vl.classname}.

Classes with virtual bases, but without pvbases, have only the
@code{__in_chrg} argument to their ctors and dtors: they don't have any
vfields in the vbases to initialize.

A further problem arises with virtual destructors: A destructor
typically has only the @code{__in_chrg} argument, which also indicates
whether the destructor should call @code{operator delete}. A dtor of a
class with pvbases has an additional argument. Unfortunately, a caller
of a virtual dtor might not know whether to pass that argument or not.
Therefore, the dtor processes the @code{__vlist} argument in an
automatic variable, which is initialized from the class's vlist if the
@code{__in_chrg} flag has a zero value in bit 2 (bit mask 4), or from the
argument @code{__vlist1} if bit 2 of the @code{__in_chrg} parameter is set to
one.
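
The choice the destructor makes can be summarized in a few lines. This is
a sketch only: the type @code{vlist_t}, the constant and the function
name are hypothetical, and only the test of bit 2 (mask 4) is taken from
the text. A real caller that leaves the bit clear would not pass the
list argument at all; the sketch takes both lists as parameters simply so
that the selection can be shown in isolation.

@example
// Hypothetical type for a list of virtual table pointers.
typedef const void *const *vlist_t;

// Bit 2 of __in_chrg: set when the caller passed a vlist argument.
const int IN_CHARGE_VLIST_PASSED = 4;

// Pick the list the destructor should walk: the caller's list when
// bit 2 is set, the class's own list otherwise.
vlist_t select_vlist(int in_chrg, vlist_t class_vlist, vlist_t vlist_arg)
@{
  return (in_chrg & IN_CHARGE_VLIST_PASSED) ? vlist_arg : class_vlist;
@}
@end example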

@subsection Specification of non-thunked vtables

In the traditional implementation of vtables, each slot contains three
fields: The offset to be added to the @code{this} pointer before invoking a
virtual function, an unused field that is always zero, and the pointer
to the virtual function. The first two fields are typically 16 bits
wide. The unused field is called `index'; it may be non-zero in
pointer-to-member-functions, which use the same layout.

The virtual table then is an array of vtable slots. The first slot is
always the virtual type info function; the other slots are in the order
in which the virtual functions appear in the class declaration.

If a class has base classes, it may inherit other bases' vfields. Each
class may have a primary vfield; the primary vfield of the derived class
is the primary vfield of the left-most non-virtual base class. If a
class inherits a primary vfield, any new virtual functions in the
derived class are appended to the virtual table of the primary
vfield. If there are new virtual functions in the derived class, and no
primary vfield is inherited, a new vfield is introduced which becomes
primary. The redefined virtual functions fill the vtable slots inherited
from the base; new virtual functions are put into the primary vtable in
the order of declaration. If no new virtual functions are introduced, no
primary vfield is allocated.

In a base class that has pvbases, virtual tables are needed which are
used only in the constructor (see example above). At run-time, the
virtual tables of the base class are adjusted, to reflect the new offset
of the pvbase. The compiler knows statically what offset the pvbase has
for a complete object. At run-time, the offset of the pvbase can be
extracted from the vbase pointer, which is set in the constructor of the
complete object. These two offsets result in a delta, which is used to
adjust the deltas in the vtable (the adjustment might be different for
different vtable slots). To adjust the vtables, the compiler emits code
that creates a vtable on the stack. This vtable is initialized with the
vtable for the complete base type, and then adjusted.

In order to call a virtual function, the compiler gets the offset field
from the vtable entry, and adds it to the @code{this} pointer. It then
indirectly calls the virtual function pointer, passing the adjusted
@code{this} pointer, and any arguments the virtual function may have.

To implement dynamic casting, the @code{dynamic_cast} function needs typeinfos
for the complete type, and the pointer to the complete type. The
typeinfo pointer is obtained by calling the virtual typeinfo function
(which doesn't take a @code{this} parameter). The pointer to the complete
object is obtained by adding the offset of the virtual typeinfo vtable
slot, since this virtual function is always implemented in the complete
object.
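
The slot layout and the call sequence described in this subsection can be
emulated by hand. The sketch below is illustrative only:
@code{VtableSlot}, @code{call_virtual} and the three plain structures are
hypothetical, the function pointer takes no arguments besides the object
pointer, and composition stands in for inheritance so that the offset is
available through @code{offsetof}. Calling @code{call_virtual} with
@code{bar_slot} and a pointer to the @code{second} member of a
@code{Derived} object reaches @code{derived_bar} with a pointer to the
whole object.

@example
#include <cstddef>
#include <cstdint>

// One slot of a traditional (non-thunked) vtable.
struct VtableSlot
@{
  std::int16_t delta;       // added to this before the call
  std::int16_t index;       // unused for virtual calls; always zero
  void (*pfn)(void *self);  // the virtual function
@};

// Emulates the call sequence: adjust the pointer, then call
// indirectly through the slot.
void call_virtual(const VtableSlot& slot, void *self)
@{
  char *adjusted = static_cast<char *>(self) + slot.delta;
  slot.pfn(adjusted);
@}

// A hand-laid-out "derived" object.
struct First   @{ int i; @};
struct Second  @{ int j; @};
struct Derived @{ First first; Second second; @};

// Stand-in for Derived::bar; expects a pointer to the whole Derived.
void derived_bar(void *self)
@{
  static_cast<Derived *>(self)->first.i++;
@}

// Slot for bar in the vtable that is used through the Second part:
// the delta moves the pointer back to the start of Derived.
const VtableSlot bar_slot =
  @{ static_cast<std::int16_t>(-static_cast<int>(offsetof(Derived, second))),
    0, derived_bar @};
@end example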

@subsection Specification of thunked vtables

For vtable thunks, each slot only consists of a pointer to the virtual
function, which might be a thunk function. The first slot in the vtable
is an offset of the @code{this} pointer to the complete object, which is needed
as a parameter to @code{__dynamic_cast}. The second slot is the virtual
typeinfo function. All other slots are allocated with the same procedure
as in the non-thunked case. Allocation of vfields also uses the same
procedure as described above.

If the virtual function needs an adjusted @code{this} pointer, a thunk function
is emitted. If supported by the target architecture, this is only a
half-function. Such a thunk has no stack frame; it merely adjusts the
first argument of the function, and then directly branches into the
implementation of the virtual function. If the architecture does not
support half-functions (i.e., if @code{ASM_OUTPUT_MI_THUNK} is not defined), the
compiler emits a wrapper function, which copies all arguments, adjusts
the @code{this} pointer, and then calls the original function. Since objects of
non-aggregate type are passed by invisible reference, this copies only
POD arguments. The approach fails for virtual functions with a variable
number of arguments.

In order to support the vtables needed in base constructors with
pvbases, the compiler passes an implicit @code{__vlist} argument as described
above, if the version 2 thunks are used. For version 1 thunks, the base
class constructor will fill in the vtables for the complete base class,
which will incorrectly adjust the @code{this} pointer, leading to a dynamic
error.

@node Concept Index, , Vtables, Top

@section Concept Index

@printindex cp

@bye