@c Copyright (C) 1999 Free Software Foundation, Inc.
@c This is part of the G77 manual.
@c For copying conditions, see the file g77.texi.

@node Front End
@chapter Front End
@cindex GNU Fortran Front End (FFE)
@cindex FFE
@cindex @code{g77}, front end
@cindex front end, @code{g77}

This chapter describes some aspects of the design and implementation
of the @code{g77} front end.
Much of the information below applies not to current
releases of @code{g77},
but to the 0.6 rewrite being designed and implemented
as of late May, 1999.

To find out about things that are ``To Be Determined'' or ``To Be Done'',
search for the string TBD.
If you want to help by working on one or more of these items,
email me at @email{@value{email-burley}}.
If you're planning to do more than just research issues and offer comments,
see @uref{http://www.gnu.org/software/contribute.html} for steps you might
need to take first.

@menu
* Overview of Sources::
* Overview of Translation Process::
* Philosophy of Code Generation::
* Two-pass Design::
* Challenges Posed::
* Transforming Statements::
* Transforming Expressions::
* Internal Naming Conventions::
@end menu

@node Overview of Sources
@section Overview of Sources

The current directory layout includes the following:

@table @file
@item @value{srcdir}/gcc/
Non-g77 files in gcc

@item @value{srcdir}/gcc/f/
GNU Fortran front end sources

@item @value{srcdir}/libf2c/
@code{libg2c} configuration and @code{g2c.h} file generation

@item @value{srcdir}/libf2c/libF77/
General support and math portion of @code{libg2c}

@item @value{srcdir}/libf2c/libI77/
I/O portion of @code{libg2c}

@item @value{srcdir}/libf2c/libU77/
Additional interfaces to Unix @code{libc} for @code{libg2c}
@end table

Components of note in @code{g77} are described below.

@file{f/} as a whole contains the source for @code{g77},
while @file{libf2c/} contains a portion of the separate program
@code{f2c}.
Note that the @code{libf2c} code is not part of the program @code{g77},
just distributed with it.

@file{f/} contains text files that document the Fortran compiler, source
files for the GNU Fortran Front End (FFE), and some other stuff.
The @code{g77} compiler code is placed in @file{f/} because it,
along with its contents,
is designed to be a subdirectory of a @code{gcc} source directory,
@file{gcc/},
which is structured so that language-specific front ends can be ``dropped
in'' as subdirectories.
The C++ front end (@code{g++}) is an example of this---it resides in
the @file{cp/} subdirectory.
Note that the C front end (also referred to as @code{gcc})
is an exception to this, as its source files reside
in the @file{gcc/} directory itself.

@file{libf2c/} contains the run-time libraries for the @code{f2c} program,
also used by @code{g77}.
These libraries are normally referred to collectively as @code{libf2c}.
When built as part of @code{g77},
@code{libf2c} is installed under the name @code{libg2c} to avoid
conflict with any existing version of @code{libf2c},
and thus is often referred to as @code{libg2c} when the
@code{g77} version is specifically being referred to.

The @code{netlib} version of @code{libf2c/}
contains two distinct libraries,
@code{libF77} and @code{libI77},
each in its own subdirectory.
In @code{g77}, this distinction is not made,
beyond maintaining the subdirectory structure in the source-code tree.

@file{libf2c/} is not part of the program @code{g77},
just distributed with it.
It contains files not present
in the official (@code{netlib}) version of @code{libf2c},
and also contains some minor changes made from @code{libf2c},
to fix some bugs,
and to facilitate automatic configuration, building, and installation of
@code{libf2c} (as @code{libg2c}) for use by @code{g77} users.
See @file{libf2c/README} for more information,
including licensing conditions
governing distribution of programs containing code from @code{libg2c}.

@code{libg2c}, @code{g77}'s version of @code{libf2c},
adds Dave Love's implementation of @code{libU77},
in the @file{libf2c/libU77/} directory.
This library is distributed under the
GNU Library General Public License (LGPL)---see the
file @file{libf2c/libU77/COPYING.LIB}
for more information,
as this license
governs distribution conditions for programs containing code
from this portion of the library.

Files of note in @file{f/} and @file{libf2c/} are described below:

@table @file
@item f/BUGS
Lists some important bugs known to be in g77.
Or use Info (or GNU Emacs Info mode) to read
the ``Actual Bugs'' node of the @code{g77} documentation:

@smallexample
info -f f/g77.info -n "Actual Bugs"
@end smallexample

@item f/ChangeLog
Lists recent changes to @code{g77} internals.

@item libf2c/ChangeLog
Lists recent changes to @code{libg2c} internals.

@item f/NEWS
Contains the per-release changes.
These include the user-visible
changes described in the node ``Changes''
in the @code{g77} documentation, plus internal
changes of import.
Or use:

@smallexample
info -f f/g77.info -n News
@end smallexample

@item f/g77.info*
The @code{g77} documentation, in Info format,
produced by building @code{g77}.

All users of @code{g77} (not just installers) should read this,
using the @code{more} command if neither the @code{info} command,
nor GNU Emacs (with its Info mode), are available, or if users
aren't yet accustomed to using these tools.
All of these files are readable as ``plain text'' files,
though they're easier to navigate using Info readers
such as @code{info} and GNU Emacs Info mode.
@end table

If you want to explore the FFE code, which lives entirely in @file{f/},
here are a few clues.
The file @file{g77spec.c} contains the @code{g77}-specific source code
for the @code{g77} command only---this just forms a variant of the
@code{gcc} command, so,
just as the @code{gcc} command itself does not contain the C front end,
the @code{g77} command does not contain the Fortran front end (FFE).
The FFE code ends up in an executable named @file{f771},
which does the actual compiling,
so it contains the FFE plus the @code{gcc} back end (GBE),
the latter to do most of the optimization, and the code generation.

The file @file{parse.c} is the source file for @code{yyparse()},
which is invoked by the GBE to start the compilation process,
for @file{f771}.

The file @file{top.c} contains the top-level FFE function @code{ffe_file}
and it (along with @file{top.h}) defines all @samp{ffe_[a-z].*}, @samp{ffe[A-Z].*},
and @samp{FFE_[A-Za-z].*} symbols.

The file @file{fini.c} is a @code{main()} program that is used when building
the FFE to generate C header and source files for recognizing keywords.
The files @file{malloc.c} and @file{malloc.h} comprise a memory manager
that defines all @samp{malloc_[a-z].*}, @samp{malloc[A-Z].*}, and
@samp{MALLOC_[A-Za-z].*} symbols.

All other modules named @var{xyz}
are comprised of all files named @samp{@var{xyz}*.@var{ext}}
and define all @samp{ffe@var{xyz}_[a-z].*}, @samp{ffe@var{xyz}[A-Z].*},
and @samp{FFE@var{XYZ}_[A-Za-z].*} symbols.
If you understand all this, congratulations---it's easier for me to remember
how it works than to type in these regular expressions.
But it does make it easy to find where a symbol is defined.
For example, the symbol @samp{ffexyz_set_something} would be defined
in @file{xyz.h} and implemented there (if it's a macro) or in @file{xyz.c}.

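As a rough illustration of the symbol-naming convention described above,
here is a small Python sketch that maps a symbol name to the module expected
to define it.
The function name @code{module_for_symbol} is hypothetical,
and the assumption that module names consist only of letters and digits is mine;
neither is part of @code{g77}.

```python
import re


def module_for_symbol(symbol):
    """Return the base name of the module expected to define `symbol`,
    or None if the name follows none of the documented patterns.
    Illustrative only; not part of g77."""
    # top.c and top.h own the bare ffe_/ffeX/FFE_ prefixes.
    if re.match(r"ffe_[a-z]|ffe[A-Z]|FFE_[A-Za-z]", symbol):
        return "top"
    # malloc.c and malloc.h own the malloc prefixes.
    if re.match(r"malloc_[a-z]|malloc[A-Z]|MALLOC_[A-Za-z]", symbol):
        return "malloc"
    # Any other module xyz: ffexyz_lower, ffexyzUpper, or FFEXYZ_Anything.
    # Assumption: module names contain only letters and digits.
    m = re.match(r"ffe([a-z0-9]+)(?:_[a-z]|[A-Z])", symbol)
    if m:
        return m.group(1)
    m = re.match(r"FFE([A-Z0-9]+)_[A-Za-z]", symbol)
    if m:
        return m.group(1).lower()
    return None


print(module_for_symbol("ffexyz_set_something"))
```

Under these assumptions, the example symbol from the text resolves to @file{xyz},
so its definition would be looked up in @file{xyz.h} or @file{xyz.c}.
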
The ``porting'' files of note currently are:

@table @file
@item proj.c
@itemx proj.h
These define the ``language'' used by all the other source files,
the language being Standard C plus some useful things
like @code{ARRAY_SIZE} and such.

@item target.c
@itemx target.h
These describe the target machine
in terms of what data types are supported,
how they are denoted
(to what C type does an @code{INTEGER*8} map, for example),
how to convert between them,
and so on.
Over time, versions of @code{g77} rely less on these files
and more on run-time configuration based on GBE info
in @file{com.c}.

@item com.c
@itemx com.h
These are the primary interface to the GBE.

@item ste.c
@itemx ste.h
These contain code for implementing recognized executable statements
in the GBE.

@item src.c
@itemx src.h
These contain information on the format(s) of source files
(such as whether they are never to be processed as case-insensitive
with regard to Fortran keywords).
@end table

If you want to debug the @file{f771} executable,
for example if it crashes,
note that the global variables @code{lineno} and @code{input_filename}
are usually set to reflect the current line being read by the lexer
during the first-pass analysis of a program unit and to reflect
the current line being processed during the second-pass compilation
of a program unit.

If an invocation of the function @code{ffestd_exec_end} is on the stack,
the compiler is in the second pass, otherwise it is in the first.

(This information might help you reduce a test case and/or work around
a bug in @code{g77} until a fix is available.)

@node Overview of Translation Process
@section Overview of Translation Process

The order of phases translating source code to the form accepted
by the GBE is:

@enumerate
@item
Stripping punched-card sources (@file{g77stripcard.c})

@item
Lexing (@file{lex.c})

@item
Stand-alone statement identification (@file{sta.c})

@item
Parsing (@file{stb.c} and @file{expr.c})

@item
Constructing (@file{stc.c})

@item
Collecting (@file{std.c})

@item
Expanding (@file{ste.c})
@end enumerate

To get a rough idea of how a particularly twisted Fortran statement
gets treated by the passes, consider:

@smallexample
      FORMAT(I2 4H)=(J/
     &   I3)
@end smallexample

The job of @file{lex.c} is to know enough about Fortran syntax rules
to break the statement up into distinct lexemes without requiring
any feedback from subsequent phases:

@smallexample
`FORMAT'
`('
`I24H'
`)'
`='
`('
`J'
`/'
`I3'
`)'
@end smallexample

The job of @file{sta.c} is to figure out the kind of statement,
or, at least, statement form, that the sequence of lexemes represents.

The sooner it can do this (in terms of using the smallest number of
lexemes, starting with the first for each statement), the better,
because that leaves diagnostics for problems beyond the recognition
of the statement form to subsequent phases,
which can usually better describe the nature of the problem.

In this case, the @samp{=} at ``level zero''
(not nested within parentheses)
tells @file{sta.c} that this is an @emph{assignment-form},
not @code{FORMAT}, statement.

An assignment-form statement might be a statement-function
definition or an executable assignment statement.

To make that determination,
@file{sta.c} looks at the first two lexemes.

Since the second lexeme is @samp{(},
the first must represent an array for this to be an assignment statement,
else it's a statement function.

Either way, @file{sta.c} hands off the statement to @file{stb.c}
(either its statement-function parser or its assignment-statement parser).

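The ``level zero'' test and the two-lexeme decision described above can be
sketched in a few lines of Python.
This is only an illustration of the idea, not the logic of @file{sta.c}:
the function names are made up, lexemes are modeled as plain strings,
and @code{known_arrays} is a hypothetical stand-in for whatever the front end
knows about previously declared arrays.

```python
def has_level_zero_equals(lexemes):
    """Return True if an '=' lexeme appears outside all parentheses."""
    depth = 0
    for lexeme in lexemes:
        if lexeme == "(":
            depth += 1
        elif lexeme == ")":
            depth -= 1
        elif lexeme == "=" and depth == 0:
            return True
    return False


def classify(lexemes, known_arrays=()):
    """Guess the statement form from the lexemes alone.

    known_arrays is a hypothetical stand-in for declaration context.
    """
    if not has_level_zero_equals(lexemes):
        return "other"
    # Second lexeme '(' and first lexeme not an array: statement function.
    if len(lexemes) > 1 and lexemes[1] == "(" and lexemes[0] not in known_arrays:
        return "statement function"
    return "assignment"


LEXEMES = ["FORMAT", "(", "I24H", ")", "=", "(", "J", "/", "I3", ")"]
print(classify(LEXEMES))
print(classify(LEXEMES, known_arrays=("FORMAT",)))
```

With no array named @samp{FORMAT} in scope, the twisted example is treated as a
statement-function definition; if @samp{FORMAT} had been declared as an array,
the same lexemes would form an assignment statement.
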
@file{stb.c} forms a
statement-specific record containing the pertinent information.
That information includes a source expression and,
for an assignment statement, a destination expression.
Expressions are parsed by @file{expr.c}.

This record is passed to @file{stc.c},
which copes with the implications of the statement
within the context established by previous statements.

For example, if it's the first statement in the file
or after an @code{END} statement,
@file{stc.c} recognizes that, first of all,
a main program unit is now being lexed
(and tells that to @file{std.c}
before telling it about the current statement).

@file{stc.c} attaches whatever information it can,
usually derived from the context established by the preceding statements,
and passes the information to @file{std.c}.

@file{std.c} saves this information away,
since the GBE cannot cope with information
that might be incomplete at this stage.

For example, @samp{I3} might later be determined
to be an argument to an alternate @code{ENTRY} point.

When @file{std.c} is told about the end of an external (top-level)
program unit,
it passes all the information it has saved away
on statements in that program unit
to @file{ste.c}.

@file{ste.c} ``expands'' each statement, in sequence, by
constructing the appropriate GBE information and calling
the appropriate GBE routines.

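The collect-then-expand hand-off between @file{std.c} and @file{ste.c}
can be pictured with a toy Python model.
Everything here is an assumption made for illustration:
the class name @code{Collector}, its method names, and the idea of passing
the expander in as a callback do not come from the @code{g77} sources.

```python
class Collector:
    """Toy stand-in for std.c: save statements until the unit ends."""

    def __init__(self, expand):
        self._expand = expand   # toy stand-in for ste.c
        self._saved = []

    def statement(self, info):
        # Nothing is handed to the back end yet; later statements
        # (an ENTRY, say) might still change what this one means.
        self._saved.append(info)

    def end_of_unit(self):
        # Only now is the saved information complete enough to expand,
        # one statement at a time, in source order.
        saved, self._saved = self._saved, []
        return [self._expand(info) for info in saved]


collector = Collector(lambda info: "expanded " + info)
collector.statement("I3 = J")
collector.statement("END")
print(collector.end_of_unit())
```

The point of the sketch is only the ordering: no statement reaches the expander
until the whole program unit has been seen.
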
Details on the transformational phases follow.
Keep in mind that Fortran numbering is used,
so the first character on a line is column 1,
decimal numbering is used, and so on.

@menu
* g77stripcard::
* lex.c::
* sta.c::
* stb.c::
* expr.c::
* stc.c::
* std.c::
* ste.c::

* Gotchas (Transforming)::
* TBD (Transforming)::
@end menu

@node g77stripcard
@subsection g77stripcard

The @code{g77stripcard} program handles removing content beyond
column 72 (adjustable via a command-line option),
optionally warning about that content being something other
than trailing whitespace or Fortran commentary.

This program is needed because @file{lex.c} doesn't pay attention
to maximum line lengths at all, to make it easier to maintain,
as well as faster (for sources that don't depend on the maximum
column length vis-a-vis trailing non-blank non-commentary content).

Just how this program will be run---whether automatically for
old source (perhaps as the default for @file{.f} files?)---is not
yet determined.

In the meantime, it might as well be implemented as a typical UNIX pipe.

It should accept a @samp{-fline-length-@var{n}} option,
with the default line length set to 72.

When the text it strips off the end of a line is not blank
(not spaces and tabs),
it should insert an additional comment line
(beginning with @samp{!},
so it works for both fixed-form and free-form files)
containing the text,
following the stripped line.
The inserted comment should have a prefix of some kind,
TBD, that distinguishes the comment as representing stripped text.
Users could use that to @code{sed} out such lines, if they wished---it
seems silly to provide a command-line option to delete information
when it can be so easily filtered out by another program.

(This inserted comment should be designed to ``fit in'' well
with whatever the Fortran community is using these days for
preprocessor, translator, and other such products, like OpenMP.
What that's all about, and how @code{g77} can elegantly fit its
special comment conventions into it all, is TBD as well.
We don't want to reinvent the wheel here, but if there turn out
to be too many conflicting conventions, we might have to invent
one that looks nothing like the others, but which offers their
host products a better infrastructure in which to fit and coexist
peacefully.)

@code{g77stripcard} probably shouldn't do any tab expansion or other
fancy stuff.
People can use @code{expand} or other pre-filtering if they like.
The idea here is to keep each stage quite simple, while providing
excellent performance for ``normal'' code.

(Code with junk beyond column 72 is not really ``normal'',
as it comes from a card-punch heritage,
and will be increasingly hard for tomorrow's Fortran programmers to read.)

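To make the intended behavior of @code{g77stripcard} a little more concrete,
here is a minimal Python sketch of the stripping step.
It is not the program itself (which has not been written as of this text):
the function name @code{strip_card} is made up,
and the comment prefix @samp{!g77stripcard: } is purely a placeholder,
since the real prefix is still TBD.

```python
# Placeholder only: the real prefix for inserted comments is TBD.
STRIPPED_PREFIX = "!g77stripcard: "


def strip_card(lines, line_length=72):
    """Cut each line at `line_length` columns.

    When the removed text is not blank (spaces and tabs only), add a
    comment line holding that text right after the stripped line.
    Lines are given without their trailing newline; tabs are not expanded.
    """
    out = []
    for line in lines:
        kept, tail = line[:line_length], line[line_length:]
        out.append(kept)
        if tail.strip(" \t"):
            out.append(STRIPPED_PREFIX + tail)
    return out


source = ["      X = 1".ljust(72) + "SEQ00010", "      END"]
for line in strip_card(source):
    print(line.rstrip())
```

Because every inserted line starts with the same prefix, a later filter can
delete those lines again without @code{g77stripcard} needing an option for it.
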
@node lex.c
@subsection lex.c

To help make the lexer simple, fast, and easy to maintain,
while also having @code{g77} generally encourage Fortran programmers
to write simple, maintainable, portable code by maximizing the
performance of compiling that kind of code:

@itemize @bullet
@item
There'll be just one lexer, for both fixed-form and free-form source.

@item
It'll care about the form only when handling the first 7 columns of
text, stuff like spaces between strings of alphanumerics, and
how lines are continued.

Some other distinctions will be handled by subsequent phases,
so at least one of them will have to know which form is involved.

For example, @samp{I = 2 . 4} is acceptable in fixed form,
and works in free form as well given the implementation @code{g77}
presently uses.
But the standard requires a diagnostic for it in free form,
so the parser has to be able to recognize that
the lexemes aren't contiguous
(information the lexer @emph{does} have to provide)
and that free-form source is being parsed,
so it can provide the diagnostic.

The @code{g77} lexer doesn't try to gather @samp{2 . 4} into a single lexeme.
Otherwise, it'd have to know a whole lot more about how to parse Fortran,
or subsequent phases (mainly parsing) would have two paths through
lots of critical code---one to handle the lexemes @samp{2}, @samp{.},
and @samp{4} in sequence, another to handle the lexeme @samp{2.4}.

@item
It won't worry about line lengths
(beyond the first 7 columns for fixed-form source).

That is, once it starts parsing the ``statement'' part of a line
(column 7 for fixed-form, column 1 for free-form),
it'll keep going until it finds a newline,
rather than ignoring everything past a particular column
(72 or 132).

The implication here is that there shouldn't @emph{be}
anything past that last column, other than whitespace or
commentary, because users using typical editors
(or viewing output as typically printed)
won't necessarily know just where the last column is.

Code that has ``garbage'' beyond the last column
(almost certainly only fixed-form code with a punched-card legacy,
such as code using columns 73-80 for ``sequence numbers'')
will have to be run through @code{g77stripcard} first.

Also, keeping track of the maximum column position while also watching out
for the end of a line @emph{and} while reading from a file
just makes things slower.
Since a file must be read, and watching for the end of the line
is necessary (unless the typical input file was preprocessed to
include the necessary number of trailing spaces),
dropping the tracking of the maximum column position
is the only way to reduce the complexity of the pertinent code
while maintaining high performance.

@item
ASCII encoding is assumed for the input file.

Code written in other character sets will have to be converted first.

@item
Tabs (ASCII code 9)
will be converted to spaces via the straightforward
approach.

Specifically, a tab is converted to between one and eight spaces
as necessary to reach column @var{n},
where dividing @samp{(@var{n} - 1)} by eight
results in a remainder of zero.

@item
Linefeeds (ASCII code 10)
mark the ends of lines.

@item
A carriage return (ASCII code 13)
is accepted if it immediately precedes a linefeed,
in which case it is ignored.

Otherwise, it is rejected (with a diagnostic).

@item
Any characters other than the above
that are not part of the GNU Fortran Character Set
(@pxref{Character Set})
are rejected with a diagnostic.

This includes backspaces, form feeds, and the like.

(It might make sense to allow a form feed in column 1
as long as that's the only character on a line.
It certainly wouldn't seem to cost much in terms of performance.)

@item
The end of the input stream (EOF)
ends the current line.

@item
The distinction between uppercase and lowercase letters
will be preserved.

It will be up to subsequent phases to decide to fold case.

Current plans are to permit any casing for Fortran (reserved) keywords
while preserving casing for user-defined names.
(This might not be made the default for @file{.f} files, though.)

Preserving case seems necessary to provide more direct access
to facilities outside of @code{g77}, such as to C or Pascal code.

Names of intrinsics will probably be matchable in any case.
However, there probably won't be any option to require
a particular mixed-case appearance of intrinsics
(as there was for @code{g77} prior to version 0.6),
because that's painful to maintain,
and probably nobody uses it.

(How @samp{external SiN; r = sin(x)} would be handled is TBD.
I think old @code{g77} might already handle that pretty elegantly,
but whether we can cope with allowing the same fragment to reference
a @emph{different} procedure, even with the same interface,
via @samp{s = SiN(r)}, needs to be determined.
If we can't, we need to make sure that when code introduces
a user-defined name, any intrinsic matching that name
using a case-insensitive comparison
is ``turned off''.)

@item
Backslashes in @code{CHARACTER} and Hollerith constants
are not allowed.

This avoids the confusion introduced by some Fortran compiler vendors
providing C-like interpretation of backslashes,
while others provide straight-through interpretation.

Some kind of lexical construct (TBD) will be provided to allow
flagging of a @code{CHARACTER}
(but probably not a Hollerith)
constant that permits backslashes.
It'll necessarily be a prefix, such as:

@smallexample
PRINT *, C'This line has a backspace \b here.'
PRINT *, F'This line has a straight backslash \ here.'
@end smallexample

Further, command-line options might be provided to specify that
one prefix or the other is to be assumed as the default
for @code{CHARACTER} constants.

However, it seems more helpful for @code{g77} to provide a program
that prefixes all constants
(or just those containing backslashes)
with the desired designation,
so printouts of code can be read
without knowing the compile-time options used when compiling it.

If such a program is provided
(let's name it @code{g77slash} for now),
then a command-line option to @code{g77} should not be provided.
(Though, given that it'll be easy to implement, it might be hard
to resist user requests for it ``to compile faster than if we
have to invoke another filter''.)

This program would take a command-line option to specify the
default interpretation of backslashes,
affecting which prefix it uses for constants.

@code{g77slash} probably should automatically convert Hollerith
constants that contain backslashes
to the appropriate @code{CHARACTER} constants.
Then @code{g77} wouldn't have to define a prefix syntax for Hollerith
constants specifying whether they want C-style or straight-through
backslashes.
@end itemize

The above implements nearly exactly what is specified by
@ref{Character Set},
and
@ref{Lines},
except it also provides automatic conversion of tabs
and ignoring of newline-related carriage returns.

It also effects the ``pure visual'' model,
by which is meant that a user viewing his code
in a typical text editor
(assuming it's not preprocessed via @code{g77stripcard} or similar)
doesn't need any special knowledge
of whether spaces on the screen are really tabs,
whether lines end immediately after the last visible non-space character
or after a number of spaces and tabs that follow it,
or whether the last line in the file is ended by a newline.

Most editors don't make these distinctions,
the ANSI FORTRAN 77 standard doesn't require them to,
and it permits a standard-conforming compiler
to define a method for transforming source code to
``standard form'' however it wants.

So, GNU Fortran defines it such that users have the best chance
of having the code be interpreted the way it looks on the screen
of the typical editor.

(Fancy editors should @emph{never} be required to correctly read code
written in classic two-dimensional-plaintext form.
By correct reading I mean ability to read it, book-like, without
mistaking text ignored by the compiler for program code and vice versa,
and without having to count beyond the first several columns.
The vague meaning of ASCII TAB, among other things, complicates
this somewhat, but as long as ``everyone'', including the editor,
other tools, and printer, agrees about the every-eighth-column convention,
the GNU Fortran ``pure visual'' model meets these requirements.
Any language or user-visible source form
requiring special tagging of tabs,
the ends of lines after spaces/tabs,
and so on, is broken by this definition.
Fortunately, Fortran @emph{itself} is not broken,
even if most vendor-supplied defaults for their Fortran compilers @emph{are}
in this regard.)

Further, this model provides a clean interface
to whatever preprocessors or code-generators are used
to produce input to this phase of @code{g77}.
Mainly, they need not worry about long lines.

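The tab, linefeed, carriage-return, and EOF rules listed for the lexer
are simple enough to sketch in Python.
This is a hedged illustration rather than the code in @file{lex.c}:
the function names are invented, a Python exception stands in for
``a diagnostic'', and the check against the GNU Fortran Character Set is
left out.

```python
def expand_tabs(line):
    """Turn each tab into one to eight spaces, so that the following
    character lands on a column n (counting from 1) with (n - 1) % 8 == 0."""
    out = ""
    for ch in line:
        if ch == "\t":
            out += " " * (8 - len(out) % 8)
        else:
            out += ch
    return out


def split_lines(text):
    """Split `text` into tab-expanded lines.

    A linefeed ends a line.  A carriage return is ignored when it
    immediately precedes a linefeed and rejected otherwise.  The end of
    the input ends the current line.
    """
    lines = []
    current = ""
    for index, ch in enumerate(text):
        if ch == "\r":
            if text[index + 1:index + 2] != "\n":
                # Stand-in for the diagnostic mentioned in the text.
                raise ValueError("carriage return not followed by linefeed")
        elif ch == "\n":
            lines.append(expand_tabs(current))
            current = ""
        else:
            current += ch
    if current:
        lines.append(expand_tabs(current))
    return lines


print(split_lines("\tX = 1\r\n\tEND"))
```

Note that the tab rule depends only on the current column, which is why the
``every-eighth-column'' convention is all an editor or printer has to agree on.
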
@node sta.c
@subsection sta.c

@node stb.c
@subsection stb.c

@node expr.c
@subsection expr.c

@node stc.c
@subsection stc.c

@node std.c
@subsection std.c

@node ste.c
@subsection ste.c

@node Gotchas (Transforming)
@subsection Gotchas (Transforming)

This section is not about transforming ``gotchas'' into something else.
It is about the weirder aspects of transforming Fortran,
however that's defined,
into a more modern, canonical form.

@subsubsection Multi-character Lexemes
|
||
|
|
||
|
Each lexeme carries with it a pointer to where it appears in the source.
|
||
|
|
||
|
To provide the ability for diagnostics to point to column numbers,
|
||
|
in addition to line numbers and names,
|
||
|
lexemes that represent more than one (significant) character
|
||
|
in the source code need, generally,
|
||
|
to provide pointers to where each @emph{character} appears in the source.
|
||
|
|
||
|
This provides the ability to properly identify the precise location
|
||
|
of the problem in code like
|
||
|
|
||
|
@smallexample
|
||
|
SUBROUTINE X
|
||
|
END
|
||
|
BLOCK DATA X
|
||
|
END
|
||
|
@end smallexample
|
||
|
|
||
|
which, in fixed-form source, would result in single lexemes
|
||
|
consisting of the strings @samp{SUBROUTINEX} and @samp{BLOCKDATAX}.
|
||
|
(The problem is that @samp{X} is defined twice,
|
||
|
so a pointer to the @samp{X} in the second definition,
|
||
|
as well as a follow-up pointer to the corresponding pointer in the first,
|
||
|
would be preferable to pointing to the beginnings of the statements.)
|
||
|
|
||
|
This need also arises when parsing (and diagnosing) @code{FORMAT}
|
||
|
statements.
|
||
|
|
||
|
Further, it arises when diagnosing
|
||
|
@code{FMT=} specifiers that contain constants
|
||
|
(or partial constants, or even propagated constants!)
|
||
|
in I/O statements, as in:
|
||
|
|
||
|
@smallexample
|
||
|
PRINT '(I2, 3HAB)', J
|
||
|
@end smallexample
|
||
|
|
||
|
(A pointer to the beginning of the prematurely-terminated Hollerith
|
||
|
constant, and/or to the close parenthese, is preferable to a pointer
|
||
|
to the open-parenthese or the apostrophe that precedes it.)
|
||
|
|
||
|
Multi-character lexemes, which would seem to naturally include
|
||
|
at least digit strings, alphanumeric strings, @code{CHARACTER}
|
||
|
constants, and Hollerith constants, therefore need to provide
|
||
|
location information on each character.
|
||
|
(Maybe Hollerith constants don't, but it's unnecessary to except them.)
|
||
|
|
||
|
The question then arises, what about @emph{other} multi-character lexemes,
|
||
|
such as @samp{**} and @samp{//},
|
||
|
and Fortran 90's @samp{(/}, @samp{/)}, @samp{::}, and so on?
|
||
|
|
||
|
Turns out there's a need to identify the location of the second character
|
||
|
of these two-character lexemes.
|
||
|
For example, in @samp{I(/J) = K}, the slash needs to be diagnosed
|
||
|
as the problem, not the open parenthese.
|
||
|
Similarly, it is preferable to diagnose the second slash in
|
||
|
@samp{I = J // K} rather than the first, given the implicit typing
|
||
|
rules, which would result in the compiler disallowing the attempted
|
||
|
concatenation of two integers.
|
||
|
(Though, since that's more of a semantic issue,
|
||
|
it's not @emph{that} much preferable.)
|
||
|
|
||
|
Even sequences that could be parsed as digit strings could use location info,
|
||
|
for example, to diagnose the @samp{9} in the octal constant @samp{O'129'}.
|
||
|
(This probably will be parsed as a character string,
|
||
|
to be consistent with the parsing of @samp{Z'129A'}.)
|
||
|
|
||
|
To avoid the hassle of recording the location of the second character,
|
||
|
while also preserving the general rule that each significant character
|
||
|
is distinctly pointed to by the lexeme that contains it,
|
||
|
it's best to simply not have any fixed-size lexemes
|
||
|
larger than one character.
|
||
|
|
||
|
This new design is expected to make checking for two
|
||
|
@samp{*} lexemes in a row much easier than the old design,
|
||
|
so this is not much of a sacrifice.
|
||
|
It probably makes the lexer much easier to implement
|
||
|
than it makes the parser harder.
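As a rough illustration of why one-character lexemes keep the location
bookkeeping simple, here is a minimal sketch in C.
The names @code{src_pos}, @code{char_lexeme}, and
@code{second_star_of_power} are hypothetical and are not taken from the
@code{g77} sources; the sketch only assumes that each lexeme carries
the position of its single character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical source position; illustrative only.  */
struct src_pos
{
  int line;
  int column;
};

/* A one-character lexeme that remembers where its character came from.  */
struct char_lexeme
{
  char ch;
  struct src_pos where;
};

/* If LEX[I] and LEX[I + 1] are both '*' lexemes, return the position of
   the second one; otherwise return NULL.  N is the number of lexemes.  */
const struct src_pos *
second_star_of_power (const struct char_lexeme *lex, size_t n, size_t i)
{
  assert (lex != NULL);
  if (i + 1 >= n)
    return NULL;
  if (lex[i].ch == '*' && lex[i + 1].ch == '*')
    return &lex[i + 1].where;
  return NULL;
}
```

Because every character already has its own lexeme, a diagnostic that
wants to point at the second @samp{*} needs no extra field; it just
uses the position stored in the following lexeme.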
|
||
|
|
||
|
@subsubsection Space-padding Lexemes
|
||
|
|
||
|
Certain lexemes need to be padded with virtual spaces when the
|
||
|
end of the line (or file) is encountered.
|
||
|
|
||
|
This is necessary in fixed form, to handle lines that don't
|
||
|
extend to column 72, assuming that's the line length in effect.
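A minimal sketch of the padding idea, in C, follows.
It pads a whole line rather than an individual lexeme, which is a
simplification, and the names @code{FIXED_FORM_WIDTH} and
@code{pad_fixed_form_line} are assumptions made for this example only.

```c
#include <assert.h>
#include <string.h>

/* Assumed fixed-form line length; 72 is only the length "in effect",
   so a real lexer would take this from its options.  */
#define FIXED_FORM_WIDTH 72

/* Copy LINE into OUT, appending virtual spaces until the result is
   WIDTH characters long.  OUT must have room for WIDTH + 1 bytes.
   A line longer than WIDTH is cut at WIDTH.  */
void
pad_fixed_form_line (const char *line, size_t width, char *out)
{
  size_t len;

  assert (line != NULL && out != NULL);
  len = strlen (line);
  if (len > width)
    len = width;
  memcpy (out, line, len);
  memset (out + len, ' ', width - len);
  out[width] = '\0';
}
```

With this in place, a @code{CHARACTER} constant left open at the end of
a short line sees the same trailing spaces it would have seen had the
line been typed out to the last column.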
|
||
|
|
||
|
@subsubsection Bizarre Free-form Hollerith Constants
|
||
|
|
||
|
Last I checked, the Fortran 90 standard actually required the compiler
|
||
|
to silently accept something like
|
||
|
|
||
|
@smallexample
|
||
|
FORMAT ( 1 2 Htwelve chars )
|
||
|
@end smallexample
|
||
|
|
||
|
as a valid @code{FORMAT} statement specifying a twelve-character
|
||
|
Hollerith constant.
|
||
|
|
||
|
The implication here is that, since the new lexer is a zero-feedback one,
|
||
|
it won't know that the special case of a @code{FORMAT} statement being parsed
|
||
|
requires apparently distinct lexemes @samp{1} and @samp{2} to be treated as
|
||
|
a single lexeme.
|
||
|
|
||
|
(This is a horrible misfeature of the Fortran 90 language.
|
||
|
It's one of many such misfeatures that almost make me want
|
||
|
to not support them, and forge ahead with designing a new
|
||
|
``GNU Fortran'' language that has the features,
|
||
|
but not the misfeatures, of Fortran 90,
|
||
|
and provide utility programs to do the conversion automatically.)
|
||
|
|
||
|
So, the lexer must gather distinct chunks of decimal strings into
|
||
|
a single lexeme in contexts where a single decimal lexeme might
|
||
|
start a Hollerith constant.
|
||
|
|
||
|
(Which probably means it might as well do that all the time
|
||
|
for all multi-character lexemes, even in free-form mode,
|
||
|
leaving it to subsequent phases to pull them apart as they see fit.)
|
||
|
|
||
|
Compare the treatment of this to how
|
||
|
|
||
|
@smallexample
|
||
|
CHARACTER * 4 5 HEY
|
||
|
@end smallexample
|
||
|
|
||
|
and
|
||
|
|
||
|
@smallexample
|
||
|
CHARACTER * 12 HEY
|
||
|
@end smallexample
|
||
|
|
||
|
must be treated---the former must be diagnosed, due to the separation
|
||
|
between lexemes, the latter must be accepted as a proper declaration.
|
||
|
|
||
|
@subsubsection Hollerith Constants
|
||
|
|
||
|
Recognizing a Hollerith constant---specifically,
|
||
|
that an @samp{H} or @samp{h} after a digit string begins
|
||
|
such a constant---requires some knowledge of context.
|
||
|
|
||
|
Hollerith constants (such as @samp{2HAB}) can appear after:
|
||
|
|
||
|
@itemize @bullet
|
||
|
@item
|
||
|
@samp{(}
|
||
|
|
||
|
@item
|
||
|
@samp{,}
|
||
|
|
||
|
@item
|
||
|
@samp{=}
|
||
|
|
||
|
@item
|
||
|
@samp{+}, @samp{-}, @samp{/}
|
||
|
|
||
|
@item
|
||
|
@samp{*}, except as noted below
|
||
|
@end itemize
|
||
|
|
||
|
Hollerith constants don't appear after:
|
||
|
|
||
|
@itemize @bullet
|
||
|
@item
|
||
|
@samp{CHARACTER*},
|
||
|
which can be treated generally as
|
||
|
any @samp{*} that is the second lexeme of a statement
|
||
|
@end itemize
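
The two lists above can be sketched as a small predicate in C.
This is only an illustration of the rule as stated here;
@code{hollerith_may_follow} and its parameters are hypothetical names,
and a real lexer would work on lexeme objects rather than bare
characters.

```c
#include <assert.h>

/* Return nonzero if a digit string followed by 'H' or 'h' may begin a
   Hollerith constant, given the one-character lexeme PREV that
   precedes the digit string and PREV_INDEX, the zero-based position
   of PREV within the statement.  */
int
hollerith_may_follow (char prev, int prev_index)
{
  assert (prev_index >= 0);
  switch (prev)
    {
    case '(':
    case ',':
    case '=':
    case '+':
    case '-':
    case '/':
      return 1;
    case '*':
      /* A '*' that is the second lexeme of the statement is treated
         like the one in CHARACTER*, which never introduces a
         Hollerith constant.  */
      return prev_index != 1;
    default:
      return 0;
    }
}
```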
|
||
|
|
||
|
@subsubsection Confusing Function Keyword
|
||
|
|
||
|
While
|
||
|
|
||
|
@smallexample
|
||
|
REAL FUNCTION FOO ()
|
||
|
@end smallexample
|
||
|
|
||
|
must be a @code{FUNCTION} statement and
|
||
|
|
||
|
@smallexample
|
||
|
REAL FUNCTION FOO (5)
|
||
|
@end smallexample
|
||
|
|
||
|
must be a type-definition statement,
|
||
|
|
||
|
@smallexample
|
||
|
REAL FUNCTION FOO (@var{names})
|
||
|
@end smallexample
|
||
|
|
||
|
where @var{names} is a comma-separated list of names,
|
||
|
can be one or the other.
|
||
|
|
||
|
The only way to disambiguate that statement
|
||
|
(short of mandating free-form source or a short maximum
|
||
|
length for names of external procedures)
|
||
|
is based on the context of the statement.
|
||
|
|
||
|
In particular, if the statement is known to be within an
|
||
|
already-started program unit
|
||
|
(but not at the outer level of the @code{CONTAINS} block),
|
||
|
it is a type-declaration statement.
|
||
|
|
||
|
Otherwise, the statement is a @code{FUNCTION} statement,
|
||
|
in that it begins a function program unit
|
||
|
(external, or, within @code{CONTAINS}, nested).
|
||
|
|
||
|
@subsubsection Weird READ
|
||
|
|
||
|
The statement
|
||
|
|
||
|
@smallexample
|
||
|
READ (N)
|
||
|
@end smallexample
|
||
|
|
||
|
is equivalent to either
|
||
|
|
||
|
@smallexample
|
||
|
READ (UNIT=(N))
|
||
|
@end smallexample
|
||
|
|
||
|
or
|
||
|
|
||
|
@smallexample
|
||
|
READ (FMT=(N))
|
||
|
@end smallexample
|
||
|
|
||
|
depending on which would be valid in context.
|
||
|
|
||
|
Specifically, if @samp{N} is type @code{INTEGER},
|
||
|
@samp{READ (FMT=(N))} would not be valid,
|
||
|
because parentheses may not be used around @samp{N},
|
||
|
whereas they may around it in @samp{READ (UNIT=(N))}.
|
||
|
|
||
|
Further, if @samp{N} is type @code{CHARACTER},
|
||
|
the opposite is true---@samp{READ (UNIT=(N))} is not valid,
|
||
|
but @samp{READ (FMT=(N))} is.
|
||
|
|
||
|
Strictly speaking, if anything follows
|
||
|
|
||
|
@smallexample
|
||
|
READ (N)
|
||
|
@end smallexample
|
||
|
|
||
|
in the statement, whether the first lexeme after the close
|
||
|
parenthesis is a comma could be used to disambiguate the two cases,
|
||
|
without looking at the type of @samp{N},
|
||
|
because the comma is required for the @samp{READ (FMT=(N))}
|
||
|
interpretation and disallowed for the @samp{READ (UNIT=(N))}
|
||
|
interpretation.
|
||
|
|
||
|
However, in practice, many Fortran compilers allow
|
||
|
the comma for the @samp{READ (UNIT=(N))}
|
||
|
interpretation anyway
|
||
|
(in that they generally allow a leading comma before
|
||
|
an I/O list in an I/O statement),
|
||
|
and much code takes advantage of this allowance.
|
||
|
|
||
|
(This is quite a reasonable allowance, since the
|
||
|
juxtaposition of a comma-separated list immediately
|
||
|
after an I/O control-specification list, which is also comma-separated,
|
||
|
without an intervening comma,
|
||
|
looks sufficiently ``wrong'' to programmers
|
||
|
that they can't resist the itch to insert the comma.
|
||
|
@samp{READ (I, J), K, L} simply looks cleaner than
|
||
|
@samp{READ (I, J) K, L}.)
|
||
|
|
||
|
So, type-based disambiguation is needed unless strict adherence
|
||
|
to the standard is always assumed, and we're not going to assume that.
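A sketch of the type-based disambiguation, in C, might look like this.
The enumerations and @code{classify_read_paren_name} are hypothetical
names; types other than @code{INTEGER} and @code{CHARACTER} are not
discussed above, so the sketch leaves them undecided rather than
guessing.

```c
#include <assert.h>

enum basic_type { TYPE_INTEGER, TYPE_CHARACTER, TYPE_OTHER };
enum read_spec { READ_UNIT, READ_FMT, READ_UNDECIDED };

/* Decide how to interpret READ (N) from the type of N alone.  */
enum read_spec
classify_read_paren_name (enum basic_type type_of_n)
{
  assert (type_of_n <= TYPE_OTHER);
  switch (type_of_n)
    {
    case TYPE_INTEGER:
      return READ_UNIT;       /* READ (UNIT=(N)) */
    case TYPE_CHARACTER:
      return READ_FMT;        /* READ (FMT=(N)) */
    default:
      return READ_UNDECIDED;  /* Not covered by the discussion above.  */
    }
}
```

Note that the decision needs the type of @samp{N}, so it cannot be made
by a phase that runs before declarations have been processed.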
|
||
|
|
||
|
@node TBD (Transforming)
|
||
|
@subsection TBD (Transforming)
|
||
|
|
||
|
Continue researching gotchas, designing the transformational process,
|
||
|
and implementing it.
|
||
|
|
||
|
Specific issues to resolve:
|
||
|
|
||
|
@itemize @bullet
|
||
|
@item
|
||
|
Just where should @code{INCLUDE} processing take place?
|
||
|
|
||
|
Clearly before (or part of) statement identification (@file{sta.c}),
|
||
|
since determining whether @samp{I(J)=K} is a statement-function
|
||
|
definition or an assignment statement requires knowing the context,
|
||
|
which in turn requires having processed @code{INCLUDE} files.
|
||
|
|
||
|
@item
|
||
|
Just where should (if it were implemented) @code{USE} processing take place?
|
||
|
|
||
|
This gets into the whole issue of how @code{g77} should handle the concept
|
||
|
of modules.
|
||
|
I think GNAT already takes on this issue, but don't know more than that.
|
||
|
Jim Giles has written extensively on @code{comp.lang.fortran}
|
||
|
about his opinions on module handling, as have others.
|
||
|
Jim's views should be taken into account.
|
||
|
|
||
|
Actually, Richard M. Stallman (RMS) also has written up
|
||
|
some guidelines for implementing such things,
|
||
|
but I'm not sure where I read them.
|
||
|
Perhaps the old @email{gcc2@@cygnus.com} list.
|
||
|
|
||
|
If someone could dig references to these up and get them to me,
|
||
|
that would be much appreciated!
|
||
|
Even though modules are not on the short-term list for implementation,
|
||
|
it'd be helpful to know @emph{now} how to avoid making them harder to
|
||
|
implement @emph{later}.
|
||
|
|
||
|
@item
|
||
|
Should the @code{g77} command become just a script that invokes
|
||
|
all the various preprocessing that might be needed,
|
||
|
thus making it seem slower than necessary for legacy code
|
||
|
that people are unwilling to convert,
|
||
|
or should we provide a separate script for that,
|
||
|
thus encouraging people to convert their code once and for all?
|
||
|
|
||
|
At least, a separate script to behave as old @code{g77} did,
|
||
|
perhaps named @code{g77old}, might ease the transition,
|
||
|
as might a corresponding one, perhaps named @code{g77oldnew},
that converts source code.
|
||
|
|
||
|
These scripts would take all the pertinent options @code{g77} used
|
||
|
to take and run the appropriate filters,
|
||
|
passing the results to @code{g77} or just making new sources out of them
|
||
|
(in a subdirectory, leaving the user to do the dirty deed of
|
||
|
moving or copying them over the old sources).
|
||
|
|
||
|
@item
|
||
|
Do other Fortran compilers provide a prefix syntax
|
||
|
to govern the treatment of backslashes in @code{CHARACTER}
|
||
|
(or Hollerith) constants?
|
||
|
|
||
|
Knowing what other compilers provide would help.
|
||
|
|
||
|
@item
|
||
|
Is it okay to drop support for the @samp{-fintrin-case-initcap},
|
||
|
@samp{-fmatch-case-initcap}, @samp{-fsymbol-case-initcap},
|
||
|
and @samp{-fcase-initcap} options?
|
||
|
|
||
|
I've asked @email{info-gnu-fortran@@gnu.org} for input on this.
|
||
|
Not having to support these makes it easier to write the new front end,
|
||
|
and might also avoid complicating its design.
|
||
|
@end itemize
|
||
|
|
||
|
@node Philosophy of Code Generation
|
||
|
@section Philosophy of Code Generation
|
||
|
|
||
|
Don't poke the bear.
|
||
|
|
||
|
The @code{g77} front end generates code
|
||
|
via the @code{gcc} back end.
|
||
|
|
||
|
@cindex GNU Back End (GBE)
|
||
|
@cindex GBE
|
||
|
@cindex @code{gcc}, back end
|
||
|
@cindex back end, gcc
|
||
|
@cindex code generator
|
||
|
The @code{gcc} back end (GBE) is a large, complex
|
||
|
labyrinth of intricate code
|
||
|
written in a combination of the C language
|
||
|
and specialized languages internal to @code{gcc}.
|
||
|
|
||
|
While the @emph{code} that implements the GBE
|
||
|
is written in a combination of languages,
|
||
|
the GBE itself is,
|
||
|
to the front end for a language like Fortran,
|
||
|
best viewed as a @emph{compiler}
|
||
|
that compiles its own, unique, language.
|
||
|
|
||
|
The GBE's ``source'', then, is written in this language,
|
||
|
which consists primarily of
|
||
|
a combination of calls to GBE functions
|
||
|
and @dfn{tree} nodes
|
||
|
(which are, themselves, created
|
||
|
by calling GBE functions).
|
||
|
|
||
|
So, @code{g77} generates code by, in effect,
|
||
|
translating the Fortran code it reads
|
||
|
into a form ``written'' in the ``language''
|
||
|
of the @code{gcc} back end.
|
||
|
|
||
|
@cindex GBEL
|
||
|
@cindex GNU Back End Language (GBEL)
|
||
|
This language will henceforth be referred to as @dfn{GBEL},
|
||
|
for GNU Back End Language.
|
||
|
|
||
|
GBEL is an evolving language,
|
||
|
not fully specified in any published form
|
||
|
as of this writing.
|
||
|
It offers many facilities,
|
||
|
but its ``core'' facilities
|
||
|
are those that correspond most directly
|
||
|
to those needed to support @code{gcc}
|
||
|
(compiling code written in GNU C).
|
||
|
|
||
|
The @code{g77} Fortran Front End (FFE)
|
||
|
is designed and implemented
|
||
|
to navigate the currents and eddies
|
||
|
of ongoing GBEL and @code{gcc} development
|
||
|
while also delivering on the potential
|
||
|
of an integrated FFE
|
||
|
(as compared to using a converter like @code{f2c}
|
||
|
and feeding the output into @code{gcc}).
|
||
|
|
||
|
Goals of the FFE's code-generation strategy include:
|
||
|
|
||
|
@itemize @bullet
|
||
|
@item
|
||
|
High likelihood of generation of correct code,
|
||
|
or, failing that, producing a fatal diagnostic or crashing.
|
||
|
|
||
|
@item
|
||
|
Generation of highly optimized code,
|
||
|
as directed by the user
|
||
|
via GBE-specific (versus @code{g77}-specific) constructs,
|
||
|
such as command-line options.
|
||
|
|
||
|
@item
|
||
|
Fast overall (FFE plus GBE) compilation.
|
||
|
|
||
|
@item
|
||
|
Preservation of source-level debugging information.
|
||
|
@end itemize
|
||
|
|
||
|
The strategies historically, and currently, used by the FFE
|
||
|
to achieve these goals include:
|
||
|
|
||
|
@itemize @bullet
|
||
|
@item
|
||
|
Use of GBEL constructs that most faithfully encapsulate
|
||
|
the semantics of Fortran.
|
||
|
|
||
|
@item
|
||
|
Avoidance of GBEL constructs that are so rarely used,
|
||
|
or limited to use in specialized situations not related to Fortran,
|
||
|
that their reliability and performance have not yet been established
|
||
|
as sufficient for use by the FFE.
|
||
|
|
||
|
@item
|
||
|
Flexible design, to readily accommodate changes to specific
|
||
|
code-generation strategies, perhaps governed by command-line options.
|
||
|
@end itemize
|
||
|
|
||
|
@cindex Bear-poking
|
||
|
@cindex Poking the bear
|
||
|
``Don't poke the bear'' somewhat summarizes the above strategies.
|
||
|
The GBE is the bear.
|
||
|
The FFE is designed and implemented to avoid poking it
|
||
|
in ways that are likely to just annoy it.
|
||
|
The FFE usually either tackles it head-on,
|
||
|
or avoids treating it in ways dissimilar to how
|
||
|
the @code{gcc} front end treats it.
|
||
|
|
||
|
For example, the FFE uses the native array facility in the back end
|
||
|
instead of the lower-level pointer-arithmetic facility
|
||
|
used by @code{gcc} when compiling @code{f2c} output.
|
||
|
Theoretically, this presents more opportunities for optimization,
|
||
|
faster compile times,
|
||
|
and the production of more faithful debugging information.
|
||
|
These benefits were not, however, immediately realized,
|
||
|
mainly because @code{gcc} itself makes little or no use
|
||
|
of the native array facility.
|
||
|
|
||
|
Complex arithmetic is a case study of the evolution of this strategy.
|
||
|
When originally implemented,
|
||
|
the GBEL had just evolved its own native complex-arithmetic facility,
|
||
|
so the FFE took advantage of that.
|
||
|
|
||
|
When porting @code{g77} to 64-bit systems,
|
||
|
it was discovered that the GBE didn't really
|
||
|
implement its native complex-arithmetic facility properly.
|
||
|
|
||
|
The short-term solution was to rewrite the FFE
|
||
|
to instead use the lower-level facilities
|
||
|
that'd be used by @code{gcc}-compiled code
|
||
|
(assuming that code, itself, didn't use the native complex type
|
||
|
provided, as an extension, by @code{gcc}),
|
||
|
since these were known to work,
|
||
|
and, in any case, if shown to not work,
|
||
|
would likely be rapidly fixed
|
||
|
(since they'd likely not work for vanilla C code in similar circumstances).
|
||
|
|
||
|
However, the rewrite accommodated the original, native approach as well
|
||
|
by offering a command-line option to select it over the emulated approach.
|
||
|
This allowed users, and especially GBE maintainers, to try out
|
||
|
fixes to complex-arithmetic support in the GBE
|
||
|
while @code{g77} continued to default to compiling more code correctly,
|
||
|
albeit producing (typically) slower executables.
|
||
|
|
||
|
As of April 1999, it appeared that the last few bugs
|
||
|
in the GBE's support of its native complex-arithmetic facility
|
||
|
were worked out.
|
||
|
The FFE was changed back to default to using that native facility,
|
||
|
leaving emulation as an option.
|
||
|
|
||
|
Other Fortran constructs---arrays, character strings,
|
||
|
complex division, @code{COMMON} and @code{EQUIVALENCE} aggregates,
|
||
|
and so on---involve issues similar to those pertaining to complex arithmetic.
|
||
|
|
||
|
So, it is possible that the history
|
||
|
of how the FFE handled complex arithmetic
|
||
|
will be repeated, probably in modified form
|
||
|
(and hopefully over shorter timeframes),
|
||
|
for some of these other facilities.
|
||
|
|
||
|
@node Two-pass Design
|
||
|
@section Two-pass Design
|
||
|
|
||
|
The FFE does not tell the GBE anything about a program unit
|
||
|
until after the last statement in that unit has been parsed.
|
||
|
(A program unit is a Fortran concept that corresponds, in the C world,
|
||
|
most closely to function definitions in ISO C.
|
||
|
That is, a program unit in Fortran is like a top-level function in C.
|
||
|
Nested functions, found among the extensions offered by GNU C,
|
||
|
correspond roughly to Fortran's statement functions.)
|
||
|
|
||
|
So, while parsing the code in a program unit,
|
||
|
the FFE saves up all the information
|
||
|
on statements, expressions, names, and so on,
|
||
|
until it has seen the last statement.
|
||
|
|
||
|
At that point, the FFE revisits the saved information
|
||
|
(in what amounts to a second @dfn{pass} over the program unit)
|
||
|
to perform the actual translation of the program unit into GBEL,
|
||
|
culminating in the generation of assembly code for it.
|
||
|
|
||
|
Some lookahead is performed during this second pass,
|
||
|
so the FFE could be viewed as a ``two-plus-pass'' design.
|
||
|
|
||
|
@menu
|
||
|
* Two-pass Code::
|
||
|
* Why Two Passes::
|
||
|
@end menu
|
||
|
|
||
|
@node Two-pass Code
|
||
|
@subsection Two-pass Code
|
||
|
|
||
|
Most of the code that turns the first pass (parsing)
|
||
|
into a second pass for code generation
|
||
|
is in @file{@value{path-g77}/std.c}.
|
||
|
|
||
|
It has external functions,
|
||
|
called mainly by siblings in @file{@value{path-g77}/stc.c},
|
||
|
that record the information on statements and expressions
|
||
|
in the order they are seen in the source code.
|
||
|
These functions save that information.
|
||
|
|
||
|
It also has an external function that revisits that information,
|
||
|
calling the siblings in @file{@value{path-g77}/ste.c},
|
||
|
which handle the actual code generation
|
||
|
(by generating GBEL code,
|
||
|
that is, by calling GBE routines
|
||
|
to represent and specify expressions, statements, and so on).
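The save-then-revisit arrangement can be sketched in C as follows.
This is not the code in @file{std.c}; every name here
(@code{saved_stmt}, @code{stmt_list}, @code{save_stmt},
@code{revisit_stmts}, @code{note_emitted}) is hypothetical, and the
fixed-size array is only there to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAVED 64

/* Hypothetical record of one parsed statement.  */
struct saved_stmt
{
  int kind;         /* Illustrative statement code.  */
  int source_line;
};

struct stmt_list
{
  struct saved_stmt items[MAX_SAVED];
  size_t count;
};

/* First pass: remember the statement, in source order.
   Return 0 if the list is full, 1 otherwise.  */
int
save_stmt (struct stmt_list *list, int kind, int source_line)
{
  assert (list != NULL);
  if (list->count == MAX_SAVED)
    return 0;
  list->items[list->count].kind = kind;
  list->items[list->count].source_line = source_line;
  list->count++;
  return 1;
}

/* Second pass: hand each saved statement to EMIT, in the original
   order.  Return the number of statements revisited.  */
size_t
revisit_stmts (const struct stmt_list *list,
               void (*emit) (const struct saved_stmt *))
{
  size_t i;

  assert (list != NULL && emit != NULL);
  for (i = 0; i < list->count; i++)
    emit (&list->items[i]);
  return i;
}

/* Example emitter: just records the last source line it was given.  */
int last_emitted_line = -1;

void
note_emitted (const struct saved_stmt *stmt)
{
  last_emitted_line = stmt->source_line;
}
```

The point of the split is that nothing reaches the emitter until the
whole program unit has been saved, so the emitter can rely on facts
that were only discovered at the end of the first pass.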
|
||
|
|
||
|
@node Why Two Passes
|
||
|
@subsection Why Two Passes
|
||
|
|
||
|
The need for two passes was not immediately evident
|
||
|
during the design and implementation of the code in the FFE
|
||
|
that was to produce GBEL.
|
||
|
Only after a few kludges,
|
||
|
to handle things like incorrectly-guessed @code{ASSIGN} label nature,
|
||
|
had been implemented,
|
||
|
did enough evidence pile up to make it clear
|
||
|
that @file{std.c} had to be introduced to intercept,
|
||
|
save, then revisit as part of a second pass,
|
||
|
the digested contents of a program unit.
|
||
|
|
||
|
Other such missteps have occurred during the evolution of the FFE,
|
||
|
because of the different goals of the FFE and the GBE.
|
||
|
|
||
|
Because the GBE's original, and still primary, goal
|
||
|
was to directly support the GNU C language,
|
||
|
the GBEL, and the GBE itself,
|
||
|
requires more complexity
|
||
|
on the part of most front ends
|
||
|
than it requires of @code{gcc}'s.
|
||
|
|
||
|
For example,
|
||
|
the GBEL offers an interface that permits the @code{gcc} front end
|
||
|
to implement most, or all, of the language features it supports,
|
||
|
without the front end having to
|
||
|
make use of non-user-defined variables.
|
||
|
(It's almost certainly the case that all of K&R C,
|
||
|
and probably ANSI C as well,
|
||
|
is handled by the @code{gcc} front end
|
||
|
without declaring such variables.)
|
||
|
|
||
|
The FFE, on the other hand, must resort to a variety of ``tricks''
|
||
|
to achieve its goals.
|
||
|
|
||
|
Consider the following C code:
|
||
|
|
||
|
@smallexample
|
||
|
int
|
||
|
foo (int a, int b)
|
||
|
@{
|
||
|
int c = 0;
|
||
|
|
||
|
if ((c = bar (c)) == 0)
|
||
|
goto done;
|
||
|
|
||
|
quux (c << 1);
|
||
|
|
||
|
done:
|
||
|
return c;
|
||
|
@}
|
||
|
@end smallexample
|
||
|
|
||
|
Note what kinds of objects are declared, or defined, before their use,
|
||
|
and before any actual code generation involving them
|
||
|
would normally take place:
|
||
|
|
||
|
@itemize @bullet
|
||
|
@item
|
||
|
Return type of function
|
||
|
|
||
|
@item
|
||
|
Entry point(s) of function
|
||
|
|
||
|
@item
|
||
|
Dummy arguments
|
||
|
|
||
|
@item
|
||
|
Variables
|
||
|
|
||
|
@item
|
||
|
Initial values for variables
|
||
|
@end itemize
|
||
|
|
||
|
Whereas, the following items can, and do,
|
||
|
suddenly appear ``out of the blue'' in C:
|
||
|
|
||
|
@itemize @bullet
|
||
|
@item
|
||
|
Label references
|
||
|
|
||
|
@item
|
||
|
Function references
|
||
|
@end itemize
|
||
|
|
||
|
Not surprisingly, the GBE faithfully permits the latter set of items
|
||
|
to be ``discovered'' partway through GBEL ``programs'',
|
||
|
just as they are permitted to in C.
|
||
|
|
||
|
Yet, the GBE has tended, at least in the past,
|
||
|
to be reluctant to fully support similar ``late'' discovery
|
||
|
of items in the former set.
|
||
|
|
||
|
This makes Fortran a poor fit for the ``safe'' subset of GBEL.
|
||
|
Consider:
|
||
|
|
||
|
@smallexample
|
||
|
FUNCTION X (A, ARRAY, ID1)
|
||
|
CHARACTER*(*) A
|
||
|
DOUBLE PRECISION X, Y, Z, TMP, EE, PI
|
||
|
REAL ARRAY(ID1*ID2)
|
||
|
COMMON ID2
|
||
|
EXTERNAL FRED
|
||
|
|
||
|
ASSIGN 100 TO J
|
||
|
CALL FOO (I)
|
||
|
IF (I .EQ. 0) PRINT *, A(0)
|
||
|
GOTO 200
|
||
|
|
||
|
ENTRY Y (Z)
|
||
|
ASSIGN 101 TO J
|
||
|
200 PRINT *, A(1)
|
||
|
READ *, TMP
|
||
|
GOTO J
|
||
|
100 X = TMP * EE
|
||
|
RETURN
|
||
|
101 Y = TMP * PI
|
||
|
CALL FRED
|
||
|
DATA EE, PI /2.71D0, 3.14D0/
|
||
|
END
|
||
|
@end smallexample
|
||
|
|
||
|
Here are some observations about the above code,
|
||
|
which, while somewhat contrived,
|
||
|
conforms to the FORTRAN 77 and Fortran 90 standards:
|
||
|
|
||
|
@itemize @bullet
|
||
|
@item
|
||
|
The return type of function @samp{X} is not known
|
||
|
until the @samp{DOUBLE PRECISION} line has been parsed.
|
||
|
|
||
|
@item
|
||
|
Whether @samp{A} is a function or a variable
|
||
|
is not known until the @samp{PRINT *, A(0)} statement
|
||
|
has been parsed.
|
||
|
|
||
|
@item
|
||
|
The bounds of the array of argument @samp{ARRAY}
|
||
|
depend on a computation involving
|
||
|
the subsequent argument @samp{ID1}
|
||
|
and the blank-common member @samp{ID2}.
|
||
|
|
||
|
@item
|
||
|
Whether @samp{Y} and @samp{Z} are local variables,
|
||
|
additional function entry points,
|
||
|
or dummy arguments to additional entry points
|
||
|
is not known
|
||
|
until the @code{ENTRY} statement is parsed.
|
||
|
|
||
|
@item
|
||
|
Similarly, whether @samp{TMP} is a local variable is not known
|
||
|
until the @samp{READ *, TMP} statement is parsed.
|
||
|
|
||
|
@item
|
||
|
The initial values for @samp{EE} and @samp{PI}
|
||
|
are not known until after the @code{DATA} statement is parsed.
|
||
|
|
||
|
@item
|
||
|
Whether @samp{FRED} is a function returning type @code{REAL}
|
||
|
or a subroutine
|
||
|
(which can be thought of as returning type @code{void}
|
||
|
@emph{or}, to support alternate returns in a simple way,
|
||
|
type @code{int})
|
||
|
is not known
|
||
|
until the @samp{CALL FRED} statement is parsed.
|
||
|
|
||
|
@item
|
||
|
Whether @samp{100} is a @code{FORMAT} label
|
||
|
or the label of an executable statement
|
||
|
is not known
|
||
|
until the @samp{X =} statement is parsed.
|
||
|
(These two types of labels get @emph{very} different treatment,
|
||
|
especially when @code{ASSIGN}'ed.)
|
||
|
|
||
|
@item
|
||
|
That @samp{J} is a local variable is not known
|
||
|
until the first @code{ASSIGN} statement is parsed.
|
||
|
(This happens @emph{after} executable code has been seen.)
|
||
|
@end itemize
|
||
|
|
||
|
Very few of these ``discoveries''
|
||
|
can be accommodated by the GBE as it has evolved over the years.
|
||
|
The GBEL doesn't support several of them,
|
||
|
and those it might appear to support
|
||
|
don't always work properly,
|
||
|
especially in combination with other GBEL and GBE features,
|
||
|
as implemented in the GBE.
|
||
|
|
||
|
(Had the GBE and its GBEL originally evolved to support @code{g77},
|
||
|
the shoe would be on the other foot, so to speak---most, if not all,
|
||
|
of the above would be directly supported by the GBEL,
|
||
|
and a few C constructs would probably not, as they are in reality,
|
||
|
be supported.
|
||
|
Both this mythical, and today's real, GBE caters to its GBEL
|
||
|
by, sometimes, scrambling around, cleaning up after itself---after
|
||
|
discovering that assumptions it made earlier during code generation
|
||
|
are incorrect.)
|
||
|
|
||
|
So, the FFE handles these discrepancies---between the order in which
|
||
|
it discovers facts about the code it is compiling,
|
||
|
and the order in which the GBEL and GBE support such discoveries---by
|
||
|
performing what amounts to two
|
||
|
passes over each program unit.
|
||
|
|
||
|
(A few ambiguities can remain at that point,
|
||
|
such as whether, given @samp{EXTERNAL BAZ}
|
||
|
and no other reference to @samp{BAZ} in the program unit,
|
||
|
it is a subroutine, a function, or a block-data---which, in C-speak,
|
||
|
governs its declared return type.
|
||
|
Fortunately, these distinctions are easily finessed
|
||
|
for the procedure, library, and object-file interfaces
|
||
|
supported by @code{g77}.)
|
||
|
|
||
|
@node Challenges Posed
|
||
|
@section Challenges Posed
|
||
|
|
||
|
Consider the following Fortran code, which uses various extensions
|
||
|
(including some to Fortran 90):
|
||
|
|
||
|
@smallexample
|
||
|
SUBROUTINE X(A)
|
||
|
CHARACTER*(*) A
|
||
|
COMPLEX CFUNC
|
||
|
INTEGER*2 CLOCKS(200)
|
||
|
INTEGER IFUNC
|
||
|
|
||
|
CALL SYSTEM_CLOCK (CLOCKS (IFUNC (CFUNC ('('//A//')'))))
|
||
|
@end smallexample
|
||
|
|
||
|
The above poses the following challenges to any Fortran compiler
|
||
|
that uses run-time interfaces, and a run-time library, roughly similar
|
||
|
to those used by @code{g77}:
|
||
|
|
||
|
@itemize @bullet
|
||
|
@item
|
||
|
Assuming the library routine that supports @code{SYSTEM_CLOCK}
|
||
|
expects to set an @code{INTEGER*4} variable via its @code{COUNT} argument,
|
||
|
the compiler must make available to it a temporary variable of that type.
|
||
|
|
||
|
@item
|
||
|
Further, after the @code{SYSTEM_CLOCK} library routine returns,
|
||
|
the compiler must ensure that the temporary variable it wrote
|
||
|
is copied into the appropriate element of the @samp{CLOCKS} array.
|
||
|
(This assumes the compiler doesn't just reject the code,
|
||
|
which it should if it is compiling under some kind of a ``strict'' option.)
|
||
|
|
||
|
@item
|
||
|
To determine the correct index into the @samp{CLOCKS} array,
|
||
|
(putting aside the fact that the index, in this particular case,
|
||
|
need not be computed until after
|
||
|
the @code{SYSTEM_CLOCK} library routine returns),
|
||
|
the compiler must ensure that the @code{IFUNC} function is called.
|
||
|
|
||
|
That requires evaluating its argument,
|
||
|
which requires, for @code{g77}
|
||
|
(assuming @code{-ff2c} is in force),
|
||
|
reserving a temporary variable of type @code{COMPLEX}
|
||
|
for use as a repository for the return value
|
||
|
being computed by @samp{CFUNC}.
|
||
|
|
||
|
@item
|
||
|
Before invoking @samp{CFUNC},
|
||
|
its argument must be evaluated,
|
||
|
which requires allocating, at run time,
|
||
|
a temporary large enough to hold the result of the concatenation,
|
||
|
as well as actually performing the concatenation.
|
||
|
|
||
|
@item
|
||
|
The large temporary needed during invocation of @code{CFUNC}
|
||
|
should, ideally, be deallocated
|
||
|
(or, at least, left to the GBE to dispose of, as it sees fit)
|
||
|
as soon as @code{CFUNC} returns,
|
||
|
which means before @code{IFUNC} is called
|
||
|
(as it might need a lot of dynamically allocated memory).
|
||
|
@end itemize
|
||
|
|
||
|
@code{g77} currently doesn't support all of the above,
|
||
|
but, so that it might someday, it has evolved to handle
|
||
|
at least some of the above requirements.
|
||
|
|
||
|
Meeting the above requirements is made more challenging
|
||
|
by conforming to the requirements of the GBEL/GBE combination.
|
||
|
|
||
|
@node Transforming Statements
|
||
|
@section Transforming Statements
|
||
|
|
||
|
Most Fortran statements are given their own block,
|
||
|
and, for temporary variables they might need, their own scope.
|
||
|
(A block is what distinguishes @samp{@{ foo (); @}}
|
||
|
from just @samp{foo ();} in C.
|
||
|
A scope is included with every such block,
|
||
|
providing a distinct name space for local variables.)
|
||
|
|
||
|
Label definitions for the statement precede this block,
|
||
|
so @samp{10 PRINT *, I} is handled more like
|
||
|
@samp{fl10: @{ @dots{} @}} than @samp{@{ fl10: @dots{} @}}
|
||
|
(where @samp{fl10} is just a notation meaning ``Fortran Label 10''
|
||
|
for the purposes of this document).
|
||
|
|
||
|
@menu
|
||
|
* Statements Needing Temporaries::
|
||
|
* Transforming DO WHILE::
|
||
|
* Transforming Iterative DO::
|
||
|
* Transforming Block IF::
|
||
|
* Transforming SELECT CASE::
|
||
|
@end menu
|
||
|
|
||
|
@node Statements Needing Temporaries
|
||
|
@subsection Statements Needing Temporaries
|
||
|
|
||
|
Any temporaries needed during, but not beyond,
|
||
|
execution of a Fortran statement,
|
||
|
are made local to the scope of that statement's block.
|
||
|
|
||
|
This allows the GBE to share storage for these temporaries
|
||
|
among the various statements without the FFE
|
||
|
having to manage that itself.
|
||
|
|
||
|
(The GBE could, of course, decide to optimize
|
||
|
management of these temporaries.
|
||
|
For example, it could, theoretically,
|
||
|
schedule some of the computations involving these temporaries
|
||
|
to occur in parallel.
|
||
|
More practically, it might leave the storage for some temporaries
|
||
|
``live'' beyond their scopes, to reduce the number of
|
||
|
manipulations of the stack pointer at run time.)
|
||
|
|
||
|
Temporaries needed across distinct statement boundaries usually
|
||
|
are associated with Fortran blocks (such as @code{DO}/@code{END DO}).
|
||
|
(Also, there might be temporaries not associated with blocks at all---these
|
||
|
would be in the scope of the entire program unit.)
|
||
|
|
||
|
Each Fortran block @emph{should} get its own block/scope in the GBE.
|
||
|
This is best, because it allows temporaries to be more naturally handled.
|
||
|
However, it might pose problems when handling labels
|
||
|
(in particular, when they're the targets of @code{GOTO}s outside the Fortran
|
||
|
block), and generally just hassling with replicating
|
||
|
parts of the @code{gcc} front end
|
||
|
(because the FFE needs to support
|
||
|
an arbitrary number of nested back-end blocks
|
||
|
if each Fortran block gets one).
|
||
|
|
||
|
So, there might still be a need for top-level temporaries, whose
|
||
|
``owning'' scope is that of the containing procedure.
|
||
|
|
||
|
Also, there seem to be problems declaring new variables after
|
||
|
generating code (within a block) in the back end, leading to, e.g.,
|
||
|
@samp{label not defined before binding contour} or similar messages,
|
||
|
when compiling with @samp{-fstack-check} or
|
||
|
when compiling for certain targets.
|
||
|
|
||
|
Because of that, and because sometimes these temporaries are not
|
||
|
discovered until in the middle of of generating code for an expression
|
||
|
statement (as in the case of the optimization for @samp{X**I}),
|
||
|
it seems best to always
|
||
|
pre-scan all the expressions that'll be expanded for a block
|
||
|
before generating any of the code for that block.
|
||
|
|
||
|
This pre-scan then handles discovering and declaring, to the back end,
|
||
|
the temporaries needed for that block.
|
||
|
|
||
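The pre-scan idea can be illustrated with a minimal C sketch.
Everything in it is hypothetical and is not the FFE's actual representation:
the @code{struct expr} node type, the node kinds,
and the rule that only a concatenation node needs a temporary
are assumptions made for illustration.
The first pass only counts temporaries;
a real pre-scan would also declare them to the back end
before any code for the block is generated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression node; not the FFE's real representation.  */
enum kind { LEAF, CONCAT, COMPARE };

struct expr
{
  enum kind kind;
  const struct expr *left;	/* NULL for a LEAF */
  const struct expr *right;	/* NULL for a LEAF */
};

/* Count the temporaries one expression needs.  In this sketch, only
   a CONCAT node needs one (to hold the concatenated string).  */
int
prescan_temps (const struct expr *e)
{
  if (e == NULL || e->kind == LEAF)
    return 0;
  return (e->kind == CONCAT ? 1 : 0)
	 + prescan_temps (e->left)
	 + prescan_temps (e->right);
}

/* Pre-scan every expression of a block, returning the number of
   temporaries to declare before generating any code for that block.  */
int
prescan_block (const struct expr *exprs[], size_t n)
{
  int total = 0;
  size_t i;

  assert (exprs != NULL || n == 0);
  for (i = 0; i < n; i++)
    total += prescan_temps (exprs[i]);
  return total;
}
```

With this arrangement, code generation for the block can start
knowing how many temporaries it will need,
so no declaration has to be issued in the middle of expanding an expression.
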
It's also important to treat distinct items in an I/O list as distinct
statements deserving their own blocks.
That's because there's a requirement
that each I/O item be fully processed before the next one,
which matters in cases like @samp{READ (*,*), I, A(I)}---the
element of @samp{A} read in the second item
@emph{must} be determined from the value
of @samp{I} read in the first item.

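The ordering requirement for I/O items can be sketched in C as follows.
The @code{struct input} type and the @code{read_item} helper are hypothetical
stand-ins for the run-time library, which is not modelled here;
the point is only that each item is handled in its own block, in order,
so the subscript of @samp{A} is computed after @samp{I} has been read.

```c
#include <assert.h>

/* Hypothetical stand-in for a list-directed input source.  */
struct input
{
  const int *values;
  int count;
  int next;
};

/* Read one integer item; returns 0 on success, -1 at end of input.  */
int
read_item (struct input *in, int *dest)
{
  assert (in != 0 && dest != 0);
  if (in->next >= in->count)
    return -1;
  *dest = in->values[in->next++];
  return 0;
}

/* Rough equivalent of READ (*,*) I, A(I), with A declared as A(N).
   Each item gets its own block and is fully processed before the
   next one starts.  */
int
read_i_then_a_of_i (struct input *in, int *i, int a[], int n)
{
  {
    if (read_item (in, i) != 0)
      return -1;
  }
  {
    if (*i < 1 || *i > n)
      return -1;		/* subscript out of range */
    /* Fortran subscripts start at 1.  */
    if (read_item (in, &a[*i - 1]) != 0)
      return -1;
  }
  return 0;
}
```
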
@node Transforming DO WHILE
@subsection Transforming DO WHILE

@samp{DO WHILE(expr)} @emph{must} be implemented
so that temporaries needed to evaluate @samp{expr}
are generated just for the test, each time.

Consider how @samp{DO WHILE (A//B .NE. 'END'); @dots{}; END DO} is transformed:

@smallexample
for (;;)
  @{
    int temp0;

    @{
      char temp1[large];

      libg77_catenate (temp1, a, b);
      temp0 = libg77_ne (temp1, 'END');
    @}

    if (! temp0)
      break;

    @dots{}
  @}
@end smallexample

In this case, it seems like a time/space tradeoff
between allocating and deallocating @samp{temp1} for each iteration
and allocating it just once for the entire loop.

However, if @samp{temp1} is allocated just once for the entire loop,
it could be the wrong size for subsequent iterations of that loop
in cases like @samp{DO WHILE (A(I:J)//B .NE. 'END')},
because the body of the loop might modify @samp{I} or @samp{J}.

So, the above implementation is used,
though a more efficient one can be used
in specific circumstances.

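The size problem can be seen in a compilable C sketch of
@samp{DO WHILE (A(I:J)//B .NE. 'END')}
whose loop body increments @samp{J}.
This is an illustration under stated assumptions, not generated code:
it relies on C99 variable-length arrays,
the function name @code{count_iterations} is made up,
and Fortran's blank-padded comparison is simplified to an exact comparison.

```c
#include <assert.h>
#include <string.h>

/* Counts iterations of the rough equivalent of

     DO WHILE (A(I:J)//B .NE. 'END')
       J = J + 1
     END DO

   A(I:J) uses 1-based, inclusive Fortran bounds.  */
int
count_iterations (const char *a, int i, int j, const char *b)
{
  int iterations = 0;
  size_t alen, blen;

  assert (a != 0 && b != 0 && i >= 1);
  alen = strlen (a);
  blen = strlen (b);
  for (;;)
    {
      int temp0;

      {
	size_t sublen = (j >= i) ? (size_t) (j - i + 1) : 0;
	/* The temporary is sized anew each time the test is evaluated.  */
	char temp1[sublen + blen + 1];

	assert (i - 1 + sublen <= alen);	/* substring within A */
	memcpy (temp1, a + (i - 1), sublen);
	memcpy (temp1 + sublen, b, blen + 1);
	temp0 = strcmp (temp1, "END") != 0;
      }

      if (! temp0)
	break;

      j = j + 1;		/* the body changes the size temp1 needs */
      iterations++;
    }
  return iterations;
}
```

Had @samp{temp1} been allocated once, before the loop,
with the size implied by the initial value of @samp{J},
it would be too small for the later tests in this example.
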
@node Transforming Iterative DO
@subsection Transforming Iterative DO

An iterative @code{DO} loop
(one that specifies an iteration variable)
is required by the Fortran standards
to be implemented as though an iteration count
is computed before entering the loop body,
and that iteration count used to determine
the number of times the loop body is to be performed
(assuming the loop isn't cut short via @code{GOTO} or @code{EXIT}).

The FFE handles this by allocating a temporary variable
to contain the computed number of iterations.
Since this variable must be in a scope that includes the entire loop,
a GBEL block is created for that loop,
and the variable declared as belonging to the scope of that block.

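A minimal C sketch of this scheme follows.
The function names are hypothetical;
the count uses the FORTRAN 77 rule
@samp{MAX (INT ((m2 - m1 + m3) / m3), 0)}
for @samp{DO I = m1, m2, m3},
and the count lives in a block that encloses the whole loop,
as described above.

```c
#include <assert.h>

/* Iteration count for DO I = M1, M2, M3.  C integer division
   truncates toward zero, which matches INT.  */
int
trip_count (int m1, int m2, int m3)
{
  int count;

  assert (m3 != 0);
  count = (m2 - m1 + m3) / m3;
  return count > 0 ? count : 0;
}

/* Rough equivalent of

     DO I = M1, M2, M3
       M2 = M2 + 10
     END DO

   Changing M2 in the body has no effect on the number of iterations,
   because the count was computed before entering the loop.
   Returns how many times the body ran; *FINAL_I receives I.  */
int
run_loop (int m1, int m2, int m3, int *final_i)
{
  int executed = 0;

  assert (final_i != 0);
  {
    int count = trip_count (m1, m2, m3);	/* the loop's temporary */
    int i = m1;

    while (count-- > 0)
      {
	m2 = m2 + 10;
	executed++;
	i += m3;
      }
    *final_i = i;
  }
  return executed;
}
```
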
@node Transforming Block IF
@subsection Transforming Block IF

Consider:

@smallexample
      SUBROUTINE X(A,B,C)
      CHARACTER*(*) A, B, C
      LOGICAL LFUNC

      IF (LFUNC (A//B)) THEN
        CALL SUBR1
      ELSE IF (LFUNC (A//C)) THEN
        CALL SUBR2
      ELSE
        CALL SUBR3
      END IF
      END
@end smallexample

The arguments to the two calls to @samp{LFUNC}
require dynamic allocation (at run time),
but are not required during execution of the @code{CALL} statements.

So, the scopes of those temporaries must be within blocks inside
the block corresponding to the Fortran @code{IF} block.

This cannot be represented ``naturally''
in vanilla C, nor in GBEL.
The @code{if}, @code{elseif}, @code{else},
and @code{endif} constructs
provided by both languages must,
for a given @code{if} block,
share the same C/GBE block.

Therefore, any temporaries needed during evaluation of @samp{expr}
while executing @samp{ELSE IF(expr)}
must either have been predeclared
at the top of the corresponding @code{IF} block,
or declared within a new block for that @code{ELSE IF}---a block that,
since it cannot contain the @code{else} or @code{else if} itself
(due to the above requirement),
actually implements the rest of the @code{IF} block's
@code{ELSE IF} and @code{ELSE} statements
within an inner block.

The FFE takes the latter approach.

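The nesting this produces can be sketched in C for the subroutine above.
This is an illustration only, not the FFE's output:
@code{lfunc} is a made-up stand-in for the user's @samp{LFUNC},
the function returns 1, 2, or 3 instead of calling
@samp{SUBR1}, @samp{SUBR2}, or @samp{SUBR3},
and C99 variable-length arrays stand in for dynamically sized temporaries.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the user's LOGICAL FUNCTION LFUNC.  */
int
lfunc (const char *s)
{
  return strcmp (s, "ab") == 0 || strcmp (s, "ac") == 0;
}

/* Returns 1, 2, or 3 according to which CALL would be executed.  */
int
block_if_x (const char *a, const char *b, const char *c)
{
  int test1;

  assert (a != 0 && b != 0 && c != 0);
  {
    char temp1[strlen (a) + strlen (b) + 1];	/* live only for the test */

    strcpy (temp1, a);
    strcat (temp1, b);
    test1 = lfunc (temp1);
  }

  if (test1)
    return 1;			/* CALL SUBR1 */
  else
    {
      /* The rest of the IF construct is implemented in this inner
	 block, so the ELSE IF test can have its own temporaries.  */
      int test2;

      {
	char temp2[strlen (a) + strlen (c) + 1];

	strcpy (temp2, a);
	strcat (temp2, c);
	test2 = lfunc (temp2);
      }

      if (test2)
	return 2;		/* CALL SUBR2 */
      else
	return 3;		/* CALL SUBR3 */
    }
}
```

Neither @samp{temp1} nor @samp{temp2} is in scope
while any of the three branches executes.
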
@node Transforming SELECT CASE
@subsection Transforming SELECT CASE

@code{SELECT CASE} poses a few interesting problems for code generation,
if efficiency and frugal stack management are important.

Consider @samp{SELECT CASE (I('PREFIX'//A))},
where @samp{A} is @code{CHARACTER*(*)}.
In a case like this---basically,
in any case where largish temporaries are needed
to evaluate the expression---those temporaries should
not be ``live'' during execution of any of the @code{CASE} blocks.

So, evaluation of the expression is best done within its own block,
which in turn is within the @code{SELECT CASE} block itself
(which contains the code for the @code{CASE} blocks as well,
though each within its own block).

Otherwise, we'd have the rough equivalent of this pseudo-code:

@smallexample
@{
  char temp[large];

  libg77_catenate (temp, 'prefix', a);

  switch (i (temp))
    @{
    case 0:
      @dots{}
    @}
@}
@end smallexample

And that would leave @samp{temp[large]} in scope during the @code{CASE} blocks
(although a clever back end @emph{could} see that it isn't referenced
in them, and thus free that temp before executing the blocks).

So this approach is used instead:

@smallexample
@{
  int temp0;

  @{
    char temp1[large];

    libg77_catenate (temp1, 'prefix', a);
    temp0 = i (temp1);
  @}

  switch (temp0)
    @{
    case 0:
      @dots{}
    @}
@}
@end smallexample

Note how @samp{temp1} goes out of scope before starting the switch,
thus making it easy for a back end to free it.

The problem @emph{that} solution has, however,
is with @samp{SELECT CASE('prefix'//A)}
(which is currently not supported).

Unless the GBEL is extended to support arbitrarily long character strings
in its @code{case} facility,
the FFE has to implement @code{SELECT CASE} on @code{CHARACTER}
(probably excepting @code{CHARACTER*1})
using a cascade of
@code{if}, @code{elseif}, @code{else}, and @code{endif} constructs
in GBEL.

To prevent the (potentially large) temporary,
needed to hold the selected expression itself (@samp{'prefix'//A}),
from being in scope during execution of the @code{CASE} blocks,
two approaches are available:

@itemize @bullet
@item
Pre-evaluate all the @code{CASE} tests,
producing an integer ordinal that is used,
a la @samp{temp0} in the earlier example,
as if @samp{SELECT CASE(temp0)} had been written.

Each corresponding @code{CASE} is replaced with @samp{CASE(@var{i})},
where @var{i} is the ordinal for that case,
determined while, or before,
generating the cascade of @code{if}-related constructs
to cope with @code{CHARACTER} selection.

@item
Make @samp{temp0} above a fixed-length @code{CHARACTER} temporary,
just large enough to hold the longest @code{CASE} string
that'll actually be compared against the expression
(in this case, @samp{'prefix'//A}).

Since that length must be constant
(because @code{CASE} expressions are all constant),
it won't be so large,
and, further, @samp{temp1} need not be dynamically allocated,
since normal @code{CHARACTER} assignment
into the fixed-length @samp{temp0} can be used.
@end itemize

Both of these solutions require @code{SELECT CASE} implementation
to be changed so all the corresponding @code{CASE} statements
are seen during the actual code generation for @code{SELECT CASE}.

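The first of the two approaches (an integer ordinal) can be sketched in C.
This is an illustration under assumptions, not what the FFE generates:
the @code{CASE} strings @samp{'prefixA'} and @samp{'prefixB'},
the function name, and the returned values are all made up,
and Fortran's blank-padded comparison is simplified to @code{strcmp}.

```c
#include <assert.h>
#include <string.h>

/* Rough equivalent of

     SELECT CASE ('prefix'//A)
     CASE ('prefixA')      ! returns 10
     CASE ('prefixB')      ! returns 20
     CASE DEFAULT          ! returns 0
     END SELECT
*/
int
select_on_string (const char *a)
{
  int ordinal;

  assert (a != 0);
  {
    /* sizeof "prefix" counts the terminating null character.  */
    char temp1[sizeof "prefix" + strlen (a)];

    strcpy (temp1, "prefix");
    strcat (temp1, a);

    /* Cascade of tests producing the ordinal of the matching CASE.  */
    if (strcmp (temp1, "prefixA") == 0)
      ordinal = 1;
    else if (strcmp (temp1, "prefixB") == 0)
      ordinal = 2;
    else
      ordinal = 0;
  }

  /* temp1 is out of scope while the CASE blocks execute.  */
  switch (ordinal)
    {
    case 1:
      return 10;
    case 2:
      return 20;
    default:
      return 0;
    }
}
```

Building the cascade requires knowing every @code{CASE} string
at the point where the selection expression is evaluated,
which is why all the @code{CASE} statements must be seen
during code generation for @code{SELECT CASE}.
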
@node Transforming Expressions
@section Transforming Expressions

The interactions between statements, expressions, and subexpressions
at program run time can be viewed as:

@smallexample
@var{action}(@var{expr})
@end smallexample

Here, @var{action} is the series of steps
performed to effect the statement,
and @var{expr} is the expression
whose value is used by @var{action}.

Expanding the above shows a typical order of events at run time:

@smallexample
Evaluate @var{expr}
Perform @var{action}, using result of evaluation of @var{expr}
Clean up after evaluating @var{expr}
@end smallexample

So, if evaluating @var{expr} requires allocating memory,
that memory can be freed before performing @var{action}
only if it is not needed to hold the result of evaluating @var{expr}.
Otherwise, it must be freed no sooner than
after @var{action} has been performed.

The above are recursive definitions,
in the sense that they apply to subexpressions of @var{expr}.

That is, evaluating @var{expr} involves
evaluating all of its subexpressions,
performing the @var{action} that computes the
result value of @var{expr},
then cleaning up after evaluating those subexpressions.

The recursive nature of this evaluation is implemented
via recursive-descent transformation of the top-level statements,
their expressions, @emph{their} subexpressions, and so on.

However, that recursive-descent transformation is,
due to the nature of the GBEL,
focused primarily on generating a @emph{single} stream of code
to be executed at run time.

Yet, from the above, it's clear that multiple streams of code
must effectively be simultaneously generated
during the recursive-descent analysis of statements.

The primary stream implements the primary @var{action} items,
while at least two other streams implement
the evaluation and clean-up items.

Requirements imposed by expressions include:

@itemize @bullet
@item
Whether the caller needs to have a temporary ready
to hold the value of the expression.

@item
Other stuff???
@end itemize

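The idea of generating several streams during one recursive descent
can be sketched in C.
All of this is hypothetical and greatly simplified:
the streams are plain strings, every operator node is assumed to need one
temporary, and every temporary is released after the action,
in reverse order of allocation
(the text above allows some to be freed earlier).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { STREAM_MAX = 256 };

/* Three hypothetical output streams, kept as strings for illustration.  */
struct streams
{
  char evaluate[STREAM_MAX];
  char action[STREAM_MAX];
  char cleanup[STREAM_MAX];
  int next_temp;
};

/* Hypothetical expression: a leaf, or an operator with two operands.  */
struct expr
{
  const struct expr *left;	/* NULL for a leaf */
  const struct expr *right;	/* NULL for a leaf */
};

void
append (char *stream, const char *text)
{
  assert (strlen (stream) + strlen (text) < STREAM_MAX);
  strcat (stream, text);
}

/* Subexpressions are transformed first.  Each new clean-up item is
   placed ahead of the earlier ones.  Returns the number of the
   temporary holding the result, or -1 for a leaf.  */
int
transform_expr (struct streams *s, const struct expr *e)
{
  char text[32];
  char rest[STREAM_MAX];
  int temp;

  if (e->left == NULL)
    return -1;
  transform_expr (s, e->left);
  transform_expr (s, e->right);
  temp = s->next_temp++;
  snprintf (text, sizeof text, "eval t%d;", temp);
  append (s->evaluate, text);
  snprintf (text, sizeof text, "free t%d;", temp);
  strcpy (rest, s->cleanup);
  s->cleanup[0] = '\0';
  append (s->cleanup, text);
  append (s->cleanup, rest);
  return temp;
}

void
transform_statement (struct streams *s, const struct expr *e)
{
  char text[32];
  int temp = transform_expr (s, e);

  if (temp < 0)
    snprintf (text, sizeof text, "act leaf;");
  else
    snprintf (text, sizeof text, "act t%d;", temp);
  append (s->action, text);
}
```

Emitting the three strings in the order evaluate, action, clean-up
gives the run-time order of events listed above.
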
@node Internal Naming Conventions
@section Internal Naming Conventions

Names exported by FFE modules have the following (regular-expression) forms.
Note that all names beginning @code{ffe@var{mod}} or @code{FFE@var{mod}},
where @var{mod} is lowercase or uppercase alphanumerics, respectively,
are exported by the module @code{ffe@var{mod}},
with the source code doing the exporting in @file{@var{mod}.h}.
(Usually, the source code for the implementation is in @file{@var{mod}.c}.)

Identifiers that don't fit the following forms
are not considered exported,
even if they are according to the C language.
(For example, they might be made available to other modules
solely for use within expansions of exported macros,
not for use within any source code in those other modules.)

@table @code
@item ffe@var{mod}
The single typedef exported by the module.

@item FFE@var{umod}_[A-Z][A-Z0-9_]*
(Where @var{umod} is the uppercase form of @var{mod}.)

A @code{#define} or @code{enum} constant of the type @code{ffe@var{mod}}.

@item ffe@var{mod}[A-Z][A-Z][a-z0-9]*
A typedef exported by the module.

The portion of the identifier after @code{ffe@var{mod}} is
referred to as @code{ctype}, a capitalized (mixed-case) form
of @code{type}.

@item FFE@var{umod}_@var{type}[A-Z][A-Z0-9_]*[A-Z0-9]?
(Where @var{umod} is the uppercase form of @var{mod}.)

A @code{#define} or @code{enum} constant of the type
@code{ffe@var{mod}@var{type}},
where @var{type} is the lowercase form of @var{ctype}
in an exported typedef.

@item ffe@var{mod}_@var{value}
A function that does or returns something,
as described by @var{value} (see below).

@item ffe@var{mod}_@var{value}_@var{input}
A function that does or returns something based
primarily on the thing described by @var{input} (see below).
@end table

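As an illustration of some of these forms,
here is a C sketch for an invented module named @code{foo}.
None of these identifiers exist in the FFE;
they are made up solely to show how the patterns combine.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical module "foo"; these names only illustrate the naming
   forms and are not part of the FFE.  */

/* ffe<mod>: the single typedef exported by the module.  */
typedef struct ffefoo_ *ffefoo;

/* FFE<umod>_[A-Z][A-Z0-9_]*: a constant of type ffefoo.  */
#define FFEFOO_NONE ((ffefoo) 0)

struct ffefoo_
{
  const char *name;
};

/* ffe<mod>_<value>, with <value> being "new": makes a new instance.
   Returns FFEFOO_NONE if memory cannot be allocated.  */
ffefoo
ffefoo_new (const char *name)
{
  ffefoo f;

  assert (name != NULL);
  f = malloc (sizeof *f);
  if (f == NULL)
    return FFEFOO_NONE;
  f->name = name;
  return f;
}

/* ffe<mod>_<value>, with <value> being "name": returns text that
   points to the name of the instance.  */
const char *
ffefoo_name (ffefoo f)
{
  assert (f != FFEFOO_NONE);
  return f->name;
}

/* ffe<mod>_<value>_<input>: returns a "len" based on a "name".  */
int
ffefoo_len_name (const char *name)
{
  assert (name != NULL);
  return (int) strlen (name);
}
```
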
Below are names used for @var{value} and @var{input},
along with their definitions.

@table @code
@item col
A column number within a line (first column is number 1).

@item file
An encapsulation of a file's name.

@item find
Looks up an instance of some type that matches specified criteria,
and returns that, even if it has to create a new instance or
crash trying to find it (as appropriate).

@item initialize
Initializes, usually a module.  No type.

@item int
A generic integer of type @code{int}.

@item is
A generic integer that contains a true (non-zero) or false (zero) value.

@item len
A generic integer that contains the length of something.

@item line
A line number within a source file,
or a global line number.

@item lookup
Looks up an instance of some type that matches specified criteria,
and returns that, or returns nil.

@item name
A @code{text} that points to a name of something.

@item new
Makes a new instance of the indicated type.
Might return an existing one if appropriate---if so,
similar to @code{find} without crashing.

@item pt
Pointer to a particular character (a line, column pair)
in the input file (source code being compiled).

@item run
Performs some herculean task.  No type.

@item terminate
Terminates, usually a module.  No type.

@item text
A @code{char *} that points to generic text.
@end table