This is Info file f/g77.info, produced by Makeinfo version 1.68 from the input file ../../../src/gcc-2.95.3/gcc/f/g77.texi.

INFO-DIR-SECTION Programming
START-INFO-DIR-ENTRY
* g77: (g77).               The GNU Fortran compiler.
END-INFO-DIR-ENTRY

This file documents the use and the internals of the GNU Fortran (`g77') compiler. It corresponds to the GCC-2.95 version of `g77'.

Published by the Free Software Foundation 59 Temple Place - Suite 330 Boston, MA 02111-1307 USA

Copyright (C) 1995-1999 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the sections entitled "GNU General Public License," "Funding for Free Software," and "Protect Your Freedom--Fight `Look And Feel'" are included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that the sections entitled "GNU General Public License," "Funding for Free Software," and "Protect Your Freedom--Fight `Look And Feel'", and this permission notice, may be included in translations approved by the Free Software Foundation instead of in the original English.

Contributed by James Craig Burley. Inspired by a first pass at translating `g77-0.5.16/f/DOC' that was contributed to Craig by David Ronis.

File: g77.info, Node: Adding Options, Next: Projects, Prev: Service, Up: Top

Adding Options
**************

To add a new command-line option to `g77', first decide what kind of option you wish to add.
Search the `g77' and `gcc' documentation for one or more options that are most closely like the one you want to add (in terms of what kind of effect it has, and so on) to help clarify its nature.

   * *Fortran options* are options that apply only when compiling Fortran programs. They are accepted by both `g77' and `gcc'.

   * *Compiler options* are options that apply when compiling most any kind of program.

*Fortran options* are listed in the file `egcs/gcc/f/lang-options.h', which is used during the build of `gcc' to build a list of all options that are accepted by at least one language's compiler. This list goes into the `lang_options' array in `gcc/toplev.c', which uses the array to determine whether a particular option should be offered to the linked-in front end for processing. It offers the option by calling `lang_decode_option', which, for `g77', is in `egcs/gcc/f/com.c' and just calls `ffe_decode_option'.

If the linked-in front end "rejects" a particular option passed to it, `toplev.c' just ignores the option, because *some* language's compiler is willing to accept it.

This allows commands like `gcc -fno-asm foo.c bar.f' to work, even though Fortran compilation does not currently support the `-fno-asm' option; even though the `f771' version of `lang_decode_option' rejects `-fno-asm', `toplev.c' doesn't produce a diagnostic because some other language (C) does accept it.

This also means that commands like `g77 -fno-asm foo.f' yield no diagnostics, despite the fact that no phase of the command was able to recognize and process `-fno-asm'--perhaps a warning about this would be helpful if it were possible.

Code that processes Fortran options is found in `egcs/gcc/f/top.c', function `ffe_decode_option'. This code needs to check positive and negative forms of each option.

The defaults for Fortran options are set in their global definitions, also found in `egcs/gcc/f/top.c'.
Many of these defaults are actually macros defined in `egcs/gcc/f/target.h', since they might be machine-specific. However, since, in practice, GNU compilers should behave the same way on all configurations (especially when it comes to language constructs), the practice of setting defaults in `target.h' is likely to be deprecated and, ultimately, stopped in future versions of `g77'.

Accessor macros for Fortran options, used by code in the `g77' FFE, are defined in `egcs/gcc/f/top.h'.

*Compiler options* are listed in `gcc/toplev.c' in the array `f_options'. An option not listed in `lang_options' is looked up in `f_options' and handled from there.

The defaults for compiler options are set in the global definitions for the corresponding variables, some of which are in `gcc/toplev.c'.

You can set different defaults for *Fortran-oriented* or *Fortran-reticent* compiler options by changing the source code of `g77' and rebuilding. How to do this depends on the version of `g77':

`G77 0.5.24 (EGCS 1.1)'
`G77 0.5.25 (EGCS 1.2)'
     Change the `lang_init_options' routine in `egcs/gcc/f/com.c'.

     (Note that these versions of `g77' perform internal consistency checking automatically when the `-fversion' option is specified.)

`G77 0.5.23'
`G77 0.5.24 (EGCS 1.0)'
     Change the way `f771' handles the `-fset-g77-defaults' option, which is always provided as the first option when called by `g77' or `gcc'.

     This code is in `ffe_decode_option' in `egcs/gcc/f/top.c'. Have it change just the variables that you want to default to a different setting for Fortran compiles compared to compiles of other languages.

     The `-fset-g77-defaults' option is passed to `f771' automatically because of the specification information kept in `egcs/gcc/f/lang-specs.h'. This file tells the `gcc' command how to recognize, in this case, Fortran source files (those to be preprocessed, and those that are not), and further, how to invoke the appropriate programs (including `f771') to process those source files.
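The requirement stated earlier in this node, that the option-processing code check both the positive and the negative form of each option, can be pictured with a small table-driven decoder. The following is only a sketch under stated assumptions: the option names (`-fdemo-one`, `-fdemo-two`), the variables, and the function `sketch_decode_option` are hypothetical and are not taken from the `g77` sources. It returns false for anything it does not recognize, which corresponds to the front end "rejecting" an option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option variables with their defaults.  None of these
   names exist in g77; they are for illustration only.  */
static bool sketch_flag_demo_one = true;    /* default: on  */
static bool sketch_flag_demo_two = false;   /* default: off */

struct sketch_option
{
  const char *name;     /* text following "-f" in the positive form */
  bool *variable;       /* variable the option sets or clears */
};

static const struct sketch_option sketch_options[] = {
  { "demo-one", &sketch_flag_demo_one },
  { "demo-two", &sketch_flag_demo_two },
};

/* Decode one option.  Return true if it was recognized and processed,
   false if it is "rejected" (left for some other language or the
   generic compiler options to handle).  */
static bool
sketch_decode_option (const char *opt)
{
  assert (opt != NULL);

  if (strncmp (opt, "-f", 2) != 0)
    return false;
  opt += 2;

  /* A leading "no-" selects the negative form of the option.  */
  bool value = true;
  if (strncmp (opt, "no-", 3) == 0)
    {
      value = false;
      opt += 3;
    }

  size_t count = sizeof sketch_options / sizeof sketch_options[0];
  for (size_t i = 0; i < count; i++)
    if (strcmp (opt, sketch_options[i].name) == 0)
      {
        *sketch_options[i].variable = value;
        return true;
      }

  return false;
}
```

With this shape, `-fdemo-two` and `-fno-demo-one` are both accepted through the same table entry, while `-fno-asm` is rejected without a diagnostic, mirroring the behavior described above for options that only some other language understands.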
     It is in `egcs/gcc/f/lang-specs.h' that `-fset-g77-defaults', `-fversion', and other options are passed, as appropriate, even when the user has not explicitly specified them. Other "internal" options such as `-quiet' also are passed via this mechanism.

File: g77.info, Node: Projects, Next: Front End, Prev: Adding Options, Up: Top

Projects
********

If you want to contribute to `g77' by doing research, design, specification, documentation, coding, or testing, the following information should give you some ideas. More relevant information might be available from `ftp://alpha.gnu.org/gnu/g77/projects/'.

* Menu:

* Efficiency::               Make `g77' itself compile code faster.
* Better Optimization::      Teach `g77' to generate faster code.
* Simplify Porting::         Make `g77' easier to configure, build, and install.
* More Extensions::          Features many users won't know to ask for.
* Machine Model::            `g77' should better leverage `gcc'.
* Internals Documentation::  Make maintenance easier.
* Internals Improvements::   Make internals more robust.
* Better Diagnostics::       Make using `g77' on new code easier.

File: g77.info, Node: Efficiency, Next: Better Optimization, Up: Projects

Improve Efficiency
==================

Don't bother doing any performance analysis until most of the following items are taken care of, because there's no question they represent serious space/time problems, although some of them show up only given certain kinds of (popular) input.

   * Improve `malloc' package and its uses to specify more info about memory pools and, where feasible, use obstacks to implement them.

   * Skip over uninitialized portions of aggregate areas (arrays, `COMMON' areas, `EQUIVALENCE' areas) so zeros need not be output. This would reduce memory usage for large initialized aggregate areas, even ones with only one initialized element.

     As of version 0.5.18, a portion of this item has already been accomplished.
   * Prescan the statement (in `sta.c') so that the nature of the statement is determined as much as possible by looking entirely at its form, and not looking at any context (previous statements, including types of symbols). This would allow ripping out of the statement-confirmation, symbol retraction/confirmation, and diagnostic inhibition mechanisms. Plus, it would result in much-improved diagnostics. For example, `CALL some-intrinsic(...)', where the intrinsic is not a subroutine intrinsic, would result in an actual error instead of the unimplemented-statement catch-all.

   * Throughout `g77', don't pass line/column pairs where a simple `ffewhere' type, which points to the error as much as is desired by the configuration, will do, and don't pass `ffelexToken' types where a simple `ffewhere' type will do. Then, allow new default configuration of `ffewhere' such that the source line text is not preserved, and leave it to things like Emacs' next-error function to point to them (now that `next-error' supports column, or, perhaps, character-offset, numbers). The change in calling sequences should improve performance somewhat, as should not having to save source lines. (Whether this whole item will improve performance is questionable, but it should improve maintainability.)

   * Handle `DATA (A(I),I=1,1000000)/1000000*2/' more efficiently, especially as regards the assembly output. Some of this might require improving the back end, but lots of improvement in space/time required in `g77' itself can be fairly easily obtained without touching the back end. Maybe type-conversion, where necessary, can be speeded up as well in cases like the one shown (converting the `2' into `2.').

   * If analysis shows it to be worthwhile, optimize `lex.c'.

   * Consider redesigning `lex.c' to not need any feedback during tokenization, by keeping track of enough parse state on its own.

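The `DATA (A(I),I=1,1000000)/1000000*2/' item above mentions that type conversion could be speeded up. One way to picture that is to keep each `count*value' group as a single run and convert the constant once per run instead of once per element. The sketch below illustrates only that idea and makes no claim about how `g77' represents `DATA' values internally; `struct sketch_run' and `sketch_fill_real' are hypothetical names, and `double' stands in for whatever the target `REAL' type is.

```c
#include <assert.h>
#include <stddef.h>

/* One "count*value" group from a DATA value list, e.g. 1000000*2.
   Hypothetical representation, for illustration only.  */
struct sketch_run
{
  size_t count;         /* repeat count, e.g. 1000000 */
  long value;           /* integer constant as written, e.g. 2 */
};

/* Initialize the N elements of TARGET from NRUNS runs.  Each run's
   constant is converted to the target type once, then stored COUNT
   times.  Return the number of conversions performed, or -1 if the
   runs do not supply exactly N values.  */
static long
sketch_fill_real (double *target, size_t n,
                  const struct sketch_run *runs, size_t nruns)
{
  assert (target != NULL || n == 0);

  size_t pos = 0;
  long conversions = 0;

  for (size_t r = 0; r < nruns; r++)
    {
      if (runs[r].count > n - pos)
        return -1;              /* more values than elements */

      /* Convert the `2' into `2.' a single time for the whole run.  */
      double converted = (double) runs[r].value;
      conversions++;

      for (size_t k = 0; k < runs[r].count; k++)
        target[pos++] = converted;
    }

  return pos == n ? conversions : -1;
}
```

For the statement shown in the list, a single run of one million elements would need one conversion rather than one million. The same run structure could in principle be carried further toward the output stage, but that part depends on the back end and is not sketched here.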
File: g77.info, Node: Better Optimization, Next: Simplify Porting, Prev: Efficiency, Up: Projects

Better Optimization
===================

Much of this work should be put off until after `g77' has all the features necessary for its widespread acceptance as a useful F77 compiler. However, perhaps this work can be done in parallel during the feature-adding work.

   * Do the equivalent of the trick of putting `extern inline' in front of every function definition in `libg2c' and #include'ing the resulting file in `f2c'+`gcc'--that is, inline all run-time-library functions that are at all worth inlining. (Some of this has already been done, such as for integral exponentiation.)

   * When doing `CHAR_VAR = CHAR_FUNC(...)', and it's clear that types line up and `CHAR_VAR' is addressable or not a `VAR_DECL', make `CHAR_VAR', not a temporary, be the receiver for `CHAR_FUNC'. (This is now done for `COMPLEX' variables.)

   * Design and implement Fortran-specific optimizations that don't really belong in the back end, or where the front end needs to give the back end more info than it currently does.

   * Design and implement a new run-time library interface, with the code going into `libgcc' so no special linking is required to link Fortran programs using standard language features. This library would speed up lots of things, from I/O (using precompiled formats, doing just one, or, at most, very few, calls for arrays or array sections, and so on) to general computing (array/section implementations of various intrinsics, implementation of commonly performed loops that aren't likely to be optimally compiled otherwise, etc.).
     Among the important things the library would do are:

        * Be a one-stop-shop-type library, hence shareable and usable by all, in that what are now library-build-time options in `libg2c' would be moved at least to the `g77' compile phase, if not to finer grains (such as choosing how list-directed I/O formatting is done by default at `OPEN' time, for preconnected units via options or even statements in the main program unit, maybe even on a per-I/O basis with appropriate pragma-like devices).

   * Probably requiring the new library design, change interface to normally have `COMPLEX' functions return their values in the way `gcc' would if they were declared `__complex__ float', rather than using the mechanism currently used by `CHARACTER' functions (whereby the functions are compiled as returning void and their first arg is a pointer to where to store the result). (Don't append underscores to external names for `COMPLEX' functions in some cases once `g77' uses `gcc' rather than `f2c' calling conventions.)

   * Do something useful with `doiter' references where possible. For example, `CALL FOO(I)' cannot modify `I' if within a `DO' loop that uses `I' as the iteration variable, and the back end might find that info useful in determining whether it needs to read `I' back into a register after the call. (It normally has to do that, unless it knows `FOO' never modifies its passed-by-reference argument, which is rarely the case for Fortran-77 code.)

File: g77.info, Node: Simplify Porting, Next: More Extensions, Prev: Better Optimization, Up: Projects

Simplify Porting
================

Making `g77' easier to configure, port, build, and install, either as a single-system compiler or as a cross-compiler, would be very useful.

   * A new library (replacing `libg2c') should improve portability as well as produce more optimal code.
     Further, `g77' and the new library should conspire to simplify naming of externals, such as by removing unnecessarily added underscores, and to reduce/eliminate the possibility of naming conflicts, while making debugging more straightforward.

     Also, it should make multi-language applications more feasible, such as by providing Fortran intrinsics that get Fortran unit numbers given C `FILE *' descriptors.

   * Possibly related to a new library, `g77' should produce the equivalent of a `gcc' `main(argc, argv)' function when it compiles a main program unit, instead of compiling something that must be called by a library implementation of `main()'.

     This would do many useful things such as provide more flexibility in terms of setting up exception handling, not requiring programmers to start their debugging sessions with `breakpoint MAIN__' followed by `run', and so on.

   * The GBE needs to understand the difference between alignment requirements and desires. For example, on Intel x86 machines, `g77' currently imposes overly strict alignment requirements, due to the back end, but it would be useful for Fortran and C programmers to be able to override these *recommendations* as long as they don't violate the actual processor *requirements*.

File: g77.info, Node: More Extensions, Next: Machine Model, Prev: Simplify Porting, Up: Projects

More Extensions
===============

These extensions are not the sort of things users ask for "by name", but they might improve the usability of `g77', and Fortran in general, in the long run. Some of these items really pertain to improving `g77' internals so that some popular extensions can be more easily supported.

   * Look through all the documentation on the GNU Fortran language, dialects, compiler, missing features, bugs, and so on. Many mentions of incomplete or missing features are sprinkled throughout. It is not worth repeating them here.

   * Consider adding a `NUMERIC' type to designate typeless numeric constants, named and unnamed.
     The idea is to provide a forward-looking, effective replacement for things like the old-style `PARAMETER' statement when people really need typelessness in a maintainable, portable, clearly documented way. Maybe `TYPELESS' would include `CHARACTER', `POINTER', and whatever else might come along. (This is not really a call for polymorphism per se, just an ability to express limited, syntactic polymorphism.)

   * Support `OPEN(...,KEY=(...),...)'.

   * Support arbitrary file unit numbers, instead of limiting them to 0 through `MXUNIT-1'. (This is a `libg2c' issue.)

   * `OPEN(NOSPANBLOCKS,...)' is treated as `OPEN(UNIT=NOSPANBLOCKS,...)', so a later `UNIT=' in the first example is invalid. Make sure this is what users of this feature would expect.

   * Currently `g77' disallows `READ(1'10)' since it is an obnoxious syntax, but supporting it might be pretty easy if needed. More details are needed, such as whether general expressions separated by an apostrophe are supported, or maybe the record number can be a general expression, and so on.

   * Support `STRUCTURE', `UNION', `MAP', and `RECORD' fully. Currently there is no support at all for `%FILL' in `STRUCTURE' and related syntax, whereas the rest of the stuff has at least some parsing support. This requires either major changes to `libg2c' or its replacement.

   * F90 and `g77' probably disagree about label scoping relative to `INTERFACE' and `END INTERFACE', and their contained procedure interface bodies (blocks?).

   * `ENTRY' doesn't support F90 `RESULT()' yet, since that was added after S8.112.

   * Empty-statement handling (10 ;;CONTINUE;;) probably isn't consistent with the final form of the standard (it was vague at S8.112).
   * It seems to be an "open" question whether a file, immediately after being `OPEN'ed, is positioned at the beginning, the end, or wherever--it might be nice to offer an option of opening to "undefined" status, requiring an explicit absolute-positioning operation to be performed before any other (besides `CLOSE') to assist in making applications port to systems (some IBM?) that `OPEN' to the end of a file or some such thing.

File: g77.info, Node: Machine Model, Next: Internals Documentation, Prev: More Extensions, Up: Projects

Machine Model
=============

These items pertain to generalizing `g77''s view of the machine model to more fully accept whatever the GBE provides it via its configuration.

   * Switch to using `REAL_VALUE_TYPE' to represent floating-point constants exclusively so the target float format need not be required. This means changing the way `g77' handles initialization of aggregate areas having more than one type, such as `REAL' and `INTEGER', because currently it initializes them as if they were arrays of `char' and uses the bit patterns of the constants of the various types in them to determine what to stuff in elements of the arrays.

   * Rely more and more on back-end info and capabilities, especially in the area of constants (where having the `g77' front-end's IL just store the appropriate tree nodes containing constants might be best).

   * Suite of C and Fortran programs that a user/administrator can run on a machine to help determine the configuration for `g77' before building and help determine if the compiler works (especially with whatever libraries are installed) after building.

File: g77.info, Node: Internals Documentation, Next: Internals Improvements, Prev: Machine Model, Up: Projects

Internals Documentation
=======================

Better info on how `g77' works and how to port it is needed. Much of this should be done only after the redesign planned for 0.6 is complete.

*Note Front End::, which contains some information on `g77' internals.

File: g77.info, Node: Internals Improvements, Next: Better Diagnostics, Prev: Internals Documentation, Up: Projects

Internals Improvements
======================

Some more items that would make `g77' more reliable and easier to maintain:

   * Generally make expression handling focus more on critical syntax stuff, leaving semantics to callers. For example, anything a caller can check, semantically, let it do so, rather than having `expr.c' do it. (Exceptions might include things like diagnosing `FOO(I--K:)=BAR' where `FOO' is a `PARAMETER'--if it seems important to preserve the left-to-right-in-source order of production of diagnostics.)

   * Come up with better naming conventions for `-D' to establish requirements to achieve desired implementation dialect via `proj.h'.

   * Clean up used tokens and `ffewhere's in `ffeglobal_terminate_1'.

   * Replace `sta.c' `outpooldisp' mechanism with `malloc_pool_use'.

   * Check for `opANY' in more places in `com.c', `std.c', and `ste.c', and get rid of the `opCONVERT(opANY)' kludge (after determining if there is indeed no real need for it).

   * Utility to read and check `bad.def' messages and their references in the code, to make sure calls are consistent with message templates.

   * Search and fix `&ffe...' and similar so that `ffe...ptr...' macros are available instead (a good argument for wishing all this stuff could have been written in C++, perhaps). On the other hand, it's questionable whether this sort of improvement is really necessary, given the availability of tools such as Emacs and Perl, which make finding any address-taking of structure members easy enough.

   * Some modules truly export the member names of their structures (and the structures themselves), maybe fix this, and fix other modules that just appear to as well (by appending `_', though it'd be ugly and probably not worth the time).
   * Implement C macros `RETURNS(value)' and `SETS(something,value)' in `proj.h' and use them throughout `g77' source code (especially in the definitions of access macros in `.h' files) so they can be tailored to catch code writing into a `RETURNS()' or reading from a `SETS()'.

   * Decorate throughout with `const' and other such stuff.

   * All F90 notational derivations in the source code are still based on the S8.112 version of the draft standard. Probably should update to the official standard, or put documentation of the rules as used in the code...uh...in the code.

   * Some `ffebld_new' calls (those outside of `ffeexpr.c' or inside but invoked via paths not involving `ffeexpr_lhs' or `ffeexpr_rhs') might be creating things in improper pools, leading to such things staying around too long or (doubtful, but possible and dangerous) not long enough.

   * Some `ffebld_list_new' (or whatever) calls might not be matched by `ffebld_list_bottom' (or whatever) calls, which might someday matter. (It definitely is not a problem just yet.)

   * Probably not doing clean things when we fail to `EQUIVALENCE' something due to alignment/mismatch or other problems--they end up without `ffestorag' objects, so maybe the backend (and other parts of the front end) can notice that and handle like an `opANY' (do what it wants, just don't complain or crash). Most of this seems to have been addressed by now, but a code review wouldn't hurt.

File: g77.info, Node: Better Diagnostics, Prev: Internals Improvements, Up: Projects

Better Diagnostics
==================

These are things users might not ask about, or that need to be looked into before they are worth worrying about. Also here are items that involve reducing unnecessary diagnostic clutter.

   * When `FUNCTION' and `ENTRY' point types disagree (`CHARACTER' lengths, type classes, and so on), `ANY'-ize the offending `ENTRY' point and any *new* dummies it specifies.

   * Speed up and improve error handling for data when repeat-count is specified.
     For example, don't output 20 unnecessary messages after the first necessary one for:

          INTEGER X(20)
          CONTINUE
          DATA (X(I), J= 1, 20) /20*5/
          END

     (The `CONTINUE' statement ensures the `DATA' statement is processed in the context of executable, not specification, statements.)

File: g77.info, Node: Front End, Next: Diagnostics, Prev: Projects, Up: Top

Front End
*********

This chapter describes some aspects of the design and implementation of the `g77' front end. Much of the information below applies not to current releases of `g77', but to the 0.6 rewrite being designed and implemented as of late May, 1999.

To find out about things that are "To Be Determined" or "To Be Done", search for the string TBD. If you want to help by working on one or more of these items, email me. If you're planning to do more than just research issues and offer comments, see `http://www.gnu.org/software/contribute.html' for steps you might need to take first.

* Menu:

* Overview of Sources::
* Overview of Translation Process::
* Philosophy of Code Generation::
* Two-pass Design::
* Challenges Posed::
* Transforming Statements::
* Transforming Expressions::
* Internal Naming Conventions::

File: g77.info, Node: Overview of Sources, Next: Overview of Translation Process, Up: Front End

Overview of Sources
===================

The current directory layout includes the following, where SRCDIR stands for the top-level source directory:

`SRCDIR/gcc/'
     Non-g77 files in gcc

`SRCDIR/gcc/f/'
     GNU Fortran front end sources

`SRCDIR/libf2c/'
     `libg2c' configuration and `g2c.h' file generation

`SRCDIR/libf2c/libF77/'
     General support and math portion of `libg2c'

`SRCDIR/libf2c/libI77/'
     I/O portion of `libg2c'

`SRCDIR/libf2c/libU77/'
     Additional interfaces to Unix `libc' for `libg2c'

Components of note in `g77' are described below.

`f/' as a whole contains the source for `g77', while `libf2c/' contains a portion of the separate program `f2c'.
Note that the `libf2c' code is not part of the program `g77', just distributed with it.

`f/' contains text files that document the Fortran compiler, source files for the GNU Fortran Front End (FFE), and some other stuff. The `g77' compiler code is placed in `f/' because it, along with its contents, is designed to be a subdirectory of a `gcc' source directory, `gcc/', which is structured so that language-specific front ends can be "dropped in" as subdirectories. The C++ front end (`g++') is an example of this--it resides in the `cp/' subdirectory. Note that the C front end (also referred to as `gcc') is an exception to this, as its source files reside in the `gcc/' directory itself.

`libf2c/' contains the run-time libraries for the `f2c' program, also used by `g77'. These libraries are normally referred to collectively as `libf2c'. When built as part of `g77', `libf2c' is installed under the name `libg2c' to avoid conflict with any existing version of `libf2c', and thus is often referred to as `libg2c' when the `g77' version is specifically being referred to.

The `netlib' version of `libf2c/' contains two distinct libraries, `libF77' and `libI77', each in their own subdirectories. In `g77', this distinction is not made, beyond maintaining the subdirectory structure in the source-code tree.

`libf2c/' is not part of the program `g77', just distributed with it. It contains files not present in the official (`netlib') version of `libf2c', and also contains some minor changes made from `libf2c', to fix some bugs, and to facilitate automatic configuration, building, and installation of `libf2c' (as `libg2c') for use by `g77' users. See `libf2c/README' for more information, including licensing conditions governing distribution of programs containing code from `libg2c'.

`libg2c', `g77''s version of `libf2c', adds Dave Love's implementation of `libU77', in the `libf2c/libU77/' directory.
This library is distributed under the GNU Library General Public License (LGPL)--see the file `libf2c/libU77/COPYING.LIB' for more information, as this license governs distribution conditions for programs containing code from this portion of the library.

Files of note in `f/' and `libf2c/' are described below:

`f/BUGS'
     Lists some important bugs known to be in g77. Or use Info (or GNU Emacs Info mode) to read the "Actual Bugs" node of the `g77' documentation:

          info -f f/g77.info -n "Actual Bugs"

`f/ChangeLog'
     Lists recent changes to `g77' internals.

`libf2c/ChangeLog'
     Lists recent changes to `libg2c' internals.

`f/NEWS'
     Contains the per-release changes. These include the user-visible changes described in the node "Changes" in the `g77' documentation, plus internal changes of import. Or use:

          info -f f/g77.info -n News

`f/g77.info*'
     The `g77' documentation, in Info format, produced by building `g77'.

     All users of `g77' (not just installers) should read this, using the `more' command if neither the `info' command nor GNU Emacs (with its Info mode) is available, or if users aren't yet accustomed to using these tools. All of these files are readable as "plain text" files, though they're easier to navigate using Info readers such as `info' and GNU Emacs Info mode.

If you want to explore the FFE code, which lives entirely in `f/', here are a few clues. The file `g77spec.c' contains the `g77'-specific source code for the `g77' command only--this just forms a variant of the `gcc' command, so, just as the `gcc' command itself does not contain the C front end, the `g77' command does not contain the Fortran front end (FFE). The FFE code ends up in an executable named `f771', which does the actual compiling, so it contains the FFE plus the `gcc' back end (GBE), the latter to do most of the optimization, and the code generation.

The file `parse.c' is the source file for `yyparse()', which is invoked by the GBE to start the compilation process, for `f771'.
The file `top.c' contains the top-level FFE function `ffe_file' and it (along with `top.h') defines all `ffe_[a-z].*', `ffe[A-Z].*', and `FFE_[A-Za-z].*' symbols.

The file `fini.c' is a `main()' program that is used when building the FFE to generate C header and source files for recognizing keywords. The files `malloc.c' and `malloc.h' comprise a memory manager that defines all `malloc_[a-z].*', `malloc[A-Z].*', and `MALLOC_[A-Za-z].*' symbols.

All other modules named XYZ consist of all files named `XYZ*.EXT' and define all `ffeXYZ_[a-z].*', `ffeXYZ[A-Z].*', and `FFEXYZ_[A-Za-z].*' symbols. If you understand all this, congratulations--it's easier for me to remember how it works than to type in these regular expressions. But it does make it easy to find where a symbol is defined. For example, the symbol `ffexyz_set_something' would be defined in `xyz.h' and implemented there (if it's a macro) or in `xyz.c'.

The "porting" files of note currently are:

`proj.c'
`proj.h'
     These define the "language" used by all the other source files, the language being Standard C plus some useful things like `ARRAY_SIZE' and such.

`target.c'
`target.h'
     These describe the target machine in terms of what data types are supported, how they are denoted (to what C type does an `INTEGER*8' map, for example), how to convert between them, and so on. Over time, versions of `g77' rely less on these files and more on run-time configuration based on GBE info in `com.c'.

`com.c'
`com.h'
     These are the primary interface to the GBE.

`ste.c'
`ste.h'
     These contain code for implementing recognized executable statements in the GBE.

`src.c'
`src.h'
     These contain information on the format(s) of source files (such as whether they are never to be processed as case-insensitive with regard to Fortran keywords).
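The symbol-naming rules given earlier in this node (the `ffe_', `ffeXYZ_', `FFEXYZ_', and `malloc_' prefixes) can be applied mechanically to go from a symbol to the module that should define it. The following is a rough sketch of that lookup, not a tool that ships with `g77'; `sketch_store' and `sketch_module_of' are hypothetical names, and the function only inspects the prefix and the character that ends the module name, so it is looser than the regular expressions quoted above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy the first LEN characters of NAME into BUF in lower case.
   Return BUF, or NULL if BUF is too small.  */
static const char *
sketch_store (char *buf, size_t buflen, const char *name, size_t len)
{
  if (len + 1 > buflen)
    return NULL;
  for (size_t i = 0; i < len; i++)
    buf[i] = (char) tolower ((unsigned char) name[i]);
  buf[len] = '\0';
  return buf;
}

/* Map SYMBOL to its module name (so "xyz" means xyz.h and xyz.c).
   Return NULL if SYMBOL does not follow the naming conventions.  */
static const char *
sketch_module_of (const char *symbol, char *buf, size_t buflen)
{
  assert (symbol != NULL && buf != NULL);

  /* The memory manager uses its own prefix.  */
  if ((strncmp (symbol, "malloc", 6) == 0
       && (symbol[6] == '_' || isupper ((unsigned char) symbol[6])))
      || strncmp (symbol, "MALLOC_", 7) == 0)
    return sketch_store (buf, buflen, "malloc", 6);

  /* Everything else starts with "ffe" or, for macros, "FFE".  */
  bool macro;
  if (strncmp (symbol, "ffe", 3) == 0)
    macro = false;
  else if (strncmp (symbol, "FFE", 3) == 0)
    macro = true;
  else
    return NULL;

  /* The module name is the run of letters, in the same case as the
     prefix, that follows the prefix.  */
  const char *name = symbol + 3;
  size_t len = 0;
  while (macro ? isupper ((unsigned char) name[len])
               : islower ((unsigned char) name[len]))
    len++;

  /* It must be ended by `_' or, for lower-case symbols, by an
     upper-case letter (as in a type name).  */
  bool ended = name[len] == '_'
               || (!macro && isupper ((unsigned char) name[len]));
  if (!ended)
    return NULL;

  /* An empty module name means the top-level module, top.c/top.h.  */
  if (len == 0)
    return sketch_store (buf, buflen, "top", 3);
  return sketch_store (buf, buflen, name, len);
}
```

Under these assumptions, `ffexyz_set_something' maps to `xyz', `ffe_file' maps to `top', and `malloc_pool_use' maps to `malloc', which matches the examples in the text. Symbols that do not end the module name with `_' or an upper-case letter are deliberately reported as not matching.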
If you want to debug the `f771' executable, for example if it crashes, note that the global variables `lineno' and `input_filename' are usually set to reflect the current line being read by the lexer during the first-pass analysis of a program unit and to reflect the current line being processed during the second-pass compilation of a program unit.

If an invocation of the function `ffestd_exec_end' is on the stack, the compiler is in the second pass, otherwise it is in the first.

(This information might help you reduce a test case and/or work around a bug in `g77' until a fix is available.)

File: g77.info, Node: Overview of Translation Process, Next: Philosophy of Code Generation, Prev: Overview of Sources, Up: Front End

Overview of Translation Process
===============================

The order of phases translating source code to the form accepted by the GBE is:

  1. Stripping punched-card sources (`g77stripcard.c')

  2. Lexing (`lex.c')

  3. Stand-alone statement identification (`sta.c')

  4. Parsing (`stb.c' and `expr.c')

  5. Constructing (`stc.c')

  6. Collecting (`std.c')

  7. Expanding (`ste.c')

To get a rough idea of how a particularly twisted Fortran statement gets treated by the passes, consider:

           FORMAT(I2 4H)=(J/
          &   I3)

The job of `lex.c' is to know enough about Fortran syntax rules to break the statement up into distinct lexemes without requiring any feedback from subsequent phases:

     `FORMAT' `(' `I24H' `)' `=' `(' `J' `/' `I3' `)'

The job of `sta.c' is to figure out the kind of statement, or, at least, statement form, that sequence of lexemes represents. The sooner it can do this (in terms of using the smallest number of lexemes, starting with the first for each statement), the better, because that leaves diagnostics for problems beyond the recognition of the statement form to subsequent phases, which can usually better describe the nature of the problem.
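One form test of this kind is a scan that tracks parenthesis nesting and looks for an `=' that lies outside all parentheses. The sketch below shows only that single test over an array of lexeme strings; it is not the `sta.c' algorithm, and `sketch_has_level_zero_equals' is a hypothetical name. A real classifier has to consider more than this, since, for example, a `DO' statement also has an `=' outside parentheses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the sequence of N lexemes contains an "=" that is
   not nested within any parentheses.  */
static bool
sketch_has_level_zero_equals (const char *const *lexemes, size_t n)
{
  assert (lexemes != NULL || n == 0);

  int depth = 0;                /* current parenthesis nesting level */

  for (size_t i = 0; i < n; i++)
    {
      if (strcmp (lexemes[i], "(") == 0)
        depth++;
      else if (strcmp (lexemes[i], ")") == 0)
        {
          if (depth > 0)
            depth--;            /* ignore unbalanced ")" here */
        }
      else if (depth == 0 && strcmp (lexemes[i], "=") == 0)
        return true;
    }

  return false;
}
```

Applied to the ten lexemes listed above, the scan returns to nesting level zero after the first `)' and then meets the `=', so the test succeeds. For a sequence such as `WRITE' `(' `UNIT' `=' `6' `)' `X', the only `=' is inside parentheses and the test fails.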
In this case, the `=' at "level zero" (not nested within parentheses) tells `sta.c' that this is an *assignment-form*, not `FORMAT', statement. An assignment-form statement might be a statement-function definition or an executable assignment statement. To make that determination, `sta.c' looks at the first two lexemes. Since the second lexeme is `(', the first must represent an array for this to be an assignment statement, else it's a statement function. Either way, `sta.c' hands off the statement to `stb.c' (either its statement-function parser or its assignment-statement parser). `stb.c' forms a statement-specific record containing the pertinent information. That information includes a source expression and, for an assignment statement, a destination expression. Expressions are parsed by `expr.c'. This record is passed to `stc.c', which copes with the implications of the statement within the context established by previous statements. For example, if it's the first statement in the file or after an `END' statement, `stc.c' recognizes that, first of all, a main program unit is now being lexed (and tells that to `std.c' before telling it about the current statement). `stc.c' attaches whatever information it can, usually derived from the context established by the preceding statements, and passes the information to `std.c'. `std.c' saves this information away, since the GBE cannot cope with information that might be incomplete at this stage. For example, `I3' might later be determined to be an argument to an alternate `ENTRY' point. When `std.c' is told about the end of an external (top-level) program unit, it passes all the information it has saved away on statements in that program unit to `ste.c'. `ste.c' "expands" each statement, in sequence, by constructing the appropriate GBE information and calling the appropriate GBE routines. Details on the transformational phases follow. 
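The statement-form decision described above for `sta.c' can be sketched roughly as follows. This is an illustration in Python, not the code `g77' uses: the names `has_level_zero_equals', `classify', and `known_arrays' are hypothetical, the treatment of a statement whose second lexeme is not `(' as a plain assignment is an assumption, and the sketch ignores other statements that contain a level-zero `=' (such as `DO').

```python
def has_level_zero_equals(lexemes):
    """Return True if an `=` appears outside all parentheses."""
    depth = 0
    for lexeme in lexemes:
        if lexeme == "(":
            depth += 1
        elif lexeme == ")":
            depth -= 1
        elif lexeme == "=" and depth == 0:
            return True
    return False


def classify(lexemes, known_arrays=frozenset()):
    """Classify a lexeme sequence by statement form (simplified)."""
    if not has_level_zero_equals(lexemes):
        return "other"
    # Assignment-form: inspect the first two lexemes. If the second
    # is "(", the first must name an array for this to be an
    # assignment statement; otherwise it is a statement function.
    if (len(lexemes) > 1 and lexemes[1] == "("
            and lexemes[0] not in known_arrays):
        return "statement-function definition"
    return "assignment statement"


# The lexemes produced for the twisted FORMAT example.
lexemes = ["FORMAT", "(", "I24H", ")", "=", "(", "J", "/", "I3", ")"]
print(classify(lexemes))
print(classify(lexemes, known_arrays={"FORMAT"}))
```

With no array named `FORMAT' in scope the example is treated as a statement-function definition; if `FORMAT' had been declared as an array, the same lexemes would form an assignment statement.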
Keep in mind that Fortran numbering is used, so the first character on a line is column 1, decimal numbering is used, and so on.

* Menu:

* g77stripcard::
* lex.c::
* sta.c::
* stb.c::
* expr.c::
* stc.c::
* std.c::
* ste.c::

* Gotchas (Transforming)::
* TBD (Transforming)::

File: g77.info, Node: g77stripcard, Next: lex.c, Up: Overview of Translation Process

g77stripcard
------------

The `g77stripcard' program handles removing content beyond column 72 (adjustable via a command-line option), optionally warning about that content being something other than trailing whitespace or Fortran commentary.

This program is needed because `lex.c' doesn't pay attention to maximum line lengths at all, to make it easier to maintain, as well as faster (for sources that don't depend on the maximum column length vis-a-vis trailing non-blank non-commentary content).

Just how this program will be run--whether automatically for old source (perhaps as the default for `.f' files?)--is not yet determined.

In the meantime, it might as well be implemented as a typical UNIX pipe.

It should accept a `-fline-length-N' option, with the default line length set to 72.

When the text it strips off the end of a line is not blank (not spaces and tabs), it should insert an additional comment line (beginning with `!', so it works for both fixed-form and free-form files) containing the text, following the stripped line. The inserted comment should have a prefix of some kind, TBD, that distinguishes the comment as representing stripped text. Users could use that to `sed' out such lines, if they wished--it seems silly to provide a command-line option to delete information when it can be so easily filtered out by another program.

(This inserted comment should be designed to "fit in" well with whatever the Fortran community is using these days for preprocessor, translator, and other such products, like OpenMP.
What that's all about, and how `g77' can elegantly fit its special comment conventions into it all, is TBD as well. We don't want to reinvent the wheel here, but if there turn out to be too many conflicting conventions, we might have to invent one that looks nothing like the others, but which offers their host products a better infrastructure in which to fit and coexist peacefully.)

`g77stripcard' probably shouldn't do any tab expansion or other fancy stuff. People can use `expand' or other pre-filtering if they like. The idea here is to keep each stage quite simple, while providing excellent performance for "normal" code.

(Code with junk beyond column 72 is not really "normal", as it comes from a card-punch heritage, and will be increasingly hard for tomorrow's Fortran programmers to read.)

File: g77.info, Node: lex.c, Next: sta.c, Prev: g77stripcard, Up: Overview of Translation Process

lex.c
-----

To help make the lexer simple, fast, and easy to maintain, while also having `g77' generally encourage Fortran programmers to write simple, maintainable, portable code by maximizing the performance of compiling that kind of code:

   * There'll be just one lexer, for both fixed-form and free-form source.

   * It'll care about the form only when handling the first 7 columns of text, stuff like spaces between strings of alphanumerics, and how lines are continued.

     Some other distinctions will be handled by subsequent phases, so at least one of them will have to know which form is involved.

     For example, `I = 2 . 4' is acceptable in fixed form, and works in free form as well given the implementation `g77' presently uses. But the standard requires a diagnostic for it in free form, so the parser has to be able to recognize that the lexemes aren't contiguous (information the lexer *does* have to provide) and that free-form source is being parsed, so it can provide the diagnostic.

     The `g77' lexer doesn't try to gather `2 . 4' into a single lexeme.
     Otherwise, it'd have to know a whole lot more about how to parse Fortran, or subsequent phases (mainly parsing) would have two paths through lots of critical code--one to handle the lexemes `2', `.', and `4' in sequence, another to handle the lexeme `2.4'.

   * It won't worry about line lengths (beyond the first 7 columns for fixed-form source).

     That is, once it starts parsing the "statement" part of a line (column 7 for fixed-form, column 1 for free-form), it'll keep going until it finds a newline, rather than ignoring everything past a particular column (72 or 132).

     The implication here is that there shouldn't *be* anything past that last column, other than whitespace or commentary, because users using typical editors (or viewing output as typically printed) won't necessarily know just where the last column is.

     Code that has "garbage" beyond the last column (almost certainly only fixed-form code with a punched-card legacy, such as code using columns 73-80 for "sequence numbers") will have to be run through `g77stripcard' first.

     Also, keeping track of the maximum column position while also watching out for the end of a line *and* while reading from a file just makes things slower. Since a file must be read, and watching for the end of the line is necessary (unless the typical input file was preprocessed to include the necessary number of trailing spaces), dropping the tracking of the maximum column position is the only way to reduce the complexity of the pertinent code while maintaining high performance.

   * ASCII encoding is assumed for the input file.

     Code written in other character sets will have to be converted first.

   * Tabs (ASCII code 9) will be converted to spaces via the straightforward approach.

     Specifically, a tab is converted to between one and eight spaces as necessary to reach column N, where dividing `(N - 1)' by eight results in a remainder of zero.

   * Linefeeds (ASCII code 10) mark the ends of lines.
   * A carriage return (ASCII code 13) is accepted if it immediately precedes a linefeed, in which case it is ignored.

     Otherwise, it is rejected (with a diagnostic).

   * Any characters other than the above that are not part of the GNU Fortran Character Set (*note Character Set::.) are rejected with a diagnostic.

     This includes backspaces, form feeds, and the like.

     (It might make sense to allow a form feed in column 1 as long as that's the only character on a line. It certainly wouldn't seem to cost much in terms of performance.)

   * The end of the input stream (EOF) ends the current line.

   * The distinction between uppercase and lowercase letters will be preserved.

     It will be up to subsequent phases to decide to fold case.

     Current plans are to permit any casing for Fortran (reserved) keywords while preserving casing for user-defined names. (This might not be made the default for `.f' files, though.)

     Preserving case seems necessary to provide more direct access to facilities outside of `g77', such as to C or Pascal code.

     Names of intrinsics will probably be matchable in any case. However, there probably won't be any option to require a particular mixed-case appearance of intrinsics (as there was for `g77' prior to version 0.6), because that's painful to maintain, and probably nobody uses it.

     (How `external SiN; r = sin(x)' would be handled is TBD. I think old `g77' might already handle that pretty elegantly, but whether we can cope with allowing the same fragment to reference a *different* procedure, even with the same interface, via `s = SiN(r)', needs to be determined. If it can't, we need to make sure that when code introduces a user-defined name, any intrinsic matching that name using a case-insensitive comparison is "turned off".)

   * Backslashes in `CHARACTER' and Hollerith constants are not allowed.
     This avoids the confusion introduced by some Fortran compiler vendors providing C-like interpretation of backslashes, while others provide straight-through interpretation.

     Some kind of lexical construct (TBD) will be provided to allow flagging of a `CHARACTER' (but probably not a Hollerith) constant that permits backslashes. It'll necessarily be a prefix, such as:

          PRINT *, C'This line has a backspace \b here.'
          PRINT *, F'This line has a straight backslash \ here.'

     Further, command-line options might be provided to specify that one prefix or the other is to be assumed as the default for `CHARACTER' constants.

     However, it seems more helpful for `g77' to provide a program that prefixes all constants (or just those containing backslashes) with the desired designation, so printouts of code can be read without knowing the compile-time options used when compiling it.

     If such a program is provided (let's name it `g77slash' for now), then a command-line option to `g77' should not be provided. (Though, given that it'll be easy to implement, it might be hard to resist user requests for it "to compile faster than if we have to invoke another filter".)

     This program would take a command-line option to specify the default interpretation of backslashes, affecting which prefix it uses for constants.

     `g77slash' probably should automatically convert Hollerith constants that contain backslashes to the appropriate `CHARACTER' constants. Then `g77' wouldn't have to define a prefix syntax for Hollerith constants specifying whether they want C-style or straight-through backslashes.

The above implements nearly exactly what is specified by *Note Character Set::, and *Note Lines::, except it also provides automatic conversion of tabs and ignoring of newline-related carriage returns.
It also effects the "pure visual" model, by which is meant that a user viewing his code in a typical text editor (assuming it's not preprocessed via `g77stripcard' or similar) doesn't need any special knowledge of whether spaces on the screen are really tabs, whether lines end immediately after the last visible non-space character or after a number of spaces and tabs that follow it, or whether the last line in the file is ended by a newline. Most editors don't make these distinctions, the ANSI FORTRAN 77 standard doesn't require them to, and it permits a standard-conforming compiler to define a method for transforming source code to "standard form" however it wants. So, GNU Fortran defines it such that users have the best chance of having the code be interpreted the way it looks on the screen of the typical editor. (Fancy editors should *never* be required to correctly read code written in classic two-dimensional-plaintext form. By correct reading I mean ability to read it, book-like, without mistaking text ignored by the compiler for program code and vice versa, and without having to count beyond the first several columns. The vague meaning of ASCII TAB, among other things, complicates this somewhat, but as long as "everyone", including the editor, other tools, and printer, agrees about the every-eighth-column convention, the GNU Fortran "pure visual" model meets these requirements. Any language or user-visible source form requiring special tagging of tabs, the ends of lines after spaces/tabs, and so on, is broken by this definition. Fortunately, Fortran *itself* is not broken, even if most vendor-supplied defaults for their Fortran compilers *are* in this regard.) Further, this model provides a clean interface to whatever preprocessors or code-generators are used to produce input to this phase of `g77'. Mainly, they need not worry about long lines.  
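The tab and line-ending rules listed for the lexer above can be sketched as follows. This is only an illustration in Python of the rules as stated, not the implementation in `lex.c'; the function names `expand_tabs' and `split_lines' are hypothetical, and reporting a stray carriage return by raising `ValueError' merely stands in for "rejected with a diagnostic".

```python
def expand_tabs(line):
    """Replace each tab with one to eight spaces.

    After the spaces, the next character lands in a column N
    (counting from 1) for which (N - 1) % 8 == 0.
    """
    out = []
    for ch in line:
        if ch == "\t":
            out.append(" ")           # always at least one space
            while len(out) % 8 != 0:  # pad up to the next tab stop
                out.append(" ")
        else:
            out.append(ch)
    return "".join(out)


def split_lines(text):
    """Split text into lines using the rules described above."""
    lines = []
    current = []
    for i, ch in enumerate(text):
        if ch == "\n":                # a linefeed ends the line
            lines.append("".join(current))
            current = []
        elif ch == "\r":
            # Accepted (and ignored) only directly before a linefeed.
            if text[i + 1:i + 2] != "\n":
                raise ValueError("stray carriage return")
        else:
            current.append(ch)
    if current:                       # EOF ends the current line
        lines.append("".join(current))
    return lines


for raw in split_lines("\tX = 1\r\n1234567\tY = 2"):
    print(repr(expand_tabs(raw)))
```

A tab in column 1 therefore becomes eight spaces, while a tab in column 8 becomes a single space, matching what a user sees in an editor that follows the every-eighth-column convention.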
File: g77.info, Node: sta.c, Next: stb.c, Prev: lex.c, Up: Overview of Translation Process sta.c -----  File: g77.info, Node: stb.c, Next: expr.c, Prev: sta.c, Up: Overview of Translation Process stb.c -----  File: g77.info, Node: expr.c, Next: stc.c, Prev: stb.c, Up: Overview of Translation Process expr.c ------  File: g77.info, Node: stc.c, Next: std.c, Prev: expr.c, Up: Overview of Translation Process stc.c -----  File: g77.info, Node: std.c, Next: ste.c, Prev: stc.c, Up: Overview of Translation Process std.c -----  File: g77.info, Node: ste.c, Next: Gotchas (Transforming), Prev: std.c, Up: Overview of Translation Process ste.c -----