GPI files
This section documents how GPC transfers information from the exporting modules and units to the program, module or unit that imports (uses) it.
A GPI file contains a precompiled GNU Pascal interface. "Precompiled" here means that the interface has already been parsed (i.e. the front-end has done its work), but that no assembler output has been produced yet.
The GPI file format is an implementation-dependent (but not too implementation-dependent ;-) file format for storing GNU Pascal interfaces to be exported: Extended Pascal and PXSC module interfaces as well as interface parts of UCSD/Borland Pascal units compiled with GNU Pascal.
To see what information is stored in or loaded from a GPI file, run GPC with the additional command-line option --debug-gpi. GPC will then write a human-readable version of what is being stored or loaded to standard error. (See also: Tree nodes.) Note: This will usually produce huge amounts of output!
While parsing an interface, GPC stores the names of exported objects in tree lists (look for handle_autoexport in the GPC source files). At the end of the interface, everything is stored in one or more GPI files. This is done in module.c. There you can find the source of create_gpi_files(), which documents the file format:
First, a header of 33 bytes containing the string "GNU Pascal unit/module interface" plus a newline.
This is followed by an integer containing the "magic" value 12345678 (hexadecimal) to carry information about the endianness. Note that, although a single GPI file is always specific to a particular target architecture, the host architecture (i.e., the system on which GPC runs) can be different (cross-compilers). Currently, GPC cannot convert endianness in GPI files "on the fly", but it will at least detect and reject GPI files with the "wrong" endianness. When writing GPI files, the host's endianness is always used (this seems to be a good idea even if on-the-fly conversion is supported in the future, since GPI files created by a cross-compiler will most often be read again by the same cross-compiler). "Integer" here and in the following paragraphs means a gpi_int (which is currently defined as HOST_WIDE_INT).
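The header and magic-value check described above can be sketched as follows. This is only an illustration in Python, not GPC's own code (which is C, in module.c). The function name detect_byte_order and the int_size parameter are assumptions made for this example: the width of a gpi_int follows HOST_WIDE_INT on the host that wrote the file, so the caller has to supply it.

```python
GPI_HEADER = b"GNU Pascal unit/module interface\n"  # 33 bytes including the newline
GPI_MAGIC = 0x12345678


def detect_byte_order(data, int_size=8):
    """Return "little" or "big" for the byte order a GPI file was written in.

    int_size is the assumed width in bytes of a gpi_int (hypothetical
    parameter; the real width depends on HOST_WIDE_INT).
    """
    if not data.startswith(GPI_HEADER):
        raise ValueError("missing GPI header")
    raw = data[len(GPI_HEADER):len(GPI_HEADER) + int_size]
    if len(raw) != int_size:
        raise ValueError("file too short to contain the magic integer")
    # Try both byte orders and see which one reproduces the magic value.
    for byteorder in ("little", "big"):
        if int.from_bytes(raw, byteorder) == GPI_MAGIC:
            return byteorder
    raise ValueError("magic value not recognized")
```

A reader built this way can report a mismatch between the file and the current host, which corresponds to the "detect and reject" behaviour described above.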
The rest of the GPI file consists of chunks. Each chunk starts with a one-byte code that describes the type of the chunk. It is followed by an integer that describes the size of the chunk (excluding this chunk header). The further contents depend on the type, as listed below.
For the numeric values of the chunk type codes, please refer to GPI_CHUNKS in module.c. Chunk types marked with (*) must occur exactly once in a GPI file. Other types may occur any number of times (including zero times). The order of chunks is arbitrary. "String" here simply means a character sequence whose length is the chunk's length (so no terminator is needed).
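As a rough illustration of the chunk layout (a one-byte type code, an integer size, then the contents), the following Python sketch walks over a sequence of chunks. The name iter_chunks and the parameters int_size and byteorder are assumptions for the example; the type codes used when calling it would have to come from GPI_CHUNKS in module.c.

```python
def iter_chunks(data, offset=0, int_size=8, byteorder="little"):
    """Yield (type_code, payload) for each chunk in data, starting at offset.

    int_size and byteorder describe the assumed gpi_int encoding.
    """
    while offset < len(data):
        type_code = data[offset]            # one-byte chunk type
        size_end = offset + 1 + int_size
        if size_end > len(data):
            raise ValueError("truncated chunk header")
        # The size excludes the chunk header itself.
        size = int.from_bytes(data[offset + 1:size_end], byteorder)
        payload = data[size_end:size_end + size]
        if len(payload) != size:
            raise ValueError("truncated chunk payload")
        yield type_code, payload
        offset = size_end + size
```

For a "String" chunk, the payload is the string itself, since its length is the chunk's length.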
GPI_CHUNK_VERSION
(String) (*)
If USE_GPI_DEBUG_KEY is used (which inserts a "magic" value at the beginning of each node in the node table, see below, so that errors in GPI files are detected more reliably), D is appended to this version string. (Currently, USE_GPI_DEBUG_KEY is used by default.) Furthermore, the GCC backend version is appended, since it also influences GPI files.
GPI_CHUNK_TARGET
(String) (*)
GPI_CHUNK_MODULE_NAME
(String) (*)
GPI_CHUNK_SRCFILE
(String) (*)
GPI_CHUNK_IMPORT
The checksum is currently a simple weighted sum over the contents of the GPI_CHUNK_NODES chunk (see below). This might be replaced in the future by an MD5 hash or something more elaborate.
GPI_CHUNK_LINK
(String)
GPI_CHUNK_LIB
(String)
-l
).
GPI_CHUNK_INITIALIZER
(String)
GPI_CHUNK_MODULE_NAME
chunk.
GPI_CHUNK_GPC_MAIN_NAME
(String)
gpc-main
option given in this interface. (More than one
occurrence is pointless.)
GPI_CHUNK_NODES
(*)
../tree.h
and ../tree.def
from the GNU compiler back-end. (See also:
Tree nodes.)
The main problem when storing tree nodes is that they form a complicated structure in memory with many circular references (in the usual terminology they form a directed graph, not a tree, so the name "tree nodes" is a misnomer). The storing mechanism must therefore make sure that nothing is stored more than once.
The functions load_node() and store_node_fields() do the main work of loading and storing the contents of a tree node, with references to all its contained pointers, in a GPI file. Each tree node has a TREE_CODE indicating what kind of information it contains. Each kind of tree node must be stored in a different way, which is not described here. See the source of these functions for details.
As most tree nodes contain pointers to other tree nodes, load_node() is an (indirectly) recursive function. Since this recursion can be circular (think of a record containing a pointer to a record of the same type), we must resolve references to tree nodes that have already been loaded. For this reason, all tree nodes being loaded are kept in a table (rb.nodes). They are entered there before all their fields have been loaded (because loading them is what causes the recursion). So the table contains some incomplete nodes during loading, but at the end of loading a GPI file, they have all been completed.
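The idea of entering a node in the table before its fields are loaded can be illustrated with a small Python sketch. It is not a transcription of load_node(): the Node class, the load_all function and the dictionary-based "file" are hypothetical stand-ins, chosen only to show how a circular reference resolves to the same, initially incomplete, object.

```python
class Node:
    """Stand-in for a tree node: a code plus references to other nodes."""

    def __init__(self, code):
        self.code = code
        self.fields = []


def load_all(stored, root_id):
    """Load the node root_id and everything reachable from it.

    stored maps a node id to (code, [ids of referenced nodes]) and plays
    the role of the file contents in this sketch.
    """
    table = {}  # plays the role of the table of nodes being loaded

    def load(node_id):
        if node_id in table:
            # Already loaded, or still being loaded further up the recursion.
            return table[node_id]
        code, refs = stored[node_id]
        node = Node(code)
        table[node_id] = node  # entered before its fields are loaded
        node.fields = [load(ref) for ref in refs]
        return node

    return load(root_id), table
```

With a record type pointing to a pointer type that points back to the record, the inner reference finds the record in the table instead of recursing forever.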
On the other hand, for store_node_fields() the (seeming) recursion must be resolved to an iterative process so that the individual tree nodes are stored one after another in the file, and not mixed together. This is the job of store_tree(). It uses a hash table (see get_node_id()) for efficiency.
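A minimal sketch of turning the recursion into an iteration, assuming a work queue plus a hash table of already assigned ids, might look like this in Python. The names store_tree and get_node_id are borrowed from the text for readability only; the record format and the refs_of callback are assumptions of this example and do not reflect the real implementation in module.c.

```python
from collections import deque


def store_tree(root, refs_of):
    """Return one (node_id, node, [ids of referenced nodes]) record per node.

    refs_of(node) must return the nodes that node points to.
    """
    ids = {}           # hash table: identity of a node -> assigned id
    pending = deque()  # nodes that have an id but are not yet written

    def get_node_id(node):
        key = id(node)
        if key not in ids:
            ids[key] = len(ids)
            pending.append(node)  # store it later, as its own record
        return ids[key]

    get_node_id(root)
    records = []
    while pending:
        node = pending.popleft()
        # References are written as ids, so records never nest.
        ref_ids = [get_node_id(ref) for ref in refs_of(node)]
        records.append((ids[id(node)], node, ref_ids))
    return records
```

Each node is written exactly once, and a circular reference simply reuses the id that was handed out the first time the node was seen.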
When re-exporting (directly or indirectly) a node that was imported from another interface, and a later compiler run imports both interfaces, it must merge the corresponding nodes loaded from both interfaces. Otherwise it would get only similar, but not identical items. However, we cannot simply omit the re-exported nodes from the new interface in case a later compiler run imports only one of them. The same problem occurs when a module exports several interfaces. In this case, a program that imports more than one of them must recognize their contents as identical where they overlap.
Therefore, each node in a GPI file is prefixed (immediately before its tree code) with information about the interface it was originally imported from or first stored in. This information is represented as a reference to an INTERFACE_NAME_NODE followed by the id (as an integer) of the node in that interface. If the node is imported again and re-re-exported, this information is copied unchanged, so it will always refer to the interface the node was originally contained in. For nodes that appear in an interface for the first time (the normal case), a single 0 integer is stored instead of the INTERFACE_NAME_NODE and id (for brevity, since this information is implicit).
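The effect of this prefix can be sketched as a lookup keyed by the original location of a node. The Python below is a simplified illustration under stated assumptions: the class NodeMerger is hypothetical, and an interface is identified by a plain name string here, whereas the real prefix refers to an INTERFACE_NAME_NODE.

```python
class NodeMerger:
    """Merge nodes loaded from several interfaces by where they originated."""

    def __init__(self):
        self.by_origin = {}

    def add(self, interface, node_id, origin, node):
        """Register a loaded node and return the node to use in its place.

        origin is None when the file held the single 0 integer (the node
        first appears in this interface), otherwise a pair
        (original_interface, original_id).
        """
        key = origin if origin is not None else (interface, node_id)
        # The first node seen for a given origin wins; later copies merge into it.
        return self.by_origin.setdefault(key, node)
```

If interface B re-exports node 5 of interface A, a program importing both A and B ends up with one object for it, while a program importing only B still finds the node in B's file.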
This mechanism is not applied to INTERFACE_NAME_NODEs, since there would be a problem when the identifier they represent is the name of the interface they come from. Nor is it applied to IDENTIFIER_NODEs, because the backend handles them somewhat specially: only one IDENTIFIER_NODE is ever made for any particular name, and they contain fields like IDENTIFIER_VALUE which depend on the currently active declarations, so storing and loading them in GPI files would be wrong. But for the same reason, it is no problem that the mechanism can't be applied to them.
INTERFACE_NAME_NODEs are a special kind of tree node, only used for this purpose. They contain the name of the interface, the name of the module (to detect the unlikely case that different modules have interfaces of the same name, which otherwise might confuse GPC), and the checksum of that interface. The latter may seem redundant with the checksum stored in the GPI_CHUNK_IMPORT chunk, but in fact it is not. On the one hand, GPI_CHUNK_IMPORT chunks occur only for interfaces imported directly, while the INTERFACE_NAME_NODE mechanism might also refer to interfaces imported indirectly. On the other hand, storing the checksum in the GPI_CHUNK_IMPORT chunks allows the automake mechanism to detect discrepancies and force recompilation of the imported module, whereas during the handling of the GPI_CHUNK_NODES chunk, the imported modules must already have been loaded. (It would be possible to scan the GPI_CHUNK_NODES chunk while deciding whether to recompile, but that would be a lot of extra effort compared to storing the checksum in the GPI_CHUNK_IMPORT chunks.)
Finally, at the end of the GPI_CHUNK_NODES chunk, a checksum of its own contents (excluding the checksum itself, of course) is appended. This is to detect corrupted GPI files and is independent of the other uses of checksums.
GPI_CHUNK_OFFSETS
(*)
integer_type_node or NULL_TREE) which are used very often and have fixed meanings. They have been assigned predefined ids, so they don't have to be stored in the GPI file at all. Their number and values are fixed (but may change between different GPC versions); see SPECIAL_NODES in module.c.
For the remaining nodes, the GPI_CHUNK_OFFSETS table contains the file offsets, as integers, where they are stored within the (only) GPI_CHUNK_NODES chunk. The offsets are relative to the start of that chunk, i.e. after the chunk header. After the table (but still in this chunk) the id of the main node, which contains the list of all exported names, is stored as an integer. (Currently, this is always the last node, but the file format definition does not guarantee this.)
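Reading such a chunk could be sketched as follows in Python. The function name parse_offsets_chunk is hypothetical, and the sketch assumes that the chunk payload consists of nothing but the offsets followed by the main node's id, each encoded as a gpi_int of the given int_size and byteorder. How table entries map to node ids is not covered here; see module.c for that.

```python
def parse_offsets_chunk(payload, int_size=8, byteorder="little"):
    """Split an offsets chunk payload into (offsets, main_node_id).

    Assumes the payload is a plain sequence of integers: the offset table
    first, then the id of the main node as the final integer.
    """
    if len(payload) < int_size or len(payload) % int_size != 0:
        raise ValueError("malformed offsets chunk")
    values = [
        int.from_bytes(payload[i:i + int_size], byteorder)
        for i in range(0, len(payload), int_size)
    ]
    return values[:-1], values[-1]
```

Each returned offset is relative to the start of the nodes chunk's contents, so it can be used directly as an index into that chunk's payload.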
GPI_CHUNK_IMPLEMENTATION
That's it. Now you should be able to "read" GPI files using GPC's --debug-gpi option. There is also a utility gpidump.pas in the utils directory to decode and show the contents of GPI files. It also does some integrity checking (a little more than GPC does while loading GPI files), so if you suspect a problem with GPI files, you might want to run gpidump on them, discarding its standard output (it writes all error reports to standard error, of course).
If you encounter a case where the loaded information differs too much from the stored information, you have found a bug. Congratulations! What "too much" means depends on the object being stored in or loaded from the GPI file. Remember that the order in which things are loaded from a GPI file is the reverse of the order in which they are stored when considering different recursion levels, but the same order when considering the same recursion level. (This is important when using --debug-gpi; with gpidump you can read the file in any order you like.)