Declare-Before-Use and Forward Declarations
The fixup mechanism of article A194 solves the forward-jump problem at the machine-code level. A parallel problem appears at the identifier level. When a source program references a function, a type, or a variable before the referenced entity has been declared in the source text, the compiler faces the same question that the forward-jump problem posed. How can the compiler emit code that references an identifier whose meaning is not yet known?
Two solutions are possible. The first solution is to defer identifier resolution until the whole program has been parsed, at which point all identifiers are known and all references can be resolved. This is the multi-pass discipline. It admits forward references freely at the source level and pays the cost in whole-program state and non-linear compilation that article A189 discussed as the boundary between single-pass and multi-pass architectures.
The second solution is to require the source language to declare every identifier before it is used. The compiler that reads the source in a single forward pass finds each identifier already in its symbol table by the time the identifier’s use appears. Forward identifier references are syntactically forbidden. This is the declare-before-use discipline that Wirth adopted across his language line and that this article develops as the second technique of the trio.
Declare-Before-Use as a Language Design Rule
The declare-before-use rule states that every identifier used in a source expression must have been declared at an earlier point in the source file. The rule applies uniformly to types, constants, variables, functions, procedures, and modules. The rule is a lexical property of the source text, not a semantic property of the compiled program. A program that uses an identifier whose declaration appears later in the source is rejected at compile time regardless of whether the program would compute a correct result if the compiler could resolve the reference.
The rule has two variants. The strict variant, applied in Wirth’s Pascal and Modula-2, requires that the declaration precede the use in the source file’s top-to-bottom order. The scoped variant, applied in some later languages, requires that the declaration precede the use within the current lexical scope but permits outer scopes to be extended after inner scopes that reference them are parsed. This article concentrates on the strict variant because it is the version that most directly enables single-pass compilation.
Why Declare-Before-Use Enables Single-Pass Compilation
At any point $i$ in the source text, the compiler maintains a symbol table $T_i$ containing all identifiers that have been declared in the source up to position $i$. The current-scope symbol table grows monotonically at the top level as the compiler advances through the source. Formally, for positions $i$ inside the outermost scope,
\[T_i \subseteq T_{i+1} \qquad \text{for all } i.\]Inside nested scopes, the symbol table grows and shrinks as inner scopes open and close. Article A196 describes the scope-popping discipline that returns the nested-scope portion to the enclosing scope’s state whenever an inner block closes. The monotonic growth of this article applies to the outermost scope, which holds top-level declarations and does not pop during compilation.
The declare-before-use rule states that every identifier used at position $i$ is already in $T_i$. The compiler resolves the reference by looking up the identifier in $T_i$. Identifier resolution therefore requires only the current symbol table state, not future symbol table state. Formally,
\[\operatorname{resolve}(\text{id}, i) = \operatorname{lookup}(\text{id}, T_i).\]The resolution cost is the cost of the lookup operation, which depends on the symbol table’s data structure
\[C_{\text{resolve}}(\text{id}, i) = \begin{cases} O(1) & \text{hash-indexed table,} \\ O(\log \lvert T_i \rvert) & \text{tree-indexed table.} \end{cases}\]The lookup does not require inspecting any code past position $i$. The compiler does not maintain any whole-program state to support identifier resolution.
The alternative, under which forward references are allowed at the source level, requires the compiler to either defer resolution until a whole-program pass completes or maintain a pending-reference queue that is checked at each new declaration. Both alternatives add whole-program state and increase per-identifier cost. Neither alternative is compatible with the strict stream-processor discipline that this series develops.
Forward Declarations for Mutual Recursion
The declare-before-use rule has one immediate consequence that Wirth’s languages address through a syntactic escape hatch. Two procedures that call each other mutually recursively cannot both satisfy declare-before-use. At least one call must precede the callee’s declaration.
Wirth resolved this
by requiring
an explicit
forward declaration
that
introduces
the procedure’s signature
into
the symbol table
before
the procedure body appears.
The syntax
in Pascal
uses the keyword
forward
in place of
the procedure body.
The compiler
treats the forward declaration
as
a promise
that
the body
will appear later
in the source.
The forward-declared procedure occupies one entry in the symbol table starting at the position of the forward declaration. The full body, when it appears, does not add a new symbol table entry. It resolves the forward declaration by providing the compiled code for the promised body.
Formally, the symbol table entry for a forward-declared procedure carries a status flag that distinguishes the two states.
\[T_i(\text{id}) \in \{\text{forward-declared}, \text{defined}\}.\]The status transitions at the point the body appears. A call site that uses the identifier during the forward-declared phase generates a code emission that references the procedure’s future address through the same fixup mechanism that article A194 developed for forward jumps.
A Worked Example, Mutual Recursion in Pascal
Consider two procedures $P$ and $Q$ that call each other mutually. The Pascal code uses a forward declaration to satisfy declare-before-use.
PROCEDURE Q(x: integer); FORWARD;
PROCEDURE P(x: integer);
BEGIN
IF x > 0 THEN
Q(x - 1)
END;
PROCEDURE Q(x: integer);
BEGIN
IF x > 0 THEN
P(x - 1)
END;
The compiler processes this source in four phases.
Phase 1: forward declaration of Q.
The compiler
parses
PROCEDURE Q(x: integer); FORWARD;
and adds
the symbol table entry
The entry carries the procedure’s signature but no compiled address yet. The address will be supplied when the body appears.
Phase 2: body of P.
The compiler
parses the body of $P$.
Inside the body,
the call Q(x - 1)
looks up
$Q$
in the symbol table
and finds it
in
the forward-declared state.
The compiler
emits
a call instruction
with
a placeholder
for
$Q$’s eventual address
and
adds
a fixup entry
for
this placeholder.
At the end of
$P$’s body,
$P$ receives
its own
symbol table entry
in the defined state.
Phase 3: body of Q.
The compiler
parses the body of $Q$.
The call P(x - 1)
looks up
$P$
in the symbol table
and finds it
in
the defined state
because $P$
was compiled
in phase 2.
The call instruction
receives
$P$’s known address
directly,
with no
fixup entry needed.
At the end of
$Q$’s body,
the compiler
resolves
$Q$’s forward declaration
by
supplying
$Q$’s
now-known address.
All fixup entries
that
reference $Q$
receive
their patched addresses.
Phase 4: end of compilation. The compilation completes with both procedures in the defined state and all fixup entries retired.
The example demonstrates that declare-before-use combined with forward declarations handles mutual recursion without any whole-program pass. The forward declaration supplies the promise that the identifier will have a defined meaning by the end of the compilation unit. The fixup mechanism supplies the mechanical bridge between the call site that emits the reference and the definition site that eventually supplies the referent’s address. The two techniques compose naturally.
The Programmer’s Cost
The declare-before-use rule imposes a non-trivial burden on the programmer. Source code must be organised so that declarations appear before uses. For programs with substantial cross-referencing between procedures or types, the ordering is not always obvious. Programmers must either plan the source layout carefully or use forward declarations liberally.
The Wirth tradition argued that the discipline was itself a design benefit. A source file that respects declare-before-use reads naturally from top to bottom. Every reference resolves against material that the reader has already encountered. The reader never needs to scan forward to find a definition. The argument holds some empirical weight for academic programs and for carefully organised industrial code. The argument holds less weight for rapidly evolving industrial code where programmer time dominates compiler time.
Modern languages
have varied
in
their response
to
this trade-off.
Some languages,
including
Pascal,
Modula-2,
Oberon,
and
older C dialects,
maintain
the strict discipline.
Others,
including
Java,
Python,
Go,
and
modern C dialects
with
appropriate
forward declarations,
allow
scoped forward references
within
compilation units.
Still others,
including
early
functional languages
like ML
and
Haskell,
require
explicit
let rec
or equivalent
constructs
for
mutually recursive definitions
but
otherwise
follow
the declare-before-use rule.
Interaction with Fixup Tables
Declare-before-use and fixup tables together form the essential pair of techniques that single-pass compilers use to handle forward references. Fixup tables handle forward references to code addresses. Declare-before-use handles forward references to identifiers. The forward-declaration escape hatch bridges the two mechanisms by allowing an identifier to be introduced into the symbol table before its code address is known, and using the fixup mechanism to resolve the code address when the body appears.
Formally, if $\iota$ is an identifier declared as forward at position $p_1$ and defined at position $p_2$ with $p_1 < p_2$, then between positions $p_1$ and $p_2$, each use of $\iota$ at position $i$ generates a fixup entry with target identity $\iota$,
\[T_i(\iota) = \text{forward-declared} \implies F' = F \cup \{(a_i, \iota)\},\]where $a_i$ is the placeholder position in the emitted output for the reference at $i$. At position $p_2$, when the body of $\iota$ completes emission, the compiler resolves all fixup entries whose target identity is $\iota$ against the body’s start address,
\[\operatorname{resolve}(\iota, a_{\iota\text{-body}}) \quad \text{at position } p_2,\]and the symbol table entry transitions to the defined state.
The composition of the two mechanisms handles mutual recursion in strict single-pass compilation without either whole-program state or non-linear compilation cost.
Modern Language Choices
Contemporary language designers face the same trade-off that Wirth encountered. Strict declare-before-use enables single-pass compilation and simple compiler implementation at the cost of programmer flexibility in source organisation. Free forward references give the programmer freedom at the cost of compiler complexity and compilation-cost linearity.
Several contemporary languages have adopted positions between the two extremes.
Go requires declare-before-use within a single function body but permits forward references between top-level declarations in the same package. The Go compiler performs a preliminary pass over the top-level declarations to collect identifier signatures, then compiles the bodies in a subsequent pass that resolves references against the collected signatures. The overall shape is cheaper than a full multi-pass architecture because the preliminary pass does not need to type-check bodies or emit code.
Rust allows forward references freely within a module. The compiler performs name resolution as a distinct phase before type checking and code generation. Rust’s compilation model does not attempt strict single-pass compilation.
Zig permits declaration order to be arbitrary within a container. Compile-time evaluation uses lazy evaluation with dependency tracking, so that a compile-time expression is evaluated only when its result is needed and after any expressions it depends on have completed. Runtime code is compiled after compile-time evaluation completes for the declarations it depends on.
WebAssembly at the bytecode level uses a declare-signatures-before-any-body discipline. Every function reference in a WebAssembly module uses a function index that must be statically resolvable. The binary format places the function-index space in sections that precede the code section, so all function signatures are known before any function body is verified or compiled. Forward references between functions are resolved through the function index space, which is established before any function body is compiled. This is a specific form of the declare-before-use rule adapted to the requirements of a portable bytecode format.
Header Files and Separate Compilation
The declare-before-use discipline extends across compilation-unit boundaries through the mechanism of declaration files. In C and C-plus-plus, declaration files carry the header extension and contain declarations without implementations. A source file that uses symbols defined in another compilation unit includes the declaration file for that unit, which introduces the symbol declarations into the current compilation unit’s symbol table before the uses appear.
Modula-2 formalised this discipline through the definition-implementation module split. A Modula-2 module consists of a definition module and an implementation module. The definition module contains the exported declarations, which is what importing modules see. The implementation module contains the bodies. The two modules are separate files and can be compiled independently. The importer’s compilation depends only on the definition module, not on the implementation module, which allows the importer’s compilation to proceed before the exporter’s implementation is complete.
Oberon inherited this discipline in a simpler form through the exported-marker convention. A single module file contains both declarations and implementations, and exports are marked with a single asterisk after the exported identifier. The compiler, when importing a module, scans the exported declarations without reading the implementations. The imported module’s declarations appear in the importing module’s symbol table before any use.
Modern module systems in Rust, Zig, and recent C-plus-plus modules continue this pattern. Each module publishes a declaration interface that importing modules see. The interface supports declare-before-use across module boundaries without requiring the importer to have access to the exporter’s implementation.
Conclusion
Declare-before-use is the language-design lever that brings identifier resolution into the single-pass discipline. The rule requires the source to declare every identifier before its use, which reduces identifier resolution to a lookup in the current symbol table. Forward declarations provide the syntactic escape hatch that allows mutual recursion without compromising the rule. The combination of declare-before-use, forward declarations, and the fixup mechanism of article A194 handles every forward reference that a block-structured source language admits, without requiring whole-program state or non-linear compilation cost. Article A196 covers the third technique of the trio, namely the symbol table data structure itself and the scope-popping discipline that keeps the compiler’s working memory bounded.
References
Book
- Compiler Construction, Niklaus Wirth, Addison-Wesley, 1996
- Project Oberon, The Design of an Operating System and Compiler, Niklaus Wirth and Jürg Gutknecht, revised edition 2013
Reference
- Pascal programming language declaration ordering
- Modula-2 definition and implementation module split
- Go language declaration ordering
- Rust name resolution model
- Zig compile-time ordering
Related Post
- Compilation as a Streaming Discipline, article A188 in this series
- Wirth’s Single-Pass Line, PL/0 through Oberon, article A189 in this series
- Fixup Tables and the Forward-Jump Problem, article A194 in this series