Table of Contents
This chapter is work in progress. Not all parts have been finished yet. The latest revision of this manual may contain more material. Refer to the online version.
The GPP package is a tool suite for generic pretty-printing. GPP supports pretty-printing of parse-trees in the AsFix format with comment preservation and of abstract syntax trees. GPP supports the output formats plain text, LaTeX, and HTML. Formattings are defined in pretty print tables, which can be generated from SDF syntax definitions.
The Box language is used in the GPP framework as a language-independent intermediate representation. The input language dependent parts and the output format dependent parts of GPP are connected through this intermediate representation.
Box is a mark-up language to describe the intended layout of text
and is used in pretty print
tables. A Box expression is constructed by composing
sub-boxes using Box operators. These operators specify the
relative ordering of boxes. Examples of Box operators are the
H
and V
operator which
format boxes horizontally and vertically, respectively.
The exact formatting of each Box operator can be customized using
Box options. For example, to control the horizontal layout between
boxes the H
operator supports the
hs
space option.
For a detailed description of Box (including a description of all available Box operators) we refer to To Reuse Or To Be Reused (Chapter 4)
Pretty-print tables are used to specify how language constructs have to be pretty-printed. Pretty-print tables are used in combination with GPP front-ends, such ast2abox.
Pretty-print tables use Box as language to specify formatting of language constructs. A pretty-print table contains mappings from constructor names to Box expressions. For example, for the SDF production
Exp "+" Exp -> Exp {cons("Plus")}
A pretty-print entry looks like:
Plus -- H hs=1 [ _1 "+" _2]
Pretty-print tables are ordered such that pretty-print rules occuring first take preceedence over overlapping pretty-print rules defined later. The syntax of pretty-print tables is available in GPP.
Pretty-print tables can be generated from SDF syntax definitions using ppgen. Generated pretty-print rules can easiliy be customized by overruling them in additional pretty-print tables. The tool pptable-diff notifies inconsistensies in pretty-print tables after the syntax definition has changed and can be used to bring inconsistent table up-to-date.
To be able to specify formattings for all nested constructs that are allowed in SDF productions, so called selectors are used in pretty-print tables to refer to specific parts of an SDF production and to define a formatting for them. For example, the SDF prodcution
"return" Exp? ";" -> Stm {cons("Return")}
contains a nested symbol A?. To specify a formatting for this production, two pretty-print entries can be used:
Return -- H [ KW["return"] _1 ";" ] Return.1:opt -- H [ _1 ]
A selector consists of a constructor name followed by a list of number+type tuples. A number selects a particular subtree of a constructor application, the type denotes the type of the selected construct (sequence, optional, separated list etc.). This rather verbose selector mechanism allows unambiguous selection of subtrees. Its verbosity (by specifying both the number of a subtree and its type), makes pretty-print tables easier to understand and, more importantly, it enables pretty-printing of AST's, because with type information, a concrete term can be correctly recontructed from an AST. Pretty-print tables can thus be used for both formatting of parse-trees and AST's.
Below we summarize which selector types are available:
For SDF optionals S?
For non-empty SDF lists S+
For possible empty SDF lists S*
For SDF separator lists {S1
S2}+
. Observe that the
symbol S1
and the
separator S2
are
ordinary subtrees of the iter-sep construct which can be
refered to as first and second subtree, respectively.
For SDF separator lists {S1
S2}*
. Its symbol
S1
and separator
S2
can be refered to
as first and second sub tree.
For SDF alternatives S1 |
S2 |
S3
. According to the SDF
syntax, alternatives are binary operators. The
pretty-printer flattens all subsequent
alternatives. Pretty-print rules can be specified for
each alternative individually by specifying the number
of each alternative. To be able to format literals in
alternative, a special formatting rule can be defined
for the construct (See the examples below).
For SDF alternatives (S1
S2
S3)
.
Below we list a simple SDF module with productions containing all above rule selectors.
module Symbols exports context-free syntax A? -> S {cons("ex1")} A+ -> S {cons("ex2")} A* -> S {cons("ex3")} {S1 S2}+ -> S {cons("ex4")} {S1 S2}* -> S {cons("ex5")} "one" | "two" | S? -> S {cons("ex6")} ("one" "two" S? ) -> S {cons("ex7")}
The following pretty-print table shows which pretty-print rules can be defined for this syntax:
[ ex1 -- _1, ex1.1:opt -- _1, ex2 -- _1, ex2.1:iter -- _1, ex3 -- _1, ex3.1:iter-star -- _1, ex4 -- _1, ex4.1:iter-sep -- _1 _2, ex5 -- _1, ex5.1:iter-star-sep -- _1 _2, ex6 -- _1, ex6.1:alt -- KW["one"] KW["two"] _1, ex6.1:alt.1:opt -- _1, ex7 -- _1, ex7.1:seq -- KW["one"] KW["two"] _1, ex7.1:seq.1:opt -- _1 ]
The pretty-print rule ex6.1:alt
is a special
case. It contains three Box expressions, one for each
alternative. It is used to specify a formatting for the
non-nested literals "one"
and
"two"
. During pretty-printing one of the three Box
expressions is selected, depending on alternative contained the
term to format.
In this section, we explain how you can generate a tool that restores all parentheses at the places where necessary according to the priorities and associativities of a language.
In this section, we explain how you can use Box in Stratego to implement a pretty-printer by hand.
After transformation, an abstract syntax tree should be turned into text again to be useful as a program. Mapping a tree into text can be seen as the inverse as parsing, and is thus called unparsing. When an unparser makes an attempt at producing human readable, instead of just compiler parsable, program text, an unparser is called a pretty-printer. We use the pretty-printing model as provided by the Generic Pretty-Printing package GPP. In this model a tree is unparsed to a Box expression, which contains text with markup for pretty-printing. A Box expression can be interpreted by different back-ends to produce text for different displaying devices, such as plain ASCII text, HTML, and LaTeX.
Unparsing is the inverse of parsing composed with abstract syntax tree composition. That is, an unparser turns an abstract syntax tree into a string, such that if the resulting string is parsed again, it produces the same abstract syntax tree.
An unparser can be organized in two phases. In the first phase,
each node in an abstract syntax tree is replaced with the concrete
syntax tree of the corresponding grammar production. In the
second phase, the strings at the leaves of the tree are
concatenated into a string. Figure 9.1
illustrates this process. The replacement of abstract syntax tree
nodes by concrete syntax patterns should be done according to the
productions of the syntax definition. An unparsing table is an
abstraction of a syntax definition definining the inverse mapping
from constructors to concrete syntax patterns. An entry c --
s1 ... sn
defines a mapping for constructor c
to the sequence s1 ... sn
, where each
s_i
is either a literal string or a parameter
_i
referring to the i
th argument of the
constructor. Figure 9.2 shows an
unparsing table for some expression and statement
constructors. Applying an unparsing mapping to an abstract syntax
tree results in a tree structure with strings at the leafs, as
illustrated in Figure 9.1.
Figure 9.2. Unparsing table
[ Var -- _1, Int -- _1, Plus -- _1 "+" _2, Minus -- _1 "-" _2, Assign -- _1 ":=" _2, Seq -- "(" _1 ")", Seq.1:iter-star-sep -- _1 ";", If -- "if" _1 "then" _2 "else" _3, Call -- _1 "(" _2 ")", Call.2:iter-star-sep -- _1 "," ]
Although the unparse of an abstract syntax tree is a text that can be parsed by a compiler, it is not necessarily a readable text. A pretty-printer is an unparser attempting to produce readable program text. A pretty-printer can be obtained by annotating the entries in an unparsing table with markup instructing a typesetting process. Figure 9.3 illustrates this process.
Box is a target independent formatting language, providing
combinators for declaring the two-dimensional positioning of boxes
of text. Typical combinators are H[b_1...b_n]
, which
combines the b_i
boxes horizontally, and
V[b_1...b_n]
, which combines the
b_i
boxes vertically. Figure 9.4 shows a pretty-print table with
Box markup. A more complete overview of the Box language and the
GPP tools can be found in Chapter 9.
Figure 9.4. Pretty-print table with Box markup
[ Var -- _1, Int -- _1, Plus -- H[_1 "+" _2], Minus -- H[_1 "-" _2], Assign -- H[_1 ":=" _2], Seq -- H hs=0["(" V[_1] ")"], Seq.1:iter-star-sep -- H hs=0[_1 ";"], If -- V[V is=2[H["if" _1 "then"] _2] V is=2["else" _3]], Call -- H hs=0[_1 "(" H[_2] ")"], Call.2:iter-star-sep -- H hs=0[_1 ","] ]
Figure 9.5. Pretty-printing of if-then-else statement
If(Eq(Var("n"),Int("1")), Int("0"), Times(Var("n"), Call(Var("fac"),[Minus(Var("n"),Int("1"))])))
if n = 1 then 0 else n * fac(n - 1)
Note: correct pretty-printing of an abstract syntax tree requires
that it contains nodes representing parentheses in the right
places. Otherwise, reparsing a pretty-printed string might get a
different interpretation. The sdf2parenthesize
tool generates from an SDF definition a Stratego program that
places parentheses at the necessary places in the tree.
ppgen -i m.def -o m.pp
.
Ppgen
generates from an SDF syntax definition a
pretty-print table with an entry for each context-free syntax
production with a constructor annotation. Typically it is
necessary to edit the pretty-print table to add appropriate Box
markup to the entries. The result should be saved under a
different name to avoid overwriting it.
ast2abox -p m.pp -i file.ast -o file.abox
.
ast2abox
maps an abstract syntax tree
file.ast
to an abstract syntax representation
file.abox
of a Box term based on a pretty-print
table m.pp
.
abox2text -i file.abox -o file.txt
.
abox2text
formats a Box term as ASCII text.
pp-aterm -i file1.trm -o file2.trm
.
pp-aterm
formats an ATerm as an ATerm in text
format, adding newlines and indentation to make the structure of
the term understandable. This is a useful tool to inspect terms
while debugging transformations.
term-to-dot -i file.trm -o file.dot (--tree | --graph)
.
Term-to-dot
is another visualization tool for terms
that creates a dot
graph representation, which can
be visualized using the dot
tool from the graphviz
graph layout package. Term-to-dot
can produce an
expanded tree view (--tree
), or a directed acyclic
graph view (--graph
) preserving the maximal sharing
in the term. This tool was used to produce the tree
visualizations in this chapter.
This tool is not part of the Stratego/XT distribution, but
included in the Stratego/XT Utilities package.