Chapter 9. Pretty Printing with GPP (*)

Table of Contents

9.1. Box Text Formatting Language
9.2. Pretty Print Tables
9.2.1. Pretty-Print Table Generation
9.2.2. Rule Selectors
9.2.3. Examples
9.3. Restoring Parenthesis
9.4. Pretty Printing using Stratego
9.5. Pretty-Printing (todo: imported)
9.5.1. Unparsing
9.5.2. Pretty-Printing
9.5.3. Disambiguation
9.6. Pretty-Printing and Term Visualization Tools (todo: imported)

Work in Progress

This chapter is work in progress. Not all parts have been finished yet. The latest revision of this manual may contain more material. Refer to the online version.

The GPP package is a tool suite for generic pretty-printing. GPP supports pretty-printing of parse-trees in the AsFix format with comment preservation and of abstract syntax trees. GPP supports the output formats plain text, LaTeX, and HTML. Formattings are defined in pretty print tables, which can be generated from SDF syntax definitions.

9.1. Box Text Formatting Language

The Box language is used in the GPP framework as a language-independent intermediate representation. The input language dependent parts and the output format dependent parts of GPP are connected through this intermediate representation.

Box is a mark-up language to describe the intended layout of text and is used in pretty print tables. A Box expression is constructed by composing sub-boxes using Box operators. These operators specify the relative ordering of boxes. Examples of Box operators are the H and V operator which format boxes horizontally and vertically, respectively.

The exact formatting of each Box operator can be customized using Box options. For example, to control the horizontal layout between boxes the H operator supports the hs space option.

For a detailed description of Box (including a description of all available Box operators) we refer to To Reuse Or To Be Reused (Chapter 4)

9.2. Pretty Print Tables

Pretty-print tables are used to specify how language constructs have to be pretty-printed. Pretty-print tables are used in combination with GPP front-ends, such ast2abox.

Pretty-print tables use Box as language to specify formatting of language constructs. A pretty-print table contains mappings from constructor names to Box expressions. For example, for the SDF production

Exp "+" Exp -> Exp {cons("Plus")}

A pretty-print entry looks like:

Plus -- H hs=1 [ _1 "+" _2]

Pretty-print tables are ordered such that pretty-print rules occuring first take preceedence over overlapping pretty-print rules defined later. The syntax of pretty-print tables is available in GPP.

9.2.1. Pretty-Print Table Generation

Pretty-print tables can be generated from SDF syntax definitions using ppgen. Generated pretty-print rules can easiliy be customized by overruling them in additional pretty-print tables. The tool pptable-diff notifies inconsistensies in pretty-print tables after the syntax definition has changed and can be used to bring inconsistent table up-to-date.

9.2.2. Rule Selectors

To be able to specify formattings for all nested constructs that are allowed in SDF productions, so called selectors are used in pretty-print tables to refer to specific parts of an SDF production and to define a formatting for them. For example, the SDF prodcution

"return" Exp? ";" -> Stm {cons("Return")}

contains a nested symbol A?. To specify a formatting for this production, two pretty-print entries can be used:

Return       -- H [ KW["return"] _1 ";" ]
Return.1:opt -- H [ _1 ]

A selector consists of a constructor name followed by a list of number+type tuples. A number selects a particular subtree of a constructor application, the type denotes the type of the selected construct (sequence, optional, separated list etc.). This rather verbose selector mechanism allows unambiguous selection of subtrees. Its verbosity (by specifying both the number of a subtree and its type), makes pretty-print tables easier to understand and, more importantly, it enables pretty-printing of AST's, because with type information, a concrete term can be correctly recontructed from an AST. Pretty-print tables can thus be used for both formatting of parse-trees and AST's.

Below we summarize which selector types are available:

opt

For SDF optionals S?

iter

For non-empty SDF lists S+

iter-star

For possible empty SDF lists S*

iter-sep

For SDF separator lists {S1 S2}+. Observe that the symbol S1 and the separator S2 are ordinary subtrees of the iter-sep construct which can be refered to as first and second subtree, respectively.

iter-star-sep

For SDF separator lists {S1 S2}*. Its symbol S1 and separator S2 can be refered to as first and second sub tree.

alt

For SDF alternatives S1 | S2 | S3. According to the SDF syntax, alternatives are binary operators. The pretty-printer flattens all subsequent alternatives. Pretty-print rules can be specified for each alternative individually by specifying the number of each alternative. To be able to format literals in alternative, a special formatting rule can be defined for the construct (See the examples below).

seq

For SDF alternatives (S1 S2 S3).

9.2.3. Examples

Below we list a simple SDF module with productions containing all above rule selectors.

module Symbols
exports
context-free syntax
  A?                  -> S {cons("ex1")}
  A+                  -> S {cons("ex2")}
  A*                  -> S {cons("ex3")}
  {S1 S2}+            -> S {cons("ex4")}
  {S1 S2}*            -> S {cons("ex5")}
  "one" | "two" | S?  -> S {cons("ex6")}
  ("one"  "two" S? )  -> S {cons("ex7")}

The following pretty-print table shows which pretty-print rules can be defined for this syntax:

[
  ex1                  -- _1,
  ex1.1:opt            -- _1,
  ex2                  -- _1,
  ex2.1:iter           -- _1,
  ex3                  -- _1,
  ex3.1:iter-star      -- _1,
  ex4                  -- _1,
  ex4.1:iter-sep       -- _1 _2,
  ex5                  -- _1,
  ex5.1:iter-star-sep  -- _1 _2,
  ex6                  -- _1,
  ex6.1:alt            -- KW["one"] KW["two"] _1,
  ex6.1:alt.1:opt      -- _1,
  ex7                  -- _1,
  ex7.1:seq            -- KW["one"] KW["two"] _1,
  ex7.1:seq.1:opt      -- _1
]

The pretty-print rule ex6.1:alt is a special case. It contains three Box expressions, one for each alternative. It is used to specify a formatting for the non-nested literals "one" and "two". During pretty-printing one of the three Box expressions is selected, depending on alternative contained the term to format.

9.3. Restoring Parenthesis

In this section, we explain how you can generate a tool that restores all parentheses at the places where necessary according to the priorities and associativities of a language.

9.4. Pretty Printing using Stratego

In this section, we explain how you can use Box in Stratego to implement a pretty-printer by hand.

9.5. Pretty-Printing (todo: imported)

After transformation, an abstract syntax tree should be turned into text again to be useful as a program. Mapping a tree into text can be seen as the inverse as parsing, and is thus called unparsing. When an unparser makes an attempt at producing human readable, instead of just compiler parsable, program text, an unparser is called a pretty-printer. We use the pretty-printing model as provided by the Generic Pretty-Printing package GPP. In this model a tree is unparsed to a Box expression, which contains text with markup for pretty-printing. A Box expression can be interpreted by different back-ends to produce text for different displaying devices, such as plain ASCII text, HTML, and LaTeX.

9.5.1. Unparsing

Unparsing is the inverse of parsing composed with abstract syntax tree composition. That is, an unparser turns an abstract syntax tree into a string, such that if the resulting string is parsed again, it produces the same abstract syntax tree.

An unparser can be organized in two phases. In the first phase, each node in an abstract syntax tree is replaced with the concrete syntax tree of the corresponding grammar production. In the second phase, the strings at the leaves of the tree are concatenated into a string. Figure 9.1 illustrates this process. The replacement of abstract syntax tree nodes by concrete syntax patterns should be done according to the productions of the syntax definition. An unparsing table is an abstraction of a syntax definition definining the inverse mapping from constructors to concrete syntax patterns. An entry c -- s1 ... sn defines a mapping for constructor c to the sequence s1 ... sn, where each s_i is either a literal string or a parameter _i referring to the ith argument of the constructor. Figure 9.2 shows an unparsing table for some expression and statement constructors. Applying an unparsing mapping to an abstract syntax tree results in a tree structure with strings at the leafs, as illustrated in Figure 9.1.

Figure 9.1.  Unparsing an abstract syntax tree.

Unparsing an abstract syntax tree.
Unparsing an abstract syntax tree.
f(a + 10) - 3

Figure 9.2.  Unparsing table

[
   Var                  -- _1,
   Int                  -- _1,
   Plus                 -- _1 "+" _2,
   Minus                -- _1 "-" _2,
   Assign               -- _1 ":=" _2,
   Seq                  -- "(" _1 ")",
   Seq.1:iter-star-sep  -- _1 ";",
   If                   -- "if" _1 "then" _2 "else" _3,
   Call                 -- _1 "(" _2 ")",
   Call.2:iter-star-sep -- _1 ","
]

9.5.2. Pretty-Printing

Although the unparse of an abstract syntax tree is a text that can be parsed by a compiler, it is not necessarily a readable text. A pretty-printer is an unparser attempting to produce readable program text. A pretty-printer can be obtained by annotating the entries in an unparsing table with markup instructing a typesetting process. Figure 9.3 illustrates this process.

Box is a target independent formatting language, providing combinators for declaring the two-dimensional positioning of boxes of text. Typical combinators are H[b_1...b_n], which combines the b_i boxes horizontally, and V[b_1...b_n], which combines the b_i boxes vertically. Figure 9.4 shows a pretty-print table with Box markup. A more complete overview of the Box language and the GPP tools can be found in Chapter 9.

Figure 9.3.  Pretty-printing with Box markup

Pretty-printing with Box markup
Pretty-printing with Box markup
f(a + 10) - 3

Figure 9.4.  Pretty-print table with Box markup

[
   Var                  -- _1,
   Int                  -- _1,
   Plus                 -- H[_1 "+" _2],
   Minus                -- H[_1 "-" _2],
   Assign               -- H[_1 ":=" _2],
   Seq                  -- H hs=0["(" V[_1] ")"],
   Seq.1:iter-star-sep  -- H hs=0[_1 ";"],
   If                   -- V[V is=2[H["if" _1 "then"] _2] 
                             V is=2["else" _3]],
   Call                 -- H hs=0[_1 "(" H[_2] ")"],
   Call.2:iter-star-sep -- H hs=0[_1 ","]
]

Figure 9.5.  Pretty-printing of if-then-else statement

If(Eq(Var("n"),Int("1")),
   Int("0"),
   Times(Var("n"),
         Call(Var("fac"),[Minus(Var("n"),Int("1"))])))
Pretty-printing of if-then-else statement
Pretty-printing of if-then-else statement
if n = 1 then
  0
else
  n * fac(n - 1)

9.5.3. Disambiguation

Note: correct pretty-printing of an abstract syntax tree requires that it contains nodes representing parentheses in the right places. Otherwise, reparsing a pretty-printed string might get a different interpretation. The sdf2parenthesize tool generates from an SDF definition a Stratego program that places parentheses at the necessary places in the tree.

9.6. Pretty-Printing and Term Visualization Tools (todo: imported)

ppgen -i m.def -o m.pp Ppgen generates from an SDF syntax definition a pretty-print table with an entry for each context-free syntax production with a constructor annotation. Typically it is necessary to edit the pretty-print table to add appropriate Box markup to the entries. The result should be saved under a different name to avoid overwriting it.

ast2abox -p m.pp -i file.ast -o file.abox ast2abox maps an abstract syntax tree file.ast to an abstract syntax representation file.abox of a Box term based on a pretty-print table m.pp.

abox2text -i file.abox -o file.txt abox2text formats a Box term as ASCII text.

pp-aterm -i file1.trm -o file2.trm pp-aterm formats an ATerm as an ATerm in text format, adding newlines and indentation to make the structure of the term understandable. This is a useful tool to inspect terms while debugging transformations.

term-to-dot -i file.trm -o file.dot (--tree | --graph) Term-to-dot is another visualization tool for terms that creates a dot graph representation, which can be visualized using the dot tool from the graphviz graph layout package. Term-to-dot can produce an expanded tree view (--tree), or a directed acyclic graph view (--graph) preserving the maximal sharing in the term. This tool was used to produce the tree visualizations in this chapter. This tool is not part of the Stratego/XT distribution, but included in the Stratego/XT Utilities package.