Chapter 8. Trees, Terms, and Tree Grammars (*)

Table of Contents

8.1. Regular Tree Grammars
8.2. Generating Code from an RTG
8.3. Creating a Subset of a Regular Tree Grammar
8.4. Format Checking
8.4.1. Introduction
8.4.2. Example
8.5. Signatures (todo: imported)
8.6. Signature Tools (todo: imported)

Work in Progress

This chapter is work in progress. Not all parts have been finished yet. The latest revision of this manual may contain more material. Refer to the online version.

8.1. Regular Tree Grammars

Explains regular tree grammars and how to generate them. See thesis.

8.2. Generating Code from an RTG

Typematch, API.

8.3. Creating a Subset of a Regular Tree Grammar

Explains rtg-script

8.4. Format Checking

Format-check analyzes whether the input ATerm conforms to the format that is specified in the RTG (Regular Tree Grammar).

8.4.1. Introduction

Format-check verifies that the input ATerm is part of the language defined in the RTG. If this is not the case, then the ATerm contains format errors. Format-check can operate in three modes: plain, visualize and XHTML.

The plain mode is used if the other modes are not enabled. In the plain mode format errors are reported and no result is written the the output (stdout or a file). Hence, if format-check is included in a pipeline, then the following tool will probably fail. If the input term is correct, then it is written to the output.

The visualize mode is enabled with the --vis option. In visualize mode format errors are reported and in a pretty-printed ATerm will be written to the output. All innermost parts of the ATerm that cause format errors are printed in red, if your terminal supports control characters for colors. If you want to browse through the ATerm with less, then you should use the -r flag.

The XHTML mode is enabled with the --xhtml option. In XHTML mode format errors are reported and an report in XHTML will be written to the output. This report shows the parts of the ATerm that are not formatted correctly. Also, moving with your mouse over the nodes of ATerm, will show the non-terminals that have be inferred by format-check (do not use IE6. Firefox or Mozilla is recommended).

Format-check reports all innermost format errors. That is, only the deepest format errors are reported. A format error is reported by showing the ATerm that is not in the correct format, and the inferred types of the children of the ATerm. In XHTML and visualize mode a format error of term in a list is presented by a red comma and term. This means that a type has been inferred for the term itself, but that it is not expected at this point in the list. If only the term is red, then no type could be inferred for the term itself.

In all modes format-check succeeds (exit code 0) if the ATerm contains no format errors. If the term does contain format errors, then format-check fails (exit code 1).

8.4.2. Example

Consider the RTG generated in the example of sdf2rtg:

regular tree grammar
  start Exp
    Exp      -> Minus(Exp,Exp)
    Exp      -> Plus(Exp,Exp)
    Exp      -> Mod(Exp,Exp)
    Exp      -> Div(Exp,Exp)
    Exp      -> Mul(Exp,Exp)
    Exp      -> Int(IntConst)
    Exp      -> Var(Id)
    IntConst -> <string>
    Id       -> <string>

The ATerm

Plus(Var("a"), Var("b"))

is part of the language defined by this RTG. Therefore, format-check succeeds:

$ format-check --rtg Exp.rtg -i exp1.trm

Note that format-check outputs the input term in this case. In this way format-check can be used in a pipeline of tools. On the other hand, the ATerm

Plus(Var("a"), Var(1))

is not part of the language defined by this RTG. Therefore, the invocation of format-check fails. Format-check also reports which term caused the failure:

$ format-check --rtg Exp.rtg -i exp2.trm
error: cannot type Var(1)
    inferred types of subterms:
    typed 1 as <int>

In large ATerms it might be difficult to find the incorrect subterm. To help with this, format-check supports the --vis argument. If this argument is used, then the result will pretty-printed (in the same way as pp-aterm) and the incorrect parts will be printed in red.

For example, consider this term:

Plus(Mul(Int(1), Var("a")), Minus(Var("b"), Div(1, Var("c"))))

format-check will visualize the errors in this ATerm:

The XHTML mode shows the errors in red as well. Moreover, if you move your moves over the terms, then you'll see the inferred types of the term.

$ format-check --rtg Exp.rtg -i exp3.trm --xhtml -o foo.html

You can now view the resulting file in your browser. You need a decent browser (Firefox or Mozilla. no IE).

8.5. Signatures (todo: imported)

The abstract syntax of a programming language or data format can be described by means of an algebraic signature. A signature declares for each constructor its arity m, the sorts of its arguments S1 * ... * Sm, and the sort of the resulting term S0 by means of a constructor declaration c:S1 * ... * Sm -> S0. A term can be validated against a signature by a format checker.

Signatures can be derived automatically from syntax definitions. For each production A1...An -> A0 {cons(c)} in a syntax definition, the corresponding constructor declaration is c:S1 * ... * Sm -> S0, where the Si are the sorts corresponding to the symbols Aj after leaving out literals and layout sorts. Figure 8.1 shows the signatures of statement and expression constructors for the example language from this chapter. The modules have been derived automatically from the syntax definitions in ??? and Figure 6.1.

Figure 8.1.  Signature for statement and expression constructors, automatically derived from a syntax definition.

module Statements
    FunDec  : Id * List(Id) * Exp -> FunDec
    FunDecs : List(FunDec) -> Dec
    VarDec  : Id * Exp -> Dec
    Call    : Var * List(Exp) -> Exp
    Let     : List(Dec) * List(Exp) -> Exp
    While   : Exp * Exp -> Exp
    If      : Exp * Exp * Exp -> Exp
    Seq     : List(Exp) -> Exp
    Assign  : Var * Exp -> Exp
    Gt      : Exp * Exp -> Exp
    Eq      : Exp * Exp -> Exp
    Minus   : Exp * Exp -> Exp
    Plus    : Exp * Exp -> Exp
    Times   : Exp * Exp -> Exp
    Uminus  : Exp -> Exp
    Int     : IntConst -> Exp
            : Var -> Exp
    Var     : Id -> Var
            : String -> IntConst
            : String -> Id
    Some : a -> Option(a)
    None : Option(a)
    Cons : a * List(a) -> List(a)
    Nil  : List(a)
    Conc : List(a) * List(a) -> List(a)

8.6. Signature Tools (todo: imported)

sdf2rtg -i m.def -o m.rtg -m M sdf2rtg derives from an SDF syntax definition m.def a regular tree grammar for the module M and all modules it imports.

rtg2sig -i m.def -o m.rtg rtg2sig generates from a regular tree grammar a Stratego signature.