This manual is being written for and with Stratego 0.16; You are advised to install the latest milestone for this release. See the download page at stratego-language.org
Program transformation is the mechanical manipulation of a program in order to improve it relative to some cost function C such that C(P) > C(tr(P)), i.e. the cost decreases as a result of applying the transformation. The cost of a program can be measured in different dimensions such as performance, memory usage, understandability, flexibility, maintainability, portability, correctness, or satisfaction of requirements. Related to these goals, program transformations are applied in different settings; e.g. compiler optimizations improve performance and refactoring tools aim at improving understandability. While transformations can be achieved by manual manipulation of programs, in general, the aim of program transformation is to increase programmer productivity by automating programming tasks, thus enabling programming at a higher-level of abstraction, and increasing maintainability and re-usability of programs. Automatic application of program transformations requires their implementation in a programming language. In order to make the implementation of transformations productive such a programming language should support abstractions for the domain of program transformation.
Stratego is a language designed for this purpose. It is a language based on the paradigm of rewriting with programmable rewriting strategies and dynamic rules.
Transformation by Rewriting.
Term rewriting is an attractive formalism for expressing basic
program transformations. A rewrite rule p1 -> p2
expresses that a program fragment matching the left-hand side
pattern p1 can be replaced by the instantiation of the right-hand
side pattern p2. For instance, the rewrite rule
|[ i + j ]| -> |[ k ]| where <add>(i, j) => k
expresses constant folding for addition, i.e. replacing an addition of two constants by their sum. Similarly, the rule
|[ if 0 then e1 else e2 ]| -> |[ e2 ]|
defines unreachable code elimination by reducing a conditional statement to its right branch since the left branch can never be executed. Thus, rewrite rules can directly express laws derived from the semantics of the programming language, making the verification of their correctness straightforward. A correct rule can be safely applied anywhere in a program. A set of rewrite rules can be directly operationalized by rewriting to normal form, i.e. exhaustive application of the rules to a term representing a program. If the rules are confluent and terminating, the order in which they are applied is irrelevant.
Limitations of Pure Rewriting. However, there are two limitations to the application of standard term rewriting techniques to program transformation: the need to intertwine rules and strategies in order to control the application of rewrite rules and the context-free nature of rewrite rules.
Transformation Strategies. Exhaustive application of all rules to the entire abstract syntax tree of a program is not adequate for most transformation problems. The system of rewrite rules expressing basic transformations is often non-confluent and/or non-terminating. An ad hoc solution that is often used is to encode control over the application of rules into the rules themselves by introducing additional function symbols. This intertwining of rules and strategies obscures the underlying program equalities, incurs a programming penalty in the form of rules that define a traversal through the abstract syntax tree, and disables the reuse of rules in different transformations.
Stratego solves the problem of control over the application of rules while maintaining the separation of rules and strategies. A strategy is a little program that makes a selection from the available rules and defines the order and position in the tree for applying the rules. Thus rules remain pure, are not intertwined with the strategy, and can be reused in multiple transformations. schemas.
Context-Senstive Transformation. The second problem of rewriting is the context-free nature of rewrite rules. A rule has access only to the term it is transforming. However, transformation problems are often context-sensitive. For example, when inlining a function at a call site, the call is replaced by the body of the function in which the actual parameters have been substituted for the formal parameters. This requires that the formal parameters and the body of the function are known at the call site, but these are only available higher-up in the syntax tree. There are many similar problems in program transformation, including bound variable renaming, typechecking, data flow transformations such as constant propagation, common-subexpression elimination, and dead code elimination. Although the basic transformations in all these applications can be expressed by means of rewrite rules, these require contextual information.
Outline. The following chapters give a tutorial for the Stratego language in which these ideas are explained and illustrated. The first three chapters outline the basic ideas of Stratego programming in bold strokes. Chapter 10 introduces the term notation used to represent source programs in Stratego. Chapter 11 shows how to set up, compile, and use a Stratego program. Chapter 12 introduces term rewrite rules and term rewriting. Chapter 13 argues the need for control over the application of rewrite rules and introduces strategies for achieving this.
The rest of the chapters in this tutorial explain the language in more detail. Chapter 14 examines the named rewrite rules, defines the notion of a rewriting strategy, and explains how to create reusable strategy expressions using definitions. Chapter 15 introduces combinators for the combination of simple transformations into more complex transformations. Chapter 16 re-examines the notion of a rule, and introduces the notions of building and matching terms, which provide the core to all manipulations of terms. It then goes on to show how these actions can be used to define higher-level constructs for expressing transformations, such as rewrite rules. Chapter 17 introduces the notion of data-type specific and generic traversal combinators. Chapter 18 shows how to generically define program analyses using type-unifying strategies. Chapter 19 shows ho to use the syntax of the source language in the patterns of transformation rules. Finally, Chapter 20 introduces the notion of dynamic rules for expressing context-sensitive transformations.
Table of Contents
Table of Contents
Stratego programs transform terms. When using Stratego for program transformation terms typically represent the abstract syntax tree of a program. But Stratego does not much care what a term represents. Terms can just as well represent structured documents, software models, or anything else that can be rendered in a structured format. The chapters in Part II show how to transform a program text into a term by means of parsing, and to turn a term into program text again by means of pretty-printing. From now on we will just assume that we have terms that should be transformed and ignore parsing and pretty-printing.
Terms in Stratego are terms in the Annotated Term Format, or
ATerms for short. The ATerm format provides a set of constructs
for representing trees, comparable to XML or abstract data types
in functional programming languages.
For example, the code 4 + f(5 * x)
might be
represented in a term as:
Plus(Int("4"), Call("f", [Mul(Int("5"), Var("x"))]))
ATerms are constructed from the following elements:
An integer constant, that is a list of decimal digits, is
an ATerm. Examples: 1
, 12343
.
A string constant, that is a list of characters between
double quotes is an ATerm. Special characters such as
double quotes and newlines should be escaped using a
backslash. The backslash character itself should be
escaped as well.
Examples: "foobar"
, "string with
quotes\""
, "escaped escape character\\ and a
newline\n".
A constructor is an identifier, that is an alphanumeric string starting with a letter, or a double quoted string.
A constructor application c(t1,...,tn)
creates a term by applying a constructor to a list of
zero or more terms.
For example, the term
Plus(Int("4"),Var("x"))
uses the
constructors Plus
, Int
, and
Var
to create a nested term from the strings
"4"
and "x"
.
When a constructor application has no subterms the
parentheses may be omitted. Thus, the term
Zero
is equivalent to
Zero()
. Some people consider it good style
to explicitly write the parentheses for nullary terms in
Stratego programs. Through this rule, it is clear that a
string is really a special case of a constructor
application.
A list is a term of the form [t1,...,tn]
,
that is a list of zero or more terms between square
brackets.
While all applications of a specific constructor
typically have the same number of subterms, lists can
have a variable number of subterms. The elements of a
list are typically of the same type, while the subterms
of a constructor application can vary in type.
Example: The second argument of the call to
"f"
in the term
Call("f",[Int("5"),Var("x")])
is a list of
expressions.
A tuple (t1,...,tn)
is a constructor
application without constructor.
Example: (Var("x"), Type("int"))
The elements defined above are used to create the
structural part of terms. Optionally, a term can be
annotated with a list terms. These annotations typically
carry additional semantic information about the term. An
annotated term has the form t{t1,...,tn}
.
Example: Lt(Var("n"),Int("1")){Type("bool")}
.
The contents of annotations is up to the application.
The term format described above is used in Stratego programs to denote terms, but is also used to exchange terms between programs. Thus, the internal format and the external format exactly coincide. Of course, internally a Stratego program uses a data-structure in memory with pointers rather than manipulating a textual representation of terms. But this is completely hidden from the Stratego programmer. There are a few facts that are useful to be aware of, though.
The internal representation used in Stratego programs maintains maximal sharing of subterms. This means that all occurrences of a subterm are represented by a pointer to the same node in memory. This makes comparing terms in Stratego for syntactic equality a very cheap operation, i.e., just a pointer comparison.
TODO: picture of tree vs dag representation
When writing a term to a file in order to exchange it with another tool there are several representations to choose from. The textual format described above is the canonical `meaning' of terms, but does not preserve maximal sharing. Therefore, there is also a Binary ATerm Format (BAF) that preserves sharing in terms. The program baffle can be used to convert between the textual and binary representations.
As a Stratego programmer you will be looking a lot at raw ATerms. Stratego pioneers would do this by opening an ATerm file in emacs and trying to get a sense of the structure by parenthesis highlighting and inserting newlines here and there. These days your life is much more pleasant through the tool pp-aterm, which adds layout to a term to make it readable. For example, parsing the following program
let function fact(n : int) : int = if n < 1 then 1 else (n * fact(n - 1)) in printint(fact(10)) end
produces the following ATerm (say in file fac.trm):
Let([FunDecs([FunDec("fact",[FArg("n",Tp(Tid("int")))],Tp(Tid("int")), If(Lt(Var("n"),Int("1")),Int("1"),Seq([Times(Var("n"),Call(Var("fact"), [Minus(Var("n"),Int("1"))]))])))])],[Call(Var("printint"),[Call(Var( "fact"),[Int("10")])])])
By pretty-printing the term using pp-aterm
as
$ pp-aterm -i fac.trm -o fac-pp.trm --max-term-size 20
we get a much more readable term:
Let( [ FunDecs( [ FunDec( "fact" , [FArg("n", Tp(Tid("int")))] , Tp(Tid("int")) , If( Lt(Var("n"), Int("1")) , Int("1") , Seq([ Times(Var("n"), Call(Var("fact"), [Minus(Var("n"), Int("1"))])) ]) ) ) ] ) ] , [ Call(Var("printint"), [Call(Var("fact"), [Int("10")])]) ] )
To use terms in Stratego programs, their constructors should be
declared in a signature. A signature declares a number of
sorts
and a number of constructors
for
these sorts. For each constructor, a signature declares the
number and types of its arguments. For example, the following
signature declares some typical constructors for constructing
abstract syntax trees of expressions in a programming language:
signature sorts Id Exp constructors : String -> Id Var : Id -> Exp Int : Int -> Exp Plus : Exp * Exp -> Exp Mul : Exp * Exp -> Exp Call : Id * List(Exp) -> Exp
Currently, the Stratego compiler only checks the arity of constructor applications against the signature. Still, it is considered good style to also declare the types of constructors in a sensible manner for the purpose of documentation. Also, a later version of the language may introduce typechecking.
Table of Contents
Now let's see how we can actually transform terms using Stratego programs. In the rest of this chapter we will first look at the structure of Stratego programs, and how to compile and run them. In the next chapters we will then see how to define transformations.
The simplest program you can write in Stratego is the following
identity.str
program:
module identity imports list-cons strategies main = id
It features the following elements:
Each Stratego file is a module, which has the
same name as the file it is stored in without the .str
extension.
A module may import other modules in order to
use the definitions in those modules.
A module may contain one or more strategies
sections
that introduce new strategy definitions. It will become clear later
what strategies and strategy definitions are.
Each Stratego program has one main definition,
which indicates the strategy to be executed on invocation of the
program.
In the example, the body of this program's main definition is the
identity strategy id
.
Now let's see what this program means. To find that out, we first
need to compile it, which we do using the Stratego compiler
strc
as follows:
$
strc -i identity.str
[ strc | info ] Compiling 'identity.str'
[ strc | info ] Front-end succeeded : [user/system] = [0.59s/0.56s]
[ strc | info ] Back-end succeeded : [user/system] = [0.46s/0.16s]
[ strc | info ] C compilation succeeded : [user/system] = [0.28s/0.23s]
[ strc | info ] Compilation succeeded : [user/system] = [1.35s/0.95s]
The -i
option of strc
indicates the
module to compile. The compiler also reads all imported modules, in
this case the list-cons.str
module that is part of the
Stratego library and that strc
magically knows how to
find. The compiler prints some information about what it is doing,
i.e., the stages of compilation that it goes through and the times
for those stages. You can turn this off using the argument
--verbose 0
. However, since the compiler is not very
fast, it may be satisfying to see something going on.
The result of compilation is an executable named
identity
after the name of the main module of the
program. Just to satisfy our curiosity we inspect the file system
to see what the compiler has done:
$
ls -l identity*
-rwxrwxr-x 1 7182 Sep 7 14:54 identity*
-rw------- 1 1362 Sep 7 14:54 identity.c
-rw-rw-r-- 1 200 Sep 7 14:54 identity.dep
-rw-rw-r-- 1 2472 Sep 7 14:54 identity.o
-rw-rw-r-- 1 57 Sep 7 13:03 identity.str
Here we see that in addition to the executable the compiler has
produced a couple of other files. First of all the
identity.c
file gives away the fact that the compiler
first translates a Stratego program to C and then uses the C
compiler to compile to machine code. The identity.o
file is the result of compiling the generated C program. Finally,
the contents of the identity.dep
file will look
somewhat like this:
identity: \ /usr/local/share/stratego-lib/collection/list/cons.rtree \ /usr/local/share/stratego-lib/list-cons.rtree \ ./identity.str
It is a rule in the Make language that declares the dependencies of
the identity
program. You can include this file in a
Makefile
to automate its compilation. For example, the
following Makefile
automates the compilation of the
identity
program:
include identity.dep identity : identity.str strc -i identity.str
Just invoke make
on the command-line whenever you
change something in the program.
Ok, we were digressing a bit. Let's turn back to finding out what
the identity
program does. When we execute the program
with some arbitrary arguments on the command-line, this is what
happens:
$
./identity foo bar
["./identity","foo","bar"]
The program writes to stdout
the list of command-line
arguments as a list of strings in the ATerm format.
So what we have learned is that a Stratego program applies its
main strategy to the list of command-line arguments, and writes
the resulting term to stdout
. Since the strategy in
the identity
program is the identity transformation
it just writes the original command-line arguments (as a term).
That was instructive, but not very useful. We are not interested
in transforming lists of strings, but rather programs represented
as terms. So we want to read a term from a file, transform it, and
write it to another file. Let's open the bag of tricks. The
identity-io
program improves the previous program:
module identity-io imports libstrategolib strategies main = io-wrap(id)
The program looks similar to the previous one, but there are a
couple of differences. First, instead of importing module
list-cons
, this module imports
libstrategolib
, which is the interface to the
separately compiled Stratego library. This library provides a host
of useful strategies that are needed in implementing program
transformations. Part IV gives an
overview of the Stratego library, and we will every now and then
use some useful strategies from the library before we get there.
Right now we are interested in the io-wrap
strategy
used above. It implements a wrapper strategy that takes care of
input and output for our program. To compile the program we need
to link it with the stratego-lib
library using the
-la
option:
$
strc -i identity-io.str -la stratego-lib
What the relation is between libstrategolib
and
stratego-lib
will become clear later; knowing that it
is needed to compile programs using libstrategolib
suffices
for now.
If we run the compiled identity-io
program with its
--help
option we see the standard interface supported
by the io-wrap
strategy:
$
./identity-io --help
Options:
-i f|--input f Read input from f
-o f|--output f Write output to f
-b Write binary output
-S|--silent Silent execution (same as --verbose 0)
--verbose i Verbosity level i (default 1)
( i as a number or as a verbosity descriptor:
emergency, alert, critical, error,
warning, notice, info, debug, vomit )
-k i | --keep i Keep intermediates (default 0)
--statistics i Print statistics (default 0 = none)
-h|-?|--help Display usage information
--about Display information about this program
--version Same as --about
The most relevant options are the -i
option for the
input file and the -o
option for the output file. For
instance, if we have some file foo-bar.trm
containing
an ATerm we can apply the program to it:
$
echo "Foo(Bar())" > foo-bar.trm$
./identity-io -i foo-bar.trm -o foo-bar2.trm$
cat foo-bar2.trm Foo(Bar)
If we leave out the -i
and/or -o
options,
input is read from stdin
and output is written to
stdout
. Thus, we can also invoke the program in a
pipe:
$
echo "Foo(Bar())" | ./identity-io
Foo(Bar)
Now it might seem that the identity-io
program just
copies its input file to the output file. In fact, the
identity-io
does not just accept any input. If we try
to apply the program to a text file that is not an ATerm, it
protests and fails:
$
echo "+ foo bar" | ./identity-io
readFromTextFile: parse error at line 0, col 0
not a valid term
./identity: rewriting failed
So we have written a program to check if a file represents an ATerm.
A Stratego program based on io-wrap
defines a
transformation from terms to terms. Such transformations can be
combined into more complex transformations, by creating a chain of
tool invocations.
For example, if we have a Stratego program trafo-a
applying some undefined transformation-a
to the input
term of the program
module trafo-a imports libstrategolib strategies main = io-wrap(transformation-a) transformation-a = ...
and we have another similar program trafo-b
applying a
transformation-b
module tool-b imports libstrategolib strategies main = io-wrap(transformation-b) transformation-b = ...
then we can combine the transformations to transform an
input
file to an output
file using a Unix
pipe, as in
$
tool-a -i input | tool-b -o output
or using an intermediate
file:
$
tool-a -i input -o intermediate$
tool-b -i intermediate -o output
We have just learned how to write, compile, and execute Stratego programs. This is the normal mode for development of transformation systems with Stratego. Indeed, we usually do not invoke the compiler from the command-line `by hand', but have an automated build system based on (auto)make to build all programs in a project at once. For learning to use the language this can be rather laborious, however. Therefore, we have also developed the Stratego Shell, an interactive interpreter for the Stratego language. The shell allows you to type in transformation strategies on the command-line and directly seeing their effect on the current term. While this does not scale to developing large programs, it can be instructive to experiment while learning the language. In the following chapters we will use the stratego-shell to illustrate various features of the language.
Here is a short session with the Stratego Shell that shows the basics of using it:
$
stratego-shellstratego>
:show ()stratego>
!Foo(Bar()) Foo(Bar)stratego>
id Foo(Bar)stratego>
fail command failedstratego>
:show Foo(Bar)stratego>
:quit Foo(Bar)$
The shell is invoked by calling the command
stratego-shell
on the regular command-line. The
stratego>
prompt then indicates that you have entered
the Stratego Shell. After the prompt you can enter strategies or
special shell commands.
Strategies are the statements and functions of the Stratego
language. A strategy transforms a term into a new term, or
fails. The term to which a strategy is applied, is called the
current term. In the Stratego Shell you can see
the current term with :show. In the session above
we see that the current term is the empty tuple if you have just
started the Stratego Shell.
At the prompt of the shell you can enter strategies. If the strategy
succeeds, then the shell will show the transformed term, which is
now the new current term.
For example, in the session above the strategy
!Foo(Bar())
replaces the current term with the term
Foo(Bar())
, which is echoed directly after applying the
strategy. The next strategy that is applied is the identity strategy
id
that we saw before. Here it becomes clear that it
just returns the term to which it is applied.
Thus, we have the following general scheme of applying a strategy to
the current term:
current term
stratego>
strategy expression
transformed current
stratego>
Strategies can also fail. For example, the application of the
fail
strategy always fails. In the case of failure, the
shell will print a message and leave the current term untouched:
current term
stratego>
strategy expression
command failedstratego>
:showcurrent term
Finally, you can leave the shell using the :quit command.
The Stratego Shell has a number of non-strategy commands to operate
the shell configuration. Theses commands are recognizable by the
:
prefix. The :help
command tells you
what commands are available in the shell:
$
stratego-shellstratego>
:help Rewriting strategy rewrite the current subject term with strategy Defining Strategies id = strategy define a strategy (doesn't change the current subject term) id : rule define a rule (doesn't change the current subject term) import modname import strategy definitions from 'modname' (file system or xtc) :undef id delete defintions of all strategies 'id'/(s,t) :undef id(s,t) delete defintion of strategy 'id'/(s,t) :reset delete all term bindings, all strategies, reset syntax. Debugging :show show the current subject term :autoshow on|off show the current subject term after each rewrite :binding id show term binding of id :bindings show all term bindings :showdef id show defintions of all strategies 'id'/(s,t) :showdef id(s,t) show defintion of strategy 'id'/(s,t) :showast id(s,t) show ast of defintion of strategy 'id'/(s,t) Concrete Syntax :syntax defname set the syntax to the sdf definition in 'defname'. XTC :xtc import pathname Misc :include file execute command in the script of `file` :verbose int set the verbosity level (0-9) :clear clear the screen :exit exit the Stratego Shell :quit same as :exit :q same as :exit :about information about the Stratego Shell :help show this help information stratego>
Let's summarize what we have learned so far about Stratego programming.
First, a Stratego program is divided into
modules, which reside in files with extension
.str
and have the following general form:
module mod0 imports libstrategolib mod1 mod2 signature sorts A B C constructors Foo : A -> B Bar : A strategies main = io-wrap(foo) foo = id
Modules can import other modules and can define signatures for
declaring the structure of terms and can define strategies, which we
not really know much about yet.
However, the io-wrap
strategy can be used to handle the
input and output of a Stratego program. This strategy is defined in
the libstrategolib
module, which provides an interface to the
Stratego Library.
The main module of a Stratego program should have a
main
strategy that defines the entry point of the
program.
Next, a Stratego program is compiled to an executable program using
the Stratego Compiler strc
.
$
strc -i mod0 -la stratego-lib
The resulting executable applies the main
strategy to
command-line arguments turned into a list-of-strings term.
The io-wrap
strategy interprets these command-line
arguments to handle input and output using standard command-line
options.
Finally, the Stratego Shell can be used to invoke strategies interactively.
$
stratego-shellstratego>
id ()stratego>
Next up: transforming terms with rewrite rules.
Table of Contents
In Part II we saw how terms provide a structured representation for programs derived from a formal definition of the syntax of a programming language. Transforming programs then requires tranformation of terms. In this chapter we show how to implement term transformations using term rewriting in Stratego. In term rewriting a term is transformed by repeated application of rewrite rules.
To see how this works we take as example the language of propositional formulae, also known as Boolean expressions:
module prop signature sorts Prop constructors False : Prop True : Prop Atom : String -> Prop Not : Prop -> Prop And : Prop * Prop -> Prop Or : Prop * Prop -> Prop Impl : Prop * Prop -> Prop Eq : Prop * Prop -> Prop
Given this signature we can write terms such as
And(Impl(True,False),False)
, and
And(Atom("p"),False))
. Atoms are also known as
proposition letters; they are the variables in propositional
formulae. That is, the truth value of an atom should be provided in
order to fully evaluate an expression. Here we will evaluate
expressions as far as possible, a transformation also known as
constant folding. We will do this using
rewrite rules that define how to simplify a
single operator application.
A term pattern is a term with meta
variables, which are identifiers that are not declared
as (nullary) constructors. For example, And(x, True)
is a term pattern with variable x
. Variables in term
patterns are sometimes called meta variables,
to distinguish them from variables in the source language being
processed. For example, while atoms in the proposition expressions
are variables from the point of view of the language, they are not
variables from the perspective of a Stratego program.
A term pattern p
matches with a
term t
, if there is a
substitution that replaces the variables in
p
such that it becomes equal to t
. For
example, the pattern And(x, True)
matches the term
And(Impl(True,Atom("p")),True)
because replacing the
variable x
in the pattern by
Impl(True,Atom("p"))
makes the pattern equal to the
term. Note that And(Atom("x"),True)
does
not match the term
And(Impl(True,Atom("p")),True)
, since the subterms
Atom("x")
and Impl(True,Atom("p"))
do not
match.
An unconditional rewrite rule has the form
L : p1 -> p2
, where L
is the name of the
rule, p1
is the left-hand side and p2
the
right-hand side term pattern.
A rewrite rule L : p1 -> p2
applies to a term
t
when the pattern p1
matches
t
. The result is the instantiation of p2
with the variable bindings found during matching.
For example, the rewrite rule
E : Eq(x, False) -> Not(x)
rewrites the term Eq(Atom("q"),False)
to
Not(Atom("q"))
, since the variable x
is
bound to the subterm Atom("q")
.
Now we can create similar evaluation rules for all constructors of
sort Prop
:
module prop-eval-rules imports prop rules E : Not(True) -> False E : Not(False) -> True E : And(True, x) -> x E : And(x, True) -> x E : And(False, x) -> False E : And(x, False) -> False E : Or(True, x) -> True E : Or(x, True) -> True E : Or(False, x) -> x E : Or(x, False) -> x E : Impl(True, x) -> x E : Impl(x, True) -> True E : Impl(False, x) -> True E : Eq(False, x) -> Not(x) E : Eq(x, False) -> Not(x) E : Eq(True, x) -> x E : Eq(x, True) -> x
Note that all rules have the same name, which is allowed in Stratego.
Next we want to normalize terms with respect to a collection of rewrite rules. This entails applying all rules to all subterms until no more rules can be applied. The following module defines a rewrite system based on the rules for propositions above:
module prop-eval imports libstrategolib prop-eval-rules strategies main = io-wrap(eval) eval = innermost(E)
The module imports the Stratego Library
(libstrategolib
) and the module with the evaluation
rules, and then defines the main
strategy to apply
innermost(E)
to the input term. (See the discussion of
io-wrap
in Section 11.2.)
The innermost
strategy from the library exhaustively
applies its argument transformation to the term it is applied to,
starting with `inner' subterms.
We can now compile the program as discussed in Chapter 11:
$
strc -i prop-eval.str -la stratego-lib
This results in an executable prop-eval
that can be
used to evaluate Boolean expressions. For example, here are some
applications of the program:
$
cat test1.prop And(Impl(True,And(False,True)),True)$
./prop-eval -i test1.prop False$
cat test2.prop And(Impl(True,And(Atom("p"),Atom("q"))),ATom("p"))$
./prop-eval -i test2.prop And(And(Atom("p"),Atom("q")),ATom("p"))
We can also import these definitions in the Stratego Shell, as illustrated by the following session:
$
stratego-shellstratego>
import prop-evalstratego>
!And(Impl(True(),And(False(),True())),True()) And(Impl(True,And(False,True)),True)stratego>
eval Falsestratego>
!And(Impl(True(),And(Atom("p"),Atom("q"))),ATom("p")) And(Impl(True,And(Atom("p"),Atom("q"))),ATom("p"))stratego>
eval And(And(Atom("p"),Atom("q")),ATom("p"))stratego>
:quit And(And(Atom("p"),Atom("q")),ATom("p"))$
The first command imports the prop-eval
module, which
recursively loads the evaluation rules and the library, thus making
its definitions available in the shell. The !
commands
replace the current term with a new term. (This
build strategy will be properly introduced in
Chapter 16.)
The next commands apply the eval
strategy to various
terms.
Next we extend the rewrite rules above to rewrite a Boolean expression to disjunctive normal form. A Boolean expression is in disjunctive normal form if it conforms to the following signature:
signature sorts Or And NAtom Atom constructors Or : Or * Or -> Or : And -> Or And : And * And -> And : NAtom -> And Not : Atom -> NAtom : Atom -> NAtom Atom : String -> Atom
We use this signature only to describe what a disjunctive normal
form is, not in an the actual Stratego program. This is not
necessary, since terms conforming to the DNF signature are also
Prop
terms as defined before.
For example, the disjunctive normal form of
And(Impl(Atom("r"),And(Atom("p"),Atom("q"))),ATom("p"))
is
Or(And(Not(Atom("r")),ATom("p")), And(And(Atom("p"),Atom("q")),ATom("p")))
Module prop-dnf-rules
extends the rules defined in
prop-eval-rules
with rules to achieve disjunctive
normal forms:
module prop-dnf-rules imports prop-eval-rules rules E : Impl(x, y) -> Or(Not(x), y) E : Eq(x, y) -> And(Impl(x, y), Impl(y, x)) E : Not(Not(x)) -> x E : Not(And(x, y)) -> Or(Not(x), Not(y)) E : Not(Or(x, y)) -> And(Not(x), Not(y)) E : And(Or(x, y), z) -> Or(And(x, z), And(y, z)) E : And(z, Or(x, y)) -> Or(And(z, x), And(z, y))
The first two rules rewrite implication (Impl
) and
equivalence (Eq
) to combinations of And
,
Or
, and Not
.
The third rule removes double negation.
The fifth and sixth rules implement the well known DeMorgan laws.
The last two rules define distribution of conjunction over
disjunction.
We turn this set of rewrite rules into a compilable Stratego program in the same way as before:
module prop-dnf imports libstrategolib prop-dnf-rules strategies main = io-wrap(dnf) dnf = innermost(E)
compile it in the usual way
$
strc -i prop-dnf.str -la stratego-lib
so that we can use it to transform terms:
$
cat test3.prop And(Impl(Atom("r"),And(Atom("p"),Atom("q"))),ATom("p"))$
./prop-dnf -i test3.prop Or(And(Not(Atom("r")),ATom("p")),And(And(Atom("p"),Atom("q")),ATom("p")))
We have seen how to define simple transformations on terms using
unconditional term rewrite rules. Using the innermost
strategy, rules are applied exhaustively to all subterms of the
subject term.
The implementation of a rewrite system in Stratego has the following
form:
module mod imports libstrategolib signature sorts A B C constructors Foo : A * B -> C rules R : p1 -> p2 R : p3 -> p4 strategies main = io-wrap(rewr) rewr = innermost(R)
The ingredients of such a program can be divided over several modules. Thus, a set of rules can be used in multiple rewrite systems.
Compiling the module by means of the command
$
strc -i mod.str -la stratego-lib
produces an executable mod
that can be used to
transform terms.
Table of Contents
In Chapter 12 we saw how term
rewriting can be used to implement transformations on programs
represented by means of terms.
Term rewriting involves exhaustively applying rules to subterms
until no more rules apply.
This requires a strategy for selecting the
order in which subterms are rewritten.
The innermost
strategy introduced in Chapter 12 applies rules automatically throughout a
term from inner to outer terms, starting with the leaves.
The nice thing about term rewriting is that there is no need to
define traversals over the syntax tree; the rules express basic
transformation steps and the strategy takes care of applying it
everywhere.
However, the complete normalization approach of rewriting turns out
not to be adequate for program transformation, because rewrite
systems for programming languages will often be non-terminating
and/or non-confluent.
In general, it is not desirable to apply all rules at the same time
or to apply all rules under all circumstances.
Consider for example, the following extension of
prop-dnf-rules
with distribution rules to achieve
conjunctive normal forms:
module prop-cnf imports prop-dnf-rules rules E : Or(And(x, y), z) -> And(Or(x, z), Or(y, z)) E : Or(z, And(x, y)) -> And(Or(z, x), Or(z, y)) strategies main = io-wrap(cnf) cnf = innermost(E)
This rewrite system is non-terminating because after applying one of the and-over-or distribution rules, the or-over-and distribution rules introduced here can be applied, and vice versa.
And(Or(Atom("p"),Atom("q")), Atom("r")) -> Or(And(Atom("p"), Atom("r")), And(Atom("q"), Atom("r"))) -> And(Or(Atom("p"), And(Atom("q"), Atom("r"))), Or(Atom("r"), And(Atom("q"), Atom("r")))) -> ...
There are a number of solutions to this problem. We'll first discuss a couple of solutions within pure rewriting, and then show how programmable rewriting strategies can overcome the problems of these solutions.
The non-termination of prop-cnf
is due to the fact that
the and-over-or and or-over-and distribution rules interfere with
each other.
This can be prevented by refactoring the module structure such that
the two sets of rules are not present in the same rewrite system.
For example, we could split module prop-dnf-rules
into
prop-simplify
and prop-dnf2
as follows:
module prop-simplify imports prop-eval-rules rules E : Impl(x, y) -> Or(Not(x), y) E : Eq(x, y) -> And(Impl(x, y), Impl(y, x)) E : Not(Not(x)) -> x E : Not(And(x, y)) -> Or(Not(x), Not(y)) E : Not(Or(x, y)) -> And(Not(x), Not(y))
module prop-dnf2 imports prop-simplify rules E : And(Or(x, y), z) -> Or(And(x, z), And(y, z)) E : And(z, Or(x, y)) -> Or(And(z, x), And(z, y)) strategies main = io-wrap(dnf) dnf = innermost(E)
Now we can reuse the rules from prop-simplify
without
the and-over-or distribution rules to create a
prop-cnf2
for normalizing to conjunctive normal form:
module prop-cnf2 imports prop-simplify rules E : Or(And(x, y), z) -> And(Or(x, z), Or(y, z)) E : Or(z, And(x, y)) -> And(Or(z, x), Or(z, y)) strategies main = io-wrap(cnf) cnf = innermost(E)
Although this solves the non-termination problem, it is not an ideal solution. In the first place it is not possible to apply the two transformations in the same program. In the second place, extrapolating the approach to fine-grained selection of rules might require definition of a single rule per module.
Another common solution to this kind of problem is to introduce
additional constructors that achieve normalization under a
restricted set of rules. That is, the original set of rules p1
-> p2
is transformed into rules of the form f(p_1) ->
p_2'
, where f
is some new constructor symbol and
the right-hand side of the rule also contains such new
constructors. In this style of programming, constructors such as
f
are called functions and are
distinghuished from constructors. Normal forms over such rewrite
systems are assumed to be free of these `function' symbols;
otherwise the function would have an incomplete definition.
To illustrate the approach we adapt the DNF rules by introducing the
function symbols Dnf
and DnfR
. (We ignore
the evaluation rules in this example.)
module prop-dnf3 imports libstrategolib prop signature constructors Dnf : Prop -> Prop DnfR : Prop -> Prop rules E : Dnf(Atom(x)) -> Atom(x) E : Dnf(Not(x)) -> DnfR(Not(Dnf(x))) E : Dnf(And(x, y)) -> DnfR(And(Dnf(x), Dnf(y))) E : Dnf(Or(x, y)) -> Or(Dnf(x), Dnf(y)) E : Dnf(Impl(x, y)) -> Dnf(Or(Not(x), y)) E : Dnf(Eq(x, y)) -> Dnf(And(Impl(x, y), Impl(y, x))) E : DnfR(Not(Not(x))) -> x E : DnfR(Not(And(x, y))) -> Or(Dnf(Not(x)), Dnf(Not(y))) E : DnfR(Not(Or(x, y))) -> Dnf(And(Not(x), Not(y))) D : DnfR(Not(x)) -> Not(x) E : DnfR(And(Or(x, y), z)) -> Or(Dnf(And(x, z)), Dnf(And(y, z))) E : DnfR(And(z, Or(x, y))) -> Or(Dnf(And(z, x)), Dnf(And(z, y))) D : DnfR(And(x, y)) -> And(x, y) strategies main = io-wrap(dnf) dnf = innermost(E <+ D)
The Dnf
function mimics the innermost normalization
strategy by recursively traversing terms.
The auxiliary transformation function DnfR
is used to
encode the distribution and negation rules.
The D
rules are default rules that
are only applied if none of the E
rules apply, as
specified by the strategy expression E <+ D
.
In order to compute the disjunctive normal form of a term, we have
to `apply' the Dnf
function to it, as illustrated in
the following application of the prop-dnf3
program:
$
cat test1.dnf Dnf(And(Impl(Atom("r"),And(Atom("p"),Atom("q"))),ATom("p")))$
./prop-dnf3 -i test1.dnf Or(And(Not(Atom("r")),Dnf(Dnf(ATom("p")))), And(And(Atom("p"),Atom("q")),Dnf(Dnf(ATom("p")))))
For conjunctive normal form we can create a similar definition, which can now co-exist with the definition of DNF. Indeed, we could then simultaneously rewrite one subterm to DNF and the other to CNF.
E : DC(x) -> (Dnf(x), Cnf(x))
In the solution above, the original rules have been completely
intertwined with the Dnf
transformation.
The rules for negation cannot be reused in the definition of
normalization to conjunctive normal form.
For each new transformation a new traversal function and new
transformation functions have to be defined.
Many additional rules had to be added to traverse the term to find
the places to apply the rules.
In the modular solution we had 5 basic rules and 2 additional rules
for DNF and 2 rules for CNF, 9 in total. In the functionalized
version we needed 13 rules for each
transformation, that is 26 rules in total.
In general, there are two problems with the functional approach to encoding the control over the application of rewrite rules, when comparing it to the original term rewriting approach: traversal overhead and loss of separation of rules and strategies.
In the first place, the functional encoding incurs a large overhead due to the explicit specification of traversal. In pure term rewriting, the strategy takes care of traversing the term in search of subterms to rewrite. In the functional approach traversal is spelled out in the definition of the function, requiring the specification of many additional rules. A traversal rule needs to be defined for each constructor in the signature and for each transformation. The overhead for transformation systems for real languages can be inferred from the number of constructors for some typical languages:
language : constructors Tiger : 65 C : 140 Java : 140 COBOL : 300 - 1200
In the second place, rewrite rules and the strategy that defines their application are completely intertwined. Another advantage of pure term rewriting is the separation of the specification of the rules and the strategy that controls their application. Intertwining these specifications makes it more difficult to understand the specification, since rules cannot be distinghuished from the transformation they are part of. Furthermore, intertwining makes it impossible to reuse the rules in a different transformation.
Stratego introduced the paradigm of programmable rewriting strategies with generic traversals, a unifying solution in which application of rules can be carefully controlled, while incurring minimal traversal overhead and preserving separation of rules and strategies.
The following are the design criteria for strategies in Stratego:
Basic transformation rules can be defined separately from the strategy that applies them, such that they can be understood independently.
A transformation can select the necessary set of rules from a collection (library) of rules.
A transformation can exercise complete control over the application of rules. This control may be fine-grained or course-grained depending on the application.
Transformations can be defined without overhead for the definition of traversals.
Rules can be reused in different transformations.
Traversal schemas can be defined generically and reused in different transformations.
In the next chapters we will examine the language constructs that Stratego provides for programming with strategies, starting with the low-level actions of building and matching terms. To get a feeling for the purpose of these constructs, we first look at a couple of typical idioms of strategic rewriting.
The basic idiom of program transformation achieved with term rewriting is that of cascading transformations. Instead of applying a single complex transformation algorithm to a program, a number of small, independent transformations are applied in combination throughout a program or program unit to achieve the desired effect. Although each individual transformation step achieves little, the cumulative effect can be significant, since each transformation feeds on the results of the ones that came before it.
One common cascading of transformations is accomplished by
exhaustively applying rewrite rules to a subject term. In Stratego
the definition of a cascading normalization strategy with respect to
rules R1
, ... ,Rn
can be formalized using
the innermost
strategy that we saw before:
simplify = innermost(R1 <+ ... <+ Rn)
The argument strategy of innermost
is a
selection of rules. By giving
different names to rules, we can control the
selection used in each transformation. There can be multiple
applications of innermost
to different sets of rules,
such that different transformations can co-exist in the same module
without interference. Thus, it is now possible to develop a large
library of transformation rules that can be called upon when
necessary, without having to compose a rewrite system by cutting and
pasting.
For example, the following module defines the normalization of
proposition formulae to both disjunctive and to conjunctive normal
form:
module prop-laws imports libstrategolib prop rules DefI : Impl(x, y) -> Or(Not(x), y) DefE : Eq(x, y) -> And(Impl(x, y), Impl(y, x)) DN : Not(Not(x)) -> x DMA : Not(And(x, y)) -> Or(Not(x), Not(y)) DMO : Not(Or(x, y)) -> And(Not(x), Not(y)) DAOL : And(Or(x, y), z) -> Or(And(x, z), And(y, z)) DAOR : And(z, Or(x, y)) -> Or(And(z, x), And(z, y)) DOAL : Or(And(x, y), z) -> And(Or(x, z), Or(y, z)) DOAR : Or(z, And(x, y)) -> And(Or(z, x), Or(z, y)) strategies dnf = innermost(DefI <+ DefE <+ DAOL <+ DAOR <+ DN <+ DMA <+ DMO) cnf = innermost(DefI <+ DefE <+ DOAL <+ DOAR <+ DN <+ DMA <+ DMO) main-dnf = io-wrap(dnf) main-cnf = io-wrap(cnf)
The rules are named, and for each strategy different selections from the ruleset are made.
The module even defines two main strategies, which allows us to use
one module for deriving multiple programs. Using the
--main
option of strc we declare
which strategy to invoke as main strategy in a particular
program. Using the -o
option we can give a different
name to each derived program.
$
strc -i prop-laws.str -la stratego-lib --main main-dnf -o prop-dnf4
Cascading transformations can be defined with other strategies as
well, and these strategies need not be exhaustive, but can be
simpler one-pass traversals.
For example, constant folding of Boolean expressions only requires a
simple one-pass bottom-up traversal. This can be achieved using the
bottomup
strategy according the the following scheme:
simplify = bottomup(repeat(R1 <+ ... <+ Rn))
The bottomup
strategy applies its argument strategy to
each subterm in a bottom-to-top traversal. The repeat
strategy applies its argument strategy repeatedly to a term.
Module prop-eval2
defines the evaluation rules for
Boolean expressions and a strategy for applying them using this
approach:
module prop-eval2 imports libstrategolib prop rules Eval : Not(True) -> False Eval : Not(False) -> True Eval : And(True, x) -> x Eval : And(x, True) -> x Eval : And(False, x) -> False Eval : And(x, False) -> False Eval : Or(True, x) -> True Eval : Or(x, True) -> True Eval : Or(False, x) -> x Eval : Or(x, False) -> x Eval : Impl(True, x) -> x Eval : Impl(x, True) -> True Eval : Impl(False, x) -> True Eval : Eq(False, x) -> Not(x) Eval : Eq(x, False) -> Not(x) Eval : Eq(True, x) -> x Eval : Eq(x, True) -> x strategies main = io-wrap(eval) eval = bottomup(repeat(Eval))
The strategy eval
applies these rules in a bottom-up
traversal over a term, using the bottomup(s)
strategy. At each sub-term, the rules are applied repeatedly until
no more rule applies using the repeat(s)
strategy. This
is sufficient for the Eval
rules, since the rules never
construct a term with subterms that can be rewritten.
Another typical example of the use of one-pass traversals is desugaring, that is rewriting language constructs to more basic language constructs. Simple desugarings can usually be expressed using a single top-to-bottom traversal according to the scheme
simplify = topdown(try(R1 <+ ... <+ Rn))
The topdown
strategy applies its argument strategy to
a term and then traverses the resulting term.
The try
strategy tries to apply its argument strategy
once to a term.
Module prop-desugar
defines a number of desugaring
rules for Boolean expressions, defining propositional operators in
terms of others. For example, rule DefN
defines
Not
in terms of Impl
, and rule
DefI
defines Impl
in terms of
Or
and Not
. So not all rules should be
applied in the same transformation or non-termination would result.
module prop-desugar imports prop libstrategolib rules DefN : Not(x) -> Impl(x, False) DefI : Impl(x, y) -> Or(Not(x), y) DefE : Eq(x, y) -> And(Impl(x, y), Impl(y, x)) DefO1 : Or(x, y) -> Impl(Not(x), y) DefO2 : Or(x, y) -> Not(And(Not(x), Not(y))) DefA1 : And(x, y) -> Not(Or(Not(x), Not(y))) DefA2 : And(x, y) -> Not(Impl(x, Not(y))) IDefI : Or(Not(x), y) -> Impl(x, y) IDefE : And(Impl(x, y), Impl(y, x)) -> Eq(x, y) strategies desugar = topdown(try(DefI <+ DefE)) impl-nf = topdown(repeat(DefN <+ DefA2 <+ DefO1 <+ DefE)) main-desugar = io-wrap(desugar) main-inf = io-wrap(impl-nf)
The strategies desugar
and impl-nf
define
two different desugaring transformation based on these rules.
The desugar
strategy gets rid of the implication and
equivalence operators, while the impl-nf
strategy
reduces an expression to implicative normal-form, a format in which
only implication (Impl
) and
False
are used.
A final example of a one-pass traversal is the downup
strategy, which applies its argument transformation during a
traversal on the way down, and again on the way up:
simplify = downup(repeat(R1 <+ ... <+ Rn))
An application of this strategy is a more efficient implementation of constant folding for Boolean expressions:
eval = downup(repeat(Eval))
This strategy reduces terms such as
And(... big expression ..., False)
in one step (to False
in this case), while the
bottomup
strategy defined above would first evaluate
the big expression.
Cascading transformations apply a number of rules one after another to an entire tree. But in some cases this is not appropriate. For instance, two transformations may be inverses of one another, so that repeatedly applying one and then the other would lead to non-termination. To remedy this difficulty, Stratego supports the idiom of staged transformation.
In staged computation, transformations are not applied to a subject term all at once, but rather in stages. In each stage, only rules from some particular subset of the entire set of available rules are applied. In the TAMPR program transformation system this idiom is called sequence of normal forms, since a program tree is transformed in a sequence of steps, each of which performs a normalization with respect to a specified set of rules. In Stratego this idiom can be expressed directly according to the following scheme:
strategies simplify = innermost(A1 <+ ... <+ Ak) ; innermost(B1 <+ ... <+ Bl) ; ... ; innermost(C1 <+ ... <+ Cm)
In conventional program optimization, transformations are applied throughout a program. In optimizing imperative programs, for example, complex transformations are applied to entire programs. In GHC-style compilation-by-transformation, small transformation steps are applied throughout programs. Another style of transformation is a mixture of these ideas. Instead of applying a complex transformation algorithm to a program we use staged, cascading transformations to accumulate small transformation steps for large effect. However, instead of applying transformations throughout the subject program, we often wish to apply them locally, i.e., only to selected parts of the subject program. This allows us to use transformations rules that would not be beneficial if applied everywhere.
One example of a strategy which achieves such a transformation is
strategies transformation = alltd( trigger-transformation ; innermost(A1 <+ ... <+ An) )
The strategy alltd(s)
descends into a term until a
subterm is encountered for which the transformation s
succeeds. In this case the strategy
trigger-transformation
recognizes a program fragment
that should be transformed. Thus, cascading transformations are
applied locally to terms for which the transformation is
triggered. Of course more sophisticated strategies can be used for
finding application locations, as well as for applying the rules
locally. Nevertheless, the key observation underlying this idiom
remains: Because the transformations to be applied are local,
special knowledge about the subject program at the point of
application can be used. This allows the application of rules that
would not be otherwise applicable.
While term rewrite rules can express individual transformation steps, the exhaustive applications of all rules to all subterms is not always desirable. The selection of rules to apply through the module system does not allow transformations to co-exist and may require very small-grained modules. The `functionalization' of rewrite rules leads to overhead in the form of traversal definitions and to the loss of separation between rules and strategy. The paradigm of rewriting with programmable strategies allows the separate definition of individual rewrite rules, which can be (re)used in different combinations with a choice of strategies to form a variety of transformations. Idioms such as cascading transformations, one-pass traversals, staged, and local transformations cater for different styles of applying strategies.
Next up: The next chapters give an in depth overview of the constructs for composing strategies.
Table of Contents
In the previous chapter we saw that pure term rewriting is not adequate for program transformation because of the lack of control over the application of rules. Attempts to encoding such control within the pure rewriting paradigm lead to functionalized control by means of extra rules and constructors at the expense of traversal overhead and at the loss of the separation of rules and strategies. By selecting the appropriate rules and strategy for a transformation, Stratego programmers can control the application of rules, while maintaining the separation of rules and strategies and keeping traversal overhead to a minimum.
We saw that many transformation problems can be solved by alternative strategies such as a one-pass bottom-up or top-down traversals. Others can be solved by selecting the rules that are applied in an innermost normalization, rather than all the rules in a specification. However, no fixed set of such alternative strategies will be sufficient for dealing with all transformation problems. Rather than providing one or a few fixed collection of rewriting strategies, Stratego supports the composition of strategies from basic building blocks with a few fundamental operators.
While we have seen rules and strategies in the previous chapters, we have been vague about what kinds of things they are. In this chapter we define the basic notions of rules and strategies, and we will see how new strategies and strategy combinators can be defined. The next chapters will then introduce the basic combinators used for composition of strategies.
A named rewrite rule is a declaration of the form
L : p1 -> p2
where L
is the rule name, p1
the left-hand
side term pattern, and p2
the right-hand side term
pattern.
A rule defines a transformation on terms.
A rule can be applied through its name to a
term.
It will transform the term if it matches with p1
, and
will replace the term with p2
instantiated with the
variables bound during the match to p1
.
The application fails if the term does not
match p1
.
Thus, a transformation is a partial
function from terms to terms
Let's look at an example. The SwapArgs
rule swaps the
subterms of the Plus
constructor. Note that it is
possible to introduce rules on the fly in the Stratego Shell.
stratego>
SwapArgs : Plus(e1,e2) -> Plus(e2,e1)
Now we create a new term, and apply the SwapArgs
rule
to it by calling its name at the prompt. (The build !t
of a term replaces the current term by t
, as will be
explained in Chapter 16.)
stratego>
!Plus(Var("a"),Int("3")) Plus(Var("a"),Int("3"))stratego>
SwapArgs Plus(Int("3"),Var("a"))
The application of SwapArgs
fails when applied to a
term to which the left-hand side does not match. For example, since
the pattern Plus(e1,e2)
does not match with a term
constructed with Times
the following application fails:
stratego>
!Times(Int("4"),Var("x")) Times(Int("4"),Var("x"))stratego>
SwapArgs command failed
A rule is applied at the root of a term, not at
one of its subterms. Thus, the following application fails even
though the term contains a Plus
subterm:
stratego>
!Times(Plus(Var("a"),Int("3")),Var("x")) Times(Plus(Var("a"),Int("3")),Var("x"))stratego>
SwapArgs command failed
Likewise, the following application only transforms the outermost
occurrence of Plus
, not the inner occurrence:
stratego>
!Plus(Var("a"),Plus(Var("x"),Int("42"))) Plus(Var("a"),Plus(Var("x"),Int("42")))stratego>
SwapArgs Plus(Plus(Var("x"),Int("42")),Var("a"))
Finally, there may be multiple rules with the same name. This has
the effect that all rules with that name will be tried in turn until
one succeeds, or all fail. The rules are tried in some undefined
order. This means that it only makes sense to define rules with the
same name if they are mutually exclusive, that is, do not have
overlapping left-hand sides. For example, we can extend the
definition of SwapArgs
with a rule for the
Times
constructor, as follows:
stratego>
SwapArgs : Times(e1, e2) -> Times(e2, e1)
Now the rule can be applied to terms with a Plus
and a Times
constructor, as
illustrated by the following applications:
stratego>
!Times(Int("4"),Var("x")) Times(Int("4"),Var("x"))stratego>
SwapArgs Times(Var("x"),Int("4"))stratego>
!Plus(Var("a"),Int("3")) Plus(Var("a"),Int("3"))stratego>
SwapArgs Plus(Int("3"),Var("a"))
Later we will see that a rule is nothing more than a syntactical convention for a strategy definition.
A rule defines a transformation, that is, a partial function from terms to terms. A strategy expression is a combination of one or more transformations into a new transformation. So, a strategy expression also defines a transformation, i.e., a partial function from terms to terms. Strategy operators are functions from transformations to transformations.
In the previous
chapter we saw some examples of strategy expressions.
Lets examine these examples in the light of our new definition.
First of all, rule names are basic strategy
expressions. If we import module prop-laws
, we have at
our disposal all rules it defines as basic strategies:
stratego>
import prop-lawsstratego>
!Impl(True(), Atom("p")) Impl(True, Atom("p"))stratego>
DefI Or(Not(True),Atom("p"))
Next, given a collection of rules we can create more complex
transformations by means of strategy operators. For example, the
innermost
strategy creates from a collection of rules a
new transformation that exhaustively applies those rules.
stratego>
!Eq(Atom("p"), Atom("q")) Eq(Atom("p"),Atom("q"))stratego>
innermost(DefI <+ DefE <+ DAOL <+ DAOR <+ DN <+ DMA <+ DMO) Or(Or(And(Not(Atom("p")),Not(Atom("q"))), And(Not(Atom("p")),Atom("p"))), Or(And(Atom("q"),Not(Atom("q"))), And(Atom("q"),Atom("p"))))
(Exercise: add rules to this composition that remove tautologies or
false propositions.)
Here we see that the rules are first combined using the choice
operator <+
into a composite transformation, which
is the argument of the innermost
strategy.
The innermost
strategy always succeeds (but may not
terminate), but this is not the case for all strategies.
For example bottomup(DefI)
will not succeed, since it
attempts to apply rule DefI
to all subterms, which is
clearly not possible.
Thus, strategies extend the property of rules that they are
partial functions from terms to terms.
Observe that in the composition innermost(...)
, the
term to which the transformation is applied is never mentioned.
The `current term', to which a transformation is applied is
often implicit in the definition of a strategy.
That is, there is no variable that is bound to the current term and
then passed to an argument strategy.
Thus, a strategy operator such as innermost
is a
function from transformations to transformations.
While strategies are functions, they are not necessarily
pure functions. Strategies in Stratego may
have side effects such as performing input/output operations.
This is of course necessary in the implementation of basic tool
interaction such as provided by io-wrap
, but is also
useful for debugging. For example, the debug
strategy
prints the current term, but does not transform it. We can use it to
visualize the way that innermost
transforms a term.
stratego>
!Not(Impl(Atom("p"), Atom("q"))) Not(Impl(Atom("p"),Atom("q")))stratego>
innermost(debug(!"in: "); (DefI <+ DefE <+ DAOL <+ DAOR <+ DN <+ DMA <+ DMO); debug(!"out: ")) in: p in: Atom("p") in: q in: Atom("q") in: Impl(Atom("p"),Atom("q")) out: Or(Not(Atom("p")),Atom("q")) in: p in: Atom("p") in: Not(Atom("p")) in: q in: Atom("q") in: Or(Not(Atom("p")),Atom("q")) in: Not(Or(Not(Atom("p")),Atom("q"))) out: And(Not(Not(Atom("p"))),Not(Atom("q"))) in: p in: Atom("p") in: Not(Atom("p")) in: Not(Not(Atom("p"))) out: Atom("p") in: p in: Atom("p") in: q in: Atom("q") in: Not(Atom("q")) in: And(Atom("p"),Not(Atom("q"))) And(Atom("p"),Not(Atom("q")))
This session nicely shows how innermost traverses the term it
transforms. The in:
lines show terms to which it
attempts to apply a rule, the out:
lines indicate when
this was successful and what the result of applying the rule was.
Thus, innermost
performs a post-order traversal
applying rules after transforming the subterms of a term.
(Note that when applying debug
to a string constant,
the quotes are not printed.)
Stratego programs are about defining transformations in the form of rules and strategy expressions that combine them. Just defining strategy expressions does not scale, however. Strategy definitions are the abstraction mechanism of Stratego and allow naming and parameterization of strategy expresssions for reuse.
A simple strategy definition names a strategy expression. For
instance, the following module defines a combination of rules
(dnf-rules
), and some strategies based on it:
module dnf-strategies imports libstrategolib prop-dnf-rules strategies dnf-rules = DefI <+ DefE <+ DAOL <+ DAOR <+ DN <+ DMA <+ DMO dnf = innermost(dnf-rules) dnf-debug = innermost(debug(!"in: "); dnf-rules; debug(!"out: ")) main = io-wrap(dnf)
Note how dnf-rules
is used in the definition of
dnf
, and dnf
itself in the definition of
main
.
In general, a definition of the form
f = s
introduces a new transformation f
, which can be invoked
by calling f
in a strategy expression, with the effect
of executing strategy expression s
.
The expression should have no free variables. That is, all strategie
called in s
should be defined strategies.
Simple strategy definitions just introduce names for strategy
expressions.
Still such strategies have an argument, namely the implicit current
term.
Strategy definitions with strategy and/or term parameters can be used to define transformation schemas that can instantiated for various situations.
A parameterized strategy definition of the form
f(x1,...,xn | y1,..., ym) = s
introduces a user-defined operator f
with
n
strategy arguments and
m
term arguments. Such a
user-defined strategy operator can be called as
f(s1,...,sn|t1,...,tm)
by providing it n
argument strategies and m
argument terms. The meaning
of such a call is the body s
of the definition in which
the actual arguments have been substituted for the formal arguments.
Strategy arguments and term arguments can be left out of calls and
definitions. That is, a call f(|)
without strategy and
term arguments can be written as f()
, or even just
f
. A call f(s1,..., sn|)
without term
arguments can be written as f(s1,...,sn)
The same holds
for definitions.
As we will see in the coming chapters, strategies such as
innermost
, topdown
, and
bottomup
are not built into the
language, but are defined using strategy definitions in
the language itself using more basic combinators, as illustrated by
the following definitions (without going into the exact meaning of
these definitions):
strategies try(s) = s <+ id repeat(s) = try(s; repeat(s)) topdown(s) = s; all(topdown(s)) bottomup(s) = all(bottomup(s)); s
Such parameterized strategy operators are invoked by providing
arguments for the parameters. Specifically, strategy arguments are
instantiated by means of strategy expressions. Wherever the argument
is invoked in the body of the definition, the strategy expression is
invoked.
For example, in the previous chapter we
saw the following instantiations of the topdown
,
try
, and repeat
strategies:
module prop-desugar // ... strategies desugar = topdown(try(DefI <+ DefE)) impl-nf = topdown(repeat(DefN <+ DefA2 <+ DefO1 <+ DefE))
There can be multiple definitions with the same name but different numbers of parameters. Such definitions introduce different strategy operators.
Strategy definitions at top-level are visible everywhere. Sometimes
it is useful to define a local strategy
operator. This can be done using a let expression of the form
let d* in s end
, where d*
is a list of
definitions.
For example, in the following strategy expression, the definition of
dnf-rules
is only visible in the instantiation of
innermost
:
let dnf-rules = DefI <+ DefE <+ DAOL <+ DAOR <+ DN <+ DMA <+ DMO in innermost(dnf-rules) end
The current version of Stratego does not support hidden strategy definitions at the module level. Such a feature is under consideration for a future version.
As we saw in Chapter 12, a Stratego program can introduce several rules with the same name. It is even possible to extend rules across modules. This is also possible for strategy definitions. If two strategy definitions have the same name and the same number of parameters, they effectively define a single strategy that tries to apply the bodies of the definitions in some undefined order. Thus, a definition of the form
strategies f = s1 f = s2
entails that a call to f
has the effect of first trying
to apply s1
, and if that fails applies s2
,
or the other way around. Thus, the definition above is either
translated to
strategies f = s1 <+ s2
or to
strategies f = s2 <+ s1
Stratego provides combinators for composing transformations and
basic operators for analyzing, creating and traversing
terms. However, it does not provide built-in support for other types
of computation such as input/output and process control. In order to
make such functionality available to Stratego programmers, the
language provides access to user-definable
primitive strategies through the
prim
construct. For example, the following call to
prim
invokes the SSL_printnl
function from
the native code of the C library:
stratego>
prim("SSL_printnl", stdout(), ["foo", "bar"])
foobar
""
In general, a call prim("f", s* | t*)
represents a call
to a primitive function f
with
strategy arguments s*
and term arguments
t*
.
Note that the `current' term is not passed automatically as
argument.
This mechanism allows the incorporation of mundane tasks such as arithmetic, I/O, and other tasks not directly related to transformation, but necessary for the integration of transformations with the other parts of a transformation system.
Primitive functions should take ATerms as arguments. It is not
possible to use `unboxed' values, i.e., raw native types. This
requires writing a wrapper function in C. For example, the addition
of two integers is defined via a call to a primitive
prim("SSL_addi",x,y)
, where the argument should
represent integer ATerms, not C integers.
Implementing Primitives. The Stratego Library provides all the primitives for I/O, arithmetic, string processing, and process control that are usually needed in Stratego programs. However, it is possible to add new primitives to a program as well. That requires linking with the compiled program a library that implements the function. See the documentation of strc for instructions.
The Stratego Compiler is a whole program compiler. That is, the compiler includes all definitions from imported modules (transitively) into the program defined by the main module (the one being compiled). This is the reason that the compiler takes its time to compile a program. To reduce the compilation effort and the size of the resulting programs it is possible to create separately compiled libraries of Stratego definitions. The strategies that such a library provides are declared as external definitions. A declaration of the form
external f(s1 ... sn | x1 ... xm)
states that there is an externally defined strategy operator
f
with n
strategy arguments and
m
term arguments. When compiling a program with
external definitions a library should be provided that implements
these definitions.
The Stratego Library is provided as a separately compiled library.
The libstrateglib
module that we have been using in the
example programs contains external definitions for all strategies in
the library, as illustrated by the following excerpt:
module libstrategolib // ... strategies // ... external io-wrap(s) external bottomup(s) external topdown(s) // ...
When compiling a program using the library we used the -la
stratego-lib
option to provide the implementation of those
definitions.
External Definitions cannot be Extended. Unlike definitions imported in the normal way, external definitions cannot be extended. If we try to compile a module extending an external definition, such as
module wrong imports libstrategolib strategies bottomup(s) = fail
compilation fails:
$
strc -i wrong.str
[ strc | info ] Compiling 'wrong.str'
error: redefining external definition: bottomup/1-0
[ strc | error ] Compilation failed (3.66 secs)
Creating Libraries.
It is possible to create your own Stratego libraries as
well. Currently that exposes you to a bit of C compilation giberish;
in the future this may be incorporated in the Stratego compiler.
Lets create a library for the rules and strategy definitions in the
prop-laws
module. We do this using the
--library
option to indicate that a library is being
built, and the -c
option to indicate that we are only
interested in the generated C code.
$
strc -i prop-laws.str -c -o libproplib.rtree --library [ strc | info ] Compiling 'proplib.str' [ strc | info ] Front-end succeeded : [user/system] = [4.71s/0.77s] [ strc | info ] Abstract syntax in 'libproplib.rtree' [ strc | info ] Concrete syntax in 'libproplib.str' [ strc | info ] Export of externals succeeded : [user/system] = [2.02s/0.11s] [ strc | info ] Back-end succeeded : [user/system] = [6.66s/0.19s] [ strc | info ] Compilation succeeded : [user/system] = [13.4s/1.08s]$
rm libproplib.str
The result is of this compilation is a module
libproplib
that contains the external definitions of
the module and those inherited from
libstrategolib
. (This module comes in to versions; one
in concrete syntax libproplib.str
and one in abstract
syntax libproplib.rtree
; for some obscure reason you
should throw away the .str
file.)
Furthermore, the Stratego Compiler produces a C program
libproplib.c
with the implementation of the
library. This C program should be turned into an object library
using libtool
, as follows:
$
libtool --mode=compile gcc -g -O -c libproplib.c -o libproplib.o -I <path/to/aterm-stratego/include> ...$
libtool --mode=link gcc -g -O -o libproplib.la libproplib.lo ...
The result is a shared library libproplib.la
that can
be used in other Stratego programs.
(TODO: the production of the shared library should really be
incorporated into strc.)
Using Libraries. Programmers that want to use your library can now import the module with external definitions, instead of the original module.
module dnf-tool imports libproplib strategies main = main-dnf
This program can be compiled in the usual way, adding the new library to the libraries that should be linked against:
$
strc -i dnf-tool.str -la stratego-lib -la ./libproplib.la$
cat test3.prop And(Impl(Atom("r"),And(Atom("p"),Atom("q"))),ATom("p"))$
./dnf-tool -i test3.prop Or(And(Not(Atom("r")),ATom("p")),And(And(Atom("p"),Atom("q")),ATom("p")))
To correctly deploy programs based on shared libraries requires some additional effort. Chapter 29 explains how to create deployable packages for your Stratego programs.
Strategies can be called dynamically by name, i.e., where the name
of the strategy is specified as a string. Such calls can be made
using the call
construct, which has the form:
call(f | s1, ..., sn | t1, ..., tn)
where f
is a term that should evaluate to a string,
which indicates the name of the strategy to be called, followed by a
list of strategy arguments, and a list of term arguments.
Dynamic calls allow the name of the strategy to be computed at run-time. This is a rather recent feature of Stratego that was motivated by the need for call-backs from a separately compiled Stratego library combined with the computation of dynamic rule names. Otherwise, there is not yet much experience with the feature.
In the current version of Stratego it is necessary to 'register' a
strategy to be able to call it dynamically. (In order to avoid
deletion in case it is not called explicitly somewhere in the
program.) Strategies are registered by means of a dummy strategy
definition DYNAMIC-CALLS
with calls to the strategies
that should called dynamically.
DYNAMICAL-CALLS = foo(id)
We have learned that rules and strategies define transformations, that is, functions from terms to terms that can fail, i.e., partial functions. Rule and strategy definitions introduce names for transformations. Parameterized strategy definitions introduce new strategy operators, functions that construct transformations from transformations.
Primitive strategies are transformations that are implemented in
some language other than Stratego (usually C), and are called
through the prim
construct.
External definitions define an interface to a separately compiled
library of Stratego definitions.
Dynamic calls allow the name of the strategy to be called to be
computed as a string.
Table of Contents
We have seen the use of strategies to combine rules into complex
transformations.
Rather than providing a fixed set of high-level strategy operators
such as bottomup
, topdown
, and
innermost
, Stratego provides a small set of basic
combinators, that can be used to create a wide variety of
strategies.
In Chapter 15 until Chapter 18 we will introduce these
combinators.
In this chapter we start with a set of combinators for sequential
composition and choice of strategies.
The most basic operations in Stratego are id
and
fail
.
The identity strategy id
always succeeds and behaves as
the identity function on terms.
The failure strategy fail
always fails.
The operations have no side effects.
stratego>
!Foo(Bar()) Foo(Bar)stratego>
id Foo(Bar)stratego>
fail command failed
The sequential composition s1 ; s2
of the strategies
s1
and s2
first applies the strategy
s1
to the subject term and then s2
to the
result of that first application. The strategy fails if either
s1
or s2
fails.
Properties.
Sequential composition is associative. Identity is a left and right
unit for sequential composition; since id
always
succeeds and leaves the term alone, it has no additional effect to
the strategy that it is composed with. Failure is a left zero for
sequential composition; since fail
always fails the
next strategy will never be reached.
(s1; s2) ; s3 = s1; (s2; s3) id; s = s s; id = s fail; s = fail
However, not for all strategies s
we have that failure
is a right zero for sequential composition:
s ; fail = fail (is not a law)
Although the composition s; fail
will always fail, the
execution of s
may have side effects that are not
performed by fail
. For example, consider printing a
term in s
.
Examples. As an example of the use of sequential composition consider the following rewrite rules.
stratego>
A : P(Z(),x) -> xstratego>
B : P(S(x),y) -> P(x,S(y))
The following session shows the effect of first applying
B
and then A
:
stratego>
!P(S(Z()), Z()) P(S(Z),Z)stratego>
B P(Z,S(Z))stratego>
A S(Z)
Using the sequential composition of the two rules, this effect can be achieved `in one step':
stratego>
!P(S(Z()),Z()) P(S(Z),Z)stratego>
B; A S(Z)
The following session shows that the application of a composition fails if the second strategy in the composition fails to apply to the result of the first:
stratego>
!P(S(Z()),Z()) P(S(Z),Z)stratego>
B; B command failed
Choosing between rules to apply is achieved using one of several choice combinators, all of which are based on the guarded choice combinator. The common approach is that failure to apply one strategy leads to backtracking to an alternative strategy.
The left choice or deterministic choice s1 <+ s2
tries to apply s1
and s2
in that
order. That is, it first tries to apply s1
, and if that
succeeds the choice succeeds. However, if the application of
s1
fails, s2
is applied to the
original term.
Properties.
Left choice is associative. Identity is a left zero for left choice;
since id
always succeeds, the alternative strategy will
never be tried. Failure is a left and right unit for left choice;
since fail
always fails, the choice will always
backtrack to the alternative strategy, and use of fail
as alternative strategy is pointless.
(s1 <+ s2) <+ s3 = s1 <+ (s2 <+ s3) id <+ s = id fail <+ s = s s <+ fail = s
However, identity is not a right zero for left choice. That is,
not for all strategies s
we have that
s <+ id = s (is not a law)
The expression s <+ id
always succeeds, even
(especially) in the case that s
fails, in which case
the right-hand side of the equation fails of course.
Local Backtracking. The left choice combinator is a local backtracking combinator. That is, the choice is committed once the left-hand side strategy has succeeded, even if the continuation strategy fails. This is expressed by the fact that the property
(s1 <+ s2); s3 = (s1; s3) <+ (s2; s3) (is not a law)
does not hold for all s1
,
s2
, and s3
.
The difference is illustrated by the following applications:
stratego>
!P(S(Z()),Z()) P(S(Z),Z)stratego>
(B <+ id); B command failedstratego>
!P(S(Z()),Z()) P(S(Z),Z)stratego>
(B <+ id) P(Z,S(Z))stratego>
B command failedstratego>
(B; B) <+ (id; B) P(Z,S(Z))
In the application of (B <+ id); B
, the first
application of B
succeeds after which the choice is
committed. The subsequent application of B
then fails.
This equivalent to first applying (B <+ id)
and then
applying B
to the result.
The application of (B; B) <+ (id; B)
, however, is
successful; the application of B; B
fails, after which
the choice backtracks to id; B
, which succeeds.
Choosing between Transformations. The typical use of left choice is to create a composite strategy trying one from several possible transformations. If the strategies that are composed are mutually exclusive, that is, don't succeed for the same terms, their sum is a transformation that (deterministically) covers a larger set of terms. For example, consider the following two rewrite rules:
stratego>
PlusAssoc : Plus(Plus(e1, e2), e3) -> Plus(e1, Plus(e2, e3))stratego>
PlusZero : Plus(Int("0"),e) -> e
These rules are mutually exclusive, since there is no term that
matches the left-hand sides of both rules. Combining the rules with
left choice into PlusAssoc <+ PlusZero
creates a
strategy that transforms terms matching both rules as illustrated by
the following applications:
stratego>
!Plus(Int("0"),Int("3")) Plus(Int("0"),Int("3"))stratego>
PlusAssoc command failedstratego>
PlusAssoc <+ PlusZero Int("3")stratego>
!Plus(Plus(Var("x"),Int("42")),Int("3")) Plus(Plus(Var("x"),Int("42")),Int("3"))stratego>
PlusZero command failedstratego>
PlusAssoc <+ PlusZero Plus(Var("x"),Plus(Int("42"),Int("3")))
Ordering Overlapping Rules.
When two rules or strategies are mutually exlusive the order of
applying them does not matter.
In cases where strategies are overlapping, that is, succeed for the
same terms, the order becomes crucial to determining the semantics
of the composition.
For example, consider the following rewrite rules reducing
applications of Mem
:
stratego>
Mem1 : Mem(x,[]) -> False()stratego>
Mem2 : Mem(x,[x|xs]) -> True()stratego>
Mem3 : Mem(x,[y|ys]) -> Mem(x,ys)
Rules Mem2
and Mem3
have overlapping
left-hand sides. Rule Mem2
only applies if the first
argument is equal to the head element of the list in the second
argument. Rule Mem3
applies always if the list in the
second argument is non-empty.
stratego>
!Mem(1, [1,2,3]) Mem(1, [1,2,3])stratego>
Mem2 Truestratego>
!Mem(1, [1,2,3]) Mem(1,[1,2,3])stratego>
Mem3 Mem(1,[2,3])
In such situations, depending on the order of the rules, differents
results are produced. (The rules form a non-confluent rewriting
system.)
By ordering the rules as Mem2 <+ Mem3
, rule
Mem2
is tried before Mem3
, and we have a
deterministic transformation strategy.
Try.
A useful application of <+
in combination with
id
is the reflexive closure of a strategy
s
:
try(s) = s <+ id
The user-defined strategy combinator try
tries to apply
its argument strategy s
, but if that fails, just
succeeds using id
.
Sometimes it is not desirable to backtrack to the alternative
specified in a choice. Rather, after passing a
guard, the choice should be committed. This can
be expressed using the guarded left choice
operator s1 < s2 + s3
.
If s1
succeeds s2
is applied, else
s3
is applied. If s2
fails, the complete
expression fails; no backtracking to s3
takes place.
Properties.
This combinator is a generalization of the left choice combinator
<+
.
s1 <+ s2 = s1 < id + s2
The following laws make clear that the `branches' of the choice are selected by the success or failure of the guard:
id < s2 + s3 = s2 fail < s2 + s3 = s3
If the right branch always fails, the construct reduces to the sequential composition of the guard and the left branch.
s1 < s2 + fail = s1; s2
Guarded choice is not associative:
(s1 < s2 + s3) < s4 + s5 = s1 < s2 + (s3 < s4 + s5) (not a law)
To see why consider the possible traces of these expressions. For example,
when s1
and s2
succeed subsequently, the left-hand
side expression calls s4
, while the right-hand side expression does
not.
However, sequential composition distributes over guarded choice from left and right:
(s1 < s2 + s3); s4 = s1 < (s2; s4) + (s3; s4) s0; (s1 < s2 + s3) = (s0; s1) < s2 + s3
Examples.
The guarded left choice operator is most useful for the
implementation of higher-level control-flow strategies.
For example, the negation not(s)
of a strategy s
, succeeds if s
fails, and
fails when it succeeds:
not(s) = s < fail + id
Since failure discards the effect of a (succesful) transformation,
this has the effect of testing whether s
succeeds. So
we have the following laws for not
:
not(id) = fail not(fail) = id
However, side effects performed by s
are not undone, of
course. Therefore, the following equation does
not hold:
not(not(s)) = s (not a law)
Another example of the use of guarded choice is the
restore-always
combinator:
restore-always(s, r) = s < r + (r; fail)
It applies a `restore' strategy r
after applying a
strategy s
, even if s
fails, and preserves
the success/failure behaviour of s
. Since
fail
discards the transformation effect of
r
, this is mostly useful for ensuring that some
side-effecting operation is done (or undone) after applying
s
.
For other applications of guarded choice, Stratego provides syntactic sugar.
The guarded choice combinator is similar to the traditional
if-then-else construct of programming languages. The difference is
that the `then' branch applies to the result of the application of
the condition.
Stratego's if s1 then s2 else s3 end
construct is more
like the traditional construct since both branches apply to the
original term. The condition strategy is only used to test if it
succeeds or fails, but it's transformation effect is
undone. However, the condition strategy s1
is still
applied to the current term.
The if s1 then s2 end
strategy is similar; if the
condition fails, the strategy succeeds.
Properties.
The if-then-else-end
strategy is just syntactic sugar
for a combination of guarded choice and the where
combinator:
if s1 then s2 else s3 end = where(s1) < s2 + s3
The strategy where(s)
succeeds if s
succeeds, but returns the original subject term. The implementation
of the where
combinator will be discussed in Chapter 16.
The following laws show that the branches are selected by success or
failure of the condition:
if id then s2 else s3 end = s2 if fail then s2 else s3 end = s3
The if-then-end
strategy is an abbreviation for the
if-then-else-end
with the identity strategy as right
branch:
if s1 then s2 end = where(s1) < s2 + id
Examples.
The inclusive or or(s1, s2)
succeeds if one of the strategies s1
or s2
succeeds, but guarantees that both are applied, in the order
s1
first, then s2
:
or(s1, s2) = if s1 then try(where(s2)) else where(s2) end
This ensures that any side effects are always performed, in contrast
to s1 <+ s2
, where s2
is only executed
if s1
fails. (Thus, left choice implements a short
circuit Boolean or.)
Similarly, the following and(s1, s2)
combinator is the
non-short circuit version of Boolean conjunction:
and(s1, s2) = if s1 then where(s2) else where(s2); fail end
The switch
construct is an n-ary branching construct
similar to its counter parts in other programming languages. It is
defined in terms of guarded choice.
The switch
construct has the following form:
switch s0 case s1 : s1' case s2 : s2' ... otherwise : sdef end
The switch first applies the s0
strategy to the current
term t
resulting in a term t'
. Then it
tries the cases in turn applying each si
to
t'
. As soon as this succeeds the corresponding case is
selected and si'
is applied to the t
, the
term to which the switch was applied. If none of the cases applies,
the default strategy sdef
from the
otherwise
is applied.
Properties. The switch construct is syntactic sugar for a nested if-then-else:
{x : where(s0 => x); if <s1> x then s1' else if <s2> x then s2' else if ... then ... else sdef end end end}
This translation uses a couple of Stratego constructs that we haven't discussed so far.
Examples. TODO
The deterministic left choice operator prescribes that the left alternative should be tried before the right alternative, and that the latter is only used if the first fails. There are applications where it is not necessary to define the order of the alternatives. In those cases non-deterministic choice can be used.
The non-deterministic choice operator s1 + s2
chooses
one of the two strategies s1
or s2
to
apply, such that the one it chooses succeeds. If both strategies
fail, then the choice fails as well.
Operationally the choice operator first tries one strategy, and, if
that fails, tries the other. The order in which this is done is
undefined, i.e., arbitrarily chosen by the compiler.
The +
combinator is used to model modular composition
of rewrite rules and strategies with the same name. Multiple
definitions with the same name, are merged into a single definition
with that name, where the bodies are composed with
+
. The following transformation illustrates this:
f = s1 f = s2 ==> f = s1 + s2
This transformation is somewhat simplified; the complete transformation needs to take care of local variables and parameters.
While the +
combinator is used internally by the
compiler for this purpose, programmers are advised
not to use this combinator, but rather use the
left choice combinator <+
to avoid surprises.
Repeated application of a strategy can be achieved with recursion.
There are two styles for doing this; with a recursive definition or
using the fixpoint operator rec
. A recursive
definition is a normal strategy definition with a recursive call in
its body.
f(s) = ... f(s) ...
Another way to define recursion is using the fixpoint operator
rec x(s)
, which recurses on applications of
x
within s
. For example, the definition
f(s) = rec x(... x ...)
is equivalent to the one above.
The advantage of the rec
operator is that it allows
the definition of an unnamed strategy expression to be recursive.
For example, in the definition
g(s) = foo; rec x(... x ...); bar
the strategy between foo
and bar
is a
recursive strategy that does not recurse to
g(s)
.
Originally, the rec
operator was the only way to
define recursive strategies. It is still in the language in the
first place because it is widely used in many existing programs,
and in the second place because it can be a concise expression of a
recursive strategy, since call parameters are not included in the
call. Furthermore, all free variables remain in scope.
Examples.
The repeat
strategy applies a transformation
s
until it fails. It is defined as a recursive
definition using try
as follows:
try(s) = s <+ id repeat(s) = try(s; repeat(s))
An equivalent definition using rec
is:
repeat(s) = rec x(try(s; x))
The following Stratego Shell session illustrates how it works. We first define the strategies:
stratego>
try(s) = s <+ idstratego>
repeat(s) = try(s; repeat(s))stratego>
A : P(Z(),x) -> xstratego>
B : P(S(x),y) -> P(x,S(y))
Then we observe that the repeated application of the individual
rules A
and B
reduces terms:
stratego>
!P(S(Z()),Z()) P(S(Z),Z)stratego>
B P(Z,S(Z))stratego>
A S(Z)
We can automate this using the repeat
strategy, which
will repeat the rules as often as possible:
stratego>
!P(S(Z()),Z()) P(S(Z),Z)stratego>
repeat(A <+ B) S(Z)stratego>
!P(S(S(S(Z()))),Z()) P(S(S(S(Z))),Z)stratego>
repeat(A <+ B) S(S(S(Z)))
To illustrate the intermediate steps of the transformation we can
use debug
from the Stratego Library.
stratego>
import liblibstratego>
!P(S(S(S(Z()))),Z()) P(S(S(S(Z))),Z)stratego>
repeat(debug; (A <+ B)) P(S(S(S(Z))),Z) P(S(S(Z)),S(Z)) P(S(Z),S(S(Z))) P(Z,S(S(S(Z)))) S(S(S(Z))) S(S(S(Z)))
A Library of Iteration Strategies.
Using sequential composition, choice, and recursion a large
variety of iteration strategies can be defined. The following
definitions are part of the Stratego Library (in module
strategy/iteration
).
repeat(s) = rec x(try(s; x)) repeat(s, c) = (s; repeat(s, c)) <+ c repeat1(s, c) = s; (repeat1(s, c) <+ c) repeat1(s) = repeat1(s, id) repeat-until(s, c) = s; if c then id else repeat-until(s, c) end while(c, s) = if c then s; while(c, s) end do-while(s, c) = s; if c then do-while(s, c) end
The following equations describe some relations between these strategies:
do-while(s, c) = repeat-until(s, not(c)) do-while(s, c) = s; while(c, s)
We have seen that rules and strategies can be combined into more
complex strategies by means of strategy combinators.
Cumulative effects are obtained by sequential composition of
strategies using the s1 ; s2
combinator.
Choice combinators allow a strategy to decide between alternative
transformations. While Stratego provides a variety of choice
combinators, they are all based on the guarded choice combinator
s1 < s2 + s3
.
Repetition of transformations is achieved using recursion, which can
be achieved through recursive definitions or the rec
combinator.
Next up: Chapter 16 shows the stuff that rewrite rules are made of.
Table of Contents
In previous chapters we have presented rewrite rules as basic transformation steps. However, rules are not really atomic transformation actions. To see this, consider what happens when the rewrite rule
DAOL : And(Or(x, y), z) -> Or(And(x, z), And(y, z))
is applied. First it matches the subject term against the pattern
And(Or(x, y), z)
in the left-hand side. This means that
a substitution for the variables x
, y
, and
z
is sought, that makes the pattern equal to the
subject term. If the match fails, the rule fails. If the match
succeeds, the pattern Or(And(x, z), And(y, z))
on the
right-hand side is instantiated with the bindings found during the
match of the left-hand side. The instantiated term then replaces the
original subject term. Furthermore, the rule limits the scope of the
variables occurring in the rule. That is, the variables
x
, y
, z
are local to this
rule. After the rule is applied the bindings to these variables are
invisible again.
Thus, rather than considering rules as the atomic actions of transformation programs, Stratego provides their constituents, that is building terms from patterns and matching terms against patterns, as atomic actions, and makes these available to the programmer. In this chapter, you will learn these basic actions and their use in the composition of more complex operations such as various flavours of rewrite rules.
The build operation !p
replaces the subject term with
the instantiation of the pattern p
using the bindings
from the environment to the variables occurring in p
.
For example, the strategy !Or(And(x, z), And(y, z))
replaces the subject term with the instantiation of
Or(And(x, z), And(y, z))
using bindings to variables
x
, y
and z
.
stratego>
!Int("10") Int("10")stratego>
!Plus(Var("a"), Int("10")) Plus(Var("a"), Int("10"))
It is possible to build terms with variables. We call this
building a term pattern. A pattern is a term with
meta-variables. The current term is replaced
by an instantiation of pattern p
.
stratego>
:binding e e is bound to Var("b")stratego>
!Plus(Var("a"),e) Plus(Var("a"),Var("b"))stratego>
!e Var("b")
Pattern matching allows the analysis of terms. The simplest case
is matching against a literal term.
The match operation ?t
matches the subject term
against the term t
.
Plus(Var("a"),Int("3"))stratego>
?Plus(Var("a"),Int("3"))stratego>
?Plus(Int("3"),Var("b")) command failed
Matching against a term pattern with
variables binds those variables to (parts of) the current term.
The match strategy ?
compares the current term (x
t
) to
variable x
. It binds variable
x
to term t
in the environment. A variable can only be bound once, or to the
same term.
Plus(Var("a"),Int("3"))stratego>
?estratego>
:binding e e is bound to Plus(Var("a"),Int("3"))stratego>
!Int("17")stratego>
?e command failed
The general case is matching against an arbitrary term pattern.
The match strategy ?
compares the current term to a pattern
p
p
. It will add bindings for the
variables in pattern p
to the
environment. The wildcard _
in a match will match any
term.
Plus(Var("a"),Int("3"))stratego>
?Plus(e,_)stratego>
:binding e e is bound to Var("a") Plus(Var("a"),Int("3"))
Patterns may be non-linear. Multiple occurences of the same variable can occur and each occurence matches the same term.
Plus(Var("a"),Int("3"))stratego>
?Plus(e,e) command failedstratego>
!Plus(Var("a"),Var("a"))stratego>
?Plus(e,e)stratego>
:binding e e is bound to Var("a")
Non-linear pattern matching is a way to test equality of terms. Indeed the equality predicates from the Stratego Library are defined using non-linear pattern matching:
equal = ?(x, x) equal(|x) = ?x
The equal
strategy tests whether the current term is a
a pair of the same terms.
The equal(|x)
strategy tests whether the current term
is equal to the argument term.
stratego>
equal = ?(x, x)stratego>
!("a", "a") ("a", "a")stratego>
equal ("a", "a")stratego>
!("a", "b") ("a", "b")stratego>
equal command failedstratego>
equal(|x) = ?xstratego>
!Foo(Bar()) Foo(Bar)stratego>
equal(|Foo(Baz())) command failedstratego>
equal(|Foo(Bar())) Foo(Bar)
Match and build are first-class citizens in Stratego, which means that they can be used and combined just like any other strategy expressions. In particular, we can implement rewrite rules using these operations, since a rewrite rule is basically a match followed by a build. For example, consider the following combination of match and build:
Plus(Var("a"),Int("3"))
stratego>
?Plus(e1, e2); !Plus(e2, e1)
Plus(Int("3"),Var("a"))
This combination first recognizes a term, binds variables to the pattern in the match, and then replaces the current term with the instantiation of the build pattern. Note that the variable bindings are propagated from the match to the build.
Stratego provides syntactic sugar for various combinations of match and build. We'll explore these in the rest of this chapter.
An anonymous rewrite rule (p1 ->
p2)
transforms a term matching p1
into an
instantiation of p2
.
Such a rule is equivalent to the sequence ?p1; !p2
.
Plus(Var("a"),Int("3"))
stratego>
(Plus(e1, e2) -> Plus(e2, e1))
Plus(Int("3"),Var("a"))
Once a variable is bound it cannot be rebound to a different
term. Thus, once we have applied an anonymous rule once, its
variables are bound and the next time it is applied it only
succeeds for the same term. For example, in the next session the
second application of the rule fails, because e2
is
bound to Int("3")
and does not match with
Var("b")
.
stratego>
!Plus(Var("a"),Int("3")) Plus(Var("a"),Int("3"))stratego>
(Plus(e1,e2) -> Plus(e2,e1)) Plus(Int("3"),Var("a"))stratego>
:binding e1 e1 is bound to Var("a")stratego>
:binding e2 e2 is bound to Int("3")stratego>
!Plus(Var("a"),Var("b")) Plus(Var("a"),Var("b"))stratego>
(Plus(e1,e2) -> Plus(e2,e1)) command failed
To use a variable name more than once Stratego provides
term variable scope.
A scope {x1,...,xn : s}
locally undefines the
variables xi
. That is, the binding to a variable
xi
outside the scope is not visible inside it, nor is
the binding to xi
inside the scope visible outside
it.
For example, to continue the session above, if we wrap the
anonymous swap rule in a scope for its variables, it can be
applied multiple times.
stratego>
!Plus(Var("a"),Int("3")) Plus(Var("a"),Int("3"))stratego>
{e3,e4 : (Plus(e3,e4) -> Plus(e4,e3))} Plus(Var("a"),Int("3"))stratego>
:binding e3 e3 is not bound to a termstratego>
!Plus(Var("a"),Var("b")) Plus(Var("a"),Var("b"))stratego>
{e3,e4 : (Plus(e3,e4) -> Plus(e4,e3))} Plus(Var("b"),Var("a"))
Of course we can name such a scoped rule using a strategy definition, and then invoke it by its name:
stratego>
SwapArgs = {e1,e2 : (Plus(e1,e2) -> Plus(e2,e1))}stratego>
!Plus(Var("a"),Int("3")) Plus(Var("a"),Int("3"))stratego>
SwapArgs Plus(Int("3"),Var("a"))
When using match and build directly in a strategy definition, rather than in the form of a rule, the definition contains free variables. Strictly speaking such variables should be declared using a scope, that is one should write
SwapArgs = {e1,e2 : (Plus(e1,e2) -> Plus(e2,e1))}
However, since declaring all variables at the top of a definition is destracting and does not add much to the definition, such a scope declaration can be left out. Thus, one can write
SwapArgs = (Plus(e1,e2) -> Plus(e2,e1))
instead. The scope is automatically inserted by the compiler. This implies that there is no global scope for term variables. Of course, variables in inner scopes should be declared where necessary. In particular, note that variable scope is not inserted for strategy definitions in a let binding, such as
let SwapArgs = (Plus(e1,e2) -> Plus(e2,e1)) in ... end
While the variables are bound in the enclosing definition, they are
not restricted to SwapArgs
in this case, since in a let
you typically want to use bindings to variables in the enclosing
code.
Often it is useful to apply a strategy only to test whether some
property holds or to compute some auxiliary result. For this
purpose, Stratego provides the where(s)
combinator,
which applies s
to the current term, but restores that
term afterwards. Any bindings to variables are kept, however.
Plus(Int("14"),Int("3"))stratego>
where(?Plus(Int(i),Int(j)); <addS>(i,j) => k) Plus(Int("14"),Int("3"))stratego>
:binding i i is bound to "14"stratego>
:binding k k is bound to "17"
With the match and build constructs where(s)
is in fact
just syntactic sugar for {x: ?x; s; !x}
with
x
a fresh variable not occurring in s
.
Thus, the current subject term is saved by
binding it to a new variable x
, then the strategy
s
is applied, and finally, the original term is
restored by building x
.
We saw the use of where
in the definition of
if-then-else
in Chapter 15.
A simple rewrite rule succeeds if the match of the left-hand side
succeeds. Sometimes it is useful to place additional requirements on
the application of a rule, or to compute some value for use in the
right-hand side of the rule. This can be achieved with
conditional rewrite rules.
A conditional rule L: p1 -> p2 where s
is a simple rule
extended with an additional computation s
which should
succeed in order for the rule to apply.
The condition can be used to test properties of terms in the
left-hand side, or to compute terms to be used in the right-hand
side. The latter is done by binding such new terms to variables used
in the right-hand side.
For example, the EvalPlus
rule in the following session
uses a condition to compute the sum of i
and
j
:
stratego>
EvalPlus: Plus(Int(i),Int(j)) -> Int(k) where !(i,j); addS; ?kstratego>
!Plus(Int("14"),Int("3")) Plus(Int("14"),Int("3"))stratego>
EvalPlus Int("17")
A conditional rule can be desugared similarly to an unconditional rule. That is, a conditional rule of the form
L : p1 -> p2 where s
is syntactic sugar for
L = ?p1; where(s); !p2
Thus, after the match with p1
succeeds the strategy
s
is applied to the subject term. Only if the
application of s
succeeds, is the right-hand side
p2
built. Note that since s
is applied
within a where
, the build !p2
is applied
to the original subject term; only variable
bindings computed within s
can be used in
p2
.
As an example, consider the following constant folding rule, which reduces an addition of two integer constants to the constant obtained by computing the addition.
EvalPlus : Add(Int(i),Int(j)) -> Int(k) where !(i,j); addS; ?k
The addition is computed by applying the primitive strategy
add
to the pair of integers (i,j)
and
matching the result against the variable k
, which is
then used in the right-hand side. This rule is desugared to
EvalPlus = ?Add(Int(i),Int(j)); where(!(i,j); addS; ?k); !Int(k)
Sometimes it is useful to define a rule anonymously within a strategy expression. The syntax for anonymous rules with scopes is a bit much since it requires enumerating all variables. A `lambda' rule of the form
\ p1 -> p2 where s \
is an anonymous rewrite rule for which the variables in the
left-hand side p1
are local to the rule, that is,
it is equivalent to an expression of the form
{x1,...,xn : (p1 -> p2 where s)}
with x1
,...,xn
the variables of
p1
.
This means that any variables used in s
and
p2
that do not occur in
p1
are bound in the context of the rule.
A typical example of the use of an anonymous rule is
stratego>
![(1,2),(3,4),(5,6)] [(1,2),(3,4),(5,6)]stratego>
map(\ (x, y) -> x \ ) [1,3,5]
One frequently occuring scenario is that of applying a strategy to a term and then matching the result against a pattern. This typically occurs in the condition of a rule. In the constant folding example above we saw this scenario:
EvalPlus : Add(Int(i),Int(j)) -> Int(k) where !(i,j); addS; ?k
In the condition, first the term (i,j)
is built, then
the strategy addS
is applied to it, and finally the
result is matched against the pattern k
.
To improve the readability of such expressions, the following two
constructs are provided. The operation <s> p
captures the notion of applying a strategy to a
term, i.e., the scenario !p; s
. The operation s
=> p
capture the notion of applying a strategy to the current
subject term and then matching the result against the pattern
p
, i.e., s; ?p
. The combined operation
<s> p1 => p2
thus captures the notion of applying a
strategy to a term p1
and matching the result against
p2
, i.e, !t1; s; ?t2
. Using this notation
we can improve the constant folding rule above as
EvalPlus : Add(Int(i),Int(j)) -> Int(k) where <add>(i,j) => k
Applying Strategies in Build. Sometimes it useful to apply a strategy directly to a subterm of a pattern, for example in the right-hand side of a rule, instead of computing a value in a condition, binding the result to a variable, and then using the variable in the build pattern. The constant folding rule above, for example, could be further simplified by directly applying the addition in the right-hand side:
EvalPlus : Add(Int(i),Int(j)) -> Int(<add>(i,j))
This abbreviates the conditional rule above. In general, a strategy application in a build pattern can always be expressed by computing the application before the build and binding the result to a new variable, which then replaces the application in the build pattern.
Another example is the following definition of the
map(s)
strategy, which applies a strategy to each term
in a list:
map(s) : [] -> [] map(s) : [x | xs] -> [<s> x | <map(s)> xs]
Term wrapping and projection are concise idioms for constructing terms that wrap the current term and for extracting subterms from the current term.
One often write rules of the form \ x -> Foo(Bar(x))\
,
i.e. wrapping a term pattern around the current term. Using rule
syntax this is quite verbose. The syntactic abstraction of
term wraps, allows the concise specification of
such little transformations as !Foo(Bar(<id>))
.
In general, a term wrap is a build strategy !p[<s>]
containing one or more strategy applications <s>
that are not applied to a term.
When executing the the build operation, each occurrence of such a
strategy application <s>
is replaced with the term
resulting from applying s
to the current subject term,
i.e., the one that is being replaced by the build.
The following sessions illustrate some uses of term wraps:
3stratego>
!(<id>,<id>) (3,3)stratego>
!(<Fst; inc>,<Snd>) (4,3)stratego>
!"foobar" "foobar"stratego>
!Call(<id>, []) Call("foobar", [])stratego>
mod2 = <mod>(<id>,2)stratego>
!6 6stratego>
mod2 0
As should now be a common pattern, term projects are implemented by
translation to a combination of match and build expressions.
Thus, a term wrap !p[<s>]
is translated to a strategy
expression
{x: where(s => x); !p[x]}
where x
is a fresh variable not occurring in
s
.
In other words, the strategy s
is applied to the
current subject term, i.e., the term to which
the build is applied.
As an example, the term wrap !Foo(Bar(<id>))
is
desugared to the strategy
\{x: where(id => x); !Foo(Bar(x))}
which after simplification is equivalent to \{x: ?x;
!Foo(Bar(x))\
}, i.e., exactly the original lambda rule
\ x -> Foo(Bar(x))\
.
Term projections are the match dual of term wraps.
Term projections can be used to project a
subterm from a term pattern. For example, the expression
?And(<id>,x)
matches terms of the form
And(t1,t2)
and reduces them to the first subterm
t1
.
Another example is the strategy
map(?FunDec(<id>,_,_))
which reduces a list of function declarations to a list of the
names of the functions, i.e., the first arguments of the
FunDec
constructor.
Here are some more examples:
[1,2,3]stratego>
?[_|<id>] [2,3]stratego>
!Call("foobar", []) Call("foobar", [])stratego>
?Call(<id>, []) "foobar"
Term projections can also be used to apply additional constraints
to subterms in a match pattern. For example, ?Call(x,
<?args; length => 3>)
matches only with function calls
with three arguments.
A match expression ?p[<s>]
is desugared as
{x: ?p[x]; <s> x}
That is, after the pattern p[x]
matches, it is reduced
to the subterm bound to x
to which s
is
applied. The result is also the result of the projection.
When multiple projects are used within a match the outcome is
undefined, i.e., the order in which the projects will be performed
can not be counted on.
Table of Contents
In Chapter 13 we saw a number of idioms of strategic rewriting, which all involved tree traversal. In the previous chapters we saw how strategies can be used to control transformations and how rules can be broken down into the primitive actions match, build and scope. The missing ingredient are combinators for defining traversals.
There are many ways to traverse a tree. For example, a bottom-up
traversal, visits the subterms of a node before it visits the node
itself, while a top-down traversal visits nodes before it visits
children. One-pass traversals traverse the tree one time, while
fixed-point traversals, such as innermost
, repeatedly
traverse a term until a normal form is reached.
It is not desirable to provide built-in implementations for all traversals needed in transformations, since such a collection would necessarily be imcomplete. Rather we would like to define traversals in terms of the primitive ingredients of traversal. For example, a top-down, one-pass traversal strategy will first visit a node, and then descend to the children of a node in order to recursively traverse all subterms. Similarly, the bottom-up, fixed-point traversal strategy innermost, will first descend to the children of a node in order to recursively traverse all subterms, then visit the node itself, and possibly recursively reapply the strategy.
Traversal in Stratego is based on the observation that a full term traversal is a recursive closure of a one-step descent, that is, an operation that applies a strategy to one or more direct subterms of the subject term. By separating this one-step descent operator from recursion, and making it a first-class operation, many different traversals can be defined.
In this chapter we explore the ways in which Stratego supports the definition of traversal strategies. We start with explicitly programmed traversals using recursive traversal rules. Next, congruences operators provide a more concise notation for such data-type specific traversal rules. Finally, generic traversal operators support data type independent definitions of traversals, which can be reused for any data type. Given these basic mechanisms, we conclude with an an exploration of idioms for traversal and standard traversal strategies in the Stratego Library.
In Chapter 16 we saw
the following definition of the map
strategy, which
applies a strategy to each element of a list:
map(s) : [] -> [] map(s) : [x | xs] -> [<s> x | <map(s)> xs]
The definition uses explicit recursive calls to the strategy in the
right-hand side of the second rule.
What map
does is to traverse the
list in order to apply the argument strategy to all elements.
We can use the same technique to other term structures as well.
We will explore the definition of traversals using the propositional formulae from Chapter 13, where we introduced the following rewrite rules:
module prop-rules imports libstrategolib prop rules DefI : Impl(x, y) -> Or(Not(x), y) DefE : Eq(x, y) -> And(Impl(x, y), Impl(y, x)) DN : Not(Not(x)) -> x DMA : Not(And(x, y)) -> Or(Not(x), Not(y)) DMO : Not(Or(x, y)) -> And(Not(x), Not(y)) DAOL : And(Or(x, y), z) -> Or(And(x, z), And(y, z)) DAOR : And(z, Or(x, y)) -> Or(And(z, x), And(z, y)) DOAL : Or(And(x, y), z) -> And(Or(x, z), Or(y, z)) DOAR : Or(z, And(x, y)) -> And(Or(z, x), Or(z, y))
In Chapter 13 we saw how a
functional style of rewriting could be encoded using extra
constructors. In Stratego we can achieve a similar approach by using
rule names, instead of extra constructors. Thus, one way to achieve
normalization to disjunctive normal form, is the use of an
explicitly programmed traversal, implemented using recursive rules,
similarly to the map
example above:
module prop-dnf4 imports libstrategolib prop-rules strategies main = io-wrap(dnf) rules dnf : True -> True dnf : False -> False dnf : Atom(x) -> Atom(x) dnf : Not(x) -> <dnfred> Not (<dnf>x) dnf : And(x, y) -> <dnfred> And (<dnf>x, <dnf>y) dnf : Or(x, y) -> Or (<dnf>x, <dnf>y) dnf : Impl(x, y) -> <dnfred> Impl(<dnf>x, <dnf>y) dnf : Eq(x, y) -> <dnfred> Eq (<dnf>x, <dnf>y) strategies dnfred = try(DN <+ (DefI <+ DefE <+ DMA <+ DMO <+ DAOL <+ DAOR); dnf)
The dnf
rules recursively apply themselves to the
direct subterms and then apply dnfred
to actually apply
the rewrite rules.
We can reduce this program by abstracting over the base cases. Since
there is no traversal into True
, False
,
and Atom
s, these rules can be be left out.
module prop-dnf5 imports libstrategolib prop-rules strategies main = io-wrap(dnf) rules dnft : Not(x) -> <dnfred> Not (<dnf>x) dnft : And(x, y) -> <dnfred> And (<dnf>x, <dnf>y) dnft : Or(x, y) -> Or (<dnf>x, <dnf>y) dnft : Impl(x, y) -> <dnfred> Impl(<dnf>x, <dnf>y) dnft : Eq(x, y) -> <dnfred> Eq (<dnf>x, <dnf>y) strategies dnf = try(dnft) dnfred = try(DN <+ (DefI <+ DefE <+ DMA <+ DMO <+ DAOL <+ DAOR); dnf)
The dnf
strategy is now defined in terms of the
dnft
rules, which implement traversal over the
constructors. By using try(dnft)
, terms for which no
traversal rule has been specified are not transformed.
We can further simplify the definition by observing that the
application of dnfred
does not necessarily have to take
place in the right-hand side of the traversal rules.
module prop-dnf6 imports libstrategolib prop-rules strategies main = io-wrap(dnf) rules dnft : Not(x) -> Not (<dnf>x) dnft : And(x, y) -> And (<dnf>x, <dnf>y) dnft : Or(x, y) -> Or (<dnf>x, <dnf>y) dnft : Impl(x, y) -> Impl(<dnf>x, <dnf>y) dnft : Eq(x, y) -> Eq (<dnf>x, <dnf>y) strategies dnf = try(dnft); dnfred dnfred = try(DN <+ (DefI <+ DefE <+ DMA <+ DMO <+ DAOL <+ DAOR); dnf)
In this program dnf
first calls dnft
to
transform the subterms of the subject term, and then calls
dnfred
to apply the transformation rules (and possibly
a recursive invocation of dnf
).
The program above has two problems. First, the traversal behaviour
is mostly uniform, so we would like to specify that more
concisely. We will address that concern below. Second, the traversal
is not reusable, for example, to define a conjunctive normal form
transformation. This last concern can be addressed by factoring out
the recursive call to dnf
and making it a parameter of
the traversal rules.
module prop-dnf7 imports libstrategolib prop-rules strategies main = io-wrap(dnf) rules proptr(s) : Not(x) -> Not (<s>x) proptr(s) : And(x, y) -> And (<s>x, <s>y) proptr(s) : Or(x, y) -> Or (<s>x, <s>y) proptr(s) : Impl(x, y) -> Impl(<s>x, <s>y) proptr(s) : Eq(x, y) -> Eq (<s>x, <s>y) strategies dnf = try(proptr(dnf)); dnfred dnfred = try(DN <+ (DefI <+ DefE <+ DMA <+ DMO <+ DAOL <+ DAOR); dnf) cnf = try(proptr(cnf)); cnfred cnfred = try(DN <+ (DefI <+ DefE <+ DMA <+ DMO <+ DOAL <+ DOAR); cnf)
Now the traversal rules are reusable and used in two different
transformations, by instantiation with a call to the particular
strategy in which they are used (dnf
or
cnf
).
But we can do better, and also make the composition of this strategy reusable.
module prop-dnf8 imports libstrategolib prop-rules strategies main = io-wrap(dnf) rules proptr(s) : Not(x) -> Not (<s>x) proptr(s) : And(x, y) -> And (<s>x, <s>y) proptr(s) : Or(x, y) -> Or (<s>x, <s>y) proptr(s) : Impl(x, y) -> Impl(<s>x, <s>y) proptr(s) : Eq(x, y) -> Eq (<s>x, <s>y) strategies propbu(s) = proptr(propbu(s)); s strategies dnf = propbu(dnfred) dnfred = try(DN <+ (DefI <+ DefE <+ DMA <+ DMO <+ DAOL <+ DAOR); dnf) cnf = propbu(cnfred) cnfred = try(DN <+ (DefI <+ DefE <+ DMA <+ DMO <+ DOAL <+ DOAR); cnf)
That is, the propbu(s)
strategy defines a complete
bottom-up traversal over propostion terms, applying the strategy
s
to a term after transforming its subterms. The
strategy is completely independent of the dnf
and
cnf
transformations, which instantiate the strategy
using the dnfred
and cnfred
strategies.
Come to think of it, dnfred
and cnfred
are
somewhat useless now and can be inlined directly in the
instantiation of the propbu(s)
strategy:
module prop-dnf9 imports libstrategolib prop-rules strategies main = io-wrap(dnf) rules proptr(s) : Not(x) -> Not (<s>x) proptr(s) : And(x, y) -> And (<s>x, <s>y) proptr(s) : Or(x, y) -> Or (<s>x, <s>y) proptr(s) : Impl(x, y) -> Impl(<s>x, <s>y) proptr(s) : Eq(x, y) -> Eq (<s>x, <s>y) strategies propbu(s) = proptr(propbu(s)); s strategies dnf = propbu(try(DN <+ (DefI <+ DefE <+ DMA <+ DMO <+ DAOL <+ DAOR); dnf)) cnf = propbu(try(DN <+ (DefI <+ DefE <+ DMA <+ DMO <+ DOAL <+ DOAR); cnf))
Now we have defined a transformation independent traversal strategy that is specific for proposition terms.
Next we consider cheaper ways for defining the traversal rules, and then ways to get completely rid of them.
The definition of the traversal rules above frequently occurs in the
definition of transformation strategies.
Congruence operators provide a convenient abbreviation of precisely
this operation.
A congruence operator applies a strategy to each direct subterm of a
specific constructor.
For each n-ary constructor c declared in a signature, there is a
corresponding congruence operator c(s1 ,
..., sn)
, which applies to terms of the form c(t1 ,
..., tn)
by applying the argument strategies to the
corresponding argument terms.
A congruence fails if the application of one the argument strategies
fails or if constructor of the operator and that of the term do not
match.
Example. For example, consider the following signature of expressions:
module expressions signature sorts Exp constructors Int : String -> Exp Var : String -> Exp Plus : Exp * Exp -> Exp Times : Exp * Exp -> Exp
The following Stratego Shell session applies the congruence
operators Plus
and Times
to a term:
stratego>
import expressionsstratego>
!Plus(Int("14"),Int("3")) Plus(Int("14"),Int("3"))stratego>
Plus(!Var("a"), id) Plus(Var("a"),Int("3"))stratego>
Times(id, !Int("42")) command failed
The first application shows how a congruence transforms a specific subterm, that is the strategy applied can be different for each subterm. The second application shows that a congruence only succeeds for terms constructed with the same constructor.
The import
at the start of the session is necessary to
declare the constructors used; the definitions of congruences are
derived from constructor declarations.
Forgetting this import would lead to a complaint about an undeclared
operator:
stratego>
!Plus(Int("14"),Int("3")) Plus(Int("14"),Int("3"))stratego>
Plus(!Var("a"), id) operator Plus/(2,0) not defined command failed
Defining Traversals with Congruences.
Now we return to our dnf
/cnf
example, to
see how congruence operators can help in their implementation.
Since congruence operators basically define a one-step traversal for
a specific constructor, they capture the traversal rules defined above.
That is, a traversal rule such as
proptr(s) : And(x, y) -> And(<s>x, <s>y)
can be written by the congruence And(s,s)
.
Applying this to the prop-dnf
program we can replace
the traversal rules by congruences as follows:
module prop-dnf10 imports libstrategolib prop-rules strategies main = io-wrap(dnf) strategies proptr(s) = Not(s) <+ And(s, s) <+ Or(s, s) <+ Impl(s, s) <+ Eq(s, s) propbu(s) = proptr(propbu(s)); s strategies dnf = propbu(try(DN <+ (DefI <+ DefE <+ DMA <+ DMO <+ DAOL <+ DAOR); dnf)) cnf = propbu(try(DN <+ (DefI <+ DefE <+ DMA <+ DMO <+ DOAL <+ DOAR); cnf))
Observe how the five traversal rules have been reduced to five congruences which fit on a single line.
Traversing Tuples and Lists.
Congruences can also be applied to tuples,
(s1,s2,...,sn)
, and lists,
[s1,s2,...,sn]
. A special list congruence is
[]
which 'visits' the empty list.
As an example, consider again the definition of map(s)
using recursive traversal rules:
map(s) : [] -> [] map(s) : [x | xs] -> [<s> x | <map(s)> xs]
Using list congruences we can define this strategy as:
map(s) = [] <+ [s | map(s)]
The []
congruence matches an empty list. The [s |
map(s)]
congruence matches a non-empty list, and applies
s
to the head of the list and map(s)
to
the tail. Thus, map(s)
applies s
to each
element of a list:
stratego>
import liblibstratego>
![1,2,3] [1,2,3]stratego>
map(inc) [2,3,4]
Note that map(s)
only succeeds if s
succeeds for each element of the list.
The fetch
and filter
strategies are
variations on map
that use the failure of
s
to list elements.
fetch(s) = [s | id] <+ [id | fetch(s)]
The fetch
strategy traverses a list
until it finds a element for which
s
succeeds and then stops. That element is the only one
that is transformed.
filter(s) = [] + ([s | filter(s)] <+ ?[ |<id>]; filter(s))
The filter
strategy applies s
to each
element of a list, but only keeps the elements for which it
succeeds.
stratego>
import liblibstratego>
even = where(<eq>(<mod>(<id>,2),0))stratego>
![1,2,3,4,5,6,7,8] [1,2,3,4,5,6,7,8]stratego>
filter(even) [2,4,6,8]
Format Checking. Another application of congruences is in the definition of format checkers. A format checker describes a subset of a term language using a recursive pattern. This can be used to verify input or output of a transformation, and for documentation purposes. Format checkers defined with congruences can check subsets of signatures or regular tree grammars. For example, the subset of terms of a signature in a some normal form.
As an example, consider checking the output of the dnf
and cnf
transformations.
conj(s) = And(conj(s), conj(s)) <+ s disj(s) = Or (disj(s), disj(s)) <+ s // Conjunctive normal form conj-nf = conj(disj(Not(Atom(x)) <+ Atom(x))) // Disjunctive normal form disj-nf = disj(conj(Not(Atom(x)) <+ Atom(x)))
The strategies conj(s)
and disj(s)
check
that the subject term is a conjunct or a disjunct, respectively,
with terms satisfying s
at the leaves.
The strategies conj-nf
and disj-nf
check
that the subject term is in conjunctive or disjunctive normal form,
respectively.
Using congruence operators we constructed a generic,
i.e. transformation independent, bottom-up traversal for proposition
terms. The same can be done for other data types. However, since the
sets of constructors of abstract syntax trees of typical programming
languages can be quite large, this may still amount to quite a bit
of work that is not reusable across data types;
even though a strategy such as `bottom-up traversal', is basically
data-type independent.
Thus, Stratego provides generic traversal by means of several
generic one-step descent operators. The
operator all
, applies a strategy to all direct
subterms. The operator one
, applies a strategy to one
direct subterm, and the operator some
, applies a
strategy to as many direct subterms as possible, and at least one.
The all(s)
strategy transforms a constructor
application by applying the parameter strategy s
to
each direct subterm. An application of all(s)
fails if
the application to one of the subterms fails.
The following example shows how all
(1) applies to any
term, and (2) applies its argument strategy uniformly to all direct
subterms. That is, it is not possible to do something special for a
particular subterm (that's what congruences are for).
stratego>
!Plus(Int("14"),Int("3")) Plus(Int("14"),Int("3"))stratego>
all(!Var("a")) Plus(Var("a"),Var("a"))stratego>
!Times(Var("b"),Int("3")) Times(Var("b"),Int("3"))stratego>
all(!Var("z")) Times(Var("z"),Var("z"))
The all(s)
operator is really the ultimate replacement
for the traversal rules that we saw above. Instead of specifying a
rule or congruence for each constructor, the single application of
the all
operator takes care of traversing all
constructors.
Thus, we can replace the propbu
strategy by a
completely generic definition of bottom-up traversal. Consider again
the last definition of propbu
:
proptr(s) = Not(s) <+ And(s, s) <+ Or(s, s) <+ Impl(s, s) <+ Eq(s, s) propbu(s) = proptr(propbu(s)); s
The role of proptr(s)
in this definition can be
replaced by all(s)
, since that achieves exactly the
same, namely applying s
to the direct subterms of
constructors:
propbu(s) = all(propbu(s)); s
However, the strategy now is completely generic, i.e. independent of
the particular structure it is applied to. In the Stratego Library
this strategy is called bottomup(s)
, and defined as
follows:
bottomup(s) = all(bottomup(s)); s
It first recursively transforms the subterms of the subject term and
then applies s
to the result.
Using this definition, the normalization of propositions now reduces
to the following module, which is only concerned with the selection
and composition of rewrite rules:
module prop-dnf11 imports libstrategolib prop-rules strategies main = io-wrap(dnf) strategies dnf = bottomup(try(DN <+ (DefI <+ DefE <+ DMA <+ DMO <+ DAOL <+ DAOR); dnf)) cnf = bottomup(try(DN <+ (DefI <+ DefE <+ DMA <+ DMO <+ DOAL <+ DOAR); cnf))
In fact, these definitions still contain a reusable pattern. With a little squinting we see that the definitions match the following pattern:
dnf = bottomup(try(dnf-rules; dnf)) cnf = bottomup(try(cnf-rules; cnf))
In which we can recognize the definition of innermost reduction, which the Stratego Library defines as:
innermost(s) = bottomup(try(s; innermost(s)))
The innermost
strategy performs a bottom-up traversal
of a term. After transforming the subterms of a term it tries to
apply the transformation s
. If succesful the result is
recursively transformed with an application of
innermost
.
This brings us to the final form for the proposition normalizations:
module prop-dnf12 imports libstrategolib prop-rules strategies main = io-wrap(dnf) strategies dnf = innermost(DN <+ DefI <+ DefE <+ DMA <+ DMO <+ DAOL <+ DAOR) cnf = innermost(DN <+ DefI <+ DefE <+ DMA <+ DMO <+ DOAL <+ DOAR)
Different transformations can be achieved by using a selection of rules and a strategy, which is generic, yet defined in Stratego itself using strategy combinators.
The one(s)
strategy transforms a constructor
application by applying the parameter strategy s
to
exactly one direct subterm. An application of one(s)
fails if the application to all of the subterms fails.
The following Stratego Shell session illustrates the behaviour of
the combinator:
stratego>
!Plus(Int("14"),Int("3")) Plus(Int("14"),Int("3"))stratego>
one(!Var("a")) Plus(Var("a"),Int("3"))stratego>
one(\ Int(x) -> Int(<addS>(x,"1")) \ ) Plus(Var("a"),Int("4"))stratego>
one(?Plus(_,_)) command failed
A frequently used application of one
is the
oncetd(s)
traversal, which performs a left to right
depth first search/transformation that stops as soon as
s
has been successfuly applied.
oncetd(s) = s <+ one(oncetd(s))
Thus, s
is first applied to the root of the subject
term. If that fails, its direct subterms are searched one by one
(from left to right), with a recursive call to
oncetd(s)
.
An application of oncetd
is the
contains(|t)
strategy, which checks whether the subject
term contains a subterm that is equal to t
.
contains(|t) = oncetd(?t)
Through the depth first search of oncetd
, either an
occurrence of t
is found, or all subterms are verified
to be unequal to t
.
Here are some other one-pass traversals using the one
combinator:
oncebu(s) = one(oncebu(s)) <+ s spinetd(s) = s; try(one(spinetd(s))) spinebu(s) = try(one(spinebu(s))); s
Exercise: figure out what these strategies do.
Here are some fixe-point traversals, i.e., traversals that apply their argument transformation exhaustively to the subject term.
reduce(s) = repeat(rec x(one(x) + s)) outermost(s) = repeat(oncetd(s)) innermostI(s) = repeat(oncebu(s))
The difference is the subterm selection strategy. Exercise: create rewrite rules and terms that demonstrate the differences between these strategies.
The some(s)
strategy transforms a constructor
application by applying the parameter strategy s
to as
many direct subterms as possible and at least one. An application of
some(s)
fails if the application to all of the subterms
fails.
Some one-pass traversals based on some
:
sometd(s) = s <+ some(sometd(s)) somebu(s) = some(somebu(s)) <+ s
A fixed-point traversal with some
:
reduce-par(s) = repeat(rec x(some(x) + s))
Above we have seen the basic mechanisms for defining traversals in Stratego: custom traversal rules, data-type specific congruence operators, and generic traversal operators. Term traversals can be categorized into classes according to how much of the term they traverse and to which parts of the term they modify. We will consider a number of idioms and standard strategies from the Stratego Library that are useful in the definition of traversals.
One class of traversal strategies performs a full
traversal, that is visits and transforms every subterm of
the subject term.
We already saw the bottomup
strategy defined as
bottomup(s) = all(bottomup(s)); s
It first visits the subterms of the subject term, recursively
transforming its subterms, and then applies the
transformation s
to the result.
A related strategy is topdown
, which is defined as
topdown(s) = s; all(topdown(s))
It first transforms the subject therm and then visits the subterms of the result.
A combination of topdown
and bottomup
is
downup
, defined as
downup(s) = s; all(downup(s)); s
It applies s
on the way down the tree, and again on the
way up.
A variation is downup(2,0)
downup(s1, s2) = s1; all(downup(s1, s2)); s2
which applies one strategy on the way down and another on the way up.
Since the parameter strategy is applied at every subterm, these
traversals only succeed if it succeeds everywhere. Therefore, these
traversals are typically applied in combination with
try
or repeat
.
topdown(try(R1 <+ R2 <+ ...))
This has the effect that the rules are tried at each subterm. If none of the rules apply the term is left as it was and traversal continues with its subterms.
Choosing a Strategy. The strategy to be used for a particular transformation depends on the rules and the goal to be achieved.
For example, a constant folding transformation for proposition
formulae can be defined as a bottom-up traversal that tries to apply
one of the truth-rules T
at each subterm:
T : And(True(), x) -> x T : And(x, True()) -> x T : And(False(), x) -> False() T : And(x, False()) -> False() T : Or(True(), x) -> True() T : Or(x, True()) -> True() T : Or(False(), x) -> x T : Or(x, False()) -> x T : Not(False()) -> True() T : Not(True()) -> False() eval = bottomup(try(T))
Bottomup is the strategy of choice here because it evaluates
subterms before attempting to rewrite a term.
An evaluation strategy using topdown
eval2 = topdown(try(T)) // bad strategy
does not work as well, since it attempts to rewrite terms before their subterms have been reduced, thus missing rewriting opportunities. The following Stratego Shell session illustrates this:
stratego>
!And(True(), Not(Or(False(), True()))) And(True,Not(Or(False,True)))stratego>
eval Falsestratego>
!And(True(), Not(Or(False(), True()))) And(True,Not(Or(False,True)))stratego>
eval2 Not(True)
Exercise: find other terms that show the difference between these strategies.
On the other hand, a desugaring transformation for propositions,
which defines implication and equivalence in terms of other
connectives is best defined as a topdown
traversal
which tries to apply one of the rules DefI
or
DefE
at every subterm.
DefI : Impl(x, y) -> Or(Not(x), y) DefE : Eq(x, y) -> And(Impl(x, y), Impl(y, x)) desugar = topdown(try(DefI <+ DefE))
Since DefE
rewrites Eq
terms to terms
involving Impl
, a strategy with bottomup
does not work.
desugar2 = bottomup(try(DefI <+ DefE)) // bad strategy
Since the subterms of a node are traversed
before the node itself is visited, this
transformation misses the desugaring of the implications
(Impl
) originating from the application of the
DefE
rule.
The following Shell session illustrates this:
stratego>
!Eq(Atom("p"), Atom("q")) Eq(Atom("p"),Atom("q"))stratego>
desugar And(Or(Not(Atom("p")),Atom("q")),Or(Not(Atom("q")),Atom("p")))stratego>
!Eq(Atom("p"), Atom("q")) Eq(Atom("p"),Atom("q"))stratego>
desugar2 And(Impl(Atom("p"),Atom("q")),Impl(Atom("q"),Atom("p")))
Repeated Application.
In case one rule produces a term to which another desugaring rule
can be applied, the desugaring strategy should repeat the
application of rules to each subterm.
Consider the following rules and strategy for desugaring
propositional formulae to implicative normal form (using only
implication and False
).
DefT : True() -> Impl(False(), False()) DefN : Not(x) -> Impl(x, False()) DefA2 : And(x, y) -> Not(Impl(x, Not(y))) DefO1 : Or(x, y) -> Impl(Not(x), y) DefE : Eq(x, y) -> And(Impl(x, y), Impl(y, x)) impl-nf = topdown(repeat(DefT <+ DefN <+ DefA2 <+ DefO1 <+ DefE))
Application of the rules with try
instead of
repeat
impl-nf2 = topdown(try(DefT <+ DefN <+ DefA2 <+ DefO1 <+ DefE)) // bad strategy
is not sufficient, as shown by the following Shell session:
stratego>
!And(Atom("p"),Atom("q")) And(Atom("p"),Atom("q"))stratego>
impl-nf Impl(Impl(Atom("p"),Impl(Atom("q"),False)),False)stratego>
!And(Atom("p"),Atom("q")) And(Atom("p"),Atom("q"))stratego>
impl-nf2 Not(Impl(Atom("p"),Impl(Atom("q"),False)))
Note that the Not
is not desugared with
impl-nf2
.
Paramorphism. A variation on bottomup is a traversal that also provides the original term as well as the term in which the direct subterms have been transformed. (Also known as a paramorphism?)
bottomup-para(s) = <s>(<id>, <all(bottomup-para(s))>)
This is most useful in a bottom-up traversal; the original term is always available in a top-down traversal.
Exercise: give an example application of this strategy
Cascading transformations are transformations upon
transformations. While the full traversals discussed above walk over
the tree once, cascading transformations apply multiple `waves' of
transformations to the nodes in the tree.
The prototypical example is the innermost
strategy,
which exhaustively applies a transformation, typically a set of
rules, to a tree.
simplify = innermost(R1 <+ ... <+ Rn)
The basis of innermost
is a bottomup
traversal that tries to apply the transformation at each node after
visiting its subterms.
innermost(s) = bottomup(try(s; innermost(s)))
If the transformation s
succeeds, the result term is
transformed again with a recursive call to innermost
.
Application of innermost
exhaustively applies
one set of rules to a tree.
Using sequential composition we can apply several
stages of reductions.
A special case of such a staged transformation,
is known as sequence of normal forms (in the
TAMPR system):
simplify = innermost(A1 <+ ... <+ Ak) ; innermost(B1 <+ ... <+ Bl) ; ... ; innermost(C1 <+ ... <+ Cm)
At each stage the term is reduced with respect to a different set of rules.
Of course it is possible to mix different types of transformations in such a stage pipeline, for example.
simplify = topdown(try(A1 <+ ... <+ Ak)) ; innermost(B1 <+ ... <+ Bl) ; ... ; bottomup(repeat(C1 <+ ... <+ Cm))
At each stage a different strategy and different set of rules can be used. (Of course one may use the same strategy several times, and some of the rule sets may overlap.)
While completely generic strategies such as bottomup
and innermost
are often useful, there are also
situations where a mixture of generic and data-type specific
traversal is necessary. Fortunately, Stratego allows you to mix
generic traversal operators, congruences, your own traversal and
regular rules, any way you see fit.
A typical pattern for such strategies first tries a number of special cases that deal with traversal themselves. If none of the special cases apply, a generic traversal is used, followed by application of some rules applicable in the general case.
transformation = special-case1 <+ special-case2 <+ special-case3 <+ all(transformation); reduce reduce = ...
Constant Propagation. A typical example is the following constant propagation strategy. It uses the exceptions to the basic generic traversal to traverse the tree in the order of the control-flow of the program that is represented by the term. This program makes use of dynamic rewrite rules, which are used to propagate context-sensitive information through a program. In this case, the context-sensitive information concerns the constant values of some variables in the program, which should be propagated to the uses of those variables. Dynamic rules will be explained in Chapter 20; for now we are mainly concerned with the traversal strategy.
module propconst imports liblib signature constructors Var : String -> Exp Plus : Exp * Exp -> Exp Assign : String * Exp -> Stat If : Exp * Stat * Stat -> Stat While : Exp * Stat -> Stat strategies propconst = PropConst <+ propconst-assign <+ propconst-if <+ propconst-while <+ all(propconst); try(EvalBinOp) EvalBinOp : Plus(Int(i), Int(j)) -> Int(k) where <addS>(i,j) => k EvalIf : If(Int("0"), s1, s2) -> s2 EvalIf : If(Int(i), s1, s2) -> s1 where <not(eq)>(i, "0") propconst-assign = Assign(?x, propconst => e) ; if <is-value> e then rules( PropConst : Var(x) -> e ) else rules( PropConst :- Var(x) ) end propconst-if = If(propconst, id, id) ; (EvalIf; propconst <+ (If(id, propconst, id) /PropConst\ If(id,id,propconst))) propconst-while = While(id,id) ; (/PropConst\* While(propconst, propconst)) is-value = Int(id)
The main strategy of the constant propagation transformation , follows the pattern described
above; a number of special case alternatives followed by a generic
traversal alternative. The special cases are defined in their own
definitions.
Generic traversal is followed by the constant folding rule
EvalBinOp
.
The first special case is an application of the dynamic rule
PropConst
, which replaces a constant valued variable by
its constant value .
This rule is defined by the second special case strategy,
propconst-assign
. It first traverses the
right-hand side of an assignment with an Assign
congruence operator, and a recursive call to
propconst
. Then, if the expression evaluated to a
constant value, a new PropConst
rule is
defined. Otherwise, any old instance of PropConst
for
the left-hand side variable is undefined.
The third special case for If
uses congruence operators
to order the application of propconst
to its subterms
. The first congruence
applies propconst
to the condition expression. Then an
application of the rule EvalIf
attempts to eliminate
one of the branches of the statement, in case the condition
evaluated to a constant value. If that is not possible the branches
are visited by two more congruence operator applications joined by a
dynamic rule intersection operator, which distributes the constant
propagation rules over the branches and merges the rules afterwards,
keeping only the consistent ones.
Something similar happens in the case of While
statements .
For details concerning dynamic rules, see Chapter 20.
To see what propconst
achieves, consider the following
abstract syntax tree (say in file foo.prg
).
Block([ Assign("x", Int("1")), Assign("y", Int("42")), Assign("z", Plus(Var("x"), Var("y"))), If(Plux(Var("a"), Var("z")), Assign("b", Plus(Var("x"), Int("1"))), Block([ Assign("z", Int("17")), Assign("b", Int("2")) ])), Assign("c", Plus(Var("b"), Plus(Var("z"), Var("y")))) ])
We import the module in the Stratego Shell, read the abstract syntax
tree from file, and apply the propconst
transformation
to it:
stratego>
import libstrategolibstratego>
import propconststratego>
<ReadFromFile> "foo.prg" ...stratego>
propconst Block([Assign("x",Int("1")),Assign("y",Int("42")),Assign("z",Int("43")), If(Plux(Var("a"),Int("43")),Assign("b",Int("2")),Block([Assign("z", Int("17")),Assign("b",Int("2"))])),Assign("c",Plus(Int("2"),Plus( Var("z"),Int("42"))))])
Since the Stratego Shell does not (yet) pretty-print terms, the result is rather unreadable. We can remedy this by writing the result of the transformation to a file, and pretty-printing it on the regular command-line with pp-aterm.
stratego>
<ReadFromFile> "foo.prg" ...stratego>
propconst; <WriteToTextFile> ("foo-pc.prg", <id>) ...stratego>
:quit ...$
pp-aterm -i foo-pc.prg Block( [ Assign("x", Int("1")) , Assign("y", Int("42")) , Assign("z", Int("43")) , If( Plux(Var("a"), Int("43")) , Assign("b", Int("2")) , Block( [Assign("z", Int("17")), Assign("b", Int("2"))] ) ) , Assign( "c" , Plus(Int("2"), Plus(Var("z"), Int("42"))) ) ] )
Compare the result to the original program and try to figure out what has happened and why that is correct. (Assuming the `usual' semantics for this type of imperative language.)
Generic Strategies with Exceptional Cases.
Patterns for mixing specific and generic traversal can be captured
in parameterized strategies such as the following. They are
parameterized with the usual transformation parameter s
and with a higher-order strategy operator stop
, which
implements the special cases.
topdownS(s, stop: (a -> a) * b -> b) = rec x(s; (stop(x) <+ all(x))) bottomupS(s, stop: (a -> a) * b -> b) = rec x((stop(x) <+ all(x)); s) downupS(s, stop: (a -> a) * b -> b) = rec x(s; (stop(x) <+ all(x)); s) downupS(s1, s2, stop: (a -> a) * b -> b) = rec x(s1; (stop(x) <+ all(x)); s2)
While normal strategies (parameters) are functions from terms to
terms, the stop
parameter is a function from strategies
to strategies. Such exceptions to the default have to be declared
explicitly using a type annotation.
Note that the bottomupS
strategy is slightly different
from the pattern of the propconst
strategy; instead of
applying s
only after the generic
traversal case, it is here applied in all cases.
However, the added value of these strategies is not very high. The
payoff in the use of generic strategies is provided by the basic
generic traversal operators, which provide generic behaviour for all
constructors. The stop
callback can make it harder to
understand the control-flow structure of a strategy; use with care
and don't overdo it.
While it is possible to construct your own strategies by mixing traversal elements and rules, in general, it is a good idea to try to get a clean separation between pure rewrite rules and a (simple) strategy that applies them.
The full traversals introduced above mostly visit all nodes in the tree. Now we consider traversals that visit only some of the nodes of a tree.
The oncet
and oncebu
strategies apply the
argument strategy s
at one position in the tree. That
is, application is tried at every node along the traversal until it
succeeds.
oncetd(s) = s <+ one(oncetd(s)) oncebu(s) = one(oncebu(s)) <+ s
The sometd
and somebu
strategies are
variations on oncet
and oncebu
that apply
s
at least once at some positions, but possibly many
times. As soon as one is found, searching is stopped, i.e., in the
top-down case searching in subtrees is stopped, in bottom-up case,
searching in upper spine is stopped.
sometd(s) = s <+ some(sometd(s)) somebu(s) = some(somebu(s)) <+ s
Similar strategies that find as many applications as possible, but at
least one, can be built using some
:
manybu(s) = rec x(some(x); try(s) <+ s) manytd(s) = rec x(s; all(try(x)) <+ some(x))
somedownup(s) = rec x(s; all(x); try(s) <+ some(x); try(s))
The alltd(s)
strategy stops as soon as it has found a
subterm to which s
can be succesfully applied.
alltd(s) = s <+ all(alltd(s))
If s
does not succeed, the strategy is applied
recursively at all direct subterms. This means that s
is applied along a frontier of the subject term. This strategy is
typically used in substitution operations in which subterms are
replaced by other terms. For example, the strategy
alltd(?Var(x); !e)
replaces all occurrences of
Var(x)
by e
.
Note that alltd(try(s))
is not a useful strategy.
Since try(s)
succeeds at the root of the term, no
traversal is done.
A typical application of alltd
is the definition of
local transformations, that only apply to some specific subterm.
transformation = alltd( trigger-transformation ; innermost(A1 <+ ... <+ An) )
Some relatives of alltd
that add a strategy to apply on
the way up.
alldownup2(s1, s2) = rec x((s1 <+ all(x)); s2) alltd-fold(s1, s2) = rec x(s1 <+ all(x); s2)
Finally, the following strategies select the leaves of a tree, where the determination of what is a leaf is upto a parameter strategy.
leaves(s, is-leaf, skip: a * (a -> a) -> a) = rec x((is-leaf; s) <+ skip(x) <+ all(x)) leaves(s, is-leaf) = rec x((is-leaf; s) <+ all(x))
A spine of a term is a chain of nodes from the root to some
subterm. spinetd
goes down one spine and applies
s
along the way to each node on the spine. The
traversal stops when s
fails for all children of a
node.
spinetd(s) = s; try(one(spinetd(s))) spinebu(s) = try(one(spinebu(s))); s spinetd'(s) = s; (one(spinetd'(s)) + all(fail)) spinebu'(s) = (one(spinebu'(s)) + all(fail)); s
Apply s
everywhere along al spines where s
applies.
somespinetd(s) = rec x(s; try(some(x))) somespinebu(s) = rec x(try(some(x)); s) spinetd'(s) = rec x(s; (one(x) + all(fail))) spinebu'(s) = rec x((one(x) + all(fail)); s)
While these strategies define the notion of applying along a spine, they are rarely used. In practice one would use more specific traversals with that determine which subterm to include in the search for a path.
TODO: examples
TODO: format checking
TODO: matching of complex patterns
TODO: contextual rules (local traversal)
We have seen that tree traversals can be defined in several ways.
Recursive traversal rules allow finegrained specification of a
traversal, but usually require too much boilerplate code.
Congruence operators provide syntactic sugar for traversal rules
that apply a strategy to each direct subterm of a term.
The generic traversal operators all
, one
,
and some
allow consise, data-type independent
implementation of traversals.
A host of traversal strategies can be obtained by a combination of
the strategy combinators from the previous chapters with these
traversal operators.
Table of Contents
In Chapter 17 we have seen combinators for composing type preserving strategies. That is, structural transformations in which basic transformation rules don't change the type of a term. Such strategies are typically applied in transformations, which change the structure of a term, but not its type. Examples are simplification and optimization. In this chapter we consider the class of type unifying strategies, in which terms of different types are mapped onto one type. The application area for this type of strategy is analysis of expresssions with examples such as free variables collection and call-graph extraction.
We consider the following example problems:
term-size: Count the number of nodes in a term
occurrences: Count number of occurrences of a subterm in a term
collect-vars: Collect all variables in expression
free-vars: Collect all free variables in expression
These problems have in common that they reduce a structure to a single value or to a collection of derived values. The structure of the original term is usually lost.
We start with examining these problems in the context of lists, and then generalize the solutions we find there to arbitrary terms using generic term deconstruction, which allows concise implementation of generic type unifying strategies, similarly to the generic traversal strategies of Chapter 17.
We start with considering type-unifying operations on lists.
Sum.
Reducing a list to a value can be conveniently expressed by means of
a fold, which has as parameters operations for reducing the list
constructors.
The foldr/2
strategy reduces a list by replacing each
Cons
by an application of s2
, and the
empty list by s1
.
foldr(s1, s2) = []; s1 <+ \ [y|ys] -> <s2>(y, <foldr(s1, s2)> ys) \
Thus, when applied to a list with three terms the result is
<foldr(s1,s2)> [t1,t2,t3] => <s2>(t1, <s2>(t2, <s2>(t3, <s1> [])))
A typical application of foldr/2
is sum
,
which reduces a list to the sum of its elements. It sums the
elements of a list of integers, using 0
for the empty
list and add
to combine the head of a list and the
result of folding the tail.
sum = foldr(!0, add)
The effect of sum
is illustrated by the following
application:
<foldr(!0,add)> [1,2,3] => <add>(1, <add>(2, <add>(3, <!0> []))) => 6
Note the build operator for replacing the empty list with
0
; writing foldr(0, add)
would be wrong,
since 0
by itself is a congruence operator, which
basically matches the subject term with the
term 0
(rather than replacing it).
Size.
The foldr/2
strategy does not touch the elements of a
list. The foldr/3
strategy is a combination of fold
and map that extends foldr/2
with a parameter that is
applied to the elements of the list.
foldr(s1, s2, f) = []; s1 <+ \ [y|ys] -> <s2>(<f>y, <foldr(s1,s2,f)>ys) \
Thus, when applying it to a list with three elements, we get:
<foldr(s1,s2)> [t1,t2,t3] => <s2>(<f>t1, <s2>(<f>t2, <s2>(<f>t3, <s1> [])))
Now we can solve our first example problem
term-size
. The size of a list is its
length, which corresponds to the sum of the
list with the elements replaced by 1
.
length = foldr(!0, add, !1)
Number of occurrences.
The number of occurrences in a list of terms that satisfy some
predicate, entails only counting those elements in the list for
which the predicate succeeds. (Where a predicate is implemented with
a strategy that succeeds only for the elements in the domain of the
predicate.)
This follows the same pattern as counting the length of a list, but
now only counting the elements for which s
succeeds.
list-occurrences(s) = foldr(!0, add, s < !1 + !0)
Using list-occurrences
and a match strategy we can
count the number of variables in a list:
list-occurrences(?Var(_))
Collect.
The next problem is to collect all terms for
which a strategy succeeds. We have already seen how to do this for
lists. The filter
strategy reduces a list to the
elements for which its argument strategy succeeds.
filter(s) = [] <+ [s | filter(s)] <+ ?[ |<filter(s)>]
Collecting the variables in a list is a matter of filtering with the
?Var(_)
match.
filter(?Var(_))
The final problem, collecting the free variables in a term, does not really have a counter part in lists, but we can mimick this if we consider having two lists; where the second list is the one with the bound variables that should be excluded.
(filter(?Var(_)),id); diff
This collects the variables in the first list and subtracts the variables in the second list.
We have seen how to do typical analysis transformations on lists. How can we generalize this to arbitrary terms? The general idea of a folding operator is that it replaces the constructors of a data-type by applying a function to combine the reduced arguments of constructor applications. For example, the following definition is a sketch for a fold over abstract syntax trees:
fold-exp(binop, assign, if, ...) = rec f( fold-binop(f, binop) <+ fold-assign(f, assign) <+ fold-if(f, if) <+ ... ) fold-binop(f, s) : BinOp(op, e1, e2) -> <s>(op, <f>e1, <f>e2) fold-assign(f, s) : Assign(e1, e2) -> <s>(<f>e1, <f>e2) fold-if(f, s) : If(e1, e2, e3) -> <s>(<f>e1, <f>e2, <f>e3)
For each constructor of the data-type the fold has an argument strategy and a rule that matches applications of the constructor, which it replaces with an application of the strategy to the tuple of subterms reduced by a recursive invocation of the fold.
Instantation of this strategy requires a rule for each constructor
of the data-type. For instance, the following instantiation defines
term-size
using fold-exp
by providing
rules that sum up the sizes of the subterms and add one
(inc
) to account for the node itself.
term-size = fold-exp(BinOpSize, AssignSize, IfSize, ...) BinOpSize : (Plus(), e1, e2) -> <add; inc>(e1, e2) AssignSize : (e1, e2) -> <add; inc>(e1, e2) IfSize : (e1, e2, e3) -> <add; inc>(e1, <add>(e2, e3))
This looks suspiciously like the traversal rules in Chapter 17. Defining folds in this manner has several limitations. In the definition of fold, one parameter for each constructor is provided and traversal is defined explicitly for each constructor. Furthermore, in the instantiation of fold, one rule for each constructor is needed, and the default behaviour is not generically specified.
One solution would be to use the generic traversal strategy
bottomup
to deal with fold:
fold-exp(s) = bottomup(s) term-size = fold-exp(BinOpSize <+ AssignSize <+ IfSize <+ ...) BinOpSize : BinOp(Plus(), e1, e2) -> <add>(1, <add>(e1, e2)) AssignSize : Assign(e1, e2) -> <add>(e1, e2) IfSize : If(e1, e2, e3) -> <add>(e1, <add>(e2, e3))
Although the recursive application to subterms is now defined generically , one still has to specify rules for the default behaviour.
Instead of having folding rules that are specific to a data type, such as
BinOpSize : BinOp(op, e1, e2) -> <add>(1, <add>(e1, e2)) AssignSize : Assign(e1, e2) -> <add>(1, <add>(e1, e2))
we would like to have a generic definition of the form
CSize : c(e1, e2, ...) -> <add>(e1, <add>(e2, ...))
This requires generic decomposition of a constructor application
into its constructor and the list with children. This can be done
using the #
operator. The match strategy
?p1#(p2)
decomposes a constructor application into its
constructor name and the list of direct subterms. Matching such a
pattern against a term of the form C(t1,...,tn)
results
in a match of "C"
against p1
and a match
of [t1,...,tn]
against p2
.
Plus(Int("1"), Var("2"))stratego>
?c#(xs)stratego>
:binding c variable c bound to "Plus"stratego>
:binding xs variable xs bound to [Int("1"), Var("2")]
Crush.
Using generic term deconstruction we can now generalize the type
unifying operations on lists to arbitrary terms.
In analogy with the generic traversal operators we need a generic
one-level reduction operator.
The crush/3
strategy reduces a constructor application
by folding the list of its subterms using foldr/3
.
crush(nul, sum, s) : c#(xs) -> <foldr(nul, sum, s)> xs
Thus, crush
performs a fold-map over the direct
subterms of a term. The following application illustrates what
<crush(s1, s2, f)> C(t1, t2) => <s2>(<f>t1, <s2>(<f>t2, <s1>[]))
The following Shell session instantiates this application in two ways:
stratego>
import libstrategolibstratego>
!Plus(Int("1"), Var("2")) Plus(Int("1"),Var("2"))stratego>
crush(id, id, id) (Int("1"),(Var("2"),[]))stratego>
!Plus(Int("1"), Var("2")) Plus(Int("1"),Var("2"))stratego>
crush(!Tail(<id>), !Sum(<Fst>,<Snd>), !Arg(<id>)) Sum(Arg(Int("1")),Sum(Arg(Var("2")),Tail([])))
The crush
strategy is the tool we need to implement
solutions for the example problems above.
Size.
Counting the number of direct subterms of a term is similar to
counting the number of elements of a list. The definition of
node-size
is the same as the definition of
length
, except that it uses crush
instead
of foldr
:
node-size = crush(!0, add, !1)
Counting the number of subterms (nodes) in a term is a similar problem. But, instead of counting each direct subterm as 1, we need to count its subterms.
term-size = crush(!1, add, term-size)
The term-size
strategy achieves this simply with a
recursive call to itself.
stratego>
<node-size> Plus(Int("1"), Var("2")) 2stratego>
<term-size> Plus(Int("1"), Var("2")) 5
Occurrences.
Counting the number of occurrences of a certain term in another
term, or more generally, counting the number of subterms that
satisfy some predicate is similar to counting the term size.
However, only those terms satisfying the predicate should be
counted.
The solution is again similar to the solution for lists, but now
using crush
.
om-occurrences(s) = s < !1 + crush(!0, add, om-occurrences(s))
The om-occurrences
strategy counts the
outermost subterms satisfying
s
. That is, the strategy stops counting as soon as it
finds a subterm for which s
succeeds.
The following strategy counts all occurrences:
occurrences(s) = <add>(<s < !1 + !0>, <crush(!0, add, occurrences(s))>)
It counts the current term if it satisfies s
and adds
that to the occurrences in the subterms.
stratego>
<om-occurrences(?Int(_))> Plus(Int("1"), Plus(Int("34"), Var("2"))) 2stratego>
<om-occurrences(?Plus(_,_))> Plus(Int("1"), Plus(Int("34"), Var("2"))) 1stratego>
<occurrences(?Plus(_,_))> Plus(Int("1"), Plus(Int("34"), Var("2"))) 2
Collect.
Collecting the subterms that satisfy a
predicate is similar to counting, but now a
list of subterms is produced.
The collect(s)
strategy collects all
outermost occurrences satisfying
s
.
collect(s) = ![<s>] <+ crush(![], union, collect(s))
When encountering a subterm for which s
succeeds, a
singleton list is produced. For other terms, the matching subterms
are collected for each direct subterm, and the resulting lists are
combined with union
to remove duplicates.
A typical application of collect
is the collection of
all variables in an expression, which can be defined as follows:
get-vars = collect(?Var(_))
Applying get-vars
to an expression AST produces the
list of all subterms matching Var(_)
.
The collect-all(s)
strategy collects
all occurrences satisfying s
.
collect-all(s) = ![<s> | <crush(![], union, collect(s))>] <+ crush(![], union, collect(s))
If s
succeeds for the subject term combines the subject
term with the collected terms from the subterms.
Free Variables. Collecting the variables in an expression is easy, as we saw above. However, when dealing with languages with variable bindings, a common operation is to extract only the free variables in an expression or block of statements. That is, the occurrences of variables that are not bound by a variable declaration. For example, in the expression
x + let var y := x + 1 in f(y, a + x + b) end
the free variables are {x, a, b}
, but not
y
, since it is bound by the declaration in the let.
Similarly, in the function definition
function f(x : int) = let var y := h(x) in x + g(z) * y end
the only free variable is z
since x
and
y
are declared.
Here is a free variable extraction strategy for Tiger expressions.
It follows a similar pattern of mixing generic and data-type specific
operations as we saw in Chapter 17.
The crush
alternative takes care of the non-special
constructors, while ExpVars
and FreeVars
deal with the special cases, i.e. variables and variable binding
constructs:
free-vars = ExpVars <+ FreeVars(free-vars) <+ crush(![], union, free-vars) ExpVars : Var(x) -> [x] FreeVars(fv) : Let([VarDec(x, t, e1)], e2) -> <union>(<fv> e1, <diff>(<fv> e2, [x])) FreeVars(fv) : Let([FunctionDec(fdecs)], e2) -> <diff>(<union>(<fv> fdecs, <fv>e2), fs) where <map(?FunDec(<id>,_,_,_))> fdecs => fs FreeVars(fv) : FunDec(f, xs, t, e) -> <diff>(<fv>e, xs) where <map(Fst)> xs => xs
The FreeVars
rules for binding constructs use their
fv
parameter to recursively get the free variables from
subterms, and they subtract the bound variables from any free
variables found using diff
.
We can even capture the pattern exhibited here in a generic collection algorithm with support for special cases:
collect-exc(base, special : (a -> b) * a -> b) = base <+ special(collect-exc(base, special)) <+ crush(![], union, collect-exc(base, special))
The special
parameter is a strategy parameterized with
a recursive call to the collection strategy.
The original definition of free-vars
above, can now be
replaced with
free-vars = collect-exc(ExpVars, FreeVars)
It can also be useful to construct terms
generically. For example, in parse tree implosion, application nodes
should be reduced to constructor applications. Hence build operators
can also use the #
operator. In a strategy
!p1#(p2)
, the current subject term is replaced by a
constructor application, where the constructor name is provided by
p1
and the list of subterms by p2
. So, if
p1
evaluates to "C"
and p2
evaluates to [t1,...,tn]
, the expression
!p1#(p2)
build the term C(t1,...,tn)
.
Imploding Parse Trees. A typical application of generic term construction is the implosion of parse trees to abstract syntax trees performed by implode-asfix. Parse trees produced by sglr have the form:
appl(prod(sorts, sort, attrs([cons("C")])),[t1,...,tn])
That is, a node in a parse tree consists of an encoding of the
original production from the syntax definition, and a list with
subtrees. The production includes a constructor annotation
cons("C")
with the name of the abstract syntax tree
constructor. Such a tree node should be imploded to an abstract
syntax tree node of the form C(t1,...,tn)
.
Thus, this requires the construction of a term with constructor
C
given the string with its name.
The following implosion strategy achieves this using generic term
construction:
implode = appl(id, map(implode)); Implode Implode : appl(prod(sorts, sort, attrs([cons(c)])), ts) -> c#(ts)
The Implode
rule rewrites an appl
term to
a constructor application, by extracing the constructor name from
the production and then using generic term construction to apply the
constructor.
Note that this is a gross over simplification of the actual implementation of implode-asfix. See the source code for the full strategy.
Generic term construction and deconstruction support the definition of generic analysis and generic translation problems. The generic solutions for the example problems term size, number of occurrences, and subterm collection demonstrate the general approach to solving these types of problems.
Table of Contents
Stratego programs can be used to analyze, generate, and transform object programs. In this process object programs are structured data represented by terms. Terms support the easy composition and decomposition of abstract syntax trees. For applications such as compilers, programming with abstract syntax is adequate; only small fragments, i.e., a few constructors per pattern, are manipulated at a time. Often, object programs are reduced to a core language that only contains the essential constructs. The abstract syntax can then be used as an intermediate language, such that multiple languages can be expressed in it, and meta-programs can be reused for several source languages.
However, there are many applications of program transformation in which the use of abstract syntax is not adequate since the distance between the concrete programs that we understand and the abstract syntax trees used in specifications is too large. Even with pattern matching on algebraic data types, the construction of large code fragments in a program generator can become painful. For example, even the following tiny program pattern is easier to read in the concrete variant
let d* in let var x ta := (e1*) in e2* end end
than the abstract variant
Let(d*, [Let([VarDec(x, ta, Seq(e1*))], e2*)])
While abstract syntax is manageable for fragments of this size (and sometimes even more concise!), it becomes unpleasant to use when larger fragments need to be specified.
Besides the problems of understandability and complexity, there are other reasons why the use of abstract syntax may be undesirable. Desugaring to a core language is not always possible. For example, in the renovation of legacy code the goal is to repair the bugs in a program, but leave it intact otherwise. This entails that a much larger abstract syntax needs to be dealt with. Another occasion that calls for the use of concrete syntax is the definition of transformation or generation rules by users (programmers) rather than by compiler writers (meta-programmers). Other application areas that require concrete syntax are application generation and structured document (XML) processing.
Hence, it is desirable to have a meta-language that lets us write object-program fragments in the concrete syntax of the object language. This requires the extension of the meta-language with the syntax of the object language of interest, such that expressions in that language are interpreted as terms. In this chapter it is shown how the Stratego language based on abstract syntax terms is extended to support the use of concrete object syntax for terms.
To appreciate the need for concrete syntax in program transformation, it is illuminating to constrast the use of concrete syntax with the traditional use of abstract syntax in a larger example. Program instrumentation is the extension of a program in a systematic way in order to obtain measurements during run-time. Instrumentation is used, for example, in debugging to get information about the run-time behaviour of a program, and in profiling to collect statistics about about run-time and call frequency of program elements. Here we consider a simple instrumentation scheme that instruments Tiger functions with calls to trace functions.
The following Stratego fragment shows rewrite rules that instrument
a function f
such that it prints f entry
on entry of the function and f exit
at the exit. The
actual printing is delegated to the functions enterfun
and exitfun
. Functions are instrumented differently
than procedures, since the body of a function is an expression
statement and the return value is the value of the expression. It is
not possible to just glue a print statement or function call at the
end of the body. Therefore, a let expression is introduced, which
introduces a temporary variable to which the body expression of the
function is assigned. The code for the functions
enterfun
and exitfun
is generated by rule
IntroducePrinters
. Note that the declarations of the
Let
generated by that rule have been omitted.
instrument = topdown(try(TraceProcedure + TraceFunction)) ; IntroducePrinters ; simplify TraceProcedure : FunDec(f, x*, NoTp, e) -> FunDec(f, x*, NoTp, Seq([Call(Var("enterfun"),[String(f)]), e, Call(Var("exitfun"),[String(f)])])) TraceFunction : FunDec(f, x*, Tp(tid), e) -> FunDec(f, x*, Tp(tid), Seq([Call(Var("enterfun"),[String(f)]), Let([VarDec(x,Tp(tid),NilExp)], [Assign(Var(x), e), Call(Var("exitfun"),[String(f)]), Var(x)])])) IntroducePrinters : e -> Let(..., e)
The next program program fragment implements the same instrumentation transformation, but now it uses concrete syntax.
TraceProcedure : |[ function f(x*) = e ]| -> |[ function f(x*) = (enterfun(s); e; exitfun(s)) ]| where !f => s TraceFunction : |[ function f(x*) : tid = e ]| -> |[ function f(x*) : tid = (enterfun(s); let var x : tid in x := e; exitfun(s); x end) ]| where new => x ; !f => s IntroducePrinters : e -> |[ let var ind := 0 function enterfun(name : string) = ( ind := +(ind, 1); for i := 2 to ind do print(" "); print(name); print(" entry\\n")) function exitfun(name : string) = ( for i := 2 to ind do print(" "); ind := -(ind, 1); print(name); print(" exit\\n")) in e end ]|
It is clear that the concrete syntax version is much more concise
and easier to read. This is partly due to the fact that the concrete
version is shorter than the abstract version: 225 bytes vs 320 bytes
after eliminating all non-significant whitespace. However, the
concrete version does not use much fewer lines. A more important
reason for the increased understandability is that in order to read
the concrete version it is not necessary to mentally translate the
abstract syntax constructors into concrete ones. The implementation
of IntroducePrinters
is only shown in concrete syntax
since its encoding in abstract syntax leads to unreadable code for
code fragments of this size.
Note that these rewrite rules cannot be applied as such using simple innermost rewriting. After instrumenting a function declaration, it is still a function declaration and can thus be instrumented again. Therefore, we use a single pass topdown strategy for applying the rules.
The example gives rise to several observations. The concrete syntax
version can be read without knowledge of the abstract syntax. On
the other hand, the abstract syntax version makes the tree structure
of the expressions explicit. The abstract syntax version is much
more verbose and is harder to read and write. Especially the
definition of large code fragments such as in rule
IntroducePrinters
is unattractive in abstract syntax.
The abstract syntax version implements the concrete syntax version. The concrete syntax version has all properties of the abstract syntax version: pattern matching, term structure, can be traversed, etc.. In short, the concrete syntax is just syntactic sugar for the abstract syntax.
Extension of the Meta Language. The instrumentation rules make use of the concrete syntax of Tiger. However, program transformation should not be restricted to transformation of Tiger programs. Rather we would like to be able to handle arbitrary object languages. Thus, the object language or object languages that are used in a module should be a parameter to the compiler. The specification of instrumentation is based on the real syntax of Tiger, not on some combinators or infix expressions. This entails that the syntax of Stratego should be extended with the syntax of Tiger.
Meta-Variables. The patterns in the transformation rules are not just fragments of Tiger programs. Rather some elements of these fragments are considered as meta-variables. For example in the term
|[ function f(x*) = e ]|
the identifiers f
, x*
, and e
are not intended to be Tiger variables, but rather meta-variables,
i.e., variables at the level of the Stratego specification, ranging
over identifiers, lists of function arguments, and expresssions,
respectively.
Antiquotation.
Instead of indicating meta-variables implicitly we could opt for an
antiquotation mechanism that lets us splice in meta-level
expressions into a concrete syntax fragment. For example, using
~
and ~*
as antiquotation operators, a
variant of rule TraceProcedure
becomes:
TraceProcedure : |[ function ~f(~* x*) = ~e ]| -> |[ function ~f(~* x*) = (enterfun(~String(f)); ~e; exitfun(~String(f))) ]|
With such antiquotation operators it becomes possible to directly embed meta-level computations that produce a piece of code within a syntax fragment.
In the previous section we have seen how the extension of Stratego with concrete syntax for terms improves the readability of meta-programs. In this section we describe the techniques used to achieve this extension.
To embed the syntax of an object language in the meta language the
syntax definitions of the two languages should be combined and the
object language sorts should be injected into the appropriate meta
language sorts. In the Stratego setting this is achieved as follows.
The syntax of a Stratego module m
is declared in the
m.meta
file, which declares the name of an SDF
module. For instance, for modules using Tiger concrete syntax, i.e.,
using the extension of Stratego with Tiger, the .meta
would contain
Meta([Syntax("StrategoTiger")])
thus declaring SDF module StrategoTiger.sdf
as defining
the extension.
The SDF module combines the syntax of Stratego and the syntax of the object language(s) by importing the appropriate SDF modules. The syntax definition of Stratego is provided by the compiler. The syntax definition of the object language is provided by the user. For example, the following SDF module shows a fragment of the syntax of Stratego:
module Stratego-Terms exports context-free syntax Int -> Term {cons("Int")} String -> Term {cons("Str")} Var -> Term {cons("Var")} Id "(" {Term ","}* ")" -> Term {cons("Op")} Term "->" Term -> Rule {cons("RuleNoCond")} Term "->" Term "where" Strategy -> Rule {cons("Rule")}
The following SDF module StrategoTiger
, defines
the extension of Stratego with Tiger as object language.
module StrategoTiger imports Stratego [ Term => StrategoTerm Var => StrategoVar Id => StrategoId StrChar => StrategoStrChar ] imports Tiger Tiger-Variables exports context-free syntax "|[" Dec "]|" -> StrategoTerm {cons("ToTerm"),prefer} "|[" TypeDec "]|" -> StrategoTerm {cons("ToTerm"),prefer} "|[" FunDec "]|" -> StrategoTerm {cons("ToTerm"),prefer} "|[" Exp "]|" -> StrategoTerm {cons("ToTerm"),prefer} "~" StrategoTerm -> Exp {cons("FromTerm"),prefer} "~*" StrategoTerm -> {Exp ","}+ {cons("FromTerm")} "~*" StrategoTerm -> {Exp ";"}+ {cons("FromTerm")} "~int:" StrategoTerm -> IntConst {cons("FromTerm")}
The module illustrates several remarkable aspects of the embedding of object languages in meta languages using SDF.
A combined syntax definition is created by just importing appropriate syntax definitions. This is possible since SDF is a modular syntax definition formalism. This is a rather unique feature of SDF and essential to this kind of language extension. Since only the full class of context-free grammars, and not any of its subclasses such as LL or LR, are closed under composition, modularity of syntax definitions requires support from a generalized parsing technique. SDF2 employs scannerless generalized-LR parsing.
The syntax definitions for two languages may partially overlap,
e.g., define the same sorts. SDF2 supports renaming of sorts to
avoid name clashes and ambiguities resulting from them. In the
StrategoTiger module several sorts from the Stratego syntax
definition (Term
, Id
, Var
,
and StrChar
) are renamed since the Tiger definition
also defines these names. In practice, instead of doing this
renaming for each language extension, module
StrategoRenamed
provides a syntax definition of
Stratego in which all sorts have been renamed.
The embedding of object language expressions in the meta-language is implemented by adding appropriate injections to the combined syntax definition. For example, the production
"|[" Exp "]|" -> StrategoTerm {cons("ToTerm"),prefer}
declares that a Tiger expression (Exp
) between
|[
and ]|
can be used everywhere where a
StrategoTerm
can be used. Furthermore, abstract syntax
expressions (including meta-level computations) can be spliced into
concrete syntax expressions using the ~
splice
operators. To distinguish a term that should be interpreted as a
list from a term that should be interpreted as a list
element, the convention is to use a
~*
operator for splicing a list.
The declaration of these injections can be automated by generating an appropriate production for each sort as a transformation on the SDF definition of the object language. It is, however, useful that the embedding can be programmed by the meta-programmer to have full control over the selection of the sorts to be injected, and the syntax used for the injections.
Using the injection of meta-language StrategoTerm
s into
object language Exp
ressions it is possible to
distinguish meta-variables from object language identifiers. Thus,
in the term |[ var ~x := ~e]|
, the expressions
~x
and ~e
indicate meta-level terms, and
hence x
and e
are meta-level variables.
However, it is attractive to write object patterns with as few squigles as possible. This can be achieved through another feature of SDF, i.e., variable declarations. The following SDF module declares syntax schemata for meta variables.
module Tiger-Variables exports variables [s][0-9]* -> StrConst {prefer} [xyzfgh][0-9]* -> Id {prefer} [e][0-9]* -> Exp {prefer} "ta"[0-9]* -> TypeAn {prefer} "x"[0-9]* "*" -> {FArg ","}+ {prefer} "d"[0-9]* "*" -> Dec+ {prefer} "e"[0-9]* "*" -> {Exp ";"}+ {prefer}
According to this declaration x
, y
, and
g10
are meta-variables for identifiers and
e
, e1
, and e1023
are
meta-variables of sort Exp
. The last three productions
declare variables over lists using the convention that these are
distinquished from other variables with an asterisk. Thus,
x*
and x1*
range over lists of function
arguments. The prefer attribute ensures that these identifiers are
preferred over normal Tiger identifiers.
Parsing a module according to the combined syntax and mapping the parse tree to abstract syntax results in an abstract syntax tree that contains a mixture of meta- and object-language abstract syntax. Since the meta-language compiler only deals with meta-language abstract syntax, the embedded object-language abstract syntax needs to be expressed in terms of meta abstract syntax. For example, parsing the following Stratego rule
|[ x := let d* in ~* e* end ]| -> |[ let d* in x := (~* e*) end ]|
with embedded Tiger expressions, results in the abstract syntax tree
Rule(ToTerm(Assign(Var(meta-var("x")), Let(meta-var("d*"),FromTerm(Var("e*"))))), ToTerm(Let(meta-var("d*"), [Assign(Var(meta-var("x")), Seq(FromTerm(Var("e*"))))])))
containing Tiger abstract syntax constructors (e.g.,
Let
, Var
, Assign
) and
meta-variables (meta-var
). The transition from meta
language to object language is marked by the ToTerm
constructor, while the transition from meta-language to
object-language is marked by the constructor FromTerm
.
Such mixed abstract syntax trees can be normalized by `exploding' all embedded abstract syntax to meta-language abstract syntax. Thus, the above tree should be exploded to the following pure Stratego abstract syntax:
Rule(Op("Assign",[Op("Var",[Var("x")]), Op("Let",[Var("d*"),Var("e*")])]), Op("Let",[Var("d*"), Op("Cons",[Op("Assign",[Op("Var",[Var("x")]), Op("Seq",[Var("e*")])]), Op("Nil",[])])]))
Observe that in this explosion all embedded constructor applications
have been translated to the form Op(C,[t1,...,tn])
. For
example, the Tiger `variable' constructor Var(_)
becomes Op("Var",[_])
, while the Stratego meta-variable
Var("e*")
remains untouched, and meta-var
s
become Stratego Var
s. Also note how the list in the
second argument of the second Let
is exploded to a
Cons
/Nil
list.
The resulting term corresponds to the abstract syntax for the rule
Assign(Var(x),Let(d*,e*)) -> Let(d*,[Assign(Var(x),Seq(e*))])
written with abstract syntax notations for terms.
The explosion of embedded abstract syntax does not depend on the
object language; it can be expressed generically, provided that
embeddings are indicated with the FromTerm
constructor.
Disambiguating Quotations. Sometimes the fragments used within quotations are too small for the parser to be able to disambiguate them. In those cases it is useful to have alternative versions of the quotation operators that make the sort of the fragment explicit. A useful, although somewhat verbose, convention is to use the sort of the fragment as operator:
"exp" "|[" Exp "]|" -> StrategoTerm {cons("ToTerm")}
Other Quotation Conventions.
The convention of using |[...]|
and ~
as
quotation and anti-quotation delimiters is inspired by the notation
used in texts about semantics. It really depends on the application,
the languages involved, and the `audience' what kind of delimiters
are most appropriate.
The following notation was inspired by active web pages is
developed. For instance, the following quotation
%>...<%
and antiquotation <%...%>
delimiters are defined for use of XML in Stratego programs:
context-free syntax "%>" Content "<%" -> StrategoTerm {cons("ToTerm"),prefer} "<%=" StrategoTerm "%>" -> Content {cons("FromTerm")} "<%" StrategoStrategy "%>" -> Content {cons("FromApp")}
Desugaring Patterns. Some meta-programs first desugar a program before transforming it further. This reduces the number of constructs and shapes a program can have. For example, the Tiger binary operators are desugared to prefix form:
DefTimes : |[ e1 * e2 ]| -> |[ *(e1, e2) ]| DefPlus : |[ e1 + e2 ]| -> |[ +(e1, e2) ]|
or in abstract syntax
DefPlus : Plus(e1, e2) -> BinOp(PLUS, e1, e2)
This makes it easy to write generic transformations for binary operators. However, all subsequent transformations on binary operators should then be done on these prefix forms, instead of on the usual infix form. However, users/meta-programmers think in terms of the infix operators and would like to write rules such as
Simplify : |[ e + 0 ]| -> |[ e ]|
However, this rule will not match since the term to which it is applied has been desugared. Thus, it might be desirable to desugar embedded abstract syntax with the same rules with which programs are desugared. This phenomenon occurs in many forms ranging from removing parentheses and generalizing binary operators as above, to decorating abstract syntax trees with information resulting from static analysis such as type checking.
This chapter is work in progress. Not all parts have been finished yet. The latest revision of this manual may contain more material. Refer to the online version.
In the previous chapters we have shown how programmable rewriting strategies can provide control over the application of transformation rules, thus addresing the problems of confluence and termination of rewrite systems. Another problem of pure rewriting is the context-free nature of rewrite rules. A rule has access only to the term it is transforming. However, transformation problems are often context-sensitive. For example, when inlining a function at a call site, the call is replaced by the body of the function in which the actual parameters have been substituted for the formal parameters. This requires that the formal parameters and the body of the function are known at the call site, but these are only available higher-up in the syntax tree. There are many similar problems in program transformation, including bound variable renaming, typechecking, data flow transformations such as constant propagation, common-subexpression elimination, and dead code elimination. Although the basic transformations in all these applications can be expressed by means of rewrite rules, these require contextual information.
In Stratego context-sensitive rewriting can be achieved without the added complexity of local traversals and without complex data structures, by the extension of rewriting strategies with scoped dynamic rewrite rules. Dynamic rules are otherwise normal rewrite rules that are defined at run-time and that inherit information from their definition context. As an example, consider the following strategy definition as part of an inlining transformation:
DefineUnfoldCall = ?|[ function f(x) = e1 ]| ; rules( UnfoldCall : |[ f(e2 ) ]| -> |[ let var x := e2 in e1 end ]| )
The strategy DefineUnfoldCall
matches a function
definition and defines the rewrite rule UnfoldCall
,
which rewrites a call to the specific function f , as encountered in
the definition, to a let expression binding the formal parameter
x
to the actual parameter e2
in the body
of the function e1
. Note that the variables
f
, x
, and e1
are bound in
the definition context of UnfoldCall
. The
UnfoldCall
rule thus defined at the function definition
site, can be used at all function call sites. The storage and
retrieval of the context information is handled transparently by the
underlying language implementation and is of no concern to the
programmer.
An overview with semantics and examples of dynamic rewrite rules in Stratego is available in the following publications:
M. Bravenboer, A. van Dam, K. Olmos, and E. Visser. Program Transformation with Scoped Dynamic Rewrite Rules. Fundamenta Informaticae, 69:1--56, 2005.
An extended version is available as technical report UU-CS-2005-005.
K. Olmos and E. Visser. Composing Source-to-Source Data-Flow Transformations with Rewriting Strategies and Dependent Dynamic Rewrite Rules. In R. Bodik, editor, 14th International Conference on Compiler Construction (CC'05), volume 3443 of Lecture Notes in Computer Science, pages 204--220. Springer-Verlag, April 2005.
An extended version is available as technical report UU-CS-2005-006
Since these publications provide a fairly complete and up-to-date picture of dynamic rules, incorporation into this manual is not as urgent as other parts of the language.