
RECODER Technical Manual

This manual describes the core functional system of RECODER v0.7. Potential users should read this manual in order to understand the architecture of the system.

The RECODER libraries provide you with a powerful platform for all kinds of Java source-to-source transformations. However, the use of such a complex system requires quite some knowledge in the fields of programming languages and compiler technology. You have been warned...


Example
This manual contains small examples illustrating frequently occurring tasks.

Last changes to this document: Apr 30th 2001
Author: Andreas Ludwig

Introduction

RECODER is a facility to support static meta programming of Java program sources. The system can parse and analyze Java programs, transform the sources, and write the results back in source code form.

To do so, RECODER derives a meta model of the entities encountered in Java source code and class files. This model contains a detailed syntactic program model that can be unparsed with only minimal losses. While the syntactic model provides only the containment relation between elements, the complete model adds further relations, such as type-of, or refers-to, as well as some implicitly defined elements, such as packages or primitive types.

One might insist that only the derived entities and relations belong to the meta level of the model, but to simplify we treat the syntactic and derived elements as parts of one model only, which we will call meta model, or program model.

In order to derive this semantic information, RECODER runs a type and name analysis which resolves references to logical entities. The refers-to relation can be made bidirectional for full cross referencing which is necessary for efficient global transformations.

When RECODER is used for analysis only, as in software metric tools, it builds up the meta model and does not change anything. Static metaprograms, in contrast, use the meta model information to control transformations of the source code model, which in turn may change further parts of the model. Metaprogram applications use the RECODER pretty printer to reproduce the source files afterwards. The pretty printer attempts to retain the code formatting and to integrate new code fragments seamlessly.

Currently, RECODER has several restrictions, none of which is fundamental, but they are important to know:


Subsystem Responsibilities

The core of RECODER is its program model. The model itself can be regarded as residing in a database. RECODER offers a series of service modules that build and update the model automatically. Users will extend the metaprogramming library and access the services as well as the modules. The figure shows how the different subsystems interact and how they influence the model.


Package Responsibilities

recoder               service configurations and base types
recoder.abstraction   interface-level semantic model
recoder.bytecode      java byte code elements
recoder.io            repository and io services
recoder.java          java source elements
recoder.kit           metaprogramming library
recoder.service       program analysis services
recoder.convenience   auxiliaries specific to RECODER
recoder.list          type-safe lists
recoder.parser*       the generated Java parser
recoder.util*         auxiliaries not specific to RECODER

For quick navigation in the API, the table on the right hand side shows the contents of the most important RECODER top-level packages.

This manual does not cover the packages marked with a *. These packages are subject to further documentation, primarily contained in the API documentation.

The following sections describe the core model elements and services. The appendix of this document describes the list data structures and some other important auxiliaries.


The Program Model

The Base Elements

The figure shows the base elements of the RECODER meta model, some sample elements and services that deal with the interface-level abstractions in different representations.

The topmost type for semantic model data is recoder.ModelElement which can represent arbitrary model data such as the majority of syntax nodes, annotations, or syntactic footprints of design patterns. recoder.NamedModelElements are model elements that feature a meaningful name for each instance.

The topmost type for syntactic Java source data is recoder.java.SourceElement representing any node in a syntax tree such as comments, which do not necessarily carry semantic information. Byte code elements are also syntactic. They are subtypes of recoder.bytecode.ByteCodeElement.

recoder.abstraction.ProgramModelElements are part of the abstract model and represent entities visible at the interface level (hence, they are all NamedModelElements). Currently, all ByteCodeElements covered by RECODER are ProgramModelElements, but not all ProgramElements (a Plus operator is not contained in the abstract model).

While the meta model is central to RECODER, the abstract interface-level model is central to the meta model, and will be detailed in the following section.

The Abstract Model

The core part of the RECODER meta model is located in recoder.abstraction and primarily consists of entities that occur in an API documentation: Types, Variables, Methods, Packages, with some additional abstractions such as Member or ClassTypeContainer. All of these entities inherit from ProgramModelElement.

While many ProgramModelElements have a syntactic representation, the recoder.abstraction package also contains entities that have no syntactic representation at all, but are implicitly defined. Examples are ArrayType, DefaultConstructor, or the aforementioned Package.

The figure shows the elements of this abstract model and their associations.

Element Representations

Program model elements can have different data layouts and hence require different access schemes. RECODER therefore provides a set of modules that handle the accesses and rules for each kind of representation.

Each program model element has an instance of a ProgramModelInfo service assigned that fits to the representation of that element. The service knows how to access and interpret the data layout of those elements. A program model element provides an interface for certain queries, and most of them will be passed to the corresponding service, which happens entirely in the background. Further queries of the ProgramModelInfo deal with type widening, subtype relations, member visibilities and overloaded methods.

In most cases, querying the program model elements directly will be completely sufficient for users. However, there are more model elements than subtypes of program model elements only, which basically cover the interface level. For instance, the SourceInfo service as an extension of ProgramModelInfo can compute the Type of any Expression, which is a syntactic model not contained in the program model subhierarchy. The SourceInfo covers most of the model queries that a typical user will need.

The figure illustrates the different information services and the core interfaces for each representation. Different representations of ClassType are shown as an example.

In addition to the services which derive information about model elements there are repositories for each type of element representation: The recoder.io.SourceFileRepository can read, store, and write recoder.java.CompilationUnits, while the recoder.io.ClassFileRepository can read and store recoder.bytecode.ClassFiles. The repositories can be used to find an entry point to the meta model by means of a physical file name.

Implicit elements such as array types or packages are registered at, or can be created by, the recoder.service.NameInfo service. This service also knows all class types by their logical name, and is similar to the definition table of a common compiler. The NameInfo is important for collating references of the form "a.b.c" and can also be used to find an entry point to the meta model by means of logical names.

The following table lists all relevant element types, the proper information and repository services.

Representation      Element Supertype                        Dedicated Info Service               Dedicated Repository
(Abstract)          recoder.abstraction.ProgramModelElement  recoder.service.ProgramModelInfo     -
Source Code         recoder.java.ProgramElement              recoder.service.SourceInfo           recoder.io.SourceFileRepository
Byte Code           recoder.bytecode.ByteCodeElement         recoder.service.ByteCodeInfo         recoder.io.ClassFileRepository
Implicitly Defined  recoder.abstraction.NullType, Package,   recoder.service.ImplicitElementInfo  recoder.service.ImplicitElementInfo,
                    DefaultConstructor, ArrayType                                                 NameInfo

Examples for Information Service Uses

The following examples illustrate the usage of the services. Some examples use concrete program elements - assume for now that these are available from somewhere (navigation in syntax trees is described later on).

Example: Find the return type of a Method m
m.getReturnType() will derive the required Type. Internally, the work is done by the corresponding ProgramModelInfo service. A method included in a ClassFile will have the return type in a fully qualified textual representation inside a class file (e.g. "java.lang.String"), while a method included in a CompilationUnit might have a short name of a type only (e.g. "String"). The ByteCodeInfo or SourceInfo will handle the appropriate extractions.

If m is an instance of MethodDeclaration, the corresponding service is the SourceInfo which could also be asked directly for the type of any ProgramElement by getType((MethodDeclaration)m). The result will be the same.
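The delegation described above can be pictured with a small, self-contained model. This is only a sketch: ReturnTypeInfo, MethodStub and the import table are hypothetical stand-ins invented for illustration and are not RECODER classes. It shows how the same query can be answered by different services depending on the representation of the element.

```java
import java.util.Map;

final class InfoDelegationSketch {
    // Hypothetical stand-in for a representation-specific info service.
    interface ReturnTypeInfo {
        String returnTypeOf(MethodStub m);
    }

    // Hypothetical stand-in for a method element in some representation.
    static final class MethodStub {
        final String rawReturnType;  // text as stored in the representation
        final ReturnTypeInfo info;   // service that fits this representation

        MethodStub(String rawReturnType, ReturnTypeInfo info) {
            this.rawReturnType = rawReturnType;
            this.info = info;
        }

        // The query is passed to the assigned service in the background.
        String getReturnType() {
            return info.returnTypeOf(this);
        }
    }

    // Class files already store fully qualified names.
    static final ReturnTypeInfo BYTE_CODE = m -> m.rawReturnType;

    // Sources may store short names; resolve them through an import table.
    static ReturnTypeInfo source(Map<String, String> imports) {
        return m -> imports.getOrDefault(m.rawReturnType, m.rawReturnType);
    }

    public static void main(String[] args) {
        MethodStub fromClassFile = new MethodStub("java.lang.String", BYTE_CODE);
        MethodStub fromSource = new MethodStub("String",
                source(Map.of("String", "java.lang.String")));
        System.out.println(fromClassFile.getReturnType());
        System.out.println(fromSource.getReturnType());
    }
}
```

Both queries yield the same fully qualified name, although the stored text differs between the two representations.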

Example: Find the containing ClassType of a Method m
As a Member, a Method provides a query getContainingClassType() that does the job. If m is a MethodDeclaration, this query will report a TypeDeclaration.

Example: Find all Methods that are defined in a ClassType c, including inherited ones
c.getMethods() will only report a list of methods that are added or redefined in c, but c.getAllMethods() will include all methods available. The methods are in an order that corresponds to a topological order of the inheritance graph with leaf c - the methods of c.getMethods() will come first, the methods of java.lang.Object will be at the end of the list.
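To make the ordering property concrete, here is a minimal sketch that computes one topological order of an inheritance graph, with the leaf type first and the root type last. The string-based graph and the class SupertypeOrderSketch are assumptions made for this illustration; the order RECODER actually produces among unrelated supertypes may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SupertypeOrderSketch {
    // Returns 'leaf' followed by all of its supertypes such that every
    // type precedes its own supertypes (reverse depth-first post-order).
    static List<String> allSupertypes(String leaf, Map<String, List<String>> supertypes) {
        List<String> postOrder = new ArrayList<>();
        visit(leaf, supertypes, new HashSet<>(), postOrder);
        Collections.reverse(postOrder);
        return postOrder;
    }

    private static void visit(String type, Map<String, List<String>> supertypes,
                              Set<String> seen, List<String> postOrder) {
        if (!seen.add(type)) {
            return; // already handled via another inheritance path
        }
        for (String s : supertypes.getOrDefault(type, List.of())) {
            visit(s, supertypes, seen, postOrder);
        }
        postOrder.add(type);
    }

    public static void main(String[] args) {
        // Hypothetical hierarchy: C extends B implements I.
        Map<String, List<String>> graph = Map.of(
                "C", List.of("B", "I"),
                "B", List.of("Object"),
                "I", List.of("Object"));
        System.out.println(allSupertypes("C", graph));
    }
}
```

Concatenating the locally defined methods of each type in this order would give a list in which the methods of the leaf come first and those of the root type come last.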

Example: Find the Method that a MethodReference r refers to.
As r is a ProgramElement, the SourceInfo is responsible for this: getMethod(r) should correctly follow the method hiding rules imposed by overloading as well as inheritance. Note that the result may be a MethodDeclaration, but could also be a ByteCodeElement.

Example: Get a ClassType with the fully qualified name n
getClassType(n) in the NameInfo will look up known types with name n and will attempt to load a definition if this is not successful. This process might trigger a lot of parsing and analyses in the background.

Example: Check if a Method m is accessible from within a ClassType c
Any ProgramModelInfo can check visibilities for accesses from within classes: isVisibleFor(m, c).

A less trivial example: Find out which exceptions of a Method are not RuntimeExceptions
ClassTypeList getExceptionsToCatch(Method m) {
    ClassType rtException =
        getNameInfo().getClassType("java.lang.RuntimeException");
    ClassTypeList exceptions = m.getExceptions();
    if (exceptions == null || exceptions.isEmpty()) {
        return ClassTypeList.EMPTY_LIST;
    }
    ClassTypeMutableList result = new ClassTypeArrayList();
    // iterate backwards; the result lists the exceptions in reverse order
    for (int i = exceptions.size() - 1; i >= 0; i -= 1) {
        ClassType e = exceptions.getClassType(i);
        // keep only exceptions that are not subtypes of RuntimeException
        if (!e.getProgramModelInfo().isSubtype(e, rtException)) {
            result.add(e);
        }
    }
    return result;
}

The Source Code Model

While some source code elements occur as program model elements already and are hence part of the abstract model, the syntax trees offer much more detailed ways of traversal and manipulation.

The syntactic model is a syntax forest (a set of syntax trees) and has many more elements than the abstract core, which only covers commonalities with the byte code interface information. Users must know how to navigate in and how to manipulate syntax trees, and how to deal with references in particular.

Syntax Trees

Syntax trees consist of two disjoint types of ProgramElements: TerminalProgramElements and NonTerminalProgramElements. Only the latter may have children, though they do not have to (in fact, each nonterminal has children, but some of them, such as keywords, commas or semicolons, are not modelled explicitly). The root of a complete syntax tree is a CompilationUnit; compilation units are managed by the SourceFileRepository service.
Incomplete syntax trees with arbitrary roots may exist temporarily but are not part of the regular model.

Retrieving Syntax Trees

There are three ways to actually create a compilation unit object:

  1. Use the NameInfo service and provide a fully qualified logical name for the primary class type of a compilation unit. The unit is loaded in order to create the requested class type and is analyzed automatically.

  2. Use the SourceFileRepository and provide a location or file name of a compilation unit. The unit is loaded and will be analyzed on demand. The service also offers a convenience method that loads all files in a directory recursively, or those selected by a file filter.

  3. Call the parse methods of the ProgramFactory directly. This is only viable in special situations, as the system is not aware of the unit. Syntax elements created in that way should never interfere with the main project data in order to avoid unwanted duplications. It is okay, however, to parse syntax trees from template files or the like and to register the possibly manipulated trees manually, when the template files are not part of the search path.

If the project has already been set up, the list of known compilation units is available from the SourceFileRepository.

Example: Parse a CompilationUnit from a file with name filename and have it registered.
SourceFileRepository sfr = serviceConfig.getSourceFileRepository();
CompilationUnit u = null;
try {
   u = sfr.getCompilationUnitFromFile(filename);
} catch (ParserException pe) {
   // parsing failed; report or handle the error here
}
The resulting compilation unit is not parsed again if the given file has already been parsed before, for instance due to a type query that led to an automatic retrieval.

Traversing Syntax Trees

Traversal of syntax trees is a common task. RECODER supports traversing a tree in depth first order as well as walking up the tree by following parent links.

Traversing Children of a NonTerminalProgramElement

Nonterminal syntax nodes contain links to their child nodes. It is possible to access certain children directly, or to just traverse all children in syntactic order, regardless of their role.

Example: Traverse all children of a NonTerminalProgramElement nt in syntactic order.
for (int i = 0, s = nt.getChildCount(); i < s; i += 1) {
   ProgramElement child = nt.getChildAt(i);
   // do something with the child
}
These access methods simulate array accesses.

Example: Access all parameters of a MethodDeclaration m.
ParameterDeclarationList plist = m.getParameters();
if (plist != null) {
    // traverse list
}
Note that there is also a second way to traverse the parameters: As a method declaration is a ParameterContainer, the class additionally offers a direct access interface via getParameterDeclarationCount() and getParameterDeclarationAt(int index). Both are possible ways to read the parameters, but changes are possible only via the list versions.

Traversing Trees

Traversing whole (sub)trees can be done by manual implementation of an appropriate depth or breadth first search using the child traversal procedure. RECODER already provides convenient iterators for this task: The recoder.convenience.TreeWalker reports program elements in depth first order.

Example: Visit all nodes in a syntax tree with root root in depth first order.
TreeWalker walker = new TreeWalker(root);
while (walker.next()) {
    visit(walker.getProgramElement());
}
The iterator differs slightly from standard implementations as the proceeding and termination check is combined, and the current element can be queried multiple times between movements.
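The iteration protocol can be illustrated with a self-contained walker over a toy tree. This is a sketch only: Node and Walker are hypothetical classes written for this example and say nothing about how recoder.convenience.TreeWalker is implemented. It shows the combined advance-and-check call and a current element that can be queried repeatedly.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class WalkerSketch {
    // Hypothetical toy syntax node.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    // Depth first walker with a combined "advance and test" operation.
    static final class Walker {
        private final Deque<Node> pending = new ArrayDeque<>();
        private Node current;

        Walker(Node root) {
            pending.push(root);
        }

        // Advances to the next node; returns false when the walk is over.
        boolean next() {
            if (pending.isEmpty()) {
                current = null;
                return false;
            }
            current = pending.pop();
            // Push children right to left so the leftmost is visited first.
            for (int i = current.children.size() - 1; i >= 0; i -= 1) {
                pending.push(current.children.get(i));
            }
            return true;
        }

        // May be called any number of times between calls to next().
        Node getCurrent() {
            return current;
        }
    }

    static List<String> labels(Node root) {
        List<String> out = new ArrayList<>();
        Walker w = new Walker(root);
        while (w.next()) {
            out.add(w.getCurrent().label);
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node("unit",
                new Node("class", new Node("field"), new Node("method")),
                new Node("comment"));
        System.out.println(labels(root));
    }
}
```

The explicit stack avoids recursion, which keeps the walker usable as an iterator that can be paused between elements.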

There are also two specialized implementations of the TreeWalker: The ForestWalker performs a depth first iteration over a list of compilation units, and the CustomTreeWalker can report ascending visits and gives control over recursion.

Traversing Parents of a ProgramElement

The RECODER syntax trees feature parent links for efficient upward traversal. This is very convenient, as transformations do not need context parameters - the context is easily accessible via the parent traversal.

The parent of a program element p is obtained by p.getASTParent() and is a NonTerminalProgramElement. Most program elements feature additional parent queries returning more specialized parent types; for instance, MethodDeclaration.getParent() returns a TypeDeclaration.

A program element may be the parent of different children types. RECODER introduces additional interfaces when the class type of a program element alone is not sufficient to describe this role. For instance, a MethodDeclaration can contain the following child types:

The taxonomy of parent properties does not cover every combination of parent-child roles - for instance, there are StatementContainers that allow a single StatementBlock only, while others allow a single Statement, or a list of Statements. However, the existing interfaces reduce the number of possible parent types significantly.

In some cases, there are several specialized parent types reflecting different roles an element can play. For instance, a MethodReference may be used as a pure expression (its ExpressionContainer is set), as a pure statement (its StatementContainer is set), or as a prefix of an access path (its ReferenceSuffix is set). Only one specialized parent link may be set (!= null) at a time. The generic getASTParent method will report the unique parent link, but will only return it as a NonTerminalProgramElement.

Example: Find the CompilationUnit u of a ProgramElement p.
ProgramElement q = p;
NonTerminalProgramElement r = q.getASTParent();
// walk up until an element without parent is reached
while (r != null) {
   q = r;
   r = q.getASTParent();
}
CompilationUnit u = (q instanceof CompilationUnit) ? (CompilationUnit)q : null;
If p is part of a complete syntax tree with valid parent links, only a compilation unit may have a null parent. Note that this useful function is also already part of the library: u = recoder.kit.UnitKit.getCompilationUnit(p).

Whitespace

Correct treatment of whitespaces (blanks, linefeeds) is important for transformation systems in order to guarantee properly formatted output. Logical formatting of the code such as additional line feeds should be retained to facilitate recognition of transformed code.

Common abstract syntax representations usually ignore "implicit" tokens such as commas and semicolons. Thus, format information between those implicit tokens cannot be detected. RECODER uses a compromise and stores information about one "primary" implicit token per non terminal, e.g. the keyword, or an opening bracket. Further tokens such as commas are not made explicit and whitespaces in between them might be changed by the pretty printer. Fortunately, this will usually result in improved code quality.

Each SourceElement has three types of positional information (SourceElement.Position) attached: The absolute start position of the token, the absolute ending position, and a relative position. While absolute positions are valid after parsing and before any modifications have occurred, relative positions remain stable during transformations. RECODER 0.6 does not set all relative positions of all implicit tokens, e.g. dots between names are still left out; end coordinates are not set at all yet.

The PrettyPrinter can reset absolute positions if the OVERWRITE_PARSE_POSITIONS property is set. RECODER 0.6 does not yet implement this behavior correctly. The pretty printer will set relative positions to proper values if their position is set to Position.UNDEFINED, which is the case for all newly created elements. The combination of defined and undefined relative positions makes it possible to embed new code seamlessly. The pretty printer also ensures that a minimum relative position is obeyed in order to produce correct concrete syntax: e.g. an unmodified declaration might have zero blanks assigned in front of the type reference; after addition of a modifier, at least one blank will be inserted.

After cloning or parsing of a code fragment, absolute and relative positions have been defined. For seamless embedding, these might not be wanted and can be unset: MiscKit.unindent will do this for a subtree.

Comments

Java offers three kinds of comments: The SingleLineComment //..., the ordinary multi line Comment /* ... */, and the special DocComment /** ... */ in front of named member declarations. RECODER assigns each comment to one adjacent program element, following an assignment heuristic that considers empty lines (as sketched in the table below). Each ProgramElement can contain a list of comments, which in turn might precede or follow this element (Comment.isPrefixed()). The pretty printer knows how to handle both cases.

Case 1: Should belong to the upper code part.
######
// comment

######

Case 2: Should belong to the lower code part.
######

// comment
######

Case 3: Might belong to the lower code part.
######

// comment

######

Like all heuristics, it may fail and associate a comment with the wrong element. This has no visible consequences unless the element is moved away - then, the comment moves with the (wrong) element.
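The three cases of the table can be expressed as a tiny decision function. This is a sketch under stated assumptions: the class CommentAttachmentSketch and its parameters are hypothetical, the actual RECODER heuristic may take more context into account, and the case without any empty line is not covered by the table and is assigned to the lower part here purely by assumption.

```java
final class CommentAttachmentSketch {
    enum Target { UPPER, LOWER }

    // blankBefore: empty lines between the preceding code and the comment.
    // blankAfter:  empty lines between the comment and the following code.
    static Target attach(int blankBefore, int blankAfter) {
        if (blankBefore == 0 && blankAfter > 0) {
            // The comment sticks to the code above and is separated from below.
            return Target.UPPER;
        }
        // Separated from above, separated on both sides, or (assumption)
        // not separated at all: attach to the code below.
        return Target.LOWER;
    }

    public static void main(String[] args) {
        System.out.println(attach(0, 1)); // first case of the table
        System.out.println(attach(1, 0)); // second case of the table
        System.out.println(attach(1, 1)); // third case of the table
    }
}
```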

As of RECODER 0.6, missing implicit tokens cause further, technical problems. As ending positions are still missing, comments that follow their logical parent element might not be assigned to this element when there are no tokens to be attached to. For instance, comments in empty statement blocks are assigned to the next visible element rather than the empty block, as the closing brackets are not registered. Ending positions will make it possible to fix these remaining issues.

Comment update is a major problem and there is little hope to deal with this. However, when restructuring a program, the documentation must be brought up to date anyway. Also, most programmers use sparse comments only...

Future RECODER versions might evaluate DocComments for the tags @see and @link in order to update entity names as part of cross reference information. To do so, comments must become proper ModelElements, including the package.html files.

Declarations, References, and Program Model Elements

For each program model element of the abstract model, there is at most one syntactic declaration, and arbitrarily many references to that element. The source code model provides a hierarchy of recoder.java.Declarations in the recoder.java.declaration package and recoder.java.References in the recoder.java.reference package that correspond to the respective abstract model entities.

Program Model Element  Declaration             Reference             SourceInfo Reference Resolution Method
Package                -                       PackageReference      getPackage
Type                   -                       TypeReference         getType
ArrayType              -                       ArrayReference        -
ClassType              TypeDeclaration         TypeReference         getType
Method                 MethodDeclaration       MethodReference       getMethod
Constructor            ConstructorDeclaration  ConstructorReference  getConstructor
Variable               VariableSpecification   VariableReference     getVariable
Field                  FieldSpecification      FieldReference        getField

Note that the declarations of variables and fields are not VariableDeclarations but VariableSpecifications, which are children of the former. This is a consequence of the C-like syntax that permits code like "int i, j, k;".

Resolving Declarations

Accessing the program model element of a declaration could not be easier: The corresponding source elements already implement the abstract element interface, so the Method of a MethodDeclaration md is the object itself: Method m = (Method)md; There are additional casting methods in the SourceInfo service for sake of consistency.

Resolving References

Accessing the program model element of a reference is handled by the SourceInfo service. The service will automatically resolve inheritance and overloading issues properly.

Note that references such as a.b.c in a parsed tree cannot be resolved by considering the local context only. For instance, a or b could be packages, types, or variables. After parsing, RECODER will insert an UncollatedReferenceQualifier for unknown references. This reference must be resolved using the SourceInfo method resolveURQ. In the cross reference configuration, RECODER will resolve these references automatically when creating cross reference information.
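Why local context is not enough can be seen from a simplified classifier for the parts of a qualified name. The sketch below follows the general idea of Java name resolution (variables take precedence over types, types over packages) in a strongly reduced form; the class QualifierSketch and its three lookup sets are hypothetical stand-ins for scope and name information and do not reflect how resolveURQ is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class QualifierSketch {
    enum Kind { PACKAGE, TYPE, VARIABLE }

    // localVariables: simple names of variables in scope (hypothetical)
    // types:          known type names, simple or fully qualified (hypothetical)
    // fields:         known fields written as "<qualified type>.<name>" (hypothetical)
    static List<Kind> classify(String path, Set<String> localVariables,
                               Set<String> types, Set<String> fields) {
        List<Kind> kinds = new ArrayList<>();
        String prefix = "";
        Kind previous = null;
        for (String part : path.split("\\.")) {
            String qualified = prefix.isEmpty() ? part : prefix + "." + part;
            Kind kind;
            if (previous == null) {
                // Leftmost part: variable before type before package.
                kind = localVariables.contains(part) ? Kind.VARIABLE
                        : types.contains(part) ? Kind.TYPE : Kind.PACKAGE;
            } else if (previous == Kind.PACKAGE) {
                kind = types.contains(qualified) ? Kind.TYPE : Kind.PACKAGE;
            } else if (previous == Kind.TYPE) {
                // Simplification: anything that is not a known field is
                // treated as a member type.
                kind = fields.contains(qualified) ? Kind.VARIABLE : Kind.TYPE;
            } else {
                kind = Kind.VARIABLE; // a member of a variable is a field
            }
            kinds.add(kind);
            prefix = qualified;
            previous = kind;
        }
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println(classify("java.lang.System.out", Set.of(),
                Set.of("java.lang.System"), Set.of("java.lang.System.out")));
        System.out.println(classify("a.b.c", Set.of("a"), Set.of(), Set.of()));
    }
}
```

The same text "a.b.c" is classified differently depending on which names are known, which is why an uncollated placeholder is needed until the name information is available.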

Retrieving References

Finding the declaration fitting to a reference is easy. The opposite direction is also available: The CrossReferenceSourceInfo as an extension of SourceInfo provides all known references to a given program model element. Obviously, cross referencing will work for known sources only and requires a closed world. The cross referencer will not find references inside class files, nor will it find references in source files that have not been touched yet.

Example: Find all MethodReferences to a Method m
getReferences(m) defined by the CrossReferenceSourceInfo service returns a list of MethodReferences that are currently known.

Note that references such as a.b.c in a parsed tree cannot be resolved by considering the local context only. For instance, a could denote a package, type, or variable. The parser will insert UncollatedReferenceQualifiers for these problematic references. The SourceInfo contains a resolveURQ method to resolve references of this kind. Depending on the configuration of RECODER, all references might be resolved automatically, so URQs only occur in new syntax trees.

The CrossReferenceSourceInfo service can also deliver known subtypes of given types using the cross reference information.
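The idea of a bidirectional refers-to relation can be sketched with two maps that are kept in step. CrossReferenceSketch and its string keys are hypothetical and chosen for illustration only; they do not describe the data structures of the CrossReferenceSourceInfo service, and re-registering a reference for a different declaration is not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CrossReferenceSketch {
    // reference -> declaration (the "refers-to" direction)
    private final Map<String, String> target = new HashMap<>();
    // declaration -> references (the reverse direction)
    private final Map<String, List<String>> references = new HashMap<>();

    // Records a resolved reference in both directions.
    void register(String reference, String declaration) {
        target.put(reference, declaration);
        references.computeIfAbsent(declaration, d -> new ArrayList<>()).add(reference);
    }

    String getTarget(String reference) {
        return target.get(reference);
    }

    // Only references that have been registered so far are known.
    List<String> getReferences(String declaration) {
        return references.getOrDefault(declaration, List.of());
    }

    public static void main(String[] args) {
        CrossReferenceSketch xr = new CrossReferenceSketch();
        xr.register("A.java:10 m()", "C.m");
        xr.register("B.java:3 m()", "C.m");
        System.out.println(xr.getReferences("C.m"));
    }
}
```

The reverse map only contains what has been registered, which mirrors the closed world restriction: references in sources that were never analyzed cannot be reported.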

Projects

RECODER provides a set of services bundled in a service configuration. Configurations ensure a consistent set of service implementations suited for a particular task. For instance, the closed world assumption is necessary for consistent transformation, but is not suited for demand-driven analysis.

Service Configurations

The DefaultServiceConfiguration contains only a standard SourceInfo service without cross reference information and is suited for pure one-pass analysis purposes only. Uncollated references will not be resolved automatically, although this is still possible by manual calls to resolveURQ.

The CrossReferenceServiceConfiguration will automatically analyse all references in the program model and add the cross reference information. This is the configuration of choice for transformational tasks.

ProjectSettings

The recoder.io.ProjectSettings service is a global repository for important settings such as

All supported logical and physical property names are defined and documented in recoder.io.PropertyNames.

The project settings service can also locate the proper class file archives containing at least the java.lang system classes, and is able to load and write property files containing the project information.

Project Files

Project files contain all properties of the ProjectSettings in textual form. A typical project file lists all source files:

#RECODER Project File
input.path=converter/original
output.path=converter/modified
units=ConversionPanel.java,DecimalField.java,Unit.java,Converter.java,\
ConverterRangeModel.java,FormattedDocument.java,FollowerRangeModel.java

Project files are an easy way to make project settings persistent.
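The file format can be explored with the standard java.util.Properties class, without involving RECODER at all. The sketch below assumes that project files follow ordinary Java properties syntax (comment lines starting with #, a backslash for line continuation), which matches the sample above; the helper class ProjectFileSketch is hypothetical, and inside RECODER the ProjectSettings service is responsible for loading such files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

final class ProjectFileSketch {
    // Parses project file text using standard properties syntax.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // Splits the comma separated "units" entry into file names.
    static List<String> units(Properties p) {
        return Arrays.asList(p.getProperty("units", "").split(","));
    }

    public static void main(String[] args) {
        String text = "#RECODER Project File\n"
                + "input.path=converter/original\n"
                + "output.path=converter/modified\n"
                + "units=ConversionPanel.java,DecimalField.java,Unit.java,Converter.java,\\\n"
                + "ConverterRangeModel.java,FormattedDocument.java,FollowerRangeModel.java\n";
        Properties p = parse(text);
        System.out.println(p.getProperty("input.path"));
        for (String unit : units(p)) {
            System.out.println(unit);
        }
    }
}
```

The trailing backslash joins the two lines of the units entry into a single value, so the list of source files can be wrapped for readability.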

Example Analysis Applications

Example: A small program that returns all supertypes of a given class type.
Imports are left out for brevity:
public class Demo {
    public static void main(String[] args) {
        System.out.println(recoder.convenience.Format.toString("%N",
            new recoder.CrossReferenceServiceConfiguration().getNameInfo().
                getClassType(args[0]).getAllSupertypes()));
    }
}
After compilation, execution of
java Demo java.lang.String
should produce
(java.lang.String, java.io.Serializable, java.lang.Comparable, java.lang.Object)

Note that the String type itself is also part of the supertypes list, which is convenient for type checking.

Transforming Programs

Program transformation applications are the primary intent of RECODER. The architecture attempts to facilitate the task of writing transformations; transformations operate on the abstract syntax of a program and do not have to maintain derived data (e.g. changing the name of a type if its name has changed) or the concrete syntax (e.g. adding a comma if a supertype has been added).

The following sections show how to modify syntax trees and how to print the results, followed by several sections on the proper use and definition of program transformations.

Modifying Syntax Trees

Concrete syntax of the language is controlled by the ProgramFactory service. This module provides a lot of factory methods for all source elements (one method for each constructor available), direct access to the parser, and a factory method for the pretty printer backend. The ProgramFactory is also used internally by the SourceFileRepository.

Building a syntax tree is possible by

These alternatives differ in the following ways:

New CompilationUnit nodes are not added to a common root node but registered to the system via the ChangeHistory service as special case of an ordinary transformation report. This protocol is described later.

Example: Insert a TypeDeclaration d into a CompilationUnit u.
TypeDeclarationMutableList list = u.getDeclarations();
if (list == null) {
   list = new TypeDeclarationArrayList();
   u.setDeclarations(list);
}
list.add(d);
d.setParent(u);
In general, consistency of syntax trees is not checked against the language rules, so there might be temporarily invalid trees. Therefore, child lists of certain roles might be undefined.
As an alternative to the dedicated call d.setParent(u), which can differ for other child roles, u.makeParentRoleValid() could be called, which (re)sets the parent links of all children of u.
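The effect of resetting parent links from the parent side can be modelled in a few lines. The Node class and the method name makeParentLinksValid below are hypothetical and only mimic the behaviour described for makeParentRoleValid(); they are not the RECODER implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class ParentLinkSketch {
    // Hypothetical toy node with a child list and a parent link.
    static final class Node {
        final String label;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }

        // (Re)sets the parent link of every direct child of this node.
        void makeParentLinksValid() {
            for (Node child : children) {
                child.parent = this;
            }
        }
    }

    public static void main(String[] args) {
        Node unit = new Node("unit");
        Node decl = new Node("declaration");
        unit.children.add(decl);        // child link set, parent link still missing
        System.out.println(decl.parent == null);
        unit.makeParentLinksValid();    // repairs the missing parent link
        System.out.println(decl.parent == unit);
    }
}
```

Repairing links from the parent side has the advantage that the caller does not need to know which role-specific setter each child would require.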

Removal of subtrees is easy to do: Simply remove a child node from its parent's list, or set the child attribute to null. There is no obligation to update the parent link, but transformations usually will have to give a report on the changes they have performed in order to obtain a conforming model. We will get back to that soon.

Printing Syntax Trees

Creating concrete syntax out of an abstract syntax tree is the job of a PrettyPrinter. Pretty printers are created by the ProgramFactory service and are initialized with the current ProjectSettings.

Top-level printing of compilation units is triggered by the SourceFileRepository which can print out all units, or only all changed units, or single units that it knows about. The destination is a file corresponding to the full path name of a unit and the output path as defined in the ProjectSettings. It is not possible to write back subtrees or to other destinations using this interface.

Each SourceElement also defines a convenient toSource() method which creates a string with a dump of the subtree represented by the given root element. The output has no initial indentation (it is "trimmed").

Example: Write a Statement s to the console.
ProgramFactory pf = serviceConfig.getProgramFactory();
PrettyPrinter pp = pf.getPrettyPrinter(new PrintWriter(System.out));
s.accept(pp);
Usually one would probably prefer the short version for this task: System.out.println(s.toSource())

The Change History

The ChangeHistory service is the central agenda mechanism of RECODER. It propagates changes of the syntactic model to services that use this information to update the model elements they maintain. Changes are reported by the automatic class definition loader contained in the SourceFileRepository and by transformations.

In terms of design patterns, the change history serves as a mediator and a subject of an observer pattern. Currently changes are interpreted by the SourceFileRepository and the SourceInfo. RECODER 0.6 does not yet use the change information to its full extent; the update mechanism is not very fine-grained. Model updates will remain very expensive until the report handlers are refined.

The change history only propagates changes on demand. This allows change reports to be bundled, which reduces the number of update phases. Services that require up-to-date model information will request a model update from the change history (via an updateModel() call). Transformations do not have to trigger the updates themselves; instead, updating service queries will perform this call transparently.

The change history maintains a queue of change reports for propagations. In parallel, the service maintains a stack of changes that is used for rollbacks.
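Example (sketch): On-demand propagation of bundled change reports.
The following stand-alone model shows the idea of queueing reports and propagating them only when an update is requested. ToyChangeHistory, its Listener interface, and all method names except updateModel are hypothetical; the real ChangeHistory interface may look quite different.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Stand-alone model of on-demand propagation; the names are illustrative
// and do not reproduce the RECODER ChangeHistory interface.
class ToyChangeHistory {
    interface Listener {
        void modelChanged(List<String> reports);
    }

    private final Deque<String> queue = new ArrayDeque<>();     // pending reports
    private final Deque<String> undoStack = new ArrayDeque<>(); // kept for rollbacks
    private final List<Listener> listeners = new ArrayList<>();
    int updatePhases = 0;

    void addListener(Listener l) {
        listeners.add(l);
    }

    // Reporting a change only records it; nothing is propagated yet.
    void report(String change) {
        queue.addLast(change);
        undoStack.push(change);
    }

    int undoDepth() {
        return undoStack.size();
    }

    // Called by services that need up-to-date model information.
    void updateModel() {
        if (queue.isEmpty()) {
            return; // nothing pending, so no update phase is needed
        }
        List<String> bundle = new ArrayList<>(queue);
        queue.clear();
        updatePhases++;
        for (Listener l : listeners) {
            l.modelChanged(bundle);
        }
    }
}

class OnDemandDemo {
    public static void main(String[] args) {
        ToyChangeHistory history = new ToyChangeHistory();
        history.addListener(reports -> System.out.println("update with " + reports));
        history.report("attach A");
        history.report("attach B");
        history.report("detach C");
        history.updateModel(); // one update phase for three reports
        history.updateModel(); // queue is empty, no further phase
        System.out.println(history.updatePhases); // prints 1
    }
}
```

Three reports cause a single update phase in this model, which is the saving that bundling is meant to achieve.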


Program Element Visibility

Usually program transformations will generate small amounts of new code. This code is obviously not yet known to the services. It is therefore important to know if a syntax element is part of the known model or temporarily "invisible":

A program element is visible for the RECODER services if and only if it is either

Obviously, service queries only work for visible elements. To make a new syntax element visible, transformations must report it as "attached" to the change history.

Change Reports

Change reports describe syntactic changes of syntax trees in terms of two atomic transformations: Attachment of a new subtree, and Detachment of a subtree. It is possible to describe any syntactic change using a series of these atomic transformations. Such a series is the syntactic "footprint" of a metaprogram. Transformations must send change reports for every visible change they have performed, before a model update becomes necessary.

In case of a model update, all incoming change reports will be propagated to the respective services, which will traverse the reported trees and update their caches.

Change reports are also used for rollbacks. If a rollback is requested, the change history will invert and execute all changes back to that point (a detach becomes an attach, and vice versa) and update all data structures accordingly. To allow nested rollbacks, transformations must inform the change history when they begin to make changes. Undo operations are possible until a commit is performed.
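Example (sketch): Nested rollbacks by inverting atomic changes.
The following stand-alone model records attach and detach operations on plain lists and undoes them by executing the inverse operation, newest change first. ToyUndoHistory and its begin, attach, detach, rollback, and commit methods are hypothetical; undoing in reverse order is an assumption of this sketch, not a documented property of RECODER.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Stand-alone model of rollback; names and signatures are illustrative only.
class ToyUndoHistory {
    // One atomic change: a child attached to or detached from a child list.
    private static final class Change {
        final boolean attach;
        final List<String> list;
        final int index;
        final String child;

        Change(boolean attach, List<String> list, int index, String child) {
            this.attach = attach;
            this.list = list;
            this.index = index;
            this.child = child;
        }
    }

    private final List<Change> log = new ArrayList<>();
    private final Deque<Integer> marks = new ArrayDeque<>(); // log size at each begin()

    void begin() {
        marks.push(log.size());
    }

    void attach(List<String> list, int index, String child) {
        list.add(index, child);
        log.add(new Change(true, list, index, child));
    }

    void detach(List<String> list, String child) {
        int index = list.indexOf(child);
        if (index < 0) {
            throw new IllegalArgumentException("not a child: " + child);
        }
        list.remove(index);
        log.add(new Change(false, list, index, child));
    }

    // Undo everything since the most recent begin(), newest change first.
    void rollback() {
        int mark = marks.pop(); // throws if there is no open transformation
        while (log.size() > mark) {
            Change c = log.remove(log.size() - 1);
            if (c.attach) {
                c.list.remove(c.index);       // the inverse of an attach is a detach
            } else {
                c.list.add(c.index, c.child); // the inverse of a detach is an attach
            }
        }
    }

    // After a commit, nothing can be undone any more.
    void commit() {
        marks.clear();
        log.clear();
    }
}

class RollbackDemo {
    public static void main(String[] args) {
        List<String> body = new ArrayList<>(Arrays.asList("a", "b"));
        ToyUndoHistory history = new ToyUndoHistory();
        history.begin();               // outer transformation
        history.attach(body, 2, "c");
        history.begin();               // nested transformation
        history.detach(body, "a");
        history.rollback();            // undoes the nested transformation only
        System.out.println(body);      // prints [a, b, c]
        history.rollback();            // undoes the outer transformation
        System.out.println(body);      // prints [a, b]
    }
}
```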

Changes may stem from different transformations and may be redundant. For traversal purposes, not all changed trees must be visited - it is sufficient to descend into the largest subtrees only. Change reports that deal with subtrees contained in other changed trees will be marked as "minor" and can be safely skipped. This does not apply to undo operations, however.
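Example (sketch): Finding the change reports that are not "minor".
The following stand-alone model decides which changed roots must be traversed: a root is skipped if one of its ancestors is itself a changed root. ChangedNode and MinorReports are invented for this illustration; how RECODER actually marks minor reports is not specified here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy tree node with a parent link; not a RECODER program element.
class ChangedNode {
    final String name;
    final ChangedNode parent;

    ChangedNode(String name, ChangedNode parent) {
        this.name = name;
        this.parent = parent;
    }

    @Override
    public String toString() {
        return name;
    }
}

class MinorReports {
    // Returns the roots that are not contained in another changed tree,
    // in the order in which they were reported (duplicates removed).
    static List<ChangedNode> largestSubtrees(List<ChangedNode> changedRoots) {
        Set<ChangedNode> changed = new LinkedHashSet<>(changedRoots);
        List<ChangedNode> result = new ArrayList<>();
        for (ChangedNode root : changed) {
            boolean minor = false;
            for (ChangedNode a = root.parent; a != null; a = a.parent) {
                if (changed.contains(a)) {
                    minor = true; // an enclosing tree is traversed anyway
                    break;
                }
            }
            if (!minor) {
                result.add(root);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ChangedNode type = new ChangedNode("type", null);
        ChangedNode method = new ChangedNode("method", type);
        ChangedNode statement = new ChangedNode("statement", method);
        ChangedNode field = new ChangedNode("field", type);
        List<ChangedNode> reported = new ArrayList<>();
        reported.add(method);
        reported.add(statement); // contained in method, hence minor
        reported.add(field);
        System.out.println(largestSubtrees(reported)); // prints [method, field]
    }
}
```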

Example: Add a new CompilationUnit to the system.
Assume we are within a Transformation subclass and the new CompilationUnit is u. Then, attach(u) will create a change report scheduling the compilation unit as a new element. The SourceFileRepository will recognize it correctly during the next model update - from then on, the new unit is visible.

Transformation Objects

Transformations in RECODER follow the Command design pattern and are materialized as objects. This makes it possible to manage transformations explicitly, to roll them back, and to access intermediate results from within other objects. The base class of any transformation is recoder.kit.Transformation.

Transformations know the CrossReferenceServiceConfiguration they are running in, and offer a lot of convenience functionality. They provide quick access methods to all services (such as getSourceInfo()), as well as routines to detach, replace, and attach nearly all possible combinations of program elements. These helper methods take over the following steps:

There are about six dozen different attach versions defined (such as attach(Else,If)), and all possible ambiguities are resolved by choosing unambiguous names to avoid overloading pitfalls. For instance, there are four different methods to attach a MethodReference to a For loop:

These methods are in fact atomic transformations which produce the corresponding change reports. It is useful to distinguish these low-level operations from the higher level transformations which are materialized as objects.

A high-level transformation is executed in several phases and must follow a proper protocol which will be described in detail when the requirements are clear.

Transformation Types

While in general, a transformation can use any service queries, some transformations are syntactic, such as the atomic transformations.

Sometimes, it is useful to modify newly created, invisible syntax trees. Transformations that modify these are also invisible and may be marked as such. Invisible transformations must be syntactic and do not send change reports.

To support invisible, syntactic transformations, the Transformation class offers versions of detach, replace, and the attach variants, which perform the changes but do not send change reports. They are static methods and have a do prefix: doAttach, doDetach, doReplace. These variants are also valuable for small invisible transformation steps as part of a visible transformation, e.g. to construct a small tree "on the fly" before attaching the result and thereby making it visible.
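Example (sketch): Silent construction followed by one visible attachment.
The following stand-alone model mimics the split between a reporting attach and a silent, static doAttach. SketchNode and SketchTransformation are invented for this illustration; only the attach/doAttach naming follows the text above, and the signatures are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tree node for illustration; not a RECODER program element.
class SketchNode {
    final String label;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String label) {
        this.label = label;
    }
}

// Mimics the split between reporting and silent variants. The signatures
// are invented and differ from recoder.kit.Transformation.
class SketchTransformation {
    final List<String> reports = new ArrayList<>(); // stands in for the change history

    // Silent variant: performs the change but sends no report.
    static void doAttach(SketchNode child, SketchNode parent) {
        parent.children.add(child);
    }

    // Reporting variant: performs the change and records a change report.
    void attach(SketchNode child, SketchNode parent) {
        doAttach(child, parent);
        reports.add("attached " + child.label);
    }
}

class InvisibleStepsDemo {
    public static void main(String[] args) {
        SketchNode visibleBlock = new SketchNode("block");
        SketchNode call = new SketchNode("call");
        // Build the new tree "on the fly"; it is still invisible.
        SketchTransformation.doAttach(new SketchNode("receiver"), call);
        SketchTransformation.doAttach(new SketchNode("argument"), call);
        // One visible step makes the whole new subtree known.
        SketchTransformation t = new SketchTransformation();
        t.attach(call, visibleBlock);
        System.out.println(t.reports); // prints [attached call]
    }
}
```

Only the final attachment produces a report in this model; the two preparatory steps remain invisible.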

Composing Transformations

Meaningful transformations are usually chains of existing transformations. Unfortunately, composing transformations is not trivial. For total correctness (that is, partial correctness plus termination), dependencies between transformations must be taken into account: each change of the model might invalidate or restrict known results. This influences results of old meta model queries as well as results of syntactic traversals. Syntax trees might have been removed, added, or just moved with new identities.

For consistency and efficiency, queries and modifications should be kept separate as much as possible to avoid unwanted side-effects and to reduce the number of model updates. The Transformation framework supports this by separating the analysis and the transformation phase.

Proper Behavior

In order to receive all infrastructural benefits, transformations must obey certain protocol rules and should follow some conventions. A transformation operates in three phases:

  1. A new transformation object is created.
    • The current cross reference service configuration (or a subtype thereof) must be passed as an argument.
    • Further necessary initial arguments should be checked for consistency and stored in attributes until they are needed.
    • If initial arguments could be meaningful for external transformations, they should be made accessible.
  2. The analyze method should derive all data necessary to perform the changes.
    • No visible element may be changed.
    • A ProblemReport must be set (to the report field) and returned.
      The field is used to double check the protocol: it must be set to a NoProblem instance when transform is called. Callers use the ProblemReport to display warnings or choose alternative strategies. There are three constants for positive reports: IDENTITY (the transformation phase will not change anything), EQUIVALENCE (the resulting program shows the same functional observable behavior, excluding introspective data), NO_PROBLEM (no particular guarantees).
    • Relevant results of the analysis phase should be made accessible in order to allow reuse by callers. If other transformations are used, they can be made accessible to save additional access methods.
  3. The transform method performs the syntactic changes.
    • As the first action, the change history should be informed about the beginning of the transformation. The easiest way to do this is to call the corresponding method in the abstract super class: super.transform().
    • During the transform phase, no model update may happen, that is, no updating query may be performed. It is admissible to perform syntactic navigation.
    • The method may perform any syntactic change, but must report any visible changes in the correct order before leaving. This is done automatically when the convenience methods of Transformation are used.
    • Detached subtrees may never be manipulated or attached again! To move a subtree, a clone should be attached instead, using the deepClone() method.
      Detached subtrees are no longer part of the visible model; however, references to the root are still stored and might be processed later on, either for model updates or rollback operations. Therefore, the subtree must not be changed, visibly or invisibly.
    • Changes should be made explicitly accessible for callers giving the proper motivation - what has been changed, and why? It is important for calling transformations to know which elements have been removed, added, or are replaced by clones, in order to properly update cached results obtained before the transformation is executed. If other transformations are used, they can be made accessible to save additional access methods.
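
Example (sketch): The three phases in a toy transformation.
The following stand-alone model walks through construction, analyze, and transform for a trivial renaming task on a list of names. ToyReport, ToyTransformation, and ToyRename are invented stand-ins; the constants IDENTITY, EQUIVALENCE, and NO_PROBLEM are taken from the text above, while the PROBLEM constant, the enum representation, and the protocol check are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for a problem report; not the recoder.kit classes.
enum ToyReport { IDENTITY, EQUIVALENCE, NO_PROBLEM, PROBLEM }

abstract class ToyTransformation {
    protected ToyReport report; // set by analyze(), checked by transform()

    abstract ToyReport analyze();

    // Subclasses call super.transform() first; here it only checks the protocol.
    void transform() {
        if (report == null || report == ToyReport.PROBLEM) {
            throw new IllegalStateException("analyze() must succeed before transform()");
        }
    }
}

// Renames every occurrence of one name in a list of names.
class ToyRename extends ToyTransformation {
    private final List<String> names;
    private final String from;
    private final String to;
    private final List<Integer> hits = new ArrayList<>(); // analysis result

    // Phase 1: store the arguments until they are needed.
    ToyRename(List<String> names, String from, String to) {
        this.names = names;
        this.from = from;
        this.to = to;
    }

    // Phase 2: derive all data; nothing is changed here.
    @Override
    ToyReport analyze() {
        hits.clear();
        if (from.equals(to)) {
            report = ToyReport.IDENTITY;
        } else if (names.contains(to)) {
            report = ToyReport.PROBLEM; // the new name is already taken
        } else {
            for (int i = 0; i < names.size(); i++) {
                if (names.get(i).equals(from)) {
                    hits.add(i);
                }
            }
            report = hits.isEmpty() ? ToyReport.IDENTITY : ToyReport.NO_PROBLEM;
        }
        return report;
    }

    // Phase 3: perform the changes using the analysis results only.
    @Override
    void transform() {
        super.transform();
        for (int i : hits) {
            names.set(i, to);
        }
    }

    // Analysis results are made accessible for callers.
    List<Integer> getHits() {
        return hits;
    }
}

class ProtocolDemo {
    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("x", "y", "x"));
        ToyRename rename = new ToyRename(names, "x", "z");
        if (rename.analyze() != ToyReport.PROBLEM) {
            rename.transform();
        }
        System.out.println(names); // prints [z, y, z]
    }
}
```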

Documenting Transformations

Documentation of a transformation should inform about the purpose of the transformation (its pragmatics), the syntactic changes, and the semantic consequences: Is the result identical, or does it preserve the observable behavior, or is it just compilable, or is it only parseable, or none of these?

Using Kits

The kit auxiliary classes support the Transformation classes. They contain auxiliary queries and factory methods for often needed syntax trees. Kit classes are roughly grouped by the most important model element dealt with: UnitKit, TypeKit, MethodKit, and so on. The MiscKit deals with general syntax trees.

To give an impression of the contents of the kits, the following table lists a few auxiliaries:

MethodKit.getGetters            Guesses access methods for fields
UnitKit.getCompilationUnit      Walks up to the compilation unit of a program element
MiscKit.contains                Checks if a program element is contained in a subtree
MiscKit.unindent                Removes indentation information from a subtree (useful after partial parsing)
TypeKit.createTypeReference     Creates the shortest reference to a type for a given context
VariableKit.getNewVariableName  Creates admissible variable names mangling corresponding type names

Appendix: Auxiliaries

The following sections describe auxiliaries that are necessary to work with RECODER.

Type-safe Lists

The RECODER API tries to cover type constraints in detail. As the current version of Java is lacking generic types, this is hard to do with containers of unclassified objects. Containers that occur in public signatures are therefore expanded for concrete subtypes. What would otherwise be declared as a List of Identifier, or List<Identifier>, becomes an IdentifierList in that approach. In contrast to the standard Java collection API, the RECODER lists distinguish between mutable and read-only lists: You can browse through an IdentifierList, but you can only change an IdentifierMutableList.

Lists of elements that stand in a subtype relation also stand in a subtype relation: An IdentifierList is a ProgramElementList. To prevent dangerous manipulations, this is not true for the mutable versions: An IdentifierMutableList is not a ProgramElementMutableList.
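
Example (sketch): Why only the read-only lists stand in a subtype relation.
The following stand-alone model rebuilds the pattern with toy interfaces. All Toy... names are invented for this illustration and only imitate the naming scheme of the RECODER lists; the real interfaces offer more methods.

```java
import java.util.Arrays;

class ToyElement {
    final String name;

    ToyElement(String name) {
        this.name = name;
    }
}

class ToyIdentifier extends ToyElement {
    ToyIdentifier(String name) {
        super(name);
    }
}

// Read-only views: an identifier list can safely be used as an element list.
interface ToyElementList {
    int size();
    ToyElement getElement(int index);
}

interface ToyIdentifierList extends ToyElementList {
    ToyIdentifier getIdentifier(int index);
}

// Mutable views: deliberately NOT related to each other. If a mutable
// identifier list were a mutable element list, add(ToyElement) would allow
// arbitrary elements to be smuggled into a list of identifiers.
interface ToyElementMutableList extends ToyElementList {
    void add(ToyElement element);
}

interface ToyIdentifierMutableList extends ToyIdentifierList {
    void add(ToyIdentifier identifier);
}

// Growable array implementation using array doubling.
class ToyIdentifierArrayList implements ToyIdentifierMutableList {
    private ToyIdentifier[] data;
    private int count;

    ToyIdentifierArrayList(int capacity) {
        data = new ToyIdentifier[Math.max(1, capacity)];
    }

    public int size() {
        return count;
    }

    public ToyIdentifier getIdentifier(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[index];
    }

    public ToyElement getElement(int index) {
        return getIdentifier(index);
    }

    public void add(ToyIdentifier identifier) {
        if (count == data.length) {
            data = Arrays.copyOf(data, data.length * 2); // double the capacity
        }
        data[count++] = identifier;
    }
}

class ToyListDemo {
    // Accepts any read-only element list, including identifier lists.
    static String describe(ToyElementList list) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0, s = list.size(); i < s; i += 1) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(list.getElement(i).name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ToyIdentifierArrayList ids = new ToyIdentifierArrayList(1);
        ids.add(new ToyIdentifier("foo"));
        ids.add(new ToyIdentifier("bar")); // triggers one array doubling
        System.out.println(describe(ids)); // prints foo,bar
    }
}
```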

List elements are addressed by indices rather than iterators or explicit nodes. This saves storage and provides a slim interface. The default implementations end with ~ArrayList and use the common array doubling technique. If elements must be added and their number is known a priori (e.g. before concatenating a set of lists), it is wise to increase the capacity of the array beforehand.

The following examples illustrate the usage of the RECODER lists.

Example: Traverse a MethodList list.
for (int i = 0, s = list.size(); i < s; i += 1) {
   dosomething(list.getMethod(i));
}
The reverse order can be more convenient: for (int i = list.size() - 1; i >= 0; i -= 1) ...

Example: Create a list of methods for at least n elements.
MethodMutableList list = new MethodArrayList(n);

The standard implementation is a growable array. This version of the constructor will pre-allocate enough space for n elements.

Example: Remove method m from a MethodMutableList list and append a new element n.
int pos = list.indexOf(m);
if (pos >= 0) {
   list.remove(pos);
}
list.add(n);
Note that the indexOf method will use the equals method of the elements; here, it will look for an object == m.

Generating Formatted Debug Output

The recoder.util.Debug auxiliary class contains methods for error logging and assertion checking.

Conveniently formatted output is produced by recoder.convenience.Format. The Format.toString method takes a customizable format string, such as "%c \"%N\" @%p in %f", to produce suitable output, such as MethodDeclaration "example.Main.main" @6/78 in example/Main.java. The recoder.convenience.Formats interface provides some predefined format strings.
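
Example (sketch): Expanding a format string with single-letter codes.
The following stand-alone model shows how such a format string could be expanded. ToyFormat is a hypothetical helper, not recoder.convenience.Format; the meaning of the codes %c, %N, %p, and %f is inferred from the sample output above, and the values are supplied by hand instead of being read from a program element.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mini formatter; it only demonstrates the substitution idea.
class ToyFormat {
    // Replaces each "%x" by the value registered for the code x.
    // Unknown codes and a trailing '%' are copied unchanged.
    static String format(String pattern, Map<Character, String> values) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char ch = pattern.charAt(i);
            if (ch == '%' && i + 1 < pattern.length()) {
                char code = pattern.charAt(++i);
                String value = values.get(code);
                out.append(value != null ? value : "%" + code);
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Character, String> values = new HashMap<>();
        values.put('c', "MethodDeclaration"); // assumed: class of the element
        values.put('N', "example.Main.main"); // assumed: full name
        values.put('p', "6/78");              // assumed: position
        values.put('f', "example/Main.java"); // assumed: file
        System.out.println(format("%c \"%N\" @%p in %f", values));
        // prints MethodDeclaration "example.Main.main" @6/78 in example/Main.java
    }
}
```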

Appendix: RECODER Services

The following sections briefly describe the available RECODER services.

recoder.ProgramFactory

* Creates all source elements for a language.
* Parses source elements from Readers or Strings.
* Creates pretty printers for a given Writer.

This service provides factory methods creating source elements for a given target language - at the moment this is Java; dialects might be added later. There is one factory method for each public constructor of each element. Further methods allow parsing of complete or partial program fragments from Readers, or directly from strings. This service also creates pretty printers that reproduce the concrete syntax.

recoder.io.SourceFileRepository

* Retrieves compilation units from given locations.
* Retrieves compilation units by logical class name.
* Gets a list of known compilation units.
* Writes back single, or all, or all changed units.

This service delivers syntax trees from data locations, files, or by automatic lookup in the search path obeying the order of the path and the priority of different file types (e.g. class files versus java sources). Usually, the repository is called by an analysis service, but a direct use is also possible, for instance to preload units. The service keeps track of the origins and changes of a compilation unit and is able to write it back to file. Compilation units are cached and will not be parsed twice.

recoder.io.ClassFileRepository

* Retrieves class files from given locations.
* Retrieves class files by logical class name.
* Gets a list of known class files.

This service delivers class files from data locations. Usually, the repository is not called directly, but this is also possible. The service keeps track of the origins of a class file and will cache the results.

recoder.io.ProjectSettings

* Provides project configuration information.
* Imports and exports project files.

This service delivers project configuration information such as the search path for source and class files, and pretty printing styles. It can import and export these properties as standard project files.

recoder.service.NameInfo

* Finds packages, types, and variables or fields by name.
* Manages predefined types and packages.

This service knows all packages, types and variables/fields by name. In addition, the name info can deliver representations of primitive types and important predefined types and packages. This interface comes close to the definition table of a compiler.

recoder.service.SourceInfo

* Analyzes compilation units.
* Provides containment relations.
* Resolves names in a context.
* Computes types of entities.
* Resolves inheritance, overloading, visibilities.

This service can analyze program elements and keeps track of compilation units that have been analyzed by the service. All methods deal with program elements, sometimes in combination with names, and deliver semantic properties, or relations between them. While hidden from the public interface, the service knows about scopes. It can resolve inheritance and containment, find out the meaning of uncollated names in a path and associate program elements with their corresponding semantic entity, types being the most important thereof. The service also relates a program element with its properties, hiding the exact nature of a semantic entity. If this service cannot handle a query because it does not correspond to a syntactic entity, the service delegates to the responsible one.

recoder.service.CrossReferenceSourceInfo

* Gets all known references to ProgramModelElements.
* Resolves all references it encounters automatically.

The cross reference source info is an extension of the standard source info and is part of the CrossReferenceServiceConfiguration. This service collates references automatically and provides queries that deliver all references to these entities under a closed world assumption. This service cannot report references that are not yet part of the model, obviously.

recoder.service.ByteCodeInfo

* Analyzes class files.
* Provides containment relations.
* Computes types of entities.
* Resolves inheritance, overloading, visibilities.

This service corresponds to the SourceInfo, but deals with Java byte code instead. The biggest syntactic unit is a ClassFile in contrast to a CompilationUnit. At the moment, RECODER does not support synthesizing byte code and does not scan the byte code at instruction level, so many queries of the source level have no relevant counterpart in the byte code info yet.

recoder.service.ImplicitElementInfo

* Manages implicitly defined elements.
* Provides containment relations.

This service corresponds to the SourceInfo, but deals with implicitly defined elements such as array types or packages.

recoder.service.ConstantEvaluator

* Evaluates Java compile-time constant expressions.

This service evaluates expressions that are defined as "compile-time constant". This is a very conservative notion of constants and is particularly important for some special cases concerning the type of a conditional operator result (?:).
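
Example (sketch): How constness influences the type of a conditional expression.
The following plain Java program does not use RECODER; it only shows the language rule in question. If one operand of ?: has type char and the other is a compile-time constant of type int that is representable as a char, the result type is char; if the int operand is not a constant, the result type is int. The overloaded kind methods are helpers invented for this illustration to make the static type observable.

```java
class ConditionalTypeDemo {
    // Overload resolution reveals the static type of the argument.
    static String kind(char c) {
        return "char";
    }

    static String kind(int i) {
        return "int";
    }

    public static void main(String[] args) {
        char x = 'X';
        int i = 0;       // an ordinary variable, not a constant
        final int k = 0; // a constant variable

        System.out.println(kind(true ? x : 0)); // prints char
        System.out.println(kind(true ? x : k)); // prints char
        System.out.println(kind(true ? x : i)); // prints int
    }
}
```

A type analysis therefore has to know whether an operand is a compile-time constant, and what its value is, before it can assign a type to such an expression.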

recoder.service.ChangeHistory

* Queues atomic syntactic changes.
* Propagates change events on demand.
* Manages user-level transformation transactions.

This service keeps track of syntactic changes in a worklist (queue) and can notify services on demand. This function is also invoked when classes are loaded on demand; hence the service is the backbone of the whole system. The ChangeHistory also keeps logical blocks of changes on a stack. These describe user-level transformations and can be committed or rolled back like usual nested transactions.

Appendix: Glossary

Concrete Syntax
The common sequential representation of a program source.
Abstract Syntax (Tree) (AST)
Tree describing the hierarchical structure of program sources; ASTs often omit details such as whitespace or separators.
Program Model
A model of a program. An AST is a syntactic program model.
Metamodel
A model of a model; the metamodel describes entities of a given base model.
Program Analysis
An algorithm that derives information about a program, usually by inspection of its sources.
Semantic Analysis
Program analysis that checks if a program source conforms to the language specification.
Program Transformation
Syntactic change of a program, usually described in terms of the abstract syntax.
Metaprogram
A program changing another program; an implementation of a program transformation using an analysis to drive the transformation. Static metaprograms operate offline, while dynamic metaprograms change programs during runtime.