This manual describes the core functional system of RECODER v0.7. Potential users should read this manual in order to understand the architecture of the system.
The RECODER libraries provide you with a powerful platform for all kinds of Java source-to-source transformations. However, the use of such a complex system requires quite some knowledge in the fields of programming languages and compiler technology. You have been warned...
| Example |
|---|
| This manual contains small examples illustrating frequently occurring tasks. |
Last changes to this document: Apr 30th 2001
Author: Andreas Ludwig
RECODER is a facility to support static metaprogramming of Java program sources. The system allows users to parse and analyze Java programs, transform the sources, and write the results back into source code form.
To do so, RECODER derives a meta model of the entities encountered in
Java source code and class files. This model contains a detailed
syntactic program model that can be unparsed with only minimal
losses. While the syntactic model provides only the containment relation
between elements, the complete model adds further relations, such as type-of,
or refers-to, as well as some implicitly defined elements, such as
packages or primitive types.
One might insist that only the derived entities and relations belong to the meta level of the model, but to simplify we treat the syntactic and derived elements as parts of one model only, which we will call meta model, or program model.
In order to derive this semantic information, RECODER runs a type and name analysis which resolves references to logical entities. The refers-to relation can be made bidirectional for full cross referencing which is necessary for efficient global transformations.
Analysis applications of RECODER, such as software metric tools, only build up the meta model and do not change anything. Static metaprograms, in contrast, use the meta model information to control transformations of the source code model, which in turn may change further model parts. Metaprogram applications use the RECODER pretty printer to reproduce the source files afterwards. The pretty printer will attempt to retain the code formatting and to integrate new code fragments seamlessly.
Currently, RECODER has several restrictions, none of which is fundamental, but they are important to know:
javadoc. Depending on the formatting and the
amount of source documentation, the memory needed is about 7-12 times the
size of the sources.
The core of RECODER is its program model. The model itself can be regarded as residing in a database. RECODER offers a series of service modules that build and update the model automatically. Users will extend the metaprogramming library and access the services as well as the modules. The figure shows how the different subsystems interact and how they influence the model.
| Package | Contents |
|---|---|
| recoder | service configurations and base types |
| recoder.abstraction | interface-level semantic model |
| recoder.bytecode | java byte code elements |
| recoder.io | repository and io services |
| recoder.java | java source elements |
| recoder.kit | metaprogramming library |
| recoder.service | program analysis services |
| recoder.convenience | auxiliaries specific to RECODER |
| recoder.list | type-safe lists |
| recoder.parser* | the generated Java parser |
| recoder.util* | auxiliaries not specific to RECODER |
For quick navigation in the API, the table on the right hand side shows the contents of the most important RECODER top-level packages.
This manual does not cover the packages marked with a *. These packages are subject to further documentation, primarily contained in the API documentation.
The following sections describe the core model elements and services. The appendix of this document describes the list data structures and some other important auxiliaries.
The figure shows the base elements of the RECODER meta model, some sample elements and services that deal with the interface-level abstractions in different representations.
The topmost type for semantic model data is recoder.ModelElement
which can represent arbitrary model data such as the majority of syntax
nodes, annotations, or syntactic footprints of design patterns.
recoder.NamedModelElements are model elements that feature a
meaningful name for each instance.
The topmost type for syntactic Java source data is
recoder.java.SourceElement, representing any node in a
syntax tree, including comments, which do not necessarily carry semantic
information. Byte code elements are also syntactic; they are subtypes of
recoder.bytecode.ByteCodeElement.
recoder.abstraction.ProgramModelElements are part of the
abstract model and represent entities visible at the interface level
(hence, they are all NamedModelElements).
Currently, all ByteCodeElements covered by RECODER are
ProgramModelElements, but not all ProgramElements are
(a Plus operator, for instance, is not contained in the abstract model).
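The relationships above can be mirrored in a few plain Java interfaces. This is a simplified, hypothetical sketch (ByteCodeMethod, MethodDeclarationSketch and PlusSketch are invented for illustration), assuming that ProgramElements are also ModelElements; the real RECODER types declare many more methods.

```java
// Hypothetical, simplified mirror of the element hierarchy; not RECODER's declarations.
interface ModelElement {}
interface NamedModelElement extends ModelElement { String getName(); }
interface ProgramModelElement extends NamedModelElement {}   // interface-level entities
interface SourceElement {}                                    // any source syntax node
interface ProgramElement extends SourceElement, ModelElement {} // assumption: also a ModelElement
interface ByteCodeElement {}                                  // syntactic byte code data

// A byte code method is syntactic and part of the abstract model.
final class ByteCodeMethod implements ByteCodeElement, ProgramModelElement {
    private final String name;
    ByteCodeMethod(String name) { this.name = name; }
    public String getName() { return name; }
}

// A source method declaration is a syntax node and an abstract model element.
final class MethodDeclarationSketch implements ProgramElement, ProgramModelElement {
    private final String name;
    MethodDeclarationSketch(String name) { this.name = name; }
    public String getName() { return name; }
}

// An operator such as Plus is a syntax node but not part of the abstract model.
final class PlusSketch implements ProgramElement {}
```

A single class can thus play a syntactic and an abstract role at the same time, which is why a method declaration can be used wherever a method is expected.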
While the meta model is central to RECODER, the abstract interface-level model is central to the meta model, and will be detailed in the following section.
The core part of the RECODER meta model is located in
recoder.abstraction and primarily consists of entities that
occur in an API documentation: Types, Variables,
Methods, Packages,
with some additional abstractions such as Member or
ClassTypeContainer.
These entities are inherited from ProgramModelElement.
While many ProgramModelElements have a syntactic representation, the
recoder.abstraction package also contains entities that
have no syntactic representation at all, but are implicitly defined.
Examples are ArrayType, DefaultConstructor, or
the aforementioned Package.
The figure shows the elements of this abstract model and their associations.
Program model elements can have different data layouts and hence require different access schemes. RECODER therefore provides a set of modules that handle the accesses and rules for each kind of representation.
Each program model element has an instance of a
ProgramModelInfo service assigned that fits the
representation of that element. The service knows how to access and interpret
the data layout of those elements. A program model element
provides an interface for certain queries, and most of them will be passed
to the corresponding service, which happens entirely in the background.
Further queries of the ProgramModelInfo deal with type widening,
subtype relations, member visibilities and overloaded methods.
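The delegation can be sketched as follows, using hypothetical names rather than RECODER's actual signatures: each element keeps a reference to the service that fits its representation and forwards queries to it. The byte code variant returns the stored, fully qualified name, while the source variant resolves a short name through an assumed import table.

```java
import java.util.Map;

// Hypothetical service interface; RECODER's ProgramModelInfo offers far more queries.
interface ModelInfoSketch {
    String getReturnTypeName(MethodElement m);
}

final class MethodElement {
    final String rawReturnType;   // return type text as stored in the representation
    final ModelInfoSketch info;   // service assigned to this representation

    MethodElement(String rawReturnType, ModelInfoSketch info) {
        this.rawReturnType = rawReturnType;
        this.info = info;
    }

    // The element forwards the query; callers never deal with the service directly.
    String getReturnTypeName() {
        return info.getReturnTypeName(this);
    }
}

// Class files already store fully qualified names.
final class ByteCodeInfoSketch implements ModelInfoSketch {
    public String getReturnTypeName(MethodElement m) {
        return m.rawReturnType;
    }
}

// Source code may use short names that must be resolved, here via an import table.
final class SourceInfoSketch implements ModelInfoSketch {
    private final Map<String, String> imports; // short name -> qualified name

    SourceInfoSketch(Map<String, String> imports) {
        this.imports = imports;
    }

    public String getReturnTypeName(MethodElement m) {
        return imports.getOrDefault(m.rawReturnType, m.rawReturnType);
    }
}
```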
In most cases, querying the program model elements directly will be completely
sufficient for users. However, program model elements basically cover only the
interface level, and there are many more model elements.
For instance, the SourceInfo service, an extension of
ProgramModelInfo, can compute the Type of any
Expression, which is a syntactic element not contained in the
program model subhierarchy.
The SourceInfo covers most of the model queries that a typical
user will need.
The figure illustrates the different information services and the core
interfaces for each representation. Different representations of
ClassType are shown as an example.
In addition to the services which derive information about model
elements there are repositories for each type of element representation:
The recoder.io.SourceFileRepository can read, store, and write
recoder.java.CompilationUnits, while the
recoder.io.ClassFileRepository can read and store
recoder.bytecode.ClassFiles.
The repositories can be used to find an entry
point to the meta model by means of a physical file name.
Implicit elements such as array types or packages are registered or
can be created at the recoder.service.NameInfo service.
This service also knows all class types by their logical name, and
is similar to the definition table of a common compiler.
The NameInfo is important to collate references of the
form "a.b.c" and can also be used to find an entry point
to the meta model by means of logical names.
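A definition table with demand loading might look like the following sketch. NameTable and its loader function are hypothetical stand-ins for NameInfo; the loader represents whatever parsing or class file loading is needed to produce a definition and returns null if none exists.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical definition table: logical names map to types, unknown names
// trigger the loader (which may cause parsing and analysis in the background).
final class NameTable<T> {
    private final Map<String, T> known = new HashMap<>();
    private final Function<String, T> loader; // returns null if no definition exists

    NameTable(Function<String, T> loader) {
        this.loader = loader;
    }

    void register(String qualifiedName, T type) {
        known.put(qualifiedName, type);
    }

    T getClassType(String qualifiedName) {
        T t = known.get(qualifiedName);
        if (t == null) {
            t = loader.apply(qualifiedName);   // attempt to load a definition
            if (t != null) {
                known.put(qualifiedName, t);   // cache it for later lookups
            }
        }
        return t;
    }
}
```

Failed lookups are not cached in this sketch, so a repeated query for an unknown name calls the loader again.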
The following table lists all relevant element types, the proper information and repository services.
| Representation | Element Supertype | Dedicated Info Service | Dedicated Repository |
|---|---|---|---|
| (Abstract) | recoder.abstraction.ProgramModelElement | recoder.service.ProgramModelInfo | - |
| Source Code | recoder.java.ProgramElement | recoder.service.SourceInfo | recoder.io.SourceFileRepository |
| Byte Code | recoder.bytecode.ByteCodeElement | recoder.service.ByteCodeInfo | recoder.io.ClassFileRepository |
| Implicitly Defined | recoder.abstraction.NullType, Package, DefaultConstructor, ArrayType | recoder.service.ImplicitElementInfo | recoder.service.ImplicitElementInfo, NameInfo |
The following examples illustrate the usage of the services. Some examples use concrete program elements - assume for now that these are available from somewhere (navigation in syntax trees is described later on).
Example:
Find the return type of a Method m
|
|---|
m.getReturnType() will derive the required Type.
Internally, the work is done by the corresponding ProgramModelInfo
service. A method included in a ClassFile will have the return
type in a fully qualified textual representation inside a class file (e.g.
"java.lang.String"), while a method included in a
CompilationUnit might contain only the short name of the type
(e.g. "String"). The ByteCodeInfo or
SourceInfo will handle the appropriate extractions.
If |
Example:
Find the containing ClassType of a Method m
|
|---|
As a Member, a Method provides a query
getContainingClassType() that does the job.
If m is a MethodDeclaration, this
query will report a TypeDeclaration.
|
Example:
Find all Methods that are defined in a ClassType c,
including inherited ones
|
|---|
c.getMethods() will only report a list of methods
that are added or redefined in c, but
c.getAllMethods() will include all methods available.
The methods are listed in an order that corresponds to a topological
order of the inheritance graph with leaf c: the methods
of c.getMethods() come first, and the methods of
java.lang.Object are at the end of the list.
|
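The ordering described in this example can be sketched with Kahn's topological sort over the supertypes reachable from c, where an edge leads from each type to its direct supertypes. TypeNode and AllMethods are hypothetical; skipping a signature that was already collected from a more specific type is an assumption of this sketch, since the text does not say how redefined methods are treated, and the relative order of unrelated supertypes may differ from RECODER's.

```java
import java.util.*;

// Hypothetical class type: declared method signatures plus direct supertypes.
final class TypeNode {
    final String name;
    final List<String> methods;
    final List<TypeNode> supertypes;

    TypeNode(String name, List<String> methods, List<TypeNode> supertypes) {
        this.name = name;
        this.methods = methods;
        this.supertypes = supertypes;
    }
}

final class AllMethods {
    // Orders c and its supertypes so that every type precedes its supertypes.
    static List<TypeNode> topologicalSupertypes(TypeNode c) {
        // 1. Discover reachable types and count incoming edges.
        Map<TypeNode, Integer> inDegree = new LinkedHashMap<>();
        Deque<TypeNode> work = new ArrayDeque<>();
        inDegree.put(c, 0);
        work.push(c);
        while (!work.isEmpty()) {
            TypeNode t = work.pop();
            for (TypeNode s : t.supertypes) {
                if (!inDegree.containsKey(s)) {
                    inDegree.put(s, 0);
                    work.push(s);
                }
                inDegree.put(s, inDegree.get(s) + 1);
            }
        }
        // 2. Kahn's algorithm: emit a type once all its subtypes have been emitted.
        List<TypeNode> order = new ArrayList<>();
        Deque<TypeNode> ready = new ArrayDeque<>(List.of(c));
        while (!ready.isEmpty()) {
            TypeNode t = ready.poll();
            order.add(t);
            for (TypeNode s : t.supertypes) {
                int d = inDegree.get(s) - 1;
                inDegree.put(s, d);
                if (d == 0) {
                    ready.add(s);
                }
            }
        }
        return order;
    }

    // Collects methods in that order, skipping signatures redefined in a more specific type.
    static List<String> getAllMethods(TypeNode c) {
        List<String> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (TypeNode t : topologicalSupertypes(c)) {
            for (String m : t.methods) {
                if (seen.add(m)) {
                    result.add(t.name + "." + m);
                }
            }
        }
        return result;
    }
}
```

Because every type has a path to Object, Object is only emitted after all other types, matching the behavior described above.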
Example:
Find the Method that a MethodReference r
refers to.
|
|---|
As r is a ProgramElement, the SourceInfo is
responsible for this: getMethod(r) should correctly
follow the method hiding rules imposed by overloading as well as
inheritance. Note that the result may be a MethodDeclaration,
but could also be a ByteCodeElement.
|
Example:
Get a ClassType with the fully qualified name n
|
|---|
getClassType(n) in the NameInfo will look up
known types with name n and will attempt to load a definition
if this is not successful. This process might trigger a lot of parsing and
analyses in the background.
|
Example:
Check if a Method m is accessible from within
a ClassType c
|
|---|
Any ProgramModelInfo can check visibilities for accesses
from within classes: isVisibleFor(m, c).
|
A less trivial example:
Find out which exceptions of a Method are not
RuntimeExceptions
|
|---|
ClassTypeList getExceptionsToCatch(Method m) {
    // getNameInfo() is assumed to be available in the enclosing class
    ClassType rtException =
        getNameInfo().getClassType("java.lang.RuntimeException");
    ClassTypeList exceptions = m.getExceptions();  // declared exceptions, may be null
    if (exceptions == null || exceptions.isEmpty()) {
        return ClassTypeList.EMPTY_LIST;
    }
    ClassTypeMutableList result = new ClassTypeArrayList();
    // iterating backwards, so the result lists the exceptions in reverse order
    for (int i = exceptions.size() - 1; i >= 0; i -= 1) {
        ClassType e = exceptions.getClassType(i);
        // keep e unless it is a subtype of RuntimeException
        if (!e.getProgramModelInfo().isSubtype(e, rtException)) {
            result.add(e);
        }
    }
    return result;
}
|
While some source code elements occur as program model elements already and are hence part of the abstract model, the syntax trees offer much more detailed ways of traversal and manipulation.
The syntactic model is a syntax forest (a set of syntax trees) and has many more elements than the abstract core, which only covers commonalities with the byte code interface information. Users must know how to navigate in and how to manipulate syntax trees, and how to deal with references in particular.
Syntax trees consist of two disjoint types of ProgramElements:
TerminalProgramElements and
NonTerminalProgramElements. Only the latter may have
children, though they need not have any (strictly speaking, every nonterminal
has children, but some of them, such as keywords, commas, or semicolons,
are not modelled explicitly).
The root of a complete syntax tree is a CompilationUnit;
compilation units are managed by the SourceFileRepository service.
Incomplete syntax trees with arbitrary roots may exist temporarily but are not
part of the regular model.
There are three ways to actually create a compilation unit object:
- Query the NameInfo service and provide a fully qualified logical
name for the primary class type of a compilation unit. The unit is loaded
in order to create the requested class type and is analyzed automatically.
- Query the SourceFileRepository and provide a location or
file name of a compilation unit. The unit is loaded and will be analyzed
on demand.
The service also offers a convenience method that loads all
files in a directory recursively, or those selected by a file filter.
- Parse the unit with the ProgramFactory directly.
This is only viable in special situations, as the system is not aware of
the unit.
Syntax elements created in that way should never interfere with
the main project data in order to avoid unwanted duplications. It is
okay, however, to parse syntax trees from template files or similar sources
and register the possibly manipulated trees manually, as long as the
template files are not part of the search path.
If the project has already been set up, the list of known compilation
units is available from the SourceFileRepository.
Example:
Parse a CompilationUnit from a file with name
filename and have it registered.
|
|---|
SourceFileRepository sfr = serviceConfig.getSourceFileRepository();
try {
CompilationUnit u = sfr.getCompilationUnitFromFile(filename);
} catch (ParserException pe) {
// do something?
}
The resulting compilation unit is not parsed again if the given file
has already been parsed before, for instance due to a type query that
led to an automatic retrieval.
|
Traversal of syntax trees is a common task. RECODER can traverse a tree in depth first order and can also walk up the tree following parent links.
Nonterminal syntax nodes contain links to their child nodes. It is possible to access certain children directly, or to just traverse all children in syntactic order, regardless of their role.
Example:
Traverse all children of a NonTerminalProgramElement nt
in syntactic order.
|
|---|
for (int i = 0, s = nt.getChildCount(); i < s; i += 1) {
ProgramElement child = nt.getChildAt(i);
// do something with the child
}
These access methods simulate array accesses.
|
Example:
Access all parameters of a MethodDeclaration m.
|
|---|
ParameterDeclarationList plist = m.getParameters();
if (plist != null) {
// traverse list
}
Note that there is also a second way to traverse the parameters:
As a method declaration is a ParameterContainer, the class
additionally offers a direct access interface via
getParameterDeclarationCount() and
getParameterDeclarationAt(int index).
Both are possible ways to read the parameters, but changes are possible
only via the list versions.
|
Traversing whole (sub)trees can be done by manual implementation of
an appropriate depth or breadth first search using the child traversal
procedure. RECODER already provides convenient iterators for this task:
The recoder.convenience.TreeWalker reports program elements
in depth first order.
Example:
Visit all nodes in a syntax tree with root root in depth first
order.
|
|---|
TreeWalker walker = new TreeWalker(root);
while (walker.next()) {
visit(walker.getProgramElement());
}
The iterator differs slightly from standard implementations, as advancing
and the termination check are combined in next(), and the current element
can be queried multiple times between movements.
|
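The combined advance-and-test protocol can be sketched with an explicit stack. Node and SimpleTreeWalker are hypothetical and much simpler than RECODER's TreeWalker; the sketch only shows pre-order depth first iteration in which next() both advances and reports whether an element is available.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical minimal syntax node.
final class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
        this.label = label;
        for (Node k : kids) {
            children.add(k);
        }
    }
}

// Pre-order depth first walker: next() advances and tells whether an element exists,
// getNode() may be called any number of times between two calls of next().
final class SimpleTreeWalker {
    private final Deque<Node> pending = new ArrayDeque<>();
    private Node current;

    SimpleTreeWalker(Node root) {
        pending.push(root);
    }

    boolean next() {
        if (pending.isEmpty()) {
            current = null;
            return false;
        }
        current = pending.pop();
        // push children in reverse so that the first child is visited next
        for (int i = current.children.size() - 1; i >= 0; i--) {
            pending.push(current.children.get(i));
        }
        return true;
    }

    Node getNode() {
        return current;
    }
}
```

The usage mirrors the TreeWalker example: `while (walker.next()) { visit(walker.getNode()); }`.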
There are also two specialized implementations of the TreeWalker:
The ForestWalker performs a depth first iteration over a list of
compilation units, and the CustomTreeWalker can additionally report
ascending visits and allows control over recursion.
The RECODER syntax trees feature parent links for efficient upward traversal. This is very convenient, as it makes it possible to omit context parameters of transformations: the context is easily accessible via the parent traversal.
The parent of a program element p is obtained by
p.getASTParent() and is a
NonTerminalProgramElement.
Most program elements feature additional parent queries returning more
specialized parent types; for instance,
MethodDeclaration.getParent() returns a
TypeDeclaration.
A program element may be the parent of children of different types. RECODER
introduces additional interfaces when the class type of a program element
alone is not sufficient to describe this role. For instance, a
MethodDeclaration can contain the following child types:
Modifier (as any Declaration),
TypeReference (as a TypeReferenceContainer),
Identifier (as a NamedProgramElement),
ParameterDeclaration (as a ParameterContainer),
Throws (as any MethodDeclaration),
StatementBlock (as a StatementContainer).
Some StatementContainers
allow a single StatementBlock only, while others allow
a single Statement, or a list of Statements.
However, the existing interfaces reduce the amount of possible parent types
significantly.
In some cases, there are several specialized parent types reflecting different
roles an element can play. For instance, a MethodReference
may be used as a pure expression (its ExpressionContainer is set),
as a pure statement (its StatementContainer is set), or as a
prefix of an access path (its ReferenceSuffix is set).
Only one specialized parent link may be set (!= null) at a time.
The generic getASTParent method will report the unique parent
link, but will only return it as a NonTerminalProgramElement.
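The rule that at most one specialized parent link is set can be sketched as follows. The interfaces and MethodRefSketch are hypothetical simplifications, and clearing the other roles in every setter is an assumption of this sketch rather than documented RECODER behavior.

```java
// Hypothetical parent roles; the common supertype plays the part of NonTerminalProgramElement.
interface ParentSketch {}
interface ExpressionContainer extends ParentSketch {}
interface StatementContainer extends ParentSketch {}
interface ReferenceSuffix extends ParentSketch {}

final class MethodRefSketch {
    private ExpressionContainer expressionParent;
    private StatementContainer statementParent;
    private ReferenceSuffix suffixParent;

    // Each setter clears the other roles, so at most one link is non-null.
    void setExpressionContainer(ExpressionContainer p) { clear(); expressionParent = p; }
    void setStatementContainer(StatementContainer p) { clear(); statementParent = p; }
    void setReferenceSuffix(ReferenceSuffix p) { clear(); suffixParent = p; }

    private void clear() {
        expressionParent = null;
        statementParent = null;
        suffixParent = null;
    }

    // Generic query: the unique parent, typed only as the common supertype.
    ParentSketch getASTParent() {
        if (expressionParent != null) return expressionParent;
        if (statementParent != null) return statementParent;
        return suffixParent; // null for a detached node
    }

    // Specialized query with a precise type.
    StatementContainer getStatementContainer() { return statementParent; }
}
```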
Example:
Find the CompilationUnit u of a ProgramElement p.
|
|---|
ProgramElement q = p;
NonTerminalProgramElement r = q.getASTParent();
while (r != null) {
    q = r;
    r = q.getASTParent();
}
CompilationUnit u = (q instanceof CompilationUnit) ? (CompilationUnit) q : null;
If p is part of a complete syntax tree with valid parent links,
only a compilation unit may have a null parent.
Note that this useful function is also already part of the library:
u = recoder.kit.UnitKit.getCompilationUnit(p).
|
Correct treatment of whitespaces (blanks, linefeeds) is important for transformation systems in order to guarantee properly formatted output. Logical formatting of the code such as additional line feeds should be retained to facilitate recognition of transformed code.
Common abstract syntax representations usually ignore "implicit" tokens such as commas and semicolons. Thus, format information between those implicit tokens cannot be detected. RECODER uses a compromise and stores information about one "primary" implicit token per non terminal, e.g. the keyword, or an opening bracket. Further tokens such as commas are not made explicit and whitespaces in between them might be changed by the pretty printer. Fortunately, this will usually result in improved code quality.
Each SourceElement has three types of positional information
(SourceElement.Position) attached: The absolute start position
of the token, the absolute ending position, and a relative position.
While absolute positions are valid after parsing and before any modifications
have occurred, relative positions remain stable during transformations.
RECODER 0.6 does not set all relative positions of all implicit
tokens, e.g. dots between names are still left out; end coordinates are
not set at all yet.
The PrettyPrinter can reset absolute positions if the
OVERWRITE_PARSE_POSITIONS property is set.
RECODER 0.6 does not yet implement this behavior correctly.
The pretty printer will set relative positions to proper values if their
position is set to Position.UNDEFINED, which is the case for
all newly created elements. The combination of defined and undefined
relative positions allows new code to be embedded seamlessly.
The pretty printer also ensures that a minimum relative position is obeyed
in order to produce correct concrete syntax:
e.g. an unmodified declaration might have zero blanks assigned in front of
the type reference; after addition of a modifier, at least one blank will
be inserted.
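The interplay between undefined positions and the minimum spacing can be expressed as a small function. RelPos, SpacingSketch and the default position are hypothetical and do not model RECODER's Position class; the sketch only captures the two rules stated above: undefined positions are replaced, and defined ones are raised to the minimum the concrete syntax requires.

```java
// Hypothetical relative position: line breaks and columns in front of a token.
final class RelPos {
    static final RelPos UNDEFINED = new RelPos(-1, -1);
    final int lines;
    final int columns;

    RelPos(int lines, int columns) {
        this.lines = lines;
        this.columns = columns;
    }
}

final class SpacingSketch {
    // Undefined positions (new elements) receive the printer's default;
    // defined ones are kept, but on the same line at least minColumns blanks are enforced.
    static RelPos effective(RelPos pos, RelPos dflt, int minColumns) {
        RelPos p = (pos == RelPos.UNDEFINED) ? dflt : pos;
        if (p.lines == 0 && p.columns < minColumns) {
            return new RelPos(0, minColumns);
        }
        return p;
    }
}
```

In the modifier example, the type reference with zero blanks in front would be printed with one blank once a modifier precedes it.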
After cloning or parsing of a code fragment, absolute and relative positions
have been defined. For seamless embedding, these might not be wanted and
can be unset: MiscKit.unindent will do this for a subtree.
Java offers three kinds of comments:
The SingleLineComment //..., the
ordinary multi line Comment /* ... */, and
the special DocComment /** ... */ in front of named
member declarations. RECODER assigns comments to one adjacent program element,
following an assignment heuristic that considers empty lines (as sketched in
the table below). Each ProgramElement can contain a list
of comments, which in turn might precede or follow this element
(Comment.isPrefixed()). The pretty printer knows how to
handle both cases.
| ###### // comment ###### | ###### // comment ###### | ###### // comment ###### |
|---|---|---|
| Should belong to the upper code part. | Should belong to the lower code part. | Might belong to the lower code part. |
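One way to express the rule in the table, assuming the three cases differ only in whether an empty line separates the comment from the code above and below it, is the following hypothetical function; RECODER's actual heuristic may consider further details.

```java
// Hypothetical placement decision for a comment between two code parts.
enum Attach { UPPER, LOWER }

final class CommentHeuristic {
    // blankBefore: an empty line separates the comment from the code above;
    // blankAfter:  an empty line separates the comment from the code below.
    static Attach attach(boolean blankBefore, boolean blankAfter) {
        if (!blankBefore && blankAfter) {
            return Attach.UPPER;   // glued to the upper part, detached from the lower one
        }
        return Attach.LOWER;       // otherwise prefer the following element
    }
}
```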
Like all heuristics, it may fail, thus associating a comment with the wrong element. This has no visible consequences unless the element is moved away; then, the comment moves with the (wrong) element.
As of RECODER 0.6, missing implicit tokens impose further technical problems. As ending positions are still missing, comments that follow their logical parent element might not be assigned to this element when there are no tokens they could be attached to. For instance, comments in empty statement blocks are assigned to the next visible element rather than to the empty block, as the closing braces are not registered. Ending positions will allow these remaining issues to be fixed.
Comment update is a major problem and there is little hope to deal with this. However, when restructuring a program, the documentation must be brought up to date anyway. Also, most programmers use sparse comments only...
Future RECODER versions might evaluate DocComments for the
tags @see and @link in order to update
entity names as part of cross reference information. To do so, comments
must become proper ModelElements, including the
package.html files.
For each program model element of the abstract model, there
is at most one syntactic declaration, and arbitrarily
many references to that element.
The source code model provides a hierarchy of
recoder.java.Declarations in the
recoder.java.declaration package
and recoder.java.References in the
recoder.java.reference package
that correspond to the respective abstract model entities.
| Program Model Element | Declaration | Reference | SourceInfo Reference Resolution Method |
|---|---|---|---|
| Package | - | PackageReference | getPackage |
| Type | - | TypeReference | getType |
| ArrayType | - | ArrayReference | - |
| ClassType | TypeDeclaration | TypeReference | getType |
| Method | MethodDeclaration | MethodReference | getMethod |
| Constructor | ConstructorDeclaration | ConstructorReference | getConstructor |
| Variable | VariableSpecification | VariableReference | getVariable |
| Field | FieldSpecification | FieldReference | getField |
Note that the declarations of variables and fields are not
VariableDeclarations but VariableSpecifications
which are children of the former. This is a consequence of the C-like syntax
that allows code like "int i, j, k;".
Accessing the program model element of a declaration could not be easier:
The corresponding source elements already implement the abstract element
interface, so the Method of a MethodDeclaration md
is the object itself: Method m = (Method)md;
There are additional casting methods in the SourceInfo service
for sake of consistency.
Accessing the program model element of a reference is handled by the
SourceInfo service. The service will automatically resolve
inheritance and overloading issues properly.
Note that references such as a.b.c in a parsed tree cannot be
resolved by considering the local context only. For instance, a or
b could be packages, types, or variables. After parsing, RECODER
will insert an UncollatedReferenceQualifier for unknown
references. This reference must be resolved using the SourceInfo
method resolveURQ. In the cross reference configuration, RECODER
will resolve these references automatically when creating cross reference
information.
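For the leftmost name of such a path, the Java Language Specification prescribes a reclassification order for ambiguous simple names (section 6.5.2): a variable in scope wins over a type, and a type wins over a package. The sketch below shows only this first step; AmbiguousName and NameKind are hypothetical, static imports are assumed to be folded into the set of variables in scope, and how resolveURQ proceeds internally is not shown.

```java
import java.util.Set;

// Hypothetical classification of the leftmost name of an access path such as "a.b.c".
enum NameKind { VARIABLE, TYPE, PACKAGE }

final class AmbiguousName {
    static NameKind classify(String simpleName, Set<String> variablesInScope, Set<String> typesInScope) {
        if (variablesInScope.contains(simpleName)) {
            return NameKind.VARIABLE;  // a local variable, parameter, or field obscures types
        }
        if (typesInScope.contains(simpleName)) {
            return NameKind.TYPE;      // a type in scope obscures packages
        }
        return NameKind.PACKAGE;       // otherwise the name is taken to denote a package
    }
}
```

Resolving the remaining parts of the path then depends on what the prefix turned out to be, for instance a field of a variable's type or a member type of a type.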
Finding the declaration fitting to a reference is easy.
The opposite direction is also available: The
CrossReferenceSourceInfo as an extension of
SourceInfo provides all known references to a
given program model element.
Obviously, cross referencing will work for known sources
only and requires a closed world.
The cross referencer will not find references inside class
files, nor will it find references in source files that have
not been touched yet.
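A cross reference index is essentially a pair of maps kept consistent with each other. The following hypothetical CrossRefIndex (not RECODER's CrossReferenceSourceInfo) illustrates why the backward direction only knows references that have been recorded, which is the closed world restriction mentioned above.

```java
import java.util.*;

// Hypothetical index: forward "reference -> element", backward "element -> references".
final class CrossRefIndex<R, E> {
    private final Map<R, E> target = new HashMap<>();
    private final Map<E, List<R>> references = new HashMap<>();

    // Records (or re-records) the element a reference resolves to.
    void record(R ref, E element) {
        E old = target.put(ref, element);
        if (old != null) {
            references.get(old).remove(ref);  // the reference no longer points there
        }
        references.computeIfAbsent(element, k -> new ArrayList<>()).add(ref);
    }

    E resolve(R ref) {
        return target.get(ref);
    }

    // Only references recorded so far are known; unvisited sources and class
    // files contribute nothing.
    List<R> getReferences(E element) {
        return Collections.unmodifiableList(references.getOrDefault(element, Collections.emptyList()));
    }
}
```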
Example:
Find all MethodReferences to a Method m
|
|---|
getReferences(m) defined by the
CrossReferenceSourceInfo service returns a list of
MethodReferences that are currently
known.
|
Note that references such as a.b.c in a parsed tree cannot be
resolved by considering the local context only. For instance, a
could denote a package, type, or variable. The parser will insert
UncollatedReferenceQualifiers for these problematic references.
The SourceInfo contains a resolveURQ method to
resolve references of this kind. Depending on the configuration of RECODER,
all references might be resolved automatically, so that URQs only occur in
new syntax trees.
The CrossReferenceSourceInfo service can also deliver known
subtypes of given types using the cross reference information.
RECODER provides a set of services bundled in a service configuration. Configurations ensure a consistent set of service implementations suited for a particular task. For instance, the closed world assumption is necessary for consistent transformation, but is not suited for demand-driven analysis.
The DefaultServiceConfiguration contains only a standard
SourceInfo service without cross reference information and is
suited for pure one-pass analysis purposes only. Uncollated references will
not become resolved automatically, although this is still possible by manual
calls to resolveURQ.
The CrossReferenceServiceConfiguration will automatically
analyse all references in the program model and add the cross reference
information. This is the configuration of choice for transformational
tasks.
The recoder.io.ProjectSettings service is a global repository
for important settings such as
The project settings service can also locate the proper class file archives
containing at least the java.lang system classes, and is
able to load and write property files containing the project information.
Project files contain all properties of the ProjectSettings
in textual form. A typical project file lists all source files:
#RECODER Project File
input.path=converter/original
output.path=converter/modified
units=ConversionPanel.java,DecimalField.java,Unit.java,Converter.java,\
    ConverterRangeModel.java,FormattedDocument.java,FollowerRangeModel.java
Project files provide an easy way to make project settings persistent.
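Since the project settings service reads and writes property files, the format can presumably be read with the standard java.util.Properties class, which already handles # comments and backslash line continuations. ProjectFileSketch is hypothetical and not part of RECODER; the key names come from the example above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

final class ProjectFileSketch {
    // Parses project file text; comments and continuation lines are handled by Properties.
    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return p;
    }

    // Splits the comma-separated "units" entry into individual file names.
    static List<String> units(Properties p) {
        String units = p.getProperty("units", "");
        if (units.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(units.split("\\s*,\\s*"));
    }
}
```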
| Example: A small program that returns all supertypes of a given class type. |
|---|
Imports are left out for brevity:
public class Demo {
public static void main(String[] args) {
System.out.println(recoder.convenience.Format.toString("%N",
new recoder.CrossReferenceServiceConfiguration().getNameInfo().
getClassType(args[0]).getAllSupertypes()));
}
}
After compilation, execution of
java Demo java.lang.String
should produce (java.lang.String, java.io.Serializable, java.lang.Comparable, java.lang.Object)
Note that the |
Program transformation applications are the primary intent of RECODER. The architecture attempts to facilitate the task of writing transformations; transformations operate on the abstract syntax of a program and do not have to maintain derived data (e.g. changing the name of a type if its name has changed) or the concrete syntax (e.g. adding a comma if a supertype has been added).
The following sections will show how to modify syntax trees, how to print the results, and finally several sections about the proper use and definition of program transformations.
Concrete syntax of the language is controlled by the
ProgramFactory service.
This module provides a lot of factory methods for all source elements
(one method for each constructor available), direct access to the parser,
and a factory method for the pretty printer backend.
The ProgramFactory is also used internally by the
SourceFileRepository.
Building a syntax tree is possible by
- using the deepClone method to clone a node or subtree,
- parsing code with the ProgramFactory for
a given text reader or a given string containing the code, or
- creating nodes with the ProgramFactory and
linking them "manually".
These alternatives differ, among other things, in how parent links are
established; nodes that are linked manually need their parent links set
explicitly. The makeParentRoleValid convenience
method does a complete traversal of the known children and sets the proper
parent links, which is useful if efficiency is not critical.
Also, the Transformation base class provides auxiliary
methods that take on the parent linking.
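The effect of makeParentRoleValid can be sketched on a hypothetical node type. Whether RECODER's method descends into deeper levels is not stated here, so this sketch sets only the direct children's links and adds a separate, equally hypothetical recursive helper for manually built subtrees.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node with an explicit parent link.
final class LinkedNode {
    LinkedNode parent;
    final List<LinkedNode> children = new ArrayList<>();

    // Adds a child without touching its parent link, as in "manual" linking.
    LinkedNode add(LinkedNode child) {
        children.add(child);
        return this;
    }

    // Sets the parent link of every direct child to this node.
    void makeParentRoleValid() {
        for (LinkedNode c : children) {
            c.parent = this;
        }
    }

    // Hypothetical helper: repairs the links on every level of a subtree.
    static void makeParentRolesValidRecursively(LinkedNode root) {
        root.makeParentRoleValid();
        for (LinkedNode c : root.children) {
            makeParentRolesValidRecursively(c);
        }
    }
}
```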
New CompilationUnit nodes are not added to a common root node
but registered to the system via the ChangeHistory service
as special case of an ordinary transformation report. This protocol is
described later.
Example:
Insert a TypeDeclaration d into a CompilationUnit u.
|
|---|
TypeDeclarationMutableList list = u.getDeclarations();
if (list == null) {
list = new TypeDeclarationArrayList();
u.setDeclarations(list);
}
list.add(d);
d.setParent(u);
In general, consistency of syntax trees is not checked against the language
rules, so there might be temporarily invalid trees. Therefore, child lists
of certain roles might be undefined (null).
As an alternative to the dedicated call d.setParent(u), whose form
can differ for other child roles, u.makeParentRoleValid()
could be called, which (re)sets the parent links of all children of
u.
|
Removal of subtrees is easy: simply remove a child node from its
parent's list, or set the child attribute to null.
There is no obligation to update the parent link, but transformations
usually will have to give a report on the changes they have performed
in order to obtain a conforming model. We will get back to that soon.
Creating concrete syntax out of an abstract syntax tree is the job of a
PrettyPrinter. Pretty printers are created
by the ProgramFactory service and are initialized with
the current ProjectSettings.
Top-level printing of compilation units is triggered by the
SourceFileRepository which can print out all units, or
only all changed units, or single units that it knows about.
The destination is a file corresponding to the full path name
of a unit and the output path as defined in the ProjectSettings.
It is not possible to write back subtrees or to other destinations
using this interface.
Each SourceElement also defines a convenient
toSource() method which creates a string with a
dump of the subtree represented by the given root element.
The output has no initial indentation (it is "trimmed").
Example:
Write a Statement s to the console.
|
|---|
ProgramFactory pf = serviceConfig.getProgramFactory();
PrintWriter out = new PrintWriter(System.out);
PrettyPrinter pp = pf.getPrettyPrinter(out);
s.accept(pp);
out.flush(); // PrintWriter buffers its output
Usually one would probably prefer the short version for this task:
System.out.println(s.toSource());
|
The ChangeHistory service is the central agenda mechanism
of RECODER. It propagates changes of the syntactic model to services
that use this information to update the model elements they maintain.
Changes are reported by the automatic class definition loader
contained in the SourceFileRepository and by transformations.
In terms of design patterns, the change history serves as a mediator
and a subject of an observer pattern.
Currently changes are interpreted by the SourceFileRepository
and the SourceInfo.
RECODER 0.6 does not yet use the change information to its full extent; the update
mechanism is not very fine grained. Model updates will remain very expensive
until the report handlers are refined.
The change history only propagates changes on demand.
This allows change reports to be bundled, which reduces the number of update
phases. Services that require up-to-date model information request a
model update from the change history (via an updateModel() call).
Transformations do not have to trigger the updates themselves;
instead, updating service queries perform this call
transparently.
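The on-demand propagation can be sketched with a queue of reports and a query that first asks for an update. All names below (ChangeReport, ChangeListener, ToyChangeHistory, ToyNameService) are hypothetical and only mirror the roles described in the text; they are not RECODER's API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of on-demand change propagation; not RECODER's API.
class ChangeReport {
    final String element;
    final boolean attached; // true = attach, false = detach

    ChangeReport(String element, boolean attached) {
        this.element = element;
        this.attached = attached;
    }
}

interface ChangeListener {
    void modelChanged(List<ChangeReport> reports);
}

class ToyChangeHistory {
    private final Deque<ChangeReport> queue = new ArrayDeque<>();
    private final List<ChangeListener> listeners = new ArrayList<>();
    int updatePhases = 0; // number of propagation rounds so far

    void addListener(ChangeListener l) { listeners.add(l); }

    // Transformations only enqueue reports; nothing is propagated yet.
    void attached(String element) { queue.add(new ChangeReport(element, true)); }
    void detached(String element) { queue.add(new ChangeReport(element, false)); }

    // Propagates all pending reports as one bundle, if there are any.
    void updateModel() {
        if (queue.isEmpty()) return;
        List<ChangeReport> bundle = new ArrayList<>(queue);
        queue.clear();
        updatePhases += 1;
        for (ChangeListener l : listeners) l.modelChanged(bundle);
    }
}

// A service whose queries trigger the update transparently.
class ToyNameService implements ChangeListener {
    private final ToyChangeHistory history;
    private final Set<String> known = new HashSet<>();

    ToyNameService(ToyChangeHistory history) {
        this.history = history;
        history.addListener(this);
    }

    @Override
    public void modelChanged(List<ChangeReport> reports) {
        for (ChangeReport r : reports) {
            if (r.attached) known.add(r.element); else known.remove(r.element);
        }
    }

    boolean isKnown(String element) {
        history.updateModel(); // bring the model up to date before answering
        return known.contains(element);
    }
}
```

In this sketch, three reports followed by two queries cost a single update phase, which is the bundling effect the on-demand strategy aims at.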
The change history maintains a queue of change reports for propagations. In parallel, the service maintains a stack of changes that is used for rollbacks.
Usually program transformations generate small amounts of new code. This code is obviously not yet known to the services. It is therefore important to know whether a syntax element is part of the known model or temporarily "invisible":
A program element is visible for the RECODER services if and only if it is either
Obviously, service queries only work for visible elements. To make a new syntax element visible, transformations must report them as "attached" to the change history.
Change reports describe syntactic changes of syntax trees in terms of two atomic transformations: attachment of a new subtree and detachment of a subtree. Any syntactic change can be described as a series of these atomic transformations. Such a series is the syntactic "footprint" of a metaprogram. Transformations must send change reports for every visible change they have performed before a model update becomes necessary.
In case of a model update, all incoming change reports are propagated to the respective services, which traverse the reported trees and update their caches.
Change reports are also used for rollbacks. If a rollback is requested, the change history inverts all changes back to that point (a detach becomes an attach, and vice versa), executes them, and updates all data structures accordingly. To allow nested rollbacks, transformations must inform the change history when they begin to make changes. Undo operations are possible until a commit is performed.
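The rollback mechanism can be illustrated with a stack of recorded changes whose inverses are replayed, plus a stack of marks for nested blocks. Change and UndoLog are hypothetical names, and the model is reduced to a list of element names; this is a sketch of the idea, not RECODER's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of rollbacks by inverse changes; not RECODER's API.
class Change {
    final boolean attach;
    final String element;

    Change(boolean attach, String element) {
        this.attach = attach;
        this.element = element;
    }

    Change inverse() { return new Change(!attach, element); } // attach <-> detach
}

class UndoLog {
    private final List<String> model;                        // toy "syntax model"
    private final Deque<Change> stack = new ArrayDeque<>();  // performed changes
    private final Deque<Integer> marks = new ArrayDeque<>(); // starts of nested blocks

    UndoLog(List<String> model) { this.model = model; }

    // A transformation announces that it begins to make changes.
    void begin() { marks.push(stack.size()); }

    void apply(Change c) {
        perform(c);
        stack.push(c);
    }

    // Reverts every change recorded since the innermost begin(), newest first.
    void rollback() {
        int mark = marks.pop();
        while (stack.size() > mark) perform(stack.pop().inverse());
    }

    // Closes the innermost block; once no block is open, undo is no longer possible.
    void commit() {
        marks.pop();
        if (marks.isEmpty()) stack.clear();
    }

    private void perform(Change c) {
        if (c.attach) model.add(c.element); else model.remove(c.element);
    }
}
```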
Changes may stem from different transformations and may be redundant. For traversal purposes, not all changed trees must be visited: it is sufficient to descend into the largest subtrees only. Change reports that deal with subtrees contained in other changed trees are marked as "minor" and can be safely skipped. This does not apply to undo operations, however.
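The "minor" test boils down to an ancestor check. In this hypothetical sketch (TreeNode and MinorFilter are illustrative names), a changed root is skipped if it lies inside another reported root, and duplicate reports of the same root are collapsed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "minor" report detection; not RECODER's API.
class TreeNode {
    final TreeNode parent;

    TreeNode(TreeNode parent) { this.parent = parent; }

    // True if this node is the given ancestor or lies somewhere below it.
    boolean isWithin(TreeNode ancestor) {
        for (TreeNode n = this; n != null; n = n.parent) {
            if (n == ancestor) return true;
        }
        return false;
    }
}

class MinorFilter {
    // Returns the changed roots that still have to be traversed; every other
    // root is contained in one of them and is therefore "minor".
    static List<TreeNode> majorRoots(List<TreeNode> changed) {
        List<TreeNode> result = new ArrayList<>();
        for (TreeNode candidate : changed) {
            boolean minor = false;
            for (TreeNode other : changed) {
                if (other != candidate && candidate.isWithin(other)) {
                    minor = true;
                    break;
                }
            }
            // TreeNode does not override equals, so contains compares identity;
            // this collapses duplicate reports of one root.
            if (!minor && !result.contains(candidate)) result.add(candidate);
        }
        return result;
    }
}
```

The nested loop is quadratic in the number of reports, which is acceptable for a sketch but could be improved by marking the reported roots first.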
Example:
Add a new CompilationUnit to the system.
Assume we are within a Transformation subclass and the
new CompilationUnit is u. Then
attach(u) will create a change report scheduling the
compilation unit as a new element. The SourceFileRepository will
recognize it correctly during the next model update; from then on, the new unit is
visible.
Transformations in RECODER follow the Command design pattern and are
materialized as objects. This allows explicit management of
transformations, rollbacks, and access to intermediate results
from within other objects. The base class of any transformation is
recoder.kit.Transformation.
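A minimal illustration of the Command idea, using the hypothetical classes ToyTransformation and RenameElement (not recoder.kit.Transformation): the transformation is an object that can be executed, undone, and queried for an intermediate result.

```java
import java.util.List;

// Hypothetical Command-style transformations; not recoder.kit.Transformation.
abstract class ToyTransformation {
    protected final List<String> model;

    ToyTransformation(List<String> model) { this.model = model; }

    abstract void execute();
    abstract void undo();
}

class RenameElement extends ToyTransformation {
    private final String from;
    private final String to;
    private int position = -1; // intermediate result, accessible to other objects

    RenameElement(List<String> model, String from, String to) {
        super(model);
        this.from = from;
        this.to = to;
    }

    @Override
    void execute() {
        position = model.indexOf(from);
        if (position >= 0) model.set(position, to);
    }

    @Override
    void undo() {
        if (position >= 0) model.set(position, from);
    }

    int getPosition() { return position; }
}
```

Because the command remembers where it made its change, undoing it needs no further search, and other objects can inspect that position afterwards.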
Transformations know the CrossReferenceServiceConfiguration
they are running in, and offer a lot of convenience functionality. They
provide quick access methods to all services
(such as getSourceInfo()), as well as routines to detach,
replace, and attach nearly all possible combinations of program
elements. These helper methods take over the following steps:
- When an element is attached to a null list in the parent,
a corresponding list is created and attached.
- Specific attach versions are defined
(such as attach(Else,If)), and all possible ambiguities are
resolved by choosing unambiguous names to avoid overloading pitfalls.
For instance, there are four different methods to attach a
MethodReference to a For loop:
  - attachAsGuard(Expression,LoopStatement)
  - attachAsInitializer(LoopInitializer,For)
  - attachAsUpdate(ExpressionStatement,For,int)
  - attachAsBody(Statement,LoopStatement)
A high-level transformation is executed in several phases and must follow a proper protocol which will be described in detail when the requirements are clear.
While in general a transformation may use any service query, some transformations, such as the atomic transformations, are purely syntactic.
Sometimes, it is useful to modify newly created, invisible syntax trees. Transformations that modify these are also invisible and may be marked as such. Invisible transformations must be syntactic and do not send change reports.
To support invisible, syntactic transformations, the
Transformation class offers versions of detach,
replace, and the attach variants,
which perform the changes but do not send change reports. They are static
methods and have a do prefix: doAttach,
doDetach, doReplace.
These variants are also valuable for small invisible transformation steps as
part of a visible transformation, e.g. to construct a small tree "on the fly"
before attaching the result and thereby making it visible.
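The split between reporting and silent variants can be sketched as follows. Elem and ToyTransformationBase are hypothetical; only the doAttach/attach naming follows the text. A small tree is built silently and then made visible with a single reported attach.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of reporting vs. silent attach; not RECODER's API.
class Elem {
    final String name;
    final List<Elem> children = new ArrayList<>();
    Elem parent;

    Elem(String name) { this.name = name; }
}

class ToyTransformationBase {
    // Stands in for the change history: one entry per reported change.
    final List<String> reports = new ArrayList<>();

    // Silent static variant: changes the tree but sends no report, so it is
    // only suitable for trees that are still invisible.
    static void doAttach(Elem child, Elem parent) {
        parent.children.add(child);
        child.parent = parent;
    }

    // Reporting variant: performs the same change and reports the new subtree.
    void attach(Elem child, Elem parent) {
        doAttach(child, parent);
        reports.add("attached " + child.name);
    }
}
```

Since the reported subtree includes everything built below it, one report covers the whole construction; reporting each inner step as well would only produce "minor" reports.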
Meaningful transformations are usually chains of existing transformations. Unfortunately, composing transformations is not trivial. For total correctness (that is, partial correctness plus termination), dependencies between transformations must be taken into account: each change of the model might invalidate or restrict known results. This influences results of old meta model queries as well as results of syntactic traversals. Syntax trees might have been removed, added, or just moved with new identities.
For consistency and efficiency, queries and modifications should be
kept separate as much as possible to avoid unwanted side-effects and
to reduce the number of model updates.
The Transformation framework supports this
by separating the analysis and the transformation phase.
In order to receive all infrastructural benefits, transformations must obey certain protocol rules and should follow some conventions. A transformation operates in three phases:
- The analyze method should derive all data necessary to
perform the changes.
- A ProblemReport must be set (to the report
field) and returned.
- The report must be a NoProblem instance when transform is called.
Callers use the ProblemReport to display warnings
or choose alternative strategies. There are three constants for positive
reports: IDENTITY (the transformation phase will not change
anything), EQUIVALENCE (the resulting program shows the
same functionally observable behavior, excluding introspective data), and
NO_PROBLEM (no particular guarantees).
- The transform method performs the syntactic changes.
- The transform method must call super.transform().
- During the transform phase, no model update may happen,
that is, no updating query may be performed. It is admissible to perform
syntactic navigation.
- Transformation
are used.
- If an element that is already part of the model is to be reused, a clone should
be attached instead, using the deepClone() method.
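The two-phase protocol can be sketched as a transformation that only queries during analyze and only modifies during transform. ToyProblemReport and RemoveUnusedNames are hypothetical; the enum constants merely reuse the names IDENTITY, EQUIVALENCE and NO_PROBLEM from the text, and the sketch assumes that removing unused names preserves behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-phase transformation; names only echo the text.
enum ToyProblemReport { IDENTITY, EQUIVALENCE, NO_PROBLEM }

class RemoveUnusedNames {
    private final List<String> model;
    private final List<String> used;
    private final List<String> toRemove = new ArrayList<>();
    private ToyProblemReport report; // null until analyze() has run

    RemoveUnusedNames(List<String> model, List<String> used) {
        this.model = model;
        this.used = used;
    }

    // Phase 1: queries only; collects everything transform() will need.
    ToyProblemReport analyze() {
        toRemove.clear();
        for (String name : model) {
            if (!used.contains(name)) toRemove.add(name);
        }
        // Assumption of this toy: dropping unused names preserves behavior.
        report = toRemove.isEmpty() ? ToyProblemReport.IDENTITY
                                    : ToyProblemReport.EQUIVALENCE;
        return report;
    }

    // Phase 2: changes only, based on the stored analysis results.
    void transform() {
        if (report == null) throw new IllegalStateException("call analyze() first");
        model.removeAll(toRemove);
    }
}
```

Keeping all queries in analyze means the model is never modified while it is still being inspected, and a caller can inspect the returned report before deciding to call transform.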
Documentation of a transformation should inform about its purpose (its pragmatics), the syntactic changes, and the semantic consequences: Is the result identical, does it preserve the observable behavior, is it just compilable, is it only parseable, or none thereof?
The kit auxiliary classes support the Transformation classes.
They contain auxiliary queries and factory methods for frequently needed syntax
trees. Kit classes are roughly grouped by the most important model
element dealt with: UnitKit, TypeKit,
MethodKit, and so on. The MiscKit deals with
general syntax trees.
To give an impression of the contents of the kits, the following table lists a few auxiliaries:
| Auxiliary | Purpose |
|---|---|
| MethodKit.getGetters | Guesses access methods for fields |
| UnitKit.getCompilationUnit | Walks up to the compilation unit of a program element |
| MiscKit.contains | Checks if a program element is contained in a subtree |
| MiscKit.unindent | Removes indentation information from a subtree (useful after partial parsing) |
| TypeKit.createTypeReference | Creates the shortest reference to a type for a given context |
| VariableKit.getNewVariableName | Creates admissible variable names mangling corresponding type names |
The following sections describe auxiliaries that are necessary to work with RECODER.
The RECODER API tries to cover type constraints in detail. As the current
version of Java lacks generic types, this is hard to do with containers
of unclassified objects. Containers that occur in public signatures
are therefore expanded for concrete subtypes. What would otherwise be declared
as a List of Identifier, or List<Identifier>,
becomes an IdentifierList in this approach. In contrast to the
standard Java collection API, the RECODER lists distinguish between mutable
and read-only lists: You can browse through an IdentifierList,
but you can only change an IdentifierMutableList.
Lists of elements that stand in a subtype relation also stand in a subtype
relation: An IdentifierList is a ProgramElementList.
To prevent dangerous manipulations, this is not true for the mutable versions:
An IdentifierMutableList is not a
ProgramElementMutableList.
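The reason why only read-only lists may be covariant can be shown with the generic collections of later Java versions (this uses java.util, not RECODER's list classes): a read-only view of a List<Integer> may be typed as a list of Number, whereas a mutable alias would let a Double slip into a list of integers.

```java
import java.util.Collections;
import java.util.List;

// Illustration with java.util generics (Java 5+), not RECODER's list classes.
class CovarianceDemo {
    // Read-only use: a list of any Number subtype may be passed in.
    static double sum(List<? extends Number> numbers) {
        double s = 0;
        for (int i = 0, n = numbers.size(); i < n; i += 1) {
            s += numbers.get(i).doubleValue();
        }
        return s;
    }

    // "List<Number> alias = ints;" would not compile: alias.add(3.14) would
    // put a Double into a list of Integer. An unmodifiable view is safe,
    // because every write through it throws UnsupportedOperationException.
    static List<Number> readOnlyView(List<Integer> ints) {
        return Collections.unmodifiableList(ints);
    }
}
```

This mirrors the RECODER rule: an IdentifierList may be used as a ProgramElementList because nothing can be added through it, while the mutable versions stay unrelated.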
List elements are addressed by indices rather than iterators or
explicit nodes. This saves storage and provides a slim interface.
The default implementations end with ~ArrayList and use the
common array doubling technique.
If elements must be added and the number is known a priori (e.g. before
concatenating a set of lists), it is wise to increase the capacity of the
array beforehand.
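The effect of pre-sizing can be measured with a minimal growable array. GrowableArray is a hypothetical class using the doubling technique, not RECODER's ~ArrayList implementation; it counts how often its backing array is copied.

```java
import java.util.Arrays;

// Hypothetical growable array using capacity doubling; not RECODER's code.
class GrowableArray {
    private Object[] data;
    private int size;
    int reallocations = 0; // how often the backing array was copied

    GrowableArray(int initialCapacity) {
        data = new Object[Math.max(1, initialCapacity)];
    }

    // Grows to at least minCapacity, doubling when that is larger.
    void ensureCapacity(int minCapacity) {
        if (minCapacity > data.length) {
            int newCapacity = Math.max(minCapacity, 2 * data.length);
            data = Arrays.copyOf(data, newCapacity);
            reallocations += 1;
        }
    }

    void add(Object x) {
        ensureCapacity(size + 1);
        data[size++] = x;
    }

    Object get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
        return data[i];
    }

    int size() { return size; }
}
```

Appending 1000 elements one by one starting from capacity 1 copies the array 10 times (1, 2, 4, ..., 1024); reserving the capacity first needs a single copy.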
The following examples illustrate the usage of the RECODER lists.
Example:
Traverse a MethodList list.
for (int i = 0, s = list.size(); i < s; i += 1) {
    doSomething(list.getMethod(i));
}
The reverse order can be more convenient:
for (int i = list.size() - 1; i >= 0; i -= 1) ...
Example:
Create a list of methods for at least n elements.
MethodMutableList list = new MethodArrayList(n);
The standard implementation is a growable array. This version
of the constructor pre-allocates enough space for n elements.
Example:
Remove method m from a MethodMutableList list and append a new element n.
int pos = list.indexOf(m);
if (pos >= 0) {
    list.remove(pos);
}
list.add(n);
Note that the indexOf method uses the equals
method of the elements; here, it will look for an object
== m.
The recoder.util.Debug auxiliary class contains methods
for error logging and assertion checking.
Conveniently formatted output is produced by
recoder.convenience.Format. The Format.toString
method takes a customizable format string, such as
"%c \"%N\" @%p in %f",
to produce suitable output, such as
MethodDeclaration "example.Main.main" @6/78 in example/Main.java.
The recoder.convenience.Formats interface provides some
predefined format strings.
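The placeholder mechanism can be sketched as a small substitution loop. ToyFormat is hypothetical, and the meaning of each letter (%c element class, %N full name, %p position, %f file) is inferred from the example output rather than taken from recoder.convenience.Format.

```java
import java.util.Map;

// Hypothetical placeholder substitution; not recoder.convenience.Format.
class ToyFormat {
    // Replaces "%x" by fields.get('x'); unknown placeholders are kept verbatim.
    static String format(String pattern, Map<Character, String> fields) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pattern.length(); i += 1) {
            char ch = pattern.charAt(i);
            if (ch == '%' && i + 1 < pattern.length()) {
                i += 1;
                char key = pattern.charAt(i);
                String value = fields.get(key);
                out.append(value != null ? value : "%" + key);
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }
}
```

Filling the map with MethodDeclaration, example.Main.main, 6/78 and example/Main.java for c, N, p and f reproduces the sample output line shown for Format.toString.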
The following sections briefly describe the available RECODER services.
| This service provides factory methods creating source elements for a given target language; at the moment this is Java, and dialects might be added later. There is one factory method for each public constructor of each element. Further methods parse complete or partial program fragments from Readers, or directly from strings. This service also creates pretty printers that reproduce the concrete syntax. |
| This service delivers syntax trees from data locations, files, or by automatic lookup in the search path obeying the order of the path and the priority of different file types (e.g. class files versus java sources). Usually, the repository is called by an analysis service, but a direct use is also possible, for instance to preload units. The service keeps track of the origins and changes of a compilation unit and is able to write it back to file. Compilation units are cached and will not be parsed twice. |
| This service delivers class files from data locations. Usually, the repository is not called directly, but this is also possible. The service keeps track of the origins of a class file and will cache the results. |
| This service delivers project configuration information such as the search path for source and class files, and pretty printing styles. It can import and export these properties as standard project files. |
| This service knows all packages, types and variables/fields by name. In addition, the name info can deliver representations of primitive types and important predefined types and packages. This interface comes close to the definition table of a compiler. |
| This service can analyze program elements and keeps track of compilation units that have been analyzed by the service. All methods deal with program elements, sometimes in combination with names, and deliver semantic properties, or relations between them. While hidden from the public interface, the service knows about scopes. It can resolve inheritance and containment, find out the meaning of uncollated names in a path and associate program elements with their corresponding semantic entity, types being the most important thereof. The service also relates a program element with its properties, hiding the exact nature of a semantic entity. If this service cannot handle a query because it does not correspond to a syntactic entity, the service delegates to the responsible one. |
| The cross reference source info is an extension of the standard source info and is part of the |
| This service corresponds to the SourceInfo, but deals with Java byte code instead. The biggest syntactic unit is a ClassFile in contrast to a CompilationUnit. At the moment, RECODER cannot synthesize byte code and does not scan the byte code at instruction level, so many queries of the source level have no relevant counterpart in the byte code info yet. |
| This service corresponds to the SourceInfo, but deals with implicitly defined elements such as array types or packages. |
| This service allows to evaluate expressions that are defined as "compile-time-constant". This is a very conservative version of constants and particularly important for some special cases concerning the type of a conditional operator result (?:). |
| This service keeps track of syntactic changes in a worklist (queue) and can notify services on demand. This function is also invoked when classes are loaded on demand; hence the service is the backbone of the whole system. The ChangeHistory also keeps logical blocks of changes on a stack. These describe user-level transformations and can be committed or rolled back like usual nested transactions. |