An Instrumentation Record Language for Kieker

Instrumentation of software systems is used to monitor their behavior, to determine performance properties and internal state, and to recover the actual architecture of a system. Today, systems are implemented in several programming languages. Monitoring the different parts of such a system therefore requires monitoring record structures in each of these languages. To perform joint analyses over all components of a system, the different records must be compatible. This can be achieved by defining common serialization standards for the various languages, as we did for the Kieker Data Bridge, and by implementing suitable runtime libraries for each language. However, the monitoring records must still be rewritten from scratch for every language.

Furthermore, model-driven software development uses models to describe software systems. Code generators interpret these models and generate source code in the programming languages used. Therefore, it would be beneficial to express monitoring on the model level as well.

Modeling languages that express instrumentation are a solution to both issues. At present, we divide instrumentation into monitoring records, a monitoring runtime, and weaving instructions for the monitoring aspect. In this post, we address modeling and code generation for monitoring records and their supplemental serialization functions.

This post first introduces the requirements for such an instrumentation record language (IRL), followed by the language features realizing these requirements. Then the technological context is briefly introduced, before the grammar and code generation are explained. Finally, a short summary of the results is given.

Requirements

The model-driven instrumentation record language must fulfill the following distinct requirements to be part of a model-driven monitoring approach:

  1. Target language independent description of monitoring data
  2. Mapping of types from other languages to Java types
  3. Code generation for all languages supported by Kieker
  4. Ability to add additional payload to records
  5. Limit the size of the payload, to ensure reasonable response times
  6. Reuse of record definitions, especially in the context of payload definitions
  7. Fit into the model-driven monitoring approach

Language Features

The instrumentation record language must support different target languages, which requires a language-independent type system. The type system must provide the necessary structures and primitive types used in monitoring records. As the analysis side of Kieker is written in Java, primitive types of a target language must be mappable to Java types.

To initialize properties of records with default values, the language must support the definition of constants for defaults.

Kieker has many different record types, and in the future, with the payload feature, the number of records might increase. Therefore, it is helpful to reuse record definitions, which can easily be realized through a sub-typing feature.

Payload data is formulated in the type system of the target language of the software system. Therefore, the generators must be able to inherit those types and provide serialization for them. This feature can be implemented later, as it requires quite some work on the level of code generator design.

On the modeling level, the language must be able to import the type system of the modeling language. It must also provide a transformation for these type structures to the target language of the project and a transformation to map the types onto Java-types.

In OO-languages a payload can be described by a reference to an object. As objects can have references to other objects, this could result in a large quantity of data to be transmitted every time the probe is executed. To limit the size, references should not be followed implicitly. They must be explicitly specified.

Finally, the language requires code generators for every supported programming language. These code generators should be easy to implement and should share model navigation routines. They also require an API specification and a runtime. The runtime is not part of the language itself, but it is necessary for specifying the code generator, as the generator has to produce code that uses that API.

Development Environment

The IRL is based on the Xtext DSL development framework, which utilizes Eclipse as IDE and EMF as the basis for meta-modeling. Xtext provides a grammar language and editor, and it uses ANTLR to generate the parsers that drive syntax highlighting in editors and the creation of models. The framework is bundled with Xtend, a language for model-to-model and model-to-text transformations, which is used to implement the IRL code generators.

Grammar

The grammar of the IRL can be divided into four parts. First, the generic header of any Xtext language, which imports terminals and meta-models. Second, a service part to specify package names and to import ecore models and other resources. Third, the actual record type definition. And fourth, literals and additional terminals.

Header

The header is not very spectacular. It defines the class name for the parser, specifies the import of the terminals grammar, and imports the ecore package so ecore types can be used in the grammar.

grammar de.cau.cs.se.instrumentation.rl.RecordLang with org.eclipse.xtext.common.Terminals

generate recordLang "http://www.cau.de/cs/se/instrumentation/rl/RecordLang"
import "http://www.eclipse.org/emf/2002/Ecore" as ecore

Service Part

The service part defines the root of the model. Therefore, the first rule is called Model. The model has a name attribute holding the name of the package the model belongs to. The grammar then allows ecore models to be specified with the use keyword. In general, these should be ecore-based meta-models comprising a type system of some sort. They are used to introduce the typing of DSLs in which the payload is formulated. However, the mechanism behind them might not be perfect and will be subject to change.

The language also allows importing record entities with the import statement. Normally, however, they are imported automatically if they are in the same project or in an associated project.

Model: 'package' name = QualifiedName
       (packages += Package)*
       (imports += Import)*
       (records += RecordType)*
;

Import: 'import' importedNamespace = QualifiedNameWithWildcard ;

Package: 'use' name=ID package=[ecore::EPackage|STRING] ;

In the future, the use and import infrastructure must be able to import the type systems of other languages and reflect their code generation principles in order to realize arbitrary payloads for any language.

Record Declaration

The record declaration must fulfill different tasks. At present, it must be able to specify record properties for the monitoring and default values for these properties. Depending on the existing source code of record structures in Kieker, it might be necessary to augment these basic declarations. First, there might be automatic values in the record structure, which the language must support. Second, some properties might be mandatory while others are optional. Optional properties then require a default value for the serialization. At present, this is realized through default constants.

TemplateType:
	'template' name=ID (':' inherits+=[TemplateType|QualifiedName] (',' inherits+=[TemplateType|QualifiedName])*)? 
	(
		('{' (properties+=Property | constants+=Constant)* '}') |
		(properties+=Property)
	)?
;

RecordType:
	(abstract?='abstract')? ('struct') name=ID 
	('extends' parent=[RecordType|QualifiedName])?
	(':' inherits+=[TemplateType|QualifiedName])?
	('{'
		(properties+=Property | constants+=Constant)*
	'}')?
;

In the grammar, each record is represented by a RecordType instance. The language uses the keyword struct, because the word record is already part of the Kieker package names and would clash with the keyword. A RecordType can have a parent RecordType, which it extends with additional properties. Previously defined properties and default constants must not be overwritten. The inheritance pattern is classic sub-typing, so all properties of the parent are visible on the child level. The body of a RecordType consists of property and default declarations. At present, there is also a combined form that declares a default together with a property.
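The sub-typing rule can be illustrated with a small Java sketch. It is not taken from the IRL code base: RecordDecl is a hypothetical in-memory stand-in for a RecordType, and allProperties is an assumed helper name. The sketch only shows how inherited properties become visible on the child level and how a redefinition could be rejected.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a RecordType; not the EMF class generated by Xtext.
final class RecordDecl {
    final String name;
    final RecordDecl parent;          // null for a record without 'extends'
    final List<String> properties;    // names of the properties declared directly in this record

    RecordDecl(String name, RecordDecl parent, List<String> properties) {
        this.name = name;
        this.parent = parent;
        this.properties = properties;
    }

    // Returns inherited properties first, then the record's own, and rejects redefinitions.
    List<String> allProperties() {
        List<String> result = new ArrayList<>();
        if (parent != null) {
            result.addAll(parent.allProperties());
        }
        Set<String> seen = new HashSet<>(result);
        for (String property : properties) {
            if (!seen.add(property)) {
                throw new IllegalStateException(
                    "Property '" + property + "' is already defined for " + name);
            }
            result.add(property);
        }
        return result;
    }
}
```

A child record with the property operationSignature that extends a parent with the property timestamp would yield both names, parent first.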

Default:
	'default' type=Classifier name=ID '=' value=Literal
;

A normal default constant starts with the keyword default followed by a type. In general, all ecore types are allowed. However, the literals are limited to primitive types. Therefore, at present only primitive classifiers are useful. The language defines a set of primitive types, which can be extended programmatically by modifying the enumeration de.cau.cs.se.instrumentation.rl.typing.PrimitiveTypes.

Properties are defined in a similar way. However, they do not have an initial keyword. They are defined by a type followed by a name.

Property: type=Classifier name=ID
             ('{' (properties+=ReferenceProperty)* '}' | 
              '=' value=Literal |
              '=' const=Constant)
;

Constant: name=ID value=Literal ;

After the name, one of three things can follow. The first option addresses payloads. The second option specifies a literal, which can be a value of some kind or a previously defined default constant. The third option defines a default constant and automatically assigns it to the property, which is equivalent to a default constant declaration combined with a property that uses the constant as its literal.

The type is declared by a reference to an ecore classifier, which can be one of the built-in types or one of the structures from a package imported above.

Classifier:
	(package=[Package] '.')? class=[ecore::EClassifier|ID]
;

To specify a payload, the language also uses a property. In that case, the classifier is not a primitive type but a complex type defined in an imported model. As complex types may imply recursive structures, which could result in referencing the whole runtime state, each required complex value must be stated explicitly. For example, if a property has the type of a Java class, all properties of that class that have a primitive type are serialized and transmitted, whereas every property referencing a complex type is represented only by an object id. If the content of such a complex type is also required, it must be specified as a ReferenceProperty between the two braces.

ReferenceProperty:
	ref=[ecore::EStructuralFeature|ID] ('{' (properties+=ReferenceProperty)* '}')?
;

The ReferenceProperty allows recursing down into data structures with nested ReferenceProperty declarations. While this looks complicated for deeply nested structures, it ensures that only a minimum of data is retrieved from the system's runtime state and that the data size is limited.
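The effect of explicit references on the transmitted data can be sketched in Java. This is an illustration under simplifying assumptions, not the generated serialization code: objects are modelled as maps from feature names to values, a nested map stands for a complex type, and the names PayloadSketch, Selection and follow are hypothetical. The object id is approximated with System.identityHashCode.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PayloadSketch {
    // Names the complex-typed features to follow, mirroring nested ReferenceProperty blocks.
    static final class Selection {
        final Map<String, Selection> followed = new LinkedHashMap<>();

        Selection follow(String feature, Selection nested) {
            followed.put(feature, nested);
            return this;
        }
    }

    // Serializes primitives always, complex values only when explicitly selected.
    static Map<String, Object> serialize(Map<String, Object> object, Selection selection) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> feature : object.entrySet()) {
            Object value = feature.getValue();
            if (!(value instanceof Map)) {
                out.put(feature.getKey(), value);          // primitive value
            } else if (selection.followed.containsKey(feature.getKey())) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) value;
                out.put(feature.getKey(),
                        serialize(nested, selection.followed.get(feature.getKey())));
            } else {
                // Not selected: transmit an id instead of the content (assumed id scheme).
                out.put(feature.getKey(), "id:" + System.identityHashCode(value));
            }
        }
        return out;
    }
}
```

With an empty Selection, an order object referencing an owner is serialized with the owner replaced by an id. Only new Selection().follow("owner", new Selection()) includes the owner's primitive values, and the owner's own references are again reduced to ids.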

Literals and Terminals

The language supports literals for all supported primitive types and references to default constants.

Literal:
	StringLiteral | IntLiteral | FloatLiteral | BooleanLiteral | DefaultLiteral
;

StringLiteral:
	value=STRING
;

IntLiteral:
	value=INTEGER
;

FloatLiteral:
	value=FLOAT
;

BooleanLiteral: 
	{BooleanLiteral} ((value?='true')|'false')
;

DefaultLiteral:
	value=[Default|ID]
;

To model qualified names and express imports, standard qualified name rules have been added. The literals require signed floating point and integer values, therefore, two suitable terminals complete the language.

QualifiedName:
  ID (=>'.' ID)*;

QualifiedNameWithWildcard:
	QualifiedName ('.' '*')?
;

// terminals
terminal INTEGER returns ecore::EInt: ('+'|'-')? INT;

terminal FLOAT returns ecore::EFloat: ('+'|'-')? (INT '.' INT? | '.' INT);
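
To make the accepted lexical forms of the two terminals concrete, the following Java sketch expresses them as regular expressions. It assumes that INT from org.eclipse.xtext.common.Terminals is a non-empty sequence of decimal digits. The class and method names are hypothetical and not part of the IRL.

```java
import java.util.regex.Pattern;

final class TerminalSketch {
    // Mirrors ('+'|'-')? INT, with INT assumed to be one or more decimal digits.
    static final Pattern INTEGER = Pattern.compile("[+-]?[0-9]+");

    // Mirrors ('+'|'-')? (INT '.' INT? | '.' INT): the dot is mandatory,
    // and digits are required on at least one side of it.
    static final Pattern FLOAT = Pattern.compile("[+-]?([0-9]+\\.[0-9]*|\\.[0-9]+)");

    static boolean isInteger(String text) {
        return INTEGER.matcher(text).matches();
    }

    static boolean isFloat(String text) {
        return FLOAT.matcher(text).matches();
    }
}
```

Under these assumptions, -42 is an INTEGER, and +3. and .5 are FLOAT values, whereas 5 and a lone dot are not. The FLOAT terminal has no exponent notation.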

Code Generation

At present, the code generator for the IRL can produce Java and C files representing the monitoring records, and it provides some serialization functionality for C-based records.

Every code generator must provide a type mapping routine called createTypeName, which converts a classifier of a primitive type into a primitive type of the target language. In addition, all generators must be able to unfold the complete set of properties that belong to a monitoring record. This functionality is realized in the RecordLangGenericGenerator class with the compileProperties method.
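A type mapping of this kind could look roughly like the following Java sketch. The real generators are written in Xtend and operate on ecore classifiers, so this is only an illustration: the primitive type names, the chosen C types, and the string-based signature are assumptions, and only the method name createTypeName is taken from the text above.

```java
import java.util.Map;

final class TypeNameSketch {
    // Assumed IRL primitive names; the actual set is defined in the PrimitiveTypes enumeration.
    private static final Map<String, String> JAVA = Map.of(
        "int", "int", "long", "long", "float", "float",
        "double", "double", "boolean", "boolean", "string", "String");

    // Assumed C mapping using fixed-width types from stdint.h and bool from stdbool.h.
    private static final Map<String, String> C = Map.of(
        "int", "int32_t", "long", "int64_t", "float", "float",
        "double", "double", "boolean", "bool", "string", "const char*");

    // Converts a primitive type name into the type name of the target language.
    static String createTypeName(String primitive, String targetLanguage) {
        Map<String, String> mapping;
        switch (targetLanguage) {
            case "java": mapping = JAVA; break;
            case "c": mapping = C; break;
            default: throw new IllegalArgumentException("No generator for " + targetLanguage);
        }
        String name = mapping.get(primitive);
        if (name == null) {
            throw new IllegalArgumentException("Unknown primitive type " + primitive);
        }
        return name;
    }
}
```

Keeping one table per target language means a new generator only has to supply its own table, which matches the idea that generators share the model navigation and differ in the type names they emit.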

The detailed structure of the monitoring record that the generators have to produce is partially documented for Java in the source code of IMonitoringRecord. For the two other example languages, C and Perl, the generators follow the records used in the KDB test client written in C and in the eprints instrumentation written in Perl by Nis Wechselberg.

Summary

This post introduced the instrumentation record language for the Kieker monitoring framework and the Kieker Data Bridge. At present, it is a simple language with sub-typing to build monitoring record structures in three programming languages (C, Java, Perl). It is the first prototype of a generic record language for Kieker, intended for use in future releases to ease the instrumentation of applications written in different languages. Furthermore, it is a building block in the effort to develop model-driven instrumentation.

The whole language is based on Xtext and on the typing mechanism explained in a previous post, which will get its typing rules from XSemantics in the near future. The code generation is based on Xtend, and thanks to a generator API it is easy to write generators for new languages.

At the moment, the code can be found at:

git clone https://github.com/research-iobserve/instrumentation-language.git

The repository also contains the previous record language in de.cau.cs.se.instrumentation.language, which is deprecated, and an instrumentation application language in de.cau.cs.se.instrumentation.al, which is one of the next steps toward model-driven monitoring. The tree also contains an old version of the Kieker Data Bridge, which has since been moved to the main Kieker repository.

Reference Representation in Joined Target Models

The previous post discussed three different scenarios for reference representation in target models generated from one base model and one aspect model. The first scenario addresses the weaving of target aspect model fragments into a target base model, the result being called a joined target model.

In this article, the prerequisites for creation of a joined target model and the weaving process are explained. The goal is to define APIs and artifact classes required to implement this approach, which can then be tested subsequently.

The article first provides a scenario description with the involved models and transformations. Second, target meta-model characteristics are discussed. Third, the weaving approach is introduced. The weaving process induces some requirements, especially the determination of the insertion point, which are explained in the fourth section. Finally, a summary collects all information relevant to developing this weaving approach.

Scenario

The scenario involves one source base model (SBM) and one source aspect model (SAM) with different meta-models, called source base meta-model (SBMM) and source aspect meta-model (SAMM). The SBM and SAM are transformed into target base models (TBM) and target aspect models (TAM), respectively, by two code generators TBM: SBMM -> TMM* and TAM: SAMM -> TMMfragments*. The target models are then combined by a weaving process. To be able to weave the aspect model, the generator TAM must return weavable TM fragments forming subgraphs of a valid target model. The generator TBM must produce self-contained target models.

Target Meta-Model Characteristics

Target meta-models can be quite diverse in the way they represent language features and in the features the languages require. First, languages may model behavior with imperative statement sequences, such as C or Java, or use functions, like Lisp or Clojure. Second, sequences can be represented by arrays, lists, or trees.

Together with control structures, statement sequences form directed graphs with one entry and many exit points. To add code to such a statement structure, the beginning and end of the sequence or graph must be determined. Sometimes the end of a sequence or a branch of the graph is marked by a return statement, which must then be rewritten to add code after the value of the return statement is computed. The detection of entry and exit points depends heavily on the way these graphs are realized. Figure 1 shows three typical representations of statement sequences as they occur in Xtext-based languages.

Figure 1: Different realizations of statement sequences in meta-models, especially in the context of Xtext, which often results in list or tree structures.

In functional programming languages, the abstract syntax tree (AST) is much simpler. The insertion is done by adding an aspect function that takes the annotated function as its parameter. This results in one additional node in the AST for the aspect function, with the annotated function as its only child. Overall, functional languages form trees similar to the tree in Figure 1.

In order to preserve the behavior of the base model, the aspect function must be able to handle the return value of the annotated function and return the value unchanged to the outer function.
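The wrapping can be sketched with Java's functional interfaces, which stand in for a functional target language here. The names AspectFunctionSketch and withAspect are hypothetical. The sketch shows an aspect function that takes the annotated function as a parameter, observes its result, and hands the value back unchanged.

```java
import java.util.function.Consumer;
import java.util.function.Function;

final class AspectFunctionSketch {
    // Wraps the annotated function; the aspect sees the result but cannot alter it.
    static <A, R> Function<A, R> withAspect(Function<A, R> annotated, Consumer<R> aspect) {
        return argument -> {
            R result = annotated.apply(argument);   // behavior of the base model
            aspect.accept(result);                  // aspect code, e.g. a monitoring probe
            return result;                          // value passed on unchanged
        };
    }
}
```

Callers of the wrapped function receive the same value as before, so the outer function is unaffected by the inserted node.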

Weaving of Target Models

The weaving process incorporates target aspect model fragments into the target base model. Model weaving has been addressed in different publications. It requires a way to identify subgraphs in the target base model that reflect the insertion point, and a way to alter or move nodes to integrate the aspect model fragment. After the identification, the alteration and movement must be controlled by a composition specification. The inserted model fragments must be structurally compatible with the insertion point; otherwise, the resulting target model would no longer conform to its meta-model.

The specification to drive the aspect weaving is called a weaving template, which comprises a target base model pattern to identify insertion points, a composition specification, and the relation to a source base model reference.

The target base model pattern depends solely on the target meta-model. In imperative languages or languages with imperative features, it might be designed to identify procedures or methods and subsequently find the beginning and end of the statement graphs in the methods' bodies.

The composition specification describes an insertion mechanism for target aspect model fragments. It specifies the type and configuration of the root node of such fragments. Furthermore, it rewrites the target base model graph and inserts the aspect into the base model.

In imperative languages, statement graphs may end with a return statement, which itself may contain a complex expression. To ensure that the aspect is executed after these expressions, the composition specification must rewrite the return statement by introducing a variable carrying the result of the return expression, followed by a call of the aspect model, and finally a return statement with that variable.

When inserting model fragments, it is important that identifier names differ between the base and the aspect models. Under the assumption that the aspect code generator is written independently of the base code generator, neither generator can guarantee distinct names without further measures. In languages providing package support, aspects can be encapsulated in packages that differ from the base model packages, similar to Java frameworks, which have their own namespaces. While this suffices for aspect model fragments, it does not for the rewriting of the return statement: the introduced variable can always clash with generated or user-defined variables in the target base model.

A solution to this issue can be a special method in Java, which has one parameter for the return statement expression and is inserted as outer function in the return statement. For example, return value*4; is transformed to return Aspect.specialFunctionName(value*4);
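A minimal Java sketch of this rewrite follows. The name Aspect.specialFunctionName is taken from the example above, while the generic signature, the probeCalls counter standing in for the aspect code, and the Base class are assumptions made for illustration.

```java
final class Aspect {
    static int probeCalls = 0;   // stands in for real aspect code such as a monitoring probe

    // Runs after the return expression has been evaluated and passes the value through.
    static <T> T specialFunctionName(T result) {
        probeCalls++;
        return result;
    }
}

final class Base {
    // Before weaving, the body was: return value * 4;
    static int compute(int value) {
        return Aspect.specialFunctionName(value * 4);
    }
}
```

No additional local variable is introduced in the base method, so no name clash with generated or user-defined variables can occur. The only shared name is the method in the aspect's own namespace.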

Other languages, like C, might not provide namespaces or packages, which hinders the realization of the weaving in the proposed way. However, developers require such a distinction mechanism, which they then emulate with naming rules. Based on these rules, code models can be created with the above approach.

Determination of the Insertion Point

The insertion point of the target aspect model fragment into the target base model must be determined prior to the weaving. The insertion point is determined by the reference’s destination of the join point specified in the source aspect model, either by a direct reference or a pointcut, or in a separate aspect join point model.

The references on the source model level between aspect and base model must be represented on the target model level. For the reference source, this is a trivial issue, as the code generator for the aspect model realizes the creation of the target model nodes representing the source model node owning the reference source. The reference destination, however, cannot be computed by the aspect code generator, as the representation of the source base model node in the target model is determined by the base code generator. Therefore, it is unknown to the aspect code generator and might only be determinable after the base code generator completed its work.

A solution to this problem must therefore resolve the reference's destination in the target model and subsequently create or fix the realization of the reference. This can be realized by collecting node creation traces during the base model code generation. The traces express which target model node was created by interpreting which source model node.
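Collecting such traces could be sketched in Java as follows. TraceLog and its methods are hypothetical names, not an existing API, and the sketch assumes that every target model node is created from exactly one source model node.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// S is the type of source model nodes, T the type of target model nodes.
final class TraceLog<S, T> {
    private final Map<S, List<T>> targetsBySource = new HashMap<>();
    private final Map<T, S> sourceByTarget = new HashMap<>();

    // Called by the base code generator whenever it creates a target node from a source node.
    void record(S source, T target) {
        targetsBySource.computeIfAbsent(source, key -> new ArrayList<>()).add(target);
        sourceByTarget.put(target, source);
    }

    // All target nodes created from the given source node, in creation order.
    List<T> targetsOf(S source) {
        return targetsBySource.getOrDefault(source, Collections.emptyList());
    }

    // The source node a target node was created from, or null if it was never recorded.
    S sourceOf(T target) {
        return sourceByTarget.get(target);
    }
}
```

After the base generator has finished, the weaver can look up the reference's destination with targetsOf and obtains the candidate target nodes for the insertion point.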

As a result, traces can occur from one target model node to different source model nodes and from multiple target model nodes to one source model node. However, in code generation contexts for DSLs, the source model is more concise than the target model. Therefore, it can be assumed that one target model node has only one source model node, whereas one source model node might have multiple target model nodes (see Fig. 2).

Figure 2: References expressing the creation relationship between source and target model node.

Therefore, a selection of the correct target model node based only on the source model node is not possible without additional information. The information can be derived from two sources: the target model structure and specialized selection patterns, which are similar to the patterns used to find join points in other AOM approaches [Kienzle 2009, Whittle 2009].

First, all target model nodes having a common source model node form a collection of nodes belonging to a self-contained model, which is a directed graph according to the containment relation. In particular, models based on DSLs resemble a tree-like structure similar to an AST. The goal is to identify the root nodes of the subgraphs created from the source model node. This can be done by calculating the distance between each node and the global root node. While tracing node distances, the algorithm can come across another node of the collection, which is then considered a parent node; the child node is then not relevant for defining the top node for the weaving template.
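The selection of the top nodes can be sketched in Java. Node and RootSelection are hypothetical classes, not part of any existing weaver, and the containment graph is assumed to be a tree with parent links. Instead of storing distances, the sketch walks from each node toward the global root and drops the node as soon as another member of the collection is met on the way.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class Node {
    final String label;
    final Node container;   // containment parent; null for the global root

    Node(String label, Node container) {
        this.label = label;
        this.container = container;
    }
}

final class RootSelection {
    // Keeps only the nodes that have no ancestor inside the same collection.
    static List<Node> topNodes(List<Node> createdFromSameSource) {
        Set<Node> collection = new HashSet<>(createdFromSameSource);
        List<Node> tops = new ArrayList<>();
        for (Node node : createdFromSameSource) {
            boolean hasAncestorInCollection = false;
            // Walk toward the global root, as when measuring the node's distance to it.
            for (Node up = node.container; up != null; up = up.container) {
                if (collection.contains(up)) {
                    hasAncestorInCollection = true;
                    break;
                }
            }
            if (!hasAncestorInCollection) {
                tops.add(node);
            }
        }
        return tops;
    }
}
```

For a collection containing a method, a statement inside that method, and a field of the same compilation unit, only the method and the field remain as top nodes.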

The weaving template comprises a subgraph pattern that matches the found target base model nodes and describes the transformation of the base model and the insertion of the aspect model. For example, in an imperative language, the referenced root node is a method definition and the insertion point is in the enclosed statement sequence.

Second, the root node might not be the right insertion point, and the deviation cannot be described in the weaving template. Then a specialized selection pattern, including model queries, can be used. This is also required to limit the number of found root nodes if, for instance, a large set of methods is found in the target model of which most are not relevant in the context of the creation. The selection pattern, therefore, depends not only on the target meta-model but also on the base model code generator.

Summary

In this article, I have shown how, in general, a target aspect model can be woven into a target base model when both originate from different source models, and what the obstacles are when joined target models are generated.

The important building blocks of the approach are traces representing the origin of target model nodes, insertion point detection patterns, and a composition specification governing the weaving process.

The next step for this weaving scenario is its evaluation in a simple example, which requires an aspect language and a base language, suitable generators, and example models for the involved languages. The use case for the evaluation is a smartphone app development project where the app code must include monitoring probes. A simple language for smartphone apps (LAD) and an instrumentation language from the Kieker project will be used as source languages. The target language will be Java. Java already has an aspect language called AspectJ, so weaving Java code with my own method may look like a pointless endeavor. However, using Java has two advantages.

First, a meta-model for Java is already available, which reduces the effort to build the evaluation scenario. Second, the other two model reference representation methods mentioned in the previous article can then be evaluated with the same source models, which increases the comparability of the different approaches.