Modernizing Kieker Record (De-)Serialization

Kieker is a monitoring and analysis framework implemented primarily in Java, but it also supports other programming languages on the monitoring side, such as C, Perl, C#, and VB. The Java monitoring side can use an extensive set of methods to store measurements in files, queues, or databases. For non-JVM languages, the Kieker Data Bridge (KDB) provides access to these methods via a set of connectors for different programming languages and transfer protocols. The KDB provides connectors that accept binary or textual measurement records via TCP or JMS, interpret them, and compile them into Kieker IMonitoringRecords. Kieker analysis reader components use a similar approach to deserialize measurements from a data store or message queue. While the two tools realize similar behavior, they are implemented quite differently. This duplication of code and features is undesirable and should be resolved by a common infrastructure. This post discusses the present state in Kieker and sketches different solutions to improve the framework.

Present Kieker Reader Implementation

The present reader implementations differ greatly in the way they receive records. The JMSReader accepts only ObjectMessages containing IMonitoringRecords and discards JMS TextMessages, and the PipeReader handles in-memory IMonitoringRecords. Only the DbReader and the FSReaders rely on functionality provided by AbstractMonitoringRecord to deserialize records from files or the database.

AbstractMonitoringRecord provides primarily two static methods, which are actually factory methods, to produce an IMonitoringRecord from a given input. Due to the age of Kieker, these methods support two different construction models. The first method is createFromArray, which requires a class object representing the type of the record and an array of objects used to initialize the record. The second method is createFromStringArray, which requires a class object for the record type and a String array containing String serializations of all values.
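The string-array model boils down to parsing each String according to the record's declared value types. The following is a minimal sketch of that conversion step, assuming a TYPES-like Class<?>[] array is available; StringValueConverter is a hypothetical name, not Kieker code, and only a handful of value types are handled.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the conversion behind the string-array model:
 * every String is parsed according to the matching entry of a TYPES-like array.
 */
final class StringValueConverter {

    private StringValueConverter() {
    }

    static Object[] convert(final Class<?>[] types, final String[] values) {
        if (types.length != values.length) {
            throw new IllegalArgumentException("Expected " + types.length + " values, got " + values.length);
        }
        final Object[] result = new Object[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = convertValue(types[i], values[i]);
        }
        return result;
    }

    private static Object convertValue(final Class<?> type, final String value) {
        if (type == String.class) {
            return value;
        } else if (type == int.class || type == Integer.class) {
            return Integer.valueOf(value);
        } else if (type == long.class || type == Long.class) {
            return Long.valueOf(value);
        } else if (type == double.class || type == Double.class) {
            return Double.valueOf(value);
        } else if (type == boolean.class || type == Boolean.class) {
            return Boolean.valueOf(value);
        }
        throw new IllegalArgumentException("Unsupported value type " + type.getName());
    }

    public static void main(final String[] args) {
        final Class<?>[] types = { String.class, long.class, int.class };
        final Object[] values = convert(types, new String[] { "op()", "1234", "7" });
        System.out.println(Arrays.toString(values)); // prints [op(), 1234, 7]
    }
}
```

The resulting Object[] could then be handed to the same construction path that createFromArray uses.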

When the new IMonitoringRecord.Factory API is used, both methods call getConstructor for each deserialization attempt, and they check for the presence of this interface on every call. This is quite expensive. I wrote a small program to determine the impact of different operations in Java. The program is far from perfect and the results are not calibrated. However, some operations are so expensive that the results are still valuable.

gitlab@build.se.informatik.uni-kiel.de:rju/reflectioncost.git

Presently, the program tests four cases based on the Java reflection API. First, it measures the cost of querying a class object for a constructor and invoking it. Second, it measures only the cost of invoking the constructor. Third, it measures the cost of Class.isAssignableFrom, which Kieker uses to check whether a class implements a given interface. And fourth, it determines the cost of changing the access rights of a field. The test was executed on a quad-core Xeon machine, but the absolute system performance is not important for understanding the results, as only the relative costs matter.

#   Case                           Calls per microsecond
1.  find and call constructor                    2.7543
2.  call constructor                            27.0715
3.  class has interface                        326.8133
4.  access privileged info                       1.0281

This test clearly shows that querying for a constructor with getConstructor is very expensive: calling an already resolved constructor (case 2) is almost 10 times faster than querying for it and calling it (case 1).

public static final IMonitoringRecord createFromArray(final Class<? extends IMonitoringRecord> clazz, final Object[] values)
		throws MonitoringRecordException {
	try {
		if (IMonitoringRecord.Factory.class.isAssignableFrom(clazz)) {
			// Factory interface present: the constructor is queried again on every call
			final Constructor<? extends IMonitoringRecord> constructor = clazz.getConstructor(Object[].class);
			return constructor.newInstance((Object) values);
		} else {
			// try ordinary method: default constructor followed by initFromArray
			final IMonitoringRecord record = clazz.newInstance();
			record.initFromArray(values);
			return record;
		}
	[...]
}

As the above code snippet shows, the present createFromArray call is expensive, because it checks for the Factory interface and queries the constructor on every call. Florian and Jan have already worked on a caching feature to reduce the constructor queries.
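The difference between the first two measured cases, and hence the benefit of such a cache, can be reproduced roughly with a sketch like the following. DemoRecord and ConstructorLookupCost are hypothetical stand-ins, not code from Kieker or the ReflectionCost repository, and the loops have no warm-up phase, so the printed numbers are indicative at best.

```java
import java.lang.reflect.Constructor;

/** Hypothetical stand-in for a monitoring record with an Object[] constructor. */
final class DemoRecord {
    final Object[] values;

    public DemoRecord(final Object[] values) {
        this.values = values;
    }
}

/** Case 1 looks up the constructor on every call, case 2 reuses a constructor resolved once. */
final class ConstructorLookupCost {

    static DemoRecord lookupAndCall(final Object[] values) throws ReflectiveOperationException {
        final Constructor<DemoRecord> constructor = DemoRecord.class.getConstructor(Object[].class);
        return constructor.newInstance((Object) values); // cast: pass the array as ONE argument
    }

    static DemoRecord callCached(final Constructor<DemoRecord> constructor, final Object[] values)
            throws ReflectiveOperationException {
        return constructor.newInstance((Object) values);
    }

    static double callsPerMicrosecond(final int calls, final long nanos) {
        return calls / (nanos / 1000.0);
    }

    public static void main(final String[] args) throws ReflectiveOperationException {
        final int calls = 1_000_000;
        final Object[] values = { "op()", 42L };

        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            lookupAndCall(values);
        }
        final long lookupNanos = System.nanoTime() - start;

        final Constructor<DemoRecord> cached = DemoRecord.class.getConstructor(Object[].class);
        start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            callCached(cached, values);
        }
        final long cachedNanos = System.nanoTime() - start;

        System.out.printf("find and call constructor: %.4f calls/us%n", callsPerMicrosecond(calls, lookupNanos));
        System.out.printf("call constructor:          %.4f calls/us%n", callsPerMicrosecond(calls, cachedNanos));
    }
}
```

The (Object) cast matters: Constructor.newInstance takes varargs, so passing a bare Object[] would spread its elements into separate constructor arguments.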

The other very expensive call is java.security.AccessController.doPrivileged, which is used in case 4. Kieker uses it to gain access to the static constant array property TYPES of a monitoring record, which lists the class types of all logged data fields.

This call is not only expensive but also not good practice for Java programs. If the property must be publicly accessible, for whatever reason, it is better to declare it public than to make it accessible at runtime. A getter for the TYPES array cannot be used with the present API, as the getter can only be called after the object is constructed, while the code in AbstractMonitoringRecord.createFromStringArray requires the TYPES array before construction. The getter getValueTypes cannot be defined as static, because it is part of the IMonitoringRecord interface. Therefore, the only direct solution would be a publicly accessible TYPES array. However, other solutions based on annotations, which have been available since Java 5, are possible.
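One annotation-based alternative might look like the sketch below: the value types are attached to the record class as a runtime-retained annotation, so they can be read via Class.getAnnotation before any instance exists and without changing access rights. RecordValueTypes, AnnotatedOperationRecord, and RecordTypeLookup are hypothetical names, not part of Kieker's API as described here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

/** Hypothetical annotation declaring the value types of a record class. */
@Retention(RetentionPolicy.RUNTIME) // must be retained at runtime to be readable via reflection
@Target(ElementType.TYPE)
@interface RecordValueTypes {
    Class<?>[] value();
}

/** Hypothetical record class: the types live in the annotation, not in a private static field. */
@RecordValueTypes({ String.class, long.class, long.class })
final class AnnotatedOperationRecord {
    // fields, constructor and serialization omitted in this sketch
}

final class RecordTypeLookup {

    private RecordTypeLookup() {
    }

    /** Reads the value types without an instance and without setAccessible or doPrivileged. */
    static Class<?>[] typesOf(final Class<?> recordClass) {
        final RecordValueTypes annotation = recordClass.getAnnotation(RecordValueTypes.class);
        if (annotation == null) {
            throw new IllegalArgumentException(recordClass.getName() + " has no @RecordValueTypes annotation");
        }
        return annotation.value();
    }

    public static void main(final String[] args) {
        System.out.println(Arrays.toString(typesOf(AnnotatedOperationRecord.class)));
    }
}
```

Whether reading the annotation is cheaper than the privileged field access would have to be measured; it does, however, avoid changing access rights at runtime.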

Kieker Data Bridge Deserialization

The Kieker Data Bridge (KDB) provides connectivity to the Kieker infrastructure for non-JVM languages. Its execution can be divided into three phases: initialization, record processing, and termination.

The initialization phase processes command line parameters, which steer the configuration. In the context of deserialization, the important action is loading the jars that provide IMonitoringRecord types. The KDB registers all jars at its class loader and subsequently reads a mapping file for IMonitoringRecord types. The mapping file contains one entry per record type in the form of a positive integer id, followed by an equals sign (=) and the fully qualified class name of the record type. This style is similar to the mapping file for FSDirectoryReaders, which prefixes lines with a $. For each entry of the mapping file, the class loader is asked for the class representation, and the result is added to a record map. The record map is provided to the KDB connector, which compiles a lookup map. This map contains a reference to the constructor and the types array of each record, with the record id as the key of each entry. After that, the connector establishes its service.
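The initialization steps can be sketched as follows, assuming the id=ClassName format described above, records with an Object[] constructor, and a static TYPES field. All class names are hypothetical, and this is not the KDB source; the point is that the reflective calls, including setAccessible, happen once per record type rather than once per record.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in record: an Object[] constructor and a private static TYPES array. */
final class MappedDemoRecord {
    private static final Class<?>[] TYPES = { String.class, long.class };
    final Object[] values;

    public MappedDemoRecord(final Object[] values) {
        this.values = values;
    }
}

/** One entry of the lookup map: what a connector needs to decode a record with a given id. */
final class RecordLookupEntry {
    final Constructor<?> constructor;
    final Class<?>[] types;

    RecordLookupEntry(final Constructor<?> constructor, final Class<?>[] types) {
        this.constructor = constructor;
        this.types = types;
    }
}

final class RecordMappingLoader {

    private RecordMappingLoader() {
    }

    /** Parses "id=fully.qualified.ClassName" lines and resolves all reflective data once, at startup. */
    static Map<Integer, RecordLookupEntry> load(final Reader mapping, final ClassLoader loader)
            throws IOException, ReflectiveOperationException {
        final Map<Integer, RecordLookupEntry> lookup = new HashMap<Integer, RecordLookupEntry>();
        final BufferedReader reader = new BufferedReader(mapping);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // tolerate blank lines
            }
            final int separator = line.indexOf('=');
            if (separator <= 0) {
                throw new IOException("Malformed mapping entry: " + line);
            }
            final int id = Integer.parseInt(line.substring(0, separator).trim());
            if (id <= 0) {
                throw new IOException("Record ids must be positive: " + line);
            }
            final Class<?> clazz = loader.loadClass(line.substring(separator + 1).trim());
            // The costly calls happen here, once per record type, not once per record.
            final Field typesField = clazz.getDeclaredField("TYPES");
            typesField.setAccessible(true);
            final Class<?>[] types = (Class<?>[]) typesField.get(null);
            lookup.put(id, new RecordLookupEntry(clazz.getConstructor(Object[].class), types));
        }
        return lookup;
    }
}
```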

The lookup map compilation uses the same routines as AbstractMonitoringRecord to determine the TYPES array property and the constructor, but it calls these costly functions in the initialization phase and not during record deserialization. In the upcoming Kieker 1.8 release, this has been fixed for the TYPES array through an internal cache in typesForClass, which is used in createFromArray but not in createFromStringArray. Furthermore, the constructor lookup is also expensive and should be cached as well in the future.

The KDB deserializes records in dedicated routines, which fetch one record from the input stream or the message they process. They first read an integer from the stream or message to find the right constructor and types array. Each property is then read from the stream or message, steered by the property types array. The selection of the correct read operation is realized through a large if-then-else construct. Unlike the processBinaryInputFile routine of the FSDirectoryReader, primitive types and wrapper classes are handled in separate cases, and type equality is tested with the equals method rather than the == operator.
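A record-processing routine of this kind can be sketched as below. The wire format (a big-endian int id followed by the values, with strings encoded via DataOutput.writeUTF) and the TypeSteeredDecoder name are assumptions for illustration, not the actual KDB protocol or code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical decoder: reads one record whose layout is steered by a types array. */
final class TypeSteeredDecoder {

    private final Map<Integer, Class<?>[]> typesById;

    TypeSteeredDecoder(final Map<Integer, Class<?>[]> typesById) {
        this.typesById = typesById;
    }

    /** Reads the record id first, then one value per entry of the matching types array. */
    Object[] readRecord(final DataInputStream in) throws IOException {
        final int id = in.readInt();
        final Class<?>[] types = typesById.get(id);
        if (types == null) {
            throw new IOException("Unknown record id " + id);
        }
        final Object[] values = new Object[types.length];
        for (int i = 0; i < types.length; i++) {
            values[i] = readValue(types[i], in);
        }
        return values; // a connector would now pass these values to the cached constructor
    }

    private static Object readValue(final Class<?> type, final DataInputStream in) throws IOException {
        if (type == int.class || type == Integer.class) {
            return in.readInt();
        } else if (type == long.class || type == Long.class) {
            return in.readLong();
        } else if (type == double.class || type == Double.class) {
            return in.readDouble();
        } else if (type == boolean.class || type == Boolean.class) {
            return in.readBoolean();
        } else if (type == String.class) {
            return in.readUTF(); // assumed string encoding
        }
        throw new IOException("Unsupported value type " + type.getName());
    }

    public static void main(final String[] args) throws IOException {
        final Map<Integer, Class<?>[]> types = new HashMap<Integer, Class<?>[]>();
        types.put(1, new Class<?>[] { String.class, long.class });

        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        final DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(1);      // record id
        out.writeUTF("op()"); // first value
        out.writeLong(1234L); // second value
        out.flush();

        final DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(Arrays.toString(new TypeSteeredDecoder(types).readRecord(in))); // prints [op(), 1234]
    }
}
```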

Preliminary measurements with the measure.lang.CompareOperatorOrMethodMain program located in the ReflectionCost repository showed that type comparison with the == operator is 1.42% to 2.02% slower than with the equals method. The measure.lang.IfThenElseMain program compared the different realizations of the if-then-else type selection. While the choice between operator and method plays only a minor role in these realizations, the structure of the construct makes a big difference. The file system reader uses a condensed implementation, where two types are checked in every if, which reduces the number of if-then-else branches by half. Yet the file system reader code realized only 1.4 to 1.5 operations per microsecond, whereas the KDB variant handled 1.6 to 1.68 operations per microsecond. This suggests that an or-operation (||) in Java is considerably more expensive than an additional jump.
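To make the two layouts concrete, here they are reduced to a hypothetical size lookup; neither method is code from the KDB or the FSDirectoryReader. Using == is safe for Class objects, as each class is represented by a single Class instance per defining class loader.

```java
/** The two if-then-else layouts compared above, reduced to a hypothetical size lookup. */
final class TypeDispatchStyles {

    private TypeDispatchStyles() {
    }

    /** KDB style: primitive and wrapper in separate branches, compared with equals(). */
    static int sizeSeparate(final Class<?> type) {
        if (int.class.equals(type)) {
            return 4;
        } else if (Integer.class.equals(type)) {
            return 4;
        } else if (long.class.equals(type)) {
            return 8;
        } else if (Long.class.equals(type)) {
            return 8;
        } else if (boolean.class.equals(type)) {
            return 1;
        } else if (Boolean.class.equals(type)) {
            return 1;
        }
        return -1; // unsupported type
    }

    /** File system reader style: two types per branch joined with ||, compared with ==. */
    static int sizeCondensed(final Class<?> type) {
        if (type == int.class || type == Integer.class) {
            return 4;
        } else if (type == long.class || type == Long.class) {
            return 8;
        } else if (type == boolean.class || type == Boolean.class) {
            return 1;
        }
        return -1; // unsupported type
    }
}
```

Both methods return the same results; only the branch structure differs, which is what the IfThenElseMain measurements compare.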

Discussion

All the test program code is available in the ReflectionCost repository. It most likely needs some improvements before it can provide stable and valid results. However, the present results suggest some changes, or at least, the changes should be considered when discussing further Kieker design improvements.

File readers and KDB connectors share similar code for the deserialization of records. The actual interpretation of the serialized record information is done in the reader, while the AbstractMonitoringRecord class provides the instantiation mechanism. The code in createFromArray is rather complex for two reasons. First, it presently supports two record implementations, which is not really necessary, as the record types in kieker.common.record all support the new interface. Second, it fetches the class constructor on every call. This could be improved by a cache or, better, by changing the method as follows and implementing the cache separately.

public static final IMonitoringRecord createFromArray(final Constructor<? extends IMonitoringRecord> constructor, final Object[] values)
		throws MonitoringRecordException {
	try {
		// cast to Object: otherwise the varargs call would spread values into separate constructor arguments
		return constructor.newInstance((Object) values);
	} catch (final SecurityException ex) {
		throw new MonitoringRecordException("Failed to instantiate new monitoring record of type " + constructor.getName(), ex);
	} catch (final IllegalArgumentException ex) {
		throw new MonitoringRecordException("Failed to instantiate new monitoring record of type " + constructor.getName(), ex);
	} catch (final InstantiationException ex) {
		throw new MonitoringRecordException("Failed to instantiate new monitoring record of type " + constructor.getName(), ex);
	} catch (final IllegalAccessException ex) {
		throw new MonitoringRecordException("Failed to instantiate new monitoring record of type " + constructor.getName(), ex);
	} catch (final InvocationTargetException ex) {
		throw new MonitoringRecordException("Failed to instantiate new monitoring record of type " + constructor.getName(), ex);
	}
}

Instead of a class type, the method expects an already resolved constructor. Furthermore, the support for ancient Kieker record classes has been removed. Those who have private IMonitoringRecord classes utilizing the old structures can easily upgrade them to the present interface. I implemented this modification in a local branch, which can be pushed if there is interest in these modifications.

The FSDirectoryReader contains only one type per file. In that case, a cache for the types array and the constructor is not required. I also advise against a “behind-the-scenes” cache if not all deserializers require it. Rather, I propose a caching method similar to the one realized in the KDB, which could be realized in a factory for IMonitoringRecords, like the present functions for record creation.

In our present code, we have four kinds of record sources: text strings, binary messages, input streams, and byte arrays. I doubt that this will change much in the future. I therefore propose to define four deserialization functions in the factory, which can then be used in all readers and connectors.
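A minimal sketch of such a factory interface follows, under the assumption that each record type gets its own deserializer. RecordDeserializer, AbstractRecordDeserializer, the TimedOperation record, and its "operation;timestamp" text format are all hypothetical. Byte arrays and binary messages are adapted to the stream variant, so only two decoding routines remain per record type.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical factory interface: one entry point per kind of record source. */
interface RecordDeserializer<R> {
    R fromText(String text) throws IOException;              // text strings
    R fromMessage(ByteBuffer message) throws IOException;    // binary messages
    R fromStream(DataInputStream stream) throws IOException; // input streams
    R fromBytes(byte[] bytes) throws IOException;            // byte arrays
}

/** Adapts byte arrays and binary messages to the stream variant, so binary decoding exists once. */
abstract class AbstractRecordDeserializer<R> implements RecordDeserializer<R> {

    @Override
    public R fromBytes(final byte[] bytes) throws IOException {
        return fromStream(new DataInputStream(new ByteArrayInputStream(bytes)));
    }

    @Override
    public R fromMessage(final ByteBuffer message) throws IOException {
        final byte[] copy = new byte[message.remaining()];
        message.get(copy); // consumes the remaining bytes of the buffer
        return fromBytes(copy);
    }
}

/** Hypothetical record: an operation name and a timestamp. */
final class TimedOperation {
    final String operation;
    final long timestamp;

    TimedOperation(final String operation, final long timestamp) {
        this.operation = operation;
        this.timestamp = timestamp;
    }
}

final class TimedOperationDeserializer extends AbstractRecordDeserializer<TimedOperation> {

    @Override
    public TimedOperation fromText(final String text) throws IOException {
        final String[] parts = text.split(";"); // assumed text format: operation;timestamp
        if (parts.length != 2) {
            throw new IOException("Malformed record: " + text);
        }
        try {
            return new TimedOperation(parts[0], Long.parseLong(parts[1].trim()));
        } catch (final NumberFormatException ex) {
            throw new IOException("Malformed timestamp: " + parts[1], ex);
        }
    }

    @Override
    public TimedOperation fromStream(final DataInputStream stream) throws IOException {
        return new TimedOperation(stream.readUTF(), stream.readLong());
    }
}
```

Readers and connectors would then only pick the entry point matching their source, while the record-specific knowledge stays in one place.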

The KDB connectors are presently not able to process Kieker Configuration objects. This should be changed in the future, bringing them closer to the form already provided by the readers. It might not be possible to use the reader API for the connectors, but that is yet to be seen. Maybe we can come up with a common structure usable in both contexts.

And finally, the present Kieker record factory is also the abstract base class for IMonitoringRecords. It might be advisable to move the factory code to a separate class. Many basic Java classes implement the factory together with the class itself, but in many modern frameworks factories are implemented in separate classes, which increases readability. As IMonitoringRecords are in fact entity classes, it makes more sense to me to implement the factory in a different class.

Summary

This post is a short situation report on different yet similar parts of the Kieker framework, which could use some overhaul. I proposed several changes, which might be useful in future. As some of my conclusions are based on a hacked-together performance evaluation, these results might have to be revised.

An Instrumentation Record Language for Kieker

Instrumentation of software systems is used to monitor a system's behavior, to determine performance properties and the internal state, and also to recover the actual architecture of a system. Today, systems are implemented in different programming languages. Monitoring the different parts of such a system therefore requires monitoring record structures in the different languages. To perform joint analyses over all components of a system, the different records must be compatible. This can be achieved by defining common serialization standards for various languages, as we did for the Kieker Data Bridge, and implementing suitable runtime libraries for the different languages. However, the monitoring records must still be rewritten from scratch for each language.

Furthermore, model-driven software development uses models to describe software systems. Code generators interpret these models and generate source code in the target programming languages. Therefore, it would be beneficial to express monitoring on the model level as well.

A solution to both issues is a modeling language to express instrumentation. At present, we divide instrumentation into monitoring records, a monitoring runtime, and weaving instructions for the monitoring aspect. In this post, we address the modeling and code generation of monitoring records and their supplemental serialization functions.

This post first introduces the requirements for such an instrumentation record language (IRL), followed by the language features realizing these requirements. Then the technological context is briefly introduced, before the grammar and code generation are explained. Finally, a short summary of the results is given.

Requirements

The model-driven instrumentation record language must fulfill the following distinct requirements to be part of a model-driven monitoring approach:

  1. Target language independent description of monitoring data
  2. Mapping of types from other languages to Java types
  3. Code generation for all languages supported by Kieker
  4. Ability to add additional payload to records
  5. Limit the size of the payload, to ensure reasonable response times
  6. Reuse of record definitions, especially in the context of payload definitions
  7. Fit in model-driven monitoring approach

Language Features

The instrumentation record language must be able to support different target languages, which requires a language-independent type system. The type system must provide the necessary structures and primitive types used in monitoring records. As the analysis side of Kieker is written in Java, the primitive types of a target language must be mappable to Java types.

To initialize properties of records with default values, the language must support the definition of constants for defaults.

Kieker has many different record types and in future, with the payload feature, the number of records might increase. Therefore, it is helpful to reuse record definitions, which can easily be realized through a sub-typing feature.

Payload data is formulated in the type system of the target language of the software system. Therefore, the generators must be able to inherit those types and provide serialization for them. This feature can be implemented later, as it requires considerable work on the level of code generator design.

On the modeling level, the language must be able to import the type system of the modeling language. It must also provide a transformation for these type structures to the target language of the project and a transformation to map the types onto Java-types.

In OO-languages a payload can be described by a reference to an object. As objects can have references to other objects, this could result in a large quantity of data to be transmitted every time the probe is executed. To limit the size, references should not be followed implicitly. They must be explicitly specified.

Finally, the language requires code generators for every supported programming language. These code generators should be easy to implement and share model navigation routines. They also require an API specification and a runtime. The runtime is not part of the language itself, but it is necessary for specifying the code generator, as the generator has to produce code that uses that API.

Development Environment

The IRL is based on the Xtext DSL development framework, which uses Eclipse as IDE and EMF as the basis for meta-modeling. Xtext provides a grammar language and editor, and it uses ANTLR to generate the actual parsers for the editor's syntax highlighting and for model generation. The framework is bundled with Xtend, a language for model-to-model and model-to-text transformations, which is used to implement the IRL code generators.

Grammar

The grammar of the IRL can be divided into four parts. First, the generic header of any Xtext language, which imports terminals and meta-models. Second, a service part to specify package names and import Ecore models and other resources. Third, the actual record type definition. And fourth, literals and additional terminals.

Header

The header is not very spectacular. It defines the class name for the parser, specifies the import of the terminals grammar, and imports the ecore package so ecore types can be used in the grammar.

grammar de.cau.cs.se.instrumentation.rl.RecordLang with org.eclipse.xtext.common.Terminals

generate recordLang "http://www.cau.de/cs/se/instrumentation/rl/RecordLang"
import "http://www.eclipse.org/emf/2002/Ecore" as ecore

Service Part

The service part defines the root of the model. Therefore, the first rule is called Model. The model has a name attribute holding the name of the package to which the model belongs. The grammar then allows specifying Ecore models. In general, these should be Ecore-based meta-models comprising a type system of some sort. They are used to introduce the typing of DSLs to formulate the payload. However, the mechanism behind them might not be perfect and will be subject to change.

The language also allows importing record entities with the import statement. However, normally they should be imported automatically if they are in the same project or in an associated project.

Model: 'package' name = QualifiedName
       (packages += Package)*
       (imports += Import)*
       (records += RecordType)*
;

Import: 'import' importedNamespace = QualifiedNameWithWildcard ;

Package: 'use' name=ID package=[ecore::EPackage|STRING] ;

In the future, the use and import infrastructure must be able to import the type systems of other languages and reflect their code generation principles in order to realize arbitrary payloads for any language.

Record Declaration

The record declaration must fulfill different tasks. At present, it must be able to specify record properties for the monitoring and default values for these properties. Depending on the present source code of record structures in Kieker, it might be necessary to augment these basic declarations. First, there might be automatic values in the record structure, which the language must support, and second, some properties might be mandatory while others are optional. Optional properties then require a default value for the serialization. At present, this is realized through default constants.

TemplateType:
	'template' name=ID (':' inherits+=[TemplateType|QualifiedName] (',' inherits+=[TemplateType|QualifiedName])*)? 
	(
		('{' (properties+=Property | constants+=Constant)* '}') |
		(properties+=Property)
	)?
;

RecordType:
	(abstract?='abstract')? ('struct') name=ID 
	('extends' parent=[RecordType|QualifiedName])?
	(':' inherits+=[TemplateType|QualifiedName] (',' inherits+=[TemplateType|QualifiedName])*)?
	('{'
		(properties+=Property | constants+=Constant)*
	'}')?
;

In the grammar, each record is represented by a RecordType instance. The language uses the keyword struct, as the word record is already part of the Kieker package names and would clash with the keyword. A RecordType can have a parent RecordType, which it extends with additional properties. Previously defined properties and default constants may not be overwritten. The inheritance pattern is a classic sub-typing pattern, so all properties are visible on the child level. The body of a RecordType consists of Property and Default declarations. However, at present there also exists a combined method to declare defaults in combination with properties.
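The rule that inherited properties may not be redefined could be checked on the model with a walk up the parent chain. The sketch below uses a hypothetical stand-in for the RecordType model element rather than the Xtext-generated classes, covers properties only (default constants could be handled the same way), and assumes an acyclic parent chain. In an Xtext project, such a rule would typically live in the language's validator.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a RecordType model element (not the Xtext-generated class). */
final class StructModel {
    final String name;
    final StructModel parent; // null if the struct does not extend another one
    final Map<String, String> properties = new LinkedHashMap<String, String>(); // property name -> type

    StructModel(final String name, final StructModel parent) {
        this.name = name;
        this.parent = parent;
    }

    StructModel property(final String type, final String propertyName) {
        properties.put(propertyName, type);
        return this;
    }
}

final class OverrideCheck {

    private OverrideCheck() {
    }

    /** Lists own properties that redefine a property of an ancestor; assumes an acyclic parent chain. */
    static List<String> redefinedProperties(final StructModel struct) {
        final Set<String> inherited = new HashSet<String>();
        for (StructModel ancestor = struct.parent; ancestor != null; ancestor = ancestor.parent) {
            inherited.addAll(ancestor.properties.keySet());
        }
        final List<String> violations = new ArrayList<String>();
        for (final String name : struct.properties.keySet()) {
            if (inherited.contains(name)) {
                violations.add(struct.name + "." + name);
            }
        }
        return violations;
    }
}
```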

Default:
	'default' type=Classifier name=ID '=' value=Literal
;

A normal default constant starts with the keyword default followed by a type. In general, all Ecore types are allowed. However, the literals are limited to primitive types. Therefore, at present only primitive classifiers are useful. The language defines a set of primitive types, which can be extended programmatically by modifying the enumeration de.cau.cs.se.instrumentation.rl.typing.PrimitiveTypes.

Properties are defined in a similar way. However, they do not have an initial keyword. They are defined by a type followed by a name.

Property: type=Classifier name=ID
             ('{' (properties+=ReferenceProperty)* '}' | 
              '=' value=Literal |
              '=' const=Constant)
;

Constant: name=ID value=Literal ;

After the name, three different things can happen. The first option addresses payloads. The second option allows specifying a literal, which can be a value of some kind or a previously defined default value. The third option allows defining a default constant and automatically assigning it to the property, just like a combination of a default constant declaration and a property using the default constant as literal.

The type is declared by a reference to an Ecore classifier, which can be one of the built-in types or one of the structures in a package imported above.

Classifier:
	(package=[Package] '.')? class=[ecore::EClassifier|ID]
;

To specify a payload, the language also uses a property. In that case, the classifier is not a primitive type but a complex type defined in an imported model. As complex types may imply recursive structures, which could result in referencing the whole runtime state, each required value must be stated explicitly if it is complex. For example, if a property has the type of a Java class, all properties of that class which have a primitive type will be serialized and stored in the transmission. However, every property referencing a complex type will only be represented by an object id. If the content of such a complex type is also required, it must be specified as a ReferenceProperty between the two braces.

ReferenceProperty:
	ref=[ecore::EStructuralFeature|ID] ('{' (properties+=ReferenceProperty)* '}')?
;

The ReferenceProperty allows recursing down into data structures with nested ReferenceProperty declarations. While this looks complicated for deeply nested structures, it ensures that only a minimum of data is retrieved from the system's runtime state and that the data size is limited.
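The effect of this rule on the runtime side can be sketched with Java reflection: simple values are copied, and complex references are reduced to an id unless a nested selection, playing the role of ReferenceProperty, asks for their content. Selection and PayloadSerializer are hypothetical names, System.identityHashCode merely stands in for a real object id (it is not guaranteed to be unique), and only fields declared in the object's own class are considered.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical selection tree: which references to follow, mirroring nested ReferenceProperty blocks. */
final class Selection {
    final Map<String, Selection> children = new LinkedHashMap<String, Selection>();

    Selection follow(final String fieldName, final Selection nested) {
        children.put(fieldName, nested);
        return this;
    }
}

final class PayloadSerializer {

    private PayloadSerializer() {
    }

    /** Copies simple values; complex references become ids unless the selection names them. */
    static Map<String, Object> serialize(final Object object, final Selection selection)
            throws IllegalAccessException {
        final Map<String, Object> result = new LinkedHashMap<String, Object>();
        for (final Field field : object.getClass().getDeclaredFields()) { // own fields only
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            field.setAccessible(true);
            final Object value = field.get(object);
            final String name = field.getName();
            if (value == null || isSimple(field.getType())) {
                result.put(name, value);
            } else if (selection.children.containsKey(name)) {
                result.put(name, serialize(value, selection.children.get(name))); // explicitly requested
            } else {
                result.put(name, "ref#" + System.identityHashCode(value)); // stand-in object id
            }
        }
        return result;
    }

    private static boolean isSimple(final Class<?> type) {
        return type.isPrimitive() || type == String.class || type == Boolean.class || type == Character.class
                || Number.class.isAssignableFrom(type);
    }
}

/** Hypothetical payload classes for the example. */
final class DemoAddress {
    String city = "Kiel";
}

final class DemoCustomer {
    String name = "Ada";
    int orders = 3;
    DemoAddress address = new DemoAddress();
}
```

Serializing a DemoCustomer with an empty Selection yields its name and orders plus only an id for address; following "address" additionally yields the city.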

Literals and Terminals

The language supports literals for all supported primitive types and references to default constants.

Literal:
	StringLiteral | IntLiteral | FloatLiteral | BooleanLiteral | DefaultLiteral
;

StringLiteral:
	value=STRING
;

IntLiteral:
	value=INTEGER
;

FloatLiteral:
	value=FLOAT
;

BooleanLiteral: 
	{BooleanLiteral} ((value?='true')|'false')
;

DefaultLiteral:
	value=[Default|ID]
;

To model qualified names and express imports, standard qualified name rules have been added. The literals require signed floating point and integer values; therefore, two suitable terminals complete the language.

QualifiedName:
  ID (=>'.' ID)*;

QualifiedNameWithWildcard:
	QualifiedName ('.' '*')?
;

// terminals
terminal INTEGER returns ecore::EInt: ('+'|'-')? INT;

terminal FLOAT returns ecore::EFloat: ('+'|'-')? (INT '.' INT? | '.' INT);
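For readers less familiar with Xtext terminals, the two rules correspond roughly to the Java regular expressions in the sketch below, given that INT from the imported Terminals grammar is defined as ('0'..'9')+. LiteralTerminals is a hypothetical helper that only illustrates which strings the terminals accept; note that an unsigned number such as 42 matches both INT and INTEGER.

```java
import java.util.regex.Pattern;

/** Sketch: regex equivalents of the INTEGER and FLOAT terminals, with INT = ('0'..'9')+. */
final class LiteralTerminals {
    static final Pattern INTEGER = Pattern.compile("[+-]?[0-9]+");
    static final Pattern FLOAT = Pattern.compile("[+-]?([0-9]+\\.[0-9]*|\\.[0-9]+)");

    private LiteralTerminals() {
    }

    static boolean isInteger(final String text) {
        return INTEGER.matcher(text).matches();
    }

    static boolean isFloat(final String text) {
        return FLOAT.matcher(text).matches();
    }

    public static void main(final String[] args) {
        for (final String text : new String[] { "42", "-7", "3.25", "+1.", ".5", "1e3" }) {
            System.out.println(text + ": INTEGER=" + isInteger(text) + ", FLOAT=" + isFloat(text));
        }
    }
}
```

In particular, "+1." and ".5" are valid FLOAT literals, while exponent notation such as 1e3 is not covered by either terminal.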

Code Generation

At present, the code generator for the IRL can produce Java and C files representing the monitoring records, and it provides some serialization functionality for C-based records.

Every code generator must provide a type mapping routine called createTypeName, which converts a classifier of a primitive type into a primitive type of the target language. In addition, all generators must be able to unfold all properties that belong to a monitoring record. This functionality is realized in the RecordLangGenericGenerator class by the compileProperties method.
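The two obligations can be illustrated with a small generator sketch in Java, whereas the actual generators are written in Xtend. StructDecl, PropertyDecl, CGeneratorSketch, and the type table (which assumes the fixed-width types from C's stdint.h) are hypothetical; the sketch unfolds inherited properties parent first and emits a C struct.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model elements: a property and a struct with an optional parent. */
final class PropertyDecl {
    final String type;
    final String name;

    PropertyDecl(final String type, final String name) {
        this.type = type;
        this.name = name;
    }
}

final class StructDecl {
    final StructDecl parent; // null if there is no parent
    final List<PropertyDecl> properties = new ArrayList<PropertyDecl>();

    StructDecl(final StructDecl parent) {
        this.parent = parent;
    }

    StructDecl add(final String type, final String name) {
        properties.add(new PropertyDecl(type, name));
        return this;
    }
}

/** Sketch of a C generator's two obligations; the type table is an assumption. */
final class CGeneratorSketch {
    private static final Map<String, String> C_TYPES = new HashMap<String, String>();

    static {
        C_TYPES.put("int", "int32_t");
        C_TYPES.put("long", "int64_t");
        C_TYPES.put("double", "double");
        C_TYPES.put("string", "char*");
    }

    private CGeneratorSketch() {
    }

    /** Maps a primitive type of the record language to a C type name. */
    static String createTypeName(final String primitiveType) {
        final String cType = C_TYPES.get(primitiveType);
        if (cType == null) {
            throw new IllegalArgumentException("No C mapping for " + primitiveType);
        }
        return cType;
    }

    /** Unfolds all properties of a struct, inherited ones first (assumes an acyclic parent chain). */
    static List<PropertyDecl> compileProperties(final StructDecl struct) {
        final List<PropertyDecl> result = new ArrayList<PropertyDecl>();
        if (struct.parent != null) {
            result.addAll(compileProperties(struct.parent));
        }
        result.addAll(struct.properties);
        return result;
    }

    static String compileStruct(final String name, final StructDecl struct) {
        final StringBuilder code = new StringBuilder("typedef struct {\n");
        for (final PropertyDecl property : compileProperties(struct)) {
            code.append("    ").append(createTypeName(property.type)).append(' ').append(property.name).append(";\n");
        }
        return code.append("} ").append(name).append(";\n").toString();
    }
}
```

A Java or Perl generator would share compileProperties and only supply its own createTypeName and output templates, which is the kind of reuse the generator API aims at.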

The detailed structure of the monitoring records the generators have to produce is partially documented for Java in the source code of IMonitoringRecord. For the two other example languages, C and Perl, records from the KDB test client written in C and from the eprints instrumentation written in Perl by Nis Wechselberg were used.

Summary

This post introduced the instrumentation record language for the Kieker monitoring framework and the Kieker Data Bridge. At present, it is a simple language with sub-typing to build monitoring record structures in three programming languages (C, Java, Perl). It is the first prototype of a generic record language for Kieker, intended to ease the instrumentation of applications written in different languages in future releases. Furthermore, it is a building block in the effort to develop model-driven instrumentation.

The language is based on Xtext and uses the typing mechanism explained in a previous post, which will get its typing rules from XSemantics in the near future. The code generation is based on Xtend, and thanks to a generator API, it is easy to write generators for new languages.

At the moment, the code can be found at:

git clone https://github.com/research-iobserve/instrumentation-language.git

The repository also contains the previous record language in de.cau.cs.se.instrumentation.language, which is deprecated, and an instrumentation application language in de.cau.cs.se.instrumentation.al, which is one of the next steps toward model-driven monitoring. The tree also contains an old version of the Kieker Data Bridge, which has since been moved to the main Kieker repository.