Proposal: Kieker Data Bridge and Adaptive Monitoring

The Kieker Data Bridge is a service designed to support Kieker probes for non-JVM languages. Instead of reimplementing the wide range of Kieker data stores, the bridge makes it possible to implement Kieker records and probes in non-JVM languages, like Perl or C, which send their monitoring data to the bridge for conversion and storage. While it is possible to run the KDB and the monitored application on the same machine, we use the term remote site or remote application to refer to the monitored application from the viewpoint of the KDB. Likewise, the KDB is called remote from the viewpoint of the monitored application.

The present design of the Kieker Data Bridge (KDB) is only able to handle incoming Kieker monitoring records and store them in a Kieker data store. However, Kieker has evolved and now supports adaptive monitoring based on regular expression patterns for method or function signatures. In our effort to support a wide range of programming and modeling languages, the Kieker bridge has to be extended accordingly.

In this blog post, we discuss this feature and the different implementation ideas. First, we explain the present KDB behavior and the behavior specification of probes. Second, the properties of the adaptation feature are defined, and then used in the third section to discuss solution concepts. Fourth, we define a new behavior for probes and the bridge.

Behavior Model of the KDB and Probes

The Kieker Data Bridge, in its current design, is configured on startup by dynamically loading Kieker record classes, configuring a Kieker monitoring controller, and setting up a network connection with one of its network connectors for TCP or JMS. Then it starts listening for incoming records, as shown in Figure 1.

Figure 1: KDB and probe behavior illustrated in an activity diagram

Of course, there can be more than one probe and each probe can be triggered multiple times. Each time a probe is triggered, it sends a record to the KDB, which then processes the data and stores it in a Kieker store, which can be any of a number of different storage back-ends, like files, databases, or message queues.

While the KDB design can use multiple threads to handle incoming data, the probe can run embedded in normal code and does not require a task switch. The TCP and JMS network connectors of the KDB are single-threaded as well, as they can block while waiting for incoming data. This results in low system overhead for the transmission of record data.

Understanding the Adaptive Monitoring Feature

Adaptive monitoring makes it possible to activate and deactivate probes at runtime. Every time a probe is triggered, the system must evaluate whether the triggered probe is active and, if so, collect data and store it with Kieker.

The adaptive monitoring in Kieker is based on regular expression patterns, which describe method or function signatures similar to expressions used in AspectJ. In general, if a probe is triggered, it passes that method's signature as a string to the ProbeController, which checks if the given signature matches any stored pattern. Successful matches are stored in an internal signature cache to speed up later lookups. The controller returns either true or false, indicating that the probe can proceed with data collection and storage or must skip those tasks, respectively.
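
To make this lookup concrete, the following Java sketch shows the essence of such a cache-backed activation check. It is a simplified illustration under our own naming, not Kieker's actual ProbeController; in particular, it caches negative results as well, and the pattern handling is reduced to a plain list of compiled regular expressions.

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.regex.Pattern;

    // Simplified sketch of a cache-backed probe activation check.
    public class SimpleProbeController {
        private final Map<String, Boolean> signatureCache = new ConcurrentHashMap<>();
        private final List<Pattern> activationPatterns; // compiled from the configured pattern strings

        public SimpleProbeController(final List<Pattern> activationPatterns) {
            this.activationPatterns = activationPatterns;
        }

        // Returns true if a probe with the given signature may collect and store data.
        public boolean isProbeActivated(final String signature) {
            // Fast path: the signature cache answers repeated lookups without pattern matching.
            final Boolean cached = this.signatureCache.get(signature);
            if (cached != null) {
                return cached;
            }
            // Slow path (cache miss): match the signature against all configured patterns.
            boolean active = false;
            for (final Pattern pattern : this.activationPatterns) {
                if (pattern.matcher(signature).matches()) {
                    active = true;
                    break;
                }
            }
            this.signatureCache.put(signature, active);
            return active;
        }
    }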

These lookups are performed every time a probe is triggered. Therefore, the cost of the evaluation should be minimized as much as possible. In the present implementation of the probes, the method signature is provided to the ProbeController, where it is first checked against a signature cache. On cache misses, the signature is passed to a pattern matching routine.

In the KDB context, each time a probe is triggered in a remote program, the activation lookup must be performed prior to the data collection. If this lookup had to use IPC for every call, the overall response time of the probes would be unacceptably slow. In particular, the presently implemented TCP and JMS connections add considerable latency to remote calls.

Another problem arises when patterns for probe activation are changed. In that case, a probe activation cache must be invalidated or at least updated. The ProbeController performs such invalidation, but in the KDB context, these changes must be communicated to the remote application, ideally before the next probe is triggered. However, the probe design should stay as simple as possible.

Design Concepts for Adaptive Monitoring with KDB

The adaptive monitoring via KDB must solve all the above problems and provide a general approach which is suitable for a wide range of languages and run-time contexts.

In many bi-directional communication scenarios, two threads are used to realize the communication. One thread listens for incoming requests, while the other actively processes something and initiates communication when necessary. For monitoring probes, this is a complex solution. First, it requires multi-threading, which is not available in all run-time environments, is expensive in some contexts, and in others threads are hard to handle. Second, it involves synchronization and could lead to a wide range of race conditions or other execution artifacts you do not want to handle in monitoring probes, as they would add to the overall execution time. A simpler solution for two-way communication is a query-response pattern, initiated by the probe, either when the probe is triggered or when the probe is executed.

If the update for the signature cache is requested in the probe trigger routine, it would be called for every probe. That would render an application-side cache completely useless, because a request to the KDB would have to be made, and its reply awaited, on every trigger. In this scenario, it would be simpler to just send the signature and wait for a one-byte result containing 0 or 1.

The alternative scenario would trigger a request for a cache update only when a probe is actually executed. While that would reduce the number of update calls, it has two downsides. First, updates to the KDB signature cache are only propagated on active probes. If the program is not triggering any probe, no updates can be made. Second, every probe execution would delay the overall execution, as the update request must be processed.
In consequence, the application side needs some sort of update thread, dedicated to listening for signature update requests.

We therefore propose a classic model with two application threads and two KDB threads to implement the two-way communication. On the application side, the application main thread includes probes, which first check if they are active and, if so, execute their code. On cache misses, they send the signature to the KDB and request an answer. This operation also updates the signature cache. If the probe activation patterns are changed on the KDB side, the KDB reevaluates the signature cache and sends an update list to the application side.

Figure 2 illustrates the interaction of one probe in the application main thread with the KDB. The figure focuses on the interaction on cache misses. Therefore, the send data event between Send Data to Bridge and Receive Command is omitted.

Figure 2: Behavior model in the adaptive monitoring scenario for a probe and the KDB

These two activity charts can be embedded into the present concept of the KDB, as the communication is initiated by the probe. However, to handle signature cache updates on the KDB side, a push mechanism must be modeled as well. The push mechanism is also triggered by an event, which originates from a JMX call or the periodically invoked pattern reader ConfigFileReader inside of the ProbeController.

As the JMX method is more suitable for the KDB, the following proposal focuses on that technology for pattern modification. Via JMX, methods of the MonitoringController can be called, especially the methods

  • boolean activateProbe(final String pattern)
  • boolean deactivateProbe(final String pattern)

Both methods take one pattern string and add it to the pattern list by invoking corresponding functions in the ProbeController. The present implementation of the ProbeController then invalidates its signature cache and registers the pattern and the activation state in an internal list.
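
For illustration, a remote pattern change via JMX could be triggered roughly as follows. This is a minimal sketch of a standard JMX invocation; the service URL, the ObjectName, and the example pattern string are placeholders and depend on the actual JMX configuration of the MonitoringController.

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class PatternUpdateClient {
        public static void main(final String[] args) throws Exception {
            // Placeholder service URL; port and path depend on the JMX setup.
            final JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:59999/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                final MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // Placeholder object name for the MonitoringController MBean.
                final ObjectName name = new ObjectName("kieker.monitoring:type=MonitoringController");
                // Invoke activateProbe(String); deactivateProbe works analogously.
                final Object result = mbs.invoke(name, "activateProbe",
                        new Object[] { "public * bookstore.Catalog.*(..)" },
                        new String[] { String.class.getName() });
                System.out.println("activateProbe returned: " + result);
            }
        }
    }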

For the KDB, this is not sufficient, as the signature cache on the probe side must be updated as well. Therefore, the ProbeController must be extended to use the ServiceContainer and cause an update of the application signature cache through one of the ServiceContainer methods. As stated earlier, this must be done in a separate thread. Fortunately, we already have that thread, as the JMX MBeanServer runs in one. Also, the file update mechanism is a runnable and is executed by a scheduler in its own thread.

Realization of the Adaptive KDB

The previous section explained some of the design issues of the KDB; in this section, we propose an implementation of the design within the KDB. This can be accomplished by modifying and extending three places in the current implementation. First, the ServiceContainer requires one additional method, implementing an update push. Second, the IServiceConnector interface requires a push method signature, and its implementations a corresponding push method. And third, the ProbeController must be extended to trigger the ServiceContainer.
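
A minimal sketch of the interface extension, assuming a method name pushUpdates, a map-based update representation, and IOException for transmission errors (the actual bridge defines its own exception types), could look as follows:

    import java.io.IOException;
    import java.util.Map;

    // Sketch of the proposed extension; the existing connector methods are elided.
    public interface IServiceConnector {
        // ... existing methods for setup, record deserialization, and teardown ...

        // Pushes a batch of signature cache updates to the remote application.
        // Keys are signature cache positions, values are the new activation states.
        void pushUpdates(Map<Integer, Boolean> updates) throws IOException;
    }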

For the ServiceContainer and the ServiceConnectors, two implementation options are available. We could use synchronous calls throughout that implementation, which would cause the JMX thread to be blocked until the update is sent. Or we could use an asynchronous approach, where we either deliver the updates internally to one dedicated thread or instantiate a new thread for every update.

The last method is quite simple on the KDB implementation side, but would require unique and strictly increasing transaction ids to be able to sort out whether updates arrive out of order. This approach would increase the memory footprint on the application side and require extra execution time, as for every update not only must the signature be found, but the id must also be checked. As branches can be very expensive on modern CPUs, we should avoid this.

The second approach would not require such transaction ids, but it would require an internal cache for updates generated by the ProbeController. Furthermore, it does not require the instantiation of a thread for every update request. Its advantage is that the JMX or ConfigFileReader thread can terminate or go back to listening directly after it calculates the update. Its disadvantages are that it still requires another thread to handle the update transmission, and that if updates appear faster than they can be transmitted, the internal buffer grows. The latter might be a rare case for slow remote connections, but on fast connections the delay of a blocking synchronous approach would be minimal.

Therefore, we propose the synchronous approach. Every time the ConfigFileReader is triggered and updates are computed, or every time a JMX pattern change happens which causes an update, the event handling thread can directly call the ServiceContainer's push method and communicate the updates. After that communication, the thread can terminate or listen for the next event.
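
The synchronous path is then little more than a direct call, as the following sketch shows. It reuses the hypothetical pushUpdates method from above; the class and method names are ours.

    import java.io.IOException;
    import java.util.Map;

    // Sketch of the synchronous update path: the thread that computed the pattern
    // change (JMX or ConfigFileReader) calls straight through to the connector and
    // blocks until the update has been transmitted.
    public class ServiceContainerSketch {
        private final IServiceConnector connector;

        public ServiceContainerSketch(final IServiceConnector connector) {
            this.connector = connector;
        }

        // Called directly by the JMX or ConfigFileReader thread; no transaction ids
        // and no internal update buffer are needed.
        public void pushSignatureUpdates(final Map<Integer, Boolean> delta) throws IOException {
            this.connector.pushUpdates(delta);
        }
    }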

In a Java environment, the probes call the ProbeController and hand over the method signature as a string to ask if the probe is active or inactive. This can be slow, but might be necessary in a Java environment. Also, there is only one lookup per call. However, for the signature cache updates, there can be multiple updates at once, which would require a search for all of them. Furthermore, all the signatures must be sent over the (Internet) connection.

A more efficient solution is the use of signature ids. Every time a signature is not present in the application cache, the signature is sent to the KDB, which answers with a boolean value, indicating the probe status, and a signature id. These ids are generated uniquely in the KDB. In future updates, only a boolean and a signature id suffice to describe the update.

A slightly different approach would generate the id on the application side. On a cache miss, a signature is added to the signature cache on the application side, and then the signature is sent together with the cache slot number to the KDB, which answers with a boolean and the slot number. This implementation would require no complicated lookup on the application side and reduce the load, for the price of 4 bytes more traffic towards the KDB for every cache miss. As many method signatures are at least 80 characters long, the payload overhead is approximately 5%. Together with network communication overhead, it is negligible.
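
The following sketch illustrates such an application-side cache with locally generated slot numbers. In a real deployment this code would live in the monitored, possibly non-JVM application; Java is used here only for illustration, and all names, as well as the default state for unanswered requests, are assumptions.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch of an application-side signature cache with local slot numbers.
    public class ApplicationSignatureCache {
        private final List<Boolean> slots = new ArrayList<>();          // slot number -> activation state
        private final Map<String, Integer> slotBySignature = new HashMap<>();

        // Returns the slot for a signature, allocating a new one on a cache miss.
        // The new slot number is sent to the KDB together with the signature.
        public int slotFor(final String signature) {
            Integer slot = this.slotBySignature.get(signature);
            if (slot == null) {
                this.slots.add(Boolean.FALSE); // assumed inactive until the KDB answers
                slot = this.slots.size() - 1;
                this.slotBySignature.put(signature, slot);
            }
            return slot;
        }

        // Applied when the KDB answers a request or pushes an update for a slot.
        public void update(final int slot, final boolean active) {
            this.slots.set(slot, active);
        }

        public boolean isActive(final int slot) {
            return this.slots.get(slot);
        }
    }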

The last piece to build is a suitable ProbeController. The present ProbeController throws away its signature cache on pattern updates and regenerates the cache. This makes sense in a Java environment, because long running update threads working on common data may result in long probe activation status checks. In the KDB, the lookup is decoupled. Therefore, the update does not need to dump the signature cache. Even more, it should preserve the cache and calculate a delta to the application side. If the cache were dropped, the application-side cache would have to be dropped too, and in consequence, the application signature cache would have to be rebuilt.
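
A delta computation along these lines could look as follows. The sketch assumes the KDB mirrors the application cache as a slot-to-signature map; the matches predicate stands for the pattern matching described earlier, and all names are ours.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.Predicate;

    // Sketch of the delta computation: instead of discarding the signature cache on
    // a pattern change, every cached signature is re-evaluated and only the entries
    // whose activation state actually changed are collected for transmission.
    public final class DeltaComputation {
        // Returns slot -> new state for all entries that changed; stateBySlot is
        // updated in place, so the cache is preserved across pattern changes.
        public static Map<Integer, Boolean> computeDelta(
                final Map<Integer, String> signatureBySlot,
                final Map<Integer, Boolean> stateBySlot,
                final Predicate<String> matches) {
            final Map<Integer, Boolean> delta = new HashMap<>();
            for (final Map.Entry<Integer, String> entry : signatureBySlot.entrySet()) {
                final boolean newState = matches.test(entry.getValue());
                if (stateBySlot.get(entry.getKey()) != newState) {
                    stateBySlot.put(entry.getKey(), newState);
                    delta.put(entry.getKey(), newState);
                }
            }
            return delta;
        }

        private DeltaComputation() { }
    }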

Protocol Specification

We have two communication directions in the adaptive monitoring, which require a proper protocol for communication. This protocol must be able to work in a wide range of transport protocol contexts, which require data serialization.

The first protocol describes communication initiated by a probe. There are two types of communication coming from a probe: either it is a data record, as in the present KDB, or it is a signature request. Based on the present serialization scheme, the necessary extension can be modeled via the existing class type indicator. The difference is that, besides normal Kieker IMonitoringRecord type ids, command ids are allowed as well. This requires reserving some ids for KDB commands, moving the lowest possible id for IMonitoringRecord types to 32. This allows us to specify 32 different commands, which should suffice for future extensions.
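
As a sketch, this id layout can be captured in a handful of constants. The constant names are ours; only the values follow the proposal, and the answer type ids refer to the record definitions below.

    // Sketch of the proposed id layout: ids below 32 are reserved for KDB commands,
    // IMonitoringRecord type ids start at 32. Constant names are illustrative.
    public final class BridgeProtocol {
        // probe -> KDB command ids
        public static final int SIGNATURE_REQUEST = 0;
        // KDB -> probe answer type ids (see the record definitions below)
        public static final int ANSWER_SINGLE_UPDATE = 0;
        public static final int ANSWER_BATCH_UPDATE = 1;
        // lowest id available for IMonitoringRecord types
        public static final int FIRST_RECORD_TYPE_ID = 32;

        private BridgeProtocol() { }
    }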

The record structure for signature requests can be defined as follows:

  1. 4 bytes (int32), defining the data type; in this case the id is 0 for SignatureRequest
  2. 4 bytes (int32), defining the signature cache position
  3. 4 bytes (int32), defining the length of the signature string
  4. A byte sequence representing the string content

The string notation with the size prefix allows faster read operations; otherwise, we would either need to read byte by byte until we reach 0, or read into a buffer and parse the buffer.
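
The following sketch serializes such a request. It assumes big-endian int32 values, as produced by Java's DataOutputStream, and an ASCII-encoded signature string; neither encoding is fixed by the proposal itself.

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    // Sketch of the signature request serialization defined above.
    public final class SignatureRequestWriter {
        public static void write(final DataOutputStream out, final int cachePosition,
                final String signature) throws IOException {
            final byte[] bytes = signature.getBytes(StandardCharsets.US_ASCII);
            out.writeInt(0);             // data type id: 0 = SignatureRequest
            out.writeInt(cachePosition); // signature cache position (slot number)
            out.writeInt(bytes.length);  // length prefix allows a single block read
            out.write(bytes);            // signature string content
            out.flush();
        }

        private SignatureRequestWriter() { }
    }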

The answer from the KDB to the application probe controller has the following serialization:

  1. 4 bytes (int32), defining the data type (in this case 0)
  2. 4 bytes (int32), defining the signature cache position
  3. 1 byte (int8), representing true (1) or false (0), indicating whether the probe is active or inactive

The same structure is also used for updates triggered by the KDB. These records are likewise prefixed with a 4-byte integer representing the answer type id. This is necessary to allow future extensions without breaking code.
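
On the application side, reading such an answer could look as follows; again, the big-endian byte order is an assumption, and the cache is represented as a plain map for illustration.

    import java.io.DataInputStream;
    import java.io.IOException;
    import java.util.Map;

    // Sketch of reading a single update answer on the application side.
    public final class UpdateReader {
        public static void readSingleUpdate(final DataInputStream in,
                final Map<Integer, Boolean> cache) throws IOException {
            final int type = in.readInt();             // answer type id, 0 for a single update
            if (type != 0) {
                throw new IOException("unexpected answer type: " + type);
            }
            final int slot = in.readInt();             // signature cache position
            final boolean active = in.readByte() != 0; // 1 = active, 0 = inactive
            cache.put(slot, active);
        }

        private UpdateReader() { }
    }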

For plain TCP connections, it makes no difference whether one or multiple updates are sent in a chunk, as TCP can be used as a stream. However, in many cases using a block reader can be faster in low-level programming languages. In addition, with other technologies, like JMS, it might be advisable to send updates in groups. Therefore, multiple records must be stored in one message or communication event. To distinguish these two types, a second data type id is required.

The format is defined as:

  1. 4 bytes (int32), defining the data type (in this case 1)
  2. 4 bytes (int32), defining the nested type (in this case 0)
  3. n times, where n is the number of updates in the message:
    • 4 bytes (int32), defining the signature cache position
    • 1 byte (int8), representing true (1) or false (0), indicating whether the probe is active or inactive
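
A sketch of writing such a batched update follows. How the reader determines the number of entries n, for example from the enclosing message length of a JMS message, is left to the respective connector and not fixed here.

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.util.Map;

    // Sketch of the batched update serialization defined above.
    public final class BatchUpdateWriter {
        public static void write(final DataOutputStream out,
                final Map<Integer, Boolean> updates) throws IOException {
            out.writeInt(1); // data type id: 1 = batched update
            out.writeInt(0); // nested type id: 0 = single update entries
            for (final Map.Entry<Integer, Boolean> entry : updates.entrySet()) {
                out.writeInt(entry.getKey());            // signature cache position
                out.writeByte(entry.getValue() ? 1 : 0); // activation state
            }
            out.flush();
        }

        private BatchUpdateWriter() { }
    }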

Summary

The presented proposal specifies the generic parts of the KDB extension for adaptive monitoring. It defines the transport serialization format for all technologies based on streams or buffers. Furthermore, it explains which components require a modification. However, it is not a complete design document.