Extending the Entropy-based Software Analysis

In Analyzing structural properties of Java applications, we introduced how to use our analysis tool to evaluate Java applications. However, the underlying metrics are applicable to other sources (models, code, data). We demonstrated this capability by developing model-to-hypergraph adapters for Ecore metamodels, PCM models, and GECO transformation composition models. In this article, we illustrate how to create your own extension of the analysis tool.


The analysis tooling is based on Eclipse (Neon) and utilizes the Eclipse extension mechanism. Therefore, it is useful to understand the basic principles and concepts of Eclipse extensions, which can be found here.

The source code of the analysis tool can be found on GitHub.

Basic Concepts

The analysis tool comes with a set of metrics performing analyses based on a hypergraph. A hypergraph comprises nodes and edges, where a node can be connected to zero or more edges, and an edge can be connected to multiple nodes. In addition, we can modularize the hypergraph by grouping nodes in modules. Such a hypergraph is then called a modular hypergraph. The metamodel for hypergraphs can be found in Hypergraph.ecore.
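
Conceptually, the metamodel boils down to the following structure. The plain-Java sketch below is only an illustration; the real types are EMF-generated from Hypergraph.ecore, so all names and fields here are simplified assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical view of the Hypergraph.ecore concepts.
class Node { String name; List<Edge> edges = new ArrayList<>(); } // zero or more edges
class Edge { String name; }                                       // referenced by the nodes it connects
class Module { String name; List<Node> nodes = new ArrayList<>(); }

class Hypergraph {
    List<Node> nodes = new ArrayList<>();
    List<Edge> edges = new ArrayList<>();
}

class ModularHypergraph extends Hypergraph {
    List<Module> modules = new ArrayList<>(); // grouping nodes makes the hypergraph modular
}
```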

To connect the hypergraph metrics to any kind of software artifact, a mapping transformation is required. Such a transformation queries the source model representing a software artifact and constructs a hypergraph from the query results. The hypergraph is the target model of the transformation. Therefore, before creating an extension, you need a way to query your source model and an idea of how to map the structure of your software artifact to a hypergraph or modular hypergraph.

Creating an Extension

As a reference, you may use the Ecore extension for the analysis tool, which can be found here. An extension comprises at least four files: the transformation; the analysis job, which executes the transformation and the metrics; an Activator (which I am not totally sure is necessary anymore); and an analysis job provider.

Analysis Job Provider

The analysis job provider must implement the IAnalysisJobProvider interface, which declares two methods.
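
A minimal sketch of the interface could look as follows; the method names and parameter types are assumptions derived from the description below, and imports of the tool's own classes are omitted.

```java
import org.eclipse.core.resources.IFile;
import org.eclipse.core.resources.IProject;

// Hypothetical shape of the provider interface; only the existence of the
// two methods is given in the text, their exact signatures are assumptions.
public interface IAnalysisJobProvider {

    /** File extension defining the content type this extension supports, e.g., "ecore". */
    String getFileExtension();

    /** Instantiate the Eclipse job executing the source model transformation and the analysis. */
    AbstractHypergraphAnalysisJob createAnalysisJob(IProject project, IFile file);
}
```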

The file extension is used to define the content type this extension supports. The second method is used to instantiate the Eclipse job which executes the source model transformation and the analysis.

Analysis Job

The analysis job class must inherit the basic metric calculations from the AbstractHypergraphAnalysisJob class, which provides four metrics for hypergraphs and modular hypergraphs:

  • calculateSize to calculate the size of a hypergraph
  • calculateComplexity to calculate the complexity of a hypergraph
  • calculateCoupling to calculate the inter-module complexity of a modular hypergraph
  • calculateCohesion to calculate the cohesion of a modular hypergraph. As cohesion is a ratio metric which requires a maximally interconnected graph as its basis, it first transforms the hypergraph into a graph representation and then applies the graph cohesion metric.

These methods take three arguments: a hypergraph, an Eclipse progress monitor, and a reference to the result model. The result model is provided by AnalysisResultModelProvider.INSTANCE.

The run method of an analysis job always follows the same general structure, as depicted in the following listing.
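
The sketch below is a minimal approximation: the four metric calls and AnalysisResultModelProvider.INSTANCE are taken from this post, while the job class name, the transformation, and the way the input model reaches the job are assumptions (tool-internal imports omitted).

```java
import org.eclipse.core.runtime.IProgressMonitor;
import org.eclipse.core.runtime.IStatus;
import org.eclipse.core.runtime.Status;

// Hypothetical analysis job for some source model type.
public class MyAnalysisJob extends AbstractHypergraphAnalysisJob {

    @Override
    protected IStatus run(final IProgressMonitor monitor) {
        final AnalysisResultModelProvider result = AnalysisResultModelProvider.INSTANCE;

        // Transform the source model into a (modular) hypergraph; the
        // transformation class and the inputModel property are assumptions.
        final MyModelToHypergraphTransformation transformation =
                new MyModelToHypergraphTransformation(monitor);
        final ModularHypergraph hypergraph = transformation.generate(this.inputModel);

        // Execute the four metrics on the resulting hypergraph.
        calculateSize(hypergraph, monitor, result);
        calculateComplexity(hypergraph, monitor, result);
        calculateCoupling(hypergraph, monitor, result);
        calculateCohesion(hypergraph, monitor, result);

        monitor.done();
        return Status.OK_STATUS;
    }
}
```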

It is of course possible to execute the different metrics multiple times, using different hypergraphs. For example, an Ecore metamodel could be analyzed completely, mapping all classes to nodes and all references to edges, or the analysis could be limited to the containment hierarchy. In such cases, you need multiple transformations providing the different hypergraphs for analysis, and you must execute the metrics for each hypergraph.

Model to Hypergraph Transformation

The core of an extension is the model-to-hypergraph mapping transformation. It must extend the AbstractTransformation class, an extension of the IGenerator interface of GECO. The abstract class defines a general result property, a general constructor, and the abstract method workEstimate. The latter is used to estimate the execution effort of the transformation before execution, which is used to set up the progress bar.

The core method of the transformation is generate, which implements the actual mapping, as sketched in the following listing.
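
The following sketch assumes an Ecore metamodel as source; AbstractTransformation, workEstimate, and the result property are described above, but the generic type parameters, the EMF factory call, and the constructor signature are assumptions (tool-internal imports omitted).

```java
import org.eclipse.core.runtime.IProgressMonitor;
import org.eclipse.emf.ecore.EPackage;

// Hypothetical transformation from an Ecore metamodel to a modular hypergraph.
public class TransformationEcoreToModularHypergraph
        extends AbstractTransformation<EPackage, ModularHypergraph> {

    public TransformationEcoreToModularHypergraph(final IProgressMonitor monitor) {
        super(monitor);
    }

    @Override
    public ModularHypergraph generate(final EPackage input) {
        // First line: create the result model (assumed EMF-generated factory);
        // a transformation producing a plain hypergraph would create a
        // Hypergraph here instead.
        this.result = HypergraphFactory.eINSTANCE.createModularHypergraph();

        // ... query the source model and build modules, nodes, and edges,
        //     e.g., via the HypergraphCreationFactory described below ...

        return this.result;
    }

    @Override
    public int workEstimate(final EPackage input) {
        // Rough effort estimate used to set up the progress bar.
        return input.getEClassifiers().size();
    }
}
```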

In case of a transformation which produces only a plain hypergraph and not a modular hypergraph, the first line must be adjusted accordingly.

To construct a hypergraph, the analysis tooling provides a set of construction methods in a factory called HypergraphCreationFactory. The factory defines eight operations:

  • createNode(Hypergraph hypergraph, String name, EObject element)
  • createNode(ModularHypergraph hypergraph, Module module, String name, EObject element)
  • createEdge(Hypergraph hypergraph, Node source, Node target, String name, EObject element)
  • createModule(ModularHypergraph hypergraph, String name, EObject element)
  • deriveNode(Node node)
  • deriveEdge(Edge edge)
  • deriveModule(Module module)
  • createUniqueEdge(ModularHypergraph hypergraph, Node source, Node target)

The operations always return the created object, but they also have side effects. The create operations add the created object to the specified hypergraph, and the derive operations are used when a hypergraph is derived from another one: each derive operation creates a new node, edge, or module and lets it refer to its predecessor element.

The parameters have the following implications (see the usage sketch after this list):

  • hypergraph always refers to the hypergraph to which the node, edge, or module belongs.
  • name refers to the name of the node, module, or edge.
  • element refers to the object the node, module, or edge represents. For example, if a node represents an Ecore metamodel class, then this class is the element to be passed to the method.
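
To make the parameter semantics concrete, a hypothetical snippet for mapping an Ecore class and its references might look like this; it assumes the factory operations are static, and nodeFor is a made-up lookup from a class to its already created node.

```java
// eClass is an org.eclipse.emf.ecore.EClass from the source metamodel.
Module module = HypergraphCreationFactory.createModule(hypergraph, eClass.getName(), eClass);
Node source = HypergraphCreationFactory.createNode(hypergraph, module, eClass.getName(), eClass);

for (EReference reference : eClass.getEReferences()) {
    Node target = nodeFor(reference.getEReferenceType()); // hypothetical helper
    // createUniqueEdge presumably avoids duplicate edges between the same nodes.
    HypergraphCreationFactory.createUniqueEdge(hypergraph, source, target);
}
```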


We illustrated above how a new kind of artifact can be added to the analysis tool via an Eclipse plug-in. We suggest using one of the existing implementations as a reference alongside this post. As we kept the extension mechanism as simple as possible, it is easy to add your own transformation. To get started, the following post on analyzing Java applications may serve as a reference:

Analyzing Structural Properties of Java Applications

Edward B. Allen [Allen 2002, Allen et al. 2007] created a set of graph- and hypergraph-based metrics to evaluate the complexity, coupling, and size of programming artifacts. The approach is based on information entropy (see information theory for details). We developed Eclipse tooling to compute these metrics for various Ecore metamodels and Java programs. In this post, we describe the application of this tooling to Java programs in Eclipse.


The tooling is available as source code on GitHub, and we also provide an Eclipse update site: https://build.se.informatik.uni-kiel.de/eus/se/snapshot/

For the installation in Eclipse:

  • Select the menu item Help
  • Choose Install New Software
  • Click the Add… button
  • Add the update site URL
  • Choose the Software Artifact Evaluation and all additional functions you want to use. You need at least the Core Framework and Metrics and the Java Project Support
  • Press Next
  • Press Finish

Using the Evaluation Tool

We assume you already have a Java project open in your Eclipse IDE. It makes no difference whether this is a J2EE project, a servlet project, a plain Java project, a Maven project, etc. In this tutorial, we use the org.iobserve.analysis project.

Figure 1: The iObserve analysis projects

Before we can run the analysis, we must define which part of the code should be taken into account for the analysis. In Java, everything is a class (except for enumerations and primitive data types). However, Java classes can represent behavior or just data types. Also, not all of the Java code might belong to the part of the software which should be analyzed. Therefore, we have to define which classes are used as data types and which represent the observed system.

The two sets of classes are configured via two configuration files. Each file contains a set of Java regular expressions which represent fully qualified class name patterns. Based on these patterns, classes are categorized as observed system classes or data type classes. Other classes which are referred to by method calls from the observed system are considered framework classes. They are included, but only with the used interface. This implies that parts of the framework API which are not used are not included in the complexity analysis.

Therefore, we define two files in the root directory of the project, called observed-system.cfg and data-type-pattern.cfg (see Figure 2).

Figure 2: Two simple definitions for observed system and data type patterns.
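
As a hypothetical example (the concrete patterns depend on your project, compare Figure 2), observed-system.cfg could match all classes of the analysis with a single pattern:

```
org\.iobserve\.analysis\..*
```

while data-type-pattern.cfg could list the packages containing pure data classes, for example:

```
org\.iobserve\.analysis\.data\..*
```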

With both files specified the analysis can be started with:

  • Right click on the org.iobserve.analysis project
  • Choose Java Analysis

During the analysis, two views open, called Analysis Result View and Analysis Log View. The result view contains all calculated measurements of the analysis, including lines of code and complexity. In detail, the result view shows (see also Figure 3):

  • The size of the observed system, which is the number of classes belonging to the system
  • The lines of code of the observed system
  • N lines representing an aggregation of the cyclomatic complexity of the methods of the system. The aggregation is based on buckets, where each bucket represents a certain cyclomatic complexity and the value indicates how many methods have this complexity.
  • The number of modules, which are the classes, inner classes, and anonymous classes comprising the system, plus the used framework interfaces
  • The number of nodes, which is the number of methods used in these modules
  • The number of edges, which are call or data edges between these nodes. Call edges represent method invocations, while data edges refer to variable accesses from a method.
  • The hypergraph size, which represents the information contained in the hypergraph
  • The hypergraph complexity, which represents the complexity of this hypergraph
  • The inter-module coupling, which represents the complexity of the hypergraph containing only the inter-module edges
  • The graph cohesion, which describes the complexity inside the modules
Figure 3: Analysis Result View

The result view provides three additional functions, which can be activated by clicking the little icons in the upper right corner of the view.

  • The left button exports the results to a CSV file
  • The center button exports the hypergraph
  • The right button clears the result view (without warning)

The log view (see Figure 4) shows all created modules and nodes, and contains warnings and errors regarding the construction of the Java hypergraph model. Such errors lower the overall complexity and size, as they indicate missing edges or nodes.

Figure 4: Analysis log view

Visualization of Graphs

The tool comes with two visualizations for saved hypergraphs and modular hypergraphs. The views require Kieler and Klighd to work properly. Unfortunately, with the migration of parts of Kieler to the Eclipse project, some functionality is no longer available and activating the view requires complicated voodoo. However, in case you have an entry labeled Modular Hypergraph Diagram in the context menu of the project explorer, you can select a saved hypergraph file and choose this option from the menu.


When using the analysis tool, always specify the version and revision of the tool you are using. While we try to keep changes compatible with older versions, there might be additions which can result in different measurements.

Java 8 is partially supported. However, as it is relatively new, there might be constructs which are not covered by the tool and will cause a runtime error.

The analysis requires that the Java project compiles in Eclipse, as it uses the Eclipse JDT resolving mechanisms.

In Eclipse Mars, in some cases the analysis did not work properly when the library and plug-in dependencies were listed before the actual source folders.

The last measurement, calculating the cohesion, may require a lot of time (hours). Most annoying is a long wait when the progress bar is already at 100%. This is a known issue; in case you want the last measurement, you have to be patient.