perfino Help

Basic Concepts

perfino collects data of two fundamentally different types: transactions and telemetries. Policies and thresholds are used to detect anomalous conditions while triggers take action if something is out of order.

Transactions

In perfino, you analyze your business processes with transactions. At a technical level, a transaction is simply a method invocation. To measure a transaction, perfino records its timing and constructs a transaction name that describes the business process.

The transaction name has a significant impact on what you see in the perfino UI.

It enables you to understand what triggered the transaction.
It groups all business processes with the same transaction name and so determines the granularity that is used to measure business processes.
It can serve as a basis to filter out unwanted operations.

perfino cannot know what your business processes are, so configuring transactions is an important part of setting up an application for monitoring. Some frameworks are high-level by nature, so perfino can offer them as transaction types that can be configured with a minimal amount of work.

The most common example of a transaction is the invocation of a URL that is handled by your application server. In the default configuration, perfino intercepts the method that handles HTTP calls and constructs a transaction name from the first three segments of the URL. This is an arbitrary naming strategy that is only intended to get you started. In your application, only the first segment of the URL may be relevant for the business process, or you may need a particular query parameter in the name.

Also, you will probably not want all URL invocations to become transactions. Many HTTP requests are for static resources, and those are not interesting in terms of business processes. In perfino, you can discard transactions based on the name that would be associated with a transaction. If you generate too many different transaction names, perfino's overload protection is activated.
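
As a rough illustration of the two ideas above, the following sketch builds a name from the first three URL segments and discards requests whose name matches a rule. It is not perfino's API: the class DefaultUrlNaming, its methods and the discarded prefixes "/static" and "/images" are all assumptions made for this example. In perfino itself, naming and discarding are set up in the transaction configuration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; this is not perfino's API.
final class DefaultUrlNaming {

    // Name prefixes to discard; these values are assumptions for the sketch.
    private static final List<String> DISCARDED_PREFIXES = List.of("/static", "/images");

    // Builds a name from the first three non-empty path segments.
    static String nameOf(String path) {
        return "/" + Arrays.stream(path.split("/"))
                .filter(segment -> !segment.isEmpty())
                .limit(3)
                .collect(Collectors.joining("/"));
    }

    // Returns the transaction name, or empty if the name matches a discard rule.
    static Optional<String> transactionFor(String path) {
        String name = nameOf(path);
        for (String prefix : DISCARDED_PREFIXES) {
            if (name.equals(prefix) || name.startsWith(prefix + "/")) {
                return Optional.empty();
            }
        }
        return Optional.of(name);
    }
}
```

Note that the discard decision is made on the generated name, not on the raw URL, so a request for /static/css/themes/main.css is dropped because its name starts with "/static".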

The following figure shows how different URLs end up as the same transaction based on a transaction naming strategy that

adds the value of the query parameter "action"
adds the fixed text "in shop"
adds the second segment of the URL
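
The three naming steps listed above can be sketched in plain Java. This is only an illustration of the grouping effect: the class ShopTransactionNaming, the space separator between the name parts and the example URLs are assumptions, and the code does not reflect how perfino builds names internally.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; this is not perfino's API.
final class ShopTransactionNaming {

    // Concatenates the "action" query parameter, the fixed text "in shop"
    // and the second URL segment. The separator is an assumption.
    static String nameFor(String url) {
        URI uri = URI.create(url);
        String action = queryParameter(uri.getQuery(), "action");
        String secondSegment = segment(uri.getPath(), 1);
        return action + " in shop " + secondSegment;
    }

    // Returns the path segment at the given zero-based index, or "" if absent.
    static String segment(String path, int index) {
        List<String> segments = new ArrayList<>();
        if (path != null) {
            for (String part : path.split("/")) {
                if (!part.isEmpty()) {
                    segments.add(part);
                }
            }
        }
        return index < segments.size() ? segments.get(index) : "";
    }

    // Returns the value of the named query parameter, or "" if absent.
    static String queryParameter(String query, String name) {
        if (query == null) {
            return "";
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            if (key.equals(name)) {
                return eq < 0 ? "" : pair.substring(eq + 1);
            }
        }
        return "";
    }
}
```

With this sketch, http://example.com/shop/books/item/17?action=buy and http://example.com/shop/books/item/99?id=99&action=buy both map to the same name, because the item number and the id parameter are not part of the naming.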

Policies

Transactions have associated policies. The policies determine

the acceptable timing for a transaction
the way errors are detected and handled
when to perform method-level sampling

For each violated policy condition, you can view the transaction details separately in the perfino UI. For example, you can inspect slow transactions or transactions that resulted in an error on their own, rather than cumulated with regular transactions of the same name.

perfino gets information from the monitored application by instrumenting methods. To keep the overhead low, very few methods are instrumented. To get more detailed information in the case of a very slow transaction, the policy can start method-level sampling for a transaction once it is clear that it is taking too long.

With sampling, you get a cumulated call tree and hot spots on the method level that show you where the time is actually spent.
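
The decisions a policy makes can be pictured with a small sketch. Everything here is an assumption made for illustration: the class TransactionPolicy, the two millisecond limits and the outcome names are hypothetical and are not perfino's configuration model.

```java
// Hypothetical model of a transaction policy; not perfino's API.
final class TransactionPolicy {

    enum Outcome { NORMAL, SLOW, ERROR }

    private final long slowMillis;      // assumed limit for acceptable timing
    private final long samplingMillis;  // assumed limit after which sampling starts

    TransactionPolicy(long slowMillis, long samplingMillis) {
        this.slowMillis = slowMillis;
        this.samplingMillis = samplingMillis;
    }

    // Classifies a finished transaction so that it can be shown separately.
    Outcome classify(long durationMillis, boolean failed) {
        if (failed) {
            return Outcome.ERROR;
        }
        return durationMillis > slowMillis ? Outcome.SLOW : Outcome.NORMAL;
    }

    // Checked while the transaction is still running: sampling only starts
    // once the elapsed time shows that the transaction is taking too long.
    boolean shouldStartSampling(long elapsedMillis) {
        return elapsedMillis >= samplingMillis;
    }
}
```

The point of the second method is that sampling is decided during the transaction, so fast transactions never pay the sampling overhead.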

Telemetries

The other fundamental type of data source in perfino is the periodic sampling of scalar values, like heap size or thread count. Each telemetry can be plotted as a time-resolved graph. In perfino, telemetries are often shown as sparklines, without defined axes and with a trailing current value.

There are many standard telemetries in perfino that collect their data from well-known subsystems of the JVM or popular databases and frameworks. In addition, integer values that are exposed by an MBean can be monitored by perfino. On a programmatic level, you can use the @Telemetry annotation to define custom telemetries on static methods with a numeric return value.
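
For the MBean route, the following sketch shows how an application could expose an integer value through the standard JMX API of the JDK. Only the JMX calls are real. The class names, the attribute PendingJobs and the object name com.example:type=QueueStats are assumptions for this example, and how the attribute is then selected in perfino is not shown.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.JMException;
import javax.management.ObjectName;

// Hypothetical example classes; only the JMX calls are standard JDK API.
final class QueueStatsExample {

    // Standard MBean convention: the management interface is public and is
    // named after the implementing class with the suffix "MBean".
    public interface QueueStatsMBean {
        int getPendingJobs(); // exposed as the integer attribute "PendingJobs"
    }

    public static final class QueueStats implements QueueStatsMBean {
        private final AtomicInteger pendingJobs = new AtomicInteger();

        void jobAdded() { pendingJobs.incrementAndGet(); }

        void jobFinished() { pendingJobs.decrementAndGet(); }

        @Override
        public int getPendingJobs() { return pendingJobs.get(); }
    }

    // Registers the MBean with the platform MBean server.
    static ObjectName register(QueueStats stats, String objectName) {
        try {
            ObjectName name = new ObjectName(objectName);
            ManagementFactory.getPlatformMBeanServer().registerMBean(stats, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException("Could not register MBean", e);
        }
    }

    // Reads the attribute back the way a JMX client would.
    static int readPendingJobs(ObjectName name) {
        try {
            return (Integer) ManagementFactory.getPlatformMBeanServer()
                    .getAttribute(name, "PendingJobs");
        } catch (JMException e) {
            throw new IllegalStateException("Could not read attribute", e);
        }
    }
}
```

Once registered, the value is visible to any JMX client under the chosen object name, which is what makes it available as a data source for monitoring.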

Thresholds

You will have different expectations with respect to different telemetries. For example, heap usage often oscillates around a baseline, and a steady increase is a sign of a bug in the application.

Similarly, the average duration of JDBC statements usually varies with server load and is an indicator of the health of the application.

To detect anomalous conditions, you define thresholds with an optional lower and an optional upper bound. Threshold violations are counted on a per-VM basis or for each VM group. They do not have actions associated with them. Often, you will not want to take any action for single threshold violations, but only for a cascade of such conditions.
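
A threshold with two optional bounds that only counts violations can be modeled as follows. This is a minimal sketch, not perfino's implementation: the class Threshold and its methods are hypothetical, and the per-VM or per-group bookkeeping is reduced to a single counter.

```java
import java.util.OptionalDouble;

// Hypothetical model of a telemetry threshold; not perfino's API.
final class Threshold {

    private final OptionalDouble lower;
    private final OptionalDouble upper;
    private int violations;

    Threshold(OptionalDouble lower, OptionalDouble upper) {
        this.lower = lower;
        this.upper = upper;
    }

    // A value violates the threshold if it is below the lower bound or
    // above the upper bound; an absent bound is never violated.
    boolean isViolatedBy(double value) {
        return (lower.isPresent() && value < lower.getAsDouble())
                || (upper.isPresent() && value > upper.getAsDouble());
    }

    // Records one telemetry sample. Violations are only counted here;
    // no action is taken, mirroring the description above.
    void record(double value) {
        if (isViolatedBy(value)) {
            violations++;
        }
    }

    int violationCount() {
        return violations;
    }
}
```

For example, a threshold created with an empty lower bound and an upper bound of 500 counts two violations for the samples 120, 650 and 700. Reacting to that count is left to a separate component.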

Triggers and alerts

Both transactions and telemetries can lead to anomalous conditions: A transaction policy can identify a slow transaction, and a telemetry threshold can be violated.

To take action on these conditions, you use triggers. Triggers do not operate on a per-VM level; they process all recursively contained VMs in a VM group. Each VM group in a hierarchy of groups has its own separate triggers.

For example, you could define a trigger for all VMs that fires when the number of connected VMs falls below 20. In the same VM hierarchy, you might have a group that only contains database VMs. In that group, you might want a separate trigger that fires when the number of connected database VMs falls below 3.
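
The example above can be sketched with a small group hierarchy. The classes VmGroup and MinConnectedVmsTrigger are hypothetical and only illustrate the recursive evaluation; the VM counts in the usage note are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of VM groups and triggers; not perfino's API.
final class VmGroupTriggers {

    static final class VmGroup {
        private final List<VmGroup> subGroups = new ArrayList<>();
        private final int directlyConnectedVms;

        VmGroup(int directlyConnectedVms) {
            this.directlyConnectedVms = directlyConnectedVms;
        }

        VmGroup add(VmGroup subGroup) {
            subGroups.add(subGroup);
            return this;
        }

        // Counts the VMs in this group and in all recursively contained groups.
        int connectedVmCount() {
            int total = directlyConnectedVms;
            for (VmGroup subGroup : subGroups) {
                total += subGroup.connectedVmCount();
            }
            return total;
        }
    }

    // Fires when the number of connected VMs in its group falls below a minimum.
    static final class MinConnectedVmsTrigger {
        private final VmGroup group;
        private final int minimum;

        MinConnectedVmsTrigger(VmGroup group, int minimum) {
            this.group = group;
            this.minimum = minimum;
        }

        boolean fires() {
            return group.connectedVmCount() < minimum;
        }
    }
}
```

With 19 VMs directly in the top-level group and 2 VMs in a nested database group, a top-level trigger with a minimum of 20 does not fire because it sees 21 VMs, while a trigger on the database group with a minimum of 3 does fire.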

When a trigger fires, it executes its actions. Actions can start data collection, such as full VM sampling, send emails or create alerts.

Alerts are shown in the dashboard and are the highest level in perfino's pyramid of concepts: