Concepts

Horreum concepts

1 - Core Concepts

Core Horreum concepts

Teams

A Team is a required top-level organizational and authorization construct.

Folders

A Folder is an optional organizational structure to hold Tests.

Schema

A Schema is required by Horreum to define the metadata associated with a Run.

It allows Horreum to process the JSON content to provide validation, charting, and change detection.

A Schema defines the following:

  1. An optional expected structure of a Dataset via a JSON Validation Schema
  2. Required Labels that define how to use the data in the JSON document
  3. Optional Transformers, to transform uploaded JSON documents into one or more datasets.

A Schema can apply to an entire Run JSON document, or to parts of a Run JSON document.

Label

A Label is required to define how metrics are extracted from the JSON document and processed by Horreum.

Labels are defined by one or more required Extractors and an optional Combination Function.

There are 2 types of Labels:

  • Metrics Label: Describes a metric to be used for analysis, e.g. “Max Throughput”, “process RSS”, “startup time” etc
  • Filtering Label: Describes a value that can be used for filtering Datasets and ensuring that datasets are comparable, e.g. “Cluster Node Count”, “Product version” etc

A Label can be defined as either a Metrics label, a Filtering label, or both.

Filtering Labels are combined into Fingerprints that uniquely identify comparable Datasets within uploaded Runs.
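
For illustration only, a Fingerprint built from the two Filtering Label examples above might look like the following (the label names are taken from the examples; the values are hypothetical):

{
  "Cluster Node Count": 3,
  "Product version": "1.2.3"
}

Only Datasets that share the same Fingerprint would be treated as comparable for analysis.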

Extractor

An Extractor is a required JSONPath expression that refers to a section of an uploaded JSON document. An Extractor can return one of:

  • A scalar value
  • An array of values
  • A subsection of the uploaded JSON document
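
For example, using the PostgreSQL jsonpath syntax that Horreum evaluates in the database, Extractors over a document like the ones shown in the Terminology section below might look like this (the paths are illustrative only):

$.throughput            -- a scalar value
$.metrics               -- a subsection of the uploaded JSON document
$.metrics.requests      -- a scalar value inside a nested object
$.results[*].duration   -- an array of values, assuming a hypothetical results array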

Combination Function

A Combination Function is an optional JavaScript function that takes all Extractor values as input and produces a Label value. See Function
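
As a minimal sketch, assuming a single Extractor that returns an array of per-iteration throughput samples (a hypothetical shape), a Combination Function could reduce that array to one Label value:

// value: the array returned by the single Extractor (assumed shape)
value => Math.max(...value)

With several Extractors, the function instead combines their values into one result, as in the throughput calculation described in the Terminology section below.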

Test

A Test is a required repository for similar Runs and Datasets.

You can think of a Test as a repository for the results of a particular benchmark, i.e. a benchmark that performs a certain set of actions against a system under test.

Test Runs can have different configurations, making them not always directly comparable, but the Run results stored under one Test can be filtered by their Fingerprint to ensure that all Datasets used for analysis are comparable.

Run

A Run is a single upload instance of results for a Test.

A Run is associated with one or more Schemas that define what data to expect and how to process the JSON document.

Transformers

A Transformer is optionally defined on a Schema; it applies required Extractors and a required Combination Function to transform a Run into one or more Datasets.

Transformers are typically used to:

  • Restructure the JSON document. This is useful when users process JSON documents whose structure they do not control and that is not well defined
  • Split a Run JSON output into multiple, non-comparable Datasets. This is useful when a benchmark iterates over a configuration and produces a JSON output that contains multiple results for different configurations, as sketched below

A Schema can have 0, 1 or multiple Transformers defined.
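
As a rough sketch, assuming a hypothetical Extractor named results that returns a per-configuration array, and assuming a Transformer may return an array whose elements each become a separate Dataset, the splitting described above could look like:

// results: assumed Extractor value, e.g.
// [{ "config": "small", "throughput": 123.4 }, { "config": "large", "throughput": 456.7 }]
results => results.map(r => ({
  configuration: r.config,   // keeps the configuration with its own Dataset
  throughput: r.throughput   // the metric measured for that configuration
}))

Each element of the returned array would then be stored as its own Dataset, so results for different configurations are no longer mixed together.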

Dataset

A Dataset is either the entire Run JSON document, or a subset that has been produced by an optional Transformer.

It is possible for a Run to include multiple Datasets, and the Transformer(s) defined on a Schema associated with the Run are responsible for parsing out the individual Datasets.

Fingerprint

A Fingerprint is a combination of Filtering Labels that uniquely identifies comparable Datasets within a Test.

Function

A Function is a JavaScript function that is executed on the server side. Functions can be used for validating expected data formats, substitution, and rendering. They are also used in Combination Functions to create derived metrics. See Define Functions for more detailed information.

Datasource

A Datasource is a required top-level organizational construct that defines the source of the data to be stored or retrieved by Horreum. Currently, Horreum supports 2 types of Datasource: Postgres and Elasticsearch.

Baseline

The initial sample for an Experiment comparison. Configured in an Experiment Profile.

Change Detection Variable

Change detection tracks Schema Labels that have been configured as a Change Detection Variable.

Experiment

This enables running a comparison between Runs for a particular Test. There can be multiple Profiles configured for an Experiment to check for a variety of desired conditions. The outcome status for each Profile condition will be one of the following:

  • SAME
  • WORSE
  • BETTER

Experiment Profile

A Profile defines the Baseline for an Experiment comparison and the conditions to check.

JSON validation schema

An optional schema added to a Test to validate uploaded Run JSON data.

Report Configuration

In Horreum a Report Configuration is used for compiling a summary of information to be displayed using tables.

Report

A Report is an instance of a Report Configuration. The creation date and time is used to differentiate individual Reports.

Actions

An Action is the ability of Horreum to send an event notification to an external system. Actions are configured either globally or for a particular Test.

Global Action

Actions that occur globally in Horreum.

Test Action

Actions only for a particular Test.

Action allow list

Horreum will only allow generic HTTP requests to domains that have been pre-configured in Horreum by an Administrator.

API Filter Query

The Horreum API provides query parameters that are JSONPath expressions used to filter operation results.

2 - Horreum Terminology

Horreum Terminology

Any document stored in Horreum is called a Run - usually this is the output of a single CI build. Horreum does not need to know much about it: it accepts the document as plain JSON and stores it in the database with very little metadata.

Each Run belongs to a series called a Test. This is where you can define metrics and set up regression testing.

Sometimes it’s practical to execute a job in CI that produces a single JSON document and upload it only once, even though it contains results for different configurations or several test cases. It might be more convenient to remix the Run into several documents, and this is where Datasets come in. Datasets hold another JSON document that is created from the original Run JSON using Transformers, and share some metadata from the original Run. If you don’t need to do anything fancy, by default the Run is mapped 1:1 into a Dataset. Horreum builds on the concept of “store data now, analyze later” - therefore, rather than forcing users to pre-process the Run before upload, you can change the transformation and the Datasets are re-created.

If you want to do anything but upload and retrieve data from Horreum, such as customize the UI or run regression testing, you need to tell Horreum how to extract data from the JSON document: in the case of Horreum this is a combination of jsonpaths1 and Javascript/Typescript code. However, it’s impractical to define the JSONPaths directly in the test: when you’re running the test for several months or years it is quite likely that the format of your results will evolve, although the information inside stays consistent. That’s why the data in both Run and Dataset should contain the $schema key:

{
  "$schema": "urn:load-driver:1.0",
  "ci-url": "https://my-ci-instance.example.com/build/123",
  "throughput": 4567.8
}

For each format of your results (in other words, for each URI used in $schema) you can define a Schema in Horreum. This has several purposes:

  • Validation using JSON schema
  • Defines Transformers: again, a set of JSONPaths and a Javascript function that remix a Run into one or more Datasets.
  • Defines a set of Labels: a combination of one or more JSONPaths and a Javascript function that extract certain data from the document. The labels let you reuse the extraction code and process data from datasets using different schemas in a unified way.

You don’t need to use all of these - e.g. it’s perfectly fine to keep the JSON schema empty or use an extremely permissive one.

In our case you could create a schema ‘Load Driver 1.0’ using the URI urn:load-driver:1.0, and a Label throughput that would fetch jsonpath $.throughput. Some time later the format of your JSON changes:

{
  "$schema": "urn:load-driver:2.0",
  "ci-url": "https://my-ci-instance.example.com/build/456",
  "metrics": {
    "requests": 1234,
    "duration": 10,
    "mean-latency": 12.3
  }
}

As the format changed, you create a schema ‘Load Driver 2.0’ with URI urn:load-driver:2.0 and define another label in that schema, naming it again throughput. This time you need to extract the data using two jsonpaths, $.metrics.requests and $.metrics.duration, and a function that calculates the throughput value. In all places through Horreum you will use only the label name throughput to refer to the jsonpath, and you get a continuous series of results.
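
A minimal sketch of such a function, assuming the two Extractors are exposed to it under the names requests and duration (the exact invocation convention may differ in your Horreum version):

// requests and duration come from $.metrics.requests and $.metrics.duration
({requests, duration}) => requests / duration

The derived value is published under the throughput label, so results uploaded with either schema line up in the same series.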

You can define a label mean-latency in Load Driver 2.0 that would not have a matching one in Load Driver 1.0. You can use that label without error even for runs that use the older schema, but naturally you won’t receive any data from those.

In other cases you may start aggregating data from multiple tools, each producing results in its own format. Each Run has only a single JSON document, but you can merge the results into a single object:

{
  "load-driver": {
    "$schema": "urn:load-driver:1.0",
    "throughput": 4567.8
  },
  "monitoring": {
    "$schema": "urn:monitoring:1.0",
    "cpu-consumption": "1234s",
    "max-memory": "567MB"
  },
  "ci-url": "https://my-ci-instance.example.com/build/123"
}

Horreum will transparently extract the throughput relative to the $schema key. Note, though, that this is not supported deeper than the second level shown above.


  1. Since the jsonpath is evaluated directly in the database, we use PostgreSQL jsonpath syntax.

3 - Users and security

User management and security

It is assumed that the repo will host data for multiple teams; each user is a member of one or more teams. Each run, test or schema is owned by one of the teams. The team corresponds to a Keycloak role (see below) with -team suffix, e.g. engineers-team. In the UI this will be displayed simply as Engineers, dropping the suffix and capitalizing the name.

Data access

We define 3 levels of access to each item (test, run, dataset or schema):

  • public: available even to non-authenticated users (for reading)
  • protected: available to all authenticated users that have the viewer role (see below)
  • private: available only to users who ‘own’ this data - those who have the team role.

In addition to these 3 levels, runs and schemas can have a ’token’ (randomly generated string): everyone who knows this token can read the record. This token is reset any time the restriction level changes.

Tests can have tokens, too: you can have an arbitrary number of tokens, each with a subset of read, modify and upload privileges.

Users and roles

Users and teams are managed in Keycloak. In a non-production environment you can reach it on localhost:8180 using credentials admin/secret.

There are a few generic roles automatically created during the initial realm import.

  • viewer: general permission to view non-public runs
  • uploader: permission to upload new runs, useful for bot accounts (CI)
  • tester: common user that can define tests, modify or delete data.
  • manager: set team members and their roles within the team
  • admin: permission to both see and change application-wide configuration such as global actions

Besides the team role itself (e.g. engineers-team) there must be composite roles for each team combining the team role and a permission role: a bot account that uploads the team’s data will have engineers-uploader, a composite role including engineers-team and uploader. This role cannot view the team’s private data; it has write-only access. Users who explore runs and create or modify tests should have the engineers-tester role, a composite role including engineers-team, tester and viewer. You can also create a role that allows read-only access to the team’s private runs, engineers-viewer, consisting of engineers-team and viewer.

The admin role is not tied to any of the teams.