Concepts

Horreum concepts

1 - Core Concepts

Core Horreum concepts

Teams

A Team is a required top-level organizational and authorization construct.

Folders

A Folder is an optional organizational structure to hold Tests.

Schema

A Schema is required by Horreum to define the metadata associated with a Run.

It allows Horreum to process the JSON content to provide validation, charting, and change detection.

A Schema defines the following:

  1. An optional expected structure of a Dataset via a JSON Validation Schema
  2. Required Labels that define how to use the data in the JSON document
  3. Optional Transformers, to transform uploaded JSON documents into one or more datasets.

A Schema can apply to an entire Run JSON document, or to parts of a Run JSON document.

Label

A Label is required to define how metrics are extracted from the JSON document and processed by Horreum.

Labels are defined by one or more required Extractors and an optional Combination Function.

There are 2 types of Labels:

  • Metrics Label: Describes a metric to be used for analysis, e.g. “Max Throughput”, “process RSS”, “startup time” etc
  • Filtering Label: Describes a value that can be used for filtering Datasets and ensuring that datasets are comparable, e.g. “Cluster Node Count”, “Product version” etc

A Label can be defined as either a Metrics label, a Filtering label, or both.

Filtering Labels are combined into Fingerprints that uniquely identify comparable Datasets within uploaded Runs.
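
For illustration only, a Fingerprint built from the two Filtering Label examples above might look like the following (the label names are taken from the examples; the values are hypothetical):

{
  "Cluster Node Count": 3,
  "Product version": "1.2.3"
}

Only Datasets that share the same Fingerprint would be treated as comparable for analysis.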

Extractor

An Extractor is a required JSONPath expression that refers to a section of an uploaded JSON document. An Extractor can return one of:

  • A scalar value
  • An array of values
  • A subsection of the uploaded JSON document
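
For example, using the PostgreSQL jsonpath syntax that Horreum evaluates in the database, Extractors over a document like the ones shown in the Terminology section below might look like this (the paths are illustrative only):

$.throughput            -- a scalar value
$.metrics               -- a subsection of the uploaded JSON document
$.metrics.requests      -- a scalar value inside a nested object
$.results[*].duration   -- an array of values, assuming a hypothetical results array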

Combination Function

A Combination Function is an optional JavaScript function that takes all Extractor values as input and produces a Label value. See Function
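
As a minimal sketch, assuming a single Extractor that returns an array of per-iteration throughput samples (a hypothetical shape), a Combination Function could reduce that array to one Label value:

// value: the array returned by the single Extractor (assumed shape)
value => Math.max(...value)

With several Extractors, the function instead combines their values into one result, as in the throughput calculation described in the Terminology section below.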

Test

A Test is a required repository for similar Runs and Datasets.

You can think of a Test as a repository for the results of a particular benchmark, i.e. a benchmark that performs a certain set of actions against a system under test.

Test Runs can have different configurations, making them not always directly comparable, but the Run results stored under one Test can be filtered by their Fingerprint to ensure that all Datasets used for analysis are comparable.

Run

A Run is a single upload instance of results for a Test.

A Run is associated with one or more Schemas that define what data to expect and how to process the JSON document.

Transformers

A Transformer is optionally defined on a Schema; it applies required Extractors and a required Combination Function to transform a Run into one or more Datasets.

Transformers are typically used to:

  • Restructure the JSON document. This is useful when users process JSON documents whose structure they do not control and that is not well defined
  • Split a Run JSON output into multiple, non-comparable Datasets. This is useful when a benchmark iterates over a configuration and produces a JSON output that contains multiple results for different configurations, as sketched below

A Schema can have 0, 1 or multiple Transformers defined.
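
As a rough sketch, assuming a hypothetical Extractor named results that returns a per-configuration array, and assuming a Transformer may return an array whose elements each become a separate Dataset, the splitting described above could look like:

// results: assumed Extractor value, e.g.
// [{ "config": "small", "throughput": 123.4 }, { "config": "large", "throughput": 456.7 }]
results => results.map(r => ({
  configuration: r.config,   // keeps the configuration with its own Dataset
  throughput: r.throughput   // the metric measured for that configuration
}))

Each element of the returned array would then be stored as its own Dataset, so results for different configurations are no longer mixed together.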

Dataset

A Dataset is either the entire Run JSON document, or a subset that has been produced by an optional Transformer.

It is possible for a Run to include multiple Datasets, and the Transformer(s) defined on a Schema associated with the Run are responsible for parsing out the individual Datasets.

Fingerprint

A Fingerprint is a combination of Filtering Labels that uniquely identifies comparable Datasets within a Test.

Function

A Function is a JavaScript function that is executed on the server side. Functions can be used for validating expected data formats, substitution, and rendering. They are also used in Combination Functions to create derived metrics. See Define Functions for more detailed information.

Datasource

A Datasource is a required top-level organizational construct that defines the source of the data to be stored or retrieved by Horreum. Currently, Horreum supports 2 types of Datasource: Postgres and Elasticsearch.

Baseline

The initial sample for an Experiment comparison. Configured in an Experiment Profile.

Change Detection Variable

Change detection tracks Schema Labels that have been configured as a Change Detection Variable.

Experiment

This enables running a comparison between Runs for a particular Test. There can be multiple Profiles configured for an Experiment to check for a variety of desired conditions. The outcome status for each Profile condition will be one of the following:

  • SAME
  • WORSE
  • BETTER

Experiment Profile

A Profile defines the Baseline for an Experiment comparison and the conditions to check.

JSON validation schema

An optional schema added to a Test to validate uploaded Run JSON data.

Report Configuration

In Horreum a Report Configuration is used for compiling a summary of information to be displayed using tables.

Report

A Report is an instance of a Report Configuration. The creation date and time is used to differentiate individual Reports.

Actions

An Action is the ability of Horreum to send an event notification to an external system. Actions are configured either globally or for a particular Test.

Global Action

Actions that occur globally in Horreum.

Test Action

Actions only for a particular Test.

Action allow list

Horreum will only allow generic HTTP requests to domains that have been pre-configured in Horreum by an Administrator.

API Filter Query

The Horreum API provides query parameters that are JSONPath expressions used to filter operation results.

2 - Horreum Terminology

Horreum Terminology

Any document stored in Horreum is called a Run - usually this is the output of a single CI build. Horreum does not need to know much about it: it accepts the document as plain JSON and stores it in the database with very little metadata.

Each Run belongs to a series called a Test. This is where you can define metrics and set up regression testing.

Sometimes it’s practical to execute a job in CI that produces a single JSON document and upload it only once, even though it contains results for different configurations or several test cases. It might be more convenient to remix the Run into several documents, and this is where Datasets come in. Datasets hold another JSON document that is created from the original Run JSON using Transformers, and share some metadata from the original Run. If you don’t need to do anything fancy, by default the Run is mapped 1:1 into a Dataset. Horreum builds on the concept of “store data now, analyze later” - therefore, rather than forcing users to pre-process the Run before upload, you can change the transformation and the Datasets are re-created.

If you want to do anything but upload and retrieve data from Horreum, such as customize the UI or run regression testing, you need to tell Horreum how to extract data from the JSON document: in the case of Horreum this is a combination of jsonpaths1 and Javascript/Typescript code. However, it’s impractical to define the JSONPaths directly in the test: when you’re running the test for several months or years it is quite likely that the format of your results will evolve, although the information inside stays consistent. That’s why the data in both Run and Dataset should contain the $schema key:

{
  "$schema": "urn:load-driver:1.0",
  "ci-url": "https://my-ci-instance.example.com/build/123",
  "throughput": 4567.8
}

For each format of your results (in other words, for each URI used in $schema) you can define a Schema in Horreum. This has several purposes:

  • Validation using JSON schema
  • Defines Transformers: again, a set of JSONPaths and a Javascript function that remix a Run into one or more Datasets.
  • Defines a set of Labels: a combination of one or more JSONPaths and a Javascript function that extract certain data from the document. The labels let you reuse the extraction code and process data from datasets using different schemas in a unified way.

You don’t need to use all of these - e.g. it’s perfectly fine to keep the JSON schema empty or use an extremely permissive one.

In our case you could create a schema ‘Load Driver 1.0’ using the URI urn:load-driver:1.0, and a Label throughput that would fetch jsonpath $.throughput. Some time later the format of your JSON changes:

{
  "$schema": "urn:load-driver:2.0",
  "ci-url": "https://my-ci-instance.example.com/build/456",
  "metrics": {
    "requests": 1234,
    "duration": 10,
    "mean-latency": 12.3
  }
}

As the format changed, you create a schema ‘Load Driver 2.0’ with URI urn:load-driver:2.0 and define another label in that schema, naming it again throughput. This time you need to extract the data using two jsonpaths, $.metrics.requests and $.metrics.duration, and a function that calculates the throughput value. In all places through Horreum you will use only the label name throughput to refer to the jsonpath, and you get a continuous series of results.
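
A minimal sketch of such a function, assuming the two Extractors are exposed to it under the names requests and duration (the exact invocation convention may differ in your Horreum version):

// requests and duration come from $.metrics.requests and $.metrics.duration
({requests, duration}) => requests / duration

The derived value is published under the throughput label, so results uploaded with either schema line up in the same series.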

You can define a label mean-latency in Load Driver 2.0 that would not have a matching one in Load Driver 1.0. You can use that label without error even for runs that use the older schema, but naturally you won’t receive any data from those.

In other cases you may start aggregating data from multiple tools, each producing results in its own format. Each Run has only a single JSON document, but you can merge the results into a single object:

{
  "load-driver": {
    "$schema": "urn:load-driver:1.0",
    "throughput": 4567.8
  },
  "monitoring": {
    "$schema": "urn:monitoring:1.0",
    "cpu-consumption": "1234s",
    "max-memory": "567MB"
  },
  "ci-url": "https://my-ci-instance.example.com/build/123"
}

Horreum will transparently extract the throughput relative to the $schema key. Note, though, that this is not supported deeper than the second level shown above.


  1. Since the jsonpath is evaluated directly in the database, we use PostgreSQL jsonpath syntax.

3 - Users and security

User management and security

It is assumed that the repo will host data for multiple teams; each user is a member of one or more teams. Each run, test or schema is owned by one of the teams. The team corresponds to a Keycloak role (see below) with -team suffix, e.g. engineers-team. In the UI this will be displayed simply as Engineers, dropping the suffix and capitalizing the name.

Data access

We define 3 levels of access to each item (test, run, dataset or schema):

  • public: available even to non-authenticated users (for reading)
  • protected: available to all authenticated users that have the viewer role (see below)
  • private: available only to users who ‘own’ this data - those who have the team role.

In addition to these 3 levels, runs and schemas can have a ’token’ (randomly generated string): everyone who knows this token can read the record. This token is reset any time the restriction level changes.

Tests can have tokens, too: you can have an arbitrary number of tokens, each with a subset of read, modify and upload privileges.

Users and roles

Users and teams are managed in Keycloak. In a non-production environment you can reach it on localhost:8180 using credentials admin/secret.

There are a few generic roles automatically created during the initial realm import.

  • viewer: general permission to view non-public runs
  • uploader: permission to upload new runs, useful for bot accounts (CI)
  • tester: common user that can define tests, modify or delete data.
  • manager: set team members and their roles within the team
  • admin: permission to both see and change application-wide configuration such as global actions

Besides the team role itself (e.g. engineers-team) there must be composite roles for each team combining the team role and a permission role: a bot account that uploads the team’s data will have engineers-uploader, a composite role including engineers-team and uploader. This role cannot view the team’s private data; it has write-only access. Users who explore runs and create or modify tests should have the engineers-tester role, a composite role including engineers-team, tester and viewer. You can also create a role that allows read-only access to the team’s private runs, engineers-viewer, consisting of engineers-team and viewer.

The admin role is not tied to any of the teams.