1 - Core Concepts
Teams
A Team is a required top-level organizational and authorization construct.
Folders
A Folder is an optional organizational structure to hold Tests.
Schema
A Schema is required by Horreum to define the metadata associated with a Run. It allows Horreum to process the JSON content to provide validation, charting, and change detection. A Schema defines the following:
- An optional expected structure of a Dataset via a JSON Validation Schema
- Required Labels that define how to use the data in the JSON document
- Optional Transformers, to transform uploaded JSON documents into one or more datasets.
A Schema can apply to an entire Run JSON document, or parts of a Run JSON document.
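For example, the optional JSON Validation Schema from the first bullet above could be as simple as the following sketch (the throughput field name is illustrative, not part of any real schema):

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["throughput"],
  "properties": {
    "throughput": { "type": "number" }
  }
}
```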
Label
A Label is required to define how metrics are extracted from the JSON document and processed by Horreum. Labels are defined by one or more required Extractors and an optional Combination Function.
There are 2 types of Labels:
- Metrics Label: Describes a metric to be used for analysis, e.g. “Max Throughput”, “process RSS”, “startup time”, etc.
- Filtering Label: Describes a value that can be used for filtering Datasets and ensuring that datasets are comparable, e.g. “Cluster Node Count”, “Product version”, etc.
A Label can be defined as either a Metrics label, a Filtering label, or both. Filtering Labels are combined into Fingerprints that uniquely identify comparable Datasets within uploaded Runs.
Extractor
An Extractor is a required JSONPath expression that refers to a section of an uploaded JSON document. An Extractor can return one of:
- A scalar value
- An array of values
- A subsection of the uploaded JSON document
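For example, given a hypothetical uploaded document such as the following (all names and values are illustrative):

```json
{
  "throughput": 4567.8,
  "latencies": [1.2, 3.4, 5.6],
  "env": { "nodes": 3, "os": "linux" }
}
```

the Extractor $.throughput returns a scalar value, $.latencies returns an array of values, and $.env returns a subsection of the document.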
Note
In the majority of cases, an Extractor will simply point to a single, scalar value.
Combination Function
A Combination Function is an optional JavaScript function that takes all Extractor values as input and produces a Label value. See Function.
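As an illustrative sketch (the Extractor name latencies is hypothetical), a Combination Function for a “Max Latency” style Label might look like:

```javascript
// Hedged sketch: assumes a single Extractor named "latencies" that returns an
// array of numbers; the function reduces it to a single Label value.
latencies => Math.max(...latencies)
```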
Note
In the majority of cases, the Combination Function is simply an Identity function with a single input and does not need to be defined.
Test
A Test is a required repository for particular, similar Runs and Datasets. You can think of a Test as a repository for the results of a particular benchmark, i.e. a benchmark performs a certain set of actions against a system under test.
Test Runs can have different configurations, making them not always directly comparable, but the Run results stored under one Test can be filtered by their Fingerprint to ensure that all Datasets used for analysis are comparable.
Run
A Run is a particular single upload instance of a Test. A Run is associated with one or more Schemas in order to define what data to expect, and how to process the JSON document.
Transformers
A Transformer is optionally defined on a Schema and applies required Extractors and a required Combination Function to transform a Run into one or more Datasets. Transformers are typically used to:
- Restructure the JSON document. This is useful where users are processing JSON documents whose structure they do not control and whose structure is not well defined.
- Split a Run JSON output into multiple, non-comparable Datasets. This is useful where a benchmark iterates over a configuration and produces a JSON output that contains multiple results for different configurations.
A Schema can have 0, 1 or multiple Transformers defined.
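For illustration, a Transformer function that splits a Run into multiple Datasets might look roughly like the following sketch (the Extractor name results, the field names, and the target schema URI are all hypothetical; it also assumes that extracted values are passed in keyed by Extractor name and that returning an array produces one Dataset per element):

```javascript
// Hedged sketch: assumes one Extractor named "results" that selects an array of
// per-configuration results from the Run (e.g. via $.results).
({ results }) => results.map(result => ({
  "$schema": "urn:my-benchmark-result:1.0", // hypothetical Dataset schema URI
  "config": result.config,
  "throughput": result.throughput
}))
```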
Note
In the majority of cases, the Run data does not need to be transformed and there is a one-to-one direct mapping between Run and Dataset. In this instance, an Identity Transformer is used and does not need to be defined by the user.
Dataset
A Dataset is either the entire Run JSON document, or a subset that has been produced by an optional Transformer. It is possible for a Run to include multiple Datasets, and the Transformer(s) defined on a Schema associated with the Run has the job of parsing out the multiple Datasets.
Note
In most cases, there is a 1:1 relationship between a Run and a Dataset, when the Dataset is expected to have one unified set of results to be analyzed together.
Fingerprint
A Fingerprint is a combination of Filtering labels that uniquely identifies comparable Datasets within a Test.
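For example, if “Cluster Node Count” and “Product version” are Filtering labels, the Fingerprint of a Dataset might be represented roughly as follows (the values are illustrative):

```json
{
  "Cluster Node Count": 3,
  "Product version": "1.2.3"
}
```

Only Datasets that share the same Fingerprint are treated as comparable.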
Function
A Function is a JavaScript function that is executed on the server side. Functions can be used for validating expected data formats, substitution and rendering. They are also used in Combination Functions to create derived metrics. See Define Functions for more detailed information.
Datasource
A Datasource is a required top-level organizational construct that defines the source of the data to be stored or retrieved by Horreum. Currently, Horreum supports 3 types of Datasource: Postgres, Elasticsearch, and Collector.
Baseline
The initial sample for an Experiment comparison. Configured in an Experiment Profile.
Change Detection Variable
Change detection tracks Schema Labels that have been configured as a Change Detection Variable.
Experiment
This enables running a comparison between Runs for a particular Test. There can be multiple Profiles configured for an Experiment to check for a variety of desired conditions. The outcome status for each Profile condition will be one of the following:
- SAME
- WORSE
- BETTER
Experiment Profile
A Profile consists of:
- Experiment selector
- Baseline
- 0, 1 or many Comparison conditions
JSON validation schema
An optional schema added to a Test to validate uploaded Run JSON data.
Report Configuration
In Horreum a Report Configuration is used for compiling a summary of information to be displayed using tables.
Report
A Report is an instance of a Report Configuration. The creation date and time are used to differentiate each Report.
Actions
An Action is the ability of Horreum to send an Event notification to an external system. Actions are added to a Test.
Global Action
Actions that occur globally in Horreum.
Test Action
Actions only for a particular Test.
Action allow list
Horreum will only allow generic HTTP requests to domains that have been pre-configured in Horreum by an Administrator.
API Filter Query
The Horreum API provides query parameters that are JSONPath paths used to filter operation results.
2 - Horreum Terminology
Any document stored in Horreum is called a Run - usually this is the output of a single CI build. Horreum does not need to know much about it: it accepts it as a plain JSON document and stores it in the database with very little metadata.
Each Run belongs to a series called a Test. This is where you can define metrics and set up regression testing.
Sometimes it’s practical to execute a job in CI that produces a single JSON document and upload it only once, but in fact it contains results for different configurations or several test cases. It might be more convenient to remix the Run into several documents, and this is where Datasets come in. Datasets hold another JSON document that is created from the original Run JSON using Transformers, and share some metadata from the original Run. If you don’t need to do anything fancy, by default the Run is mapped 1:1 into a Dataset. Horreum builds on the concept of “store data now, analyze later” - therefore, rather than forcing users to pre-process the Run before upload, you can change the transformation and the Datasets are re-created.
If you want to do anything but upload and retrieve data from Horreum, such as customize the UI or run regression testing, you need to tell Horreum how to extract data from the JSON document: in the case of Horreum this is a combination of jsonpaths1 and Javascript/Typescript code. However it’s impractical to define the JSONPaths directly in the test: when you’re running the test for several months or years it is quite likely that the format of your results will evolve, although the information inside stays consistent. That’s why the data in both Run and Dataset should contain the $schema key:
{
"$schema": "urn:load-driver:1.0",
"ci-url": "https://my-ci-instance.example.com/build/123",
"throughput": 4567.8
}
For each format of your results (in other words, for each URI used in $schema) you can define a Schema in Horreum. This has several purposes:
- Validation using JSON schema
- Defines Transformers: again, a set of JSON paths and a Javascript function that remix a Run into one or more Datasets.
- Defines a set of Labels: a combination of one or more JSON paths and a Javascript function that extracts certain data from the document. The labels let you reuse the extraction code and process data from datasets using different schemas in a unified way.
You don’t need to use all of these - e.g. it’s perfectly fine to keep the JSON schema empty or use an extremely permissive one.
In our case you could create a schema ‘Load Driver 1.0’ using the URI urn:load-driver:1.0, and a Label throughput that would fetch the jsonpath $.throughput. Some time later the format of your JSON changes:
{
"$schema": "urn:load-driver:2.0",
"ci-url": "https://my-ci-instance.example.com/build/456",
"metrics": {
"requests": 1234,
"duration": 10,
"mean-latency": 12.3
}
}
As the format changed, you create schema ‘Load Driver 2.0’ with URI urn:load-driver:2.0 and define another label in that schema, naming it again throughput. This time you would need to extract the data using two jsonpaths, $.metrics.requests and $.metrics.duration, and a function that would calculate the throughput value. In all places throughout Horreum you will use only the label name throughput to refer to the jsonpath, and you can have a continuous series of results.
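For instance, the Combination Function deriving throughput from those two values could be as simple as the following sketch (it assumes the two Extractors are named requests and duration; the names are not prescribed by Horreum):

```javascript
// Hedged sketch: requests and duration are the assumed Extractor names;
// throughput is derived as requests per unit of duration.
({ requests, duration }) => requests / duration
```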
You can define a label mean-latency in Load Driver 2.0 that would not have a matching one in Load Driver 1.0. You can use that label without error even for runs that use the older schema, but naturally you won’t receive any data from those.
In other cases you can start aggregating data from multiple tools, each producing the results in its own format. Each run has only a single JSON document, but you can merge the results into a single object:
{
"load-driver": {
"$schema": "urn:load-driver:1.0",
"throughput": 4567.8
},
"monitoring": {
"$schema": "urn:monitoring:1.0",
"cpu-consumption": "1234s",
"max-memory": "567MB"
},
"ci-url": "https://my-ci-instance.example.com/build/123"
}
Horreum will transparently extract the throughput relative to the $schema key. Note that this is not supported deeper than the second level shown above, though.
1. Since the jsonpath is evaluated directly in the database, we use PostgreSQL jsonpath syntax.
3 - Users and security
It is assumed that the repo will host data for multiple teams; each user is a member of one or more teams.
Each run, test or schema is owned by one of the teams. The team corresponds to a Keycloak role (see below) with the -team suffix, e.g. engineers-team. In the UI this will be displayed simply as Engineers, dropping the suffix and capitalizing the name.
Data access
We define 3 levels of access to each item (test, run, dataset or schema):
- public: available even to non-authenticated users (for reading)
- protected: available to all authenticated users that have the viewer role (see below)
- private: available only to users who ‘own’ this data - those who have the team role.
Users and roles
There are a few generic roles automatically created during the initial realm import.
- viewer: read permission to view non-public runs
- uploader: write permission to upload new runs, useful for bot accounts (CI)
- tester: write permission to define tests, modify or delete data
- manager: read/write permission to manage team members and their roles within the team
- admin: permission to both see and change application-wide configuration such as global actions
The admin role is a system-wide role and is not restricted to a particular team.
API Keys
Users can generate an API key that provides programmatic access to the Horreum API with the same authorization permissions as the user who created the API key.
User authentication
There are three ways to authenticate users: users and roles can be managed in a dedicated Keycloak instance, in Horreum itself, or in a mixed mode with both Horreum and an external OpenID provider.
Managed Keycloak instance
In this mode users and teams are stored in a Keycloak instance that Horreum can manage. In a non-production environment it can be reached on localhost:8180 using the credentials admin/secret.
Besides the team role itself (e.g. engineers-team) there must be composite roles for each team combining the team role and a permission role: a bot account that uploads the team’s data will have engineers-uploader, which is a composite role including engineers-team and uploader. This role cannot view the team’s private data; it has write-only access.
Users who explore runs, create and modify new tests should have the engineers-tester role: a composite role including engineers-team, tester and viewer.
You can also create a role that allows read-only access to the team’s private runs: engineers-viewer, consisting of engineers-team and viewer.
Horreum
It is possible to run Horreum without any external service for authentication. That is enabled by setting horreum.roles.provider=database while leaving the horreum.keycloak.url property undefined. This mode relies on HTTP Basic authentication. Users are managed in Horreum’s own database.
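A minimal sketch of the relevant settings for this mode (other properties are unaffected):

```properties
# Users and roles stored in Horreum's own database, HTTP Basic authentication
horreum.roles.provider=database
# horreum.keycloak.url is intentionally left undefined in this mode
```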
OpenID Connect (OIDC) provider
Authentication is handled by an outside OpenID provider. Users and roles are managed in Horreum and therefore the horreum.roles.provider property must be set to database. Users need to be created in Horreum with a username that matches the one registered with the OpenID provider. (It’s assumed the users are already registered with the provider, but they still need to be created in Horreum to define their teams and roles.)
This mode requires setting the horreum.keycloak.url and quarkus.oidc-client.auth-server-url properties, as well as the client authentication details shared by the provider. For further details on the client configuration, see the Quarkus OIDC Reference Documentation on the subject.
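A sketch of the relevant settings (the URLs are placeholders, and the client credential properties supplied by your provider must be added as described in the Quarkus OIDC documentation):

```properties
horreum.roles.provider=database
horreum.keycloak.url=https://oidc.example.com
quarkus.oidc-client.auth-server-url=https://oidc.example.com/realms/example
```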
Bootstrap account
Horreum configures one admin account when there are none. This allows the initial configuration of Horreum.
This account has horreum.bootstrap as the username, and the password is secret in a non-production environment. In production this account has a randomly generated password that is shown in the logs.
Once other admin accounts are created, the bootstrap account should be removed.
Create User
Administrators can create users in the Administrators panel and then assign them to teams with the appropriate roles.
Team managers can also create new users on the Managed Teams panel on their user profile. The new user will be associated with the selected team.
In both cases, a form needs to be filled with the user details.
Remove User
Only Administrators can remove users. To do so, navigate to the Remove Users panel and search for the users to be removed. Then click the red Remove button and confirm.