Software Engineering 04/08/2021

Introducing Siembol: An Open-Source, Real-Time Security Information and Event Management Tool

Co-authors: 

  • Marian Novotny, Software Developer at G-Research
  • Caterina Rindi, Director of Open-Source Community at G-Research 

Siembol, G-Research’s latest project and ‘new kid on the block’, is loaded with functionality. It is a welcome innovation and a user-friendly alternative to Apache Metron, following the latter’s retirement to ‘the attic’ last year.

At its heart, Siembol is a scalable, advanced security analytics framework based on open-source big data technologies. It normalises, enriches, and alerts on data from a wide variety of sources. Best of all, it allows security teams to respond to attacks before they escalate to ‘incidents’.

A brief history of Siembol

Siembol was developed in-house at G-Research as a security data processing application, forming the core of our Security Data Platform.

The need for a highly efficient, real-time event processing engine led to the use of both Splunk and Apache Metron. However, as neither product satisfied all requirements, the team set to work on creating a new platform – a custom-built platform with specific features that mattered to G-Research.

As early adopters of Apache Metron, G-Research believed in the product and worked hard to adapt it to our needs. Ultimately, by recognising its limitations, the team was able to add missing features and shore up its instabilities, ready to give back to the Metron community.

Sadly, before this work was complete, Metron’s time had already passed. However, as strong advocates of Metron’s core mission, G-Research decided to release their work under a new project name. ‘Siembol’ was born – an effective alternative for the security community, filling the void left by Metron’s move to the Apache Attic.

How Siembol improves upon Apache Metron

So, how far do Siembol’s improvements go, and how do they compare to the functions of its predecessor, Metron?

Siembol has components for alert escalation…

This means that CSIRT security teams can easily create a rule-based alert from a single data source. Alternatively, they can create advanced correlation rules that combine various data sources. The Siembol UI also supports translating a Sigma rule specification (a generic, open signature format for SIEM alerting) into a Siembol alerting rule.
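To make the idea of a rule-based alert on a single data source concrete, here is a minimal Python sketch. It is not Siembol's rule format or API: the `Matcher` and `Rule` classes, the field names and the `sshd` source are all hypothetical, chosen only to show a rule that fires when every matcher matches.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Matcher:
    """Hypothetical matcher: a field name and a regex its value must satisfy."""
    field: str
    pattern: str


@dataclass(frozen=True)
class Rule:
    """Hypothetical alerting rule: fires only when all matchers match."""
    name: str
    source_type: str
    matchers: tuple


def evaluate(rule, event):
    """Return True when the event is from the rule's source and all matchers match."""
    if event.get("source_type") != rule.source_type:
        return False
    return all(
        re.search(m.pattern, str(event.get(m.field, ""))) is not None
        for m in rule.matchers
    )


# Illustrative rule: failed logins as root, seen in sshd logs only.
failed_root_login = Rule(
    name="failed_root_login",
    source_type="sshd",
    matchers=(Matcher("user", r"^root$"), Matcher("outcome", r"^failure$")),
)

events = [
    {"source_type": "sshd", "user": "root", "outcome": "failure"},
    {"source_type": "sshd", "user": "alice", "outcome": "failure"},
    {"source_type": "proxy", "user": "root", "outcome": "failure"},
]
alerts = [e for e in events if evaluate(failed_root_login, e)]
```

Only the first event ends up in `alerts`: the second fails the `user` matcher and the third comes from a different source.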

Siembol has the ability to integrate with other systems…

While the core functionality of Metron was great, users desired more integration with the growing ecosystem of SIEM-related projects. Currently, Siembol integrates easily with other systems such as Jira, TheHive, Cortex, ELK, and LDAP. Beyond this, Siembol’s plugin interface allows a custom integration with other systems used in incident response.

There is even functionality to provide additional alert enrichments, such as ELK searches or LDAP searches, with the option to filter as part of an automatic incident response. The G-Research team is planning to publish a collection of internally used plugins, while providing space for collecting plugins from the Siembol community.

Siembol boasts an advanced parsing framework for building fault-tolerant parsers…

Metron provided an effective way of introducing a powerful parsing framework to the Security Data Platform. However, it was brittle and sensitive to even minor syntactic mistakes. In contrast, Siembol provides a robust, custom-built framework for normalising and extracting fields from logs, supporting chaining of parsers, field extractors and transformation functions. This allows users to:

  • extract JSON, CSV structures, key value pairs, timestamps
  • parse timestamps using standard formatters to an epoch form
  • transform messages by renaming fields, filtering fields, or even filtering out the whole message

Siembol also supports use cases for advanced log ingestion using multiple parsers and routing logic. Moreover, it supports a generic text parser, syslog and BSD syslog parsers, and a NetFlow v9 binary parser.
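The extraction and transformation steps listed above can be pictured as a chain of small functions, each taking a flat message and returning a new one. The Python sketch below is only an illustration under that assumption: the function names, the log line and the field names are invented here and do not reflect Siembol's actual parser configuration.

```python
import json
from datetime import datetime, timezone


def extract_key_values(message, field, pair_sep=" ", kv_sep="="):
    """Split 'k1=v1 k2=v2' text found in `field` into separate fields."""
    out = dict(message)
    for pair in out.pop(field, "").split(pair_sep):
        key, sep, value = pair.partition(kv_sep)
        if sep and key:
            out[key] = value
    return out


def extract_json(message, field):
    """Merge a JSON object found in `field` into the message.

    Malformed JSON is tolerated: the message is returned unchanged.
    """
    out = dict(message)
    try:
        parsed = json.loads(out.get(field, ""))
    except (json.JSONDecodeError, TypeError):
        return out
    if isinstance(parsed, dict):
        del out[field]
        out.update(parsed)
    return out


def parse_timestamp(message, field, fmt):
    """Convert a formatted timestamp (assumed UTC) to epoch milliseconds."""
    out = dict(message)
    parsed = datetime.strptime(out[field], fmt).replace(tzinfo=timezone.utc)
    out[field] = int(parsed.timestamp() * 1000)
    return out


def rename_fields(message, mapping):
    """Rename fields according to `mapping`, leaving other fields as they are."""
    return {mapping.get(k, k): v for k, v in message.items()}


def drop_fields(message, names):
    """Remove the named fields from the message."""
    return {k: v for k, v in message.items() if k not in names}


# Hypothetical raw log line, chained through the steps above.
raw_log = 'ts=2021-08-04T10:15:00 user=alice payload={"action":"login","ok":true}'
message = {"original": raw_log}
message = extract_key_values(message, "original")
message = extract_json(message, "payload")
message = parse_timestamp(message, "ts", "%Y-%m-%dT%H:%M:%S")
message = rename_fields(message, {"ts": "timestamp"})
message = drop_fields(message, {"ok"})
```

The result is a single-layer message with `timestamp` (epoch milliseconds), `user` and `action` fields. Note that `extract_json` returns its input unchanged on malformed JSON rather than raising, which is one simple way to read "fault-tolerant" in this context.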

Siembol has an enhanced enrichment component…

Siembol lets users define rules for selecting enrichment logic, joining enrichment tables, and specifying how to enrich the processed log with information from user-defined tables.
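As a rough illustration of these three ideas (selecting a rule, joining a table, and choosing what to add), consider the Python sketch below. The `EnrichmentRule` class, the `hosts` table and all field names are hypothetical and are not taken from Siembol's configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnrichmentRule:
    """Hypothetical rule: which events it applies to, and what it adds."""
    source_type: str   # rule is selected only for events of this source
    join_field: str    # event field used as the lookup key
    table_name: str    # user-defined table to join with
    columns: tuple     # table columns to copy into the event


# Hypothetical user-defined table keyed by hostname.
TABLES = {
    "hosts": {
        "db-01": {"network_zone": "restricted", "data_classification": "confidential"},
        "web-01": {"network_zone": "dmz"},
    }
}

RULES = [
    EnrichmentRule("netflow", "host", "hosts", ("network_zone", "data_classification")),
]


def enrich(event, rules, tables):
    """Apply every rule selected for the event and return an enriched copy."""
    enriched = dict(event)
    for rule in rules:
        if event.get("source_type") != rule.source_type:
            continue  # rule not selected for this event
        row = tables.get(rule.table_name, {}).get(event.get(rule.join_field))
        if row is None:
            continue  # no matching table row, so nothing to add
        for column in rule.columns:
            if column in row:
                enriched[column] = row[column]
    return enriched


enriched = enrich({"source_type": "netflow", "host": "db-01"}, RULES, TABLES)
```

Events from other sources, or with a host that is missing from the table, pass through unchanged.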

Siembol’s configurations and rules are defined by a modern Angular web application – Siembol UI – and stored in Git repositories…

All Siembol configurations are stored in JSON format in Git repositories. They are edited through web forms, which speeds up creation, shortens learning time and helps avoid mistakes. Furthermore, the Siembol UI supports validation, testing, and creating and evaluating test cases to mitigate configuration errors in a production environment.

Siembol also supports high integrity use cases with protected GitHub main branches for deploying configurations. Going forward, configuration errors for a rule will only affect that specific rule; errors will no longer bring down the entire apparatus, which is a significant improvement over Metron.
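The isolation property described here, where a broken rule affects only itself, can be sketched in a few lines of Python. This is an assumption-laden illustration rather than Siembol's loader: the `REQUIRED_KEYS` schema and the `load_rules` function are invented for the example.

```python
import json

# Hypothetical minimal schema: keys every rule must define.
REQUIRED_KEYS = {"rule_name", "source_type", "matchers"}


def load_rules(raw_rules):
    """Validate each rule on its own and return (valid_rules, errors_by_index)."""
    valid, errors = [], {}
    for index, raw in enumerate(raw_rules):
        try:
            rule = json.loads(raw)
            if not isinstance(rule, dict):
                raise ValueError("rule must be a JSON object")
            missing = REQUIRED_KEYS - rule.keys()
            if missing:
                raise ValueError(f"missing keys: {sorted(missing)}")
            valid.append(rule)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            errors[index] = str(exc)  # record the error and carry on with the rest
    return valid, errors


raw_rules = [
    '{"rule_name": "r1", "source_type": "sshd", "matchers": []}',
    '{"rule_name": "r2", "source_type": "sshd"',   # truncated JSON
    '{"rule_name": "r3"}',                         # missing keys
]
valid, errors = load_rules(raw_rules)
```

Here `valid` contains only `r1`, while `errors` records a message for the second and third entries. The healthy rule still loads even though its neighbours are broken.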

Siembol prefers a declarative JSON language to a scripting language like Stellar. The G-Research team has found a declarative language with testing and validation to be less error-prone and simpler to understand.

Siembol supports OAuth2/OIDC for authentication and authorisation in the Siembol UI…

All Siembol services can have multiple instances with authorisation based on OIDC group membership. This allows for multi-tenancy usage without the need to deploy multiple instances of Siembol. G-Research plans to test and tune OAuth2/OIDC integration with popular identity providers.
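One way to picture group-based authorisation is a lookup from each service to the OIDC group allowed to manage it. The Python sketch below assumes the identity provider places group names in a `groups` claim, which is a common convention rather than something the OIDC core specification guarantees. The service names, group names and the `authorised_services` function are hypothetical, and token signature validation is deliberately left out.

```python
# Hypothetical mapping from a service instance to the OIDC group that may manage it.
SERVICE_GROUPS = {
    "alert": "csirt",
    "parsing-hadoop": "big-data",
    "enrichment": "csirt",
}


def authorised_services(claims, service_groups):
    """Return the services whose required group appears in the token's claims.

    `claims` is the already-verified token payload; a 'groups' claim is assumed.
    """
    groups = set(claims.get("groups", []))
    return sorted(
        service for service, group in service_groups.items() if group in groups
    )


csirt_user = {"sub": "alice", "groups": ["csirt"]}
platform_user = {"sub": "bob", "groups": ["big-data", "developers"]}

alice_services = authorised_services(csirt_user, SERVICE_GROUPS)
bob_services = authorised_services(platform_user, SERVICE_GROUPS)
```

With this mapping, two teams can share one deployment while each sees only its own services, which is the multi-tenancy idea described above.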

Siembol has easy installation for use with prepared Docker images and Helm charts…

Metron’s installation process was arduous: due to its flexible architecture, there were many ways to set up and configure Metron, which could overwhelm a first-time user. While Siembol maintains the flexibility of Metron for advanced users, it has simplified the installation process for those new to the project.

Siembol supports deployment on external Hadoop clusters to ensure high performance. However, G-Research is providing k8s Helm charts for all deployment dependencies in order to test Siembol in development environments.

Use cases of Siembol

SIEM log collection using open-source technologies 

Siembol can be used to centralise both the collection of security data and the monitoring of logs from different sources. When collecting and inspecting logs from third-party tools, the format of these logs can vary. Therefore, it is important for Siembol to support the normalisation of logs into a standardised format with common fields, such as a timestamp. It is often useful to enrich a log with metadata provided by a CMDB or other internal systems, as this metadata is important for building detections.

For example, data repositories can be enriched with a data classification, network devices with a network zone, and usernames with an Active Directory group. Using Siembol alerting services, CSIRT teams can add detections on top of normalised logs. Alerts triggered by these detections are integrated into incident response, which is defined and evaluated by the Siembol response service. This allows for the integration of Siembol with systems such as Jira, TheHive, or Cortex. It also provides additional enrichments by searching ELK or running LDAP queries.

G-Research uses Siembol to parse, normalise, enrich and detect an enormous number of events – approximately 150k a second. Per day, this adds up to volumes of approximately 15TB of raw data, which is equal to 13 billion events.
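As a quick sanity check, the figures quoted above are consistent with each other. The short Python calculation below uses only the numbers from the text. The one assumption is that 15TB means decimal terabytes, and the average event size it derives is an inference, not a figure published by G-Research.

```python
EVENTS_PER_SECOND = 150_000
SECONDS_PER_DAY = 24 * 60 * 60
RAW_BYTES_PER_DAY = 15 * 10**12  # 15 TB, assuming decimal terabytes

events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY
avg_event_bytes = RAW_BYTES_PER_DAY / events_per_day

print(f"{events_per_day / 1e9:.2f} billion events per day")    # 12.96
print(f"{avg_event_bytes:.0f} bytes per raw event on average")  # 1157
```

At 150k events per second, a day holds just under 13 billion events, which implies an average raw event size of a little over 1 KB.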

Tool for detecting leaks and attacks on infrastructure

Siembol can be used as a tool for detecting attacks or leaks by teams responsible for the system platform. For example, the Big Data team at G-Research is using Siembol to detect leaks and attacks on the Hadoop platform. These detections are then used as another data source within the Siembol SIEM log collection for the CSIRT team handling these incidents.

High-level architecture

Data pipelines

Siembol services:

  • Parsing – normalising logs into messages with one layer of key/value pairs
  • Enrichment – adding useful data to events to assist in detection and investigations
  • Alerting – filtering matching events from an incoming data stream of events based on a configurable rule set. The correlation alerting allows users to group several detections together before raising an alert
  • Response – flexible incident response workflows can be built and triggered in real-time via the highly modular and pluggable framework
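The correlation idea mentioned under Alerting, grouping several detections before raising an alert, can be sketched as a sliding-window counter. The Python class below is a simplified illustration, not Siembol's implementation: the `Correlator` name, the threshold and window parameters, and the assumption that detections arrive in timestamp order are all choices made for this example.

```python
from collections import defaultdict, deque


class Correlator:
    """Raise one correlated alert when `threshold` detections share a key
    within `window_s` seconds. Detections are assumed to arrive in time order."""

    def __init__(self, threshold, window_s):
        self.threshold = threshold
        self.window_s = window_s
        self._seen = defaultdict(deque)  # key -> timestamps of recent detections

    def add(self, key, timestamp):
        """Record a detection; return True if it completes a correlated alert."""
        recent = self._seen[key]
        recent.append(timestamp)
        # Discard detections that have fallen out of the time window.
        while recent and timestamp - recent[0] > self.window_s:
            recent.popleft()
        if len(recent) >= self.threshold:
            recent.clear()  # start counting afresh after alerting
            return True
        return False


correlator = Correlator(threshold=3, window_s=60)
results = [
    correlator.add("host-a", 0),
    correlator.add("host-a", 10),
    correlator.add("host-b", 15),   # a different key does not count towards host-a
    correlator.add("host-a", 20),   # third host-a detection within 60 seconds
]
```

Only the final call returns `True`. Detections are counted per key, so activity on `host-b` never contributes to an alert for `host-a`.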

Infrastructure dependencies:

  • Kafka – message broker for data pipelines
  • Storm – stream processing framework for all services except Siembol response, which is integrated in Kafka streaming
  • GitHub – store of service configurations used in Siembol UI
  • ZooKeeper – synchronisation cache for updating service configurations from Git to services
  • Kubernetes cluster – environment to deploy Siembol UI and related microservices for management and orchestration of Siembol service configurations
  • Identity provider – OAuth2/OIDC identity provider used for Siembol UI, allowing OIDC groups to be used for managing authorisation to services

[Figure: Architecture of Siembol]


Join the Siembol conversation

The G-Research team is excited to launch Siembol into the open-source world. Members of the community are warmly invited to attend the following virtual presentations:

G-Research welcomes thoughts and contributions from the community. Visit GitHub Discussions to be part of the conversation.
