Streaming architecture

This page shows the components used for the POC and how they interact.

Architecture schema

Dataflow

  1. Events are consumed by Logstash from the BOTES JSON files. Logstash applies some processing logic and normalizes events to the ECS (Elastic Common Schema) format.

  2. Events are then published to Apache Kafka topics according to their type/tags.

  3. Multiple Apache Flink DataStreams consume events from Apache Kafka. This allows processing logic to be applied only to events that contain the relevant field(s), avoiding wasted processing time and resources (see the first sketch after this list).

  4. DataStreams are then filtered (for example, enrichment is only performed on public IP addresses) and redirected to asynchronous DataStreams (second sketch below).

  5. Enrichment logic is applied on the asynchronous DataStreams (third sketch below):

    1. Redis is queried to check whether the Key (IP or hash) already exists.

    2. If the Key exists, Redis is queried for its Value, which contains the information from Shodan, Onyphe or VirusTotal as JSON.

    3. If the Key does not exist, asynchronous API requests are launched against Shodan, Onyphe and VirusTotal, and the results (Values) are stored in Redis.

  6. Once enriched, events are sent to Elasticsearch through a Sink, to be indexed (last sketch below).
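
The sketches below illustrate steps 3 to 6 with the Flink DataStream API in Java. They are minimal, hedged examples, not the actual POC code: broker addresses, topic and index names, and helper functions are assumptions. First, a per-topic Kafka consumer (step 3), using Flink's FlinkKafkaConsumer:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class BotesEnrichmentJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("group.id", "botes-enrichment");        // assumed consumer group

        // One consumer per topic, so each DataStream only carries the event
        // type its processing logic is interested in (step 3).
        DataStream<String> dnsEvents = env.addSource(
                new FlinkKafkaConsumer<>("botes-dns", new SimpleStringSchema(), props));

        // ... filtering, enrichment and sink are wired here (next sketches).

        env.execute("BOTES enrichment POC");
    }
}
```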
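Step 4, continuing inside the same job class: a filter drops non-public addresses, then AsyncDataStream.unorderedWait redirects the surviving events to an asynchronous operator. The extractSourceIp helper is hypothetical, and the timeout and capacity values are illustrative:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.datastream.AsyncDataStream;

// Step 4: keep only events whose source address is public; private,
// loopback and link-local ranges carry no OSINT value.
DataStream<String> publicIpEvents =
        dnsEvents.filter(event -> isPublicIp(extractSourceIp(event)));

// Redirect the filtered stream to an asynchronous operator. A 30 s timeout
// and 100 in-flight requests are illustrative POC values.
DataStream<String> enrichedEvents = AsyncDataStream.unorderedWait(
        publicIpEvents, new EnrichmentFunction(), 30, TimeUnit.SECONDS, 100);

static boolean isPublicIp(String ip) {
    try {
        InetAddress addr = InetAddress.getByName(ip); // no DNS lookup for IP literals
        return !(addr.isSiteLocalAddress() || addr.isLoopbackAddress()
                || addr.isLinkLocalAddress() || addr.isAnyLocalAddress());
    } catch (UnknownHostException e) {
        return false; // malformed address: skip enrichment
    }
}
```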
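A possible shape for the enrichment logic of step 5, as a Flink RichAsyncFunction using the Jedis client for the Redis cache. The key extraction, merge and OSINT query helpers are hypothetical stubs standing in for the real Shodan/Onyphe/VirusTotal clients:

```java
import java.util.Collections;
import java.util.concurrent.CompletableFuture;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

import redis.clients.jedis.Jedis;

public class EnrichmentFunction extends RichAsyncFunction<String, String> {

    private transient Jedis redis;

    @Override
    public void open(Configuration parameters) {
        redis = new Jedis("localhost", 6379); // assumed Redis endpoint
    }

    @Override
    public void asyncInvoke(String event, ResultFuture<String> resultFuture) {
        String key = extractKey(event); // hypothetical: IP or hash taken from the event

        CompletableFuture.supplyAsync(() -> {
            // Steps 5.1 / 5.2: on a cache hit, reuse the stored JSON verdict.
            String cached = redis.get(key);
            if (cached != null) {
                return merge(event, cached);
            }
            // Step 5.3: on a cache miss, call the external services and
            // store the JSON result under the same Key for later events.
            String osint = queryShodanOnypheVirusTotal(key);
            redis.set(key, osint);
            return merge(event, osint);
        }).thenAccept(enriched -> resultFuture.complete(Collections.singleton(enriched)));
    }

    @Override
    public void close() {
        if (redis != null) {
            redis.close();
        }
    }

    // Hypothetical stubs standing in for the real POC helpers.
    private static String extractKey(String event) { return event; }
    private static String merge(String event, String osintJson) { return event; }
    private static String queryShodanOnypheVirusTotal(String key) { return "{}"; }
}
```

Note that a single Jedis connection is not thread-safe; a production job would use a connection pool or a truly asynchronous Redis client rather than CompletableFuture.supplyAsync around blocking calls.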
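Finally, step 6 with Flink's Elasticsearch 7 connector, again continuing the same job; the host and index name are assumptions:

```java
import java.util.Collections;
import java.util.List;

import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
import org.apache.flink.streaming.connectors.elasticsearch7.ElasticsearchSink;
import org.apache.http.HttpHost;
import org.elasticsearch.client.Requests;
import org.elasticsearch.common.xcontent.XContentType;

List<HttpHost> hosts =
        Collections.singletonList(new HttpHost("localhost", 9200, "http")); // assumed

ElasticsearchSink.Builder<String> sinkBuilder = new ElasticsearchSink.Builder<>(
        hosts,
        new ElasticsearchSinkFunction<String>() {
            @Override
            public void process(String element, RuntimeContext ctx, RequestIndexer indexer) {
                // Each enriched event becomes one index request (step 6).
                indexer.add(Requests.indexRequest()
                        .index("botes-enriched") // assumed index name
                        .source(element, XContentType.JSON));
            }
        });

sinkBuilder.setBulkFlushMaxActions(100); // small bulk batches are fine for a POC

enrichedEvents.addSink(sinkBuilder.build());
```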
