SIEM: Data enrichment

This page describes what data enrichment is and which types of enrichment will be used in this POC.

Security event enrichment

Enrichment is the process of adding data about data: information about the data themselves, or about the context they belong to, is attached to them.

Security events can be enriched from internal or external sources, which can be various things such as databases, query tools, etc.
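
As a minimal illustration of this idea, the Python sketch below treats an event as a flat dictionary and an enricher as any function that returns extra fields to attach to it. The `enrich` and `ip_facts` names and the dotted field names are hypothetical, chosen only for the example. `ip_facts` adds information derived from the data itself (IP version, private range) without querying any source; lookups against internal or external sources would simply be further enrichers in the list.

```python
import ipaddress
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
# An enricher looks at an event and returns the extra fields to attach to it.
Enricher = Callable[[Event], Dict[str, Any]]


def enrich(event: Event, enrichers: List[Enricher]) -> Event:
    """Return a new event made of the original fields plus every enricher's output."""
    enriched = dict(event)  # shallow copy: the raw event is left untouched
    for enricher in enrichers:
        enriched.update(enricher(event))
    return enriched


def ip_facts(event: Event) -> Dict[str, Any]:
    """Data about the data itself: derived from the IP only, no lookup needed."""
    raw = event.get("source.ip")
    if raw is None:
        return {}
    ip = ipaddress.ip_address(raw)
    return {"source.ip_version": ip.version, "source.ip_is_private": ip.is_private}


if __name__ == "__main__":
    event = {"source.ip": "192.168.1.20", "event.action": "logon"}
    print(enrich(event, [ip_facts]))
```

Returning a new dictionary instead of mutating the input keeps the raw event available if an enricher misbehaves.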

Internal sources

Examples of internal sources:

DNS entries: events often contain only IP addresses and no hostname, even for internal hosts. They can be enriched with the hostname by performing a reverse DNS query (PTR record). This is especially useful when DHCP is configured to update PTR records.
Assets database: an internal asset database can be queried to obtain information about a host/IP address, such as location, OS type and version, network information, and so on.
Classification database: this database can be linked to the asset database but contains information about asset classification. An asset can be classified as "Critical", "Medium" or "Low", and rules can be developed according to this importance ranking. An asset can also be classified as "Scanner" or "Red Team" to avoid false positives, or at least to avoid wasting time investigating events generated by automated vulnerability scanners.
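
The three internal sources above could be combined roughly as follows. This is a sketch under assumptions: the asset and classification databases are replaced by hypothetical in-memory dictionaries, `fake_resolver` is a hypothetical stand-in for a real DNS server so the example is deterministic, and field names other than the ECS `source.ip` and `source.domain` are illustrative. The reverse lookup relies on the standard `socket.gethostbyaddr`, which raises `socket.herror` when no PTR record exists.

```python
import socket
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical stand-ins for the internal asset and classification databases.
ASSETS: Dict[str, Dict[str, str]] = {
    "10.0.0.5": {"location": "DC1", "os": "Windows Server 2019"},
    "10.0.0.9": {"location": "DC2", "os": "Kali Linux"},
}
CLASSIFICATION: Dict[str, str] = {"10.0.0.5": "Critical", "10.0.0.9": "Scanner"}
# Classes whose activity is expected and should not lead to a full investigation.
EXPECTED_ACTIVITY = {"Scanner", "Red Team"}

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def reverse_dns(ip: str, resolver: Resolver = socket.gethostbyaddr) -> Optional[str]:
    """Return the hostname from the PTR record of an IP, or None if there is none."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return None


def enrich_internal(event: Dict[str, Any],
                    resolver: Resolver = socket.gethostbyaddr) -> Dict[str, Any]:
    ip = event["source.ip"]
    enriched = dict(event)
    hostname = reverse_dns(ip, resolver)
    if hostname:
        enriched["source.domain"] = hostname  # ECS field for the source host name
    for key, value in ASSETS.get(ip, {}).items():
        enriched[f"source.asset.{key}"] = value
    label = CLASSIFICATION.get(ip)
    if label:
        enriched["source.asset.classification"] = label
        enriched["event.expected_activity"] = label in EXPECTED_ACTIVITY
    return enriched


def fake_resolver(ip: str) -> Tuple[str, List[str], List[str]]:
    """Hypothetical PTR table so the example runs without a DNS server."""
    ptr = {"10.0.0.5": "srv01.corp.example"}
    if ip not in ptr:
        raise socket.herror(1, "Unknown host")
    return ptr[ip], [], [ip]


if __name__ == "__main__":
    print(enrich_internal({"source.ip": "10.0.0.9"}, fake_resolver))
```

Passing the resolver as a parameter keeps the real `socket.gethostbyaddr` as the default while allowing tests, or a caching resolver, to be swapped in.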

External sources

Examples of external sources:

Threat lists: they can be queried to check whether a host or IP address is tagged/categorized as a C2, botnet, dropper, etc. server.
GeoIP database: GeoIP provides information about public IP addresses, such as location, ASN, organization name, city/country code and name, etc.
File hash lists: some tools, such as endpoint security software or IDS/IPS, compute and log file hashes. These hashes can be compared against file hash databases to check whether a file is already categorized as malicious.
Search engines: some search engines specialize in internet-connected devices (routers, servers, webcams, fridges, ...). They can be queried to obtain more information about hosts/IP addresses, such as open network ports, OS type/version, banners, etc.

BOTES enrichment POC

The purpose of this POC is to show how security events can be enriched in real time with stream-processing applications.

The following components will be used for the underlying infrastructure:

Logstash: consumes events from the BOTES JSON files, normalizes them to the Elastic Common Schema (ECS) and sends them to Apache Kafka.
Apache Kafka: a pub/sub messaging system that can handle huge amounts of data. Kafka is organized around "topics". For example, Logstash can publish firewall events to "firewall-topic", and Flink consumes that topic only where firewall logs are needed.
Apache Flink: a real-time data streaming and processing platform. Flink can process events in streaming or batch mode and can be used for enrichment, pattern detection (FlinkCEP) or machine learning (Flink ML). Like Apache Kafka, Flink is scalable and can process events at large scale.
Redis: an in-memory data structure store that can be used as a cache or a database. Redis will be used to store information retrieved from external sources, in order to limit the number of API calls and avoid slowing down the data stream.
Elasticsearch: a search engine used to index events and provide fast, advanced investigation through its API or Kibana (web interface).
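
To make the role of each component more concrete, the single-process Python sketch below mimics three steps an event goes through: field renaming to ECS (done by Logstash in the POC), topic selection (Kafka), and a cache-aside lookup with an expiry time (Redis in front of the external APIs). The raw BOTES field names, the topics other than `firewall-topic`, and the `TTLCache` class are assumptions for illustration; in the POC these roles are played by Logstash, Kafka and Redis themselves, not by Python code.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical raw BOTES field names mapped to ECS field names.
FIELD_MAP = {
    "src_ip": "source.ip",
    "dst_ip": "destination.ip",
    "dst_port": "destination.port",
    "timestamp": "@timestamp",
}
# Hypothetical routing table: event type -> Kafka topic.
TOPICS = {"firewall": "firewall-topic", "proxy": "proxy-topic"}


def normalize(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Rename known fields to ECS and keep unknown ones under a 'botes.' prefix."""
    return {FIELD_MAP.get(key, f"botes.{key}"): value for key, value in raw.items()}


def topic_for(raw: Dict[str, Any]) -> str:
    """Pick the topic so that each consumer only reads the event types it needs."""
    return TOPICS.get(raw.get("type", ""), "default-topic")


class TTLCache:
    """In-memory stand-in for Redis used as a cache (SET key value EX ttl / GET key)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if expires_at <= self.clock():
            del self._store[key]  # expired: behave as a cache miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


def cached_lookup(cache: TTLCache, key: str, fetch: Callable[[str], Any]) -> Any:
    """Cache-aside: call the external source only when no fresh value is cached."""
    value = cache.get(key)
    if value is None:  # note: a None result is never served from the cache
        value = fetch(key)
        cache.set(key, value)
    return value
```

With a TTL of one hour, repeated events about the same IP trigger at most one API call per hour, which is the reason for placing a cache between the stream and the external sources.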

The following external enrichment sources will be used:

Shodan: a search engine that indexes information about internet-connected devices. These can be any type of device, such as routers, servers, webcams, fridges or even pig farm management consoles. Shodan indexes the banners returned by open ports.
Onyphe: also a search engine, which indexes threat intelligence data lists and other information gathered by crawling various sources or the internet directly.
VirusTotal: a website that can be used to analyze suspicious files and URLs. Once a file has been submitted and analyzed, its hashes are recorded and can then be queried (through an API or the web interface) for comparison.
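
Once a response is received, only a few fields are usually worth attaching to the event. The sketch below assumes the JSON layout of the Shodan host lookup (`org`, `os`, `ports`, and one `data` entry per service with its banner) and of the VirusTotal v3 file object (`data.attributes.last_analysis_stats`). The output field names and the default "flagged by at least one engine" threshold are arbitrary choices for the example, not recommendations.

```python
from typing import Any, Dict


def from_shodan_host(resp: Dict[str, Any]) -> Dict[str, Any]:
    """Keep the triage-relevant fields of a Shodan host lookup response."""
    return {
        "shodan.org": resp.get("org"),
        "shodan.os": resp.get("os"),
        "shodan.ports": sorted(resp.get("ports", [])),
        # Each item of "data" describes one service, with its banner in item["data"].
        "shodan.banners": {
            item.get("port"): item.get("data", "").strip() for item in resp.get("data", [])
        },
    }


def from_virustotal_file(resp: Dict[str, Any], threshold: int = 1) -> Dict[str, Any]:
    """Summarize a VirusTotal v3 file object; `threshold` is an arbitrary example value."""
    stats = resp.get("data", {}).get("attributes", {}).get("last_analysis_stats", {})
    malicious = stats.get("malicious", 0)
    return {
        "virustotal.malicious": malicious,
        "virustotal.suspicious": stats.get("suspicious", 0),
        "virustotal.verdict": "malicious" if malicious >= threshold else "not flagged",
    }
```

Keeping only a handful of fields also keeps the cached entries and the indexed events small.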

For more information about external sources:

Note: this list of external sources is non-exhaustive. As long as you have an API key (if needed) and know the URL parameters that return the desired results, you will be able to adapt the code.
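
In the spirit of this note, one way to keep the code adaptable is to describe each source as data (a URL template plus where the API key goes) rather than as dedicated code. The sketch below only builds `urllib.request.Request` objects without sending them. The Shodan host URL with a `key` query parameter and the VirusTotal v3 file URL with an `x-apikey` header follow the public API documentation; the `Source` class and the `SOURCES` registry are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request


@dataclass(frozen=True)
class Source:
    url_template: str  # "{indicator}" is replaced by the IP address or hash
    key_param: str = ""  # API key sent as a query parameter...
    key_header: str = ""  # ...or as an HTTP header
    extra_params: Dict[str, str] = field(default_factory=dict)


SOURCES: Dict[str, Source] = {
    "shodan": Source("https://api.shodan.io/shodan/host/{indicator}", key_param="key"),
    "virustotal": Source("https://www.virustotal.com/api/v3/files/{indicator}",
                         key_header="x-apikey"),
}


def build_request(name: str, indicator: str, api_key: str) -> Request:
    """Build (but do not send) the GET request for one indicator on one source."""
    src = SOURCES[name]
    params = dict(src.extra_params)
    if src.key_param:
        params[src.key_param] = api_key
    url = src.url_template.format(indicator=indicator)
    if params:
        url += "?" + urlencode(params)
    headers = {src.key_header: api_key} if src.key_header else {}
    return Request(url, headers=headers)
```

Adding a source then means adding one `Source` entry, for example a hypothetical `Source("https://intel.example.com/api/ip/{indicator}", key_header="Authorization")`. Sending the request (e.g. with `urllib.request.urlopen`), error handling and rate limiting are left out of this sketch.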

Note: for extensive use of these sources, a subscription to a "Premium" or "Enterprise" offer may be necessary.
