ThreatDB is a platform that collects and aggregates data from several different Threat Information Sources into a single unified structure, similar to commercial sharing platforms such as IBM X-Force Exchange, Microsoft Interflow and HP Threat Central. Its main purpose is to make security information easily accessible to any kind of Threat Intelligence System. In practice, it allows decision-making systems to focus on security analysis rather than on the overhead of data normalization. Normalization is a significant pre-processing step: doing it once simplifies post-processing for all future consumers and sets a good baseline for real-time alerting.
Why do we need ThreatDB?
As the sharing of Threat Information becomes a major driver in the security industry, ThreatDB provides an advanced service for retrieving publicly available threat data in an automated and configurable way. The volume of threat-related data shared among organizations, communities and companies is enormous, extremely dynamic and valuable, and it will continue to grow.
ThreatDB optimizes the first level of analysis of that information and enriches the threat data with useful additional details, such as geolocation information and the provider ASN. At the same time, it tracks source changes by scheduling periodic updates, and it applies filters to exclude unnecessary noise from future analysis. Finally, it normalizes the consumed raw data into a common structure, which simplifies information handling and storage through a standardized format. This process allows any Intelligent System to access an ecosystem of databases and to focus efficiently on targeted data only, while performing its own custom predictive risk analysis.
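The filter, normalize and enrich steps described above can be illustrated with a minimal sketch. It is not ThreatDB's actual implementation: the `ThreatRecord` fields, the function names and the two in-memory lookup tables are assumptions made for illustration, and a real deployment would query proper geolocation and ASN databases instead.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

# Hypothetical lookup tables standing in for real geolocation/ASN databases.
# The network and AS number come from ranges reserved for documentation.
GEO_BY_NETWORK = {ip_network("198.51.100.0/24"): "GR"}
ASN_BY_NETWORK = {ip_network("198.51.100.0/24"): 64500}

# Ranges treated as noise in this sketch (RFC 1918 private space and loopback).
NOISE_NETWORKS = [
    ip_network("10.0.0.0/8"),
    ip_network("172.16.0.0/12"),
    ip_network("192.168.0.0/16"),
    ip_network("127.0.0.0/8"),
]


@dataclass(frozen=True)
class ThreatRecord:
    """Common structure shared by every normalized record (illustrative)."""
    kind: str                      # e.g. "ip", "url", "domain"
    value: str
    source: str
    country: Optional[str] = None  # enrichment: geolocation
    asn: Optional[int] = None      # enrichment: provider ASN


def lookup(table, address):
    """Return the value of the first network in `table` containing `address`."""
    for network, value in table.items():
        if address in network:
            return value
    return None


def normalize_ip(raw: str, source: str) -> Optional[ThreatRecord]:
    """Parse one raw feed line into a ThreatRecord, or None if it is noise."""
    text = raw.split("#", 1)[0].strip()  # drop trailing comments
    if not text:
        return None
    try:
        address = ip_address(text)
    except ValueError:
        return None                      # not an IP address
    if any(address in network for network in NOISE_NETWORKS):
        return None                      # filter non-routable addresses
    return ThreatRecord(
        kind="ip",
        value=str(address),
        source=source,
        country=lookup(GEO_BY_NETWORK, address),
        asn=lookup(ASN_BY_NETWORK, address),
    )


def normalize_feed(lines, source):
    """Normalize a whole feed, keeping only the records that survive filtering."""
    records = (normalize_ip(line, source) for line in lines)
    return [record for record in records if record is not None]
```

Calling `normalize_feed(["198.51.100.7  # scanner", "10.0.0.5", "not-an-ip"], "feed-a")` keeps only the first line and returns it as a single enriched record. Consumers then work with one record shape regardless of how each source formats its feed.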
The goal of ThreatDB
The main goal of ThreatDB is to generate reliable data about all kinds of threats in a mixed database ecosystem. Instead of directly trusting what has been reported as a threat by open sources, communities, certified organizations or third parties, it goes a step further and evaluates the reliability of the data. The platform introduces mechanisms for a deeper assessment through statistical analysis based on historical records as well as real-time feedback. Imagine a system that runs simultaneously in two different modes: a real-time mode, which provides fast access to an in-memory database holding the latest reported threats, and a historical mode, which holds months of threat metrics together with intelligent feedback that guarantees a confidence level for decision-making.
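To make the two modes more concrete, here is a small sketch of a store that answers both kinds of question. It is only an illustration under stated assumptions: the class name, the 24-hour freshness window, the 90-day history window and the confidence formula (the fraction of days in the window on which an indicator was reported) are hypothetical and do not describe the statistical analysis ThreatDB actually applies.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


class TwoModeStore:
    """Illustrative store with a real-time view and a historical view."""

    def __init__(self, realtime_ttl=timedelta(hours=24), history_days=90):
        self.realtime_ttl = realtime_ttl
        self.history_days = history_days
        self._latest = {}                   # indicator -> last time reported
        self._days_seen = defaultdict(set)  # indicator -> dates with a report

    def report(self, indicator: str, when: datetime) -> None:
        """Record one sighting of an indicator in both views."""
        previous = self._latest.get(indicator)
        if previous is None or when > previous:
            self._latest[indicator] = when
        self._days_seen[indicator].add(when.date())

    def is_active(self, indicator: str, now: datetime) -> bool:
        """Real-time mode: was the indicator reported within the TTL?"""
        last = self._latest.get(indicator)
        return last is not None and now - last <= self.realtime_ttl

    def confidence(self, indicator: str, today: date) -> float:
        """Historical mode: fraction of days in the window with a report."""
        start = today - timedelta(days=self.history_days - 1)
        days = self._days_seen.get(indicator, set())
        hits = sum(1 for day in days if start <= day <= today)
        return hits / self.history_days
```

A consumer could call `is_active` for fast alerting on fresh reports and `confidence` when it needs a longer-term signal. Keeping the two views separate means the real-time lookup stays a single dictionary access even as the history grows.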
Although ThreatDB is not directly involved in decision-making or in the quality assessment of all collected Threat Information, it plays a key role in transforming the collected threat data into a final, flexible format that is ready for consumption. ThreatDB therefore acts as a smart input generator for any Intelligent System. It provides the fundamental historical, enriched data sets without dependencies, which leaves consumers free to correlate that information with more advanced factors.
By advanced factors we refer to intelligent processing that involves machine learning, data modeling and separate logic for reaching the most valuable security decisions per process, system or network. Most of the time, Intelligent Systems go beyond deterministic methodologies and introduce probabilistic complexity to enhance their results. Because of the computational cost of re-learning algorithms, efficiency usually becomes a core requirement, but also an interesting challenge. The easy consumption of the relevant, well-structured data that ThreatDB offers helps such systems achieve the maximum benefit from threat information sharing without sacrificing their own level of customization. In practice, they adapt the existing process flow directly to their needs. Informally, ThreatDB can be considered an upcoming data warehouse for threats.
Threat Information Sources
A vital part of the ThreatDB system is the selection of Threat Information Sources for collecting all basic types of security data. At the moment, the crawler includes a list of 23 reviewed sources, which yield roughly 4,000,000 threat records per day. These records currently constitute a mix of blacklisted IPs, suspicious URLs, spam emails, malicious domains and Common Vulnerabilities and Exposures (CVEs). New sources and types are continuously reviewed and will be supported gradually, as we verify their security value. The following list summarizes our currently active sources:
As a future enhancement to the existing ThreatDB services, a public ThreatDB API is already under development; it aims to provide efficient information sharing via subscriptions for any third party. The upcoming API also includes the ability to query the confidence levels of individual threat records in real time. In practice, this means that any ThreatDB consumer benefits from indirect access to the Crypteia Networks Intelligence System (ThreatIQ). Along the same line of work, support for standards such as STIX/TAXII and OpenIOC is at a preliminary investigation and research stage.
Finally, there is already a roadmap for integrating additional Threat Information Sources from other security companies, social sources and the open web. The goal is to cover all possible threat types and to allow ThreatDB to correlate data, certify existing security risks and re-share information in a unified form, enhancing the trustworthiness and reliability of reported threats.