This document gives an overview of the implementation of the Vulnerability Plugin, describing its main components and the format of the vulnerability data.
The two main components are the Producer and the Consumer. The Producer does the heavy lifting: it gathers information from different sources, merges it into a preliminary Vulnerability Object, and then enriches the Object with callable-level detail using the PatchFinder. Each Vulnerability Object is published to a Kafka topic and picked up on the other end by the Consumer, which injects the data into the knowledge base and stores the vulnerability statement in the local file system.
The tool is designed to run as a standalone process that gathers, enriches and finally publishes the information to a Kafka topic. The code sits in this repository.
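To make the Producer/Consumer split concrete, here is a minimal sketch in which an in-memory `BlockingQueue` stands in for the Kafka topic and each Vulnerability Object is reduced to a JSON string. The class `PipelineSketch` and the `END` sentinel are hypothetical; the real plugin uses Kafka, whose consumers keep polling instead of stopping at a marker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: a BlockingQueue stands in for the Kafka topic and
// Vulnerability Objects are represented as JSON strings.
class PipelineSketch {
    // End-of-stream marker; exists only so this sketch terminates.
    static final String END = "__END__";

    // Producer side: publish each (already enriched) vulnerability to the "topic".
    static void produce(List<String> vulnerabilities, BlockingQueue<String> topic)
            throws InterruptedException {
        for (String v : vulnerabilities) {
            topic.put(v);
        }
        topic.put(END);
    }

    // Consumer side: read until the marker. "Storing" here just means collecting
    // into a list instead of writing to the knowledge base and the file system.
    static List<String> consume(BlockingQueue<String> topic) throws InterruptedException {
        List<String> stored = new ArrayList<>();
        while (true) {
            String v = topic.take();
            if (END.equals(v)) {
                break;
            }
            stored.add(v);
        }
        return stored;
    }

    // Runs the producer on its own thread and the consumer on the calling thread.
    static List<String> run(List<String> vulnerabilities) {
        BlockingQueue<String> topic = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            try {
                produce(vulnerabilities, topic);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            List<String> stored = consume(topic);
            producer.join();
            return stored;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
    }
}
```

Because the two sides only share the topic, either one can be restarted or scaled independently, which is the main reason for putting a message broker between them.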
The ParserManager class collects and handles the data returned by all the implemented parsers. Every Parser (NVDParser, GHParser, ExtraParser and OVALParser) pulls information from one or more sources and can also retrieve updates from those same sources. Each Parser class implements the following interface:
import java.util.HashMap;

public interface VulnerabilityParser {
// Method to retrieve existing vulnerabilities
HashMap<String, Vulnerability> getVulnerabilities();
// Method to retrieve updated and new vulnerabilities
HashMap<String, Vulnerability> getUpdates();
}
The ParserManager first calls getVulnerabilities on each Parser and then aggregates all the information into a common format, which is passed down the pipeline for further enrichment. The other method each Parser implements, getUpdates, is called daily to collect new and updated information from each source. Sharing a single interface across all Parsers makes it easier to add Parsers for new sources of information.
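The aggregation step can be sketched as follows, assuming that records from different Parsers describing the same vulnerability share the same ID. The classes `Vulnerability` (reduced here to an ID and a set of references) and `ParserManagerSketch`, and the merge policy of taking the union of references, are hypothetical simplifications; only the `VulnerabilityParser` interface comes from this document.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal stand-in for the Vulnerability Object (the real one has many more fields).
class Vulnerability {
    final String id;
    final Set<String> references = new LinkedHashSet<>();

    Vulnerability(String id) {
        this.id = id;
    }
}

// The interface each Parser implements, as documented above.
interface VulnerabilityParser {
    HashMap<String, Vulnerability> getVulnerabilities();
    HashMap<String, Vulnerability> getUpdates();
}

// Hypothetical sketch of the ParserManager's aggregation step.
class ParserManagerSketch {
    // Merges the output of all parsers by vulnerability ID. With updatesOnly=true
    // it uses getUpdates() (the daily run), otherwise getVulnerabilities().
    static Map<String, Vulnerability> aggregate(List<VulnerabilityParser> parsers,
                                                boolean updatesOnly) {
        Map<String, Vulnerability> merged = new HashMap<>();
        for (VulnerabilityParser parser : parsers) {
            Map<String, Vulnerability> batch =
                    updatesOnly ? parser.getUpdates() : parser.getVulnerabilities();
            for (Vulnerability v : batch.values()) {
                // Assumed merge policy: union of the references from every source.
                merged.computeIfAbsent(v.id, Vulnerability::new)
                      .references.addAll(v.references);
            }
        }
        return merged;
    }
}
```

A real merge would also need rules for conflicting scalar fields (for example two different severities), which this sketch leaves out.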
The ParserManager aggregates information from the following sources:
Source | License | Frequency of updates |
---|---|---|
NVD JSON Feed | Public Domain | Every 2 hours |
GitHub Advisories | Public Domain | Daily |
MSR 2019 [1] | Public Domain | n/a |
MSR 2020 [2] | Public Domain | n/a |
Safety DB (by pyup.io) | CC BY-NC-SA 4.0 | Monthly |
cvedb (by fabric8-analytics) | n/a | Daily |
victims-cve-db | CC BY-SA 4.0 | n/a |
Debian Security Tracker | Public Domain | Daily |
SAP project-kb | Public Domain | n/a |
To find where exactly a vulnerability lies in a package, patch links make it possible to retrieve information about what was changed to fix the vulnerability. Combined with some heuristics, this information makes it possible to drill down to the specific callables that were patched.
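One possible heuristic is sketched below: treat a callable as patched when a line removed or modified by the fix falls inside that callable's line range in the pre-patch (vulnerable) file. The class `PatchedCallables`, its methods, and the assumption that callable line ranges are already available as `[start, end]` pairs are all hypothetical; the plugin's actual heuristics are not described in this document. The diff is assumed to be a single-file unified diff.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: map the lines removed by a patch to the callables containing them.
class PatchedCallables {
    // Hunk header such as "@@ -10,3 +10,4 @@"; group 1 is the first old-file line.
    private static final Pattern HUNK =
            Pattern.compile("^@@ -(\\d+)(?:,\\d+)? \\+\\d+(?:,\\d+)? @@");

    // Old-file line numbers of the lines removed ('-') by a single-file unified diff.
    static List<Integer> removedLines(String unifiedDiff) {
        List<Integer> lines = new ArrayList<>();
        int oldLine = 0;
        boolean inHunk = false;
        for (String line : unifiedDiff.split("\n")) {
            Matcher m = HUNK.matcher(line);
            if (m.find()) {
                oldLine = Integer.parseInt(m.group(1));
                inHunk = true;
            } else if (!inHunk) {
                continue; // "--- a/..." and "+++ b/..." headers before the first hunk
            } else if (line.startsWith("-")) {
                lines.add(oldLine++);
            } else if (!line.startsWith("+") && !line.startsWith("\\")) {
                oldLine++; // context line; '+' lines and "\ No newline" markers are not in the old file
            }
        }
        return lines;
    }

    // Callables (by name) whose [start, end] range contains at least one removed line.
    static Set<String> patched(String unifiedDiff, Map<String, int[]> callableRanges) {
        Set<String> result = new LinkedHashSet<>();
        for (int line : removedLines(unifiedDiff)) {
            for (Map.Entry<String, int[]> e : callableRanges.entrySet()) {
                int[] range = e.getValue();
                if (line >= range[0] && line <= range[1]) {
                    result.add(e.getKey());
                }
            }
        }
        return result;
    }
}
```

Patches that only add lines (for example, a missing check) remove nothing, so a fuller heuristic would also attribute added lines to the enclosing callable.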
The PatchFarmer receives the list of references contained in each Vulnerability Object and processes each of them to determine whether patch diffs can be extracted from it. The class handles the following sources of information:
- GitHub Commits
- GitHub Pull Requests
- GitHub Issues
- GitLab Commits
- GitLab Merge Requests
- GitLab Issues
- BitBucket Commits
- BitBucket Pull Requests
- BitBucket Issues
- Bugzilla bugs
- JIRA tickets
- Git Trackers Commits
- SVN Revisions
- Mercurial Revisions
- Apache Mailing List
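Before extracting diffs, each reference has to be recognized as one of the sources above. The sketch below classifies URLs with regular expressions; the class `ReferenceClassifier`, the `SourceType` values and every pattern are illustrative assumptions, not the PatchFarmer's actual rules. SVN revisions, Mercurial revisions and generic Git trackers are left out because their URL shapes differ from host to host.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: classify a reference URL by the kind of source it points to.
class ReferenceClassifier {
    enum SourceType {
        GITHUB_COMMIT, GITHUB_PULL_REQUEST, GITHUB_ISSUE,
        GITLAB_COMMIT, GITLAB_MERGE_REQUEST, GITLAB_ISSUE,
        BITBUCKET_COMMIT, BITBUCKET_PULL_REQUEST, BITBUCKET_ISSUE,
        BUGZILLA_BUG, JIRA_TICKET, APACHE_MAILING_LIST, UNKNOWN
    }

    // Rules are tried in insertion order; the first pattern found in the URL wins.
    private static final Map<Pattern, SourceType> RULES = new LinkedHashMap<>();

    static {
        rule("github\\.com/[^/]+/[^/]+/commit/[0-9a-f]{7,40}", SourceType.GITHUB_COMMIT);
        rule("github\\.com/[^/]+/[^/]+/pull/\\d+", SourceType.GITHUB_PULL_REQUEST);
        rule("github\\.com/[^/]+/[^/]+/issues/\\d+", SourceType.GITHUB_ISSUE);
        rule("gitlab\\.com/.+?(?:/-)?/commit/[0-9a-f]{7,40}", SourceType.GITLAB_COMMIT);
        rule("gitlab\\.com/.+?(?:/-)?/merge_requests/\\d+", SourceType.GITLAB_MERGE_REQUEST);
        rule("gitlab\\.com/.+?(?:/-)?/issues/\\d+", SourceType.GITLAB_ISSUE);
        rule("bitbucket\\.org/[^/]+/[^/]+/commits?/[0-9a-f]{7,40}", SourceType.BITBUCKET_COMMIT);
        rule("bitbucket\\.org/[^/]+/[^/]+/pull-requests/\\d+", SourceType.BITBUCKET_PULL_REQUEST);
        rule("bitbucket\\.org/[^/]+/[^/]+/issues/\\d+", SourceType.BITBUCKET_ISSUE);
        rule("show_bug\\.cgi\\?id=\\d+", SourceType.BUGZILLA_BUG);
        rule("/browse/[A-Z][A-Z0-9]*-\\d+", SourceType.JIRA_TICKET);
        rule("(?:lists|mail-archives)\\.apache\\.org/", SourceType.APACHE_MAILING_LIST);
    }

    private static void rule(String regex, SourceType type) {
        RULES.put(Pattern.compile(regex), type);
    }

    static SourceType classify(String url) {
        for (Map.Entry<Pattern, SourceType> e : RULES.entrySet()) {
            if (e.getKey().matcher(url).find()) {
                return e.getValue();
            }
        }
        return SourceType.UNKNOWN;
    }
}
```

For example, the patch link in the vulnerability example below (`https://bugs.eclipse.org/bugs/show_bug.cgi?id=549934`) is classified as `BUGZILLA_BUG`, while an NVD detail page falls through to `UNKNOWN`.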
The Consumer reads the Vulnerability definitions that the Producer publishes on the Kafka topic and stores the information in the database.
Each Vulnerability is stored in the metadata field of the package_versions and callables tables, using the following format:
# metadata #
{
"vulnerabilities": {
"CVE-2020-0042": {...},
"CVE-2020-0043": {...}
}
}
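The update the Consumer performs on a row's metadata can be sketched as follows, modelling JSON objects as Java `Map`s. The class `MetadataUpdater` and its method are hypothetical; JSON serialization and the actual database write are left out. The sketch returns a new map so that other metadata keys are preserved and the input is left untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: store a vulnerability under metadata["vulnerabilities"][id].
class MetadataUpdater {
    @SuppressWarnings("unchecked")
    static Map<String, Object> addVulnerability(Map<String, Object> metadata,
                                                String id,
                                                Map<String, Object> vulnerability) {
        // Copy so that unrelated metadata keys survive and the input is not mutated.
        Map<String, Object> updated = new LinkedHashMap<>(metadata);
        Map<String, Object> vulns = new LinkedHashMap<>();
        Object existing = updated.get("vulnerabilities");
        if (existing instanceof Map) {
            vulns.putAll((Map<String, Object>) existing);
        }
        // A newer definition with the same ID replaces the older one.
        vulns.put(id, vulnerability);
        updated.put("vulnerabilities", vulns);
        return updated;
    }
}
```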
The individual definition of each vulnerability will also be available through the REST API.
To merge all the different sources, a common definition of a vulnerability has been introduced. Here is a JSON representation of an example, CVE-2019-11777 in the Eclipse Paho Java client library:
{
"id": "CVE-2019-11777",
"description" : "In the Eclipse Paho Java client library version 1.2.0, when connecting to an MQTT server using TLS and setting a host name verifier, the result of that verification is not checked. This could allow one MQTT server to impersonate another and provide the client library with incorrect information.",
"severity": "MODERATE",
"scoreCVSS2": 5.0,
"scoreCVSS3": 7.5,
"published_date": "2019-09-11",
"last_modified_date": "2020-06-10",
"vulnerable_purls": [
"pkg:org.eclipse.paho/paho.client.mqttv3@1.0.2",
"pkg:org.eclipse.paho/paho.client.mqttv3@1.1.0",
"pkg:org.eclipse.paho/paho.client.mqttv3@1.1.1",
"pkg:org.eclipse.paho/paho.client.mqttv3@1.2.0"
],
"vulnerable_fasten_uris": [
"/org.eclipse.paho.client.mqttv3.internal/SSLNetworkModule.start()%2Fjava.lang%2FVoidType",
"/org.eclipse.paho.client.mqttv5.internal/SSLNetworkModule.start()%2Fjava.lang%2FVoidType"
],
"patch_date": "2018-05-26",
"references": [
"https://nvd.nist.gov/vuln/detail/CVE-2019-11777",
"..."
],
"patches": [
"https://bugs.eclipse.org/bugs/show_bug.cgi?id=549934"
],
"exploits": [
"http://www.exploit-db.com/exploits/42",
"..."
]
}
id: Identifies the vulnerability (e.g. CVE-2014-0160, GHSA-3pc2-fm7p-q2vg, pyup.io-34978)
description: Textual description of the vulnerability
severity: One of the following: LOW, MEDIUM, MODERATE, HIGH, CRITICAL
scoreCVSS2: CVSS version 2 score of the vulnerability
scoreCVSS3: CVSS version 3 score of the vulnerability
published_date: Date when the vulnerability was published (yyyy-mm-dd)
last_modified_date: Date when the vulnerability was last modified (yyyy-mm-dd)
vulnerable_purls: Package URLs (purls) of the vulnerable package versions, following the purl-spec guidelines
vulnerable_fasten_uris: Vulnerable callables, listed in the FASTEN URI format
patch_date: Date when the vulnerability was patched (yyyy-mm-dd)
references: List of links to pages and documentation about the vulnerability
patches: List of links to the patches that fixed the vulnerability
exploits: List of links to exploits, mostly from Exploit-DB
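The field list above can be mirrored in a Java class that checks the documented constraints: severity from the fixed set, dates in yyyy-mm-dd form, and purls starting with `pkg:`. The class name `VulnerabilityRecord`, its camel-case field names (standing in for the snake_case JSON keys) and the `validate()` method are hypothetical; how the JSON is mapped onto these fields depends on the JSON library in use and is not shown.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the common vulnerability definition.
class VulnerabilityRecord {
    static final Set<String> SEVERITIES =
            new HashSet<>(Arrays.asList("LOW", "MEDIUM", "MODERATE", "HIGH", "CRITICAL"));

    String id;
    String description;
    String severity;
    Double scoreCVSS2;
    Double scoreCVSS3;
    String publishedDate;     // JSON key "published_date"
    String lastModifiedDate;  // JSON key "last_modified_date"
    String patchDate;         // JSON key "patch_date"
    List<String> vulnerablePurls = new ArrayList<>();
    List<String> vulnerableFastenUris = new ArrayList<>();
    List<String> references = new ArrayList<>();
    List<String> patches = new ArrayList<>();
    List<String> exploits = new ArrayList<>();

    // Returns the list of problems found; an empty list means the record looks well-formed.
    List<String> validate() {
        List<String> problems = new ArrayList<>();
        if (id == null || id.isEmpty()) {
            problems.add("missing id");
        }
        if (severity != null && !SEVERITIES.contains(severity)) {
            problems.add("unknown severity: " + severity);
        }
        checkDate("published_date", publishedDate, problems);
        checkDate("last_modified_date", lastModifiedDate, problems);
        checkDate("patch_date", patchDate, problems);
        for (String purl : vulnerablePurls) {
            if (!purl.startsWith("pkg:")) {
                problems.add("not a purl: " + purl);
            }
        }
        return problems;
    }

    // Dates are treated as optional; when present they must be ISO-8601 yyyy-mm-dd.
    private static void checkDate(String field, String value, List<String> problems) {
        if (value == null) {
            return;
        }
        try {
            LocalDate.parse(value);
        } catch (DateTimeParseException e) {
            problems.add(field + " is not yyyy-mm-dd: " + value);
        }
    }
}
```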
[1] Ponta, S. E., Plate, H., Sabetta, A., Bezzi, M., & Dangremont, C. (2019). A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software. In 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR). doi:10.1109/msr.2019.00064
[2] Fan, J., Li, Y., Wang, S., & Nguyen, T. N. (2020). A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries. In MSR '20: The 17th International Conference on Mining Software Repositories, May 25–26, 2020, Seoul, South Korea. ACM, New York, NY, USA, 5 pages. doi:10.1145/3379597.3387501