We host a full web API that includes all objects created by cve2stix and cpe2stix, Vulmatch.
A small wrapper to download data using cve2stix and cpe2stix, organising it into STIX bundles based on time ranges.
# clone the latest code
git clone https://github.com/muchdogesec/cxe2stix_helper -b main --recurse-submodules
# create a venv
cd cxe2stix_helper
python3 -m venv cxe2stix_helper-venv
source cxe2stix_helper-venv/bin/activate
# install requirements
pip3 install -r requirements.txt
cxe2stix_helper has various settings that are defined in an .env file.
To create a template for the file:
cp .env.example .env
To see more information about how to set the variables, and what they do, read the .env.markdown file.
python3 cxe2stix_helper.py \
--run_cve2stix boolean \
--run_cpe2stix boolean \
--last_modified_earliest date \
--last_modified_latest date \
--file_time_range dictionary
Where:
- run_cve2stix (optional, boolean): will run the cve2stix script with the settings defined
  - default: false
- run_cpe2stix (optional, boolean): will run the cpe2stix script with the settings defined
  - default: false
- last_modified_earliest (required, date in format YYYY-MM-DDThh:mm:ss): used in the cve2stix/cpe2stix config
  - default: none
- last_modified_latest (required, date in format YYYY-MM-DDThh:mm:ss): used in the cve2stix/cpe2stix config
  - default: none
- file_time_range (optional): defines how much data should be packed in each output bundle. Use d for days, m for months, y for years. Note, if no results are found for a time period, a bundle will not be generated. This usually explains why you see "missing" bundles for a day or month.
  - default: 1m (1 month)
python3 cxe2stix_helper.py \
--run_cpe2stix \
--last_modified_earliest 2023-03-04T00:00:00 \
--last_modified_latest 2023-06-04T23:59:59 \
--file_time_range 1m
Will generate 4 bundle files in directories as follows:
output
└── bundles
    └── cpe
        └── 2023
            ├── cpe-bundle-2023_03_04-2023_03_31.json
            ├── cpe-bundle-2023_04_01-2023_04_30.json
            ├── cpe-bundle-2023_05_01-2023_05_31.json
            └── cpe-bundle-2023_06_01-2023_06_04.json
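The month-aligned windows in the example above can be sketched in Python as follows. This is a minimal illustration that reproduces the four filenames shown, not the code cxe2stix_helper itself runs. `month_windows` and `bundle_name` are hypothetical helper names, and only the `1m` case is covered.

```python
from datetime import date, timedelta


def month_windows(start: date, end: date):
    """Yield (first_day, last_day) pairs, one per calendar month touched by the range."""
    cur = start
    while cur <= end:
        # First day of the month after `cur`.
        if cur.month == 12:
            nxt = date(cur.year + 1, 1, 1)
        else:
            nxt = date(cur.year, cur.month + 1, 1)
        # A window ends on the last day of the month, or on `end` if that comes first.
        last = min(nxt - timedelta(days=1), end)
        yield cur, last
        cur = nxt


def bundle_name(kind: str, first: date, last: date) -> str:
    """Build a filename following the pattern shown in the example output."""
    return f"{kind}-bundle-{first:%Y_%m_%d}-{last:%Y_%m_%d}.json"


names = [
    bundle_name("cpe", first, last)
    for first, last in month_windows(date(2023, 3, 4), date(2023, 6, 4))
]
for name in names:
    print(name)
```

The first and last windows are clipped to the requested dates, which is why the first bundle starts on the 4th of March and the last one ends on the 4th of June.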
python3 cxe2stix_helper.py \
--run_cve2stix \
--last_modified_earliest 2023-01-01T00:00:00 \
--last_modified_latest 2023-01-03T23:59:59 \
--file_time_range 1d
Will generate 3 bundle files:
cve-bundle-2023_01_01-2023_01_01.json
cve-bundle-2023_01_02-2023_01_02.json
cve-bundle-2023_01_03-2023_01_03.json
output
└── bundles
    └── cve
        └── 2023-01
            ├── cve-bundle-2023_01_01-2023_01_01.json
            ├── cve-bundle-2023_01_02-2023_01_02.json
            └── cve-bundle-2023_01_03-2023_01_03.json
python3 cxe2stix_helper.py \
--run_cve2stix \
--run_cpe2stix \
--last_modified_earliest 2023-01-01T00:00:00 \
--last_modified_latest 2023-01-02T23:59:59 \
--file_time_range 2m
Will generate 2 bundle files:
output
└── bundles
    ├── cve
    │   └── 2023
    │       └── cve-bundle-2023_01_01-2023_01_02.json
    └── cpe
        └── 2023
            └── cpe-bundle-2023_01_01-2023_01_02.json
The APIs can return a large amount of data, and downloading large time ranges in one run can cause issues.
We also use a range of downstream tools that require STIX bundles in smaller sizes and with certain naming conventions.
Without a wrapper, this means manually editing the .env files once for every time range you want to download.
cxe2stix_helper is designed to automate the process of downloading very large datasets whilst also allowing control on the output filenames.
If you want to keep a copy of each individual STIX .json object, you should use cve2stix or cpe2stix. cxe2stix_helper will only print the final bundles.
The first CVE was published at 1988-10-01T04:00:00.000. There are 250,888 CVEs at the time of writing, and this number is increasing rapidly.
Note, whilst the first CVE was published in October 1988, it appears all CVEs published before 2005 were updated at the end of 2005 (or afterwards).
There are more CPEs (1,267,211 currently) than CVEs but the STIX objects created from them are smaller, and thus a smaller file size. The earliest CPEs have a last modified date in 2007.
Due to the volume and size of CVEs, we recommend iterating through the data in days. This means all bundles (especially those after 2018) will always be less than 10 MB.
Here is what we use;
python3 cxe2stix_helper.py \
--run_cve2stix \
--run_cpe2stix \
--last_modified_earliest 2005-01-01T00:00:00 \
--last_modified_latest 2024-11-30T23:59:59 \
--file_time_range 1d
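To get a feel for the size of this backfill, the number of daily windows in the range can be counted with a few lines of Python. This is only a rough sketch: `daily_window_count` is a hypothetical helper, and the result is an upper bound per script, because days with no results do not produce a bundle.

```python
from datetime import datetime

# Date format accepted by --last_modified_earliest / --last_modified_latest.
FMT = "%Y-%m-%dT%H:%M:%S"


def daily_window_count(earliest: str, latest: str) -> int:
    """Number of calendar days covered by the range, counting both end days."""
    first = datetime.strptime(earliest, FMT).date()
    last = datetime.strptime(latest, FMT).date()
    return (last - first).days + 1


windows = daily_window_count("2005-01-01T00:00:00", "2024-11-30T23:59:59")
print(windows)
```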
We try and keep this repo in sync with the remote cve2stix / cpe2stix repos used as Git submodules when changes happen.
This is not always the case (either because we've forgotten, or because there are breaking changes).
If it's the case we've forgotten, you can update the Git Submodules in this repo as follows:
cd cpe2stix && \
git checkout main && \
git pull && \
cd .. && \
cd cve2stix && \
git checkout main && \
git pull && \
cd ..
We use a Github action to run this script daily to store the bundles generated by cxe2stix_helper on Cloudflare R2.
The script runs at 0700 UTC every day (GitHub servers use UTC) using cron: "0 7 * * *"
You can see the action in: /.github/workflows/daily-r2.yml.
Essentially, the following command is run every day by the action:
python3 cxe2stix_helper.py \
--run_cve2stix \
--run_cpe2stix \
--last_modified_earliest "YESTERDAY (00:00:00)" \
--last_modified_latest "YESTERDAY (23:59:59)" \
--file_time_range 1d
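The YESTERDAY values above are placeholders, not literal arguments. A minimal sketch of how they could be computed, assuming "yesterday" means the previous calendar day in UTC (the workflow file remains the authority on what the action actually does; `yesterday_bounds` is a hypothetical helper):

```python
from datetime import datetime, timedelta, timezone


def yesterday_bounds(now=None):
    """Return (earliest, latest) strings covering the previous UTC day."""
    now = now or datetime.now(timezone.utc)
    day = (now - timedelta(days=1)).date()
    # str(date) is ISO formatted (YYYY-MM-DD), matching the CLI date format.
    return f"{day}T00:00:00", f"{day}T23:59:59"


earliest, latest = yesterday_bounds()
print("--last_modified_earliest", earliest)
print("--last_modified_latest", latest)
```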
The action will store the data in the bucket as follows;
cxe2stix-helper-github-action-output
├── cve
│   └── 2023-01
│       └── cve-bundle-2023_01_01-2023_01_02.json
└── cpe
    └── 2023-01
        └── cpe-bundle-2023_01_01-2023_01_02.json
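If you need to locate a bundle in the bucket from its filename, the layout above can be expressed as a small mapping. This sketch assumes the directory is named after the year and month of the bundle's start date, which matches the example but is not confirmed for every case. `bucket_key` is a hypothetical helper.

```python
import re

# <kind>-bundle-<start YYYY_MM_DD>-<end YYYY_MM_DD>.json
BUNDLE_RE = re.compile(
    r"^(?P<kind>cve|cpe)-bundle-(?P<year>\d{4})_(?P<month>\d{2})_\d{2}"
    r"-\d{4}_\d{2}_\d{2}\.json$"
)


def bucket_key(filename: str) -> str:
    """Map a bundle filename to its assumed path inside the bucket."""
    match = BUNDLE_RE.match(filename)
    if match is None:
        raise ValueError(f"not a bundle filename: {filename}")
    return f"{match['kind']}/{match['year']}-{match['month']}/{filename}"


print(bucket_key("cve-bundle-2023_01_01-2023_01_02.json"))
```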
If you'd like to run the action in your own repository to create your own data store, you will need to do the following:
First, go to Cloudflare.com and navigate to R2. Create a new bucket called cxe2stix-helper-github-action-output.
Now create a Cloudflare API key. Make sure to set its permissions to Admin Read & Write. For security, it is also worth limiting the scope of the key to the bucket cxe2stix-helper-github-action-output (defined in the action).
Then go to the GitHub repo and navigate to repo > settings > secrets and variables > actions > new repository secret.
Then choose one of the following options;
Set the following in the secrets;
CLOUDFLARE_ACCOUNT_ID=#Get this in Cloudflare R2 UI
CLOUDFLARE_ACCESS_KEY_ID=#Get this in Cloudflare R2 UI
CLOUDFLARE_ACCESS_KEY_SECRET=#Get this in Cloudflare R2 UI
NVD_API_KEY=#Get this from https://nvd.nist.gov/developers/request-an-api-key
You most likely want to use this approach.
In the RCLONE_CONFIG var, add a valid rclone conf file (the title must be [r2]), e.g.
[r2]
type = s3
provider = Cloudflare
access_key_id = <ACCESS_KEY>
secret_access_key = <SECRET_ACCESS_KEY>
region = auto
endpoint = https://<ACCOUNT_ID>.r2.cloudflarestorage.com
acl = private
This approach allows you to potentially use other services than just Cloudflare, if you know what you're doing.
Where:
- [r2]: an alias for the storage service. It is used to operate on files and should always be [r2]
- type = s3: the type of file operation API. R2 supports the S3 standard protocol.
- provider = Cloudflare: the storage provider ID. You can run man rclone in your terminal to see the supported providers.
- access_key_id: you need to create a token with Admin Read & Write permissions on the R2 console (note, I am not sure if this is a bug, but I couldn't get it to work with any other permission levels)
- secret_access_key: same as above.
- endpoint: the URL that rclone uses to operate on files. You can find the account ID at the top-right of the R2 homepage.
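If you prefer to generate the conf text rather than write it by hand, the fields described above can be filled in with a short script. This is a sketch only: `render_rclone_conf` is a hypothetical helper, the three arguments are placeholders for your own values, and `configparser` is used here just to check that the result is well-formed INI.

```python
import configparser


def render_rclone_conf(account_id: str, access_key: str, secret_key: str) -> str:
    """Return rclone conf text for an R2 remote named r2."""
    lines = [
        "[r2]",
        "type = s3",
        "provider = Cloudflare",
        f"access_key_id = {access_key}",
        f"secret_access_key = {secret_key}",
        "region = auto",
        f"endpoint = https://{account_id}.r2.cloudflarestorage.com",
        "acl = private",
    ]
    return "\n".join(lines) + "\n"


conf_text = render_rclone_conf("ACCOUNT_ID", "ACCESS_KEY", "SECRET_ACCESS_KEY")

# Sanity check: the text parses as INI and contains the r2 section.
parsed = configparser.ConfigParser(interpolation=None)
parsed.read_string(conf_text)
print(parsed.sections())
```

The printed text is what would go in the RCLONE_CONFIG var. Avoid printing real secrets in shared logs.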
Due to the size of the backfill, running it on GitHub will cause timeouts. Similarly, if you set file_time_range above 1d, the run is likely to time out due to data sizes. It's better to run the backfill locally and then start the automated action from the day after the last day covered by the backfill.
Here are the Rclone commands you can use to upload the backfill files downloaded locally;
rclone copy output/bundles/cpe r2:cti-public/cxe2stix-helper-github-action-output/cpe --exclude '.*{/**,}' && \
rclone copy output/bundles/cve r2:cti-public/cxe2stix-helper-github-action-output/cve --exclude '.*{/**,}'
You will need to replace cti-public with your bucket name. /cxe2stix-helper-github-action-output/cpe is the path to the directory in the bucket where you want to store the files.
Note, the default behaviour of running this command will be to overwrite old files. If you need to delete the directories, you can use rclone to do so as follows
rclone purge r2:cti-public/cxe2stix-helper-github-action-output/cpe && \
rclone purge r2:cti-public/cxe2stix-helper-github-action-output/cve