Please download all the example datasets here.
Three NetFlow datasets (NetFlow schema: TBD):
- The UGR16 dataset consists of traffic (including attacks) from NetFlow v9 collectors in a Spanish ISP network. We use data from the third week of March 2016.
- The CIDDS dataset emulates a small business environment with several clients and servers (e.g., email, web) into which malicious traffic was injected. Each NetFlow entry is recorded with a label (benign/attack) and an attack type (DoS, brute force, port scan).
- The TON dataset contains telemetry data from IoT sensors. We use a sub-dataset ("Train_Test_datasets") intended for evaluating cybersecurity-related ML algorithms; of its 461,013 records, 300,000 (65.07%) are normal, and the rest (34.93%) are evenly distributed across nine attack types (e.g., backdoor, DDoS, injection, MITM).
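As a quick sanity check on the TON class split, the percentages above can be recomputed from the two record counts. This sketch uses only the numbers quoted in this README; note that 161,013 attack records cannot be divided exactly by nine, so "evenly distributed" is necessarily approximate.

```python
# Recompute the TON "Train_Test_datasets" class split from the counts quoted above.
TOTAL_RECORDS = 461_013
NORMAL_RECORDS = 300_000
NUM_ATTACK_TYPES = 9

attack_records = TOTAL_RECORDS - NORMAL_RECORDS
normal_pct = 100 * NORMAL_RECORDS / TOTAL_RECORDS
attack_pct = 100 * attack_records / TOTAL_RECORDS

# 161,013 is not a multiple of 9, so an even split can only be approximate.
approx_per_attack_type = attack_records / NUM_ATTACK_TYPES

print(f"attack records: {attack_records:,}")
print(f"normal: {normal_pct:.2f}%  attack: {attack_pct:.2f}%")
print(f"~{approx_per_attack_type:,.0f} records per attack type")
```

This prints 161,013 attack records, a 65.07% / 34.93% split, and roughly 17,890 records per attack type.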
Three PCAP datasets:
- The CAIDA dataset contains anonymized traces from high-speed monitors on a commercial backbone link. Our subset is from the New York collector in March 2018. (A CAIDA account is required to download the data.)
- The DC dataset is a packet capture from the "UNI1" data center studied in the IMC 2010 paper.
- The CA dataset contains traces from the U.S. National CyberWatch Mid-Atlantic Collegiate Cyber Defense Competitions from March 2012.
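For readers who want to inspect the PCAP datasets without extra dependencies, the following is a minimal sketch of a reader for the classic libpcap file format using only the Python standard library. It assumes the files are classic `.pcap` (not pcapng); the function name `read_pcap` is hypothetical and not part of any tool shipped with these datasets.

```python
import struct
from typing import BinaryIO, Iterator, Tuple

# Magic numbers of the classic libpcap format (microsecond / nanosecond timestamps).
MAGIC_USEC = 0xA1B2C3D4
MAGIC_NSEC = 0xA1B23C4D


def read_pcap(f: BinaryIO) -> Iterator[Tuple[float, int, bytes]]:
    """Yield (timestamp_seconds, original_length, captured_bytes) per packet.

    Hypothetical helper; assumes classic libpcap format (pcapng is not handled).
    """
    header = f.read(24)  # global header: magic, version, zone, sigfigs, snaplen, linktype
    if len(header) < 24:
        raise ValueError("truncated global header")

    # The byte order of the file is revealed by how the magic number reads.
    for endian in ("<", ">"):
        (magic,) = struct.unpack(endian + "I", header[:4])
        if magic in (MAGIC_USEC, MAGIC_NSEC):
            break
    else:
        raise ValueError("not a classic pcap file (pcapng?)")

    ts_divisor = 1e6 if magic == MAGIC_USEC else 1e9
    record_fmt = endian + "IIII"  # ts_sec, ts_frac, incl_len, orig_len

    while True:
        rec = f.read(16)
        if not rec:
            return  # clean end of file
        if len(rec) < 16:
            raise ValueError("truncated packet record header")
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack(record_fmt, rec)
        data = f.read(incl_len)  # captured bytes, possibly shorter than orig_len
        if len(data) < incl_len:
            raise ValueError("truncated packet data")
        yield ts_sec + ts_frac / ts_divisor, orig_len, data
```

The function takes an open binary file object rather than a path, so a gzip-compressed trace can be read by passing `gzip.open(path, "rb")` instead of `open(path, "rb")`. Link-layer decoding (e.g., Ethernet vs. raw IP, given by the linktype field of the global header) is deliberately left out.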
Zeek: Zeek logs have the following schema: TBD
Wikipedia: The Wikipedia web page view logs have the following schema: TBD