Fuzz testing (fuzzing) allows developers to detect bugs and vulnerabilities in code by automatically generating defect-revealing inputs. Most fuzzers operate by generating inputs for applications and mutating the bytes of those inputs, guiding the fuzzing process with branch coverage feedback via instrumentation. Whitebox guidance (e.g., taint tracking or concolic execution) is sometimes integrated with coverage-guided fuzzing to help cover tricky-to-reach branches that are guarded by complex conditions (so-called "magic values"). This integration typically takes the form of a targeted input mutation, for example placing particular byte values at a specific offset of some input in order to cover a branch. However, these dynamic analysis techniques are not perfect in practice, which can result in the loss of important relationships between input bytes and branch predicates, thus reducing the effective power of the technique.
CONFETTI introduces a new, surprisingly simple, but effective technique, global hinting, which allows the fuzzer to insert these interesting bytes not only at a targeted position, but in any position of any input. We implemented this idea in Java, creating CONFETTI, which uses both targeted and global hints for fuzzing. In an empirical comparison with two baseline approaches, a state-of-the-art greybox Java fuzzer and a version of CONFETTI without global hinting, we found that CONFETTI covers more branches and finds 15 previously unreported bugs, including 9 that neither baseline could find.
CONFETTI is a research prototype, but nonetheless, we have had success applying it to fuzz the open-source projects Apache Ant, BCEL and Maven, Google's Closure Compiler, and Mozilla's Rhino engine.
We provide an artifact of our development and evaluation of CONFETTI that contains all of our code, scripts, dependencies and results in a Virtual Machine image, which we believe will provide a stable reference to allow others to be sure that they can make use of our tool and results in the future. However, we recognize that there is a significant tension between an artifact that is "reusable" and one that is stable. In the context of the rapidly-evolving field of fuzzers, "reusable" is likely best signified by a repository and set of continuous integration workflows that allow other researchers to fork our repository, develop new functionality, and automatically conduct an evaluation. For example, we found this artifact particularly useful when preparing a pull request that we provided to the upstream maintainers of the baseline fuzzer that we compared to, JQF.
A continuous integration artifact likely has an enormous number of external dependencies that are impossible to capture, such as the provisioning and configuration of the CI server itself. We make our "live" continuous integration artifact available on GitHub, and have permanently archived our CI workflow and its components on FigShare. The virtual machine image is less likely to be useful than our CI workflow for reusing CONFETTI, but it will be resilient to bitrot, since it is fully self-contained.
Reviewers of this artifact should:
- Have VirtualBox or VMWare available to run our VM. It requires 4 CPU cores, 32GB RAM, and the disk image is approximately 20 GB.
- We recommend using VirtualBox over VMWare. We used the base VM image distributed by the artifact evaluation chairs, which we found could be fickle with VMWare, although we were ultimately able to make it work with both.
- With either tool, reviewers need to create a new VM (Linux 64-bit) as specified above with 32GB RAM and 4 CPUs, and attach the provided VMDK as the first storage device (e.g. SATA port 0 on VirtualBox); a command-line sketch follows this list.
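For reviewers who prefer the command line, here is a minimal sketch of the VirtualBox setup described above. The VM name (`confetti-ae`) and the path to the VMDK are placeholders to adjust; the same configuration can of course be performed through the VirtualBox GUI.

```bash
# Hypothetical VirtualBox CLI setup; VM name and VMDK path are placeholders.
VBoxManage createvm --name confetti-ae --ostype Ubuntu_64 --register
VBoxManage modifyvm confetti-ae --memory 32768 --cpus 4        # 32GB RAM, 4 CPUs
VBoxManage storagectl confetti-ae --name "SATA" --add sata
VBoxManage storageattach confetti-ae --storagectl "SATA" --port 0 --device 0 \
  --type hdd --medium /path/to/confetti-artifact.vmdk          # attach as SATA port 0
VBoxManage startvm confetti-ae
```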
Ideally, reviewers might also check whether they can build and run CONFETTI directly on their local machines and run a short (3-5 minute) fuzzing campaign, to validate that this simpler development model is also possible. The requirements for running CONFETTI directly on a machine are:
- Mac OS X or Linux (we have tested extensively with Ubuntu; other distributions should also work, but may require manually installing the correct release of Z3 version 4.6 for the OS)
- Have Java 8 installed, with the `JAVA_HOME` environment variable configured to point to that Java 8 JDK
- Have Maven 3.x installed
- Have at least 16GB RAM
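As a quick sanity check before building locally, you can confirm that your toolchain matches these requirements. The `JAVA_HOME` path below is only an example (it is where the JDK lives in our artifact VM) and will differ on your machine.

```bash
# Example environment check before building CONFETTI locally.
# The JAVA_HOME path is illustrative; point it at your own Java 8 JDK.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
java -version    # should report a 1.8.x JDK
mvn -version     # should report Maven 3.x, using the Java 8 JDK above
```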
This document is a snapshot of a living artifact.
You are most likely reading this document in our GitHub repo, neu-se/confetti-artifact, but this document and the entire repository also exist within a Virtual Machine that is packed with all of the software and dependencies that we used to perform our evaluation, along with copies of all of the primary data that we collected and the intermediate data that we processed.
If you are ambitious, you are reading this document inside that VM, which contains a clone of the GitHub repository, and are ready to poke around in it.
If you need to update this VM, you can simply run `git pull` in this directory.
When we make a new release of the artifact, we publish a new VM to FigShare, ensuring that this entire process remains reproducible without any external dependencies.
Our experimental results require an enormous quantity of computational resources to generate: there are 20 trials each of three fuzzers on 5 target applications, where each trial takes 24 hours. We do not expect reviewers to repeat this entire evaluation.
For each of the tables and figures that appear in the paper, there are generally 2-3 scripts that get run:
- A potentially very long-running (24+ hour) script that collects some primary data
- A shorter, but still long-running (10 minutes - 2 hours) script that processes the primary data into an intermediate representation
- A very fast-running script that processes the intermediate data into the final results that appear in the paper
A downside to having multiple scripts is that there is no single script for you to run to redo the entire experiment. However, given the CPU time needed for gathering primary data, we think that the extra steps are well worth the benefits for ease of use, debugging, and extension. We provide instructions to allow reviewers to process the same primary data that we used for our ICSE 22 paper (24 hours, 5 target apps, three fuzzers, 20 trials) through our data pipeline in order to confirm that the tables and graphs in the paper can be reproduced from this data. We also provide instructions to allow reviewers to conduct a much shorter evaluation (10 minutes, 5 target apps, three fuzzers, 1 trial), and then to process all of that data through the same scripts to generate tables and graphs.
We would suggest the following path through this artifact to evaluate it:
- Follow the steps under Producing the tables and graphs from the results. In each subsection, look for the "pre-bake" icon π, which will draw your attention to where to start to generate the tables and graphs using our previously-computed intermediate results. This should allow you to traverse this entire document without needing to wait for any long-running computation. You can validate that the results match (or nearly match; some parts remain non-deterministic due to timing, and expected deviations are noted where applicable). For convenience, the pre-bake results are all included in our artifact VM and available directly on FigShare.
- Run a fuzzing campaign, starting from Running a fuzzing campaign in the artifact. The shortest campaign that will generate sufficient data to be used by the rest of the data processing pipeline will run all 3 fuzzers on all 5 benchmarks for 10 minutes each, with no repeated trials. This should take a bit under 3 hours to run. Our instructions show how to run the exact configuration that we performed in the paper and included the pre-bake π results for (24 hours x 20 trials x 3 fuzzers x 5 benchmarks), but look for the π three o'clock symbol π, which will draw your attention to specific instructions for running a shorter experiment.
- Browse our "live" development artifact: a Continuous Integration workflow that uses our institution's HPC resources to execute short (10 minutes x 5 trials x 5 benchmarks) evaluations on each revision of our repository, and optionally full-scale (24 hour x 20 trials x 5 benchmarks) fuzzing campaigns. We would be happy to trigger this GitHub Action to run on our CI workers if you submit a pull request.
In our ICSE 2022 paper, we evaluated JQF-Zest, CONFETTI, and a variant of CONFETTI with global hints disabled.
This section describes how to reproduce those experiments, and includes pointers to the logs and results produced from our evaluation.
Note that we imagine that for future development, it will be far easier to use the Continuous Integration process described in the prior section to conduct performance evaluations.
However, we did not implement that process until after submitting the ICSE 2022 paper, and hence, with the goal of reproducibility, describe the exact steps to reproduce those results.
We executed the evaluation reported in the paper on Amazon's EC2 service, using `r5.xlarge` instances, each with 32GB of RAM and 4 CPUs.
We executed our experiments in parallel by launching one VM for each run (e.g. 20 trials x 3 fuzzers x 5 benchmarks = 300 VMs for 24 hours), where each VM had the same configuration as this artifact VM.
We then collected the results from each of those VMs; all of these results are included inside of this artifact and are also directly available for download on our FigShare artifact.
We provide an Ubuntu 20 VM that contains the exact same versions of all packages that we used in our paper evaluation.
The username and password to login to this VM are both `icse22ae`, and it has an SSH server running on port 22.
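If you configure the VM's network so that it is reachable from your host (for example, through a NAT port-forwarding rule or a bridged adapter), you can connect over SSH; the address below is a placeholder for whatever address or forwarded port you configure.

```bash
# Placeholder address: substitute the VM's IP or a forwarded host port.
ssh icse22ae@192.168.56.101     # password: icse22ae
```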
We provide a brief overview of the software contained in the artifact to help future researchers who may want to modify CONFETTI or any of its key dependencies. We expect that this use-case (modifying the code, recompiling, and running it) will be best supported by our Continuous Integration artifact described above, but the VM provides the most resilience to bitrot, as it includes all external dependencies and can be executed without being connected to the internet.
The artifact VM contains a suitable JVM, OpenJDK 1.8.0_312, installed to `/usr/lib/jvm/java-8-openjdk-amd64/`. The CONFETTI artifact is located in `/home/icse22ae/confetti-artifact`, and contains compiled versions of all dependencies. The artifact directory contains scripts to run the evaluation, and we include the source code of all of CONFETTI's key components, which can be modified and built without connecting to the internet to fetch any additional dependencies.
The key software artifacts are located in the `software` directory of the artifact:
- `jqf`: CONFETTI (named `jqf` for historical purposes), specifically neu-se/confetti@icse-22-evaluation - the revision of CONFETTI that we evaluated
- `jqf-vanilla`: The baseline version of JQF we compared to, specifically neu-se/jqf-non-colliding-coverage@jqf-1.1-with-non-colliding-coverage. See the discussion of patches we wrote for JQF below.
- `knarr`: gmu-swe/knarr@icse-22-confetti-evaluation - the constraint tracking runtime used by CONFETTI
- `green`: gmu-swe/green-solver
- `jacoco-fix-exception-after-branch`: neu-se/jacoco@fix-exception-after-branch - the patched version of JaCoCo that we used to collect coverage. We found that JaCoCo would not record a branch edge as covered if it was covered and an exception was thrown immediately afterwards. This complicated debugging and analysis of the JaCoCo HTML output reports; this branch has that bug fixed, and it is this version of JaCoCo that is included in the artifact and in the `software/jqf/jacoco-jars` directory.
- `software/z3`: Binaries from Z3Prover/z3, release version 4.6.0, of the `x64-ubuntu-16.04` flavor. This is the version of Z3 that we used in our evaluation.
We also include all of the dependencies for all of the fuzzing targets that we studied.
We do not exhaustively document their contents, but they can be found in the `software/` directory.
The scripts to run our experiments apply Knarr's instrumentation to each of those dependencies, producing the `*-inst` directories in the `software` directory.
We cannot imagine circumstances in which it would be necessary to re-instrument those dependencies, but if needed, this can be accomplished with the `scripts/build/instrument-experiments.sh` script.
Expected Errors: The script that instruments class files may output a variety of `FileNotFoundException`s, `NullPointerException`s, and `ClassCastException`s - these can be ignored.
Other software installed in the VM to support running the experiment scripts includes:
- SSH server: we find it easiest to run VSCode outside of the VM, and use the "connect to remote" feature to connect your local VSCode instance to the artifact
- R: Plots and tables are generated using R. Installed packages include `readr, tidyr, plyr, ggplot2, xtable, viridis, fs, forcats`
- PHP: Some of our experiment scripts are written in PHP. We promise to stop using PHP for scripting after this project :)
All commands below should be executed in the `confetti-artifact` directory in the artifact.
Since CONFETTI depends so heavily on the projects `knarr` and `green`, we include the source code for those projects in this artifact as well, so that future researchers who would like to modify those dependencies and rebuild CONFETTI will always have access to them.
If you would like to confirm that CONFETTI and its dependencies can be re-compiled in an offline mode (with no network connectivity), you may follow these steps:
- `green`: In the directory `software/green/green`, run `ant clean install`
- `knarr`: In the directory `software/knarr`, run `mvn -o install`
- `jqf` (CONFETTI): In the directory `software/jqf`, run `mvn -o install`
- `jqf-vanilla`: In the directory `software/jqf-vanilla`, run `mvn -o install`
- `jacoco-fix-exception-after-branch`: In the directory `software/jacoco-fix-exception-after-branch`, run `mvn -o -DskipTests install` (our patch broke several brittle tests; we manually confirmed the correct behavior but haven't repaired the tests)
If you would like to clean any of the Maven projects so that you can build them "from scratch," you may do so using the command `mvn clean`.
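The per-project commands above can also be chained into a single offline rebuild. The following is just a convenience sketch of the steps already listed, run from the root of the artifact directory in dependency order:

```bash
# Rebuild CONFETTI and its dependencies offline, in dependency order.
(cd software/green/green && ant clean install)
(cd software/knarr && mvn -o install)
(cd software/jqf && mvn -o install)
(cd software/jqf-vanilla && mvn -o install)
(cd software/jacoco-fix-exception-after-branch && mvn -o -DskipTests install)
```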
Our artifact supports running the fuzzer in two ways: interactively (run one experiment at a time, process results manually), and headless (run a complete suite of experiments, prepare results for automated analysis). If you are an ICSE 2022 artifact evaluator whose goal is to reproduce the complete suite of experiments, you should proceed directly to the "Running a Headless Experiment" section. If you are a researcher who is trying to actually reuse our tool and build on it, you may find the "Interactive" documentation more useful, and hence we provide both here.
To run a single fuzzing experiment in the artifact, use the script `scripts/runExpInScreen.sh`, which takes a single parameter: the experiment to run.
After several seconds, you will see a screen open that is labeled "Zest: Validity Fuzzing with Parametric Generators", and displaying various live-updating statistics of the fuzzing campaign.
This script will run the specified experiment with a timeout of 24 hours; if you would like it to terminate sooner, you can end it by typing Control-C.
The experiment name is the combination of the target application to fuzz with the fuzzer to evaluate; see the example below. The target application names are `ant`, `bcelgen`, `closure`, `maven`, and `rhino`. The fuzzers to evaluate are `knarr-z3`, `knarr-z3-no-global-hint`, and `jqf`. Within this artifact, `knarr-z3` stands in for the name CONFETTI, and `knarr-z3-no-global-hint` stands in for CONFETTI-NoGlobalHint (it is perhaps not unusual for the name of a paper to be decided at the last minute prior to submission, and we include here the scripts that we used to prepare the results in the paper, before that final name change).
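As an illustration, a single interactive run might look like the following. We are assuming here that the experiment name is formed by joining the target and fuzzer names with a dash; check the scripts directory for the exact names accepted by your checkout.

```bash
# Assumed experiment-name format: <target>-<fuzzer>; adjust if your checkout differs.
cd ~/confetti-artifact
./scripts/runExpInScreen.sh closure-knarr-z3   # fuzz Closure with CONFETTI (a.k.a. knarr-z3)
# The campaign runs for up to 24 hours; press Ctrl-C inside the screen to stop it early.
```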
We have also included a script, `scripts/runOneExperiment.php`, that we used to automate running a fuzzing experiment in a "headless" mode, where the experiment runs for 24 hours, then copies the results to an Amazon S3 bucket, and then shuts down the VM. This is the exact script that we used to run our experiment on EC2. There is additional configuration necessary to provision an S3 bucket for use with the script; if a reviewer is familiar with S3 already then the configuration should be fairly self explanatory, but providing detailed instructions to provision a large-scale experiment is a non-goal for this artifact.
π Pre-bake available π The results presented in our paper are the result of running each of these experiments 20 times for 24 hours each. We include the raw results produced by running our `scripts/runOneExperiment.php` script in the directory `icse_22_fuzz_output`. You can also download these results directly from our FigShare artifact; they are included in the archive `fuzz_output.tgz`. In these result files, note that the name "Knarr-z3" is used in place of "CONFETTI" and "Knarr-z3-no-global-hint" in place of "CONFETTI no global hints" - in our early experiments we also considered a variety of other system designs, and Knarr-z3 was the design that eventually evolved into CONFETTI.
π Shorter run option π The smallest experiment that will generate any meaningful results requires ~3 hours to run, and will execute 1 trial of each fuzzer on each fuzzing target, for 10 minutes each. You can run this shorter trial, and then use these results for the data processing pipelines to generate the tables and graphs. To run this experiment, run the command `./scripts/runSmokeTest.sh`; a sketch is shown below. The results will be output to the directory `local_eval_output`. For other durations, you can edit the timeout in `runOneSmokeTest.sh` - it is specified in seconds via the `DURATION` variable.
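A minimal sketch of this shorter run; the comments restate the duration knob described above.

```bash
cd ~/confetti-artifact
./scripts/runSmokeTest.sh    # ~3 hours: 1 trial x 3 fuzzers x 5 targets x 10 minutes each
# Results are written to local_eval_output/.
# To change the per-trial duration, edit the DURATION variable (in seconds)
# in runOneSmokeTest.sh before starting the run.
```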
We saved a copy of the output of a successful run of this script to `tool_output/runSmokeTest.sh.out`, and the resulting fuzzing results to `prebake_shorter_fuzz_output`.
Configuration Notes: These script instructions, by default, process the primary data that we collected for our ICSE 2022 paper, which is stored in this artifact in the `icse_22_fuzz_output` directory. If you are following the π Shorter run option π, the correct directory to specify is `local_eval_output`. The fastest way to run these scripts in their entirety is to use the π pre-baked π short run results (1 trial of each fuzzer for 10 minutes each, pre-collected), which are in the directory `prebake_shorter_fuzz_output`.
The left side of this table (branch coverage) is built by using the script `scripts/reproCorpusAndGetJacocoTGZ.php`.
This script takes as input the tgz archives of each of the results directories produced from the fuzzing campaign (e.g. the files in `icse_22_fuzz_output`) and automates the procedure of collecting branch coverage using JaCoCo.
To execute the script, run `php scripts/reproCorpusAndGetJacocoTGZ.php icse_22_fuzz_output`. In our experience, this script can take several hours to run. Note that due to non-determinism, the exact number of branches covered might vary by one or two on repeated runs.
π Pre-bake available π The output of our run of this script is in `tool_output/reproCorpusAndGetJacocoTGZ.txt`; if you do not have hours to wait for the results, consider inspecting this file directly. You might also run the script, which will print out results as it goes, confirm that the first few numbers look OK, and then terminate the script early before it computes the rest.
π Shorter run option π Collecting JaCoCo coverage from the 10 minute trials takes approximately 5-10 minutes. To collect the coverage, run `php scripts/reproCorpusAndGetJacocoTGZ.php local_eval_output`.
The right side of this table (bugs found) is built by manually inspecting the failures detected by each fuzzer, de-duplicating them, and reporting them to developers.
The failures are collected from the `fuzz_output` directory and processed by a de-duplicating script.
Our de-duplicating script uses a stacktrace heuristic to de-duplicate bugs. CONFETTI itself has some de-duplication features within its source code, but JQF+Zest has minimal de-duplication, resulting in many of the same issues being saved.
Our simple heuristic is effective at de-duplicating bugs (particularly in the case of JQF+Zest and Closure, which de-duplicates thousands of failures to single digits).
However, some manual analysis is still needed, as a shortcoming of a stack analysis heuristic is that two crashes may share the same root cause, despite manifesting in different ways.
Once you have a fuzzing corpus (e.g. from a local run that you completed, or using the π pre-bake results π), you may perform the de-duplication by running `scripts/unique.py` as follows: `python3 scripts/unique.py fuzzOutputDir outputDirectory`. For example, to analyze the fuzzing corpus that we reported on in our ICSE 22 paper and save the output to `bugs`, run the command `python3 scripts/unique.py icse_22_fuzz_output bugs`.
The failures within the tarball will be de-duplicated, and the script will create within the `bugs` directory a directory hierarchy corresponding to the target+fuzzer, the bug class, and the trials which found that bug.
The de-duplication script will also print the number of unique bugs (according to our heuristic) that were found for each target+fuzzer configuration.
Please keep in mind that running the de-duplication script could take several hours, as there are thousands of failures per run (particularly in Closure and Rhino) that require de-duplication.
We conducted manual analysis by examining the output directories from this script to determine if the unique bugs were or were not attributed to the same root cause.
The result of the manual analysis is shown in Tables 1 and 2 in the paper.
π Pre-bake available π The entire de-duplication script will take several hours to run. However, we have included a pre-run output directory located at `prebake_icse_22_bugs`. This directory is organized by fuzzer+target, with subdirectories of failure hashes that the de-duplication script deemed to be unique. This directory is what we based our manual analysis upon.
π Shorter run option π The de-duplicating script finishes in a matter of seconds on the 10 minute experiment; you can run it by passing either `prebake_shorter_fuzz_output` to use our π pre-bake 3-hour results π, or `local_eval_output` if you ran your own campaign. An example invocation is shown below.
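For example, a run over our pre-baked shorter results might look like the following; the output directory name `shorter_bugs` is just an arbitrary choice.

```bash
# De-duplicate failures from the pre-baked 10-minute campaign results.
python3 scripts/unique.py prebake_shorter_fuzz_output shorter_bugs
# Or, if you ran your own short campaign:
python3 scripts/unique.py local_eval_output shorter_bugs
```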
Note that some issues remain open as of the publication of this artifact. Any developer feedback indicating that bugs share the same root cause will be consolidated in the camera-ready version of the paper.
These graphs are generated in two steps (an end-to-end example is shown after this list):
- Generate CSV files that contain coverage over time for each fuzzing campaign. Run the script `php scripts/extract-coverage.php icse_22_fuzz_output generated-coverage`. The first argument can be changed to point to a different set of primary data (e.g. `local_eval_output`, or `prebake_shorter_fuzz_output`), and the second argument can be changed to put the intermediate output data somewhere else (it is used in the next step). This script may take 30-45 minutes to run, as it needs to extract and process many large files: the fuzzer that we built atop logs statistics every 300 milliseconds, which adds up to quite a bit of data for these 24-hour runs. This script downsamples the data to a one-minute granularity.
  - π Pre-bake available π You can also skip directly to step 2: the VM is distributed with these files in place, in `prebake_icse_22_generated_coverage`.
- Build the actual plots, using R: run `Rscript scripts/graphCoverage-fig2.R directoryGeneratedByStep1` (e.g. `Rscript scripts/graphCoverage-fig2.R prebake_icse_22_generated_coverage`, or `generated-coverage` if you re-computed this intermediate data). You can disregard the warning messages. 5 PDFs will be output to the current directory: `(ant,bcelgen,closure,maven,rhino)_branches_over_time.pdf`
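Putting the two steps together, an end-to-end run over the pre-baked shorter results might look like the following; `generated-coverage` is just the intermediate-output directory name used in the example above.

```bash
# Step 1: extract per-minute coverage CSVs from the raw fuzzer output.
php scripts/extract-coverage.php prebake_shorter_fuzz_output generated-coverage
# Step 2: plot branch coverage over time; writes 5 PDFs to the current directory.
Rscript scripts/graphCoverage-fig2.R generated-coverage
```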
Noted divergence between script and submitted paper, correction in camera-ready: The submitted paper mistakenly reports the bands on the graphs as a confidence interval. They are not; they are in fact the range between the minimum and maximum values. We have updated this in the text for the camera ready, and do not believe it impacts any conclusions that one would have drawn from the figure, but note this distinction for the particular reviewer who compared the contents of this script with the text in the paper.
This table is built based on the manual analysis of figures discussed above in the context of Table 1. A more detailed description of the bugs, along with a link to their respective issue tracker (where applicable for newly discovered bugs), is included in the table below.
In order to properly compare against the state-of-the-art (JQF+Zest), we elected to test against the same version of each target that the JQF+Zest authors did, which was an earlier version than the most current release of the respective software at the time of publication. Because of this, some newly discovered bugs (N-days) could not be replicated in the latest release of the respective target and were not reported to developers. However, all stacktraces are included in this artifact for completeness (as discussed in the Table 1 section above).
The table below identifies the bugs reported in the accepted paper, along with a single stack hash that is representative of the bug. In performing manual analysis, we may examine more stack hashes that correspond to the same bug, but those are not included here for space purposes.
| Bug ID (Hash) | Target | Description | Status / Issue Tracker Link |
|---|---|---|---|
A1 (6600cedc7eb64fc6879a15faca97b32c) | Apache Ant | java.lang.IllegalStateException | Previously discovered by JQF+Zest |
B1 (228be5ecff51c283cc167805f694d387) | Apache BCEL | org.apache.bcel.classfile.ClassFormatException | Previously discovered by JQF+Zest |
B2 (05cfe6a815f95a83689b309af357d9c9) | Apache BCEL | org.apache.bcel.verifier.exc.AssertionViolatedException | Previously discovered by JQF+Zest |
B3 (161a80fb5486d94686b91c9466f45be5) | Apache BCEL | java.lang.IllegalArgumentException | Open Issue: https://issues.apache.org/jira/projects/BCEL/issues/BCEL-358 |
B4 (52b77487a6d3445ed39a83d178e71349) | Apache BCEL | org.apache.bcel.verifier.exc.AssertionViolatedException | Unreported, could not replicate in latest version |
B5 (a0751e05c14ca8e3b181fa82e12d229a) | Apache BCEL | java.lang.StringIndexOutOfBoundsException | Open Issue: https://issues.apache.org/jira/browse/BCEL-357 |
B6 (2d262e47a1deecbb22dce9acb1b1932d) | Apache BCEL | org.apache.bcel.generic.ClassGenException | Open Issue: https://issues.apache.org/jira/browse/BCEL-359 |
C1 (d41d8cd98f00b204e9800998ecf8427e) | Google Closure | java.lang.NullPointerException | Previously discovered by JQF+Zest |
C2 (22509d2bf3b7799b06bbece2554dc1b5) | Google Closure | java.lang.NullPointerException | Previously discovered by JQF+Zest |
C3 (27accb608f215ec58570a4aeca21713f) | Google Closure | java.lang.NullPointerException | Previously discovered by JQF+Zest |
C4 (4489721f785b85b2cd8148c2faae86a1) | Google Closure | java.lang.NullPointerException | Open Issue: google/closure-compiler#3375 |
C5 (2a7803b9e720e63eca467d0f67ce7910) | Google Closure | java.lang.NullPointerException | Closed (fixed) Issue: google/closure-compiler#3455 |
C6 (36acef94490382daf0595a3bb151124b) | Google Closure | java.lang.IllegalArgumentException | Unreported, could not replicate in latest version |
C7 (51dda6312e972707672ea52006cf7640) | Google Closure | java.lang.RuntimeException | Acknowledged Issue: google/closure-compiler#3591 |
C8 (53b44a96971cecd5cc18a144523acae1) | Google Closure | java.lang.NullPointerException | Acknowledged Issue: google/closure-compiler#3861 |
C9 (575e7ba8cf7d3a7812365d3b7e854882) | Google Closure | java.lang.IllegalStateException | Previously discovered by JQF+Zest |
C10 (73b8d23db921a409a09ab8f2b839d635) | Google Closure | java.lang.RuntimeException | Unreported, could not replicate in latest version |
C11 (ddb873152144d011c318bc4f876a650c) | Google Closure | java.lang.IllegalStateException | Closed Issue: google/closure-compiler#3857 |
C12 (ee4370e493e3adbe1b05a9615a5d2729) | Google Closure | java.lang.IllegalStateException | Closed Issue: google/closure-compiler#3859 (also google/closure-compiler#3860, google/closure-compiler#3858 ) |
C13 (f1ee694900f3f31a609f615b6e68c98f) | Google Closure | java.lang.IllegalStateException | Closed Issue: google/closure-compiler#3380 |
C14 (fb22e00f7355fec154b2a00d7ac5eb0d) | Google Closure | java.lang.IllegalStateException | Unreported, could not replicate in latest version |
C15 (b3b8932ce900301a9f0c268b87425ff6) | Google Closure | java.lang.IllegalStateException | Unreported, could not replicate in latest version |
C16 (803040f70ae852d02ae45301eaabdeb2 ) | Google Closure | java.lang.IllegalStateException | Unreported, could not replicate in latest version |
R1 (3d8d85967d89cdc7c737c19f94a996b5) | Mozilla Rhino | java.lang.ClassCastException | Previously discovered by JQF+Zest |
R2 (46a11d272584ad7579743db21b6eee33) | Mozilla Rhino | java.lang.IllegalStateException | Previously discovered by JQF+Zest |
R3 (6f067bbf5bdddb1c8a7b06ae252868e5) | Mozilla Rhino | java.lang.VerifyError | Previously discovered by JQF+Zest |
R4 (d41d8cd98f00b204e9800998ecf8427e) | Mozilla Rhino | java.lang.NullPointerException | Previously discovered by JQF+Zest |
R5 (0ad53ac094b70740424ca6e3f326f086) | Mozilla Rhino | java.lang.ArrayIndexOutOfBoundsException | Previously discovered by JQF+Zest |
Table 3: Inputs generated by mutation strategy and Table 4: Analysis of all saved inputs with global hints
These two tables are generated by a single R script, `scripts/tabularize-forensics-tables3and4.R`, but there are several steps needed to generate the intermediate data, as described below for Tables 3 and 4.
The usage of this Rscript is: `Rscript scripts/tabularize-forensics-tables3and4.R fuzzStatsCSVFile forensicsOutputDir`, where `fuzzStatsCSVFile` is the name of the file output by `extract-last-line-of-fuzz-stats.php` (Table 3 info below), and `forensicsOutputDir` is the output of `collectExtendedHintInfo.php` (Table 4 info below).
π Pre-bake available π You can run this script directly with the command `Rscript scripts/tabularize-forensics-tables3and4.R prebake_icse_22_fuzz_stats.csv prebake_icse_22_forensics` to use the exact same intermediate results that we used for our paper (LaTeX tables will be output to stdout), or follow the instructions below to re-generate all of the data from an entirely new fuzzing campaign:
Table 3 needs the collected statistics from each fuzzing run's `plot_data` file. Run the script `scripts/extract-last-line-of-fuzz-stats.php fuzzOutputDir outputFilename`, where `fuzzOutputDir` is the collected fuzzing results (e.g. `icse_22_fuzz_output`, `local_eval_output`, or `prebake_shorter_fuzz_output`), and `outputFilename` is a name you choose; this file becomes the `fuzzStatsCSVFile` argument above.
For example, to process the ICSE 22 results, run `php scripts/extract-last-line-of-fuzz-stats.php icse_22_fuzz_output generatedFuzzStats.csv`. This is expected to take 5-10 minutes, depending on the speed of your machine: it needs to process all of the big `.tgz` files in the `icse_22_fuzz_output` directory.
This table presents the results of an experiment to attempt to reproduce each of the inputs that CONFETTI generated that had been interesting at the time that they were generated (that is, running the input resulted in new branch probes being covered), but without using the global hints. This experiment is very time-intensive, and we estimate that it takes approximately 5-10 days to run (we did not record the exact duration of the experiment since timing information was not relevant to the RQ).
This experiment takes as input a fuzzing corpus (the inputs saved by the fuzzer), and outputs a `.forensics-1k.csv` file.
To run this experiment, run the command `php scripts/collectExtendedHintInfo.php fuzzOutputDir forensicsOutputDir`, following the same conventions from the above scripts for setting `fuzzOutputDir`. There may be a considerable amount of output from this script. π Pre-bake available π The forensics files generated from our ICSE 22 experiment are in the `prebake_icse_22_forensics` directory.
π Shorter run option π Running this experiment on the 10 minute experiment dataset takes just a few minutes. To run it, execute the command `php scripts/collectExtendedHintInfo.php local_eval_output shorter_forensics_output` (use `prebake_shorter_fuzz_output` along with `prebake_shorter_forensics_output` if you did not spend the 3 hours to generate `local_eval_output`). A complete example of the Table 3/4 pipeline is shown below.
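For reference, the full Table 3/4 pipeline over the shorter dataset chains the three scripts described above; the directory and file names other than the scripts themselves are just example choices.

```bash
# Table 3 input: last-line fuzz statistics from each run.
php scripts/extract-last-line-of-fuzz-stats.php prebake_shorter_fuzz_output shorterFuzzStats.csv
# Table 4 input: forensics data from re-running saved inputs without global hints.
php scripts/collectExtendedHintInfo.php prebake_shorter_fuzz_output shorter_forensics_output
# Emit the LaTeX for Tables 3 and 4 to stdout.
Rscript scripts/tabularize-forensics-tables3and4.R shorterFuzzStats.csv shorter_forensics_output
```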
To support our continued maintenance of CONFETTI and to make it easy for us and others to execute performance evaluations of CONFETTI, we have designed a GitHub Actions workflow that automatically executes the entire evaluation that is described in this artifact. Frankly, we do not know what role such an artifact should play in artifact evaluation: the continuous integration workflow certainly makes CONFETTI easier to reuse, but it also includes significant coupling to GitHub Actions and our HPC cluster, which might make it more difficult for future researchers to use should either of those two resources disappear.
We hope that this aspect of our artifact will be most useful in the immediate future, and provide our VM to support long-term replicability.
For example, we found this workflow extremely useful for preparing the final pull request that we made to the JQF maintainers to resolve the performance issues that are discussed in section 5 (lines 1021-1026), as it was necessary to compare several design alternatives to find the best performing solution. You can find several such reports linked on that pull request, or view one of the most recent reports. This report includes a comparison of two branches of JQF (`fast-collision-free-coverage` and `reporting-ci`), where `fast-collision-free-coverage` (`d4bdc3`) includes our performance fixes, and `reporting-ci` is the baseline version of JQF (modified only to be compatible with our CI infrastructure).
The results of this workflow can be found on our neu-se/CONFETTI GitHub repository. Our template workflow defines the steps to conduct the evaluation, which is parameterized over the number of trials to conduct, the duration of each campaign, and the list of branches to include as comparisons in the final report. The workflow consists of the following jobs:
- `build-matrix` - Creates a JSON build matrix that enumerates all of the fuzzing tasks to run, one for each desired repetition of each fuzz target
- `run-fuzzer` - For each trial defined by `build-matrix`, `run-fuzzer` will run the fuzzer and archive the results. If provided with a server address and access token, the `run-fuzzer` task will also start a telegraf monitoring agent, which will stream statistics from the machine running the fuzzer to a central database. We found this monitoring to be extremely useful to, for example, monitor overall machine memory usage and visualize the aggregate performance of each fuzzing run while it was underway.
- `repro-jacoco` - Collects all of the results from each of the fuzzing runs, and reproduces the entire fuzzing corpus with JaCoCo instrumentation in order to collect final branch coverage results
- `build-site` - Builds an HTML and an MD report using the jon-bell/fuzzing-build-site-action
We are happy to execute this workflow on our infrastructure for researchers who make pull requests on CONFETTI, and we are also excited to work with maintainers of other tools (like rohanpadhye's JQF) to bring continuous evaluation workflows into the wider community and develop best practices for their design and maintenance.
CONFETTI can also be built and run outside of this artifact VM. The README in the CONFETTI git repo explains how. We have also archived this git repository directly in our FigShare artifact to ensure long-term availability.
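A rough sketch of that workflow, assuming a machine that already satisfies the requirements listed earlier (Java 8 with `JAVA_HOME` set, Maven 3.x); consult the repository README for the authoritative steps, since the build command here is only our assumption.

```bash
# Sketch only: the build command is assumed; see the CONFETTI README for the real steps.
git clone https://github.com/neu-se/CONFETTI.git
cd CONFETTI
mvn install
```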
Please feel free to open an issue on GitHub if you run into any issues with CONFETTI. For other matters, please direct your emails to Jonathan Bell.
Cite this artifact as: Kukucka, James; Ganchinho de Pina, Luis Gabriel; Ammann, Paul; Bell, Jonathan (2022): CONFETTI: Amplifying Concolic Guidance for Fuzzers. figshare. Software. https://doi.org/10.6084/m9.figshare.16563776.
Or, in BibTex:
@misc{confettiArtifact,
title={{CONFETTI}: Amplifying Concolic Guidance for Fuzzers},
url={https://figshare.com/articles/software/CONFETTI_Amplifying_Concolic_Guidance_for_Fuzzers/16563776},
DOI={10.6084/m9.figshare.16563776},
publisher={figshare},
author={Kukucka, James and Ganchinho de Pina, Luis Gabriel and Ammann, Paul and Bell, Jonathan},
year={2022},
month={Jan}
}
CONFETTI is released under the BSD 2-clause license.