Mission Statement

Wayne Moses Burke edited this page Feb 1, 2018 · 1 revision

What does DRAT stand for?

Distributed Release Audit Tool. Built on the shoulders of Apache Creadur's Release Audit Tool (RAT), this project aims to scale license checks out to large code bases.

What does DRAT want?

The Distributed Release Audit Tool (DRAT) improves on the Apache RAT code audit tool in several ways. RAT is a command-line tool, Java API, and Maven plugin that audits a code base against its declared OSS license: if you say it's Apache2, RAT checks whether your source is Apache2 and produces a report stating which files are and aren't, and why. RAT has several problems, namely:

  • It doesn't scale to large code bases: one run on a code base of 25k files and 10M LOC took ~4 weeks on a normal Linux server with 5GB of memory, plenty of disk space, and modern CPUs.
  • RAT's crawler is rudimentary: you have to supply explicit white/black lists of files to avoid, or else it will check binary files for licenses.
  • RAT doesn't produce incremental output. It either completes and generates a log, or it doesn't.

DRAT addresses all of the above concerns. It is a MapReduce version of RAT that uses Apache Tika to automatically sort and classify the files in the code base, Apache OODT to index metadata and Tika information about those files into Apache Solr, and OODT again to produce a MapReduce workflow. That workflow runs RAT incrementally on k-sized chunks of files that share the same MIME type (as detected by Tika), produces incremental per-type logs, and finally aggregates and reduces them into a combined log.
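
The chunk-then-reduce idea can be illustrated with a minimal Python sketch. This is not DRAT's implementation: the standard-library `mimetypes` module stands in for Tika's detection, the `has_license` callback stands in for a RAT run, and every function name here is hypothetical.

```python
import mimetypes
from collections import defaultdict
from itertools import islice


def group_by_mime(paths):
    """Group file paths by guessed MIME type (stand-in for Tika detection)."""
    groups = defaultdict(list)
    for path in paths:
        mime, _ = mimetypes.guess_type(path)
        groups[mime or "application/octet-stream"].append(path)
    return dict(groups)


def chunks(items, k):
    """Yield successive lists of at most k items."""
    it = iter(items)
    while True:
        batch = list(islice(it, k))
        if not batch:
            return
        yield batch


def audit_chunk(mime, batch, has_license):
    """Map step: audit one chunk and return a partial, per-type log."""
    approved = sum(1 for path in batch if has_license(path))
    return {"mime": mime, "approved": approved,
            "unapproved": len(batch) - approved}


def reduce_logs(partial_logs):
    """Reduce step: aggregate the partial logs into one combined log."""
    combined = {"approved": 0, "unapproved": 0}
    for log in partial_logs:
        combined["approved"] += log["approved"]
        combined["unapproved"] += log["unapproved"]
    return combined


def run_audit(paths, k, has_license):
    """Audit paths in k-sized, same-MIME-type chunks, then combine."""
    partial_logs = []
    for mime, files in group_by_mime(paths).items():
        for batch in chunks(files, k):
            # Each partial log is available as soon as its chunk finishes.
            partial_logs.append(audit_chunk(mime, batch, has_license))
    return partial_logs, reduce_logs(partial_logs)
```

Because each chunk yields its own partial log, a failed or interrupted run still leaves usable output for the chunks that finished, which is the property RAT lacks according to the list above.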

What's the status of the project?

As of September 2017, the project was granted top-level status after a period of development on GitHub.
