Skip to content

Latest commit

 

History

History
37 lines (31 loc) · 2.09 KB

README.md

File metadata and controls

37 lines (31 loc) · 2.09 KB

artemisia

Novel test for randomness based on digraphs.

Imagine a directed graph with 256 nodes and 256 edges, where a single arc points from each node to another node based on the value of a byte. If the graph is determined by 256 random bytes it might looks like this, like plants from the artemisia family.

2^8 node digraph

Many of the nodes have no incoming arcs (leaves), we say indegree=0. Many more have indegree=1, fewer with indegree=2 & 3, and there are three examples with indegree=4. Unlike plants, the digraph has cycles, including one with length=1. We can characterize the graph by the distribution of indegree values for all its nodes, and posit that graphs determined by randomness will have similar distributions. The principle extends well for graphs with billions of nodes, and serves to test large quantities of data.

With that motivation, here is the help for utility artemisia

Usage: artemisia8, artemisia16, artemisia24, artemisia32, artemisia <n>

Tests random data on stdin by constructing a directed graph with 2^n nodes
and 2^n edges. Each node has outdegree=1 and its target is determined by the
data. For random data, the nodes are observed to have indegree following a
distribution with
    P(indegree = d) = 1 / (e d!)
for 2^n large. Indegree is efficiently counted up to 3 for each node, expecting
    Pe0  = 1 / e        ~ 0.3679
    Pe1  = 1 / e        ~ 0.3679
    Pe2  = 1 / 2e       ~ 0.1839
    Pe3+ = 1 - (5 / 2e) ~ 0.0803
Four different bit lengths n are supported
    n  2^n            data required
    =  ===            =============
    8  256            256 B
    16 65,536         128 KB
    24 16,777,216     48 MB
    32 4,294,967,296  16 GB
Indegree counting is capped at 3 so that only two bits are used for each node,
allowing the 32-bit case to use just 1GB of memory. Once the required data is
read, the count for each expected value is totalled, and the totals compared
with the expected values using Pearsons's Chi Squared test. The p-value is
reported, and a test pass or fail is indicated at the 1% confidence level.