Skip to content
George Kastrinis edited this page Oct 12, 2020 · 29 revisions

Datalog Specification

Relations are the building blocks of a Datalog program. A relation is semantically similar to a database table. It has a unique name and represents a collection of entries conforming to the same structure. Entries have parameters, each with a specific type---either primitive or user-specified. Datalog programs comprise of a collection of rules. A rule joins various relations and infers new entries for other relations. The Datalog engine is responsible for evaluating rules until a fixpoint is reached---that is until no new entries can be inferred from the evaluation of rules.

Relations

The following relation represents a graph edge. It has three parameters. The first two parameters represent the two vertices being connected (with type string) while the last one represents the given weight for that edge (with type int). x, y and weight are variables that are bound to specific values from this relation. A relation has the semantics of a set (of entries).

Edge(x, y, weight)

Functional Relations

A different syntax is used for functional relations. Such relations have two parts. A list of key parameters and a single value parameter.

AnimalPopulation[species, region] = population

A functional relation has the semantics of a map. The combination of values in key positions uniquely identify the value in the value position. This implicit functional constraint will not allow the following two entries to co-exist at the same time.

AnimalPopulation["Dragon", "Avalon"] = 1
AnimalPopulation["Dragon", "Avalon"] = 7

Types

The following primitive types are supported:

  • string
  • int (decimal, octal, or hexadecimal values)
  • float (decimal or exponential values)
  • boolean (true, false)

A user could also define custom types. Types definitions use the annotation @type (case insensitive) and may declare a (non-primitive) supertype as well.

@type Animal
@Type Dog : Animal

Declarations

A declaration must include valid types for each parameter of the declared relation.

Edge(x, y, weight) : string, string, int
PetName(p, n) : Animal, string

It is not necessary to declare every single relation. paNda will infer types for undeclared relations based on their usage in rules.

Rules

A rule has two parts---a head and a body. Relations in the body act as filters combined either conjunctively (using ,) or disjunctively (using ;). Relations in the body can appear negated using !. When the body holds then the head is inferred to be true as well. Rules may directly or indirectly use recursion. That is, a relation used in the body may also appear in the head.

Path(x, y, weight) <- Edge(x, y, weight)
Path(x, y, weight) <- Edge(x, z, w1), Path(z, y, w2), weight = w1 + w2

All (*see constructors below) variables appearing in the head of a rule must be positively bound in the body. The special variable name _ can be used for variables of no importance.

Connection(x, y) <- Path(x, y, _)
IndirectConnection(x, y) <- Path(x, y, _), !Edge(x, y, _)

Non-primitive types can also appear in a rule's body (but not in the head) and are treated as normal relations.

OwnsCat(x, c) <- OwnsPet(x, c), Cat(c)

Asserting Facts

The simplest (but not the only) way to assert input facts is through trivial rules. Those are rules with trivial bodies. The following rules use valid syntax:

Edge("v1", "v2", 10)
Edge(x, y, w) <- x = "v1", y = "v4", w = 2

Aggregations

Aggregate predicates can be used to infer data for a collection of entries in relations. Currently, the supported predicates are count, min, max and sum.

EdgesFrom(x, c) <- c = count() { Edge(x, _, _) }
MaxWeight(w) <- w = max(weight) { Edge(x, _, weight), MultipleOut(x) }
TotalWeightFrom(x, s) <- s = sum(weight) { Edge(x, _, weight) }

Negation/Aggregation & Recursion

Negated relations or aggregations are not allowed to be part of a recursive cycle. The following are not valid rules.

P(x) <- Foo(x), !Bar(x)
Q(x) <- P(x), x > 10
Bar(x) <- Foo(x), Q(x)

or

Bar(x) <- x = count() { P(_) }

Constructors

Creating values of a user-specified type is done using constructors. Constructors are special functional relations declared using the @constructor annotation.

@Constructor
boughtAPet[owner, name] = pet : string, string, Animal

On a rule's head constructors have special semantics and syntax. A constructor is used to create a unique value based on the values of the key positions. For every different (owner, name) pair a unique Animal typed value is created and is bound to variable pet. Thus, variables appearing in the value position of constructors in a rule's head need not be bound in the rule's body. The type constructed in the rule may be any valid subtype of the declared type.

boughtAPet[owner, name] = pet new Animal,
hasAnimal(owner, pet) <- owner = "Mary", name = "Max"

boughtAPet[owner, name] = pet new Cat,
hasCat(owner, pet) <- owner = "John", name = "Maxinne"

When appearing in a rule's body normal relation semantics hold.

HasTwoPets(owner) <-
    boughtAPet[owner, _] = pet1,
    boughtAPet[owner, _] = pet2, pet1 != pet2.

Multiple constructors can be declared for the same type.

@constructor
gaveBirth[parent, order] = child : Animal, int, Animal

gaveBirth[maxinne, 1] = tom new Cat <- hasAnimal("John", maxinne)

All declared types that are the roots of the hierarchy also provide a default constructor <Type_Name>:byStr

Animal:byStr["Clio"] = clio new Cat

Input / Output

Input and output relations are supported through appropriate annotations.

@type Animal
@type @input Cat : Animal

@input catAge(c, age) : Cat, int

@output HasTwoPets

The above will generate code that expects by default to find two input files, Cat.facts and catAge.facts, when running the program. The @input annotation supports arguments to change the expected filename as well as the delimiter being used.

@input(filename="catColor.csv", delimiter="@")
catColor(c, color) : Cat, string

Similarly, the @output annotation will generate an output file HasTwoPets.csv. Currently, there is not support for changing the default filename.

String Concatenation

The operator + is overloaded to support string concatenation between strings and/or numbers. The following snippet

myStr(x) <- x = "[" + "abc" + ", " + a + b + "]", nums(a), nums(b)

will generate the following (souffle) code

myStr(cat(cat(cat(cat(cat("[", "abc"), ", "), to_string(a)), to_string(b)), "]")) :- nums(a), nums(b).

String to Ordinal Numbers

Each string is internally associated to an ordinal number. It is possible to get a hold to that number representation through the unary operator #. This is not equivalent to a lexicographic ordering. For semantics purposes, each ordinal number is arbitrarily, but uniquely, assign to each string value. The following code snippet

strID(x) <- myStr(y), x = #y
strID(x) <- x = #"abcd"
strID(#y) <- myStr2(x, y), #x < #y
strID(x) <- x = # ("abc" + "def")
strID(x) <- x = #(a + b), myStr(a), nums(b)

will generate the following (souffle) code

strID(ord(y)) :- myStr(y).
strID(ord("abcd")) :- 1=1.
strID(ord(y)) :- myStr2(x, y), ord(x) < ord(y).
strID(ord((cat("abc", "def")))) :- 1=1.
strID(ord((cat(a, to_string(b))))) :- myStr(a), nums(b).

Souffle

Functional Relations

In general, functional relations impose a constraint on the variable in the value position. However, this constraint cannot be enforced when the target language is Souffle. In that setup, functional relations simply act as normal ones.

Plan Annotation

Rule plans are supported through the (currently primitive) appropriate annotation.

@Plan(val = "1:(1,2), 2:(1,2)")
Planned(x) <- Foo(_, x), Bar(x)