[DAPHNE-#755] Initial support for lists in DaphneDSL.
- This PR adds basic support for lists of matrices of homogeneous physical data type and value type.
- Supported operations:
  - createList(), length(), append(), remove(), print()
  - append() and remove() do not modify the given list, but return the resulting list (consistent with the behavior of matrix/frame operations)
- Concrete changes:
  - DaphneDSL
    - Four new built-in functions: createList(), length(), append(), remove().
    - Updated the DaphneDSL language reference and the list of built-in functions.
  - DaphneIR/DAPHNE compiler
    - A new custom MLIR type: List.
    - Four new DaphneIR operations: CreateListOp, LengthOp, AppendOp, RemoveOp.
    - Type inference for CreateListOp, AppendOp, and RemoveOp.
    - Consideration of list types in the kernel extension catalog.
    - Lowering of the four new ops to kernel calls.
    - Lowering of the List type in LowerToLLVMPass in the same way as matrices/frames.
  - DAPHNE runtime
    - Four new kernels: createList, length, append, remove.
    - Registration of the new kernels in kernels.json.
  - Garbage collection of list items
    - ManageObjRefsPass treats lists like matrices/frames.
    - Besides that:
      - The reference counter of a data object is increased when it is inserted into a list to ensure that the object is not freed as long as the list exists.
      - When a list is destroyed, the reference counter of all its elements is decreased by 1.
      - When an element is removed from a list, its reference counter is *not* decreased, because we return the removed element (it lives on).
  - Added script-level test cases for the usage of lists.
- Current limitations:
  - No support for heterogeneous data/value types (e.g., one cannot store DenseMatrix and CSRMatrix or DenseMatrix<double> and DenseMatrix<float> in the same list).
  - Not possible to create an empty list in DaphneDSL, because that would complicate type inference.
  - Lists are not supported in DaphneLib yet.
  - No get/set on list elements yet, only append and remove.
  - The append/remove kernels copy the input list instead of modifying it in place. While this is consistent with the behavior of matrix/frame ops, it is inefficient. However, it is still much faster than the previous workarounds that emulate lists. Furthermore, we can use the same update-in-place mechanisms as for matrices/frames in the future, when the input list is not used anymore afterwards (see PR #609).
  - Information about interesting data properties is lost when a matrix is inserted into a list and removed from it again.
  - Subclassing Structure may not be optimal; it might be better to subclass a new superclass of Structure, but I wanted to keep the refactoring overhead low for now.
- Effect on the decision trees DaphneDSL script
  - The decision trees script uses queues to keep certain matrices around.
  - So far, we emulated these queues through matrix concatenation (cbind/rbind), which turned out to be very inefficient.
  - The script uses lists for the queues now, which leads to significant performance improvements (~15x for a concrete use-case pipeline on my machine).
- Closes #755.
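The three garbage-collection rules for list items listed above can be modeled in a few lines of plain C++. This is only a sketch under stated assumptions: `Obj`, `RefList`, `incRef`, and `decRef` are hypothetical stand-ins and not the DAPHNE runtime classes, and the sketch mutates the list in place for brevity, whereas the actual kernels return new lists.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a reference-counted runtime data object.
struct Obj {
    int refCount = 1; // the creator holds the first reference
};

void incRef(Obj *o) { ++o->refCount; }

// Decreases the counter and frees the object once nobody refers to it.
void decRef(Obj *o) {
    if (--o->refCount == 0)
        delete o;
}

// Hypothetical list that follows the three rules from the commit message.
class RefList {
    std::vector<Obj *> elems;

  public:
    RefList() = default;
    RefList(const RefList &) = delete;
    RefList &operator=(const RefList &) = delete;

    // Destroying the list releases one reference per remaining element.
    ~RefList() {
        for (Obj *o : elems)
            decRef(o);
    }

    // Inserting takes an additional reference on the object.
    void insert(Obj *o) {
        incRef(o);
        elems.push_back(o);
    }

    // Removing does not touch the counter: the list's reference is handed
    // over to the caller together with the returned element.
    Obj *removeAt(std::size_t idx) {
        assert(idx < elems.size());
        Obj *o = elems[idx];
        elems.erase(elems.begin() + static_cast<std::ptrdiff_t>(idx));
        return o;
    }

    std::size_t length() const { return elems.size(); }
};
```

In this model, an object that is inserted and later removed keeps the counter value it had while it was in the list, so the caller of `removeAt` becomes responsible for the eventual `decRef`.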
pdamme authored and philipportner committed Aug 9, 2024
1 parent bed9bcd commit 151e5a1
Showing 35 changed files with 897 additions and 55 deletions.
22 changes: 22 additions & 0 deletions doc/DaphneDSL/Builtins.md
@@ -49,6 +49,7 @@ DaphneDSL's built-in functions can be categorized as follows:
- Input/output
- Data preprocessing
- Measurements
+- List operations

## Data generation

@@ -652,3 +653,24 @@ These must be provided in a separate [`.meta`-file](/doc/FileMetaDataFormat.md).
- **`now`**`()`
Returns the current time since the epoch in nano seconds.
+## List operations
+- **`createList`**`(elm:matrix, ...)`
+Creates and returns a new list from the given elements `elm`.
+At least one element must be specified.
+- **`length`**`(lst:list)`
+Returns the number of elements in the given list `lst`.
+- **`append`**`(lst:list, elm:matrix)`
+Appends the given matrix `elm` to the given list `lst`.
+Returns the result as a new list (the argument list stays unchanged).
+- **`remove`**`(lst:list, idx:size)`
+Removes the element at position `idx` (counting starts at zero) from the given list `lst`.
+Returns (1) the result as a new list (the argument list stays unchanged), and (2) the removed element.
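The non-mutating semantics of `append` and `remove` documented above can be illustrated with a small C++ model. This is a sketch, not the DAPHNE kernels: the namespace `sketch`, the `Matrix` stand-in, and the use of `std::shared_ptr` in place of DAPHNE's own reference counting are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical element type standing in for a DAPHNE matrix.
struct Matrix {
    int id;
};

using Elem = std::shared_ptr<const Matrix>;
using List = std::vector<Elem>;

// At least one element must be given, as in the DaphneDSL built-in.
List createList(std::initializer_list<Elem> elems) {
    assert(elems.size() > 0);
    return List(elems.begin(), elems.end());
}

std::size_t length(const List &lst) { return lst.size(); }

// Returns a new list; the argument list stays unchanged.
List append(const List &lst, Elem elm) {
    List res = lst; // copies the element pointers, not the matrices
    res.push_back(std::move(elm));
    return res;
}

// Returns (1) the resulting list and (2) the removed element.
std::pair<List, Elem> remove(const List &lst, std::size_t idx) {
    assert(idx < lst.size());
    List res = lst;
    Elem removed = res[idx];
    res.erase(res.begin() + static_cast<std::ptrdiff_t>(idx));
    return {std::move(res), std::move(removed)};
}

} // namespace sketch
```

Appending at the end and removing at index 0 gives the FIFO queue pattern that the decision trees script in this commit relies on; reassigning the result to the same variable, as the script does, mimics an in-place update at the script level.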
3 changes: 2 additions & 1 deletion doc/DaphneDSL/LanguageRef.md
@@ -45,7 +45,7 @@ Variables are used to refer to values.

**Valid identifiers** start with a letter (`a-z`, `A-Z`) or an underscore (`_`) that can be followed by any number of letters (`a-z`, `A-Z`), underscores (`_`), and decimal digits (`0-9`).

-The following reserved keywords must not be used as identifiers: `if`, `else`, `while`, `do`, `for`, `in`, `true`, `false`, `as`, `def`, `return`, `import`, `matrix`, `frame`, `scalar`, `f64`, `f32`, `si64`, `si8`, `ui64`, `ui32`, `ui8`, `str`, `nan`, and `inf`.
+The following reserved keywords must not be used as identifiers: `if`, `else`, `while`, `do`, `for`, `in`, `true`, `false`, `as`, `def`, `return`, `import`, `matrix`, `frame`, `scalar`, `list`, `f64`, `f32`, `si64`, `si8`, `ui64`, `ui32`, `ui8`, `str`, `nan`, and `inf`.

*Examples:*

@@ -69,6 +69,7 @@ Currently, DaphneDSL supports the following *abstract* **data types**:
- `matrix`: homogeneous value type for all cells
- `frame`: a table with columns of potentially different value types
- `scalar`: a single value
+- `list`: an ordered sequence of elements of homogeneous data/value type; currently, only matrices can be elements of lists

**Value types** specify the representation of individual values. We currently support:

63 changes: 19 additions & 44 deletions scripts/algorithms/decisionTree_.daph
@@ -297,41 +297,20 @@ def decisionTree(X:matrix<f64>, y:matrix<f64>, ctypes:matrix<f64>,
# we (a) emulate lists by matrices to which we add and remove rows/columns (the "lists" part),
# and (b) split the data structure into its four components (the "of lists" part).
#queue = list(list(1,I,X2,y2)); # node IDs / data indicators
-queue_nID = as.matrix(1); # "list" of scalars (one scalar per row)
-queue_sizeni = as.matrix<si64>(ncol(I)); # "list" of scalars indicating the #cols of each element in queue_nI (one scalar per row)
-queue_nI = I; # "list" of row-vectors with different #cols (all row-vectors in the list cbinded)
-queue_sizeX2y2 = as.matrix<si64>(nrow(X2)); # "list" of scalars indicating the #rows of each element in queue_X2 and queue_y2 (one scalar per row)
-queue_X2 = X2; # "list" of matrices with different #rows (all matrices in the list rbinded)
-queue_y2 = y2; # "list" of matrices with different #rows (all matrices in the list rbinded)
+queue_nID = createList([1]);
+queue_nI = createList(I);
+queue_X2 = createList(X2);
+queue_y2 = createList(y2);
# TODO .0 should not be necessary.
maxPath = 1.0;
-while( nrow(queue_nID) > 0 ) {
+while( length(queue_nID) > 0 ) {
# pop next node from queue for splitting
-nID = as.scalar(queue_nID[0, ]);
-sizeni = as.scalar(queue_sizeni[0, ]);
-nI = queue_nI[, :sizeni];
-sizeX2y2 = as.scalar(queue_sizeX2y2[0, ]);
-X2 = queue_X2[:sizeX2y2, ];
-y2 = queue_y2[:sizeX2y2, ];
-# TODO Instead of this if-then-else, it should be valid to slice rows 1:1 on 1 row, result should be 0 rows
-if(nrow(queue_nID) > 1) {
-queue_nID = queue_nID[1:, ];
-queue_sizeni = queue_sizeni[1: ,];
-queue_nI = queue_nI[, sizeni:];
-queue_sizeX2y2 = queue_sizeX2y2[1:, ];
-queue_X2 = queue_X2[sizeX2y2:, ];
-queue_y2 = queue_y2[sizeX2y2:, ];
-}
-else {
-# Create 0-row matrices for "lists" along the row axis
-# and 0-col matrices for "lists" along the column axis.
-queue_nID = fill(1, 0, 1);
-queue_sizeni = fill(1, 0, 1);
-queue_nI = fill(1.0, 1, 0);
-queue_sizeX2y2 = fill(1, 0, 1);
-queue_X2 = fill(1.0, 0, ncol(X2));
-queue_y2 = fill(1.0, 0, ncol(y2));
-}
+queue_nID, nIDmat = remove(queue_nID, 0);
+# TODO <si64> should not be necessary here.
+nID = as.scalar<si64>(nIDmat);
+queue_nI, nI = remove(queue_nI, 0);
+queue_X2, X2 = remove(queue_X2, 0);
+queue_y2, y2 = remove(queue_y2, 0);
if(verbose)
print("decisionTree: attempting split of node "+nID+" ("+sum(nI)+" rows)");

@@ -365,25 +344,21 @@ def decisionTree(X:matrix<f64>, y:matrix<f64>, ctypes:matrix<f64>,
# split data, finalize or recurse
if( validSplit ) {
if( sum(Ileft) >= min_split && floor(log(IDleft,2))+2 < max_depth ) {
-queue_nID = rbind(queue_nID, as.matrix(IDleft));
-queue_sizeni = rbind(queue_sizeni, as.matrix<si64>(ncol(Ileft)));
-queue_nI = cbind(queue_nI, Ileft);
-queue_sizeX2y2 = rbind(queue_sizeX2y2, as.matrix<si64>(nrow(X2)));
-queue_X2 = rbind(queue_X2, X2);
-queue_y2 = rbind(queue_y2, y2);
+queue_nID = append(queue_nID, as.matrix(IDleft));
+queue_nI = append(queue_nI, Ileft);
+queue_X2 = append(queue_X2, X2);
+queue_y2 = append(queue_y2, y2);
}
else {
# TODO as.bool() should not be necessary, should be casted automatically (see #661).
# TODO as.matrix() should not be necessary.
M[,2*IDleft - 1] = as.matrix(computeLeafLabel(y2, Ileft, as.bool(classify), verbose));
}
if( sum(Iright) >= min_split && floor(log(IDright,2))+2 < max_depth ) {
-queue_nID = rbind(queue_nID, as.matrix(IDright));
-queue_sizeni = rbind(queue_sizeni, as.matrix<si64>(ncol(Iright)));
-queue_nI = cbind(queue_nI, Iright);
-queue_sizeX2y2 = rbind(queue_sizeX2y2, as.matrix<si64>(nrow(X2)));
-queue_X2 = rbind(queue_X2, X2);
-queue_y2 = rbind(queue_y2, y2);
+queue_nID = append(queue_nID, as.matrix(IDright));
+queue_nI = append(queue_nI, Iright);
+queue_X2 = append(queue_X2, X2);
+queue_y2 = append(queue_y2, y2);
}
else {
# TODO as.bool() should not be necessary, should be casted automatically (see #661).
5 changes: 5 additions & 0 deletions src/compiler/lowering/LowerToLLVMPass.cpp
@@ -944,6 +944,11 @@ void DaphneLowerToLLVMPass::runOnOperation()
return LLVM::LLVMPointerType::get(
IntegerType::get(t.getContext(), 1));
});
+typeConverter.addConversion([&](daphne::ListType t)
+{
+return LLVM::LLVMPointerType::get(
+IntegerType::get(t.getContext(), 1));
+});
typeConverter.addConversion([&](daphne::StringType t)
{
return LLVM::LLVMPointerType::get(
11 changes: 6 additions & 5 deletions src/compiler/lowering/ManageObjRefsPass.cpp
@@ -108,9 +108,9 @@ void processValue(OpBuilder builder, Value v) {
builder.setInsertionPointAfter(defOp);
builder.create<daphne::IncRefOp>(v.getLoc(), v);
}


-if(!llvm::isa<daphne::MatrixType, daphne::FrameType, daphne::StringType>(v.getType()))
+if (!llvm::isa<daphne::MatrixType, daphne::FrameType, daphne::ListType,
+daphne::StringType>(v.getType()))
return;

Operation* decRefAfterOp = nullptr;
@@ -176,7 +176,7 @@ void processValue(OpBuilder builder, Value v) {

/**
* @brief Inserts an `IncRefOp` for the given value if its type is a DAPHNE
-* data type (matrix, frame, string).
+* data type (matrix, frame, list, string).
*
* If the type is unknown, throw an exception.
*
@@ -185,7 +185,7 @@ void processValue(OpBuilder builder, Value v) {
*/
void incRefIfObj(Value v, OpBuilder & b) {
Type t = v.getType();
-if(llvm::isa<daphne::MatrixType, daphne::FrameType, daphne::StringType>(t))
+if(llvm::isa<daphne::MatrixType, daphne::FrameType, daphne::ListType, daphne::StringType>(t))
b.create<daphne::IncRefOp>(v.getLoc(), v);
else if(llvm::isa<daphne::UnknownType>(t))
throw ErrorHandler::compilerError(
@@ -196,7 +196,8 @@ void incRefIfObj(Value v, OpBuilder & b) {

/**
* @brief Inserts an `IncRefOp` for each operand of the given operation whose
-* type is a DAPHNE data type (matrix, frame, string), right before the operation.
+* type is a DAPHNE data type (matrix, frame, list, string), right before the
+* operation.
*
* @param op
* @param b
15 changes: 13 additions & 2 deletions src/compiler/lowering/RewriteToCallKernelOpPass.cpp
@@ -60,7 +60,7 @@ namespace
return 3;
if(llvm::isa<daphne::CreateFrameOp, daphne::SetColLabelsOp>(op))
return 2;
-if(llvm::isa<daphne::DistributedComputeOp>(op))
+if(llvm::isa<daphne::DistributedComputeOp, daphne::CreateListOp>(op))
return 1;

throw ErrorHandler::compilerError(
@@ -85,6 +85,15 @@ namespace
isVariadic[index]
);
}
+if(auto concreteOp = llvm::dyn_cast<daphne::CreateListOp>(op)) {
+auto idxAndLen = concreteOp.getODSOperandIndexAndLength(index);
+static bool isVariadic[] = {true};
+return std::make_tuple(
+idxAndLen.first,
+idxAndLen.second,
+isVariadic[index]
+);
+}
if(auto concreteOp = llvm::dyn_cast<daphne::SetColLabelsOp>(op)) {
auto idxAndLen = concreteOp.getODSOperandIndexAndLength(index);
static bool isVariadic[] = {false, true};
@@ -148,12 +157,14 @@ namespace

mlir::Type adaptType(mlir::Type t, bool generalizeToStructure) const {
MLIRContext * mctx = t.getContext();
-if(generalizeToStructure && t.isa<mlir::daphne::MatrixType, mlir::daphne::FrameType>())
+if(generalizeToStructure && t.isa<mlir::daphne::MatrixType, mlir::daphne::FrameType, mlir::daphne::ListType>())
return mlir::daphne::StructureType::get(mctx);
if(auto mt = t.dyn_cast<mlir::daphne::MatrixType>())
return mt.withSameElementTypeAndRepr();
if(t.isa<mlir::daphne::FrameType>())
return mlir::daphne::FrameType::get(mctx, {mlir::daphne::UnknownType::get(mctx)});
+if(auto lt = t.dyn_cast<mlir::daphne::ListType>())
+return mlir::daphne::ListType::get(mctx, adaptType(lt.getElementType(), generalizeToStructure));
if(auto mrt = t.dyn_cast<mlir::MemRefType>())
// Remove any dimension information ({0, 0}), but retain the element type.
return mlir::MemRefType::get({0, 0}, mrt.getElementType());
8 changes: 8 additions & 0 deletions src/compiler/utils/CompilerUtils.h
@@ -176,6 +176,14 @@ struct CompilerUtils {
return "Structure";
else
return "Frame";
+else if(auto lstTy = t.dyn_cast<mlir::daphne::ListType>()) {
+if(generalizeToStructure)
+return "Structure";
+else {
+const std::string dtName = mlirTypeToCppTypeName(lstTy.getElementType(), angleBrackets, false);
+return angleBrackets ? ("List<" + dtName + ">") : ("List_" + dtName);
+}
+}
else if(llvm::isa<mlir::daphne::StringType>(t))
// This becomes "const char *" (which makes perfect sense for
// strings) when inserted into the typical "const DT *" template of
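The naming rule that the new `ListType` branch adds to this hunk can be tried out in isolation. The following is a minimal sketch: `listCppTypeName` is a hypothetical helper that takes the already derived element type name as a plain string, instead of recursing over MLIR types as `mlirTypeToCppTypeName` does, and the element names used with it are illustrative inputs only.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-alone version of the list branch shown above.
// elemName is the C++ type name already derived for the element type.
std::string listCppTypeName(const std::string &elemName, bool angleBrackets,
                            bool generalizeToStructure) {
    // A generalized list is reported as the common superclass.
    if (generalizeToStructure)
        return "Structure";
    // Otherwise the element type name is embedded in the list's name,
    // either in template syntax or joined by an underscore.
    return angleBrackets ? ("List<" + elemName + ">") : ("List_" + elemName);
}
```

Note that the branch in the diff passes `false` for `generalizeToStructure` when deriving the element name, so the element type stays concrete whenever the list itself is not generalized.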
3 changes: 3 additions & 0 deletions src/ir/daphneir/DaphneDialect.cpp
@@ -299,6 +299,9 @@ void mlir::daphne::DaphneDialect::printType(mlir::Type type,
os << '?';
os << '>';
}
+else if (auto t = type.dyn_cast<mlir::daphne::ListType>()) {
+os << "List<" << t.getElementType() << '>';
+}
else if (auto handle = type.dyn_cast<mlir::daphne::HandleType>()) {
os << "Handle<" << handle.getDataType() << ">";
}
58 changes: 58 additions & 0 deletions src/ir/daphneir/DaphneInferTypesOpInterface.cpp
@@ -604,6 +604,64 @@ std::vector<Type> daphne::MaxPoolForwardOp::inferTypes() {
return {restype2, builder.getIndexType(), builder.getIndexType()};
}

+std::vector<Type> daphne::CreateListOp::inferTypes() {
+ValueRange elems = getElems();
+const size_t numElems = elems.size();
+
+if(numElems == 0)
+throw ErrorHandler::compilerError(
+getLoc(),
+"InferTypesOpInterface",
+"type inference for CreateListOp requires at least one argument"
+);
+
+// All elements must be matrices of the same value type.
+// If the type of some element is (still) unknown or if the data type
+// of some element is matrix, but the value type is (still) unknown,
+// then we ignore this element for now.
+Type etRes = nullptr;
+for(size_t i = 0; i < numElems; i++) {
+Type etCur = elems[i].getType();
+if(etCur.isa<daphne::UnknownType>())
+continue;
+if(auto mtCur = etCur.dyn_cast<daphne::MatrixType>()) {
+Type vtCur = mtCur.getElementType();
+if(vtCur.isa<daphne::UnknownType>())
+continue;
+else if(!etRes)
+etRes = mtCur.withSameElementType();
+else if(etRes != mtCur.withSameElementType())
+throw ErrorHandler::compilerError(
+getLoc(),
+"InferTypesOpInterface",
+"all arguments to CreateListOp must be matrices of the same value type"
+);
+}
+else
+throw ErrorHandler::compilerError(
+getLoc(),
+"InferTypesOpInterface",
+"the arguments of CreateListOp must be matrices"
+);
+}
+
+return {daphne::ListType::get(getContext(), etRes)};
+}
+
+std::vector<Type> daphne::RemoveOp::inferTypes() {
+// The type of the first result is the same as that of the argument list.
+// The type of the second result is the element type of the argument list.
+Type argListTy = getArgList().getType();
+if(auto lt = argListTy.dyn_cast<daphne::ListType>())
+return {lt, lt.getElementType()};
+else
+throw ErrorHandler::compilerError(
+getLoc(),
+"InferTypesOpInterface",
+"RemoveOp expects a list as its first argument"
+);
+}
+
// ****************************************************************************
// Type inference function
// ****************************************************************************
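The element-type unification performed by `CreateListOp::inferTypes()` in this hunk can be sketched without MLIR. The model below is an assumption-laden simplification: `Ty` and `inferListElemValueType` are hypothetical names, value types are plain strings, and an empty string stands for "(still) unknown", which is this sketch's own convention for the case that the diff represents with a null `Type`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified operand type: unknown, a matrix with a
// (possibly still unknown) value type, or some non-matrix type.
struct Ty {
    enum Kind { Unknown, Matrix, Other } kind;
    std::string valueType; // empty means "value type still unknown"
};

// Returns the common value type of all matrix operands whose value type
// is known, or an empty string if none is known yet.
std::string inferListElemValueType(const std::vector<Ty> &elems) {
    if (elems.empty())
        throw std::invalid_argument("at least one argument is required");
    std::string res;
    for (const Ty &t : elems) {
        if (t.kind == Ty::Unknown)
            continue; // ignore for now; a later inference round may know more
        if (t.kind != Ty::Matrix)
            throw std::invalid_argument("all arguments must be matrices");
        if (t.valueType.empty())
            continue; // matrix with unknown value type: ignore as well
        if (res.empty())
            res = t.valueType; // first known value type decides
        else if (res != t.valueType)
            throw std::invalid_argument("value types must be the same");
    }
    return res;
}
```

As in the diff, operands with unknown types are skipped rather than rejected, so an operand list that mixes known and unknown types still yields a result, while two known but different value types raise an error.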
46 changes: 43 additions & 3 deletions src/ir/daphneir/DaphneOps.td
@@ -1352,7 +1352,7 @@ def Daphne_PrintOp : Daphne_Op<"print"> {
// TODO We might change it to only accept scalars here and enforce toString
// for matrices and frames. But currently, we need it like that for the
// rest of the program.
-let arguments = (ins AnyTypeOf<[AnyScalar, MatrixOrFrame, AnyMemRef, Unknown]>:$arg, BoolScalar:$newline, BoolScalar:$err);
+let arguments = (ins AnyTypeOf<[AnyScalar, MatrixOrFrame, List, AnyMemRef, Unknown]>:$arg, BoolScalar:$newline, BoolScalar:$err);
let results = (outs); // no results
}

@@ -1579,14 +1579,14 @@ def Daphne_StoreVariadicPackOp : Daphne_Op<"storeVariadicPack"> {
def Daphne_IncRefOp : Daphne_Op<"incRef"> {
let summary = "Increases the reference counter of the underlying runtime data object.";

-let arguments = (ins MatrixOrFrameOrString:$arg);
+let arguments = (ins AnyTypeOf<[MatrixOrFrameOrString, List]>:$arg);
let results = (outs); // no results
}

def Daphne_DecRefOp : Daphne_Op<"decRef"> {
let summary = "Decreases the reference counter of the underlying runtime data object and frees it if the reference counter becomes zero.";

-let arguments = (ins MatrixOrFrameOrString:$arg);
+let arguments = (ins AnyTypeOf<[MatrixOrFrameOrString, List]>:$arg);
let results = (outs); // no results
}

@@ -1608,6 +1608,46 @@ def Daphne_StopProfilingOp : Daphne_Op<"stopProfiling"> {
let results = (outs); // no results
}

+// ****************************************************************************
+// List operations
+// ****************************************************************************
+
+def Daphne_CreateListOp : Daphne_Op<"createList", [
+DeclareOpInterfaceMethods<InferTypesOpInterface>
+]> {
+let summary = "Creates a new list from the given elements";
+
+let arguments = (ins Variadic<MatrixOrU>:$elems);
+let results = (outs ListOrU:$res);
+}
+
+def Daphne_LengthOp : Daphne_Op<"length", [
+DataTypeSca, ValueTypeSize
+]> {
+let summary = "Returns the number of elements in the given list";
+
+let arguments = (ins ListOrU:$arg);
+let results = (outs Size:$res);
+}
+
+def Daphne_AppendOp : Daphne_Op<"append", [
+TypeFromFirstArg
+]> {
+let summary = "Appends the given element to the end of the given list";
+
+let arguments = (ins ListOrU:$argList, MatrixOrU:$elem);
+let results = (outs ListOrU:$resList);
+}
+
+def Daphne_RemoveOp : Daphne_Op<"remove", [
+DeclareOpInterfaceMethods<InferTypesOpInterface>
+]> {
+let summary = "Removes and returns the element at the specified index from the given list";
+
+let arguments = (ins ListOrU:$argList, Size:$idx);
+let results = (outs ListOrU:$resList, MatrixOrU:$elem);
+}
+
// ****************************************************************************
// Old operations
// ****************************************************************************