Basic extension catalog for kernels, plus demo on early extensibility. #456
Conversation
@EricMier: I think this could be interesting to you. Once we have the initial extension catalog for kernels, it could become the home of any performance profiles for kernel/device selection. The kernel extension catalog could be accessible throughout the DAPHNE compiler, such that any pass could query its information, e.g., a separate pass selecting a suitable device (as an alternative/improvement to …).
@corepointer: JFYI, since you've also worked on the lowering from DaphneIR ops to kernel calls, especially …
Force-pushed from 075f051 to 43449e5
Force-pushed from 43449e5 to f542242
Force-pushed from 557d899 to 96b361e
Force-pushed from a7df434 to 3d6b2cc
Force-pushed from e6397e8 to 131ed82
This PR is ready to be merged now, from my point of view. It contains the basic extension catalog for kernels (see above, with a few minor deviations from the initial ideas) plus a few more commits that show its end-to-end use.

Of course, feedback is welcome, but I'm not asking for a code review. However, as this PR changes a few important things, I would like to give everyone some time to have a look before I merge it. Unless anyone speaks up, I will merge this PR next Monday (Apr 29). The commits in this PR are meant to be "rebased & merged", not squashed.

These are the most important changes (see the commit messages for more details):
This PR is just the first step towards extensibility in DAPHNE, and there are many more aspects and details we will work on in the future.
I'm excited to play around with this in the next couple of days. After only reading the new docs and looking at the examples, this already looks really awesome. Great work, @pdamme!
Thanks. I'm sure it still has several limitations and can be improved from an efficiency and usability point of view, but it is a first step that we can build upon.
I wrote a couple of other extensions, following the example in … I was also confused by the wrong results in our internal kernel, but I just saw you have already created an issue for that. Otherwise, I think this looks great so far!
Awesome work @pdamme 👍
Thanks for the feedback, @philipportner. Happy to hear that you succeeded in writing some little extensions. Indeed, the documentation is not comprehensive yet. I didn't want to spend too much time on that, as we may still change a few things. However, I will add a short note (for developers) to the docs on where to get essential information (op mnemonics in …).

Thanks also to @corepointer. Merging it in is not super urgent, so simply let me know if you would like to have a closer look and try it out before I merge it. I would be happy to merge it by the end of this week, though (unless any significant concerns arise).
- The DAPHNE compiler usually lowers most domain-specific operations to calls to pre-compiled kernels.
- So far, the DAPHNE compiler did not know which kernel instantiations are available in pre-compiled form.
  - Instead, it generated the expected function name of a kernel based on the DaphneIR operation's mnemonic, its result/argument types, and the processing backend (e.g., CPP or CUDA).
  - If the expected kernel was not available, an error of the form "JIT session error: Symbols not found: ..." occurred during LLVM JIT compilation.
- This commit introduces an initial version of a kernel catalog that informs the DAPHNE compiler about the available pre-compiled kernels.
  - The kernel catalog stores a mapping from DaphneIR ops (represented by their mnemonic) to information on the kernels registered for the op.
  - The information stored for each kernel currently comprises: the name of the pre-compiled C/C++ function, the result/argument types, and the processing backend (e.g., CPP or CUDA).
    - The set of information will be extended in the future.
  - The kernel catalog provides methods for registering a kernel, retrieving the registered kernels for a specific op, and dumping the catalog.
- The kernel catalog is stored inside the DaphneUserConfig.
  - This makes sense since users will be able to configure the available kernels in the future.
  - That way, the kernel catalog is accessible in all parts of the DAPHNE compiler and runtime.
- The information on the available kernels is currently stored in a JSON file named catalog.json (or CUDAcatalog.json).
  - Currently, catalog.json is generated by genKernelInst.py; thus, the system has access to the same kernel specializations as before.
  - catalog.json is read at DAPHNE system start-up in the coordinator and the distributed workers.
  - Added a parser for the kernel catalog JSON file.
  - The concrete format of the catalog files may be changed in the future (e.g., to make it more efficient or intuitive).
- RewriteToCallKernelOpPass uses the kernel catalog to obtain the kernel function name for an operation, instead of relying on a naming convention.
  - However, there are still a few points where kernel function names are built by convention (to be addressed later):
    - lowering of DistributedPipelineOp in RewriteToCallKernelOpPass
    - lowering of MapOp in LowerToLLVMPass
    - lowering of VectorizedPipelineOp in LowerToLLVMPass
- Directly related misc changes:
  - DaphneIrExecutor has getters for its DaphneUserConfig.
  - CompilerUtils::mlirTypeToCppTypeName() allows generating either underscores (as before) or angle brackets (new) for template parameters.
- This is a first step towards extensibility w.r.t. kernels; for now, the main contribution is the representation of the available kernels in a data structure (the kernel catalog).
- Closes #455, with an initial solution we can build upon in the future.
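To make the catalog's shape concrete, here is a minimal C++ sketch of such a data structure. All names (`KernelInfo`, `KernelCatalog`, the fields) are hypothetical and for illustration only; the actual class in the DAPHNE code base may look different:

```cpp
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record describing one pre-compiled kernel; per the commit
// message, the catalog stores the kernel function name, the result/argument
// types, and the processing backend.
struct KernelInfo {
    std::string kernelFuncName;        // name of the pre-compiled C/C++ function
    std::vector<std::string> resTypes; // result types
    std::vector<std::string> argTypes; // argument types
    std::string backend;               // e.g., "CPP" or "CUDA"
};

// Hypothetical catalog: maps a DaphneIR op mnemonic to its registered kernels.
class KernelCatalog {
    std::map<std::string, std::vector<KernelInfo>> kernelsByOp;

public:
    void registerKernel(const std::string &opMnemonic, KernelInfo info) {
        kernelsByOp[opMnemonic].push_back(std::move(info));
    }

    const std::vector<KernelInfo> &getKernelInfos(const std::string &opMnemonic) const {
        auto it = kernelsByOp.find(opMnemonic);
        if (it == kernelsByOp.end())
            throw std::runtime_error("no kernels registered for op: " + opMnemonic);
        return it->second;
    }

    void dump(std::ostream &os = std::cout) const {
        for (const auto &[op, kernels] : kernelsByOp)
            for (const auto &k : kernels)
                os << op << " -> " << k.kernelFuncName << " (" << k.backend << ")\n";
    }
};
```

A compiler pass could then look up the kernels registered for an op's mnemonic and pick the one matching the required result/argument types and processing backend.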
- The compiler knows in which shared lib each kernel resides and makes use of this info.
- Representation of this info:
  - The kernel catalog stores the path to the shared lib for each kernel.
  - The lib path is stored in the catalog files.
  - The catalog parser reads the path from the catalog files.
- Utilization of this info:
  - In RewriteToCallKernelOpsPass, the compiler determines which kernel libs are really needed by the generated kernel calls and links only those during the JIT compilation in DaphneIrExecutor.
- The automatic determination of the required kernel libs allowed/required a refactoring of the config items/CLI args related to the lib paths:
  - So far, there used to be "libdir" and "library_paths"; their use was a bit scattered over multiple places in the code base, such that it was hard to understand which exact libs would be linked in the end.
  - Now, there is only "libdir", but with slightly modified semantics: it's the directory where the kernel catalogs reside.
    - The kernel catalog files were moved from "build/src/runtime/local/kernels/" to "lib/", the same directory where the compiled kernel libraries reside.
    - This is also better for releasing/deploying DAPHNE, since the "lib" directory is already taken into account for these purposes.
  - The config item/CLI arg "library_paths" was completely removed.
  - Paths are found as follows now:
    - From the "libdir" config item, the default kernel catalog files are found.
    - The paths of the kernel libs are stored in the catalog files, where they are specified relative to those files.
    - To allow invoking DAPHNE from any pwd (as before), the "libdir" can be interpreted relative to the directory of the currently running executable (by using the prefix "{exedir}/"); this is done for the new default of "libdir".
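The path handling described above can be illustrated with a small sketch. The `{exedir}/` prefix and the catalog-relative lib paths are taken from the commit message, while everything else (the function names, how the executable directory is obtained) is a hypothetical assumption:

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: resolve a configured "libdir". A leading "{exedir}/"
// makes it relative to the directory of the currently running executable,
// so DAPHNE can be invoked from any working directory.
fs::path resolveLibDir(const std::string &libDir, const fs::path &exeDir) {
    const std::string prefix = "{exedir}/";
    if (libDir.rfind(prefix, 0) == 0) // starts with "{exedir}/"
        return exeDir / libDir.substr(prefix.size());
    return fs::path(libDir);
}

// Hypothetical helper: a kernel lib path from a catalog file is specified
// relative to the catalog file itself.
fs::path resolveKernelLibPath(const fs::path &catalogFile, const std::string &relLibPath) {
    return catalogFile.parent_path() / relLibPath;
}
```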
- Expert users can optionally provide a hint on which concrete pre-compiled kernel function to use for a particular operation.
  - So far, this is only supported for DaphneDSL built-in functions.
- Added a few script-level test cases.
- Updated the DaphneDSL language reference.
  - The concrete syntax may be changed in the future.
- As a side note: DaphneDSLBuiltins::build() should invoke getOperation() on ops with zero results before returning, to allow assigning kernel hints in an op-agnostic way.
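The side note on getOperation() can be illustrated with a short MLIR sketch; the helper and the attribute name `kernel_hint` are hypothetical, and the attribute actually used by DAPHNE may differ. The point is that working on the generic `mlir::Operation*` makes the hint assignment independent of the concrete op and of its number of results:

```cpp
#include "mlir/IR/Builders.h"
#include "mlir/IR/Operation.h"

#include <string>

// Hypothetical helper: attach a kernel hint to any op by working on the
// generic mlir::Operation*, so it also works for ops with zero results.
void setKernelHint(mlir::Operation *op, const std::string &kernelFuncName) {
    mlir::OpBuilder builder(op->getContext());
    op->setAttr("kernel_hint", builder.getStringAttr(kernelFuncName));
}
```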
- Created a small example extension that is built and used as a test case.
  - The build of the extension is isolated from the DAPHNE build on purpose, since DAPHNE extensions can be separate code bases.
  - Thus, building/cleaning the extension is part of the test case itself.
- Added a CLI arg for adding a kernel extension to DAPHNE at runtime.
  - DAPHNE does not need to be re-built to use the extension.
- Slightly changed a few files in "src/runtime/local/datastructures/" by moving problematic includes from header to source files etc., in order to make including a few DAPHNE headers in the extension easy.
  - In the future, this aspect will need more attention.
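To give a rough idea of what an extension kernel can look like, here is a hedged sketch in the general style of DAPHNE kernels. The function name `mySumAll` is made up, and the exact signature expected for any real op must be looked up in the generated kernel headers; the includes and accessor names follow DAPHNE's DenseMatrix, but treat the whole block as illustrative:

```cpp
#include <runtime/local/context/DaphneContext.h>
#include <runtime/local/datastructures/DenseMatrix.h>

// Illustrative extension kernel: sums all values of a dense double matrix.
// extern "C" gives the function a predictable symbol name, so it can be
// referenced from a kernel catalog file and resolved at JIT time.
extern "C" void mySumAll(double *res, const DenseMatrix<double> *arg, DCTX(ctx)) {
    const double *values = arg->getValues();
    const size_t numRows = arg->getNumRows();
    const size_t numCols = arg->getNumCols();
    const size_t rowSkip = arg->getRowSkip();
    double sum = 0.0;
    for (size_t r = 0; r < numRows; r++)
        for (size_t c = 0; c < numCols; c++)
            sum += values[r * rowSkip + c];
    *res = sum;
}
```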
Force-pushed from 131ed82 to 84fa96f
- A very initial version of the documentation of implementing/building/using a custom kernel extension, as part of the user docs.
- All files displayed in these docs are also in scripts/examples/extensions/myKernels/ for easy use.
Force-pushed from 84fa96f to 27b47f5
I've rebased this PR and tried to adapt the changes it contains to the new error handling from #706 as well as I could (I hope I didn't miss anything). I've also added a note to the docs on writing a custom kernel extension on where DAPHNE developers can find references for the op mnemonics and kernel interfaces.
Before I start implementing, I'd like to share an overview of my plans with everyone interested. Feel free to comment.
Goals
Planned steps

- Introduce a kernel extension catalog, represented in a file (`kernels.json`); among other things, it records the processing backend of each kernel (`CPP`, `CUDA`, `FPGAOPENCL`, ...).
- For now, generate `kernels.json` via the existing genKernelInst.py, so that the system has access to the same kernel specializations as before.
- Include `kernels.json` in the release artifact (maybe move that file).
- Status quo of the lowering from DaphneIR ops to kernel calls:
  - `MarkCUDAOpsPass` and `MarkFPGAOPENCLOpsPass` mark ops to be executed on the respective processing backend if certain conditions are met.
  - `AdaptTypesToKernelsPass` harmonizes input and output types for certain ops, assuming that kernels are usually available for homogeneous type combinations.
  - `RewriteToCallKernelOpPass` rewrites domain-specific DaphneIR ops to `CallKernelOp`; the name of the kernel function is created by a naming convention taking into account the op name, the input and output types, and hints on the processing backend, but the pass does not know if that kernel really exists.
  - `LowerToLLVMPass` lowers `VectorizedPipelineOp` and `DistributedPipelineOp` to `CallKernelOp`.
- Planned changes to the lowering (a sketch of the new pass follows this list):
  - `MarkCUDAOpsPass` and `MarkFPGAOPENCLOpsPass` remain as they are.
  - `AdaptTypesToKernelsPass` remains as it is, but might not be used, for now (but could be useful again later).
  - `SelectKernelsPass` (new): queries the new kernel extension catalog by (op name, processing backend) and (1) decides which kernel function to call and adds a hint (attribute) to the op, and (2) inserts casts of inputs and outputs where necessary; it should run before `RewriteToCallKernelOpsPass`, since the casts it introduces could offer potential for further optimization.
  - `RewriteToCallKernelOpPass` will merely rewrite ops to `CallKernelOp`, based on decisions that have been made before, reflected by attributes of the op.
  - `LowerToLLVMPass` stays as it is (those two ops shall have very generic kernels anyway).
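To make the planned `SelectKernelsPass` more tangible, here is a minimal MLIR pass skeleton. This is a sketch under stated assumptions, not DAPHNE's implementation: the catalog type, the lookup by (op name, backend), the hard-coded `"CPP"` backend, and the attribute name `kernel_hint` are all placeholders, and the cast insertion (step 2 in the plan) is omitted:

```cpp
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/Pass.h"

#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the kernel extension catalog: maps
// (op name, processing backend) to a kernel function name.
using KernelCatalog =
    std::map<std::pair<std::string, std::string>, std::string>;

// Sketch of the planned SelectKernelsPass: query the catalog per op and
// record the chosen kernel function as an attribute on the op.
struct SelectKernelsPass
    : public mlir::PassWrapper<SelectKernelsPass,
                               mlir::OperationPass<mlir::ModuleOp>> {
    KernelCatalog catalog;

    explicit SelectKernelsPass(KernelCatalog catalog)
        : catalog(std::move(catalog)) {}

    void runOnOperation() override {
        getOperation().walk([&](mlir::Operation *op) {
            // A real pass would read backend hints set by MarkCUDAOpsPass
            // etc.; "CPP" is just a placeholder default here.
            auto it = catalog.find({op->getName().getStringRef().str(), "CPP"});
            if (it != catalog.end())
                op->setAttr("kernel_hint",
                            mlir::StringAttr::get(op->getContext(), it->second));
        });
    }
};
```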
Possible follow-up issues (not to be addressed by this PR)
Closes #455.