Basic extension catalog for kernels, plus demo on early extensibility. #456
Conversation
@EricMier: I think this could be interesting to you. Once we have the initial extension catalog for kernels, it could become the home of any performance profiles for kernel/device selection. The kernel extension catalog could be accessible throughout the DAPHNE compiler, such that any pass could query its information, e.g., a separate pass selecting a suitable device (as an alternative/improvement to …).
@corepointer: JFYI, since you've also worked on the lowering from DaphneIR ops to kernel calls, especially …
Force-pushed from 075f051 to 43449e5
Force-pushed from 43449e5 to f542242
Force-pushed from 557d899 to 96b361e
Force-pushed from a7df434 to 3d6b2cc
Force-pushed from e6397e8 to 131ed82
This PR is ready to be merged now, from my point of view. It contains the basic extension catalog for kernels (see above, with a few minor deviations from the initial ideas) plus a few more commits that show its end-to-end use.

Of course, feedback is welcome, but I'm not asking for a code review. However, as this PR changes a few important things, I would like to give everyone some time to have a look before I merge it. Unless anyone speaks up, I will merge this PR next Monday (Apr 29). The commits in this PR are meant to be "rebased & merged", not squashed.

These are the most important changes (see the commit messages for more details):
This PR is just the first step towards extensibility in DAPHNE, and there are many more aspects and details we will work on in the future.
I'm excited to play around with this in the next couple of days. After only reading the new docs and looking at the examples, this already looks really awesome. Great work, @pdamme!
Thanks. I'm sure it still has several limitations and can be improved from an efficiency and usability point of view, but it is a first step that we can build upon.
I wrote a couple of other extensions, following the example in … I was also confused by the wrong results in our internal kernel, but I just saw you have already created an issue for that. Otherwise, I think this looks great so far!
Awesome work @pdamme 👍
Thanks for the feedback, @philipportner. Happy to hear that you succeeded in writing some little extensions. Indeed, the documentation is not comprehensive yet. I didn't want to spend too much time on that, as we may still change a few things. However, I will add a short note (for developers) to the docs on where to get essential information (op mnemonics in …).

Thanks also to @corepointer. Merging it in is not super urgent, so simply let me know if you would like to have a closer look and try it out before I merge it. I would be happy to merge it by the end of this week, though (unless any significant concerns arise).
- The DAPHNE compiler usually lowers most domain-specific operations to calls to pre-compiled kernels.
- So far, the DAPHNE compiler did not know which kernel instantiations are available in pre-compiled form.
  - Instead, it generated the expected function name of a kernel based on the DaphneIR operation's mnemonic, its result/argument types, and the processing backend (e.g., CPP or CUDA).
  - If the expected kernel was not available, an error of the form "JIT session error: Symbols not found: ..." occurred during LLVM JIT compilation.
- This commit introduces an initial version of a kernel catalog that informs the DAPHNE compiler about the available pre-compiled kernels.
  - The kernel catalog stores a mapping from DaphneIR ops (represented by their mnemonic) to information on the kernels registered for the op.
  - The information stored for each kernel currently comprises: the name of the pre-compiled C/C++ function, the result/argument types, and the processing backend (e.g., CPP or CUDA).
    - The set of information will be extended in the future.
  - The kernel catalog provides methods for registering a kernel, retrieving the registered kernels for a specific op, and dumping the catalog.
- The kernel catalog is stored inside the DaphneUserConfig.
  - This makes sense since users will be able to configure the available kernels in the future.
  - That way, the kernel catalog is accessible in all parts of the DAPHNE compiler and runtime.
- The information on the available kernels is currently stored in a JSON file named catalog.json (or CUDAcatalog.json).
  - Currently, catalog.json is generated by genKernelInst.py; thus, the system has access to the same kernel specializations as before.
  - catalog.json is read at DAPHNE system start-up in the coordinator and the distributed workers.
  - Added a parser for the kernel catalog JSON file.
  - The concrete format of the catalog files may be changed in the future (e.g., to make it more efficient or intuitive).
- RewriteToCallKernelOpPass uses the kernel catalog to obtain the kernel function name for an operation, instead of relying on a naming convention.
  - However, there are still a few points where kernel function names are built by convention (to be addressed later):
    - lowering of DistributedPipelineOp in RewriteToCallKernelOpPass
    - lowering of MapOp in LowerToLLVMPass
    - lowering of VectorizedPipelineOp in LowerToLLVMPass
- Directly related misc changes:
  - DaphneIrExecutor has getters for its DaphneUserConfig.
  - CompilerUtils::mlirTypeToCppTypeName() allows generating either underscores (as before) or angle brackets (new) for template parameters.
- This is a first step towards extensibility w.r.t. kernels; for now, the main contribution is the representation of the available kernels in a data structure (the kernel catalog).
- Closes #455, with an initial solution we can build upon in the future.
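To make the catalog's shape concrete, here is a minimal C++ sketch of such a data structure. All names (`KernelInfo`, `KernelCatalog`, the fields) are hypothetical and for illustration only; the actual class in the DAPHNE code base may look different:

```cpp
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record describing one pre-compiled kernel; per the commit
// message, the catalog stores the kernel function name, the result/argument
// types, and the processing backend.
struct KernelInfo {
    std::string kernelFuncName;        // name of the pre-compiled C/C++ function
    std::vector<std::string> resTypes; // result types
    std::vector<std::string> argTypes; // argument types
    std::string backend;               // e.g., "CPP" or "CUDA"
};

// Hypothetical catalog: maps a DaphneIR op mnemonic to its registered kernels.
class KernelCatalog {
    std::map<std::string, std::vector<KernelInfo>> kernelsByOp;

public:
    void registerKernel(const std::string &opMnemonic, KernelInfo info) {
        kernelsByOp[opMnemonic].push_back(std::move(info));
    }

    const std::vector<KernelInfo> &getKernelInfos(const std::string &opMnemonic) const {
        auto it = kernelsByOp.find(opMnemonic);
        if (it == kernelsByOp.end())
            throw std::runtime_error("no kernels registered for op: " + opMnemonic);
        return it->second;
    }

    void dump(std::ostream &os = std::cout) const {
        for (const auto &[op, kernels] : kernelsByOp)
            for (const auto &k : kernels)
                os << op << " -> " << k.kernelFuncName << " (" << k.backend << ")\n";
    }
};
```

A compiler pass could then look up the kernels registered for an op's mnemonic and pick the one matching the required result/argument types and processing backend.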
- The compiler knows in which shared lib each kernel resides and makes use of this info.
- Representation of this info:
  - The kernel catalog stores the path to the shared lib for each kernel.
  - The lib path is stored in the catalog files.
  - The catalog parser reads the path from the catalog files.
- Utilization of this info:
  - In RewriteToCallKernelOpsPass, the compiler determines which kernel libs are really needed by the generated kernel calls and links only those during the JIT compilation in DaphneIrExecutor.
- The automatic determination of the required kernel libs allowed/required a refactoring of the config items/CLI args related to the lib paths:
  - So far, there used to be "libdir" and "library_paths"; their use was a bit scattered over multiple places in the code base, such that it was hard to understand which exact libs would be linked in the end.
  - Now, there is only "libdir", but with slightly modified semantics: it's the directory where the kernel catalogs reside.
    - The kernel catalog files were moved from "build/src/runtime/local/kernels/" to "lib/", the same directory where the compiled kernel libraries reside.
    - This is also better for releasing/deploying DAPHNE, since the "lib" directory is already taken into account for these purposes.
  - The config item/CLI arg "library_paths" was completely removed.
  - Paths are found as follows now:
    - From the "libdir" config item, the default kernel catalog files are found.
    - The paths of the kernel libs are stored in the catalog files, where they are specified relative to those files.
    - To allow invoking DAPHNE from any pwd (as before), the "libdir" can be interpreted relative to the directory of the currently running executable (by using the prefix "{exedir}/"); this is done for the new default of "libdir".
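The path handling described above can be illustrated with a small sketch. The `{exedir}/` prefix and the catalog-relative lib paths are taken from the commit message, while everything else (the function names, how the executable directory is obtained) is a hypothetical assumption:

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: resolve a configured "libdir". A leading "{exedir}/"
// makes it relative to the directory of the currently running executable,
// so DAPHNE can be invoked from any working directory.
fs::path resolveLibDir(const std::string &libDir, const fs::path &exeDir) {
    const std::string prefix = "{exedir}/";
    if (libDir.rfind(prefix, 0) == 0) // starts with "{exedir}/"
        return exeDir / libDir.substr(prefix.size());
    return fs::path(libDir);
}

// Hypothetical helper: a kernel lib path from a catalog file is specified
// relative to the catalog file itself.
fs::path resolveKernelLibPath(const fs::path &catalogFile, const std::string &relLibPath) {
    return catalogFile.parent_path() / relLibPath;
}
```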
- Expert users can optionally provide a hint on which concrete pre-compiled kernel function to use for a particular operation.
  - So far, this is only supported for DaphneDSL built-in functions.
- Added a few script-level test cases.
- Updated the DaphneDSL language reference.
  - The concrete syntax may be changed in the future.
- As a side note: DaphneDSLBuiltins::build() should invoke getOperation() on ops with zero results before returning, to allow assigning kernel hints in an op-agnostic way.
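The side note on getOperation() can be illustrated with a short MLIR sketch; the helper and the attribute name `kernel_hint` are hypothetical, and the attribute actually used by DAPHNE may differ. The point is that working on the generic `mlir::Operation*` makes the hint assignment independent of the concrete op and of its number of results:

```cpp
#include "mlir/IR/Builders.h"
#include "mlir/IR/Operation.h"

#include <string>

// Hypothetical helper: attach a kernel hint to any op by working on the
// generic mlir::Operation*, so it also works for ops with zero results.
void setKernelHint(mlir::Operation *op, const std::string &kernelFuncName) {
    mlir::OpBuilder builder(op->getContext());
    op->setAttr("kernel_hint", builder.getStringAttr(kernelFuncName));
}
```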
- Created a small example extension that is built and used as a test case.
  - The build of the extension is isolated from the DAPHNE build on purpose, since DAPHNE extensions can be separate code bases.
  - Thus, building/cleaning the extension is part of the test case itself.
- Added a CLI arg for adding a kernel extension to DAPHNE at runtime.
  - DAPHNE does not need to be re-built to use the extension.
- Slightly changed a few files in "src/runtime/local/datastructures/" by moving problematic includes from header to source files etc., in order to make including a few DAPHNE headers in the extension easy.
  - In the future, this aspect will need more attention.
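To give a rough idea of what an extension kernel can look like, here is a hedged sketch in the general style of DAPHNE kernels. The function name `mySumAll` is made up, and the exact signature expected for any real op must be looked up in the generated kernel headers; the includes and accessor names follow DAPHNE's DenseMatrix, but treat the whole block as illustrative:

```cpp
#include <runtime/local/context/DaphneContext.h>
#include <runtime/local/datastructures/DenseMatrix.h>

// Illustrative extension kernel: sums all values of a dense double matrix.
// extern "C" gives the function a predictable symbol name, so it can be
// referenced from a kernel catalog file and resolved at JIT time.
extern "C" void mySumAll(double *res, const DenseMatrix<double> *arg, DCTX(ctx)) {
    const double *values = arg->getValues();
    const size_t numRows = arg->getNumRows();
    const size_t numCols = arg->getNumCols();
    const size_t rowSkip = arg->getRowSkip();
    double sum = 0.0;
    for (size_t r = 0; r < numRows; r++)
        for (size_t c = 0; c < numCols; c++)
            sum += values[r * rowSkip + c];
    *res = sum;
}
```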
Force-pushed from 131ed82 to 84fa96f
- A very initial version of the documentation of implementing/building/using a custom kernel extension, as part of the user docs.
- All files displayed in these docs are also in scripts/examples/extensions/myKernels/ for easy use.
Force-pushed from 84fa96f to 27b47f5
I've rebased this PR and tried to adapt the changes it contains to the new error handling from #706 as well as I could (I hope I didn't miss anything). I've also added a note to the docs on writing a custom kernel extension on where DAPHNE developers can find references for the op mnemonics and kernel interfaces.
Before I start implementing, I'd like to share an overview of my plans with everyone interested. Feel free to comment.
Goals
Planned steps

- Introduce a kernel extension catalog, represented in a file (`kernels.json`); among other things, it records the processing backend of each kernel (`CPP`, `CUDA`, `FPGAOPENCL`, ...).
- For now, generate `kernels.json` via the existing genKernelInst.py, so that the system has access to the same kernel specializations as before.
- Include `kernels.json` in the release artifact (maybe move that file).
- Status quo of the lowering from DaphneIR ops to kernel calls:
  - `MarkCUDAOpsPass` and `MarkFPGAOPENCLOpsPass` mark ops to be executed on the respective processing backend if certain conditions are met.
  - `AdaptTypesToKernelsPass` harmonizes input and output types for certain ops, assuming that kernels are usually available for homogeneous type combinations.
  - `RewriteToCallKernelOpPass` rewrites domain-specific DaphneIR ops to `CallKernelOp`; the name of the kernel function is created by a naming convention taking into account the op name, the input and output types, and hints on the processing backend, but the pass does not know if that kernel really exists.
  - `LowerToLLVMPass` lowers `VectorizedPipelineOp` and `DistributedPipelineOp` to `CallKernelOp`.
- Planned changes to the lowering (a sketch of the new pass follows this list):
  - `MarkCUDAOpsPass` and `MarkFPGAOPENCLOpsPass` remain as they are.
  - `AdaptTypesToKernelsPass` remains as it is, but might not be used, for now (but could be useful again later).
  - `SelectKernelsPass` (new): queries the new kernel extension catalog by (op name, processing backend) and (1) decides which kernel function to call and adds a hint (attribute) to the op, and (2) inserts casts of inputs and outputs where necessary; it should run before `RewriteToCallKernelOpsPass`, since the casts it introduces could offer potential for further optimization.
  - `RewriteToCallKernelOpPass` will merely rewrite ops to `CallKernelOp`, based on decisions that have been made before, reflected by attributes of the op.
  - `LowerToLLVMPass` stays as it is (those two ops shall have very generic kernels anyway).
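To make the planned `SelectKernelsPass` more tangible, here is a minimal MLIR pass skeleton. This is a sketch under stated assumptions, not DAPHNE's implementation: the catalog type, the lookup by (op name, backend), the hard-coded `"CPP"` backend, and the attribute name `kernel_hint` are all placeholders, and the cast insertion (step 2 in the plan) is omitted:

```cpp
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/Pass.h"

#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the kernel extension catalog: maps
// (op name, processing backend) to a kernel function name.
using KernelCatalog =
    std::map<std::pair<std::string, std::string>, std::string>;

// Sketch of the planned SelectKernelsPass: query the catalog per op and
// record the chosen kernel function as an attribute on the op.
struct SelectKernelsPass
    : public mlir::PassWrapper<SelectKernelsPass,
                               mlir::OperationPass<mlir::ModuleOp>> {
    KernelCatalog catalog;

    explicit SelectKernelsPass(KernelCatalog catalog)
        : catalog(std::move(catalog)) {}

    void runOnOperation() override {
        getOperation().walk([&](mlir::Operation *op) {
            // A real pass would read backend hints set by MarkCUDAOpsPass
            // etc.; "CPP" is just a placeholder default here.
            auto it = catalog.find({op->getName().getStringRef().str(), "CPP"});
            if (it != catalog.end())
                op->setAttr("kernel_hint",
                            mlir::StringAttr::get(op->getContext(), it->second));
        });
    }
};
```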
Possible follow-up issues (not to be addressed by this PR)
Closes #455.