Chameleon is a C library providing parallel algorithms to perform BLAS/LAPACK operations that fully exploit modern architectures.
Chameleon dense linear algebra software relies on sequential task-based algorithms in which the sub-tasks of the overall algorithm are submitted to a runtime system. Such a system is a layer between the application and the hardware that handles the scheduling and the actual execution of tasks on the processing units. A runtime system such as StarPU can automatically manage data transfers between memory areas that are not shared (CPUs-GPUs, distributed nodes).
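The sequential task-based style described above can be illustrated with a minimal sketch. It does not use the Chameleon or StarPU APIs: `toy_runtime`, `toy_submit` and `tile_cholesky_submit` are hypothetical names, and the "runtime" here only records tasks in submission order. The point is that the application writes an ordinary sequential loop nest (here, the usual tile Cholesky loop nest) and submits one task per tile kernel, leaving scheduling to the runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Kinds of tile kernels used by a tile Cholesky factorization. */
typedef enum { TASK_POTRF, TASK_TRSM, TASK_SYRK, TASK_GEMM, TASK_KINDS } task_kind;

/* One submitted task: a kernel plus the (row, col) of the tile it writes. */
typedef struct { task_kind kind; int row, col; } task;

/* Hypothetical stand-in for a runtime system: it only records tasks in
 * submission order. A real runtime would infer dependencies from the data
 * accesses and schedule the tasks on the available processing units. */
typedef struct {
    task  *tasks;
    size_t capacity;
    size_t count;
    size_t per_kind[TASK_KINDS];
} toy_runtime;

void toy_submit(toy_runtime *rt, task_kind kind, int row, int col)
{
    assert(rt->count < rt->capacity);
    rt->tasks[rt->count++] = (task){ kind, row, col };
    rt->per_kind[kind]++;
}

/* Sequential task flow of a lower tile Cholesky on an nt x nt grid of tiles.
 * The loop nest is plain sequential code; each kernel call is replaced by a
 * task submission. */
void tile_cholesky_submit(toy_runtime *rt, int nt)
{
    for (int k = 0; k < nt; k++) {
        toy_submit(rt, TASK_POTRF, k, k);          /* factor the diagonal tile */
        for (int m = k + 1; m < nt; m++)
            toy_submit(rt, TASK_TRSM, m, k);       /* solve the panel tiles */
        for (int m = k + 1; m < nt; m++) {
            toy_submit(rt, TASK_SYRK, m, m);       /* update a diagonal tile */
            for (int n = k + 1; n < m; n++)
                toy_submit(rt, TASK_GEMM, m, n);   /* update an off-diagonal tile */
        }
    }
}
```

For a 4 x 4 grid of tiles this loop nest submits 20 tasks (4 POTRF, 6 TRSM, 6 SYRK, 4 GEMM). In this sketch they are only stored; a real runtime system would execute them as soon as their input tiles are ready.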
This implementation paradigm makes it possible to design high-performance linear algebra algorithms for very different types of architecture: laptops, many-core nodes, CPUs-GPUs, multiple nodes. For example, Chameleon is able to perform a Cholesky factorization (double-precision) at 80 TFlop/s on a dense matrix of order 400 000 (i.e. 4 min). Chameleon is a sub-project of MORSE specifically dedicated to dense linear algebra.
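The Cholesky figure above can be checked with a short calculation. The sketch below assumes the standard leading-order operation count of n^3/3 for a Cholesky factorization; the text does not say which count was used for the measurement, and the function names are hypothetical.

```c
#include <assert.h>

/* Leading-order flop count of a Cholesky factorization of an n x n matrix.
 * Assumption: n^3 / 3, lower-order terms ignored. */
double cholesky_flops(double n)
{
    return n * n * n / 3.0;
}

/* Estimated run time in seconds at a sustained rate given in flop/s. */
double seconds_at_rate(double flops, double flops_per_second)
{
    assert(flops_per_second > 0.0);
    return flops / flops_per_second;
}
```

With n = 400 000 and a rate of 80e12 flop/s, `seconds_at_rate(cholesky_flops(400000.0), 80e12)` gives roughly 267 seconds, i.e. a little under 4.5 minutes, which agrees with the "4 min" order of magnitude quoted above.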
To use the latest development version of Chameleon, please clone the master
branch. Note that Chameleon contains a git submodule,
morse_cmake.
To get the sources, please use these commands:
# if git version >= 1.9
git clone --recursive [email protected]:solverstack/chameleon.git
cd chameleon
# else
git clone [email protected]:solverstack/chameleon.git
cd chameleon
git submodule init
git submodule update
The latest releases of Chameleon are hosted on gforge.inria.fr for now. Future releases will be available on this gitlab project.
The user guide is available directly in the sources as emacs orgmode files, see:
- Introduction: description of the scientific context
- Installing:
  - Getting Chameleon
  - Prerequisites for installing Chameleon
  - Distribution of Chameleon using Spack
  - Build and install Chameleon with CMake
- Using:
  - Linking an external application with Chameleon libraries
  - Using Chameleon executables
  - Chameleon API
This documentation can also be generated in html and/or pdf:
# build the doc with cmake (emacs with orgmode and latex are required), e.g.
cmake .. -DCHAMELEON_ENABLE_DOC=ON
make doc
There is currently no up-to-date documentation of Chameleon. We would like to provide Doxygen documentation hosted on gitlab in the future. Please refer to section 2.1 of READMEDEV for information about generating the documentation.
Please refer to the READMEDEV page.
To contact the developers send an email to [email protected]
First, since the Chameleon library started as an extension of the PLASMA library to support multiple runtime systems, all developers of the PLASMA library are developers of the Chameleon library.
The following people contributed to the development of Chameleon:
- Emmanuel Agullo, PI
- Olivier Aumage
- Cedric Castagnede
- Terry Cojean
- Mathieu Faverge, PI
- Nathalie Furmento
- Reazul Hoque
- Hatem Ltaief
- Gregoire Pichon
- Florent Pruvost, PI
- Marc Sergent
- Guillaume Sylvand
- Samuel Thibault
- Stanimire Tomov
- Omar Zenati
If we forgot your name, please let us know so that we can fix that mistake.
Feel free to use the following publications to reference Chameleon:
- Original paper that initiated Chameleon and the principles:
- Agullo, Emmanuel and Augonnet, Cédric and Dongarra, Jack and Ltaief, Hatem and Namyst, Raymond and Thibault, Samuel and Tomov, Stanimire, Faster, Cheaper, Better – a Hybridization Methodology to Develop Linear Algebra Software for GPUs, GPU Computing Gems, First Online: 17 December 2010.
- Design of the QR algorithms:
- Agullo, Emmanuel and Augonnet, Cédric and Dongarra, Jack and Faverge, Mathieu and Ltaief, Hatem and Thibault, Samuel and Tomov, Stanimire, QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 25th IEEE International Parallel & Distributed Processing Symposium, First Online: 16 December 2010.
- Design of the LU algorithms:
- Agullo, Emmanuel and Augonnet, Cédric and Dongarra, Jack and Faverge, Mathieu and Langou, Julien and Ltaief, Hatem and Tomov, Stanimire, LU Factorization for Accelerator-based Systems, 9th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 11), First Online: 21 December 2011.
- Regarding distributed memory:
- Agullo, Emmanuel and Aumage, Olivier and Faverge, Mathieu and Furmento, Nathalie and Pruvost, Florent and Sergent, Marc and Thibault, Samuel, Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model, Research Report, First Online: 16 June 2016.