Taproot contains code for evaluating the construction of a software stack for scientific data analysis. The stack can currently be built to evaluate analyzing scientific data with several packages, some of which require complex build system configurations.
The available analysis stacks are:
Apache Lucene
Lawrence Berkeley Lab's Fastbit
LevelDB
Apache Arrow and Parquet
DuckDB
TrinoDB
If everything else is already configured and you only want to build Taproot, edit gradle.properties to point at the install paths of the components, then run:
./gradlew build
mkdir build
cd build
cmake ..
cmake --build .
cd data/1m_points
gzip *
cd ../../build
mfem2parquet -s 0 ../data/1m_points test.parquet
duckdb test.parquet
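Before running the quick build above, it can help to confirm that the tools it invokes are on PATH. The following is a minimal sketch, not part of Taproot: require_tools is a hypothetical helper name, and the tool list in the final comment is simply read off the commands above.

```shell
# Hypothetical helper (not shipped with Taproot): report tools missing from PATH.
# Prints one "missing: <tool>" line per absent tool and returns non-zero if any are absent.
require_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example call for the quick-build steps in this README:
# require_tools cmake gzip duckdb mfem2parquet
```

If mfem2parquet is reported missing, it is probably installed under $SWHOME/bin and that directory is not yet on PATH.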
If you haven't built Taproot before, you'll first need several supporting packages, for example to generate data.
You will need either to build Laghos and all of its prerequisites to generate a Laghos-formatted MFEM mesh or, if you already have a Laghos mesh stored in files, to build a serial version of MFEM and use it to construct arrays of Laghos mesh data.
- Note: to use MFEM with Lucene, MFEM must be built so that it can be linked dynamically. The default build does not do this.
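The build commands in this document install into $SWHOME but never show how that variable is set. The sketch below is one way to define it for the current shell session. It rests on assumptions: setup_swhome is a hypothetical helper, the default location $HOME/sw is arbitrary, and LD_LIBRARY_PATH is the Linux mechanism for finding shared libraries.

```shell
# Minimal sketch: choose an install prefix and expose it to the current shell.
# setup_swhome is a hypothetical helper; the default $HOME/sw is an assumption.
setup_swhome() {
    SWHOME=${1:-${SWHOME:-$HOME/sw}}
    export SWHOME
    mkdir -p "$SWHOME/bin" "$SWHOME/lib"
    # Put installed executables (for example mfem2parquet) on PATH.
    PATH="$SWHOME/bin:$PATH"
    # Let the dynamic loader find shared libraries installed under the prefix (Linux).
    LD_LIBRARY_PATH="$SWHOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}

# Example: setup_swhome "$HOME/sw"
```

Because the function changes environment variables, it has to be defined and called in the same shell that runs the build commands (or sourced from a file), not executed as a separate script.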
wget -c https://bit.ly/mfem-4-5 -O mfem-4.5.tgz
tar xvfz mfem-4.5.tgz
cd mfem-4.5
Edit config/defaults.mk to set PREFIX=${SWHOME} and SHARED=YES
make config BUILD_DIR=build
cd build
make -j 4 serial
make PREFIX=$SWHOME SHARED=YES install
make STATIC=YES PREFIX=$SWHOME install
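After the two install steps above, a quick check can confirm that both library flavors landed under the prefix. This is a sketch under stated assumptions: check_mfem_libs is a hypothetical helper, and the file names libmfem.a and libmfem.so* under <prefix>/lib are the conventional Linux names, which may differ on other platforms.

```shell
# Hypothetical helper: list which MFEM library flavors exist under an install prefix.
# Assumes the conventional names libmfem.a (static) and libmfem.so* (shared) in <prefix>/lib.
check_mfem_libs() {
    prefix=$1
    if [ -f "$prefix/lib/libmfem.a" ]; then
        echo "static: yes"
    else
        echo "static: no"
    fi
    # The shared library may carry a version suffix, so match it with a glob.
    shared=no
    for f in "$prefix"/lib/libmfem.so*; do
        if [ -e "$f" ]; then
            shared=yes
        fi
    done
    echo "shared: $shared"
}

# Example: check_mfem_libs "$SWHOME"
```

A result of "shared: no" would suggest that the SHARED=YES setting did not take effect, which matters for the Lucene build.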
NOTE: The following CMake steps do not work with this version of MFEM: the CMake installation does not enable -fPIC.
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=$SWHOME -DCMAKE_INSTALL_RPATH=$SWHOME/lib -DMFEM_USE_MPI=no ..
make install
git clone git@github.com:apache/arrow.git
cd arrow
git checkout apache-arrow-13.0.0
cd cpp
mkdir -p build
cd build
cmake .. --preset ninja-debug -DARROW_BUILD_STATIC=ON -DARROW_BUILD_EXAMPLES=ON -DPARQUET_BUILD_EXECUTABLES=ON
cmake --build . -j 2
cmake --install . --prefix $SWHOME
The following steps describe how to build the Taproot Lucene data analytics package:
Install curl
curl -s "https://get.sdkman.io" | bash
sdk install gradle 8.1.1
Install OpenJDK 17 or 18 (on Ubuntu, running update-alternatives --config java is usually sufficient to select it)
git clone https://github.com/apache/lucene
cd lucene
./gradlew
Fastbit and TrinoDB support are not yet enabled.
cd taproot
Edit gradle.properties to set the path to MFEM and Arrow
./gradlew build
./gradlew runTest
./gradlew build
./gradlew test
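The gradle.properties edit described in this document can also be scripted. The sketch below sets a key in a Java-style properties file, replacing an existing entry or appending a new one. Everything named here is an assumption: set_property is a hypothetical helper, and mfemDir and arrowDir are placeholder key names, so check the gradle.properties shipped with Taproot for the keys it actually reads. The helper only matches entries written as key=value with no spaces around the equals sign.

```shell
# Hypothetical helper: set key=value in a Java-style properties file,
# replacing an existing entry for the key or appending a new one.
set_property() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp)
    # Copy every line whose key differs from the one being set.
    if [ -f "$file" ]; then
        awk -F= -v k="$key" '$1 != k' "$file" > "$tmp"
    fi
    # Append the new entry, then write the result back in place.
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example with placeholder key names (not confirmed by this README):
# set_property gradle.properties mfemDir "$SWHOME"
# set_property gradle.properties arrowDir "$SWHOME"
```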
- Stop all existing gradle daemons
./gradlew --stop
- Even though we specify --no-daemon, a single daemon still runs and exits at shutdown (we would prefer no daemon at all)
./gradlew run --no-daemon
sudo apt-get install libboost-all-dev
cd TAPROOT_DIR
mkdir build
cd build
cmake .. -DARROW_DIR=$SWHOME -DMFEM_DIR=$SWHOME
cmake --build . -j 1
$SWHOME/bin/mfem2parquet -s <flat|custom|trino> <mfem_dir> <parquet_dir>
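The mfem2parquet synopsis above lists alternatives rather than a literal command line, since an unquoted | would be interpreted by the shell as a pipe. The sketch below shows one way to wrap the call so that exactly one schema name is passed and obvious mistakes are caught early. It assumes that flat, custom and trino are the only accepted -s values, which this README does not confirm (the quick-build example passes -s 0). run_mfem2parquet and the MFEM2PARQUET override variable are hypothetical names.

```shell
# Hypothetical wrapper around mfem2parquet that rejects unknown schema names early.
# Assumes the accepted -s values are exactly flat, custom and trino.
run_mfem2parquet() {
    schema=$1
    mfem_dir=$2
    parquet_out=$3
    case "$schema" in
        flat|custom|trino) ;;
        *) echo "unknown schema: $schema" >&2; return 2 ;;
    esac
    if [ ! -d "$mfem_dir" ]; then
        echo "not a directory: $mfem_dir" >&2
        return 2
    fi
    # MFEM2PARQUET is a hypothetical override; by default use the installed binary.
    "${MFEM2PARQUET:-$SWHOME/bin/mfem2parquet}" -s "$schema" "$mfem_dir" "$parquet_out"
}

# Example: run_mfem2parquet flat ../data/1m_points test.parquet
```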
TODO: ./taproot-create-fastbit-index -i <mesh_dir> -o
TODO: ./taproot-select-fastbit -i -q