WIP: [DO NOT MERGE] introduce libraft wheels #2531

Draft
wants to merge 34 commits into
base: branch-25.02

@jameslamb jameslamb commented Dec 17, 2024

Replaces #2306, contributes to rapidsai/build-planning#33.

Proposes packaging libraft as a wheel, which is then re-used by:

Notes for Reviewers

If you see this note, that means this is not ready for review.

Wheel contents

libraft:

  • libraft.so (shared library)
  • RAFT headers
  • vendored dependencies:
    • fmt
    • CCCL (thrust, cub, libcudacxx)
    • cuco
    • cute
    • cutlass

pylibraft:

  • pylibraft Python / Cython code and compiled Cython extensions

raft-dask:

  • raft-dask Python / Cython code and compiled Cython extension

Dependency Flows

libcugraph and libcuml take build-time dependencies on libraft, and call libraft.load_library() at runtime (in Python) to load libraft.so from the libraft wheel.
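The runtime-loading step described above could look roughly like the following. This is a minimal sketch, not the actual `libraft.load_library()` implementation: the `lib_dir` parameter, the default file name, and the choice of `RTLD_LOCAL` are all assumptions made for illustration.

```python
import ctypes
import os


def load_library(lib_dir: str, soname: str = "libraft.so") -> ctypes.CDLL:
    """Load a shared library shipped inside a wheel (hypothetical helper).

    `lib_dir` is assumed to be the directory where the wheel installed the
    library; a real package would probably derive it from its own location.
    """
    path = os.path.join(lib_dir, soname)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{soname} not found in {lib_dir}")
    # Assumption: loading by absolute path before importing any extension
    # module lets the dynamic loader reuse this copy when a later module
    # declares a dependency on the same library name.
    return ctypes.CDLL(path, mode=ctypes.RTLD_LOCAL)
```

A consumer such as a compiled extension's `__init__.py` would call this before importing the extension, so the library is already in the process when the extension is resolved.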

---
title: Build dependencies
---
flowchart LR
    A[libraft] -->B[pylibraft]
    A --> C[raft-dask]
    B --> C

    D[libcugraph] --> E[pylibcugraph]
    D --> F[cugraph]
    E --> F
    A --> D
    A --> E
    A --> F
    B --> E
    B --> F

    G[libcuml] --> H[cuml]
    G --> H
    A --> G
    A --> H
    B --> H

cugraph and cuml would need libraft at runtime, to dynamically load libraft.so.

---
title: Runtime dependencies
---
flowchart LR
    A[libraft] -->B[pylibraft]
    A --> C[raft-dask]
    B --> C

    D[libcugraph] --> E[pylibcugraph]
    D --> F[cugraph]
    E --> F
    A --> D
    C --> F
    B --> E
    B --> F

    G[libcuml] --> H[cuml]
    G --> H
    A --> G
    B --> H
    C --> H

Presumably wholegraph could follow a similar pattern.

Size changes (CUDA 12, Python 3.12, x86_64)

| wheel | num files (before) | num files (this PR) | size (before) | size (this PR) |
|---|---|---|---|---|
| libraft | --- | 3167 | --- | 18M |
| pylibraft | 64 | 62 | 11M | 1M |
| raft-dask | 29 | 28 | 188M | 188M |
| libcugraph | --- | 1762 | --- | 903M |
| pylibcugraph | 190 | 187 | 901M | 2M |
| cugraph | 315 | 313 | 899M | 3.0M |
| libcuml | --- | 1766 | --- | 289M |
| cuml | 442 | --- | 517M | --- |
| TOTAL | 1,040 | 7,265 | 2,516M | 1,404M |

NOTES: size = compressed, "before" = 2025-01-13 nightlies (rapidsai/cugraph@8507cbf, ), cugraph libraries from rapidsai/cugraph#4804
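As a quick check on the size totals, the per-wheel figures from the table can be summed directly. This is only a sketch of the arithmetic: the numbers are copied from the table, and cells the table leaves empty (`---`) are skipped, so the "this PR" total does not include a cuml wheel.

```python
# (before, this PR) compressed sizes in MB, copied from the table above.
# None marks a cell the table leaves empty.
sizes_mb = {
    "libraft": (None, 18),
    "pylibraft": (11, 1),
    "raft-dask": (188, 188),
    "libcugraph": (None, 903),
    "pylibcugraph": (901, 2),
    "cugraph": (899, 3),
    "libcuml": (None, 289),
    "cuml": (517, None),
}

before = sum(b for b, _ in sizes_mb.values() if b is not None)
after = sum(a for _, a in sizes_mb.values() if a is not None)
saved = before - after

print(f"before={before}M after={after}M saved={saved}M ({saved / before:.1%})")
```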

How I calculated those:
docker run \
    --rm \
    --network host \
    --env RAPIDS_NIGHTLY_DATE=2025-01-13 \
    --env CUGRAPH_NIGHTLY_SHA=8507cbf63db2f349136b266d3e6e787b189f45a0 \
    --env CUGRAPH_PR="pull-request/4804" \
    --env CUGRAPH_PR_SHA="8fe1d33cbcaf1f40a6b3d06ec48cc699c47f8b44" \
    --env CUML_NIGHTLY_SHA=7c715c494dff71274d0fdec774bdee12a7e78827 \
    --env CUML_PR="pull-request/6199" \
    --env CUML_PR_SHA="2ef32eaa006a84c0bd16220bb8e8af34198fbee8" \
    --env RAFT_NIGHTLY_SHA=1b62c4117a35b11ce3c830daae248e32ebf75e3f \
    --env RAFT_PR="pull-request/2531" \
    --env RAFT_PR_SHA="d275c995fb51310d1340fe2fd6d63d0bfd43cafa" \
    --env RAPIDS_PY_CUDA_SUFFIX=cu12 \
    --env WHEEL_DIR_BEFORE=/tmp/wheels-before \
    --env WHEEL_DIR_AFTER=/tmp/wheels-after \
    -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.12 \
    bash

# --- nightly wheels --- #
mkdir -p ./wheels-before

export RAPIDS_BUILD_TYPE=branch
export RAPIDS_REF_NAME="branch-25.02"

# pylibraft
RAPIDS_PY_WHEEL_NAME="pylibraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_SHA=${RAFT_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# raft-dask
RAPIDS_PY_WHEEL_NAME="raft_dask_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_SHA=${RAFT_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# cugraph
RAPIDS_PY_WHEEL_NAME="cugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_SHA=${CUGRAPH_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# pylibcugraph
RAPIDS_PY_WHEEL_NAME="pylibcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_SHA=${CUGRAPH_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# cuml
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_SHA=${CUML_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# --- wheels from CI --- #
mkdir -p ./wheels-after

export RAPIDS_BUILD_TYPE="pull-request"

# libraft
RAPIDS_PY_WHEEL_NAME="libraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
    rapids-download-wheels-from-s3 cpp ./wheels-after

# pylibraft
RAPIDS_PY_WHEEL_NAME="pylibraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

# raft-dask
RAPIDS_PY_WHEEL_NAME="raft_dask_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

# libcugraph
RAPIDS_PY_WHEEL_NAME="libcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
    rapids-download-wheels-from-s3 cpp ./wheels-after

# pylibcugraph
RAPIDS_PY_WHEEL_NAME="pylibcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

# cugraph
RAPIDS_PY_WHEEL_NAME="cugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

# libcuml
RAPIDS_PY_WHEEL_NAME="libcuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_REF_NAME="${CUML_PR}" \
RAPIDS_SHA="${CUML_PR_SHA}" \
    rapids-download-wheels-from-s3 cpp ./wheels-after

# cuml
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_REF_NAME="${CUML_PR}" \
RAPIDS_SHA="${CUML_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

pip install pydistcheck
pydistcheck \
    --inspect \
    --select 'distro-too-large-compressed' \
    ./wheels-before/*.whl \
| grep -E '^checking|files: | compressed' \
> ./before.txt

# get more exact sizes
du -sh ./wheels-before/*

pydistcheck \
    --inspect \
    --select 'distro-too-large-compressed' \
    ./wheels-after/*.whl \
| grep -E '^checking|files: | compressed' \
> ./after.txt

# get more exact sizes
du -sh ./wheels-after/*
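For a cross-check that does not depend on `pydistcheck`, file counts and on-disk sizes can also be read with Python's standard library, since wheels are zip archives. The sketch below is an assumption-laden alternative, not what was used for the table: the helper names are hypothetical, and its counts may not match `pydistcheck`'s exactly if that tool counts entries differently.

```python
import glob
import os
import zipfile


def summarize_wheel(path: str) -> tuple[int, int]:
    """Return (number of files, on-disk size in bytes) for one wheel."""
    with zipfile.ZipFile(path) as whl:
        # Skip directory entries so only real files are counted.
        num_files = sum(1 for info in whl.infolist() if not info.is_dir())
    return num_files, os.path.getsize(path)


def summarize_dir(wheel_dir: str) -> dict[str, tuple[int, int]]:
    """Summarize every *.whl file found directly inside wheel_dir."""
    pattern = os.path.join(wheel_dir, "*.whl")
    return {os.path.basename(p): summarize_wheel(p) for p in sorted(glob.glob(pattern))}


if __name__ == "__main__":
    # Directory names match the ones used in the shell commands above.
    for wheel_dir in ("./wheels-before", "./wheels-after"):
        for name, (num_files, size) in summarize_dir(wheel_dir).items():
            print(f"{wheel_dir}: {name} files={num_files} size={size / 1e6:.1f}MB")
```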

How I tested this

These other PRs:

@jameslamb added the `5 - DO NOT MERGE`, `improvement`, `non-breaking`, and `2 - In Progress` labels on Dec 17, 2024


@jameslamb jameslamb changed the title WIP: introduce libraft wheels WIP: [DO NOT MERGE] introduce libraft wheels Dec 17, 2024
rapids-bot bot pushed a commit that referenced this pull request Jan 7, 2025
…nup (#2532)

Proposes some cleanup of packaging details, noticed while I was working on #2531

* removes runtime dependencies on `joblib` and `numba` for `raft-dask`
   - *`raft-dask` doesn't directly import from these libraries, and the git blame didn't suggest any other reason that they were being pinned here*
   - *checked with `git grep -E 'joblib|numba'`*
* removes `setup.cfg` files
   - *these are currently being ignored by tools, in favor of identical configuration in `pyproject.toml` and `.flake8` files*
   - e.g. https://github.com/rapidsai/raft/blob/bfd190687ee396374b7106d9ac26add73b57b22a/.pre-commit-config.yaml#L16-L19
* packages license files in conda packages
   - *I think these were just missed in the round of PRs like this one: rapidsai/cuml#6061*
* removes some outdated / inaccurate comments in packaging configs

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #2532
@github-actions github-actions bot added the cpp label Jan 7, 2025
@jameslamb

/ok to test

@jameslamb

/ok to test

Labels: 2 - In Progress, 5 - DO NOT MERGE, ci, CMake, cpp, improvement, non-breaking, python