Pull requests: NVIDIA/NeMo-Curator
Add blocksize to DocumentDataset.read_* that uses dd.from_map
#374 opened Nov 15, 2024 by praateekmahajan • Draft
3 tasks
Synthetic data generation for Retriever Evaluation
#370 opened Nov 14, 2024 by vinay-raman
3 tasks done
Remove lingering DASK_DATAFRAME__QUERY_PLANNING environment variables
#346 opened Nov 5, 2024 by sarahyurick
Synthetic Data Generation for Retriever Evaluation
#338 opened Oct 30, 2024 by vinay-raman
3 tasks done
Convert translation_example.py into a Jupyter Notebook tutorial
#336 opened Oct 29, 2024 by sarahyurick • Draft
Add READMEs to examples/ and nemo_curator/scripts directories
#332 opened Oct 28, 2024 by sarahyurick
Add codepath for computing buckets without int conversion
#326 opened Oct 25, 2024 by ayushdg
3 tasks done
Dapt data curation tutorial fuzzy and semantic dedupe
Label: gpuci (Run GPU CI/CD on PR)
#322 opened Oct 24, 2024 by ruchaa-apte
[WIP] Retiring text_bytes_aware_shuffle to use shuffle directly
Label: gpuci (Run GPU CI/CD on PR)
#316 opened Oct 21, 2024 by praateekmahajan • Draft
3 tasks
MinHash improvement using minhash_permuted
Labels: enhancement (New feature or request), gpuci (Run GPU CI/CD on PR)
#313 opened Oct 18, 2024 by praateekmahajan
3 tasks
[DRAFT] Passing meta to map_partitions for read_data
#291 opened Oct 9, 2024 by praateekmahajan • Draft
3 tasks
Add blocksize to DocumentDataset.read_* that uses dask_cudf.read_*
#285 opened Oct 8, 2024 by praateekmahajan
3 tasks
Added example notebook for translation with ct2 model.
Label: documentation (Improvements or additions to documentation)
Add Multiple Model Classification example
Label: documentation (Improvements or additions to documentation)
#173 opened Jul 30, 2024 by sarahyurick