f"\nWe encountered an OverflowError and will retry "
f"the current batch with {parts_per_text_batch_retry} "
f"text partitions instead of {parts_per_text_batch_use}.",
flush=True,
)
continue
Update 10/28
We have a WIP PR, #316; however, we noticed a performance regression (10-20% slower at scale). Fixing it the right way will require more investigation, so I'm marking this as at risk for this release (Slack thread).
We should retire text_bytes_aware_shuffle now that #77 is merged.
That means refactoring the code below.
NeMo-Curator/nemo_curator/modules/fuzzy_dedup.py
Lines 1119 to 1144 in c2f296c
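For context, the fragment quoted in this issue suggests that the referenced lines retry a batch with more text partitions after an OverflowError. Below is a minimal Python sketch of that retry pattern, not the NeMo-Curator implementation: `process_with_retry`, `merge_batch`, `max_retries`, and the partition-doubling policy are all hypothetical names and choices made for illustration.

```python
def process_with_retry(merge_batch, parts_per_text_batch, max_retries=3):
    """Call merge_batch(n_parts); on OverflowError, retry with more partitions.

    merge_batch is a hypothetical callable standing in for the per-batch work.
    Doubling the partition count is an assumed policy, not the library's.
    """
    parts_use = parts_per_text_batch
    attempt = 0
    while True:
        try:
            return merge_batch(parts_use)
        except OverflowError:
            if attempt >= max_retries:
                raise  # give up and surface the original error
            attempt += 1
            parts_retry = parts_use * 2  # assumed back-off policy
            print(
                f"\nWe encountered an OverflowError and will retry "
                f"the current batch with {parts_retry} "
                f"text partitions instead of {parts_use}.",
                flush=True,
            )
            parts_use = parts_retry


def fake_merge(n_parts):
    # Stand-in workload: pretend fewer than 4 partitions overflows.
    if n_parts < 4:
        raise OverflowError("partition too large")
    return n_parts


result = process_with_retry(fake_merge, parts_per_text_batch=1)
```

Retiring text_bytes_aware_shuffle would presumably let this kind of retry loop be simplified or removed, which is the refactor this issue proposes.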
Update 11/04
Working on it this week in #316.