Pull requests: NVIDIA/NeMo-Curator

Pull requests list

Ruff : nemo_curator/utils
#655 opened Apr 10, 2025 by praateekmahajan
3 tasks
Ruff : examples and nemo_curator/scripts
#654 opened Apr 10, 2025 by praateekmahajan
3 tasks
Ruff : Tutorials
#651 opened Apr 10, 2025 by praateekmahajan
3 tasks
Ruff - Base PR
#650 opened Apr 9, 2025 by praateekmahajan
3 tasks
Unpin Rapids versions for development
#642 opened Apr 9, 2025 by ayushdg
ci: Fix code-freeze workflow
#638 opened Apr 8, 2025 by ko3n1g
3 tasks
text modality doc updates for 24.04
#635 opened Apr 7, 2025 by lbliii
Version bump to 0.8.0rc3.dev0
#634 opened Apr 7, 2025 by github-actions bot
Enable Ruff
#628 opened Apr 4, 2025 by praateekmahajan Draft
3 tasks
Change prompt to try and get only topic names
#623 opened Apr 3, 2025 by abhinavg4
3 tasks
[WIP] Remote I/O in SemDedup
#621 opened Apr 2, 2025 by praateekmahajan Draft
3 tasks
Add more tests to test_dataset (label: gpuci)
#594 opened Mar 17, 2025 by sarahyurick
Nvingest curator tutorial
#584 opened Mar 11, 2025 by ruchaa-apte
Add Regex Modifier
#568 opened Feb 24, 2025 by shuoyangd
3 tasks done
Add option to skip data by adding a flag instead of removing them
#566 opened Feb 22, 2025 by shuoyangd
1 of 3 tasks
Add a way to pass expected language to FastTextLangId filter
#565 opened Feb 21, 2025 by shuoyangd
2 of 3 tasks
Remove minhash conditional for 25.02
#558 opened Feb 18, 2025 by praateekmahajan
3 tasks
Create FastText classifier module
#546 opened Feb 13, 2025 by sarahyurick Draft
Hard negative mining for Retriever fine-tuning
#523 opened Feb 5, 2025 by vinay-raman
3 tasks done
Added LookUp error handling during encoding detection.
#502 opened Jan 30, 2025 by ggcr
Clean up Pandas, cuDF, Dask, and Dask-cuDF DocumentDataset type logic (label: gpuci)
#494 opened Jan 23, 2025 by sarahyurick
Standardize text_field and id_field terminology (label: gpuci)
#485 opened Jan 17, 2025 by sarahyurick