Issues: Lightning-AI/litdata
Issues list
How to optimize dataset for pretraining from HuggingFace
bug
Something isn't working
question
Further information is requested
#482
opened Feb 21, 2025 by
TheLukaDragar
Add support for GCS
enhancement
New feature or request
priority 0
#476
opened Feb 18, 2025 by
tchaton
Add pytest fixture to limit max time a test can take
bug
Something isn't working
help wanted
Extra attention is needed
#475
opened Feb 17, 2025 by
deependujha
RuntimeError: Trying to resize storage that is not resizable
#472
opened Feb 17, 2025 by
VedantKalbag
Bug: zstd.Error: Decompression error: Src size is incorrect
bug
Something isn't working
help wanted
Extra attention is needed
#463
opened Feb 5, 2025 by
lilavocado
CI error: Unexpected segmentation fault encountered in worker
bug
Something isn't working
help wanted
Extra attention is needed
#459
opened Jan 31, 2025 by
deependujha
Optimize tokens throws seg fault
bug
Something isn't working
help wanted
Extra attention is needed
#454
opened Jan 22, 2025 by
tclements-usgs
drop_last is not respected
bug
Something isn't working
help wanted
Extra attention is needed
#442
opened Jan 3, 2025 by
robmarkcole
Latest tag
bug
Something isn't working
wontfix
This will not be worked on
#441
opened Jan 3, 2025 by
robmarkcole
CI error: `All chunks should've been deleted` keeps coming back
bug
Something isn't working
help wanted
Extra attention is needed
#437
opened Dec 20, 2024 by
deependujha
Restart training with new data, mid-epoch
enhancement
New feature or request
#436
opened Dec 17, 2024 by
schopra8
Advanced Batching Logic with CombinedStreamingDataset
enhancement
New feature or request
#434
opened Dec 13, 2024 by
schopra8
Question: Is there a list for publicly available s3 links of datasets of `litdata.StreamingDataset` format?
question
Further information is requested
#430
opened Dec 2, 2024 by
2catycm
Question: Is litdata faster when loading local dataset or network storage s3 dataset?
question
Further information is requested
#428
opened Nov 30, 2024 by
2catycm
Add 'New contributors' section to main README
enhancement
New feature or request
#426
opened Nov 28, 2024 by
robmarkcole
Clear Examples of use with different dataset types and code changes.
enhancement
New feature or request
#409
opened Nov 4, 2024 by
Woodr7
incorrect dataloader length when `drop_last=False`
bug
Something isn't working
help wanted
Extra attention is needed
#402
opened Oct 28, 2024 by
grez72
Improve CombinedStreamingDataset to handle multiple subdatasets efficiently
enhancement
New feature or request
#386
opened Oct 2, 2024 by
bhimrazy
The config isn't consistent between chunks
bug
Something isn't working
help wanted
Extra attention is needed
#370
opened Sep 17, 2024 by
gluonfield
How can I shut down automatically distributing data when using StreamingDataset?
enhancement
New feature or request
question
Further information is requested
#368
opened Sep 12, 2024 by
ygtxr1997
Failed to Resume Training w/ CombinedStreamingDataset
bug
Something isn't working
duplicate
This issue or pull request already exists
help wanted
Extra attention is needed
#363
opened Sep 5, 2024 by
schopra8
StreamingDataset causes NCCL timeout when using multiple nodes
bug
Something isn't working
help wanted
Extra attention is needed
#340
opened Aug 26, 2024 by
hubenjm
StreamingDataset intermittently fails due to lack of index.json
bug
Something isn't working
help wanted
Extra attention is needed
#337
opened Aug 20, 2024 by
plra
Bug: Inconsistent Behavior with StreamingDataloader loading states (specific to CombinedStreamingDataset)
bug
Something isn't working
help wanted
Extra attention is needed
#331
opened Aug 14, 2024 by
bhimrazy
Use different batch sizes in CombinedStreamingDataset
enhancement
New feature or request
help wanted
Extra attention is needed
#327
opened Aug 10, 2024 by
schopra8