Please check that this issue hasn't been reported before.
I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
Training should stream a Hugging Face dataset.
Current behaviour
[2024-01-01 11:53:55,332] [INFO] [axolotl.load_model:517] [PID:20811] [RANK:2] GPU memory usage after model load: 2.062GB (+0.087GB cache, +1.755GB misc)
[2024-01-01 11:53:55,340] [INFO] [axolotl.load_model:552] [PID:20811] [RANK:2] converting modules to torch.bfloat16 for flash attention
Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 38, in <module>
    fire.Fire(do_cli)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 34, in do_cli
    train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
  File "/workspace/axolotl/src/axolotl/train.py", line 136, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1553, in _inner_training_loop
    train_dataloader = self.get_train_dataloader()
  File "/workspace/axolotl/src/axolotl/core/trainer_builder.py", line 210, in get_train_dataloader
    sampler = self._get_train_sampler()
  File "/workspace/axolotl/src/axolotl/core/trainer_builder.py", line 161, in _get_train_sampler
    RandomSampler(self.train_dataset),
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/data/sampler.py", line 106, in __init__
    if not isinstance(self.num_samples, int) or self.num_samples <= 0:
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/data/sampler.py", line 114, in num_samples
    return len(self.data_source)
TypeError: object of type 'IterableDataset' has no len()
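The failure can be reproduced without torch or axolotl. The last frames show `RandomSampler.__init__` reading `num_samples`, which calls `len(self.data_source)`, and a streamed `IterableDataset` defines no `__len__`. Below is a minimal stdlib-only mimic, assuming that is the whole mechanism; `StreamingRows`, `MiniRandomSampler` and `try_build` are hypothetical names, not torch or `datasets` classes.

```python
class StreamingRows:
    """Hypothetical stand-in for a streamed dataset: iterable, but no __len__."""

    def __init__(self, rows):
        self._rows = rows

    def __iter__(self):
        return iter(self._rows)


class MiniRandomSampler:
    """Hypothetical, simplified sampler: like the torch frames in the traceback,
    it needs len(data_source) as soon as it is constructed."""

    def __init__(self, data_source):
        self.data_source = data_source
        # Rough equivalent of the num_samples check in torch's sampler.py.
        if len(self.data_source) <= 0:
            raise ValueError("num_samples should be a positive integer")


def try_build(dataset):
    """Return "ok" if the sampler can be built, else the TypeError message."""
    try:
        MiniRandomSampler(dataset)
        return "ok"
    except TypeError as err:
        return str(err)
```

`try_build([1, 2, 3])` returns `"ok"`, while `try_build(StreamingRows([1, 2, 3]))` returns `"object of type 'StreamingRows' has no len()"`, the same shape of error as in the log above.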
Steps to reproduce
Run training (`axolotl.cli.train`) with the config YAML.
Config yaml
Possible solution
An update to the `datasets` library may have broken this functionality.
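Another possibility, judging only from the traceback, is that axolotl's `_get_train_sampler` (trainer_builder.py, line 161) builds a `RandomSampler` unconditionally. One possible direction, sketched here as an assumption rather than a tested fix, is to skip index shuffling when the dataset has no length; as far as I can tell, upstream `transformers.Trainer.get_train_dataloader` only attaches a sampler when the dataset is not a `torch.utils.data.IterableDataset`. The names `StreamingRows` and `pick_train_order` are hypothetical and stdlib-only; in the real trainer the check would be against `torch.utils.data.IterableDataset`.

```python
import random
from collections.abc import Sized


class StreamingRows:
    """Hypothetical stand-in for a streamed dataset: iterable, but no __len__."""

    def __init__(self, rows):
        self._rows = rows

    def __iter__(self):
        return iter(self._rows)


def pick_train_order(dataset, seed=0):
    """Hypothetical guard: shuffle indices only when the dataset has a length.

    Returns None for unsized (streamed) datasets so the caller can iterate
    them in order, instead of calling len() unconditionally.
    """
    if not isinstance(dataset, Sized):
        return None  # a stream has no indices to shuffle
    order = list(range(len(dataset)))
    random.Random(seed).shuffle(order)  # seeded for reproducibility
    return order
```

With this guard, a streamed dataset yields `None` (no sampler) instead of raising, and a sized dataset still gets a shuffled permutation of its indices.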
Which Operating Systems are you using?
Python Version
docker
axolotl branch-commit
main
Acknowledgements