
pretrain_dataset broken #1026

Closed
6 of 8 tasks
mhenrichsen opened this issue Jan 1, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@mhenrichsen
Collaborator

mhenrichsen commented Jan 1, 2024

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

Training should stream an HF dataset when pretraining_dataset is set.

Current behaviour

[2024-01-01 11:53:55,332] [INFO] [axolotl.load_model:517] [PID:20811] [RANK:2] GPU memory usage after model load: 2.062GB (+0.087GB cache, +1.755GB misc)
[2024-01-01 11:53:55,340] [INFO] [axolotl.load_model:552] [PID:20811] [RANK:2] converting modules to torch.bfloat16 for flash attention
Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 38, in <module>
    fire.Fire(do_cli)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 34, in do_cli
    train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
  File "/workspace/axolotl/src/axolotl/train.py", line 136, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1553, in _inner_training_loop
    train_dataloader = self.get_train_dataloader()
  File "/workspace/axolotl/src/axolotl/core/trainer_builder.py", line 210, in get_train_dataloader
    sampler = self._get_train_sampler()
  File "/workspace/axolotl/src/axolotl/core/trainer_builder.py", line 161, in _get_train_sampler
    RandomSampler(self.train_dataset),
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/data/sampler.py", line 106, in __init__
    if not isinstance(self.num_samples, int) or self.num_samples <= 0:
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/data/sampler.py", line 114, in num_samples
    return len(self.data_source)
TypeError: object of type 'IterableDataset' has no len()
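
For context, a minimal sketch of the failure path (assumed packages: datasets and torch; the dataset name is taken from the config below, the rest is illustrative and not axolotl code). RandomSampler fails at construction because its num_samples property calls len() on a dataset that has no length.

# Minimal repro sketch, not axolotl code (assumes the datasets and torch packages).
from datasets import load_dataset
from torch.utils.data import RandomSampler

# streaming=True returns an IterableDataset, which does not implement __len__
streamed = load_dataset("mhenrichsen/terra", split="train", streaming=True)

# RandomSampler.__init__ checks num_samples, which calls len(self.data_source),
# so this raises: TypeError: object of type 'IterableDataset' has no len()
sampler = RandomSampler(streamed)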

Steps to reproduce

Run the config YAML below.

Config yaml

base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T

model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

load_in_8bit: false
load_in_4bit: false
strict: false

pretraining_dataset:
  - path: mhenrichsen/terra
    type: completion

#datasets:
#  - path: DDSC/dagw_reddit_filtered_v1.0.0
#    type: completion
dataset_prepared_path:
val_set_size: 0.001
output_dir: ./tiny

sequence_len: 2048
sample_packing: true
pad_to_sequence_len: true

wandb_project: tiny-danskgpt
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 16
num_epochs: 2
max_steps: 200000
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00005

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 4
debug:
deepspeed: deepspeed/zero2.json
weight_decay: 0.1
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"

Possible solution

The datasets library may have been updated in a way that broke this functionality.
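
One possible guard, sketched below under the assumption that the sampler logic in trainer_builder.py can fall back to sequential iteration: skip RandomSampler whenever the training dataset has no __len__ (i.e. it is a streamed IterableDataset). The function name is illustrative, not the actual axolotl API.

# Hedged sketch only -- not the actual axolotl fix.
from typing import Optional
from torch.utils.data import RandomSampler, Sampler

def get_train_sampler(train_dataset) -> Optional[Sampler]:
    # A streamed datasets.IterableDataset has no __len__, so RandomSampler
    # cannot be constructed over it; returning None lets the DataLoader
    # consume the stream in order instead of sampling randomly.
    if not hasattr(train_dataset, "__len__"):
        return None
    return RandomSampler(train_dataset)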

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

docker

axolotl branch-commit

main

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
@mhenrichsen added the bug (Something isn't working) label on Jan 1, 2024
@NanoCode012
Collaborator

NanoCode012 commented Jan 8, 2024

Hey, I'm not sure if you uploaded an old config. It should be the following:

pretraining_dataset: mhenrichsen/terra

Am I right?

Edit: Also, the linked PR seems to have been merged. Was your issue solved?

@NanoCode012
Collaborator

NanoCode012 commented Mar 30, 2024

Closing as it's stale. Feel free to reopen if this is still happening.
