LeRobotDataset v2.1 #711

Open: 14 commits to merge into base: main
@aliberts aliberts commented Feb 10, 2025

What this does

This PR aims to improve the usability of LeRobotDataset. We increase CODEBASE_VERSION from v2.0 to v2.1 as changes are backward compatible with v2.0.

What do I need to do?

Simply run this script on your dataset to update the stats:

python lerobot/common/datasets/v21/convert_dataset_v20_to_v21.py \
    --repo-id=repo/id

This will:

  • Generate per-episode stats and write them to episodes_stats.jsonl.
  • Check consistency between these new stats and the old ones.
  • Remove the deprecated stats.json.
  • Update codebase_version in info.json.
  • Push this new version to the hub on the main branch and tag it with v2.1.
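
For intuition, the first step above (per-episode stats written one JSON object per line) can be sketched roughly as follows. This is a minimal illustration, not the conversion script itself: the record layout (`episode_index`, `stats`), the helper names `episode_stats` and `write_episodes_stats`, and the toy `action` feature are all assumptions made for the example.

```python
import io
import json
import math


def episode_stats(values):
    """Return min, max, mean, population std and count for one episode's values."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "std": math.sqrt(var),
        "count": n,
    }


def write_episodes_stats(episodes, fp):
    """Write one JSON line per episode (hypothetical record layout)."""
    for episode_index, features in enumerate(episodes):
        record = {
            "episode_index": episode_index,
            "stats": {name: episode_stats(vals) for name, vals in features.items()},
        }
        fp.write(json.dumps(record) + "\n")


# Toy data: two episodes with a single scalar feature (hypothetical name).
episodes = [{"action": [0.0, 1.0, 2.0]}, {"action": [4.0, 6.0]}]
buffer = io.StringIO()  # stands in for meta/episodes_stats.jsonl
write_episodes_stats(episodes, buffer)
lines = buffer.getvalue().splitlines()
```

Because each episode gets its own line, stats for new episodes can be appended without recomputing anything for the existing ones.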

Changes

  • Replaces the global stats.json with per-episode stats in episodes_stats.jsonl. Episode stats are then aggregated over the selected episodes when the dataset is initialized. Stats computation is also much faster. See Per-episode stats #521.
dataset_root/
  ├── data
  ├── meta
  │   ├── episodes.jsonl
+ │   ├── episodes_stats.jsonl
  │   ├── info.json
- │   ├── stats.json
  │   └── tasks.jsonl
  └── videos
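
One way to aggregate per-episode stats over a selection of episodes is to weight each episode by its frame count and pool the variances. The sketch below assumes each episode record carries `min`, `max`, `mean`, `std` and `count` for a scalar feature; the function name `aggregate_stats` and this record shape are illustrative assumptions, not necessarily what the PR implements.

```python
import math


def aggregate_stats(per_episode):
    """Combine per-episode stats into stats over all frames of those episodes."""
    total = sum(s["count"] for s in per_episode)
    # Weighted mean: each episode contributes in proportion to its frame count.
    mean = sum(s["mean"] * s["count"] for s in per_episode) / total
    # Pooled variance: within-episode variance plus the squared offset of the
    # episode mean from the global mean, both weighted by frame count.
    var = sum(
        s["count"] * (s["std"] ** 2 + (s["mean"] - mean) ** 2) for s in per_episode
    ) / total
    return {
        "min": min(s["min"] for s in per_episode),
        "max": max(s["max"] for s in per_episode),
        "mean": mean,
        "std": math.sqrt(var),
        "count": total,
    }


# Stats for two toy episodes with frames [0, 1, 2] and [4, 6].
selected = [
    {"min": 0.0, "max": 2.0, "mean": 1.0, "std": math.sqrt(2 / 3), "count": 3},
    {"min": 4.0, "max": 6.0, "mean": 5.0, "std": 1.0, "count": 2},
]
combined = aggregate_stats(selected)
```

The result matches what would be computed directly on the concatenated frames, which is what allows a subset of episodes to be selected at initialization without touching the raw data.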

TODOs in later PRs

  • Use standard hf_dataset.set_format("torch") instead of custom hf_dataset.set_transform(hf_transform_to_torch)
  • Multi dataset, features mapping
  • Update visualization for multi-task episodes

How it was tested

  • Improves test_datasets
  • Adds test_compute_stats

@aliberts marked this pull request as ready for review February 19, 2025 15:03
@aliberts added the ✨ Enhancement (New feature or request) and 🗃️ Dataset (Something dataset-related) labels Feb 20, 2025