Due to the YouTube license, we cannot directly offer our processed data. However, you can follow the steps below to download the raw data and process it yourself.
[ NEW❗️]: We just released the OpenDV-mini subset!
Please feel free to try the mini subset by following the steps below. The necessary information is also contained in our OpenDV-YouTube Google Sheet (subset videos are marked as Mini in the Mini / Full Set column).
- The complete dataset, OpenDV-YouTube, is the largest driving video dataset to date, containing more than 1700 hours of real-world driving videos, roughly 300 times larger than the widely used nuScenes dataset.
- The mini subset, OpenDV-mini, contains about 28 hours of videos with diverse geographical distribution and various camera settings. Among these, 25 hours are used as mini-train and the other 3 hours as mini-val.
We recommend processing the dataset in a Linux environment, since Windows may have issues with the file paths.
Install the required packages by running the following commands.
conda create -n opendv python=3.10 -y
conda activate opendv
pip install -r requirements.txt
In case the metadata of the downloaded videos is fragmented, we recommend installing ffmpeg<=3.4.9. Instead of using the following commands, you can also clone and build it directly from the official repository.
# 1. prepare yasm for ffmpeg. If it is already installed on your machine, skip to the next step.
wget https://tortall.net/projects/yasm/releases/yasm-1.3.0.tar.gz
tar -xzvf yasm-1.3.0.tar.gz
cd yasm-1.3.0
./configure
make
make install
# 2. install ffmpeg<=3.4.9.
wget https://ffmpeg.org/releases/ffmpeg-3.4.9.tar.gz
tar -xzvf ffmpeg-3.4.9.tar.gz
cd ffmpeg-3.4.9
./configure
make
make install
# 3. check the installation. Sometimes you may need to reactivate the conda environment to see it working.
ffprobe
First, download the OpenDV-YouTube Google Sheet as a csv file. By default, you should save the file as meta/OpenDV-YouTube.csv. You can change it to any path you want, as long as you also change the csv_path in the command in the next step.
Then, run the following command to preprocess the metadata. The default values for --csv_path (or -i) and --json_path (or -o) are meta/OpenDV-YouTube.csv and meta/OpenDV-YouTube.json, respectively.
python scripts/meta_preprocess.py -i CSV_PATH -o JSON_PATH
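You can also quickly sanity-check the exported csv with the standard library, before or after running the script above. This is only a sketch: the column name Mini / Full Set comes from the sheet described above, and the guarded check is there in case the header differs slightly.
import csv

# quick sanity check of the exported sheet (path follows the default above).
csv_path = "meta/OpenDV-YouTube.csv"
with open(csv_path, "r", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print("total videos listed:", len(rows))
# the "Mini / Full Set" column marks which videos belong to OpenDV-mini.
if rows and "Mini / Full Set" in rows[0]:
    mini_count = sum(1 for row in rows if row["Mini / Full Set"].strip() == "Mini")
    print("videos marked as Mini:", mini_count)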
To download the raw data from YouTube, you should first adjust the configurations in configs/download.json.
Note that the script supports multi-threaded downloading, so please set num_workers to a proper value according to your hardware and network conditions.
Also, the format key in the config file should strictly obey the format selection rules of the youtube-dl package. We do not recommend changing it unless you are familiar with the package.
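If you prefer to edit the config programmatically, a minimal sketch is given below. Only the num_workers, format and method keys mentioned in this section are assumed; any other keys in configs/download.json are left untouched.
import json

# load the download config, tune the worker count, and write it back.
cfg_path = "configs/download.json"
with open(cfg_path, "r") as f:
    cfg = json.load(f)

cfg["num_workers"] = 8  # set according to your CPU cores and network bandwidth

print("format selector:", cfg.get("format"))  # keep as-is unless you know the youtube-dl format rules
print("download method:", cfg.get("method"))  # e.g. youtube-dl or yt-dlp, see the note below

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=4)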
Now you can run the following command to download the raw video data.
python scripts/youtube_download.py >> download_output.txt
The download will take about
If you wish to use the mini subset, simply add the --mini option to your command, i.e. run the following command.
python scripts/youtube_download.py --mini >> download_output.txt
You may refer to download_exceptions.txt to check whether the download was successful. The file will be automatically generated by the script in the root of the opendv codebase.
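To get a quick summary of how many videos failed, you can count the records in the exception file. The sketch below assumes nothing about the record format beyond one entry per line.
import os

# count how many exception records were logged during the download.
exception_file = "download_exceptions.txt"
if os.path.exists(exception_file):
    with open(exception_file, "r") as f:
        records = [line for line in f if line.strip()]
    print("{} exception record(s) logged in {}".format(len(records), exception_file))
else:
    print("no exception file found.")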
If downloading with youtube-dl is not successful, you can change the method in the config from youtube-dl to yt-dlp.
When the download is finished, first set the configurations in configs/video2img.json to the values you expect. The script also supports multi-threaded processing, so you can set num_workers to a proper value according to your hardware.
Note that if you want to align with the annotations we provide, frame_rate should not be changed.
Then, you can run the following command to preprocess the raw video data.
python scripts/video2img.py >> vid2img_output.txt
The preprocessing will take about
If you wish to use the mini subset, simply add the --mini option to your command, i.e. run the following command.
python scripts/video2img.py --mini >> vid2img_output.txt
You may refer to vid2img_exceptions.txt to check the processing status.
The full annotation data, including commands and contexts of video clips, is available at OpenDV-YouTube-Language. The files are in json format, with a total size of about 14 GB.
The annotation data is aligned with the structure of the preprocessed data. You can use the following code to load the annotations for each split.
import json

# for train
full_annos = []
for split_id in range(10):
    split = json.load(open("10hz_YouTube_train_split{}.json".format(split_id), "r"))
    full_annos.extend(split)

# for val
val_annos = json.load(open("10hz_YouTube_val.json", "r"))
Annotations will be loaded into full_annos as a list, where each element contains the annotations for one video clip. All elements in the list are dictionaries with the following structure.
{
    "cmd": <int> -- the command of the ego vehicle in the video clip.
    "blip": <str> -- the context, i.e. the BLIP description of the center frame in the video clip.
    "folder": <str> -- the relative path from the processed OpenDV-YouTube dataset root to the image folder of the video clip.
    "first_frame": <str> -- the filename of the first frame in the clip. Note that this frame is included in the clip.
    "last_frame": <str> -- the filename of the last frame in the clip. Note that this frame is included in the clip.
}
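For example, to recover the ordered image paths of a single clip from its folder, first_frame and last_frame fields, a sketch like the following should work. The data root below is a placeholder for wherever you extracted the frames, and we assume frame filenames sort chronologically (which holds for zero-padded frame indices).
import os

DATA_ROOT = "path/to/processed/OpenDV-YouTube"  # placeholder: your processed dataset root

def clip_frame_paths(anno, data_root=DATA_ROOT):
    # "folder" is relative to the processed dataset root.
    folder = os.path.join(data_root, anno["folder"])
    frames = sorted(os.listdir(folder))
    start = frames.index(anno["first_frame"])
    end = frames.index(anno["last_frame"])
    # both the first and the last frame are included in the clip.
    return [os.path.join(folder, name) for name in frames[start:end + 1]]

# example: frame paths of the first training clip
# paths = clip_frame_paths(full_annos[0])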
The command, i.e. the cmd field, can be converted to natural language using the map_category_to_caption function. You may refer to cmd2caption.py for details.
The context, i.e. the blip field, is the description of the center frame in the video clip, generated by BLIP2.
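As a quick usage example, continuing from the loading snippet above, you can inspect the command distribution and a sample context without any extra dependencies.
from collections import Counter

# distribution of ego-vehicle commands over all annotated clips.
cmd_counts = Counter(anno["cmd"] for anno in full_annos)
print(cmd_counts.most_common())

# language annotations of a single clip.
sample = full_annos[0]
print("command id:", sample["cmd"])
print("BLIP context:", sample["blip"])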
If you find our work helpful, please cite the following paper.
@inproceedings{yang2024genad,
  title={Generalized Predictive Model for Autonomous Driving},
  author={Jiazhi Yang and Shenyuan Gao and Yihang Qiu and Li Chen and Tianyu Li and Bo Dai and Kashyap Chitta and Penghao Wu and Jia Zeng and Ping Luo and Jun Zhang and Andreas Geiger and Yu Qiao and Hongyang Li},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}