This repository contains the code and data for the paper Coarse or Fine? Recognising Action End States without Labels.
The paper was accepted at the Eleventh Workshop on Fine-Grained Visual Categorisation, hosted at CVPR 2024.
If you find our work useful, please cite our paper:
```bibtex
@inproceedings{moltisanti24coarse,
  author    = {Moltisanti, Davide and Bilen, Hakan and Sevilla-Lara, Laura and Keller, Frank},
  title     = {{Coarse or Fine? Recognising Action End States without Labels}},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024}
}
```
You can find our paper on arXiv.
- Davide Moltisanti (University of Bath, work done while at Edinburgh)
- Hakan Bilen (University of Edinburgh)
- Laura Sevilla-Lara (University of Edinburgh)
- Frank Keller (University of Edinburgh)
We provide a conda environment to install all the necessary libraries (see the `environment.yml` file).
Note: `lama-cleaner` needs a Rust compiler to build the `tokenizers` library. This build will fail if you use a very recent version of Python (3.8 is fine).
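A minimal sketch of the usual conda setup workflow (the environment name is defined in `environment.yml`; the placeholder below is an assumption, check the file for the actual name):

```bash
# Create the environment from the provided file
conda env create -f environment.yml

# Activate it (replace <env-name> with the name defined in environment.yml)
conda activate <env-name>
```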
You can download the dataset we augmented from VOST here. This is the dataset we generated as detailed in the paper and use for all experiments.
Note that images are scaled to 256x144 pixels to save space.
If you want full-resolution images, you can use the script `scripts/augment_vost.py` to generate new images from the original VOST dataset at the desired resolution:
- Download the VOST dataset from this link
- Adjust the paths in the script located at `scripts/augment_vost.py`
- Launch the script, adjusting the number of processes if needed (see the example invocation below). The script will spawn `n_proc` processes on the GPU to generate images in parallel.
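A sketch of what launching the script might look like; the flag name below is an assumption (paths and `n_proc` may instead be set inside the script), so check `scripts/augment_vost.py` for the actual options:

```bash
# Hypothetical invocation: the number of parallel processes (n_proc) is assumed
# to be configurable via a flag or inside the script itself.
python scripts/augment_vost.py --n_proc 4
```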
By default, the script will generate the augmented dataset using the same parameters as in our released dataset. However, as random sampling is involved during augmentation, your generated images may not be identical to our released ones even if you use the same parameters.
Get in touch via email.
We provide scripts in the `scripts/` folder to train and test our model and the main baselines.
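For example, a training run might look like the following (the script name and arguments here are hypothetical placeholders; see the `scripts/` folder for the actual entry points and options):

```bash
# Hypothetical example: replace the script name and arguments with the
# actual training script and options provided in the scripts/ folder.
python scripts/train_model.py --config <path-to-config>
```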
Don't hesitate to get in touch by opening a GitHub issue if you need help!