# Capreolus: Reranking robust04 with PARADE

This page contains instructions for running Capreolus' PARADE implementation on the robust04 ad-hoc retrieval benchmark.

## Setup

This section describes how to install Capreolus. Do not install Capreolus via pip, because we want a copy of the master branch that can be modified locally.

  1. Ensure Python 3.6+ and Java 11 are installed. See the installation guide for help.
  2. Install PyTorch 1.6.0. If possible, choose CUDA 10.1 to match our environment.
  3. Clone the Capreolus repository: `git clone https://github.com/capreolus-ir/capreolus`
  4. You should now have a capreolus folder that contains various files as well as another capreolus folder, which holds the actual capreolus Python package. This is a common layout for Python packages; the inner folder (i.e., capreolus/capreolus) corresponds to the Python package itself.
  5. Install dependencies: `pip install -r capreolus/requirements.txt` (a combined sketch of these steps appears after this list).
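Taken together, the setup steps amount to roughly the following shell session. This is a sketch rather than an exact recipe: the torch install line in particular is an assumption, so use the official PyTorch installation selector to get the correct command for your CUDA version.

```bash
# Sketch of the setup steps above (adjust the PyTorch line to your CUDA version).
pip install torch==1.6.0                     # ideally the CUDA 10.1 build from pytorch.org
git clone https://github.com/capreolus-ir/capreolus
pip install -r capreolus/requirements.txt    # install Capreolus' dependencies
```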

## Testing installation

  1. To run Capreolus, cd into the top-level capreolus directory and run `python -m capreolus.run`. This is equivalent to the `capreolus` command available when pip-installed. You should see a help message.
  2. Let's try one more command to ensure everything is set up correctly: `python -m capreolus.run rank.print_config`. This should print a description of the default ranking config. (Both smoke-test commands are collected in the sketch after this list.)
  3. Briefly read about configuring Capreolus. The main thing to note is that results will be stored in ~/.capreolus by default.
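For reference, the smoke test boils down to two commands run from the top-level repository directory; both should complete without errors. The comment about relocating the results directory is an assumption based on the configuration docs, so check those docs for the exact mechanism.

```bash
cd capreolus                                  # top-level repository directory
python -m capreolus.run                       # should print a help message
python -m capreolus.run rank.print_config     # should print the default ranking config
# Results land in ~/.capreolus by default; if that disk is small, consult the
# configuration docs for how to point the results/cache directories elsewhere.
```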

## Running PARADE

This requires a 48GB GPU or a TPU. It has been tested on NVIDIA Quadro RTX 8000s and Google Cloud TPUs.
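Before launching a run, it is worth confirming that a GPU with enough memory is actually visible. A quick check, assuming an NVIDIA GPU with the standard driver tooling installed:

```bash
# List visible GPUs with their total memory (expect roughly 48 GB for a full-size run).
nvidia-smi --query-gpu=name,memory.total --format=csv
```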

  1. Make sure you have an available GPU and are in the top-level capreolus directory.
  2. Train and evaluate PARADE on each of the five robust04 folds (splits s1-s5). The commands differ only in the fold name, so they can also be run in a loop (see the sketch after this list).
    With TensorFlow:
    python -m capreolus.run rerank.traineval with file=docs/reproduction/config_parade.txt fold=s1
    python -m capreolus.run rerank.traineval with file=docs/reproduction/config_parade.txt fold=s2
    python -m capreolus.run rerank.traineval with file=docs/reproduction/config_parade.txt fold=s3
    python -m capreolus.run rerank.traineval with file=docs/reproduction/config_parade.txt fold=s4
    python -m capreolus.run rerank.traineval with file=docs/reproduction/config_parade.txt fold=s5
    Or PyTorch:
    python -m capreolus.run rerank.traineval with file=docs/reproduction/config_paradept.txt fold=s1
    python -m capreolus.run rerank.traineval with file=docs/reproduction/config_paradept.txt fold=s2
    python -m capreolus.run rerank.traineval with file=docs/reproduction/config_paradept.txt fold=s3
    python -m capreolus.run rerank.traineval with file=docs/reproduction/config_paradept.txt fold=s4
    python -m capreolus.run rerank.traineval with file=docs/reproduction/config_paradept.txt fold=s5
  3. Each command takes a long time: approximately 36 hours on a Quadro RTX 8000 (much faster on a TPU). Per-fold metrics are displayed after each fold completes.
  4. When the final fold completes, cross-validated metrics are also displayed.
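Since the five commands differ only in the fold argument, a small loop saves some copy-pasting. A minimal sketch using the TensorFlow config; swap in config_paradept.txt for the PyTorch variant:

```bash
# Run PARADE on all five robust04 folds sequentially (TensorFlow config shown).
for fold in s1 s2 s3 s4 s5; do
    python -m capreolus.run rerank.traineval with \
        file=docs/reproduction/config_parade.txt fold=$fold
done
```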

## Expected results

Note that results will vary slightly with your environment.

| Environment          | mAP    | P@20   | NDCG@20 |
|----------------------|--------|--------|---------|
| PyTorch 1.6 (GPU)    | 0.3687 | 0.4851 | 0.5533  |
| PyTorch 1.7 (GPU)    | 0.3687 | 0.4851 | 0.5533  |
| PyTorch 1.8 (GPU)    | 0.3666 | 0.4783 | 0.5478  |
| TensorFlow 2.4 (TPU) | 0.3722 | 0.4783 | 0.5528  |
| TensorFlow 2.5 (TPU) | 0.3626 | 0.4739 | 0.5449  |