Update HC4 Regressions documentation (#1914)
* Fix HC4 Regression Link

* Update HC4 Regressions Documentation - Add corpus Download
ToluClassics authored Jun 18, 2022
1 parent f592832 commit dde86a9
Showing 6 changed files with 90 additions and 2 deletions.
17 changes: 16 additions & 1 deletion docs/regressions-hc4-v1.0-fa.md
@@ -1,6 +1,6 @@
# Anserini Regressions: HC4 (v1.0) — Persian

This page documents BM25 regression experiments for [HC4 (v1.0) — Persian](https://github.com/hltcoe/HC4).
This page documents BM25 regression experiments for [HC4 (v1.0) — Persian](https://arxiv.org/pdf/2201.09992.pdf).

The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/hc4-v1.0-fa.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/hc4-v1.0-fa.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
@@ -11,6 +11,19 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf
python src/main/python/run_regression.py --index --verify --search --regression hc4-v1.0-fa
```

## Corpus Download

The HC4 corpus can be downloaded following the instructions [here](https://github.com/hltcoe/HC4).

After download, verify that all and only specified documents have been downloaded by running the code [provided here](https://github.com/hltcoe/HC4#postprocessing-of-the-downloaded-documents).

With the corpus downloaded, unpack into `collections/` and run the following command to perform the remaining steps below:

```bash
python src/main/python/run_regression.py --index --verify --search --regression hc4-v1.0-fa \
--corpus-path collections/hc4-v1.0-fa
```

## Indexing

Typical indexing command:
@@ -62,3 +75,5 @@ With the above commands, you should be able to reproduce the following results:
|:-------------------------------------------------------------------------------------------------------------|-----------|
| [HC4 (Persian): dev-topic title](https://github.com/hltcoe/HC4) | 0.2919 |
| [HC4 (Persian): dev-topic description](https://github.com/hltcoe/HC4) | 0.3188 |

The above results reproduce the BM25 title queries run in [Table 7 of this paper](https://arxiv.org/pdf/2201.08471.pdf).
14 changes: 14 additions & 0 deletions docs/regressions-hc4-v1.0-ru.md
@@ -11,6 +11,20 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf
python src/main/python/run_regression.py --index --verify --search --regression hc4-v1.0-ru
```

## Corpus Download

The HC4 corpus can be downloaded following the instructions [here](https://github.com/hltcoe/HC4).

After download, verify that all and only specified documents have been downloaded by running the code [provided here](https://github.com/hltcoe/HC4#postprocessing-of-the-downloaded-documents).

With the corpus downloaded, unpack into `collections/` and run the following command to perform the remaining steps below:

```bash
python src/main/python/run_regression.py --index --verify --search --regression hc4-v1.0-ru \
--corpus-path collections/hc4-v1.0-ru
```


## Indexing

Typical indexing command:
15 changes: 15 additions & 0 deletions docs/regressions-hc4-v1.0-zh.md
@@ -11,6 +11,19 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf
python src/main/python/run_regression.py --index --verify --search --regression hc4-v1.0-zh
```

## Corpus Download

The HC4 corpus can be downloaded following the instructions [here](https://github.com/hltcoe/HC4).

After download, verify that all and only specified documents have been downloaded by running the code [provided here](https://github.com/hltcoe/HC4#postprocessing-of-the-downloaded-documents).

With the corpus downloaded, unpack into `collections/` and run the following command to perform the remaining steps below:

```bash
python src/main/python/run_regression.py --index --verify --search --regression hc4-v1.0-zh \
--corpus-path collections/hc4-v1.0-zh
```

## Indexing

Typical indexing command:
@@ -62,3 +75,5 @@ With the above commands, you should be able to reproduce the following results:
|:-------------------------------------------------------------------------------------------------------------|-----------|
| [HC4 (Chinese): dev-topic title](https://github.com/hltcoe/HC4) | 0.2914 |
| [HC4 (Chinese): dev-topic description](https://github.com/hltcoe/HC4) | 0.1983 |

The above results reproduce the BM25 title queries run in [Table 7 of this paper](https://arxiv.org/pdf/2201.08471.pdf).
17 changes: 16 additions & 1 deletion src/main/resources/docgen/templates/hc4-v1.0-fa.template
@@ -1,6 +1,6 @@
# Anserini Regressions: HC4 (v1.0) — Persian

This page documents BM25 regression experiments for [HC4 (v1.0) — Persian](https://github.com/hltcoe/HC4).
This page documents BM25 regression experiments for [HC4 (v1.0) — Persian](https://arxiv.org/pdf/2201.09992.pdf).

The exact configurations for these regressions are stored in [this YAML file](${yaml}).
Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
@@ -11,6 +11,19 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf
python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
```

## Corpus Download

The HC4 corpus can be downloaded following the instructions [here](https://github.com/hltcoe/HC4).

After download, verify that all and only specified documents have been downloaded by running the code [provided here](https://github.com/hltcoe/HC4#postprocessing-of-the-downloaded-documents).

With the corpus downloaded, unpack into `collections/` and run the following command to perform the remaining steps below:

```bash
python src/main/python/run_regression.py --index --verify --search --regression ${test_name} \
--corpus-path collections/${corpus}
```

## Indexing

Typical indexing command:
@@ -41,3 +54,5 @@ ${eval_cmds}
With the above commands, you should be able to reproduce the following results:

${effectiveness}

The above results reproduce the BM25 title queries run in [Table 7 of this paper](https://arxiv.org/pdf/2201.08471.pdf).
14 changes: 14 additions & 0 deletions src/main/resources/docgen/templates/hc4-v1.0-ru.template
@@ -11,6 +11,20 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf
python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
```

## Corpus Download

The HC4 corpus can be downloaded following the instructions [here](https://github.com/hltcoe/HC4).

After download, verify that all and only specified documents have been downloaded by running the code [provided here](https://github.com/hltcoe/HC4#postprocessing-of-the-downloaded-documents).

With the corpus downloaded, unpack into `collections/` and run the following command to perform the remaining steps below:

```bash
python src/main/python/run_regression.py --index --verify --search --regression ${test_name} \
--corpus-path collections/${corpus}
```


## Indexing

Typical indexing command:
15 changes: 15 additions & 0 deletions src/main/resources/docgen/templates/hc4-v1.0-zh.template
@@ -11,6 +11,19 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf
python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
```

## Corpus Download

The HC4 corpus can be downloaded following the instructions [here](https://github.com/hltcoe/HC4).

After download, verify that all and only specified documents have been downloaded by running the code [provided here](https://github.com/hltcoe/HC4#postprocessing-of-the-downloaded-documents).

With the corpus downloaded, unpack into `collections/` and run the following command to perform the remaining steps below:

```bash
python src/main/python/run_regression.py --index --verify --search --regression ${test_name} \
--corpus-path collections/${corpus}
```

## Indexing

Typical indexing command:
@@ -41,3 +54,5 @@ ${eval_cmds}
With the above commands, you should be able to reproduce the following results:

${effectiveness}

The above results reproduce the BM25 title queries run in [Table 7 of this paper](https://arxiv.org/pdf/2201.08471.pdf).
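
Reviewer note: the templates in this commit write the new command with `${test_name}` and `${corpus}` placeholders, while the generated pages show the expanded form (for example `hc4-v1.0-fa`). The relationship between the two can be sketched roughly as below. This is only an illustration using Python's `string.Template`, which happens to share the `${name}` syntax; it is not Anserini's actual doc generator. The helper `render_command` is hypothetical, and the assumption that both placeholders expand to `hc4-v1.0-<lang>` is inferred from the generated pages shown above.

```python
from string import Template

# Command line taken from the hc4-v1.0-*.template files in this commit,
# joined onto one line for readability.
COMMAND_TEMPLATE = Template(
    "python src/main/python/run_regression.py --index --verify --search "
    "--regression ${test_name} --corpus-path collections/${corpus}"
)


def render_command(lang: str) -> str:
    """Fill in the placeholders for one HC4 language (hypothetical helper).

    Assumes, based on the generated pages in this commit, that both
    ${test_name} and ${corpus} expand to "hc4-v1.0-<lang>".
    """
    name = f"hc4-v1.0-{lang}"
    return COMMAND_TEMPLATE.substitute(test_name=name, corpus=name)


if __name__ == "__main__":
    # Print the expanded command for each language touched by the commit.
    for lang in ("fa", "ru", "zh"):
        print(render_command(lang))
```

Comparing the rendered strings against the commands in the generated `docs/regressions-hc4-v1.0-*.md` pages is a quick way to confirm that a template and its page are in sync.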
