Readme update
fradav committed Jul 10, 2019
1 parent b151549 commit 5a51cda
Showing 3 changed files with 94 additions and 93 deletions.
76 changes: 36 additions & 40 deletions README-ORIG.md
@@ -24,42 +24,54 @@ Libraries we use :

As a mention, we use our own implementation of LDA and PLS from [@friedman2001elements{81, 114}].

There is two sets of binaries, one for model choice [```ModelChoice```](#model-choice), another for parameter estimation [```EstimParam```](#parameter-estimation). Each set contains a Macos/Linux/Windows (x64 only) binary for each platform.
There is one set of binaries, which contains a macOS/Linux/Windows (x64 only) binary for each platform.
They are available within the "[Releases](https://github.com/fradav/abcranger/releases)" tab, under the "Assets" section (unfold it to see the list).

Those are pure command line binaries, and they are no prerequisites or library dependencies in order to run them. Just download them and launch them from your terminal software of choice. The usual caveats with command line executable apply there : if you're not proficient with the command line interface of your platform, please learn some basics or ask someone who might help you in those matters.
This is a pure command line binary, and there are no prerequisites or library dependencies in order to run it. Just download it and launch it from your terminal software of choice. The usual caveats with command line executables apply here : if you're not proficient with the command line interface of your platform, please learn some basics or ask someone who might help you in those matters.

As a note, we may add a graphical interface in the near future.

# Model Choice

## Usage
# Usage

```text
- ABC Random Forest/Model choice command line options
- ABC Random Forest - Model choice or parameter estimation command line options
Usage:
ModelChoice [OPTION...]
abcranger [OPTION...]
-h, --header arg Header file (default: headerRF.txt)
-r, --reftable arg Reftable file (default: reftableRF.bin)
-b, --statobs arg Statobs file (default: statobsRF.txt)
-o, --output arg Prefix output (default: modelchoice_out)
-o, --output arg Prefix output (modelchoice_out or estimparam_out by
default)
-n, --nref arg Number of samples, 0 means all (default: 0)
-m, --minnodesize arg Minimal node size. 0 means 1 for classification or
5 for regression (default: 0)
-t, --ntree arg Number of trees (default: 500)
-j, --threads arg Number of threads, 0 means all (default: 0)
-s, --seed arg Seed, 0 means generated (default: 0)
-s, --seed arg Seed, generated by default (default: 0)
-c, --noisecolumns arg Number of noise columns (default: 5)
-l, --lda Enable LDA (default: true)
--nolinear Disable LDA for model choice or PLS for parameter
estimation
--chosenscen arg Chosen scenario (mandatory for parameter
estimation)
--ntest arg number of testing samples (mandatory for parameter
estimation)
--parameter arg name of the parameter of interest (mandatory for
parameter estimation)
--help Print help
```

- If you provide `--chosenscen`, `--parameter` and `--ntest`, parameter estimation mode is selected.
- Otherwise, model choice mode is used by default.
- Linear additions are LDA for model choice and PLS for parameter estimation; the `--nolinear` option disables them in both cases.
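
The mode selection rules above can be sketched in a few lines of C++. This is only an illustration of the rules as stated in this README, not the actual `abcranger.cpp` logic: the `Mode` enum, the `CliFlags` struct and the three helper functions are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names throughout: this is not abcranger's actual source.
enum class Mode { ModelChoice, ParameterEstimation };

struct CliFlags {
    bool has_chosenscen = false;  // --chosenscen was given
    bool has_parameter = false;   // --parameter was given
    bool has_ntest = false;       // --ntest was given
    bool nolinear = false;        // --nolinear was given
};

// Parameter estimation only when all three mandatory options are present;
// anything else falls back to model choice.
Mode select_mode(const CliFlags& f) {
    return (f.has_chosenscen && f.has_parameter && f.has_ntest)
               ? Mode::ParameterEstimation
               : Mode::ModelChoice;
}

// Linear addition used in each mode, or "none" when --nolinear is set.
std::string linear_addition(const CliFlags& f) {
    if (f.nolinear) return "none";
    return select_mode(f) == Mode::ModelChoice ? "LDA" : "PLS";
}

// Default output prefix per mode, as listed in the help text.
std::string default_output_prefix(Mode mode) {
    switch (mode) {
        case Mode::ModelChoice: return "modelchoice_out";
        case Mode::ParameterEstimation: return "estimparam_out";
    }
    assert(false && "unreachable: unknown mode");
    return "";
}
```

Note that, following the rule as written, giving only one or two of the three options still selects model choice mode in this sketch; whether the real binary reports an error in that case is not stated here.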

# Model Choice

## Example

Example :

`ModelChoice -t 10000 -j 8`
`abcranger -t 10000 -j 8`

Header, reftable and statobs files should be in the current directory.

@@ -74,47 +86,31 @@ Four files are created :

# Parameter Estimation

Note : The Pls components are selected within 99% of the explained variance of the output.
As in for the $m$th component and for $N$ samples and $M$ features:
## A note about PLS heuristic

The PLS components are selected within _at least_ 99% of the maximum explained variance of the output.

$$Yvar^m = \frac{\sum_{i=1}^{N}{(\hat{y}^{m}_{i}-\bar{y})^2}}{\sum_{i=1}^{N}{(y_{i}-\hat{y})^2}}$$

where $\hat{y}^{m}$ is the $Y$ scored by the PLS for the $m$th component.
We take only the first $n_{comp}$ components as in :
We take only the first $n_{heur}$ components, stopping when :

$$n_{comp} = \underset{Yvar^m \leq{} 0.99*Yvar^M, }{\operatorname{argmax}}$$
$$\frac{Yvar^{k+1}+Yvar^{k}}{2} \geq 0.99(N-k)\left(Yvar^{k+1}-Yvar^ {k}\right)$$

## Usage
We can easily prove that $n_{heur}$ is greater than or equal to $n_{comp}$ :
$$n_{heur} \ge n_{comp} = \underset{Yvar^m \leq{} 0.99*Yvar^M, }{\operatorname{argmax}}$$
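
As a rough illustration of the two quantities above, here is a minimal C++ sketch that works on an already computed sequence of explained variances. It assumes `yvar[m - 1]` stores $Yvar^m$ for $m = 1..M$ and that $N$ is the number of samples; the function names `n_comp` and `heuristic_stops` are hypothetical and this is not the code used by abcranger. It only evaluates the definition of $n_{comp}$ and the stopping inequality, and it does not decide which index the heuristic finally keeps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// yvar[m - 1] holds Yvar^m for m = 1..M (hypothetical container layout).
// Returns the largest m such that Yvar^m <= threshold * Yvar^M,
// or 0 when no component satisfies the bound.
std::size_t n_comp(const std::vector<double>& yvar, double threshold = 0.99) {
    assert(!yvar.empty());
    const double limit = threshold * yvar.back();
    std::size_t best = 0;
    for (std::size_t m = 1; m <= yvar.size(); ++m) {
        if (yvar[m - 1] <= limit) best = m;
    }
    return best;
}

// Evaluates the stopping inequality for a given k:
// (Yvar^{k+1} + Yvar^k) / 2 >= threshold * (N - k) * (Yvar^{k+1} - Yvar^k)
bool heuristic_stops(double yvar_k, double yvar_k1, std::size_t n,
                     std::size_t k, double threshold = 0.99) {
    // Convert before subtracting so that k > n cannot wrap around.
    const double remaining = static_cast<double>(n) - static_cast<double>(k);
    return (yvar_k1 + yvar_k) / 2.0 >=
           threshold * remaining * (yvar_k1 - yvar_k);
}
```

For instance, with the made-up sequence `{0.5, 0.8, 0.9, 0.95, 0.96}` and $N = 100$, `n_comp` returns 4, and the stopping inequality is false for $k = 3$ and true for $k = 4$.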

```text
- ABC Random Forest/Model parameter estimation command line options
Usage:
EstimParam [OPTION...]
In practice, we find $n_{heur}$ close enough to $n_{comp}$.

-h, --header arg Header file (default: headerRF.txt)
-r, --reftable arg Reftable file (default: reftableRF.bin)
-b, --statobs arg Statobs file (default: statobsRF.txt)
-o, --output arg Prefix output (default: estimparam_out)
-n, --nref arg Number of samples, 0 means all (default: 0)
-m, --minnodesize arg Minimal node size. 0 means 1 for classification or
5 for regression (default: 0)
-t, --ntree arg Number of trees (default: 500)
-j, --threads arg Number of threads, 0 means all (default: 0)
-s, --seed arg Seed, 0 means generated (default: 0)
-c, --noisecolumns arg Number of noise columns (default: 5)
-p, --pls Enable PLS (default: true)
--chosenscen arg Chosen scenario (mandatory)
--ntrain arg number of training samples (mandatory)
--ntest arg number of testing samples (mandatory)
--parameter arg name of the parameter of interest (mandatory)
--help Print help
```
## The meaning of the `ntest` parameter

Computing the whole OOB set for weights predictions (see [@raynal2016abc]) is very costly, memory and cpu-wise, so we advise computing them only for a subset of size `ntest`.
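
To make the idea of an OOB subset concrete, here is a small C++ sketch that draws `ntest` distinct indices out of `n_oob` candidates. How abcranger actually picks its subset is not described here, so the uniform shuffle, the function name `pick_oob_subset` and its parameters are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Draws min(ntest, n_oob) distinct indices in [0, n_oob), returned sorted.
// Hypothetical helper: uniform sampling is an assumption of this sketch.
std::vector<std::size_t> pick_oob_subset(std::size_t n_oob, std::size_t ntest,
                                         unsigned seed) {
    std::vector<std::size_t> idx(n_oob);
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, ..., n_oob - 1
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);
    idx.resize(std::min(ntest, n_oob));  // keep the first ntest shuffled indices
    assert(idx.size() <= n_oob);
    std::sort(idx.begin(), idx.end());
    return idx;
}
```

Only the selected samples would then go through the costly weight computation, which is why a small `ntest` (50 in the example below) keeps memory and cpu usage low.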

## Example

Example (working with the dataset in `test/data`) :

`EstimParam -t 1000 -j 8 --parameter ra --chosenscen 1 --ntrain 1000 --ntest 50`
`abcranger -t 1000 -j 8 --parameter ra --chosenscen 1 --ntest 50`

Header, reftable and statobs files should be in the current directory.

@@ -143,7 +139,7 @@ if pls enabled :

## C++ standalone

- [ ] Merge the two methodologies in a single executable with (almost) the same options
- [X] Merge the two methodologies in a single executable with (almost) the same options
- [ ] \(Optional) Possibly move to another options parser (CLI?)

## External interfaces
109 changes: 57 additions & 52 deletions README.md
@@ -1,10 +1,13 @@
- [Usage](#usage)
- [Model Choice](#model-choice)
- [Parameter Estimation](#parameter-estimation)
- [TODO](#todo)
- [References](#references)

<!-- pandoc -f markdown README-ORIG.md -t gfm -o README.md --bibliography=ref.bib -s --toc --toc-depth=1 -->

<!-- pandoc --atx-headers -f markdown README-ORIG.md -t gfm -o README.md --bibliography=ref.bib -s --toc --toc-depth=1 --webtex=https://latex.codecogs.com/png.latex? -->

[![Build
Status](https://travis-ci.com/fradav/abcranger.svg)](https://travis-ci.com/fradav/abcranger)

@@ -28,52 +31,63 @@ As a mention, we use our own implementation of LDA and PLS from
(Friedman, Hastie, and Tibshirani [2001](#ref-friedman2001elements),
1:81, 114).

There is two sets of binaries, one for model choice
[`ModelChoice`](#model-choice), another for parameter estimation
[`EstimParam`](#parameter-estimation). Each set contains a
Macos/Linux/Windows (x64 only) binary for each platform. There are
available within the
There is one set of binaries, which contains a macOS/Linux/Windows (x64
only) binary for each platform. They are available within the
“[Releases](https://github.com/fradav/abcranger/releases)” tab, under
the “Assets” section (unfold it to see the list).

Those are pure command line binaries, and they are no prerequisites or
library dependencies in order to run them. Just download them and launch
This is a pure command line binary, and there are no prerequisites or
library dependencies in order to run it. Just download it and launch
it from your terminal software of choice. The usual caveats with
command line executables apply here : if you’re not proficient with the
command line interface of your platform, please learn some basics or ask
someone who might help you in those matters.

As a note, we may add a graphical interface in the near future.

# Model Choice

## Usage
# Usage

``` text
- ABC Random Forest/Model choice command line options
- ABC Random Forest - Model choice or parameter estimation command line options
Usage:
ModelChoice [OPTION...]
abcranger [OPTION...]
-h, --header arg Header file (default: headerRF.txt)
-r, --reftable arg Reftable file (default: reftableRF.bin)
-b, --statobs arg Statobs file (default: statobsRF.txt)
-o, --output arg Prefix output (default: modelchoice_out)
-o, --output arg Prefix output (modelchoice_out or estimparam_out by
default)
-n, --nref arg Number of samples, 0 means all (default: 0)
-m, --minnodesize arg Minimal node size. 0 means 1 for classification or
5 for regression (default: 0)
-t, --ntree arg Number of trees (default: 500)
-j, --threads arg Number of threads, 0 means all (default: 0)
-s, --seed arg Seed, 0 means generated (default: 0)
-s, --seed arg Seed, generated by default (default: 0)
-c, --noisecolumns arg Number of noise columns (default: 5)
-l, --lda Enable LDA (default: true)
--nolinear Disable LDA for model choice or PLS for parameter
estimation
--chosenscen arg Chosen scenario (mandatory for parameter
estimation)
--ntest arg number of testing samples (mandatory for parameter
estimation)
--parameter arg name of the parameter of interest (mandatory for
parameter estimation)
--help Print help
```

- If you provide `--chosenscen`, `--parameter` and `--ntest`,
  parameter estimation mode is selected.
- Otherwise, model choice mode is used by default.
- Linear additions are LDA for model choice and PLS for parameter
  estimation; the `--nolinear` option disables them in both cases.

# Model Choice

## Example

Example :

`ModelChoice -t 10000 -j 8`
`abcranger -t 10000 -j 8`

Header, reftable and statobs files should be in the current directory.

@@ -90,11 +104,10 @@ Four files are created :

# Parameter Estimation

Note : The Pls components are selected within 99% of the explained
variance of the output. As in for the
![m](https://latex.codecogs.com/png.latex?m "m")th component and for
![N](https://latex.codecogs.com/png.latex?N "N") samples and
![M](https://latex.codecogs.com/png.latex?M "M") features:
## A note about PLS heuristic

The PLS components are selected within *at least* 99% of the maximum
explained variance of the output.


![Yvar^m =
@@ -106,46 +119,38 @@ where
"\\hat{y}^{m}") is the ![Y](https://latex.codecogs.com/png.latex?Y "Y")
scored by the pls for the ![m](https://latex.codecogs.com/png.latex?m
"m")th component. We take only the first
![n\_{comp}](https://latex.codecogs.com/png.latex?n_%7Bcomp%7D
"n_{comp}") components as in :
![n\_{heur}](https://latex.codecogs.com/png.latex?n_%7Bheur%7D
"n_{heur}") components, stopping when :


![n\_{comp} = \\underset{Yvar^m \\leq{} 0.99\*Yvar^M,
}{\\operatorname{argmax}}](https://latex.codecogs.com/png.latex?n_%7Bcomp%7D%20%3D%20%5Cunderset%7BYvar%5Em%20%5Cleq%7B%7D%200.99%2AYvar%5EM%2C%20%7D%7B%5Coperatorname%7Bargmax%7D%7D
"n_{comp} = \\underset{Yvar^m \\leq{} 0.99*Yvar^M, }{\\operatorname{argmax}}")
![\\frac{Yvar^{k+1}+Yvar^{k}}{2} \\geq 0.99(N-k)\\left(Yvar^{k+1}-Yvar^
{k}\\right)](https://latex.codecogs.com/png.latex?%5Cfrac%7BYvar%5E%7Bk%2B1%7D%2BYvar%5E%7Bk%7D%7D%7B2%7D%20%5Cgeq%200.99%28N-k%29%5Cleft%28Yvar%5E%7Bk%2B1%7D-Yvar%5E%20%7Bk%7D%5Cright%29
"\\frac{Yvar^{k+1}+Yvar^{k}}{2} \\geq 0.99(N-k)\\left(Yvar^{k+1}-Yvar^ {k}\\right)")

## Usage
We can easily prove that
![n\_{heur}](https://latex.codecogs.com/png.latex?n_%7Bheur%7D
"n_{heur}") is greater than or equal to
![n\_{comp}](https://latex.codecogs.com/png.latex?n_%7Bcomp%7D
"n_{comp}") :
![n\_{heur} \\ge n\_{comp} = \\underset{Yvar^m \\leq{} 0.99\*Yvar^M,
}{\\operatorname{argmax}}](https://latex.codecogs.com/png.latex?n_%7Bheur%7D%20%5Cge%20n_%7Bcomp%7D%20%3D%20%5Cunderset%7BYvar%5Em%20%5Cleq%7B%7D%200.99%2AYvar%5EM%2C%20%7D%7B%5Coperatorname%7Bargmax%7D%7D
"n_{heur} \\ge n_{comp} = \\underset{Yvar^m \\leq{} 0.99*Yvar^M, }{\\operatorname{argmax}}")

``` text
- ABC Random Forest/Model parameter estimation command line options
Usage:
EstimParam [OPTION...]
In practice, we find
![n\_{heur}](https://latex.codecogs.com/png.latex?n_%7Bheur%7D
"n_{heur}") close enough to
![n\_{comp}](https://latex.codecogs.com/png.latex?n_%7Bcomp%7D
"n_{comp}").

-h, --header arg Header file (default: headerRF.txt)
-r, --reftable arg Reftable file (default: reftableRF.bin)
-b, --statobs arg Statobs file (default: statobsRF.txt)
-o, --output arg Prefix output (default: estimparam_out)
-n, --nref arg Number of samples, 0 means all (default: 0)
-m, --minnodesize arg Minimal node size. 0 means 1 for classification or
5 for regression (default: 0)
-t, --ntree arg Number of trees (default: 500)
-j, --threads arg Number of threads, 0 means all (default: 0)
-s, --seed arg Seed, 0 means generated (default: 0)
-c, --noisecolumns arg Number of noise columns (default: 5)
-p, --pls Enable PLS (default: true)
--chosenscen arg Chosen scenario (mandatory)
--ntrain arg number of training samples (mandatory)
--ntest arg number of testing samples (mandatory)
--parameter arg name of the parameter of interest (mandatory)
--help Print help
```
## The meaning of the `ntest` parameter

Computing the whole OOB set for weights predictions (see Raynal et al.
[2016](#ref-raynal2016abc)) is very costly, memory and cpu-wise, so we
advise computing them only for a subset of size `ntest`.

## Example

Example (working with the dataset in `test/data`) :

`EstimParam -t 1000 -j 8 --parameter ra --chosenscen 1 --ntrain 1000
--ntest 50`
`abcranger -t 1000 -j 8 --parameter ra --chosenscen 1 --ntest 50`

Header, reftable and statobs files should be in the current directory.

@@ -180,7 +185,7 @@ if pls enabled :

## C++ standalone

- [ ] Merge the two methodologies in a single executable with
- [x] Merge the two methodologies in a single executable with
(almost) the same options
- [ ] (Optional) Possibly move to another options parser (CLI?)

2 changes: 1 addition & 1 deletion abcranger.cpp
@@ -29,7 +29,7 @@ int main(int argc, char* argv[]) {
("c,noisecolumns","Number of noise columns",cxxopts::value<size_t>()->default_value("5"))
("nolinear","Disable LDA for model choice or PLS for parameter estimation")
("chosenscen","Chosen scenario (mandatory for parameter estimation)", cxxopts::value<size_t>())
("ntest","number of testing samples (mandatory for parameter estimation)",cxxopts::value<size_t>())
("ntest","number of oob testing samples (mandatory for parameter estimation)",cxxopts::value<size_t>())
("parameter","name of the parameter of interest (mandatory for parameter estimation)",cxxopts::value<std::string>())
("help", "Print help")
;
