Commit 1c18754

committed
Squash commits for FCS version
1 parent 3f3f179 commit 1c18754

44 files changed: +2858 -1022 lines changed

.gitignore

+6 -5
@@ -2,11 +2,12 @@ __pycache__
 .idea/
 .vscode/
 .DS_STORE
-playground.ipynb
-playground.py
-^data/
-out/
+playground*.ipynb
+playground*.py
+/data/
+/out/
 logs/
 tb_logs/
 get_baostock_data.ipynb
-get_baostock_data.py
+get_baostock_data.py
+/utils/

README.md

+18 -17
@@ -6,13 +6,20 @@

 Automatic formulaic alpha generation with reinforcement learning.

-Paper *Generating Synergistic Formulaic Alpha Collections via Reinforcement Learning* accepted by [KDD 2023](https://kdd.org/kdd2023/), Applied Data Science (ADS) track.
+This repository contains the code for our paper *Generating Synergistic Formulaic Alpha Collections via Reinforcement Learning* accepted by [KDD 2023](https://kdd.org/kdd2023/), Applied Data Science (ADS) track, publically available on [ACM DL](https://dl.acm.org/doi/10.1145/3580305.3599831). Some extensions upon this work are also included in this repo.

-Paper available on [ACM DL](https://dl.acm.org/doi/10.1145/3580305.3599831) or [arXiv](https://arxiv.org/abs/2306.12964).
+## Repository Structure
+
+- `/alphagen` contains the basic data structures and the essential modules for starting an alpha mining pipeline;
+- `/alphagen_qlib` contains the qlib-specific APIs for data preparation;
+- `/alphagen_generic` contains data structures and utils designed for our baselines, which basically follow [gplearn](https://github.com/trevorstephens/gplearn) APIs, but with modifications for quant pipeline;
+- `/alphagen_llm` contains LLM client abstractions and a set of prompts useful for LLM-based alpha generation, and also provides some LLM-based automatic iterative alpha-generation routines.
+- `/gplearn` and `/dso` contains modified versions of our baselines;
+- `/scripts` contains several scripts for running the experiments.

-## How to reproduce?
+## Result Reproduction

-Note that you can either use our builtin alpha calculation pipeline(see Choice 1), or implement an adapter to your own pipeline(see Choice 2).
+Note that you can either use our builtin alpha calculation pipeline (see Choice 1), or implement an adapter to your own pipeline (see Choice 2).

 ### Choice 1: Stock data preparation

@@ -80,13 +87,14 @@ These parameters will define a RL run:
 - save_path (Path for checkpoints)
 - tensorboard_log (Path for TensorBoard)

-### Run!
+### Run the experiments

-```shell
-python train_maskable_ppo.py --seed=SEED --pool=POOL_CAPACITY --code=INSTRUMENTS --step=NUM_STEPS
-```
+Please run the individual scripts at the root directory of this project as modules, i.e. `python -m scripts.NAME ARGS...`.
+Use `python -m scripts.NAME -h` for information on the arguments.

-Where `SEED` is random seed, e.g., `1` or `1,2`, `POOL_CAPACITY` is the size of combination model and, `NUM_STEPS` is the limit of RL steps.
+- `scripts/rl.py`: Main experiments of AlphaGen/HARLA
+- `scripts/llm_only.py`: Alpha generator based solely on iterative interactions with an LLM.
+- `scripts/llm_test_validity.py`: Tests on how the system prompt affects the valid alpha rate of an LLM.

 ### After running

@@ -105,13 +113,6 @@ Where `SEED` is random seed, e.g., `1` or `1,2`, `POOL_CAPACITY` is the size of

 [DSO](https://github.com/brendenpetersen/deep-symbolic-optimization) is a mature deep learning framework for symbolic optimization tasks. We maintained a minimal version of DSO to make it compatiable with our task. The corresponding experiment scipt is [dso.py](dso.py)

-## Repository Structure
-
-- `/alphagen` contains the basic data structures and the essential modules for starting an alpha mining pipeline;
-- `/alphagen_qlib` contains the qlib-specific APIs for data preparation;
-- `/alphagen_generic` contains data structures and utils designed for our baselines, which basically follow [gplearn](https://github.com/trevorstephens/gplearn) APIs, but with modifications for quant pipeline;
-- `/gplearn` and `/dso` contains modified versions of our baselines.
-
 ## Trading (Experimental)

 We implemented some trading strategies based on Qlib. See [backtest.py](backtest.py) and [trade_decision.py](trade_decision.py) for demos.
@@ -147,4 +148,4 @@ Thanks to the following contributors:

 Thanks to the following in-depth research on our project:

-- *因子选股系列之九十五:DFQ强化学习因子组合挖掘系统*
+- *因子选股系列之九十五: DFQ强化学习因子组合挖掘系统*

alphagen/config.py

+4 -3
@@ -1,10 +1,11 @@
+from typing import Type
 from alphagen.data.expression import *


-MAX_EXPR_LENGTH = 20
+MAX_EXPR_LENGTH = 15
 MAX_EPISODE_LENGTH = 256

-OPERATORS = [
+OPERATORS: List[Type[Operator]] = [
     # Unary
     Abs, # Sign,
     Log,
@@ -19,7 +20,7 @@
     Cov, Corr
 ]

-DELTA_TIMES = [10, 20, 30, 40, 50]
+DELTA_TIMES = [1, 5, 10, 20, 40]

 CONSTANTS = [-30., -10., -5., -2., -1., -0.5, -0.01, 0.01, 0.5, 1., 2., 5., 10., 30.]

alphagen/data/calculator.py

+95 -6
@@ -1,7 +1,10 @@
 from abc import ABCMeta, abstractmethod
-from typing import List, Tuple
+from typing import Tuple, Optional, Sequence
+from torch import Tensor
+import torch

 from alphagen.data.expression import Expression
+from alphagen.utils.correlation import batch_pearsonr, batch_spearmanr


 class AlphaCalculator(metaclass=ABCMeta):
@@ -13,25 +16,111 @@ def calc_single_IC_ret(self, expr: Expression) -> float:
     def calc_single_rIC_ret(self, expr: Expression) -> float:
         'Calculate Rank IC between a single alpha and a predefined target.'

-    @abstractmethod
     def calc_single_all_ret(self, expr: Expression) -> Tuple[float, float]:
-        'Calculate both IC and Rank IC between a single alpha and a predefined target.'
+        return self.calc_single_IC_ret(expr), self.calc_single_rIC_ret(expr)

     @abstractmethod
     def calc_mutual_IC(self, expr1: Expression, expr2: Expression) -> float:
         'Calculate IC between two alphas.'

     @abstractmethod
-    def calc_pool_IC_ret(self, exprs: List[Expression], weights: List[float]) -> float:
+    def calc_pool_IC_ret(self, exprs: Sequence[Expression], weights: Sequence[float]) -> float:
         'First combine the alphas linearly,'
         'then Calculate IC between the linear combination and a predefined target.'

     @abstractmethod
-    def calc_pool_rIC_ret(self, exprs: List[Expression], weights: List[float]) -> float:
+    def calc_pool_rIC_ret(self, exprs: Sequence[Expression], weights: Sequence[float]) -> float:
         'First combine the alphas linearly,'
         'then Calculate Rank IC between the linear combination and a predefined target.'

     @abstractmethod
-    def calc_pool_all_ret(self, exprs: List[Expression], weights: List[float]) -> Tuple[float, float]:
+    def calc_pool_all_ret(self, exprs: Sequence[Expression], weights: Sequence[float]) -> Tuple[float, float]:
         'First combine the alphas linearly,'
         'then Calculate both IC and Rank IC between the linear combination and a predefined target.'
+
+
+class TensorAlphaCalculator(AlphaCalculator):
+    def __init__(self, target: Optional[Tensor]) -> None:
+        self._target = target
+
+    @property
+    @abstractmethod
+    def n_days(self) -> int: ...
+
+    @property
+    def target(self) -> Tensor:
+        if self._target is None:
+            raise ValueError("A target must be set before calculating non-mutual IC.")
+        return self._target
+
+    @abstractmethod
+    def evaluate_alpha(self, expr: Expression) -> Tensor:
+        'Evaluate an alpha into a `Tensor` of shape (days, stocks).'
+
+    def make_ensemble_alpha(self, exprs: Sequence[Expression], weights: Sequence[float]) -> Tensor:
+        n = len(exprs)
+        factors = [self.evaluate_alpha(exprs[i]) * weights[i] for i in range(n)]
+        return torch.sum(torch.stack(factors, dim=0), dim=0)
+
+    def _calc_IC(self, value1: Tensor, value2: Tensor) -> float:
+        return batch_pearsonr(value1, value2).mean().item()
+
+    def _calc_rIC(self, value1: Tensor, value2: Tensor) -> float:
+        return batch_spearmanr(value1, value2).mean().item()
+
+    def _IR_from_batch(self, batch: Tensor) -> float:
+        mean, std = batch.mean(), batch.std()
+        return (mean / std).item()
+
+    def _calc_ICIR(self, value1: Tensor, value2: Tensor) -> float:
+        return self._IR_from_batch(batch_pearsonr(value1, value2))
+
+    def _calc_rICIR(self, value1: Tensor, value2: Tensor) -> float:
+        return self._IR_from_batch(batch_spearmanr(value1, value2))
+
+    def calc_single_IC_ret(self, expr: Expression) -> float:
+        return self._calc_IC(self.evaluate_alpha(expr), self.target)
+
+    def calc_single_IC_ret_daily(self, expr: Expression) -> Tensor:
+        return batch_pearsonr(self.evaluate_alpha(expr), self.target)
+
+    def calc_single_rIC_ret(self, expr: Expression) -> float:
+        return self._calc_rIC(self.evaluate_alpha(expr), self.target)
+
+    def calc_single_all_ret(self, expr: Expression) -> Tuple[float, float]:
+        value = self.evaluate_alpha(expr)
+        target = self.target
+        return self._calc_IC(value, target), self._calc_rIC(value, target)
+
+    def calc_mutual_IC(self, expr1: Expression, expr2: Expression) -> float:
+        return self._calc_IC(self.evaluate_alpha(expr1), self.evaluate_alpha(expr2))
+
+    def calc_mutual_IC_daily(self, expr1: Expression, expr2: Expression) -> Tensor:
+        return batch_pearsonr(self.evaluate_alpha(expr1), self.evaluate_alpha(expr2))
+
+    def calc_pool_IC_ret(self, exprs: Sequence[Expression], weights: Sequence[float]) -> float:
+        with torch.no_grad():
+            value = self.make_ensemble_alpha(exprs, weights)
+            return self._calc_IC(value, self.target)
+
+    def calc_pool_rIC_ret(self, exprs: Sequence[Expression], weights: Sequence[float]) -> float:
+        with torch.no_grad():
+            value = self.make_ensemble_alpha(exprs, weights)
+            return self._calc_rIC(value, self.target)
+
+    def calc_pool_all_ret(self, exprs: Sequence[Expression], weights: Sequence[float]) -> Tuple[float, float]:
+        with torch.no_grad():
+            value = self.make_ensemble_alpha(exprs, weights)
+            target = self.target
+            return self._calc_IC(value, target), self._calc_rIC(value, target)
+
+    def calc_pool_all_ret_with_ir(self, exprs: Sequence[Expression], weights: Sequence[float]) -> Tuple[float, float, float, float]:
+        "Returns IC, ICIR, Rank IC, Rank ICIR"
+        with torch.no_grad():
+            value = self.make_ensemble_alpha(exprs, weights)
+            target = self.target
+            ics = batch_pearsonr(value, target)
+            rics = batch_spearmanr(value, target)
+            ic_mean, ic_std = ics.mean().item(), ics.std().item()
+            ric_mean, ric_std = rics.mean().item(), rics.std().item()
+            return ic_mean, ic_mean / ic_std, ric_mean, ric_mean / ric_std
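
Reviewer note: the pool metrics added in `TensorAlphaCalculator` can be hard to follow from the diff alone, so here is a minimal pure-Python sketch of the same arithmetic. It is not the repository's implementation. It assumes that `batch_pearsonr` yields one Pearson correlation per day, computed across stocks, and it ignores any NaN handling that the real helper may perform. The names `pearson`, `ensemble`, `daily_ic`, and `ic_and_icir` are hypothetical and exist only for this illustration.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # rows are days, columns are stocks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ensemble(alphas: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of alpha matrices, mirroring make_ensemble_alpha."""
    n_days, n_stocks = len(alphas[0]), len(alphas[0][0])
    return [
        [sum(w * a[d][s] for a, w in zip(alphas, weights)) for s in range(n_stocks)]
        for d in range(n_days)
    ]


def daily_ic(value: Matrix, target: Matrix) -> List[float]:
    """One cross-sectional correlation per day (assumed batch_pearsonr behavior)."""
    return [pearson(v, t) for v, t in zip(value, target)]


def ic_and_icir(value: Matrix, target: Matrix) -> Tuple[float, float]:
    """Mean daily IC, and mean divided by the sample standard deviation (ICIR)."""
    ics = daily_ic(value, target)
    ic_mean = mean(ics)
    return ic_mean, ic_mean / stdev(ics)


# Toy data: 3 days, 3 stocks (made up for illustration only).
alpha_a = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [1.0, 3.0, 2.0]]
alpha_b = [[2.0, 4.0, 6.0], [1.0, 0.0, 2.0], [0.0, 1.0, 5.0]]
target = [[0.1, 0.2, 0.3], [0.3, 0.2, 0.1], [0.2, 0.1, 0.3]]

pool = ensemble([alpha_a, alpha_b], [0.5, 0.5])
print(ic_and_icir(pool, target))
```

`statistics.stdev` uses the sample (n - 1) standard deviation, which matches the default of `Tensor.std()` used in the diff. The ICIR is undefined when every daily IC is identical, since the standard deviation is then zero; the sketch does not guard against that case.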

alphagen/data/exception.py

+2
@@ -0,0 +1,2 @@
+class InvalidExpressionException(ValueError):
+    pass
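
Reviewer note: because the new exception subclasses `ValueError`, callers that already catch `ValueError` keep working, while new code can catch the narrower type. The sketch below illustrates this under assumptions: the class is redefined locally so the snippet is self-contained, and `parse_delta_time` and `ALLOWED_DELTA_TIMES` are hypothetical names that do not come from the commit (the allowed values simply reuse the new `DELTA_TIMES` list from `alphagen/config.py`).

```python
class InvalidExpressionException(ValueError):
    """Local copy of the class added in alphagen/data/exception.py."""


# Hypothetical: reuses the DELTA_TIMES values from the config diff.
ALLOWED_DELTA_TIMES = (1, 5, 10, 20, 40)


def parse_delta_time(token: str) -> int:
    """Hypothetical helper: turn a token into an allowed rolling-window length."""
    try:
        value = int(token)
    except ValueError as err:
        raise InvalidExpressionException(f"not an integer: {token!r}") from err
    if value not in ALLOWED_DELTA_TIMES:
        raise InvalidExpressionException(f"unsupported window: {value}")
    return value


# A broad handler written before this commit would still catch the new type.
try:
    parse_delta_time("7")
except ValueError as err:
    print(type(err).__name__, err)
```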
