The anomaly detection problem can be modeled as a Markov decision process (MDP) in RL.
An MDP is represented by the tuple $(S, A, P_a, R_a, \gamma)$, where
- $S$ is the set of environment states (at time $t$ the system is in $s_t \in S$)
- $A$ is the set of actions (at time $t$ the agent chooses the action $a_t \in A$)
- $P_a$ is the probability that taking action $a$ in state $s_t$ leads to state $s_{t+1}$: $P_a = P(s_{t+1} \mid S = s_t, A = a_t)$
- $R_a$ is the immediate reward the agent receives for taking action $a$
- $\gamma \in [0,1]$ is the discount factor
The agent tries to learn a policy $\pi(a \mid s)$ that maximizes the expected discounted cumulative reward.
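Written out (standard textbook form, not taken from the slides), the objective is

$$
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} R_{a_t} \right]
$$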
Time series anomaly detection can be framed as an MDP because the normal/abnormal decision at the current time step changes the environment (depending on whether it triggers an anomaly alert or not), and the next decision is in turn influenced by this changed environment.
- State: a single state is a time series with $n$ time steps $[s_i, \dots, s_{i+n}]$ with an associated label 0/1 that represents normal/abnormal behaviour. It may also be possible to construct a second part of the state, $s_{actions} = [a_i, \dots, a_{i+n}]$, which collects the sequence of previous actions.
  - In our case, the idea is, for instance, to aggregate the dataset into 1-minute timestamps and then to create time series of 1 hour duration (a sketch of the windowing and the reward function is given after this list).
  - How to define the label? Either treat time series from Sunday [9:00-16:00] as anomalous, or just check whether the last timestamp falls in the given range.
  - Or we can assign a binary label to each timestamp of the time series.
- Action: the action space is $A = \{0, 1\}$, where 0 is normal and 1 is anomaly.
- Reward:

$$
R(s,a) = \begin{cases} A & \text{if action is TP} \\ -B & \text{if action is FP} \\ -C & \text{if action is FN} \\ D & \text{if action is TN} \end{cases}
$$
Points are taken from slide 17 (ML4MobCom_04c).
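As a rough illustration of the state construction and the reward above, here is a minimal Python sketch. It assumes pandas, a raw DataFrame `df` with a DatetimeIndex and a `value` column, 1-minute aggregation into 60-step (1-hour) windows, the "Sunday 9:00-16:00 is anomalous" labelling rule, and placeholder values for the reward constants A, B, C, D (the real values are on slide 17). The names `make_states` and `reward` are made up for this sketch.

```python
import numpy as np
import pandas as pd

# Assumed reward constants (placeholders; the actual values come from slide 17).
A, B, C, D = 5.0, 1.0, 5.0, 1.0

def make_states(df: pd.DataFrame, window: int = 60):
    """Aggregate raw data into 1-minute bins and cut 1-hour windows (n = 60 steps).

    `df` is assumed to have a DatetimeIndex and a 'value' column; the label is 1
    if the window's last timestamp falls on Sunday between 9:00 and 16:00.
    """
    per_minute = df["value"].resample("1min").mean().ffill()
    states = []
    for i in range(len(per_minute) - window + 1):
        win = per_minute.iloc[i : i + window]
        last = win.index[-1]
        label = int(last.dayofweek == 6 and 9 <= last.hour < 16)   # Sunday 9:00-16:00
        states.append((win.to_numpy(dtype=np.float32), label))
    return states

def reward(action: int, label: int) -> float:
    """Reward scheme from the slide: +A for TP, -B for FP, -C for FN, +D for TN."""
    if action == 1 and label == 1:
        return A       # true positive
    if action == 1 and label == 0:
        return -B      # false positive
    if action == 0 and label == 1:
        return -C      # false negative
    return D           # true negative
```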
- Preprocess and feed the state $s$ to our DQN, which will return the Q-values of all possible actions in that state.
  - The state $s$ is a time series.
  - The DQN is an LSTM network with a binary output $\to$ it returns Q-values for the 2 possible actions (or a binary value for each timestamp?); see the sketch below.
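A minimal sketch of such a Q-network, assuming PyTorch; the single LSTM layer, the hidden size, and the use of only the last hidden state are arbitrary choices for illustration, not taken from the slides.

```python
import torch
import torch.nn as nn

class LSTMQNetwork(nn.Module):
    """LSTM that maps a time-series state to Q-values for the 2 actions {0, 1}."""

    def __init__(self, n_features: int = 1, hidden_size: int = 64, n_actions: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_actions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_timesteps, n_features) -> Q-values: (batch, n_actions)
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])
```

Calling it on a batch of shape `(batch, 60, 1)` returns a `(batch, 2)` tensor of Q-values for the two actions.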
- Select an action using the epsilon-greedy policy.
  - Actions are $\{0, 1\}$: with probability $1-\epsilon$ choose the action with the maximum Q-value from the DQN, otherwise choose a random action (see the sketch below).
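A possible epsilon-greedy selection, reusing the hypothetical `LSTMQNetwork` from the previous sketch; the reshape assumes a univariate window.

```python
import random
import torch

def select_action(q_net, state, epsilon: float) -> int:
    """Epsilon-greedy over {0, 1}: explore with probability epsilon, otherwise take argmax Q."""
    if random.random() < epsilon:
        return random.randint(0, 1)                          # random action: 0 normal, 1 anomaly
    with torch.no_grad():
        x = torch.as_tensor(state, dtype=torch.float32).view(1, -1, 1)  # (1, n_steps, 1)
        return int(q_net(x).argmax(dim=1).item())            # greedy action from the DQN
```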
- Perform action $a$ in $s$ and move to $s'$, receiving the reward. Store the transition $\langle s, a, r, s' \rangle$ in the replay buffer (a simple buffer sketch is given below).
  - How do I select the next state $s'$ given $a$, since our states are time series?
  - Do I need to calculate a sort of similarity between them?
  - If the action is 1, move to a similar time series, otherwise to the most different one? Do we need to create pseudo-labels to decide where to move?
  - Need to populate the replay buffer before starting the RL algorithm (maybe using other methods such as luminol/prophet)?
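A simple replay buffer sketch (standard DQN machinery, nothing project-specific). It also stores a `done` flag, which is not in the notes but is usually needed for the TD target; the open question of how to pick $s'$ is not answered here, the buffer just stores whatever next state the surrounding loop provides.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO buffer of <s, a, r, s', done> transitions."""

    def __init__(self, capacity: int = 10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```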
- Sample some random batches of transitions from the replay buffer and calculate the loss (see the sketch below).
  - The target DQN is a copy of the DQN, but it is updated less frequently.
  - Need to pre-populate the replay buffer (otherwise I can't sample batches in the first episodes)?
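A sketch of the batch sampling and TD loss, reusing the hypothetical `LSTMQNetwork` and `ReplayBuffer` above; the batch size, gamma, and the Smooth-L1 loss are arbitrary choices.

```python
import numpy as np
import torch
import torch.nn.functional as F

def compute_loss(q_net, target_net, buffer, batch_size: int = 32, gamma: float = 0.99):
    """Standard DQN TD loss: Q(s, a) vs. r + gamma * max_a' Q_target(s', a')."""
    states, actions, rewards, next_states, dones = buffer.sample(batch_size)

    s  = torch.as_tensor(np.stack(states), dtype=torch.float32).unsqueeze(-1)    # (B, n, 1)
    s2 = torch.as_tensor(np.stack(next_states), dtype=torch.float32).unsqueeze(-1)
    a  = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)                # (B, 1)
    r  = torch.as_tensor(rewards, dtype=torch.float32)
    d  = torch.as_tensor(dones, dtype=torch.float32)

    q_sa = q_net(s).gather(1, a).squeeze(1)          # Q-values of the actions actually taken
    with torch.no_grad():                            # the target DQN is held fixed here
        q_next = target_net(s2).max(dim=1).values
    target = r + gamma * (1.0 - d) * q_next
    return F.smooth_l1_loss(q_sa, target)
```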
- Perform gradient descent with respect to our actual network parameters.
  - Update the DQN (see the snippet below).
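The update itself is one ordinary gradient step; the optimizer (Adam) and learning rate are assumptions, and `q_net`, `target_net`, `buffer` are the objects from the sketches above.

```python
import torch

optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)   # created once, outside the loop

loss = compute_loss(q_net, target_net, buffer)
optimizer.zero_grad()
loss.backward()      # backpropagate the TD error through the online DQN only
optimizer.step()     # update the DQN parameters
```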
- After every N iterations, copy our actual network weights to the target network weights.
  - Update the target DQN by copying the DQN state every N iterations (see the snippet below).
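The periodic hard update of the target network is a one-liner; `N` and the iteration counter `step` are placeholders from the surrounding training loop (a soft/Polyak update would be an alternative).

```python
if step % N == 0:
    target_net.load_state_dict(q_net.state_dict())   # hard copy: target DQN <- online DQN
```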
- Repeat these steps for M episodes (an end-to-end sketch follows below).
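Putting the steps together, a possible (heavily simplified) training loop could look like the sketch below. It reuses all the hypothetical pieces defined above, assumes the next state $s'$ is simply the next window in chronological order, and uses placeholder hyperparameters and a naive epsilon decay; none of this is prescribed by the slides.

```python
M, N, WARMUP, BATCH = 50, 500, 1_000, 32     # episodes, target-sync period, warm-up size, batch size

q_net, target_net = LSTMQNetwork(), LSTMQNetwork()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = ReplayBuffer()
states = make_states(df)                      # df: the raw measurement DataFrame (see earlier sketch)

epsilon, step = 1.0, 0
for episode in range(M):
    for i in range(len(states) - 1):
        (s, label), (s_next, _) = states[i], states[i + 1]   # assumption: s' = next window
        a = select_action(q_net, s, epsilon)
        r = reward(a, label)
        buffer.push(s, a, r, s_next, i == len(states) - 2)

        if len(buffer) >= max(WARMUP, BATCH):                # only train once the buffer is populated
            loss = compute_loss(q_net, target_net, buffer, BATCH)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        step += 1
        if step % N == 0:                                    # periodic target-network sync
            target_net.load_state_dict(q_net.state_dict())
    epsilon = max(0.05, epsilon * 0.95)                      # naive epsilon decay per episode
```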
- Papers:
- Code Examples: