Implementation of MDP in Java using Deep Q-Learning
- Matrix - helper class for matrix operations on float values
- Vector - subclass of Matrix for more convenient handling of vectors
- NeuralNetwork - implementation of a basic neural network for the MDP agent
- Memory - helper class for storing episode data (the replay memory buffer)
- Agent - the MDP agent
- The entry point is the Agent class.
- The Agent constructor takes state_size and action_size as input and uses them to initialize the neural network that predicts Q-values.
- Agent.get_action(...) accepts the current state and returns the index of the chosen action. The action is either random (exploration) or greedy with respect to the Q-values predicted by the neural network; see the epsilon-greedy sketch after this list.
- Agent.update_EXPLORATION_PROB() updates the probability with which the agent takes random (exploratory) rather than greedy actions.
- Agent.add_replay(...) accepts a transition tuple (s_t, a_t, r_{t+1}, s_{t+1}) and stores it in the replay memory buffer.
- Agent.train() samples a random batch of BATCH_SIZE stored transitions and trains the neural network on it; a training-loop sketch follows this list.
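
Below is a minimal sketch of how these pieces might be wired together in a training loop. The Environment interface and StepResult class are hypothetical placeholders for whatever task the agent learns; the state is assumed to be passed as a float[], the argument order of add_replay follows the tuple above, and calling train() and update_EXPLORATION_PROB() once per episode is just one possible schedule.

```java
// Hypothetical environment API; not part of this repository.
interface Environment {
    float[] reset();              // start a new episode, return the initial state
    StepResult step(int action);  // apply an action, observe the outcome
}

// Hypothetical container for the outcome of one environment step.
class StepResult {
    float[] nextState;
    float reward;
    boolean done;
}

class TrainingLoopSketch {
    static void run(Environment env, int episodes) {
        // Illustrative sizes; use the real state/action dimensions of the task.
        Agent agent = new Agent(4, 2);

        for (int ep = 0; ep < episodes; ep++) {
            float[] state = env.reset();
            boolean done = false;
            while (!done) {
                // Random or greedy action, depending on the exploration probability.
                int action = agent.get_action(state);
                StepResult r = env.step(action);
                // Store (s_t, a_t, r_{t+1}, s_{t+1}) for experience replay.
                agent.add_replay(state, action, r.reward, r.nextState);
                state = r.nextState;
                done = r.done;
            }
            agent.train();                   // fit on a random BATCH_SIZE sample
            agent.update_EXPLORATION_PROB(); // shift from exploring to exploiting
        }
    }
}
```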
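
The action selection inside get_action follows the epsilon-greedy pattern described above. The sketch below shows the standard form of that rule, plus a common multiplicative decay schedule that update_EXPLORATION_PROB could apply; the names and constants here are assumptions for illustration, not the repository's actual internals.

```java
import java.util.Random;

// Illustrative epsilon-greedy selection and exploration decay.
// Field names and decay constants are assumptions, not the repo's internals.
class EpsilonGreedySketch {
    static final Random RNG = new Random();
    static float explorationProb = 1.0f;           // start fully exploratory

    // With probability explorationProb pick a random action,
    // otherwise the action with the highest predicted Q-value.
    static int chooseAction(float[] qValues) {
        if (RNG.nextFloat() < explorationProb) {
            return RNG.nextInt(qValues.length);    // explore
        }
        int best = 0;
        for (int a = 1; a < qValues.length; a++) {
            if (qValues[a] > qValues[best]) best = a;  // exploit
        }
        return best;
    }

    // Multiplicative decay toward a floor, a common schedule.
    static void decayExploration() {
        explorationProb = Math.max(0.05f, explorationProb * 0.995f);
    }
}
```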
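
In Deep Q-Learning, training fits the network toward the target y = r_{t+1} + gamma * max_{a'} Q(s_{t+1}, a') for each sampled transition. The sketch below computes that target for a single transition, assuming the network's Q-value predictions for the next state are available as a float[]; how the repository batches and backpropagates this is not shown.

```java
// Textbook Q-learning target for one sampled transition; the repo's
// exact training internals may differ.
class TargetSketch {
    // y = r + gamma * max_a' Q(s', a'), or just y = r if s' is terminal.
    static float target(float reward, float[] nextQValues,
                        float gamma, boolean terminal) {
        if (terminal) return reward;      // no future value after a terminal state
        float maxQ = nextQValues[0];
        for (float q : nextQValues) {
            maxQ = Math.max(maxQ, q);     // value of the best action in s_{t+1}
        }
        return reward + gamma * maxQ;
    }
}
```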