An implementation of the value iteration algorithm based on "Reinforcement Learning: An Introduction" (Second Edition). Here, we iterate on the Q-table (action values) rather than on state values.
The environment settings are similar, but not identical, to those of Figure 17.1 on page 646 of Artificial Intelligence: A Modern Approach (Third Edition):
“A simple 4 × 3 environment that presents the agent with a sequential decision problem.”
“The "intended" outcome occurs with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction. A collision with a wall results in no movement. The two terminal states have reward +1 and -1, respectively, and all other states have a reward of -0.04.” (Artificial Intelligence: A Modern Approach (Third Edition), p. 646)
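
For orientation, here is a minimal sketch of Q-value iteration on the textbook version of this grid. It is not the code in this repository, and it makes several assumptions that the description above does not state: the wall sits at (2, 2), the terminal states are (4, 3) with +1 and (4, 2) with -1, the 0.2 slip probability is split evenly (0.1 each) between the two perpendicular directions, the reward is collected on entering a state, terminal states keep Q = 0, and the discount factor `gamma` defaults to 0.9. All names (`q_value_iteration`, `transitions`, `greedy_policy`, and so on) are hypothetical. Each sweep applies the backup Q(s, a) = sum over s' of P(s' | s, a) * (r + gamma * max over a' of Q(s', a')).

```python
# Assumed layout: the textbook 4x3 grid, states indexed (column, row) from 1.
COLS, ROWS = 4, 3
WALL = (2, 2)                              # assumption: blocked cell
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}    # assumption: terminal rewards
STEP_REWARD = -0.04                        # reward for every other state
ACTIONS = {"up": (0, 1), "right": (1, 0), "down": (0, -1), "left": (-1, 0)}
# The two directions at right angles to each intended action.
SIDEWAYS = {"up": ("left", "right"), "down": ("left", "right"),
            "left": ("up", "down"), "right": ("up", "down")}
STATES = [(c, r) for c in range(1, COLS + 1)
          for r in range(1, ROWS + 1) if (c, r) != WALL]


def move(state, action):
    """Deterministic move; bumping into the wall or the border means no movement."""
    dc, dr = ACTIONS[action]
    nxt = (state[0] + dc, state[1] + dr)
    inside = 1 <= nxt[0] <= COLS and 1 <= nxt[1] <= ROWS
    return nxt if inside and nxt != WALL else state


def transitions(state, action):
    """Return (probability, next_state, reward) triples for a state-action pair."""
    side_a, side_b = SIDEWAYS[action]
    result = []
    for prob, actual in ((0.8, action), (0.1, side_a), (0.1, side_b)):
        nxt = move(state, actual)
        # Assumption: the reward is the one attached to the state being entered.
        result.append((prob, nxt, TERMINALS.get(nxt, STEP_REWARD)))
    return result


def q_value_iteration(gamma=0.9, tol=1e-10, max_iters=10_000):
    """Synchronous Q-value iteration; terminal states keep Q = 0."""
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(max_iters):
        new_q = {s: dict(q[s]) for s in STATES}
        for s in STATES:
            if s in TERMINALS:
                continue
            for a in ACTIONS:
                # Bellman optimality backup for action values.
                new_q[s][a] = sum(
                    p * (r + gamma * max(q[s2].values()))
                    for p, s2, r in transitions(s, a)
                )
        delta = max(abs(new_q[s][a] - q[s][a]) for s in STATES for a in ACTIONS)
        q = new_q
        if delta < tol:
            break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return {s: max(q[s], key=q[s].get) for s in q if s not in TERMINALS}


if __name__ == "__main__":
    policy = greedy_policy(q_value_iteration())
    # Print the policy with the top row first; '.' marks the wall and terminals.
    for r in range(ROWS, 0, -1):
        print(" ".join(f"{policy.get((c, r), '.'):>5}" for c in range(1, COLS + 1)))
```

Storing the table as a dictionary of dictionaries keeps the sketch dependency-free; an array-based Q-table indexed by state and action would work equally well. Because the settings here follow the textbook figure rather than this project's modified environment, the resulting values may differ from what the actual implementation produces.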