Title | Action Type | Action Shape | Action Values | Observation Shape | Observation Values | Average Total Reward | Import
---|---|---|---|---|---|---|---
Repeat Copy | Discrete | (3,) | [(0, 1), (0, 1), (0, base-1)] | (1,) | (0, base) | | `from gym.envs.algorithmic import repeat_copy`
This task involves copying content from the input tape to the output tape in normal order, then in reverse order, then in normal order again. For example, for input `[x1 x2 ... xk]` the required output is `[x1 x2 ... xk xk ... x2 x1 x1 x2 ... xk]`. This task was originally used in the paper *Learning Simple Algorithms from Examples*.
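As a concrete illustration, here is a minimal Python sketch of the target sequence; the helper name `repeat_copy_target` is hypothetical, not part of Gym:

```python
def repeat_copy_target(inp):
    """Target output for Repeat Copy: the input in normal order,
    then reversed, then in normal order again."""
    return inp + inp[::-1] + inp

# For input [1, 3, 2] the required output is
# [1, 3, 2, 2, 3, 1, 1, 3, 2].
print(repeat_copy_target([1, 3, 2]))
```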
The model has to learn:
- the correspondence between input and output symbols.
- how to execute the move-left and move-right actions on the input tape.
The agent takes a 3-element vector as its action. The action space is `(x, w, v)`, where:

- `x` is the direction to move on the input tape. It can take values (0, 1).
- `w` is whether or not to write to the output tape. It can take values (0, 1).
- `v` is the value to be written to the output tape. It can take values (0, base-1).

The observation space size is (1,): the character currently under the input tape's read head.
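A minimal interaction sketch, assuming the classic Gym API (`reset` returning an observation and `step` returning a 4-tuple), which these algorithmic environments used:

```python
import gym

env = gym.make('RepeatCopy-v0')
obs = env.reset()  # the character under the input tape's read head

done, total_reward = False, 0.0
while not done:
    # An action is a 3-tuple: (move direction, write flag, value to write).
    x, w, v = env.action_space.sample()
    obs, reward, done, info = env.step((x, w, v))
    total_reward += reward

print('episode return:', total_reward)
```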
Rewards:
Rewards are issued similarly to the other algorithmic environments. Reward schedule:
- write a correct character: +1
- write a wrong character: -0.5
- run out the clock: -1
- otherwise: 0
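The schedule above can be summarized as a small function. This is an illustrative sketch of the rules, not the environment's actual internal code:

```python
def reward_for(wrote, correct, out_of_time):
    """Illustrative Repeat Copy reward schedule (not Gym internals).

    wrote: the agent chose to write this step
    correct: the written character matched the target
    out_of_time: the episode's time limit was exceeded
    """
    if out_of_time:
        return -1.0
    if wrote:
        return 1.0 if correct else -0.5
    return 0.0
```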
`gym.make('RepeatCopy-v0', base=5)`

- `base`: Number of distinct characters to read/write.
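A short usage sketch showing how `base` shapes the spaces; the exact space types printed are an assumption based on Gym's Tuple/Discrete spaces:

```python
import gym

env = gym.make('RepeatCopy-v0', base=5)
# With base=5 the write-value component covers 5 symbols (0..4),
# and observations range over the 5 symbols plus a blank.
print(env.action_space)       # e.g. Tuple(Discrete(2), Discrete(2), Discrete(5))
print(env.observation_space)  # e.g. Discrete(6)
```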
Version History:
- v0: Initial version release (1.0.0)