Home
Welcome to the Intrepid: Interactive Representation Discovery wiki!
This wiki contains detailed instructions on how to use this repository. We begin with a quick summary. If you have any questions, please raise an issue and tag it as [Question].
Intrepid is a repository that contains a collection of decision-making algorithms (which include bandits and reinforcement learning as special cases). A decision-making algorithm helps an agent choose which actions to take.
A core focus of Intrepid is decision making that requires learning a latent state/representation of the world. For example, consider an agent navigating an image-based environment: the observation is the image generated by the agent's camera, while a good latent state could be the agent's position in the world, along with the positions of any dynamic obstacles.
Intrepid consists of the following components:
- Core learning algorithms. These are mostly located in `./src/learning/core_learner`, with algorithm-specific utility functionality in `./src/learning/learning_utils`. E.g., the Homer algorithm is implemented in `./src/learning/core_learner/homer.py`. See the algorithm page for a full list and description of these algorithms. The learning utils, for example, contain a generic learner class in `./src/learning/learning_utils/generic_learner.py` and routines that perform independence tests; a hypothetical sketch of such a learner interface appears after this list.
- Useful decision-making tools. This includes a variety of packages that are routinely used across algorithms:
  - methods for generating episodes (`./src/learning/core_learner/policy_roll_in`)
  - methods for policy search given either offline data or a set of exploration policies (`./src/learning/core_learner/policy_search`)
  - a variety of self-supervised learning objectives for learning latent states (`./src/learning/core_learner/state_abstraction`). These include autoencoders, inverse dynamics, temporal contrastive learning, and multi-step inverse dynamics; a hedged sketch of the multi-step inverse dynamics objective appears after this list. For legacy reasons, inverse dynamics is at times referred to as inverse kinematics in the code.
- A large collection of models, including various encoders, inverse dynamics models, and generic classifiers (`./src/model`); a hedged signature sketch for the forward dynamics model type appears after this list:
  - policies that map an observation (or a history, including time) to an action (`./src/model/policy`)
  - encoders that map an observation to a latent state representation, either discrete or continuous (`./src/model/encoders`)
  - decoders that map a latent state representation back to an observation (`./src/model/decoders`)
  - classifiers (`./src/model/classifiers`)
  - forward dynamics models that map a given observation and action to the next observation (`./src/model/forward_model`)
  - inverse dynamics models that map an ordered pair of observations to the action that takes the agent from the former observation to the latter (`./src/model/inverse_dynamics`)
- A set of environments and environment wrappers for popular existing domains (to be installed separately):
  - challenging exploration problems with relatively simple observation spaces for quick proof-of-concept studies, where the focus is not on realistic observational noise but on exploration and planning (`./src/environments/rl_acid_env`)
  - several grid-world instances built on top of the Minigrid environment (`./src/environments/minigrid`). You will have to install Minigrid using the requirements file or on your own; a minimal usage sketch of the underlying Minigrid package appears after this list.
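To make the components above more concrete, a few illustrative sketches follow. First, a hypothetical sketch of the kind of interface a core learner might follow. This is not the actual class in `./src/learning/learning_utils/generic_learner.py`; the class and method names (`GenericLearner`, `train`, `act`) are assumptions made only for illustration.

```python
# Hypothetical sketch only: the real generic_learner.py in Intrepid may expose a
# different interface. GenericLearner, train, and act are illustrative names.
import random
from abc import ABC, abstractmethod


class GenericLearner(ABC):
    """Minimal shape of a core decision-making learner: collect data, fit, act."""

    @abstractmethod
    def train(self, env, num_episodes: int):
        """Interact with the environment and update internal models/policies."""

    @abstractmethod
    def act(self, observation):
        """Return an action for the given observation (or history)."""


class RandomLearner(GenericLearner):
    """Trivial baseline that skips learning and acts uniformly at random."""

    def __init__(self, num_actions: int, seed: int = 0):
        self._rng = random.Random(seed)
        self._num_actions = num_actions

    def train(self, env, num_episodes: int):
        pass  # no learning in this baseline

    def act(self, observation):
        return self._rng.randrange(self._num_actions)
```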
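Next, a hedged sketch of the multi-step inverse dynamics idea behind the state abstraction utilities: an encoder maps observations to latent states, and a classifier predicts the action taken at time t from the latents of the observation at time t and of an observation k steps later. The module names, shapes, and k-step conditioning scheme below are illustrative assumptions in PyTorch, not the repository's implementation.

```python
# Hedged sketch of a multi-step inverse dynamics objective; shapes and module
# names are illustrative assumptions, not Intrepid's state_abstraction code.
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Maps a flat observation to a continuous latent state."""

    def __init__(self, obs_dim: int, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))

    def forward(self, obs):
        return self.net(obs)


class MultiStepInverseDynamics(nn.Module):
    """Predicts the action taken at time t from latents of obs_t and obs_{t+k}."""

    def __init__(self, obs_dim: int, latent_dim: int, num_actions: int, max_k: int):
        super().__init__()
        self.encoder = Encoder(obs_dim, latent_dim)
        # Condition on the gap k via an embedding so one head handles all horizons.
        self.k_embed = nn.Embedding(max_k + 1, latent_dim)
        self.head = nn.Linear(3 * latent_dim, num_actions)

    def loss(self, obs_t, obs_tk, action_t, k):
        z_t, z_tk = self.encoder(obs_t), self.encoder(obs_tk)
        logits = self.head(torch.cat([z_t, z_tk, self.k_embed(k)], dim=-1))
        return nn.functional.cross_entropy(logits, action_t)


# Usage with random tensors standing in for a batch of (obs_t, obs_{t+k}, a_t, k) tuples.
model = MultiStepInverseDynamics(obs_dim=64, latent_dim=32, num_actions=4, max_k=10)
obs_t, obs_tk = torch.randn(16, 64), torch.randn(16, 64)
action_t, k = torch.randint(0, 4, (16,)), torch.randint(1, 11, (16,))
print(model.loss(obs_t, obs_tk, action_t, k).item())
```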
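The following is a hedged signature sketch of the forward dynamics model type listed above: given an observation and an action, predict the next observation. The layer sizes and one-hot action encoding are illustrative assumptions, not the models in `./src/model/forward_model`.

```python
# Hedged sketch of a forward dynamics model: (observation, action) -> next observation.
import torch
import torch.nn as nn


class ForwardDynamicsModel(nn.Module):
    """Maps a given observation and discrete action to a predicted next observation."""

    def __init__(self, obs_dim: int, num_actions: int, hidden_dim: int = 128):
        super().__init__()
        self.num_actions = num_actions
        self.net = nn.Sequential(
            nn.Linear(obs_dim + num_actions, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, obs_dim),
        )

    def forward(self, obs, action):
        # One-hot encode the discrete action and concatenate it with the observation.
        action_onehot = nn.functional.one_hot(action, self.num_actions).float()
        return self.net(torch.cat([obs, action_onehot], dim=-1))


model = ForwardDynamicsModel(obs_dim=64, num_actions=4)
obs, action = torch.randn(8, 64), torch.randint(0, 4, (8,))
next_obs_pred = model(obs, action)  # shape: (8, 64)
loss = nn.functional.mse_loss(next_obs_pred, torch.randn(8, 64))  # e.g. a regression loss
```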
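Finally, a minimal sanity check that the separately installed Minigrid dependency works. This uses Minigrid's own Gymnasium API directly rather than Intrepid's wrappers in `./src/environments/minigrid`, and assumes a recent `minigrid` package built on Gymnasium; the version pinned in the requirements file may differ.

```python
# Minimal Minigrid smoke test (requires the separate `minigrid` package,
# e.g. `pip install minigrid`); this is not Intrepid's environment wrapper.
import gymnasium as gym
import minigrid  # noqa: F401  (importing registers the MiniGrid-* environments)

env = gym.make("MiniGrid-Empty-5x5-v0")
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()
print(type(obs), reward, terminated, truncated)
```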