The goal is to keep a ball in the air by bouncing it off a quadcopter as many times as possible. We built this project to explore reinforcement learning algorithms.
- Tanishka Singh (
- Deepak Kala Vasudevan (
- Nikhil Agarwal (
We recommend using Ubuntu 16 to run the code.
- Install the latest version of V-REP PRO EDU
- Python 2.7 is required
- Install the latest version of TensorFlow using `pip install tensorflow`
Navigate to the folder where the simulator is installed and launch it with the path to the provided environment file:
./ quad_env.ttt
To run in headless mode:
./ -h quad_env.ttt
Download and unzip the code, navigate to the unzipped folder and run:
$ python [algorithm] [action] [number of episodes] [steps per episode]

| Option | Values |
| --- | --- |
| algorithm | pg, vpg, or ppo |
| action | eval or train |
| number of episodes | default = 200 |
| steps per episode | default = 50 |

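
As an illustration of how the four positional arguments above could be read, here is a minimal sketch using Python's standard `argparse` module. The function name `build_parser` and the argument names `episodes` and `steps` are hypothetical, and treating the last two arguments as optional is an assumption based on the listed defaults. This is not the project's actual parsing code.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options table (not the project's code)."""
    parser = argparse.ArgumentParser(description="Train or evaluate an agent.")
    # Required positional arguments, restricted to the documented values.
    parser.add_argument("algorithm", choices=["pg", "vpg", "ppo"])
    parser.add_argument("action", choices=["eval", "train"])
    # Assumed optional positionals that fall back to the documented defaults.
    parser.add_argument("episodes", nargs="?", type=int, default=200)
    parser.add_argument("steps", nargs="?", type=int, default=50)
    return parser


if __name__ == "__main__":
    # Example: run PPO training for 100 episodes with the default step count.
    args = build_parser().parse_args(["ppo", "train", "100"])
    print(args.algorithm, args.action, args.episodes, args.steps)
```

With `nargs="?"`, omitting the trailing arguments leaves the defaults of 200 episodes and 50 steps in place.
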
- contains code that we worked on initially and later abandoned because we could not resolve its issues (uses ROS, Gazebo, and Sphinx).
Policy gradient methods are a class of reinforcement learning techniques that optimize a parametrized policy with respect to the expected return (long-term cumulative reward) by gradient ascent. The actor directly learns the policy function that maps states to actions.
- Simple Policy Gradient (SARSA)
- Vanilla Policy Gradient (VPG)
- Proximal Policy Optimization (PPO)
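
To make the idea concrete, here is a minimal sketch of a policy gradient update on a toy problem, written in plain Python with no TensorFlow. It assumes a stateless three-action task with a made-up reward function (`toy_reward`), not the quadcopter environment, and all function names are hypothetical. The last helper shows the clipped surrogate term that PPO uses for a single sample. None of this is taken from the project's own implementation.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng):
    """Draw an action index according to probs."""
    r = rng.random()
    cumulative = 0.0
    for action, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return action
    return len(probs) - 1


def train_policy_gradient(reward_fn, n_actions, episodes=2000, lr=0.1, seed=0):
    """REINFORCE-style training of a stateless softmax policy.

    theta holds one preference per action. Each step moves theta along
    advantage * grad(log pi(action)), i.e. gradient ascent on expected reward.
    """
    rng = random.Random(seed)
    theta = [0.0] * n_actions
    baseline = 0.0  # running mean of rewards, used to reduce variance
    for episode in range(1, episodes + 1):
        probs = softmax(theta)
        action = sample_action(probs, rng)
        reward = reward_fn(action, rng)
        advantage = reward - baseline
        baseline += (reward - baseline) / episode
        for a in range(n_actions):
            # Gradient of log softmax: indicator(a == action) - probs[a]
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += lr * advantage * grad_log
    return softmax(theta)


def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """PPO clipped surrogate for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped * advantage)


def toy_reward(action, rng):
    """Made-up reward for illustration: action 2 pays the most on average."""
    means = [0.1, 0.5, 0.9]
    return means[action] + rng.gauss(0.0, 0.1)


if __name__ == "__main__":
    final_probs = train_policy_gradient(toy_reward, n_actions=3)
    print(final_probs)
```

In this sketch the policy shifts probability toward the action with the highest average reward. The clipping in `ppo_clipped_objective` limits how much a single update can gain from moving the new policy away from the old one, which is the main difference between PPO and the plain gradient step above.
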