fix reward to one episode #9

Open
wants to merge 1 commit into master
Conversation

@qrh1 commented Oct 29, 2018

Rewards should not be discounted across different episodes. Maybe episodes and steps are confused here?

@titu1994 (Owner)
```python
for t in reversed(range(0, rewards.size)):
    if rewards[t] != 0:
        running_add = 0
    running_add = running_add * self.discount_factor + rewards[t]
    discounted_rewards[t] = running_add
return discounted_rewards[-1]
```

This is the discounted reward, which returns your values anyway.
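For context, resetting `running_add` whenever a nonzero reward is seen is the standard policy-gradient trick for restarting the discounted sum at each boundary, so discounting never leaks across it. A minimal, self-contained sketch of that pattern (illustrative only, not the repository's exact code; the sample reward array is an assumption):

```python
import numpy as np

def discount_rewards(rewards, discount_factor=0.99):
    """Discount rewards backwards in time, resetting the running sum
    whenever a nonzero reward marks a boundary."""
    discounted = np.zeros_like(rewards, dtype=np.float64)
    running_add = 0.0
    for t in reversed(range(rewards.size)):
        if rewards[t] != 0:
            running_add = 0.0  # boundary reached: stop discounting across it
        running_add = running_add * discount_factor + rewards[t]
        discounted[t] = running_add
    return discounted

# Two segments of four steps each, each ending in a +1 reward:
r = np.array([0, 0, 0, 1, 0, 0, 0, 1], dtype=np.float64)
print(discount_rewards(r))
# [0.970299 0.9801   0.99     1.       0.970299 0.9801   0.99     1.      ]
# The first segment's values are identical to the second's: the reset
# prevents reward from one segment discounting into the other.
```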

@qrh1 (Author) commented Oct 30, 2018

Hi Somshubra, I still don't get it; could you please explain more?
In my understanding, each action is a step, and the 8 actions make up one episode. In RL we usually discount rewards over past steps, but each independent episode's rewards should be calculated independently.
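To make the two views concrete: splitting the reward array at episode boundaries and discounting each episode independently yields the same values as the reset-based loop quoted above, so that code does not discount across episodes. A minimal sketch (the helper name and sample data are illustrative assumptions, not from the repository):

```python
import numpy as np

def discount_episode(rewards, discount_factor=0.99):
    """Discount a single episode's rewards, working backwards from its end."""
    out = np.zeros_like(rewards, dtype=np.float64)
    running_add = 0.0
    for t in reversed(range(rewards.size)):
        running_add = running_add * discount_factor + rewards[t]
        out[t] = running_add
    return out

# Split at nonzero rewards (episode boundaries), discount each episode on its own.
r = np.array([0, 0, 0, 1, 0, 0, 0, 1], dtype=np.float64)
ends = np.nonzero(r)[0] + 1        # indices just past each boundary
episodes = np.split(r, ends[:-1])  # [[0, 0, 0, 1], [0, 0, 0, 1]]
print(np.concatenate([discount_episode(ep) for ep in episodes]))
# Matches discount_rewards(r) from the sketch above: no cross-episode discounting.
```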
