Reinforcement Learning

Temporal Difference Learning - SARSA & Q-Learning

LINK@KoreaTech
February 14, 2019
Email: link.koreatech at gmail.com
This site was built using source code shared by the REINFORCEjs project.

SARSA


  • Discount Factor (γ): 0.75
  • Initial epsilon value for ε-greedy policy (ε): 0.2
  • Epsilon decay rate (η): 0.02
  • Learning rate (α): 0.1
[Chart: sum of each step's reward, plotted per episode]
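The parameters listed above plug into the standard tabular SARSA update, Q(s, a) ← Q(s, a) + α [r + γ Q(s', a') − Q(s, a)], where a' is the action the ε-greedy policy actually selects in s'. The following is a minimal sketch of that update, not the demo's own code (which is derived from REINFORCEjs). The function names, the dictionary-based Q-table, and in particular the multiplicative reading of the decay rate η are assumptions made for illustration; the page does not state how η is applied.

```python
import random
from collections import defaultdict

GAMMA = 0.75    # discount factor (γ), as listed in the panel
ALPHA = 0.1     # learning rate (α), as listed in the panel
EPSILON = 0.2   # initial ε for the ε-greedy policy, as listed in the panel
ETA = 0.02      # epsilon decay rate (η), as listed in the panel


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(q[(state, a)] for a in actions)
    # Break ties among equally valued greedy actions at random.
    return rng.choice([a for a in actions if q[(state, a)] == best])


def sarsa_update(q, s, a, r, s_next, a_next, done, alpha=ALPHA, gamma=GAMMA):
    """On-policy TD update: bootstrap from the action actually taken next."""
    target = r if done else r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


def decay_epsilon(epsilon, eta=ETA):
    """ASSUMPTION: multiplicative decay; the page does not define the schedule."""
    return epsilon * (1.0 - eta)


# Hypothetical transition: state 0, action "up", reward 1.0, next state 1.
q = defaultdict(float)
q[(1, "right")] = 2.0
new_value = sarsa_update(q, 0, "up", 1.0, 1, "right", done=False)
```

Because the target uses the value of the action the policy really takes next, exploratory moves feed directly into the learned values; this is what makes SARSA on-policy.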



Q-learning


  • Discount Factor (γ): 0.75
  • Initial epsilon value for ε-greedy policy (ε): 0.2
  • Epsilon decay rate (η): 0.02
  • Learning rate (α): 0.1
[Chart: sum of each step's reward, plotted per episode]
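With the same parameters, the standard tabular Q-learning update is Q(s, a) ← Q(s, a) + α [r + γ max_b Q(s', b) − Q(s, a)]. The sketch below illustrates it under the same caveats: it is not the demo's own code, and the function name, the dictionary-based Q-table, and the example transition are hypothetical.

```python
from collections import defaultdict

GAMMA = 0.75   # discount factor (γ), as listed in the panel
ALPHA = 0.1    # learning rate (α), as listed in the panel


def q_learning_update(q, s, a, r, s_next, actions, done,
                      alpha=ALPHA, gamma=GAMMA):
    """Off-policy TD update: bootstrap from the best-valued action in s_next."""
    best_next = 0.0 if done else max(q[(s_next, b)] for b in actions)
    target = r + gamma * best_next
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


# Hypothetical transition: state 0, action "up", reward 1.0, next state 1.
q = defaultdict(float)
q[(1, "left")] = 2.0
q[(1, "right")] = 4.0
new_value = q_learning_update(q, 0, "up", 1.0, 1, ["left", "right"], done=False)
```

The target here takes the maximum over next-state actions regardless of which action the ε-greedy policy goes on to execute, so the learned values estimate the greedy policy even while the agent explores. That is the only difference from the SARSA update, which bootstraps from the action actually chosen.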

