“Student-professor relationships are based on trust. Acts that violate this trust undermine the educational process. Your classmates and the professor will not tolerate violations of academic integrity.”
# | Date | Paper Presentation | Lecture Subject | Notice |
---|---|---|---|---|
01 | Mar 8 (Tue) | - Course Introduction<br>- Introduction to Entropy, Cross Entropy, KL Divergence, Likelihood, Maximum Likelihood Estimation (MLE), and Maximum a Posteriori Estimation (MAP) Lecture Notes | - Introduction to using the link_rl framework | - |
02 | Mar 15 (Tue) | - Introduction to Deep Reinforcement Learning (1/4) Lecture Notes<br>- Introduction to Deep Reinforcement Learning (2/4) Lecture Notes | Homework #1. Gym CartPole & LunarLander control using a DQN agent on the link_rl framework (Due: April 6, 23:59:59) Guide for Homework #1 | |
03 | Mar 22 (Tue) | - Introduction to Deep Reinforcement Learning (3/4) Lecture Notes<br>- Introduction to Deep Reinforcement Learning (4/4) Lecture Notes | - Tabular Q-Learning | |
04 | Mar 29 (Tue) | - Volodymyr Mnih et al., "Playing Atari with Deep Reinforcement Learning," NIPS 2013. (Presenters: 유연휘, 한호준) Slides | - DQN (Deep Q-Networks) OFF-POLICY Lecture Notes | |
05 | Apr 5 (Tue) | - Hado van Hasselt, Arthur Guez, and David Silver, "Deep Reinforcement Learning with Double Q-learning," AAAI 2016. (Presenters: 최대준, 최상원) Slides<br>- Ziyu Wang et al., "Dueling Network Architectures for Deep Reinforcement Learning," ICML 2016. (Presenters: 김규령, 유승관, 최요한) Slides | - Double DQN OFF-POLICY<br>- Dueling DQN OFF-POLICY<br>- Double Dueling DQN OFF-POLICY | |
06 | Apr 12 (Tue) | - Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver, "Prioritized Experience Replay," ICLR 2016. (Presenters: 권준형, 김범영) Slides | - PER (Prioritized Experience Replay) Lecture Notes | Homework #2. Pong control using four DQN algorithms on the link_rl framework (Due: April 30 (Sat), 23:59:59) Guide for Homework #2 |
07 | Apr 19 (Tue) | - Chapter 2, "Background," in "Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs," Schulman, University of California, Berkeley, 2016. (Presenters: 김현재, 상희민, 석영준) Slides | - PG (Policy Gradient) & REINFORCE ON-POLICY Lecture Notes | |
08 | Apr 26 (Tue) | - Volodymyr Mnih et al., "Asynchronous Methods for Deep Reinforcement Learning," ICML 2016. (Presenters: 김성현, 이재승) Slides | - Discrete A2C (Advantage Actor-Critic) ON-POLICY Lecture Notes<br>- Discrete A3C (Asynchronous Advantage Actor-Critic) ON-POLICY | |
09 | May 3 (Tue) | - John Schulman et al., "Proximal Policy Optimization Algorithms," CoRR, abs/1707.06347, 2017. (Presenters: 박효경, 이서영) Slides | - Discrete PPO (Proximal Policy Optimization) ON-POLICY Lecture Notes [New & Complete @ 2022] | |
10 | May 10 (Tue) | - John Schulman et al., "High-Dimensional Continuous Control Using Generalized Advantage Estimation," ICLR 2016. (Presenters: 김회창, 차민혁, 최호빈) Slides | - Continuous A2C ON-POLICY<br>- Continuous A3C ON-POLICY<br>- Continuous PPO ON-POLICY | |
11 | May 17 (Tue) | - Timothy P. Lillicrap et al., "Continuous Control with Deep Reinforcement Learning," arXiv preprint arXiv:1509.02971, 2015. (Presenters: 이성준, 유진) Slides | - DDPG (Deep Deterministic Policy Gradient) OFF-POLICY Lecture Notes | Homework #3. Bipedal Walker control using A2C/A3C/PPO/TD3/SAC on the link_rl framework (Due: June 11 (Sat), 23:59:59) Guide for Homework #3 |
12 | May 24 (Tue) | - Scott Fujimoto, Herke van Hoof, and David Meger, "Addressing Function Approximation Error in Actor-Critic Methods," ICML 2018. (Presenters: 강병창, 현주영) Slides | - TD3 (Twin Delayed Deep Deterministic Policy Gradient) OFF-POLICY Lecture Notes | |
13 | May 31 (Tue) | - Tuomas Haarnoja et al., "Soft Actor-Critic Algorithms and Applications," CoRR abs/1812.05905, 2018. (Presenters: 김동환, 임상훈) Slides | - SAC (Soft Actor-Critic) OFF-POLICY Lecture Notes | |
14 | Jun 7 (Tue) | - Deepak Pathak et al., "Curiosity-driven Exploration by Self-supervised Prediction," ICML 2017. (Presenters: 용성중, 송병진) Slides | - SAC (Soft Actor-Critic) with Alpha Tuning OFF-POLICY | |
15 | Jun 14 (Tue) | Final Exam | | |
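The Tabular Q-Learning topic from week 03 can be illustrated with a minimal, self-contained sketch. The toy 5-state chain MDP, the state/action encoding, and the hyperparameters below are illustrative assumptions for this sketch only; they are not part of the course materials or the link_rl framework, and the homework environments (CartPole, LunarLander, Pong) require function approximation rather than a table.

```python
import random

# Illustrative toy MDP (NOT from the course materials): a 5-state chain.
# States 0..4; action 0 moves left, action 1 moves right;
# reward 1.0 is given only on reaching the goal state 4.

N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (0, 1)

def step(state, action):
    """Deterministic chain dynamics."""
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def greedy_action(q, state):
    """Argmax over actions, breaking ties randomly."""
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    # Q-table: one entry per (state, action) pair, initialized to zero.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy behavior policy.
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = greedy_action(q, state)
            next_state, reward, done = step(state, action)
            # Off-policy TD target: max over next-state actions
            # (this max is what makes Q-learning off-policy).
            target = reward + (0.0 if done else gamma * max(q[(next_state, a)] for a in ACTIONS))
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q

q = q_learning()
# Extract the learned greedy policy for the non-terminal states.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)
```

After training, the greedy policy moves right (action 1) in every non-terminal state, since the right action's Q-value approaches `gamma ** (GOAL - s - 1)` while the left action's value stays strictly smaller. The same epsilon-greedy/TD-target structure carries over to DQN in weeks 04-06, with the table replaced by a neural network.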