“Student-professor relationships are based on trust. Acts that violate this trust undermine the educational process. Your classmates and the professor will not tolerate violations of academic integrity.”
# | Date | Paper Presentation | Lecture Subject | Notice |
---|---|---|---|---|
01 | Mar 8 (Tue) | - Course Introduction<br>- Introduction to Entropy, Cross Entropy, KL Divergence, Likelihood, Maximum Likelihood Estimation (MLE), and Maximum a Posteriori Estimation (MAP) Lecture Notes | - Introduction to using the link_rl framework | - |
02 | Mar 15 (Tue) | - Introduction to Deep Reinforcement Learning (1/4) Lecture Notes<br>- Introduction to Deep Reinforcement Learning (2/4) Lecture Notes | Homework #1. Gym CartPole & LunarLander control using a DQN agent on the link_rl framework (Due: April 6, 23:59:59) Guide for Homework #1 | |
03 | Mar 22 (Tue) | - Introduction to Deep Reinforcement Learning (3/4) Lecture Notes<br>- Introduction to Deep Reinforcement Learning (4/4) Lecture Notes | - Tabular Q-Learning | |
04 | Mar 29 (Tue) | - Volodymyr Mnih et al., "Playing Atari with Deep Reinforcement Learning," NIPS 2013. (Presenters: 유연휘, 한호준) Slides | - DQN (Deep Q-Networks) OFF-POLICY Lecture Notes | |
05 | Apr 5 (Tue) | - Hado van Hasselt, Arthur Guez, and David Silver, "Deep Reinforcement Learning with Double Q-learning," AAAI 2016. (Presenters: 최대준, 최상원) Slides<br>- Ziyu Wang et al., "Dueling Network Architectures for Deep Reinforcement Learning," ICML 2016. (Presenters: 김규령, 유승관, 최요한) Slides | - Double DQN OFF-POLICY<br>- Dueling DQN OFF-POLICY<br>- Double Dueling DQN OFF-POLICY | |
06 | Apr 12 (Tue) | - Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver, "Prioritized Experience Replay," ICLR 2016. (Presenters: 권준형, 김범영) Slides | - PER (Prioritized Experience Replay) Lecture Notes | Homework #2. Pong control using four DQN algorithms on the link_rl framework (Due: April 30 (Sat), 23:59:59) Guide for Homework #2 |
07 | Apr 19 (Tue) | - Chapter 2, "Background," in "Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs," Schulman, University of California, Berkeley, 2016. (Presenters: 김현재, 상희민, 석영준) Slides | - PG (Policy Gradient) & REINFORCE ON-POLICY Lecture Notes | |
08 | Apr 26 (Tue) | - Volodymyr Mnih et al., "Asynchronous Methods for Deep Reinforcement Learning," ICML 2016. (Presenters: 김성현, 이재승) Slides | - Discrete A2C (Advantage Actor-Critic) ON-POLICY Lecture Notes<br>- Discrete A3C (Asynchronous Advantage Actor-Critic) ON-POLICY | |
09 | May 3 (Tue) | - John Schulman et al., "Proximal Policy Optimization Algorithms," CoRR, abs/1707.06347, 2017. (Presenters: 박효경, 이서영) Slides | - Discrete PPO (Proximal Policy Optimization) ON-POLICY Lecture Notes [New & Complete @ 2022] | |
10 | May 10 (Tue) | - John Schulman et al., "High-Dimensional Continuous Control Using Generalized Advantage Estimation," ICLR 2016. (Presenters: 김회창, 차민혁, 최호빈) Slides | - Continuous A2C ON-POLICY<br>- Continuous A3C ON-POLICY<br>- Continuous PPO ON-POLICY | |
11 | May 17 (Tue) | - Timothy P. Lillicrap et al., "Continuous Control with Deep Reinforcement Learning," arXiv preprint arXiv:1509.02971, 2015. (Presenters: 이성준, 유진) Slides | - DDPG (Deep Deterministic Policy Gradient) OFF-POLICY Lecture Notes | Homework #3. Bipedal Walker control using A2C/A3C/PPO/TD3/SAC on the link_rl framework (Due: June 11 (Sat), 23:59:59) Guide for Homework #3 |
12 | May 24 (Tue) | - Scott Fujimoto, Herke van Hoof, and David Meger, "Addressing Function Approximation Error in Actor-Critic Methods," ICML 2018. (Presenters: 강병창, 현주영) Slides | - TD3 (Twin Delayed Deep Deterministic Policy Gradient) OFF-POLICY Lecture Notes | |
13 | May 31 (Tue) | - Tuomas Haarnoja et al., "Soft Actor-Critic Algorithms and Applications," CoRR abs/1812.05905, 2018. (Presenters: 김동환, 임상훈) Slides | - SAC (Soft Actor-Critic) OFF-POLICY Lecture Notes | |
14 | Jun 7 (Tue) | - Deepak Pathak et al., "Curiosity-driven Exploration by Self-supervised Prediction," ICML 2017. (Presenters: 용성중, 송병진) Slides | - SAC (Soft Actor-Critic) with Alpha Tuning OFF-POLICY | |
15 | Jun 14 (Tue) | Final Exam | | |
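The Tabular Q-Learning topic from week 03 can be illustrated with a minimal, self-contained sketch. The toy 5-state chain MDP, the state/action encoding, and the hyperparameters below are illustrative assumptions for this sketch only; they are not part of the course materials or the link_rl framework, and the homework environments (CartPole, LunarLander, Pong) require function approximation rather than a table.

```python
import random

# Illustrative toy MDP (NOT from the course materials): a 5-state chain.
# States 0..4; action 0 moves left, action 1 moves right;
# reward 1.0 is given only on reaching the goal state 4.

N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (0, 1)

def step(state, action):
    """Deterministic chain dynamics."""
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def greedy_action(q, state):
    """Argmax over actions, breaking ties randomly."""
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    # Q-table: one entry per (state, action) pair, initialized to zero.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy behavior policy.
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = greedy_action(q, state)
            next_state, reward, done = step(state, action)
            # Off-policy TD target: max over next-state actions
            # (this max is what makes Q-learning off-policy).
            target = reward + (0.0 if done else gamma * max(q[(next_state, a)] for a in ACTIONS))
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q

q = q_learning()
# Extract the learned greedy policy for the non-terminal states.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)
```

After training, the greedy policy moves right (action 1) in every non-terminal state, since the right action's Q-value approaches `gamma ** (GOAL - s - 1)` while the left action's value stays strictly smaller. The same epsilon-greedy/TD-target structure carries over to DQN in weeks 04-06, with the table replaced by a neural network.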