Data Science (2015 Fall)

“Student-teacher relationships are based on trust. Acts that violate this trust undermine the educational process. Your classmates and the instructor will not tolerate violations of academic integrity.”

1. Course Schedule & Lecture Notes

  • Sept. 17 – Supervised vs. Unsupervised Learning, Predictive Model, Data Handling, and Exploratory Data Analysis (EDA)

  • Oct. 01 – Data Handling with Spark (industry guest lecture)

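    • Preparation before the lecture
      • A personal laptop with two or more cores/processors
      • Linux is the recommended OS (you can install Linux in VirtualBox or similar for the exercises)
      • JDK 7 or later must be installed on that OS
      • Beforehand, download Spark from https://spark.apache.org/downloads.html, selecting Spark Release 1.5.0 and Package Type "Pre-built for Hadoop 2.6 and later", then install and test it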

  • Oct. 08 – Review

    • Mushroom Data Set
    • Entropy and IG
    • Assignment – Due: Oct. 14, 23:59
      • Submission: do the work in an IPython notebook, upload the resulting .ipynb file to GitHub or a similar service, and send the URL by email

  • Oct. 22 – [No class: midterm exam period]

  • Oct. 29 – Logistic Regression

    • Lecture Note (ipython notebook)
    • Assignment – Due: Nov. 4, 23:59
      • Submission: do the work in an IPython notebook, upload the resulting .ipynb file to GitHub or a similar service, and send the URL by email

  • Dec. 03 – [Final exam]
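
As a refresher for the Oct. 08 review topics (Entropy and IG), the following is a minimal sketch of how entropy and information gain might be computed in plain Python. It is not taken from the lecture notes: the function names, the attribute names `odor` and `cap`, and the four-row toy table are all hypothetical and only loosely modeled on a mushroom-style data set.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the rows on one attribute."""
    # Group the class labels by the value each row takes for the attribute.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the groups after the split.
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


# Hypothetical toy data (assumption: not the real Mushroom Data Set).
rows = [
    {"odor": "none", "cap": "flat"},
    {"odor": "none", "cap": "bell"},
    {"odor": "foul", "cap": "flat"},
    {"odor": "foul", "cap": "bell"},
]
labels = ["edible", "edible", "poisonous", "poisonous"]

ig_odor = information_gain(rows, labels, "odor")
ig_cap = information_gain(rows, labels, "cap")
print("IG(odor) =", ig_odor)
print("IG(cap)  =", ig_cap)
```

In this toy table `odor` separates the two classes perfectly while `cap` tells us nothing, so `odor` receives the larger information gain. A decision-tree learner would pick it as the first split.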

2. Course Information

3. Logistics

  • Attendance – Each absence results in a deduction of two points out of 100. Three absences result not in a ten-point deduction but in failure (i.e., grade ‘F’) in this course.
  • Exam – There will be a final exam to evaluate the knowledge learned in class.
  • Book Report – Students should read one of the books listed in the references and submit a book report.
  • Presentation – The presentation counts for a substantial share of the evaluation.

4. Evaluation

  • Attendance (10%)
  • Book Report (20%)
  • Presentation (20%)
  • Final Examination (50%)