MATH 574M - Statistical Machine Learning and Data Mining


Announcements
  • First class on 08/25/2020.


    Course Information
    Lectures: Tue. and Thu. 9:30-10:45am, D2L Online | Syllabus
    Office Hours: Tuesday 2-3pm, ENR2 S323, or by appointment.
    TA Office Hours: TBA.
    Textbook: The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Hastie, Tibshirani, and Friedman (2001).
    Reference Books:
  • Principles and Theory for Data Mining and Machine Learning by Clarke, Fokoué, and Zhang (2009)
  • Pattern Recognition and Neural Networks by B. Ripley (1996)
  • Learning with Kernels by Schölkopf and Smola (2000)
  • The Nature of Statistical Learning Theory by Vapnik (1998)
  • An overview of statistical learning theory, Vapnik (1999)

    Useful Links:
  • Kernel Machines
  • Hastie's Software and Data

    R Resources:
  • Download R (CRAN)
  • Introduction to R | R for Beginners | R reference card
  • Introduction to RStudio

    Statistics Prerequisites:
  • Basic Topics | Joe Watkins' 363 Notes | Joe Watkins' MATH 464 Notes

    Course Activities
    Week 1-2 (August 24 - Sep 6)
    Read Chapter 1: Overview of Data Mining
    Lecture 1: Introduction
    Get familiar with R and RStudio: R Intro, RStudio Intro
    Supplementary Reading: Data mining and statistics: what is the connection? Friedman (1997)
    Homework 1 PDF, LaTeX. Assigned on August 25, due on Sep 8.
    Week 3 (Sep 7 - Sep 13)
    Read Chapter 2: Theory of Supervised Learning
    Lecture 2: Statistical Decision Theory (I)
    Lecture 3: Statistical Decision Theory (II)
    Homework 2 PDF, LaTeX. Assigned on Sep 10, due on Sep 29.
    Week 4 (Sep 14 - Sep 20)
    Read Chapter 4.2-4.4: Linear Classification Methods for Binary Problems
    Lecture 4: Binary Classification (I): Basics
    Week 5 (Sep 21 - Sep 27)
    Supplementary Reading: Choosing Between Logistic Regression and Discriminant Analysis, Press, S. and Wilson, S. (1978)
    Lecture 5: Binary Classification (II): Logistic Regression and Discriminant Analysis
    Week 6 (Sep 28 - Oct 4)
    Curse of Dimensionality; Linear Binary Classification for High Dimensional Problems
    Lecture 6: Binary Classification (III): Extension to High Dimensional Classification Problems
    Homework 3 PDF file, LaTeX file. Assigned on Sep 29, due on Oct 13.
    Homework 3 Solution PDF file, R code
    Week 7 (Oct 5 - Oct 11)
    Read Chapter 4.1: Nonparametric Regression
    Lecture 6.2: Parametric vs Nonparametric Regression
    Read Chapter 4.1: Nonlinear Classification Methods
    Lecture 7: K-Nearest Neighbor (KNN) Methods
    Week 8 (Oct 12 - Oct 18)
    Topic: Introduction to Multiclass Classification
    Lecture 8: Multiclass Classification
    Supplementary Reading: Diagnosis of multiple cancer types by shrunken centroids of gene expression
    Homework 4 PDF file, LaTeX file. Assigned on Oct 12, due on Nov 3.
    Homework 4 Solution PDF file, R code
    Week 9 (Oct 19 - Oct 25)
    Supplementary Reading: Leave-one-out Cross Validation
    Lecture 9: Model Selection and Assessment
    Read Chapter 3: Linear Regression & Variable Selection
    Lecture 10: Linear Regression and Variable Selection
    Supplementary Reading: Linear Model Theory
    Week 10 (Oct 26 - Nov 1)
    Read Chapter 3: Variable Selection for Linear Regression
    Lecture 11: Shrinkage Methods by LASSO
    Reading: Regression Shrinkage and Selection via the LASSO
    Final Project: assigned on Oct 29, due on Dec 16.
    Final Project Suggested Reference List
    Lecture: Principal Component Analysis: PCA
    Lecture: Quadratic Discriminant Analysis: QDA
    Week 11-12 (Nov 2 - Nov 15)
    Supplementary Reading: Regularization and variable selection via the elastic net
    Lecture 12: Shrinkage Methods - Beyond LASSO
    Homework 5 PDF file, LaTeX file. Assigned on Nov 3, due on Nov 19.
    Homework 5 Solution PDF file, R code
    Week 13 (Nov 16 - Nov 22)
    Read Chapter 12: Support Vector Machines
    Lecture 13: Support Vector Machines
    Supplementary Reading: The Entire Regularization Path for the Support Vector Machine
    Homework 6: LaTeX, PDF. Assigned on Nov 19, due on Dec 8.
    Multiclass Support Vector Machines
    Lecture 14: Multiclass Support Vector Machines
    Week 14 (Nov 23 - Nov 29)
    Read Chapter 9 (9.2): Tree-based Methods
    Lecture 15: Classification and Regression Trees
    Week 15 (Nov 30 - Dec 6)
    Read Chapter 8.7: Bootstrap and Bagging
    Supplementary Reading: Explaining AdaBoost
    Lecture 16: Bagging and Boosting


    Auditing
  • Auditors are expected to attend class regularly and submit homework on the same schedule as the other students.

    Policy on Academic Integrity
  • The University policy on academic integrity is spelled out in the UA Code of Student Conduct.

    Students with Disabilities
  • Reasonable accommodations will be made for students with verifiable disabilities.