MATH 574M - Statistical Machine Learning and Data Mining


Announcements
  • First class on 01/10.


    Course Information
    Lectures: Tue. and Thu. 9:30-10:45am, Bio. Sci. West 210 | Syllabus
    Office Hours: Tuesday 11am-12pm, ENR2 S323, or by appointment.
    Textbook: The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Hastie, Tibshirani, and Friedman (2001).
    Reference Books:
  • Principles and Theory for Data Mining and Machine Learning by Clarke, Fokoué, and Zhang (2009)
  • Pattern Recognition and Neural Networks by B. Ripley (1996)
  • Learning with Kernels by Schölkopf and Smola (2000)
  • The Nature of Statistical Learning Theory by Vapnik (1998)
  • An overview of statistical learning theory, Vapnik (1999)

    Useful Links:
  • Kernel Machines
  • Hastie's Software and Data

    R Resources:
  • Download R (CRAN)
  • Introduction to R | R for Beginners | R reference card

    Statistics Prerequisites:
  • Basic Topics | Joe Watkins' 363 Notes | Joe Watkins' MATH 464 Notes

    Course Activities
    Week 1 (Jan 9-13) Read Chapter 1: Overview of Data Mining Lecture 1: Introduction
    Get Familiar with Software: Introduction to R, R Brief Intro, R Guide for Beginners
    Supplementary Reading: Data mining and statistics: what is the connection? Friedman (1997)
    Homework 1. Assigned on Jan 15, due on Jan 29.
    Week 2-3 (Jan 14-27) Read Chapter 2: Theory of Supervised Learning Lecture 2: Statistical Decision Theory (I)
    Lecture 3: Statistical Decision Theory (II)
    Week 4 (Jan 28 - Feb 3) Read Chapter 4.2-4.4: Linear Classification Methods for Binary Problems Lecture 4: Binary Classification (I): Basics
    Homework 2 Assignment. Assigned on Jan 29, due on Feb 12.
    Homework 2 Solution, Code
    Week 6 (Feb 4 - Feb 10) Supplementary Reading: Choosing Between Logistic Regression and Discriminant Analysis, Press, S. and Wilson, S. (1978) Lecture 5: Binary Classification (II): Logistic Regression and Discriminant Analysis
    Curse of Dimensionality; Linear Binary Classification for High Dimensional Problems Lecture 6: Binary Classification (III): Extension to High Dimensional Classification Problems
    Week 4 (Feb 11 - Feb 17) Read Chapter 4.1: Nonlinear Classification Methods Lecture 7: K-nearest neighbor (KNN) methods
    Topic: Introduction to Multiclass Classification Lecture 8: Multiclass Classification
    Homework 3 Assignment. Assigned on Feb 12, due on Feb 26.
    Homework 3 Solution, Code
    Week 5 (Feb 18 - Feb 24) Topic: Nonlinear Discriminant Analysis Lecture 9: QDA and RDA
    Supplementary Reading: LDA for improved large vocabulary continuous speech recognition Lecture 10: PCA
    Week 6 (Feb 25 - March 3) Topic: Linear Regression Models Lecture 11: Linear Regression
    Read Chapter 3: Linear Regression. Supplementary Reading: Linear Model Theory
    Homework 4 Assignment. Assigned on March 5, due on March 26.
    Week 7 (March 11 - March 17) Read Chapter 3: Variable Selection for Linear Regression Lecture 12: Variable Selection (I)
    Reading: Regression Shrinkage and Selection via the LASSO
    Week 8 (March 18 - March 24) Lecture 13: Shrinkage Methods by LASSO
    Supplementary Reading: Regularization and variable selection via the elastic net
    Week 9 (March 25 - March 31) Final Project: Project assigned on March 26, due on May 12
    Final Project Suggested Reading List
    Homework 5, Prostate data set, data info. Assigned on March 26, due on April 9.
    Lecture 14: Beyond LASSO
    Week 10 (April 1 - 7) Lecture 15: Model Selection and Assessment
    Supplementary Reading: Leave-one-out Cross Validation
    Week 11 (April 8 - 14) Read Chapter 4 (4.5) Lecture 16: Modern Classification via Separating Hyperplanes
    Read Chapter 12 Lecture 17: Support Vector Machines
    Supplementary Reading: The Entire Regularization Path for the Support Vector Machine Lecture 18: Multiclass Support Vector Machines
    Read Chapter 12 Lecture 19: Optimization Programming
    Week 12 (April 15 - 21) Read Chapter 9 (9.2): Tree-based Methods Lecture 20: Classification and Regression Trees
    Homework 6, assigned on April 16, due on April 30.
    Read Chapter 8.7: Bootstrap and Bagging Supplementary Reading: Explaining AdaBoost Lecture 21: Bagging and Boosting
    Week 13 (April 22 - 28) Read Chapter 14 (14.1-14.4): Unsupervised Learning Lecture 22: Cluster Analysis
    Recommender Systems
    Week 14 (April 29 - 30) Graphical Models and Network Analysis


    Auditing
  • Auditors are expected to attend class regularly and submit homework on the same schedule as the other students.

    Policy on Academic Integrity
  • The University policy on academic integrity is spelled out in the UA Code of Student Conduct.

    Students with Disabilities
  • Reasonable accommodations will be made for students with verifiable disabilities.