Statistics & Data Science Colloquium
Title: Assessment of Case Influence in Support Vector Machine
Abstract: The support vector machine (SVM) is a widely used technique for classification. A key property of the SVM is that its discriminant function depends only on a subset of the data points, called support vectors. This follows from the representation of the discriminant function as a linear combination of kernel functions associated with individual cases. Despite the direct relation between each case and its coefficient in this representation, the influence of individual cases and outliers on the classification rule has not been examined formally. Borrowing ideas from regression diagnostics, we define case influence measures for the SVM and study how the classification rule changes as each case is perturbed. To measure case sensitivity, we introduce a weight parameter for each case and reduce the weight from one to zero, which links the full-data solution to the leave-one-out solution. We develop an efficient algorithm to generate case-weight adjusted solution paths for the SVM. The solution paths and the resulting case influence graphs facilitate evaluation of the influence measures and allow us to examine comprehensively the relation between the coefficients of individual cases in the SVM and their influence. We present numerical results to illustrate the benefit of this approach.
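
As a rough illustration of the case-weight idea in the abstract, the Python sketch below refits a linear SVM while the weight of one case is lowered from one to zero. It is a minimal sketch under stated assumptions, not the speaker's algorithm: it refits from scratch on a fixed weight grid using plain subgradient descent instead of following an exact solution path, and the names fit_weighted_svm and case_weight_path, the penalty lam, and the toy data are all hypothetical.

```python
# Illustrative sketch only: a linear SVM fitted by plain subgradient descent.
# This is NOT the path-following algorithm described in the talk.


def fit_weighted_svm(X, y, weights, lam=0.1, steps=2000, lr=0.01):
    """Approximately minimise
    sum_i weights[i] * max(0, 1 - y_i * (beta . x_i + b0)) + (lam / 2) * ||beta||^2.
    """
    d = len(X[0])
    beta = [0.0] * d
    b0 = 0.0
    for _ in range(steps):
        # Gradient of the ridge penalty; the intercept is not penalised.
        grad_beta = [lam * bj for bj in beta]
        grad_b0 = 0.0
        for xi, yi, wi in zip(X, y, weights):
            margin = yi * (sum(bj * xj for bj, xj in zip(beta, xi)) + b0)
            if margin < 1.0:  # hinge loss is active for this case
                for j in range(d):
                    grad_beta[j] -= wi * yi * xi[j]
                grad_b0 -= wi * yi
        beta = [bj - lr * gj for bj, gj in zip(beta, grad_beta)]
        b0 -= lr * grad_b0
    return beta, b0


def case_weight_path(X, y, case, grid=(1.0, 0.75, 0.5, 0.25, 0.0), **kw):
    """Refit with the weight of one case lowered along `grid`.

    Returns a list of (weight, beta, b0) triples, one per grid value.
    """
    path = []
    for w in grid:
        weights = [1.0] * len(X)
        weights[case] = w
        beta, b0 = fit_weighted_svm(X, y, weights, **kw)
        path.append((w, beta, b0))
    return path


# Hypothetical toy data: two separated groups, plus one case (index 6)
# that sits in the negative group but carries a positive label.
X = [[2.0, 2.0], [3.0, 1.5], [2.5, 3.0],
     [-2.0, -2.0], [-3.0, -1.0], [-1.5, -2.5],
     [-2.5, -2.0]]
y = [1, 1, 1, -1, -1, -1, 1]

for w, beta, b0 in case_weight_path(X, y, case=6):
    print(w, [round(bj, 3) for bj in beta], round(b0, 3))
```

With the weight at one, the fit coincides with the full-data fit. With the weight at zero, the case contributes nothing to the objective, so the fit matches the one obtained after deleting that case. Plotting how the fitted coefficients move between those two endpoints gives a crude analogue of a case influence graph.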