

Research Ph.D. ThesesMatrix Sampling Algorithms for Topics in Machine Learning
By Saurabh Paul
We study the application of matrix sampling algorithms on four problems in machine learning, namely: (i) Feature Selection for Linear Support Vector Machines, (ii) Feature Selection for Ridge Regression, (iii) Coreset Construction for Canonical Correlation Analysis and (iv) Adaptive Sampling algorithm for matrix reconstruction. We provide both theoretical performance guarantees and empirical evidence to indicate the effectiveness of our methods. A more detailed description is given below. 1. Feature Selection for Linear Support vector Machines. We present feature selection algorithms for linear Support Vector Machines (SVM), which can be used in an unsupervised or supervised setting. We prove that the margin in the feature space is preserved to within epsilonrelative error of the margin in the full feature space in the supervised setting. In the unsupervised setting, we also provide worstcase guarantees of the radius of the minimum enclosing ball, thereby ensuring comparable generalization as in the full feature space. We also present feature extraction techniques for linear SVM, which preserve both margin and dataradius, upto epsilonrelative error. 2. Feature Selection for Ridge Regression. We introduce an unsupervised feature selection technique for regularized least squares classification (RLSC), (which is the classification analogue to ridge regression) and provide worstcase guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We provide risk bounds for feature selection methods for ridge regression in the fixed design setting. 3. Coreset Construction for Canonical Correlation Analysis (CCA). We introduce two algorithms for coreset construction from CCA. The algorithms select a subset of datapoints from the pair of matrices and compute approximations to all canonical correlations with provable guarantees. We show that any set of canonical weights of sampled pair of matrices can be used to obtain a set of approximately orthogonal canonical vectors of the original pair of matrices. 4. Adaptive Sampling algorithm for matrix reconstruction. We introduce a new adaptive sampling algorithm for computing lowrank matrix approximation. We are given a matrix A and a target rank k. The algorithm runs in $t$ iterations and selectes a subset of columns of the matrix. It computes a rankk approximation to the matrix that is as good as the best rankk approximation that would have been obtained by using all the columns. Return to main PhD Theses page 

