STA 290 Seminar Series
DATE: Thursday, February 15th, 4:10pm
LOCATION: MSB 1147, Colloquium Room
SPEAKER: Guangliang Chen, Assistant Professor, Mathematics and Statistics, San Jose State University
TITLE: “A Unified Scalable Spectral Clustering Framework Based on Efficient Sparse Matrix Operations”
ABSTRACT: We present a scalable computing framework for various versions of spectral clustering, such as the Ng-Jordan-Weiss algorithm (NIPS '01), Normalized Cut (Shi and Malik, 2000), and Diffusion Maps (Coifman et al., 2005). We first consider spectral clustering with the cosine similarity for sparse data or data of moderate dimensions, and show that in those cases, spectral clustering can be implemented solely based on efficient operations on the data matrix, i.e., elementwise manipulation, matrix-vector multiplication and low-rank SVD. In the case of the Gaussian (or any other) similarity, we adopt a landmark-based embedding technique to transform the problem to that of spectral clustering with cosine similarity. Our algorithms are simple to implement, fast to run, and robust to outliers. We compare our implementation with a few existing methods on several benchmark data sets to demonstrate its competitive performance.
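ILLUSTRATION: The cosine-similarity case described in the abstract can be made concrete with a minimal sketch. This is not the speaker's implementation. It assumes nonnegative data with no all-zero rows, keeps self-similarities on the diagonal of the similarity matrix (so the normalized similarity factors exactly), and uses a dense NumPy SVD for brevity where a sparse truncated SVD routine would be the natural choice for large sparse data. The function names and the small k-means helper are hypothetical, introduced only for this example.

```python
import numpy as np

def cosine_spectral_embedding(X, k):
    """Embed rows of X using only operations on the data matrix.

    With unit-norm rows Xn, the cosine similarity matrix is W = Xn @ Xn.T
    (self-similarities kept, an assumption of this sketch). W is never formed.
    """
    X = np.asarray(X, dtype=float)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-length rows
    # Degrees d = W @ 1 = Xn @ (Xn.T @ 1): two matrix-vector products.
    d = Xn @ (Xn.T @ np.ones(Xn.shape[0]))
    Xt = Xn / np.sqrt(d)[:, None]                        # D^{-1/2} Xn
    # D^{-1/2} W D^{-1/2} = Xt @ Xt.T, so its top eigenvectors are the
    # top left singular vectors of Xt.
    U, _, _ = np.linalg.svd(Xt, full_matrices=False)
    U = U[:, :k]
    # Row normalization in the style of Ng-Jordan-Weiss.
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def simple_kmeans(Y, k, n_iter=50):
    """Small Lloyd-style k-means with farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):          # keep old center if cluster is empty
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def spectral_cluster_cosine(X, k):
    return simple_kmeans(cosine_spectral_embedding(X, k), k)

# Toy data: two groups of rows with disjoint feature support,
# so the cosine similarity between the groups is exactly zero.
rng = np.random.default_rng(0)
A = np.hstack([rng.random((5, 3)) + 0.1, np.zeros((5, 3))])
B = np.hstack([np.zeros((4, 3)), rng.random((4, 3)) + 0.1])
labels = spectral_cluster_cosine(np.vstack([A, B]), 2)
print(labels)
```

On this toy data the first five rows should receive one label and the last four the other. The point of the sketch is that the n-by-n similarity matrix is never built: the degrees come from two matrix-vector products, and the spectral embedding comes from a low-rank SVD of the rescaled data matrix.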