Outlier Detection

Statistical methods

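  • fit a single model (e.g., Gaussian) to the distribution of all data; points with low likelihood under the model are outliers
  • fit two models, one to the non-outliers and one to the outliers, and assign each point to the more likely model
  • Grubbs’ test (tests for a single outlier in univariate, approximately normal data)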
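
A minimal sketch of the single-model approach, assuming univariate data and a Gaussian fit. The function name gaussian_outliers, the sample readings, and the 2.5 standard deviation threshold are illustrative assumptions, not part of the notes above.

```python
import statistics

def gaussian_outliers(values, threshold=2.5):
    """Return indices of values whose z-score exceeds `threshold`.

    Fits a Gaussian by estimating the sample mean and sample standard
    deviation, then flags points that lie far out in the tails.
    The threshold is an assumed cutoff, not a universal constant.
    """
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    if std == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]

# Hypothetical measurements with one obviously large reading at index 9.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 10.0, 9.9, 10.3, 25.0]
print(gaussian_outliers(readings))  # prints [9]
```

One caveat with this kind of cutoff: in a sample of n points, no z-score computed with the sample standard deviation can exceed (n-1)/sqrt(n), so a threshold of 3 can never fire when n is 10 or fewer. The outlier also inflates the estimated mean and standard deviation, which is one reason to fit outliers and non-outliers separately.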

Distance based methods

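  • the density within a neighborhood: points in sparse regions are likely outliers
  • the distance to the nearest neighbor (or k-th nearest neighbor): points far from their neighbors are likely outliers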
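
Both scores above can be sketched with brute-force pairwise distances, which is fine for small data sets. The function names, the example points, and the radius of 1.5 are hypothetical choices for illustration; math.dist requires Python 3.8 or later.

```python
import math

def kth_neighbor_distances(points, k=1):
    """For each point, return the distance to its k-th nearest neighbor.

    A larger value means the point is more isolated.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def neighbor_counts(points, radius):
    """For each point, count the other points within `radius` (a simple density)."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]

# Hypothetical data: four points on a unit square plus one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(kth_neighbor_distances(points, k=1))
print(neighbor_counts(points, radius=1.5))  # prints [3, 3, 3, 3, 0]
```

The last point has the largest nearest-neighbor distance and an empty neighborhood, so both scores single it out. Choosing k and the radius is data dependent; the values here are only for the toy example.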

Learning based methods

  • clustering: the smallest clusters are likely to contain outliers
  • one-class classifier (e.g., one-class SVM)
  • binary classifier (e.g., naive Bayes for spam filtering, weighted binary SVM)
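
The clustering idea can be sketched with a small k-means written from scratch. This is only an illustration under stated assumptions: the function names, the farthest-point initialization (chosen so the result is deterministic), the example points, and k = 3 are all hypothetical, and a library implementation would normally be used in practice.

```python
import math
from collections import Counter

def kmeans_labels(points, k, iters=20):
    """Cluster points with plain k-means and return one label per point."""
    # Farthest-point initialization: start from the first point, then
    # repeatedly add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its closest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def smallest_cluster_members(labels):
    """Return the indices of the points in the least populated cluster."""
    counts = Counter(labels)
    smallest = min(counts, key=counts.get)
    return [i for i, label in enumerate(labels) if label == smallest]

# Hypothetical data: two tight groups of four points and one stray point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (20, 20)]
labels = kmeans_labels(points, k=3)
print(smallest_cluster_members(labels))  # prints [8]
```

The stray point ends up alone in its own cluster, which is the signal the clustering bullet describes. The result depends on k: with too few clusters the stray point may be absorbed into a larger group.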