Danna explains how classification and anomaly detection differ as machine learning problems.
Sometimes defining classification and anomaly detection as two distinct machine learning problems can get tricky. So, guess what? We decided to write a blog post about it, because we find the most meaningful posts are the ones that explain difficult-to-understand topics.
Classification is a type of machine learning aimed at sorting observations into two or more classes. It is a type of supervised learning in which the data has class labels and the target is always categorical, as opposed to regression, in which the target is a continuous number. In classification we typically want to balance the class labels so all classes have equal importance. Why is balancing important in classification? Think of the lottery: the majority class would be losers, and if we just always guess "lose" we will be very accurate, but this completely ignores the minority class of winners.
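To make the lottery example concrete, here is a minimal sketch in plain Python. The ticket counts are made up for illustration (1 winner in 1,000 tickets), and the "model" is simply a rule that always guesses "lose". It shows how accuracy can look excellent while the minority class is missed entirely.

```python
# Hypothetical lottery data: 1 winner among 1,000 tickets (illustrative numbers).
labels = ["win"] + ["lose"] * 999

# A "model" that ignores its input and always predicts the majority class.
predictions = ["lose"] * len(labels)

# Accuracy: fraction of all tickets that were labeled correctly.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall on the minority class: fraction of actual winners that were found.
winner_predictions = [p for p, y in zip(predictions, labels) if y == "win"]
winner_recall = sum(p == "win" for p in winner_predictions) / len(winner_predictions)

print(f"accuracy: {accuracy:.3f}")            # accuracy: 0.999
print(f"winner recall: {winner_recall:.3f}")  # winner recall: 0.000
```

The guess-"lose" rule is right 99.9% of the time and still never finds a winner, which is why accuracy alone is a poor yardstick when classes are imbalanced.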
Anomaly detection is at its core classification, but there are some important distinctions to be made between the two. In anomaly detection you distinguish between "normal" and "anomalous" observations. Anomalous observations do not conform to the expected pattern of other observations in a data set. Because anomalous events are, by definition, rare, you should expect anomaly detection data sets to be imbalanced.
Also, you have to consider that there are actually three main types of anomaly detection:
- Supervised anomaly detection - This is a fancy way of saying classification because the anomalous and normal observations are labeled. This sort of anomaly detection is handled by creating a classification model of typical vs. anomalous observations.
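- Semi-supervised anomaly detection - The techniques for this area of anomaly detection assume that the data set is only partially labeled. A common setup is that only "normal" observations carry labels, and anything that deviates from the model of normal behavior is flagged.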
- Unsupervised anomaly detection - In this area of anomaly detection, the observations used to build a model are unlabeled. Algorithms for this type of anomaly detection assume that the normal behavior occurs far more frequently than anomalous behavior.
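As a rough illustration of the unsupervised case, the sketch below flags outliers in unlabeled data using a modified z-score based on the median and the median absolute deviation (MAD). This is just one simple technique chosen for the example, not the only way to do unsupervised anomaly detection. The function name `mad_anomalies`, the threshold of 3.5, and the sample readings are all assumptions made here. Note how it leans on the assumption stated above: because normal observations dominate, the median describes "normal" well.

```python
from statistics import median

def mad_anomalies(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    No labels are used: "normal" is inferred from the bulk of the data.
    """
    center = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to measure deviations against
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores
    # when the normal observations are roughly Gaussian.
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]

# Hypothetical unlabeled sensor readings with one unusual value.
readings = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 10.0, 25.0]
print(mad_anomalies(readings))  # [25.0]
```

If anomalies made up a large share of the data, the median and MAD would no longer describe normal behavior and this approach would break down, which is exactly why unsupervised methods assume anomalies are rare.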
So yes, supervised anomaly detection is actually classification, but anomaly detection and classification are two very different machine learning problems. When differentiating the two, ask whether you have labeled classes and whether your goal is to separate rare "anomalous" observations from the far more common "normal" ones.