Data Jam EP 2 - Anomaly Detection

Anomaly Types

Point or Global Anomalies → specific data points
A point anomaly is where a single data point stands out from the expected pattern, range, or norm. In other words, the data point is unexpected.

Example: sklearn.neighbors.LocalOutlierFactor — scikit-learn 1.3.0 documentation
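As a rough illustration of a point anomaly, here is a minimal sketch using scikit-learn's LocalOutlierFactor. The synthetic data, the random seed, and the choice of n_neighbors=20 are assumptions made for this example only, not taken from the referenced documentation page.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data (an assumption for illustration): 100 inliers clustered
# near the origin, plus one point placed far away from them.
rng = np.random.default_rng(42)
inliers = rng.normal(loc=0.0, scale=0.5, size=(100, 2))
far_point = np.array([[6.0, 6.0]])
X = np.vstack([inliers, far_point])

# fit_predict returns -1 for points judged to be outliers and 1 for inliers.
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)

# negative_outlier_factor_ holds the (negated) LOF score per training point;
# the lower the value, the more anomalous the point.
scores = lof.negative_outlier_factor_
most_anomalous = int(np.argmin(scores))

print("label of the far point:", labels[-1])
print("index of the most anomalous point:", most_anomalous)
```

LOF compares the local density around each point with the density around its neighbours, so it suits cases where a single point sits in a sparse region relative to the rest of the data.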
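Collective Anomalies → groups of data points (unexpected pattern)
A collective anomaly is where a whole subset of the data deviates from the overall pattern, even if each point in the subset looks normal on its own.
Contextual Anomalies → data points that are unexpected only in a selected context
A contextual anomaly is where a data point deviates from the norm for its context (for example, a time or a location), even though the same value would be normal elsewhere.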
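To make the idea of a contextual anomaly concrete, here is a small sketch that scores each value only against other values sharing the same context. The temperature readings, the contextual_outliers helper, and the 3.5 threshold on the median-based score are all hypothetical choices for illustration, not something prescribed by the libraries listed in this post.

```python
from collections import defaultdict
from statistics import median

# Hypothetical temperature readings (°C), each tagged with a season as context.
readings = [
    ("summer", 29.0), ("summer", 31.0), ("summer", 30.0),
    ("summer", 28.5), ("summer", 30.5), ("summer", 29.5),
    ("winter", 2.0), ("winter", 1.0), ("winter", 3.0),
    ("winter", 0.5), ("winter", 2.5),
    ("winter", 28.0),  # ordinary for summer, unexpected for winter
]

def contextual_outliers(readings, threshold=3.5):
    """Flag values that are unusual within their own context group."""
    groups = defaultdict(list)
    for context, value in readings:
        groups[context].append(value)

    flagged = []
    for context, values in groups.items():
        center = median(values)
        # Median absolute deviation: a spread estimate that is not
        # dominated by the outlier itself.
        mad = median(abs(v - center) for v in values)
        if mad == 0:
            continue  # no spread in this group, so the score is undefined
        for v in values:
            score = 0.6745 * abs(v - center) / mad
            if score > threshold:
                flagged.append((context, v))
    return flagged

print(contextual_outliers(readings))  # [('winter', 28.0)]
```

The value 28.0 sits comfortably inside the overall range of the data, so it only stands out once the season is taken into account.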

Tabular data → GitHub - PyOD: Python Outlier Detection
Time series data → GitHub - TODS: Automated Time-series Outlier Detection System
Graph data → GitHub - PyGOD: Python Library for Graph Outlier Detection (Anomaly Detection)
Image data → GitHub - Anomalib: Anomaly detection library comprising SOTA algorithms

None of the unsupervised methods is statistically better than the others, which emphasizes the importance of algorithm selection.
Semi-supervised methods outperform supervised methods when limited label information is available.

Personal suggestion on selecting an outlier detection (OD) algorithm. If you do not know which algorithm to try, start with:

ECOD: Example of using ECOD for outlier detection
Isolation Forest: Example of using Isolation Forest for outlier detection (scikit-learn)
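As a starting point for the second suggestion, here is a minimal sketch of Isolation Forest using scikit-learn's IsolationForest. The synthetic data, the seeds, and the contamination value of 0.01 are assumptions chosen for this example; on real data the contamination rate usually has to be estimated or tuned.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (an assumption for illustration): 200 inliers from a
# standard normal distribution, plus two points placed far away.
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
far_points = np.array([[8.0, 8.0], [-9.0, 7.5]])
X = np.vstack([inliers, far_points])

# contamination is the assumed share of outliers in the data.
forest = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
labels = forest.fit_predict(X)        # -1 = outlier, 1 = inlier
scores = forest.decision_function(X)  # lower = more anomalous

# Indices of the two lowest-scoring (most anomalous) points.
lowest_two = sorted(int(i) for i in np.argsort(scores)[:2])
print("labels of the far points:", labels[-2:])
print("two most anomalous indices:", lowest_two)
```

Isolation Forest scores points by how few random splits are needed to isolate them, which is why points far from the bulk of the data tend to receive the lowest scores.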

Written on July 26, 2023