Data Jam EP 2 - Anomaly Detection
Anomaly Types
-
Point or Global Anomalies → specific data points
A point anomaly is where a single datapoint stands out from the expected pattern, range, or norm. In other words, the datapoint is unexpected.Example: sklearn.neighbors.LocalOutlierFactor — scikit-learn 1.3.0 documentation
-
Collective Anomalies → groups of data (unexpected pattern)
Where a whole subset of the data deviates from the whole data pattern. -
Contextual Anomalies, where the point deviated based on a selected context
Anomaly Detection libraries
- Tabular data → GitHub - PyOD: Python Outlier Detection
- Time series data → GitHub - TODS: Automated Time-series Outlier Detection System
- Graph data → GitHub - PyGOD: Python Library for Graph Outlier Detection (Anomaly Detection)
- Image data → GitHub - Anomalib: Anomaly detection library comprising SOTA algorithms
Anomaly Detection Resources
- anomaly-detection-resources
- Official Implement of “ADBench: Anomaly Detection Benchmark”
- Paper - ADBench: Anomaly Detection Benchmark
- A practical guide to image-based anomaly detection using Anomalib
ADBench Key Takeaways
- None of the unsupervised methods is statistically better than the others, emphasizing the importance of algorithm selection;
- Semi-supervised methods outperform supervised methods when limited label information is available.
Personal suggestion on selecting an OD algorithm. If you do not know which algorithm to try, go with:
- ECOD: Example of using ECOD for outlier detection
- Isolation Forest: Example of using Isolation Forest for outlier detection scikit-learn
Written on July 26, 2023