Data Jam EP 2 - Anomaly Detection

Anomaly Types

  • Point or Global Anomalies → specific data points
    A point anomaly is a single data point that stands out from the expected pattern, range, or norm. In other words, the data point is unexpected.

    Example: sklearn.neighbors.LocalOutlierFactor (scikit-learn 1.3.0 documentation)

  • Collective Anomalies → groups of data (unexpected pattern)
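    A collective anomaly is a subset of the data that, taken together, deviates from the pattern of the dataset as a whole, even if no single point in it looks unusual on its own.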

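  • Contextual Anomalies → data points that are unexpected in a selected context
    A contextual anomaly is a data point that deviates only relative to a selected context. For example, a temperature of 24 °C is normal in summer but unexpected in winter.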
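
To make the difference between point and contextual anomalies concrete, here is a minimal sketch using only the Python standard library. It uses a plain z-score check as a stand-in for a real detector (it is not LocalOutlierFactor). The temperature readings, the season labels, and the cutoff of 2.0 standard deviations are all made-up assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

THRESHOLD = 2.0  # assumed cutoff in standard deviations, picked for this toy data


def flag_outliers(values, threshold=THRESHOLD):
    """Return the indices of values whose |z-score| exceeds the threshold."""
    mu, sigma = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical temperature readings (°C); the season is the "context".
readings = [
    ("winter", 1.0), ("winter", 2.0), ("winter", 0.0), ("winter", 3.0),
    ("winter", 1.5), ("winter", 2.5), ("winter", 0.5), ("winter", 1.0),
    ("winter", 2.0), ("winter", 24.0),   # ordinary overall, odd for winter
    ("summer", 24.0), ("summer", 26.0), ("summer", 25.0), ("summer", 23.0),
    ("summer", 27.0), ("summer", 25.5), ("summer", 24.5), ("summer", 26.0),
    ("summer", 25.0), ("summer", 24.0), ("summer", 60.0),  # odd everywhere
]

# Point (global) anomalies: compare every reading against the whole dataset.
all_temps = [temp for _, temp in readings]
point_anomalies = [readings[i] for i in flag_outliers(all_temps)]

# Contextual anomalies: compare each reading only against its own season.
by_season = defaultdict(list)
for season, temp in readings:
    by_season[season].append(temp)

contextual_anomalies = [
    (season, temps[i])
    for season, temps in by_season.items()
    for i in flag_outliers(temps)
]

print("point anomalies:", point_anomalies)
print("contextual anomalies:", contextual_anomalies)
```

On this toy data the global check flags only the 60.0 reading, while the per-season check also flags the 24.0 winter reading, which looks ordinary until the season is taken into account.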

Anomaly Detection Libraries

Anomaly Detection Resources

ADBench Key Takeaways

  • None of the unsupervised methods is statistically better than the others, emphasizing the importance of algorithm selection;
  • Semi-supervised methods outperform supervised methods when limited label information is available.

Personal suggestion on selecting an outlier detection (OD) algorithm. If you do not know which algorithm to try, go with:

  • ECOD: Example of using ECOD for outlier detection
  • Isolation Forest: Example of using Isolation Forest for outlier detection (scikit-learn)
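
As a rough starting point for the Isolation Forest suggestion, the sketch below uses scikit-learn's `IsolationForest`. The synthetic data, the three injected outliers, and the random seeds are assumptions made for illustration; on real data the hyperparameters (especially `contamination`) would need tuning.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (an assumption for this sketch): 200 points around the
# origin, plus three points injected far away from the main cluster.
rng = np.random.default_rng(42)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
injected = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -8.0]])
X = np.vstack([inliers, injected])

# random_state is fixed so repeated runs give the same result.
model = IsolationForest(n_estimators=100, random_state=0)

# fit_predict returns 1 for inliers and -1 for outliers.
labels = model.fit_predict(X)

# score_samples returns an anomaly score per row; lower means more anomalous.
scores = model.score_samples(X)

print("labels of injected points:", labels[-3:])
print("total rows flagged as outliers:", int((labels == -1).sum()))
```

PyOD's `ECOD` (in `pyod.models.ecod`) follows a similar fit-then-inspect pattern, with one difference to watch for: PyOD marks outliers with the label 1 and inliers with 0, whereas scikit-learn uses -1 and 1.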
Written on July 26, 2023