
[Paper Review] DROCC: Deep Robust One-Class Classification

sdbeans 2022. 1. 23. 16:59

DROCC: Deep Robust One-Class Classification

link to abstract: https://arxiv.org/abs/2002.12718

link to Github repo: https://github.com/microsoft/EdgeML/tree/master/examples/pytorch/DROCC

What is Anomaly Detection?

Anomaly detection, as the name suggests, is the task of detecting abnormal values. Its goal is to find outliers, that is, points that differ from the typical data.

Classical AD models the given normal data with a simple function. In contrast, deep-learning-based AD automatically learns the features of the data.
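The classical view above can be sketched with a minimal z-score rule (a hypothetical illustration, not from the paper): model the normal data with a simple function, here just its mean and standard deviation, and flag any point that falls too far outside.

```python
# Classical AD sketch (assumption: a simple z-score rule, chosen for illustration).

def fit_stats(data):
    """Estimate the mean and standard deviation of the normal training data."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return mean, var ** 0.5

def is_anomaly(x, mean, std, threshold=3.0):
    """A point is anomalous if it lies more than `threshold` stds from the mean."""
    return abs(x - mean) > threshold * std

normal_data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
mean, std = fit_stats(normal_data)
print(is_anomaly(10.1, mean, std))  # typical point  -> False
print(is_anomaly(25.0, mean, std))  # far outlier    -> True
```

Deep-learning-based AD replaces the hand-picked statistics with features learned automatically by a network, which is where DROCC comes in.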

 

Summary

Deep Robust One-Class Classification. Here, "robust" carries the sense of strong or resilient.

Comparison with Deep SVDD: both methods minimize a classical one-class loss computed on the learned final layer, but Deep SVDD suffers from representation collapse.

DROCC can be applied to most domains without any additional side information and is robust to representation collapse. It assumes that the points of the normal class lie on a well-sampled, locally linear, low-dimensional manifold.

 

Introduction

  • robust to representation collapse by involving a discriminative component that is general and empirically accurate on most standard domains
  • typical data lies as a group of points on a low-dimensional manifold that is well sampled by the training data
  • has a gradient ascent phase to adaptively add anomalous points to our training set
  • also has a gradient descent phase to minimize the classification loss by learning a representation, and a classifier on top of that representation, to separate typical points from the generated anomalous points
  • automatically learns an appropriate representation, similar to Deep SVDD, but is robust to representation collapse, since mapping all points to the same value would lead to poor discrimination between normal and anomalous points
  • summary: a method based on a low-dimensional manifold assumption on the positive class, using which it synthetically and adaptively generates negative instances, giving a general and robust approach to AD. For the one-class classification problem, a low FPR on negatives is important

Related Work

  • Generative modeling (e.g. GAN-based methods): requires reconstruction of the entire input during the decoding step
  • Deep one-class SVM (Deep SVDD): suffers from the representation collapse issue
  • Transformation-based methods: transformations are heavily domain dependent and hard to design for domains like time series; moreover, the suitability of a transformation varies with the structure of the typical points
  • Side-information-based AD: complementary to DROCC, which does not assume any side information

DROCC

  • hypothesis: the set of typical points S lies on a low dimensional locally linear manifold that is well-sampled
  • outside a small radius around a training point, most points are anomalous
  • manifolds (group of points) are locally Euclidean, so we can use l2 distance function to compare points that are very close neighbors
  • typical points are positive and anomalous points are negative
  • to solve the saddle point problem, DROCC uses gradient descent-ascent technique
  • algorithm: 3 steps of adversarial search are performed in parallel for each x in the batch.
  • works with arbitrary DNN architectures
  • Similar to the experiments with Deep SVDD, DeepSAD uses the hidden state of the final timestep as the representation in its one-class objective. An important aspect of training DeepSAD is pretraining the network as the encoder of an autoencoder; this pretraining was also tuned to ensure the best results.
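The descent-ascent loop above can be sketched in numpy. This is a minimal toy version, not the paper's implementation: a simple quadratic scorer s(x) = b - a·||x - mu||² stands in for the deep network, and the radius r, gamma, learning rates, and step counts are illustrative values, not the paper's tuned settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the DNN scorer: s(x) = b - a * ||x - mu||^2 (an assumption).
r, gamma = 1.0, 2.0            # negatives are searched in the shell r <= ||h|| <= gamma * r
lr_asc, lr_desc = 0.1, 0.05    # illustrative step sizes
mu, a, b = np.zeros(2), 1.0, 0.0

def score(X):
    """Higher score => more likely a typical (normal) point."""
    return b - a * np.sum((X - mu) ** 2, axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = 0.1 * rng.normal(size=(64, 2))   # well-sampled normal (positive) training points

for step in range(300):
    # Gradient ascent phase: adaptively search for anomalous points X + h that the
    # current classifier still scores as normal, then project h into the shell.
    h = rng.normal(size=X.shape)
    for _ in range(3):               # a few ascent steps per batch, as in the paper
        h += lr_asc * sigmoid(score(X + h))[:, None] * (-2 * a * (X + h - mu))
        norm = np.linalg.norm(h, axis=1, keepdims=True)
        h *= np.clip(norm, r, gamma * r) / np.maximum(norm, 1e-12)
    X_neg = X + h

    # Gradient descent phase: logistic loss, X labeled positive, X_neg negative.
    d_pos = -sigmoid(-score(X))      # per-sample d(loss)/d(score) for positives
    d_neg = sigmoid(score(X_neg))    # per-sample d(loss)/d(score) for negatives
    g_mu = (d_pos[:, None] * 2 * a * (X - mu)).mean(0) \
         + (d_neg[:, None] * 2 * a * (X_neg - mu)).mean(0)
    g_a = (d_pos * -np.sum((X - mu) ** 2, axis=1)).mean() \
        + (d_neg * -np.sum((X_neg - mu) ** 2, axis=1)).mean()
    g_b = d_pos.mean() + d_neg.mean()
    mu, a, b = mu - lr_desc * g_mu, max(a - lr_desc * g_a, 1e-3), b - lr_desc * g_b

# Typical points should now score higher than far-away (anomalous) points.
print(score(X).mean() > score(X + 5.0).mean())
```

The projection step is what keeps the generated negatives inside the shell around the training points; the real method applies the same descent-ascent idea with a deep network and, per the repo linked above, standard PyTorch optimizers.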