Yue Zhao / pyod / Issues / #92
Issue created May 10, 2019 by Yahya (@John-Almardeny)

Deviation-Based Outlier Detection (New Feature)

As far as I can tell, PyOD does not have any deviation-based outlier detection model. Although these methods are relatively old and not very popular, I think it would be nice to have one in the collection.


Linear Method for Deviation Detection for Large Databases

Based on the work of Arning, A., Agrawal, R., and Raghavan, P. 1996. A linear method for deviation detection in large databases. In Proc. Int. Conf. on Knowledge Discovery and Data Mining (KDD), Portland, OR. The Linear Method for Deviation-based Outlier Detection (LMDD) employs the concept of the Smoothing Factor (SF), which indicates how much the dissimilarity can be reduced by removing a subset of elements from the data set.
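To make the Smoothing Factor idea concrete, here is a minimal sketch of one way to compute it for a 1-D sample. It assumes SF is the drop in dissimilarity after removing a subset, scaled by the number of remaining elements, with variance as the dissimilarity. The function name `smoothing_factor` and its parameters are hypothetical and are not taken from PyOD or from the proposed implementation.

```python
import numpy as np


def smoothing_factor(data, subset_idx, dissimilarity=np.var):
    """Hypothetical helper: SF of removing `subset_idx` from `data`.

    Assumed form: |remainder| * (D(data) - D(remainder)).
    """
    data = np.asarray(data, dtype=float)
    mask = np.ones(len(data), dtype=bool)
    mask[subset_idx] = False            # drop the candidate subset
    remainder = data[mask]
    reduction = dissimilarity(data) - dissimilarity(remainder)
    return len(remainder) * reduction


data = [1.0, 2.0, 3.0, 2.0, 100.0]
sf_outlier = smoothing_factor(data, [4])  # remove the extreme value
sf_inlier = smoothing_factor(data, [0])   # remove an ordinary value
print(sf_outlier, sf_inlier)
```

Removing the extreme value reduces the variance sharply, so its SF is large and positive; removing an ordinary value increases the variance, so its SF is negative.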

Any dissimilarity function can be used, as the paper clearly states. The one proposed in the paper is the variance; however, other statistical dispersion measures can be used as well. Average absolute deviation, variance, and interquartile range are already implemented. Median absolute deviation is to be added (optionally) once SciPy 1.3.0 is released.
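For reference, the four dispersion measures mentioned above can be sketched with NumPy alone. This is an illustration, not the code proposed for PyOD; the function names are hypothetical, and the median absolute deviation is written by hand here rather than taken from SciPy.

```python
import numpy as np


def average_absolute_deviation(x):
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x - np.mean(x)))


def variance(x):
    return np.var(np.asarray(x, dtype=float))  # population variance


def interquartile_range(x):
    q75, q25 = np.percentile(np.asarray(x, dtype=float), [75, 25])
    return q75 - q25


def median_absolute_deviation(x):
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))


sample = [1, 2, 3, 4, 100]
aad = average_absolute_deviation(sample)
var = variance(sample)
iqr = interquartile_range(sample)
mad = median_absolute_deviation(sample)
print(aad, var, iqr, mad)
```

On this sample the mean-based measures (average absolute deviation, variance) are dominated by the single extreme value, while the interquartile range and median absolute deviation barely react to it, which is worth keeping in mind when choosing the dissimilarity function.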

The original algorithm outputs labels; with a very minor tweak, it can now output scores.
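The issue does not spell out the tweak. As a purely hypothetical illustration of the difference between scores and labels, the sketch below scores each point by a leave-one-out smoothing factor (variance as dissimilarity) and then derives labels by thresholding the scores. The function names and the `contamination` parameter are assumptions for this example, and this is not the sequential procedure from the paper.

```python
import numpy as np


def loo_scores(data, dissimilarity=np.var):
    """Hypothetical: score each point by the SF of removing it alone."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    total = dissimilarity(data)
    scores = np.empty(n)
    for i in range(n):
        rest = np.delete(data, i)
        scores[i] = (n - 1) * (total - dissimilarity(rest))
    return scores


def labels_from_scores(scores, contamination=0.2):
    """Hypothetical: flag points whose score exceeds a percentile cutoff."""
    threshold = np.percentile(scores, 100 * (1 - contamination))
    return (scores > threshold).astype(int)


data = [1.0, 2.0, 3.0, 2.0, 100.0]
scores = loo_scores(data)
labels = labels_from_scores(scores)
print(scores, labels)
```

Continuous scores allow ranking and a user-chosen cutoff, whereas labels fix the decision in advance.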
