Yue Zhao / pyod / Issues / #358
Closed
Open
Issue created Nov 09, 2021 by Administrator (@root, Contributor)

Unexpected results in MAD when predicting

Created by: Quentin62

In the _mad function, the decision score is computed with: https://github.com/yzhao062/pyod/blob/master/pyod/models/mad.py#L129-L130

diff = np.abs(obs - self.median)
return np.nan_to_num(np.ravel(0.6745 * diff / np.median(diff)))

This function is used for both fit and predict. The denominator is the median of the absolute differences between the given observations and the fitted median. The problem is that, at prediction time, np.median(diff) is computed from the observations being scored rather than from the ones used for fitting, and this can lead to wrong scores. For example, if you call decision_function with a single observation, the output score will always be 0.6745, because in that case diff == np.median(diff).

from pyod.models.mad import MAD
import numpy as np

mod = MAD(threshold=3)
x = np.random.normal(size=100).reshape(-1, 1)
mod.fit(x)
mod.median
y = np.array([[1000]])  # obviously an anomaly
mod.decision_function(y)  # array([0.6745])
mod.predict(y)  # array([0])
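
The same behaviour can be reproduced without pyod. The sketch below copies only the two-line formula quoted above into a standalone helper (mad_score and fitted_median are hypothetical names used for illustration, not part of pyod) and scores three very different single observations:

```python
import numpy as np

def mad_score(obs, fitted_median):
    """Standalone copy of the scoring formula quoted above (hypothetical helper)."""
    diff = np.abs(obs - fitted_median)
    # The denominator is recomputed from `obs`, not from the training data.
    return np.nan_to_num(np.ravel(0.6745 * diff / np.median(diff)))

fitted_median = 0.0  # stand-in for the median learned during fit
values = (1.0, 10.0, 1000.0)

# Score each value on its own, as decision_function would for one observation.
single_scores = [mad_score(np.array([[v]]), fitted_median)[0] for v in values]
```

With a single observation, diff has one element, so np.median(diff) equals that element and the ratio is 1. Every entry of single_scores is therefore approximately 0.6745, regardless of how far the value is from the fitted median.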

Idea to solve the problem: save the fitted median diff in the same way the median is saved:

diff = np.abs(obs - self.median)
self.mediandiff = np.median(diff) if self.mediandiff is None else self.mediandiff
return np.nan_to_num(np.ravel(0.6745 * diff / self.mediandiff))
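
A minimal sketch of how this idea might look as a self-contained class, assuming univariate input as in the example above. SimpleMAD and its attribute names are hypothetical and written only for illustration; this is not pyod's MAD class, and it stores the median diff explicitly in fit rather than lazily inside the scoring function:

```python
import numpy as np

class SimpleMAD:
    """Illustrative stand-in for the proposed fix; not pyod's MAD class."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.median_ = None       # median of the training data
        self.median_diff_ = None  # median absolute deviation of the training data

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.median_ = np.median(X)
        # Store the denominator once, from the training data only.
        self.median_diff_ = np.median(np.abs(X - self.median_))
        return self

    def decision_function(self, X):
        diff = np.abs(np.asarray(X, dtype=float) - self.median_)
        # Reuse the fitted denominator instead of recomputing it from X.
        return np.nan_to_num(np.ravel(0.6745 * diff / self.median_diff_))

    def predict(self, X):
        # 1 marks an outlier, 0 an inlier.
        return (self.decision_function(X) > self.threshold).astype(int)

rng = np.random.default_rng(0)
model = SimpleMAD(threshold=3).fit(rng.normal(size=100).reshape(-1, 1))

y = np.array([[1000]])  # obviously an anomaly
score = model.decision_function(y)
label = model.predict(y)
```

Because the denominator now comes from the 100 training points, the score for y is far above the threshold and label flags it as an outlier, even though y contains a single observation.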