Yue Zhao / pyod / Issues / #158
Issue created Jan 22, 2020 by Administrator (@root, Contributor)

XGBOD predict() fails when the model is trained on fewer than 202 rows

Created by: abhijeetmote

The code below works with no_of_rows = 202, but when I reduce it to no_of_rows = 201 it fails at predict() with the error shown below the code. @yzhao062, does XGBOD need to be trained with a minimum number of rows, i.e. 202?

Code snippet:

import numpy as np
import pandas as pd
from pyod.models.xgbod import XGBOD

no_of_rows = 202  # predict() works with 202 rows; it fails with 201
outliers_fraction = 0.1
clf_obj = XGBOD(contamination=outliers_fraction)

# random two-column integer training data
new_df_train_data = pd.DataFrame(np.random.randint(0, no_of_rows, size=(no_of_rows, 2)),
                                 columns=list('AB'))
X_train = new_df_train_data.values

# random binary labels
y_train = pd.DataFrame(np.random.randint(0, 2, size=(no_of_rows, 1)),
                       columns=list('y')).values

# fit the model with training data
print(len(X_train), len(y_train))
y_train = y_train.reshape(y_train.size, 1)
clf_obj.fit(X_train, y_train)

# predict on 50 random test rows; this is the call that raises
test_data = pd.DataFrame(np.random.randint(0, 50, size=(50, 2)), columns=list('AB'))
X_test = test_data.values
print(clf_obj.predict(X_test))
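
To pin down the exact threshold instead of comparing 201 and 202 by hand, a small scan over row counts could be used. This is only a sketch: `first_working_row_count` and `fake_run` are hypothetical names, and `fake_run` is a stand-in that mimics the reported behaviour so the example runs without pyod. In practice `run` would wrap the fit and predict steps from the snippet above.

```python
def first_working_row_count(run, candidates):
    """Return the first n in `candidates` for which run(n) does not raise
    AttributeError, or None if every candidate fails."""
    for n in candidates:
        try:
            run(n)
        except AttributeError:
            continue  # this row count still reproduces the failure
        return n
    return None


def fake_run(n):
    # Hypothetical stand-in for "fit XGBOD on n rows, then call predict()".
    # It only imitates the behaviour described in this issue.
    if n < 202:
        raise AttributeError("'NoneType' object has no attribute 'query'")


print(first_working_row_count(fake_run, range(195, 210)))  # prints 202
```

A linear scan is used rather than a bisection because nothing in the report guarantees that the failure depends monotonically on the row count.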

Error snippet

AttributeError                            Traceback (most recent call last)
<ipython-input-194-fee4e0bcf9ca> in <module>
     16 test_data = pd.DataFrame(np.random.randint(0,50,size=(50, 2)), columns=list('AB'))
     17 X_test = test_data.values
---> 18 print(clf_obj.predict(X_test))

~/anaconda3/envs/tensorflow/lib/python3.7/site-packages/pyod/models/xgbod.py in predict(self, X)
    379
    380         # construct the new feature space
--> 381         X_add = self._generate_new_features(X)
    382         X_new = np.concatenate((X, X_add), axis=1)
    383

~/anaconda3/envs/tensorflow/lib/python3.7/site-packages/pyod/models/xgbod.py in _generate_new_features(self, X)
    266         for ind, estimator in enumerate(self.estimator_list):
    267             if self.standardization_flag_list[ind]:
--> 268                 X_add[:, ind] = estimator.decision_function(X_norm)
    269
    270             else:

~/anaconda3/envs/tensorflow/lib/python3.7/site-packages/pyod/models/knn.py in decision_function(self, X)
    246
    247             # get the distance of the current point
--> 248             dist_arr, _ = self.tree_.query(x_i, k=self.n_neighbors)
    249             dist = self._get_dist_by_method(dist_arr)
    250             pred_score_i = dist[-1]

AttributeError: 'NoneType' object has no attribute 'query'
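
The traceback shows that one of the base estimators in `estimator_list` reaches `decision_function` with `tree_` still set to `None`. A possible way to see which estimator is affected, sketched under the assumption that the fitted XGBOD object exposes `estimator_list` and that the estimators store their tree in `tree_` (both names are taken from the traceback; `estimators_missing_tree` is a hypothetical helper, and the stand-in objects below only exist so the example runs without pyod):

```python
from types import SimpleNamespace


def estimators_missing_tree(estimators):
    """Return (index, class name) for every estimator whose `tree_` is None."""
    missing = []
    for ind, est in enumerate(estimators):
        # Flag only estimators that define `tree_` but left it unset;
        # detectors that never use a tree are skipped.
        if hasattr(est, "tree_") and est.tree_ is None:
            missing.append((ind, type(est).__name__))
    return missing


# Stand-in objects, not real pyod estimators.
fitted = SimpleNamespace(tree_=object(), n_neighbors=5)
broken = SimpleNamespace(tree_=None, n_neighbors=200)
no_tree = SimpleNamespace()

print(estimators_missing_tree([fitted, broken, no_tree]))
# prints [(1, 'SimpleNamespace')]
```

After `clf_obj.fit(X_train, y_train)`, calling `estimators_missing_tree(clf_obj.estimator_list)` would list the estimators that cannot answer `tree_.query(...)`, which helps tell whether the failure is tied to a particular neighbor setting.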

Is there a workaround? Any help would be appreciated.

Thanks, Abhijeet
