
API Reference

LocalOutlierProbability

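```python
from PyNomaly import loop

# Constructor signature with every argument at its default value.
# In practice, pass either data or both distance_matrix and neighbor_matrix.
clf = loop.LocalOutlierProbability(
    data=None,
    distance_matrix=None,
    neighbor_matrix=None,
    extent=3,
    n_neighbors=10,
    cluster_labels=None,
    use_numba=False,
    n_jobs=1,
    progress_bar=False,
)
```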

Constructor Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| data | np.ndarray or pd.DataFrame | None | Input data as a 2D array with shape (n_observations, n_features). Mutually exclusive with distance_matrix. |
| distance_matrix | np.ndarray or pd.DataFrame | None | Pre-computed distance matrix with shape (n_observations, n_neighbors): each row holds the distances from one observation to its nearest neighbors. Must be provided together with neighbor_matrix. Mutually exclusive with data. |
| neighbor_matrix | np.ndarray or pd.DataFrame | None | Pre-computed neighbor index matrix with shape (n_observations, n_neighbors), giving the indices of the neighbors whose distances appear in distance_matrix. Required when distance_matrix is provided. |
| extent | int | 3 | Controls scoring sensitivity. Must be 1, 2, or 3. This is the statistical extent λ, expressed in standard deviations from the mean (1 covers about 68%, 2 about 95%, and 3 about 99.7% of a normal distribution). |
| n_neighbors | int | 10 | Number of nearest neighbors to consider. Must be greater than 0 and less than the number of observations; an invalid value is adjusted automatically with a warning. |
| cluster_labels | list | None | Cluster assignment for each observation. When provided, LoOP scores are computed independently within each cluster. |
| use_numba | bool | False | Enable Numba JIT compilation for distance computation. Falls back to pure Python with a warning if Numba is not installed. |
| n_jobs | int | 1 | Number of threads for parallel distance computation. Set to -1 to use all CPU cores. Only effective when use_numba=True. |
| progress_bar | bool | False | Display a progress bar during distance computation. |
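The coverage figures quoted for extent are the share of a normal distribution lying within λ standard deviations of the mean, which equals erf(λ/√2). A quick standard-library check of that arithmetic (normal_coverage is a hypothetical helper, not part of PyNomaly; it only reproduces the percentages and says nothing about how the library uses extent internally):

```python
import math


def normal_coverage(extent: int) -> float:
    """Fraction of a normal distribution within `extent` standard deviations of the mean."""
    if extent not in (1, 2, 3):
        # Mirrors the documented constraint on the extent parameter.
        raise ValueError("extent must be 1, 2, or 3")
    # For a standard normal Z, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(extent / math.sqrt(2))


for lam in (1, 2, 3):
    print(f"extent={lam}: {normal_coverage(lam):.4f}")
```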

Note

Either data or both distance_matrix and neighbor_matrix must be provided, but not both data and distance_matrix.
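If you already have neighbor search results, or want control over the distance metric, the two matrices can be built outside the library. A minimal NumPy sketch, assuming Euclidean distance and excluding each observation from its own neighbor list (knn_matrices is a hypothetical helper; check whether your workflow expects self-matches before relying on that choice):

```python
import numpy as np


def knn_matrices(data, n_neighbors: int) -> tuple[np.ndarray, np.ndarray]:
    """Return (distances, indices), each with shape (n_observations, n_neighbors)."""
    data = np.asarray(data, dtype=float)
    n_obs = data.shape[0]
    if not 0 < n_neighbors < n_obs:
        raise ValueError("n_neighbors must be > 0 and < the number of observations")
    # Pairwise Euclidean distances via broadcasting: shape (n_obs, n_obs).
    diffs = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    # Assumption: an observation is not counted as its own neighbor.
    np.fill_diagonal(dist, np.inf)
    # Indices of the closest observations, nearest first (stable on ties).
    idx = np.argsort(dist, axis=1, kind="stable")[:, :n_neighbors]
    dists = np.take_along_axis(dist, idx, axis=1)
    return dists, idx


data = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
distance_matrix, neighbor_matrix = knn_matrices(data, n_neighbors=2)
print(distance_matrix.shape, neighbor_matrix.shape)  # (4, 2) (4, 2)
```

The resulting arrays would then be passed as distance_matrix and neighbor_matrix together with the same n_neighbors value. The full pairwise broadcast needs O(n²) memory, so it suits small inputs only.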


Methods

fit()

```python
clf.fit() -> LocalOutlierProbability
```

Calculates the Local Outlier Probability for each observation in the input data.

Returns: self -- the fitted model instance. Access scores via clf.local_outlier_probabilities.

Raises:

  • ClusterSizeError -- if any cluster contains fewer observations than n_neighbors.
  • MissingValuesError -- if the input data contains NaN values.
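Both failure conditions can be checked before calling fit(). A hypothetical pre-flight helper, sketched with NumPy; it raises plain ValueError rather than PyNomaly's exception classes, and checks cluster sizes only when labels are supplied:

```python
from collections import Counter

import numpy as np


def preflight_check(data, n_neighbors: int, cluster_labels=None) -> None:
    """Raise ValueError for the conditions fit() documents as errors (sketch)."""
    values = np.asarray(data, dtype=float)
    # Counterpart of MissingValuesError: any NaN in the input.
    if np.isnan(values).any():
        raise ValueError("input data contains NaN values")
    if cluster_labels is None:
        return
    # Counterpart of ClusterSizeError: a cluster with fewer than n_neighbors members.
    for label, size in Counter(cluster_labels).items():
        if size < n_neighbors:
            raise ValueError(
                f"cluster {label} has {size} observations, "
                f"fewer than n_neighbors={n_neighbors}"
            )


# Two clusters of three observations each: valid for n_neighbors=3.
preflight_check(np.zeros((6, 2)), n_neighbors=3, cluster_labels=[0, 0, 0, 1, 1, 1])
```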

stream(x)

```python
clf.stream(x) -> np.ndarray
```

Calculates the Local Outlier Probability for an individual observation against the fitted model. Must be called after fit().

Parameters:

| Parameter | Type | Description |
|---|---|---|
| x | np.ndarray | A single observation to score. In raw data mode, a 1D array with the same number of features as the training data. In distance matrix mode, a scalar distance value. |

Returns: np.ndarray -- the Local Outlier Probability of the input observation (a value in [0, 1]).

Warning

The stream approach does not support clustered data. If cluster_labels were provided during fit(), PyNomaly will automatically refit using a single cluster and issue a UserWarning.
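To show roughly how one observation can be scored against fitted statistics, here is an illustrative sketch that applies the LoOP formulas to a single new point. It is not PyNomaly's stream() code: the function name, the Euclidean metric, and the inputs are assumptions, with train_pdist standing in for per-observation probabilistic distances (compare prob_distances) and nplof for the normalization factor (compare norm_prob_local_outlier_factor). The fitted values in the usage lines are made up, and the sketch returns a plain float rather than an np.ndarray.

```python
import math

import numpy as np


def stream_loop_score(x, train, train_pdist, nplof, extent=3, n_neighbors=10):
    """Score one new observation with the LoOP formulas (illustrative sketch)."""
    if nplof <= 0:
        raise ValueError("nplof must be positive")
    x = np.asarray(x, dtype=float)
    train = np.asarray(train, dtype=float)
    train_pdist = np.asarray(train_pdist, dtype=float)
    # Euclidean distance from x to every training observation.
    dist = np.sqrt(((train - x) ** 2).sum(axis=1))
    # Indices of the n_neighbors closest training observations.
    nn = np.argsort(dist, kind="stable")[:n_neighbors]
    # Probabilistic distance of x: extent * quadratic mean of neighbor distances.
    pdist = extent * np.sqrt(np.mean(dist[nn] ** 2))
    # PLOF: x's probabilistic distance relative to its neighbors' average, minus 1.
    plof = pdist / np.mean(train_pdist[nn]) - 1.0
    # Map to a probability in [0, 1] using the stored normalization factor.
    return max(0.0, math.erf(plof / (nplof * math.sqrt(2))))


train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
train_pdist = np.full(4, 3.0)  # hypothetical fitted values
print(stream_loop_score([0.5, 0.5], train, train_pdist, nplof=1.0, n_neighbors=2))
print(stream_loop_score([10.0, 10.0], train, train_pdist, nplof=1.0, n_neighbors=2))
```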


Attributes

Attributes available after calling fit():

| Attribute | Type | Description |
|---|---|---|
| local_outlier_probabilities | np.ndarray | Array of LoOP scores for each observation, with values in [0, 1]. |
| prob_distances | np.ndarray | Probabilistic distances for each observation. |
| prob_distances_ev | np.ndarray | Expected values of probabilistic distances for each observation's neighborhood. |
| norm_prob_local_outlier_factor | float | Maximum normalized probabilistic local outlier factor across all observations. Used internally by stream(). |
| is_fit | bool | Whether the model has been fit. |
| n_neighbors | int | The number of neighbors used (may differ from the value passed to the constructor if it was adjusted). |
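The numeric attributes follow the LoOP definitions (Kriegel et al., 2009): a probabilistic distance λ·sqrt(mean of squared neighbor distances), its expected value over the neighborhood, a probabilistic local outlier factor PLOF = pdist / E[pdist] - 1, and a normalization factor λ·sqrt(E[PLOF²]). A from-scratch NumPy sketch of those steps for a single cluster, assuming Euclidean distance and neighborhoods that exclude the observation itself; loop_scores is a hypothetical name, and its numbers are not guaranteed to match PyNomaly's exactly:

```python
import math

import numpy as np


def loop_scores(data, n_neighbors=3, extent=3):
    """Return (scores, prob_distances, prob_distances_ev, nplof) for one cluster."""
    data = np.asarray(data, dtype=float)
    # Pairwise Euclidean distances; exclude each point from its own neighborhood.
    dist = np.sqrt(((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    nn = np.argsort(dist, axis=1, kind="stable")[:, :n_neighbors]
    nn_dist = np.take_along_axis(dist, nn, axis=1)
    # Probabilistic distance: extent * quadratic mean of the neighbor distances.
    prob_distances = extent * np.sqrt(np.mean(nn_dist ** 2, axis=1))
    # Expected probabilistic distance over each observation's neighborhood.
    prob_distances_ev = prob_distances[nn].mean(axis=1)
    # Probabilistic local outlier factor (PLOF).
    plof = prob_distances / prob_distances_ev - 1.0
    # Normalization factor: extent * sqrt(E[PLOF^2]).
    nplof = extent * math.sqrt(np.mean(plof ** 2))
    if nplof == 0.0:
        # Every PLOF is zero, so no observation stands out.
        return np.zeros(len(data)), prob_distances, prob_distances_ev, nplof
    # LoOP = max(0, erf(PLOF / (nPLOF * sqrt(2)))), a value in [0, 1].
    scores = np.array([max(0.0, math.erf(p / (nplof * math.sqrt(2)))) for p in plof])
    return scores, prob_distances, prob_distances_ev, nplof


# A 3x3 grid of points plus one isolated point at index 9.
grid = [[x, y] for x in range(3) for y in range(3)]
data = np.array(grid + [[10, 10]], dtype=float)
scores, *_ = loop_scores(data, n_neighbors=3)
print(int(np.argmax(scores)))  # 9, the isolated point
```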

Exceptions

All exceptions are importable from PyNomaly.loop or directly from PyNomaly:

```python
from PyNomaly import ClusterSizeError, MissingValuesError
# or
from PyNomaly.loop import PyNomalyError, ValidationError, ClusterSizeError, MissingValuesError
```

| Exception | Parent | Description |
|---|---|---|
| PyNomalyError | Exception | Base exception for all PyNomaly errors. |
| ValidationError | PyNomalyError | Base exception for input validation failures. |
| ClusterSizeError | ValidationError | Raised when a cluster has fewer observations than n_neighbors. |
| MissingValuesError | ValidationError | Raised when input data contains NaN values. |
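Because both concrete errors derive from ValidationError, a single except clause can handle either. The sketch below uses stand-in classes that copy the hierarchy in the table so it runs without PyNomaly installed; in real code, import the classes from PyNomaly.loop instead of defining them. The describe function is hypothetical.

```python
# Stand-in classes copying the documented hierarchy (not imported from PyNomaly).
class PyNomalyError(Exception):
    """Base exception for all PyNomaly errors (stand-in)."""


class ValidationError(PyNomalyError):
    """Base exception for input validation failures (stand-in)."""


class ClusterSizeError(ValidationError):
    """A cluster has fewer observations than n_neighbors (stand-in)."""


class MissingValuesError(ValidationError):
    """Input data contains NaN values (stand-in)."""


def describe(exc: Exception) -> str:
    """Report which except clause a raised exception lands in."""
    try:
        raise exc
    except ValidationError:
        # Catches ClusterSizeError and MissingValuesError alike.
        return "invalid input"
    except PyNomalyError:
        return "other PyNomaly error"


print(describe(ClusterSizeError("cluster too small")))
print(describe(MissingValuesError("NaN found")))
```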