API Reference
LocalOutlierProbability
from PyNomaly import loop

clf = loop.LocalOutlierProbability(
    data=None,
    distance_matrix=None,
    neighbor_matrix=None,
    extent=3,
    n_neighbors=10,
    cluster_labels=None,
    use_numba=False,
    n_jobs=1,
    progress_bar=False,
)
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | np.ndarray or pd.DataFrame | None | Input data as a 2D array with shape (n_observations, n_features). Mutually exclusive with distance_matrix. |
| distance_matrix | np.ndarray or pd.DataFrame | None | Pre-computed distance matrix with shape (n_observations, n_neighbors). Must be provided together with neighbor_matrix. Mutually exclusive with data. |
| neighbor_matrix | np.ndarray or pd.DataFrame | None | Pre-computed neighbor index matrix with shape (n_observations, n_neighbors). Required when distance_matrix is provided. |
| extent | int | 3 | Controls scoring sensitivity. Must be 1, 2, or 3. Corresponds to lambda times the standard deviation from the mean (1 = ~68%, 2 = ~95%, 3 = ~99.7%). |
| n_neighbors | int | 10 | Number of nearest neighbors to consider. Must be greater than 0 and less than the number of observations. Automatically adjusted with a warning if invalid. |
| cluster_labels | list | None | Cluster assignments for each observation. When provided, LoOP scores are computed within each cluster independently. |
| use_numba | bool | False | Enable Numba JIT compilation for distance computation. Falls back to pure Python with a warning if Numba is not installed. |
| n_jobs | int | 1 | Number of threads for parallel distance computation. Set to -1 to use all CPU cores. Only effective when use_numba=True. |
| progress_bar | bool | False | Display a progress bar during distance computation. |
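The percentages quoted for extent are the standard normal coverage figures: the fraction of a normal distribution lying within lambda standard deviations of the mean. The sketch below reproduces them with the error function from the standard library. The helper name coverage is illustrative and is not part of PyNomaly.

```python
import math


def coverage(extent: int) -> float:
    """Fraction of a normal distribution within `extent` standard deviations of the mean."""
    if extent not in (1, 2, 3):
        raise ValueError("extent must be 1, 2, or 3")
    # P(|Z| <= k) = erf(k / sqrt(2)) for a standard normal variable Z.
    return math.erf(extent / math.sqrt(2))


for lam in (1, 2, 3):
    print(lam, round(coverage(lam), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

A larger extent therefore requires an observation to deviate further from its neighborhood before it receives a high score.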
Note
Provide either data, or distance_matrix together with neighbor_matrix. Passing both data and distance_matrix is not allowed.
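The rule in this note can be written out as a small pre-flight check. This is a minimal sketch of the stated constraints only: resolve_input_mode is a hypothetical helper, and the exception type and messages PyNomaly itself uses for invalid combinations may differ.

```python
def resolve_input_mode(data=None, distance_matrix=None, neighbor_matrix=None) -> str:
    """Return "data" or "distance_matrix", or raise if the combination is invalid.

    Hypothetical helper that mirrors the documented rule; not part of PyNomaly.
    """
    has_data = data is not None
    has_dist = distance_matrix is not None
    has_neigh = neighbor_matrix is not None

    if has_data and has_dist:
        raise ValueError("data and distance_matrix are mutually exclusive")
    if has_dist != has_neigh:
        raise ValueError("distance_matrix and neighbor_matrix must be provided together")
    if has_data:
        return "data"
    if has_dist:
        return "distance_matrix"
    raise ValueError("provide data, or distance_matrix with neighbor_matrix")


print(resolve_input_mode(data=[[1.0, 2.0], [3.0, 4.0]]))
print(resolve_input_mode(distance_matrix=[[0.5]], neighbor_matrix=[[1]]))
```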
Methods
fit()
Calculates the Local Outlier Probability for each observation in the input data.
Returns: self -- the fitted model instance. Access scores via clf.local_outlier_probabilities.
Raises:
- ClusterSizeError -- if any cluster contains fewer observations than n_neighbors.
- MissingValuesError -- if the input data contains NaN values.
stream(x)
Calculates the Local Outlier Probability for an individual observation against the fitted model. Must be called after fit().
Parameters:
| Parameter | Type | Description |
|---|---|---|
| x | np.ndarray | A single observation to score. When using raw data mode, this should be a 1D array with the same number of features as the training data. When using distance matrix mode, this should be a scalar distance value. |
Returns: np.ndarray -- the Local Outlier Probability of the input observation (a value in [0, 1]).
Warning
The stream approach does not support clustered data. If cluster_labels were passed to the constructor, PyNomaly will automatically refit using a single cluster and issue a UserWarning.
Attributes
Attributes available after calling fit():
| Attribute | Type | Description |
|---|---|---|
| local_outlier_probabilities | np.ndarray | Array of LoOP scores for each observation, with values in [0, 1]. |
| prob_distances | np.ndarray | Probabilistic distances for each observation. |
| prob_distances_ev | np.ndarray | Expected values of probabilistic distances for each observation's neighborhood. |
| norm_prob_local_outlier_factor | float | Maximum normalized probabilistic local outlier factor across all observations. Used internally by stream(). |
| is_fit | bool | Whether the model has been fit. |
| n_neighbors | int | The number of neighbors used (may differ from the value passed to the constructor if it was adjusted). |
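To illustrate why the scores fall in [0, 1], the sketch below applies the final scoring step as formulated in the original LoOP paper (Kriegel et al., 2009): a probabilistic local outlier factor (PLOF) is divided by a normalization factor (nPLOF), passed through the Gaussian error function, and clipped at zero. The function loop_score and its argument names are illustrative assumptions; PyNomaly's internal implementation may differ in detail.

```python
import math


def loop_score(plof: float, nplof: float) -> float:
    """LoOP = max(0, erf(PLOF / (nPLOF * sqrt(2)))), per the LoOP paper.

    Illustrative only; not a PyNomaly function.
    """
    if nplof <= 0:
        raise ValueError("nplof must be positive")
    return max(0.0, math.erf(plof / (nplof * math.sqrt(2))))


# A point denser than its neighborhood has a negative PLOF and is clipped to 0.
print(loop_score(-0.5, 1.0))
# A PLOF equal to the normalization factor maps to the one-sigma coverage.
print(round(loop_score(1.0, 1.0), 4))
# A very large PLOF approaches, but never exceeds, 1.
print(loop_score(10.0, 1.0) <= 1.0)
```

Because erf is bounded by 1 and negative values are clipped, the result can be read as a probability-like score rather than an unbounded factor.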
Exceptions
All exceptions are importable from PyNomaly.loop or directly from PyNomaly:
from PyNomaly import ClusterSizeError, MissingValuesError
# or
from PyNomaly.loop import PyNomalyError, ValidationError, ClusterSizeError, MissingValuesError
| Exception | Parent | Description |
|---|---|---|
| PyNomalyError | Exception | Base exception for all PyNomaly errors. |
| ValidationError | PyNomalyError | Base exception for input validation failures. |
| ClusterSizeError | ValidationError | Raised when a cluster has fewer observations than n_neighbors. |
| MissingValuesError | ValidationError | Raised when input data contains NaN values. |
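Because the exceptions form a hierarchy, a handler for a parent class also catches its children. The sketch below demonstrates this with stand-in classes that mirror the table, so it runs without PyNomaly installed. In real code, import the classes from PyNomaly instead of redefining them. The classify helper is hypothetical.

```python
# Stand-in classes mirroring the documented hierarchy (assumption: for
# illustration only; import the real ones from PyNomaly in practice).
class PyNomalyError(Exception):
    pass


class ValidationError(PyNomalyError):
    pass


class ClusterSizeError(ValidationError):
    pass


class MissingValuesError(ValidationError):
    pass


def classify(exc: Exception) -> str:
    """Show which handler catches a given exception, most specific first."""
    try:
        raise exc
    except ValidationError:
        return "invalid input"
    except PyNomalyError:
        return "other PyNomaly error"
    except Exception:
        return "unrelated error"


print(classify(ClusterSizeError("cluster too small")))
print(classify(MissingValuesError("NaN in data")))
print(classify(PyNomalyError("generic failure")))
```

Catching ValidationError is enough to handle both input-related failures raised by fit(), while PyNomalyError serves as a catch-all for the library.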