Open-Set Recognition¶
Standard classifiers are trained to output high-confidence predictions for every input, even inputs from classes never seen during training. This is called network agnostophobia: the network is incapable of saying "I don't know."
Ludwig implements two loss functions from Dhamija et al., NeurIPS 2018 (paper) that reduce this problem:
| Loss | Use on | Behaviour |
|---|---|---|
| `entropic_open_set` | `category`, `binary` | CE on known samples + entropy maximisation on background samples |
| `objectosphere` | `category`, `binary` | CE + norm push (known) + entropy + norm suppression (background) |
Both losses require a background class to be present in the training data: a special label assigned to the unknown/unwanted inputs that the model should be uncertain about.
How they work¶
Entropic Open-Set Loss¶
For known-class samples the loss is standard cross-entropy. For background samples the loss maximises Shannon entropy of the output distribution, driving the softmax probabilities toward uniform.
$$ \mathcal{L} = \underbrace{-\log p_y}_{\text{CE on known}} \;+\; \underbrace{\sum_k p_k \log p_k}_{\text{entropy term on background}} $$
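A minimal PyTorch sketch of this per-sample rule (not Ludwig's internal implementation; the background index `4` mirrors the config examples below):

```python
import torch
import torch.nn.functional as F

def entropic_open_set_loss(logits, targets, background_class=4):
    """CE on known-class samples; negative entropy on background samples."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    is_background = targets == background_class
    # Standard cross-entropy for known-class samples
    ce = F.nll_loss(log_probs, targets, reduction="none")
    # Minimising sum_k p_k log p_k maximises Shannon entropy (uniform softmax)
    neg_entropy = (probs * log_probs).sum(dim=-1)
    return torch.where(is_background, neg_entropy, ce).mean()
```

For a background sample with all-zero logits the softmax is already uniform, so the per-sample loss reaches its minimum of `-log(num_classes)`.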
Objectosphere Loss¶
Extends the entropic loss with a magnitude objective that creates a clear norm threshold for unknown detection at inference time:
- Known samples: CE + hinge `max(0, ξ − ||z||)²` pushes logit norms ≥ ξ
- Background samples: entropy maximisation + `ζ ||z||²` suppresses logit norms toward zero
$$ \mathcal{L} = \underbrace{\text{CE}(z) + \max(0,\,\xi - \|z\|)^2}_{\text{known}} \;+\; \underbrace{\sum_k p_k \log p_k + \zeta\,\|z\|^2}_{\text{background}} $$
After training, a sample with ||logits|| < threshold can be flagged as unknown without any extra model head.
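A hedged sketch of the Objectosphere objective above (again, not Ludwig's exact code); `xi` and `zeta` mirror the configuration parameters documented below:

```python
import torch
import torch.nn.functional as F

def objectosphere_loss(logits, targets, background_class=4, xi=10.0, zeta=0.1):
    """Entropic open-set loss plus the logit-norm magnitude objectives."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    norms = logits.norm(dim=-1)
    is_background = targets == background_class
    ce = F.nll_loss(log_probs, targets, reduction="none")
    known_term = ce + F.relu(xi - norms) ** 2           # push ||z|| >= xi
    neg_entropy = (probs * log_probs).sum(dim=-1)
    background_term = neg_entropy + zeta * norms ** 2   # push ||z|| toward 0
    return torch.where(is_background, background_term, known_term).mean()
```

The hinge term is zero once a known sample's logit norm exceeds ξ, so well-separated known samples are trained with plain cross-entropy.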
Configuration¶
Entropic Open-Set Loss¶
```yaml
output_features:
  - name: label
    type: category
    loss:
      type: entropic_open_set
      background_class: 4  # integer index of the background/unknown class
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `background_class` | `int \| null` | `null` | Index of the background class in the feature vocabulary. When `null` the loss reduces to standard CE. |
| `weight` | `float` | `1.0` | Overall loss weight |
Objectosphere Loss¶
```yaml
output_features:
  - name: label
    type: category
    loss:
      type: objectosphere
      background_class: 4
      xi: 10.0
      zeta: 0.1
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `background_class` | `int \| null` | `null` | Index of the background class. When `null`, applies CE + hinge to all samples. |
| `xi` | `float` | `10.0` | Minimum logit L2 norm for known-class samples |
| `zeta` | `float` | `0.1` | Weight for the background-class magnitude penalty |
| `weight` | `float` | `1.0` | Overall loss weight |
Finding the background class index¶
Ludwig assigns integer indices to category values based on their frequency in the training data (most frequent first). To find the index of your background class:
1. Run a training job without the agnostophobia loss (use `softmax_cross_entropy`).
2. Open `results/<experiment>/model/training_set_metadata.json`.
3. Find the output feature's `str2idx` dictionary.
4. Use the integer value next to your background class string.
Alternatively, make the background class the most frequent category in your training CSV; it will then always receive index 0.
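The metadata lookup above can be scripted; a small sketch, assuming an output feature named `label` and a background category labelled `background` (both illustrative):

```python
import json

def background_class_index(metadata_path, feature_name, background_label):
    """Read training_set_metadata.json and return the background class index."""
    with open(metadata_path) as f:
        metadata = json.load(f)
    # str2idx maps each category string to its integer index
    return metadata[feature_name]["str2idx"][background_label]
```

The returned integer is what goes into the `background_class` field of the loss config.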
Dataset setup¶
The training set must contain both known-class samples and background samples. Background samples can come from:
- A dedicated "unknown" category in the original data.
- Out-of-distribution images or text from a separate source.
- Synthetic noise or augmented samples unlikely to appear at deployment time.
```csv
label,feature_1,feature_2
cat,0.3,0.7
dog,0.8,0.1
...
background,0.5,0.5   ← background/unknown samples
background,0.2,0.9
```
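For the synthetic-noise option, background rows can be appended to an existing training frame. A sketch with pandas; the column names match the CSV sample above, while the uniform-noise distribution is an illustrative assumption, not a recommendation:

```python
import numpy as np
import pandas as pd

def add_background_noise(df, n_samples, feature_cols, rng=None):
    """Append n_samples uniform-noise rows labelled 'background' to df."""
    if rng is None:
        rng = np.random.default_rng(0)
    noise = pd.DataFrame(
        rng.uniform(0.0, 1.0, size=(n_samples, len(feature_cols))),
        columns=feature_cols,
    )
    noise["label"] = "background"
    return pd.concat([df, noise], ignore_index=True)
```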
Example: synthetic Gaussian dataset¶
The following reproduces the key experiment from the paper: four known Gaussian clusters (classes 0β3) plus two unknown clusters (background class 4).
```bash
# Run the standalone example (no data download required):
cd examples/open_set_recognition
python train_open_set.py
```
Expected output:
```
Model                  | Max-prob (known) | Max-prob (unknown) | Norm known | Norm unknown
-----------------------|------------------|--------------------|------------|-------------
CE Baseline            | 0.998            | 0.741              | 8.828      | 5.375
Entropic Open-Set      | 0.974            | 0.273              | 6.254      | 0.637
Objectosphere          | 0.874            | 0.363              | 13.843     | 2.361
```
The agnostophobia losses clearly reduce the model's confidence on unknown inputs while preserving accuracy on known classes. Objectosphere additionally creates a large norm gap between known (≈ 13.8) and unknown (≈ 2.4) samples, enabling near-perfect unknown detection with a simple threshold.
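The dataset behind this experiment can be sketched with NumPy: four known Gaussian clusters (labels 0–3) plus two extra clusters merged into background label 4. Cluster centres and spread here are illustrative, not the exact values used by `train_open_set.py`:

```python
import numpy as np

def make_gaussian_dataset(n_per_cluster=100, seed=0):
    """Four known Gaussian clusters (0-3) plus two unknown clusters (label 4)."""
    rng = np.random.default_rng(seed)
    known_centres = [(-2, -2), (-2, 2), (2, -2), (2, 2)]
    unknown_centres = [(0, 4), (4, 0)]
    xs, ys = [], []
    for label, centre in enumerate(known_centres):
        xs.append(rng.normal(centre, 0.5, size=(n_per_cluster, 2)))
        ys.append(np.full(n_per_cluster, label))
    for centre in unknown_centres:
        xs.append(rng.normal(centre, 0.5, size=(n_per_cluster, 2)))
        ys.append(np.full(n_per_cluster, 4))  # both map to the background class
    return np.vstack(xs), np.concatenate(ys)
```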
Inference-time unknown detection¶
Using max softmax probability (both losses)¶
```python
from ludwig.api import LudwigModel

model = LudwigModel.load("results/my_experiment/model")
preds, _, _ = model.predict(dataset=df)

# Column added by Ludwig's postprocessor
max_probs = preds["label_probability"]
is_unknown = max_probs < 0.5  # tune threshold on validation set
```
Using logit norm (Objectosphere only)¶
```python
# Collect logits from the model before the softmax
acts = model.collect_activations(layer_names=["label/decoder/logits"], dataset=df)
logit_norms = acts["label/decoder/logits"].norm(dim=-1)
is_unknown = logit_norms < 3.0  # tune threshold on validation set
```
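The "tune threshold on validation set" step can be automated. A hedged sketch (the function name and balanced-accuracy criterion are assumptions, not part of Ludwig): sweep candidate logit-norm thresholds on labelled validation norms and keep the best separator:

```python
import numpy as np

def tune_norm_threshold(known_norms, unknown_norms):
    """Pick the logit-norm threshold maximising balanced accuracy."""
    candidates = np.unique(np.concatenate([known_norms, unknown_norms]))
    best_t, best_score = 0.0, -1.0
    for t in candidates:
        tpr = (unknown_norms < t).mean()   # unknowns correctly flagged
        tnr = (known_norms >= t).mean()    # knowns correctly kept
        score = 0.5 * (tpr + tnr)
        if score > best_score:
            best_t, best_score = float(t), score
    return best_t
```

With the Objectosphere norm gap reported above, almost any threshold between the two clusters of norms works; this search just makes the choice reproducible.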
References¶
Dhamija, A. R., GΓΌnther, M., & Boult, T. (2018). Reducing Network Agnostophobia. NeurIPS 2018. https://arxiv.org/abs/1811.04110