Topology of Out-of-Distribution Examples in Deep Neural Networks

Download the paper · Check the code on GitHub

As deep neural networks (DNNs) become more common, concerns about their robustness, particularly when facing unfamiliar inputs, are growing. These models often exhibit overconfidence when making incorrect predictions on out-of-distribution (OOD) examples. This paper introduces a topological approach using latent layer embeddings from DNNs to characterize and identify OOD examples based on their topological features, or “landmarks”. The core finding is that while well-trained DNNs simplify the topology of in-distribution (ID) data, this simplification process is significantly less effective for OOD data, which tends to retain a more complex topological structure, measured by feature persistence.

The study leverages Topological Data Analysis (TDA) to understand how DNNs process different types of data. By examining the ‘shape’ of data in the network’s internal representations, specifically the penultimate layer, the authors reveal fundamental differences between how ID and OOD inputs are handled.

Methodology: Topological Data Analysis (TDA)

The paper (Datta et al., 2025) applies TDA, which utilizes tools from algebraic topology to quantify shape features in high-dimensional data, offering robustness to noise and local variations by focusing on invariant properties.

Persistent Homology (PH)

Persistent Homology is the key TDA method employed. It analyzes data across multiple distance scales (\(\epsilon\)) simultaneously, identifying significant topological structures (like connected components, loops, voids) that persist over a range of these scales. The analysis focuses on the PH of data points embedded in the high-dimensional (\(\mathbb{R}^{512}\)) latent space of a ResNet18 model.

Simplicial Complexes & Vietoris-Rips

To analyze the point cloud’s shape, a simplicial complex is constructed using the Vietoris-Rips (VR) method. Based on a distance threshold \(\epsilon\), points become vertices (0-simplices), pairs closer than \(\epsilon\) form edges (1-simplices), triplets where all pairs are closer than \(\epsilon\) form triangles (2-simplices), and so on. As \(\epsilon\) increases, the complex grows, forming a filtration. PH tracks features across this filtration.
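The growing-complex idea can be sketched directly. This is a minimal, illustrative implementation of the Vietoris-Rips construction at a fixed threshold (the point cloud, threshold values, and function names are our own, not from the paper's code):

```python
# Sketch of Vietoris-Rips complex construction at a fixed threshold eps.
from itertools import combinations
from math import dist

def vietoris_rips(points, eps, max_dim=2):
    """Return all simplices of dimension <= max_dim at threshold eps."""
    n = len(points)
    # 0-simplices: every point is a vertex
    simplices = [(i,) for i in range(n)]
    # k-simplices: (k+1)-tuples whose vertices are pairwise within eps
    for k in range(1, max_dim + 1):
        for combo in combinations(range(n), k + 1):
            if all(dist(points[i], points[j]) <= eps
                   for i, j in combinations(combo, 2)):
                simplices.append(combo)
    return simplices

# Four corners of a unit square: at eps = 1.1 the four sides appear
# (forming a loop), but the diagonals (length ~1.41) do not, so no
# triangles exist yet. Raising eps past 1.41 fills the loop in.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
complex_at_11 = vietoris_rips(square, eps=1.1)
edges = [s for s in complex_at_11 if len(s) == 2]
triangles = [s for s in complex_at_11 if len(s) == 3]
print(len(edges), len(triangles))  # 4 0
```

Running the same construction at increasing thresholds yields the filtration that PH tracks: each simplex enters at the smallest \(\epsilon\) for which its vertices are pairwise close enough.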

Homology & Betti Numbers

Homology groups (\(H_k\)) quantify k-dimensional “holes” in the complex. Betti numbers (\(\beta_k\)) are the ranks of these groups: \(\beta_0\) counts connected components, \(\beta_1\) counts loops, and \(\beta_2\) counts enclosed voids.

*Figure: Progression of Betti numbers as \(\epsilon\) increases*

Persistence Diagrams & Lifetime

Persistence diagrams visualize topological features, plotting each feature’s “birth” scale \(\epsilon\) (when it appears) against its “death” scale (when it merges or disappears). A feature’s lifetime is its death minus its birth; long-lived features are taken to reflect significant structure rather than noise.
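For \(H_0\), the diagram can be computed with a Kruskal-style sweep over edge lengths: every point is born at \(\epsilon = 0\), and a component dies at the \(\epsilon\) where it merges into another. This is an illustrative sketch, not the paper's pipeline:

```python
# H0 persistence diagram via a single-linkage / Kruskal-style sweep.
from itertools import combinations
from math import dist

def h0_persistence(points):
    n = len(points)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    # Process edges in order of increasing length
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # two components merge: one of them dies here
            parent[ri] = rj
            deaths.append(length)
    # (birth, death) pairs; one component survives forever and is omitted
    return [(0.0, d) for d in deaths]

diagram = h0_persistence([(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)])
print(diagram)  # two short-lived features (within pairs), one dying near eps = 4.9
```

The short-lived points near the diagonal correspond to noise-scale merges; the long-lived point records the genuine two-cluster structure.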

Key Findings: Topological Simplification & OOD Persistence

1. Topological Simplification of ID Data

The paper confirms empirically that well-trained DNNs induce topological simplification on ID data (both training and test sets), extending earlier observations of this effect (Naitzat et al., 2020). Latent embeddings for ID data show low average lifetime for \(H_0\) features, indicating that distinct components merge quickly as \(\epsilon\) increases, approaching a topologically simpler structure (ideally, one component per class).
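The average-lifetime statistic itself is easy to illustrate. The sketch below contrasts a synthetic "ID-like" embedding (tight clusters, which merge quickly) with a diffuse "OOD-like" cloud (components persist longer); the data and names are invented for illustration and stand in for the paper's 512-dimensional latent embeddings:

```python
# Sketch of the average-H0-lifetime statistic on toy 2-D "embeddings".
import random
from itertools import combinations
from math import dist

def avg_h0_lifetime(points):
    n = len(points)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    lifetimes = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            lifetimes.append(length)  # birth is 0, so lifetime = death
    return sum(lifetimes) / len(lifetimes)

random.seed(0)
# ID-like: two tight clusters; OOD-like: points scattered uniformly
id_like = [(random.gauss(c, 0.05), random.gauss(0.0, 0.05))
           for c in (0.0, 3.0) for _ in range(20)]
ood_like = [(random.uniform(0, 3), random.uniform(0, 3)) for _ in range(40)]
# The clustered cloud should yield a much smaller average lifetime
print(avg_h0_lifetime(id_like), avg_h0_lifetime(ood_like))
```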

2. OOD Data Persists More

*Figure: Example persistence diagram (Fig. 2 from the paper)*

In contrast to ID data, OOD embeddings retain substantially higher average \(H_0\) lifetimes: their components stay separated over a wider range of \(\epsilon\), reflecting a more complex residual topology. Higher-order features (\(H_1\)) showed generally low persistence for all data types, suggesting \(H_0\) (connected components) is the most informative dimension for distinguishing ID from OOD data with this method.

Conclusion & Implications

This work demonstrates that topological simplification is a characteristic behavior of DNNs processing ID data, and that this simplification breaks down for OOD inputs. The average lifetime of \(H_0\) features in latent space serves as a robust indicator of this difference, offering a potential method for OOD detection and for quantifying model uncertainty. The study also shows the increasing computational feasibility of applying TDA methods like Persistent Homology to realistic, large-scale deep learning models. Future work could explore alternative TDA methods or summary statistics and test these findings across more architectures and datasets.

References

  1. Datta, E., Hennig, J., Domschot, E., Mattes, C., & Smith, M. R. (2025). Topology of Out-of-Distribution Examples in Deep Neural Networks. arXiv preprint arXiv:2501.12522.
  2. Naitzat, G., Zhitnikov, A., & Lim, L.-H. (2020). Topology of deep neural networks. Journal of Machine Learning Research, 21(184), 1–40.