Deep Learning Is Applied Topology

(theahura.substack.com)

Comments

colah3 20 May 2025
Since this post is based on my 2014 blog post (https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/ ), I thought I might comment.

I tried really hard to use topology as a way to understand neural networks, for example in these follow ups:

- https://colah.github.io/posts/2014-10-Visualizing-MNIST/

- https://colah.github.io/posts/2015-01-Visualizing-Representa...

There are places I've found the topological perspective useful, but after a decade of grappling with trying to understand what goes on inside neural networks, I just haven't gotten that much traction out of it.

I've had a lot more success with:

* The linear representation hypothesis - The idea that "concepts" (features) correspond to directions in neural networks.

* The idea of circuits - networks of such connected concepts.

Some selected related writing:

- https://distill.pub/2020/circuits/zoom-in/

- https://transformer-circuits.pub/2022/mech-interp-essay/inde...

- https://transformer-circuits.pub/2025/attribution-graphs/bio...
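
For readers who want something concrete, here is a toy sketch of what "a concept is a direction" cashes out to. The names and data are made up, and this is an illustration rather than the method from the linked papers:

    import numpy as np

    # Toy data: rows are samples, columns are hypothetical neuron activations.
    rng = np.random.default_rng(0)
    acts_with_concept = rng.normal(0.5, 1.0, size=(100, 512))  # concept present
    acts_without = rng.normal(0.0, 1.0, size=(100, 512))       # concept absent

    # Under the linear representation hypothesis, the concept corresponds to
    # a direction; here it is estimated as the normalized difference of means.
    direction = acts_with_concept.mean(axis=0) - acts_without.mean(axis=0)
    direction /= np.linalg.norm(direction)

    # Projecting a new activation onto the direction scores the concept.
    new_activation = rng.normal(size=512)
    score = float(new_activation @ direction)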

esafak 20 May 2025
If it were topology, we wouldn't bother to warp the manifold so we can do similarity search. No, it's geometry, with a metric. Just as in real life, we want to be able to compare things.
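
Concretely, the workhorse of similarity search is a metric computation, not a topological one. A minimal sketch, with stand-in embeddings in place of a trained model's:

    import numpy as np

    def cosine_similarity(u, v):
        # The metric: similarity compares angles between embedding vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Stand-in embeddings; in practice these would come from a trained model.
    rng = np.random.default_rng(0)
    emb_bread, emb_pan = rng.normal(size=768), rng.normal(size=768)
    print(cosine_similarity(emb_bread, emb_pan))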

Topological transformation of the manifold happens during training too. That makes me wonder: how does the topology evolve during training? I imagine it violently changing at first before stabilizing, followed by geometric refinement. Here are some relevant papers:

* Topology and geometry of data manifold in deep learning (https://arxiv.org/abs/2204.08624)

* Topology of Deep Neural Networks (https://jmlr.org/papers/v21/20-345.html)

* Persistent Topological Features in Large Language Models (https://arxiv.org/abs/2410.11042)

* Deep learning as Ricci flow (https://www.nature.com/articles/s41598-024-74045-9)

ComplexSystems 20 May 2025
I really liked this article, though I don't know why the author is calling the idea of finding a separating surface between two classes of points "topology." For instance, they write

"If you are trying to learn a translation task — say, English to Spanish, or Images to Text — your model will learn a topology where bread is close to pan, or where that picture of a cat is close to the word cat."

This is everything that topology is not about: a notion of points being "close" or "far." If we have some topological space in which two points are "close," we can stretch the space so as to get the same topological space, but with the two points now "far". That's the whole point of the joke that the coffee cup and the donut are the same thing.

Instead, the entire thing seems to be a real-world application of something like algebraic geometry. We want to look for something like an algebraic variety the points are near. It's all about geometry and all about metrics between points. That's what it seems like to me, anyway.

srean 20 May 2025
The title, as it stands, is trite and wrong. More about that a little later. The article on the other hand is a pleasant read.

Topology is whatever little structure remains in geometry after you throw away distances, angles, and orientations, and allow all sorts of non-tearing stretchings. It's the bare minimum that still remains valid after such violent deformations.

While the notion of topology is definitely useful in machine learning, scale, distance, angles, etc. all usually provide lots of essential information about the data.

If you want to distinguish between a tabby cat and a tiger, it would be an act of stupidity to ignore scale.

Topology is especially useful when you cannot trust lengths, distances, or angles, or when arbitrary deformations happen. That happens, but to claim deep learning is applied topology is absurd, almost stupid.

soulofmischief 20 May 2025
Thanks for sharing. I also tend to view learning in terms of manifolds. It's a powerful representation.

> I'm personally pretty convinced that, in a high enough dimensional space, this is indistinguishable from reasoning

I actually have journaled extensively about this and even written some on Hacker News about it with respect to what I've been calling probabilistic reasoning manifolds:

> This manifold is constructed via learning a decontextualized pattern space on a given set of inputs. Given the inherent probabilistic nature of sampling, true reasoning is expressed in terms of probabilities, not axioms. It may be possible to discover axioms by locating fixed points or attractors on the manifold, but ultimately you're looking at a probabilistic manifold constructed from your input set.

> But I don't think you can untie this "reasoning" from your input data. It's possible you will find "meta-reasoning", or similar structures found in any sufficiently advanced reasoning manifold, but these highly decontextualized structures might be entirely useless without proper recontextualization, necessitating that a reasoning manifold is trained on input whose patterns follow learnable underlying rules, if the manifold is to be useful for processing input of that kind.

> Decontextualization is learning, decomposing aspects of an input into context-agnostic relationships. But recontextualization is the other half of that: knowing how to take highly abstract, sometimes inexpressible, context-agnostic relationships and transform them into useful analysis in novel domains.

Full comment: https://news.ycombinator.com/item?id=42871894

umutisik 20 May 2025
Data doesn't actually live on a manifold; that's an approximation used for thinking about data. The near-total majority, if not 100%, of the useful things done in deep learning have come from not thinking about topology in any way. Deep learning is not applied anything; it's an empirical field advanced mostly by trial and error and, sure, a few intuitions coming from theory (and that theory was not topology).

profchemai 20 May 2025
Once I read "This has been enough to get us to AGI.", credibility took a nose dive.

In general it's a nice idea, but the blog post is very fluffy, especially once it connects the idea to reasoning. There is serious technical work in this area (e.g. https://arxiv.org/abs/1402.1869) that has expanded this idea and made it more concrete.

vayllon 20 May 2025
Another type of topology you’ll encounter in deep neural networks (DNNs) is network topology. This refers to the structure of the network — how the nodes are connected and how data flows between them. We already have several well-known examples, such as auto-encoders, convolutional neural networks (CNNs), and generative adversarial networks (GANs), all of which are bio-inspired.

However, we still have much to learn about the topology of the brain and its functional connectivity. In the coming years, we are likely to discover new architectures — both internal within individual layers/nodes and in the ways specialized networks connect and interact with each other.

Additionally, the brain doesn’t rely on a single network, but rather on several ones — often referred to as the "Big 7" — that operate in parallel and are deeply interconnected. Some of these include the Default Mode Network (DMN), the Central Executive Network (CEN) or the Limbic Network, among others. In fact, a single neuron can be part of multiple networks, each serving different functions.

We have not yet been able to fully replicate this complexity in artificial systems, and there is still much to learn from, and be inspired by, these "network topologies".

So, "Topology is all you need" :-)

terabytest 20 May 2025
I'm confused by the author's diagram claiming that AGI/ASI are points on the same manifold as next token prediction, chat models, and CoT models. While the latter three are provably part of the same manifold, what justifies placing AGI/ASI there too?

What if the models capable of CoT aren't, and never will be, capable of processes that could be considered AGI, regardless of topological manipulation? For example, human intelligence (the closest thing we know to AGI) requires extremely complex sensory and internal feedback loops and continuous processing, unlike autoregressive models' discrete processing.

As a layman, this matches my intuition that LLMs are not at all in the same family of systems as the ones capable of generating intelligence or consciousness.

ada1981 20 May 2025
For the last few years, I've been "seeing maps" whenever I think about LLMs. It's always felt like the most natural way to understand what is going on.

It's also for this reason that I think new knowledge is discoverable from within LLMs.

I imagine having a topographic map of some island that has only been explored partially by humans. But if I know the surrounding topography, I can make pretty accurate guesses about the areas I haven't been to. And I think the same thing can be applied to certain areas of human knowledge, especially when represented as text or symbolically.

iFire 22 May 2025
Huge fan of the fictional idea that applied topology is magic: https://en.wikipedia.org/wiki/The_Laundry_Files

> Howard is recruited to work for the Q-Division of SOE, otherwise known as "the Laundry", the British government agency which deals with occult threats. "Magic" is described as being a branch of applied computation (mathematics), therefore computers and equations are just as useful, and perhaps more potent, than classic spellbooks, pentagrams, and sigils for the purpose of influencing ancient powers and opening gates to other dimensions.

_alternator_ 20 May 2025
The question is not so much whether this is true—we can certainly represent any data as points on a manifold. Rather, it’s the extent to which this point of view is useful. In my experience, it’s not the most powerful perspective.

In short, direct manifold learning is not really tractable as an algorithmic approach. The most powerful set of tools and theoretical basis for AI has sprung from statistical optimization theory (SGD, information-theoretical loss minimization, etc.). The fact that data is on a manifold is a tautological footnote to this approach.
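
To illustrate: below is a single SGD step on a squared loss, sketched with made-up data. Nothing in the update references the manifold the data lies on; it falls out of the optimization.

    import numpy as np

    # One SGD step on a squared loss for a linear model. The update is pure
    # optimization; the data manifold never appears explicitly.
    def sgd_step(w, x, y, lr=0.01):
        grad = 2.0 * (w @ x - y) * x  # gradient of (w.x - y)^2 w.r.t. w
        return w - lr * grad

    rng = np.random.default_rng(0)
    w = rng.normal(size=3)
    w = sgd_step(w, x=np.array([1.0, 2.0, 3.0]), y=1.5)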

garganzol 20 May 2025
I want to share one more related observation: by definition, topology in mathematics refers to geometrical objects and transformations. But there exists another, more computer-flavored notion of topology that defines relations between abstract objects.

For example, take the graph data structure. A graph has a set of stored objects (vertices) and a set of stored relations between those vertices (edges). In this way, a graph defines a topology in discrete form.

Now take the network data structure, which is closely related to the graph. It is very much the same idea, but additionally a value is stored in every edge. A network has a set of objects (vertices) and a set of relations between the objects (edges), where each edge also holds a value. So it is also a form of topology, because the network defines the relations between the abstract objects.

In this light, you can view a graph as a neural network with {0, 1} weights: a graph edge is either present or absent, hence the {0, 1} values. The network structure, however, can hold any assigned value in every edge, so every connection between objects (neurons) is characterized not only by its presence but also by an edge-assigned value (weight). Now we have the full model of a neural network. And yes, it is built upon topology in its discrete form. A minimal sketch of the correspondence follows below.
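
Toy matrices only, to make the point visible:

    import numpy as np

    # A graph as an adjacency matrix: {0, 1} entries record which edges exist.
    graph = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])

    # A network: the same discrete topology, but each edge carries a value;
    # this is precisely the shape of a neural network's weight matrix.
    network = np.array([[ 0.0,  0.7,  0.0],
                        [ 0.7,  0.0, -1.2],
                        [ 0.0, -1.2,  0.0]])

    # The graph is recoverable from the network as the support of the weights.
    assert np.array_equal(graph, (network != 0).astype(int))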

sota_pop 20 May 2025
I’ve always enjoyed this framing of the subject; the idea of mapping anything as hyperplanes existing in a solution space is one of the ideas that really blew my hair back during my academic studies. I would nitpick at your "dots in a circle" example (with the stoner-reference joke): I could be mistaken, but common practice isn't to "move to a higher dimension" but to use a kernel (i.e., parameterize the points into the polar |r, theta> basis). All things considered, nice article. A rough sketch of what I mean follows.
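
For what it's worth, here is that polar re-parameterization on synthetic rings (made-up data, not the article's example):

    import numpy as np

    def to_polar(points):
        # Re-parameterize 2D points into the (r, theta) basis; concentric
        # rings become separable by a single threshold on r.
        x, y = points[:, 0], points[:, 1]
        return np.stack([np.hypot(x, y), np.arctan2(y, x)], axis=1)

    rng = np.random.default_rng(0)
    angles = rng.uniform(0, 2 * np.pi, size=200)
    inner = np.stack([np.cos(angles[:100]), np.sin(angles[:100])], axis=1)      # r = 1
    outer = 3 * np.stack([np.cos(angles[100:]), np.sin(angles[100:])], axis=1)  # r = 3
    polar = to_polar(np.vstack([inner, outer]))
    # polar[:, 0] is 1 for the inner ring and 3 for the outer; r = 2 separates them.
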
nis0s 20 May 2025
> One way to think about neural networks, especially really large neural networks, is that they are topology generators. That is, they will take in a set of data and figure out a topology where the data has certain properties. Those properties are in turn defined by the loss function.

Latent spaces may or may not have useful topology, so this idea is inherently wrong, and builds the wrong type of intuition. Different neural nets will result in different feature space understanding of the same data, so I think it's incorrect to believe you're determining intrinsic geometric properties from a given neural net. I don't think people should throw around words carelessly because all that does is increase misunderstanding of concepts.

In general, manifolds can help discern useful characteristics about the feature space, and may have useful topological structures, but trying to impose an idea of "topology" on this is a stretch. Moreover, the kind of basic examples used in this blog post don't help prove the author's point. Maybe I am misunderstanding this author's description of what they mean, but this idea of manifold learning is nothing new.

parpfish 20 May 2025
This is also how I've often thought about deep learning -- focusing on the geometry of the data at each layer rather than the weights and biases is far more revealing.

I've always been hopeful that some algebraic topology master would dig into this question and it'd provide some better design principles for neural nets. which activation functions? how much to fan in/out? how many layers?

Graviscalar 20 May 2025
I was one of the people who was super excited after reading the Chris Olah blog post from 2014, and over the past decade I've seen the insight go exactly nowhere. It's neat, but it hasn't driven any interesting results, though Ayasdi did some interesting stuff with TDA and Gunnar Carlsson has been playing around with neural nets recently.

wg0 21 May 2025
This is an insightful read, but a place where a prompt churns out a whole model is still far, far away and extremely compute-intensive.

The philosophical point being: is the human brain also all planes all the way down, with trillion-dimensional planes separating concepts?

I don't think that is the case, but yes, we might mimic something like that, and such a thing will always malfunction more than its human counterparts in a given subject area and will always hit a wall where a human won't.

adsharma 20 May 2025
Isn't Deep Learning more like Graph Theory? I shared yesterday that Google published a paper called CRISP (https://arxiv.org/pdf/2505.11471) that carefully avoids any reference to the word "Graph".

So then the question becomes: what's the difference between graph theory and applied topology? Graphs operate on discrete structures and topology is about continuous spaces. Otherwise they're very closely related.

But the higher-order bit is that AI/ML, and deep learning in particular, could do a better job of learning from and acknowledging prior art in related fields, reusing older terminology instead of inventing new terms.

deepburner 20 May 2025
This whole article is just a nothingburger. Saying something is applied topology is only one step more advanced than saying something is maths - duh. These mathematical abstractions are incredibly general, and you can pretty much draw up anything in terms of anything; the challenging part is being able to turn around and use the model/abstraction to say things about the thing you're abstracting. I don't think scholars have been very successful in that regard, less so this article.

Yeah deep learning is applied topology, it's also applied geometry, and probably applied algebra and I wouldn't be surprised if it was also applied number theory.

polotics 20 May 2025
Ok, how do transformers fit into this understanding of deep learning?

CliffStoll 20 May 2025
Applied topology. Might a Klein bottle actually be useful?

paraschopra 21 May 2025
Aren't manifolds generally task-dependent?

I've been debating whether the data itself lies on a manifold, or whether only the task-relevant data attributes (the ones of interest to us) lie on a manifold.

I suspect it is the latter, but I've seen the Platonic Representation Hypothesis, which seems to hint it is the former.

strimp099 21 May 2025
Jim Simons, one of the most successful quant traders in history, studied topology in the 60s. It’s rumored he used deep neural networks in his trading before they were cool. This post really brought the two together for me in a way I didn’t understand before.

maxiepoo 20 May 2025
Isn't it more differential geometry?

crgi 20 May 2025
Interesting read. Reminded me of the Trinity 3D manifold visualization tool, which (among other things) lets you explore the hyperspace of neural networks: https://github.com/trinity-xai/Trinity

Koshkin 20 May 2025
On TDA in general, I found this series of articles by Brandon Brown helpful:

https://github.com/outlace/outlace.github.io

xchip 21 May 2025
The deep learning you are talking about is better understood as a regression.

kookamamie 20 May 2025
> This has been enough to get us to AGI.

Hard disagree.

mirekrusin 20 May 2025
Just because a manifold looks a bit like a burrito if you squint doesn't mean it is a burrito.

fedeb95 20 May 2025
Interesting read. This seems hard to prove:

"Everything lives on a manifold"

mbowcut2 20 May 2025
To a topologist, everything is topology.

revskill 20 May 2025
So ML is about drawing the line.

khoangothe 20 May 2025
Cool post! Thanks