When I hear people talking in the jargon of category theory, I do not understand what they say, but I have a suspicion that it is something rather mundane that I would understand if they were using the more specific terms for the given context. I understand the appeal of generalization from a mathematical perspective, but from a practical programming perspective, I fail to understand the value proposition.
While I understand Feynman’s point that mathematicians love to generalize everything and that physicists focus on what actually works in the real world, as programmers, we’re doing something different from both of those groups. We’re engaged in a mix of engineering and analysis. We need to build systems that hold together under real conditions.
Category theory might seem obsessed with abstraction. In some ways, it is. However, when skilled practitioners apply these tools to areas of computer science and engineering, the point isn’t to create generalized abstractions for their own sake. The point is to identify the minimal, leak-proof abstractions needed to fulfill some desired quality. The language sounds very general, cryptic, and off-putting, but it’s also very precise.
While category theory might seem like a discipline that’s obsessed with generalization, in practice, the abstractions one constructs using categorical tools can be very specific. The point isn’t always to construct the most general abstractions; it’s to construct abstractions that actually work 100% of the time. The irony is that mathematicians and computer scientists who are concerned with rigor often do this by proving specific cases first and then incrementally generalizing, not the other way around.
It might help the programmer but confuse the quantum physicist or algebraic topologist. It’s an interdisciplinary field and we do our best (well, I’m out of it so it’s not really “we” anymore). The abstract differentiation stuff that makes up the backbone here also gets used in homotopy theory and is applied to algebraic geometry.
I doubt that the average quantum physicist would have a better time with the jargon of category theory than the programmer. If you want to say something to people, you need to speak their language.
> The abstract differentiation stuff that makes up the backbone here also gets used in homotopy theory and is applied to algebraic geometry.
Look, I agree that someone pitching category theory to programmers should make more of an effort to introduce terms/use common language when possible. And you can look at Gordon Plotkin’s papers to find some good, hard CS papers that take the time to work out examples relating things to structures familiar to a programmer.
But there’s the reality that CT is an interdisciplinary field: you will have topologists working with computer scientists, and they’ll often be actively uninterested in each other’s fields. Imposing your jargon or notation on your coauthors from other fields is hardly a great way to foster collaboration.
How do you discover the principles in the first place? You can discover them once and then apply them in all applicable places precisely because you generalised them.
You have a point that the result may very well be more easily explained in concrete terms to practitioners of a given field in which you applied it, though.
So, learning about the categorical structure is interesting, but is there a specific advantage to seeing these concepts directly informing the implementation, versus as a post-hoc way of explaining what autodiff is doing? E.g., TensorFlow is creating and transforming graphs of computation. Despite being written before most of the categorical work cited was done, isn't it doing the "same" thing, except that we just wouldn't find names or comments in the code that closely align with the categorical work?
This is a nice project, but “You only linearize once” is more-or-less the type theory version of “Reverse Derivative Categories”, so JAX really does this already.
(author here) the goals of catgrad are a bit different to JAX - first of all, the "autodiff" algorithm is really a general approach to composing optics, of which autodiff is just a special case. Among other things, this means you can "plug your own language" into the syntax library to get a "free" autodiff algorithm.
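To make the "autodiff as composing optics" point concrete, here is a toy Python sketch of reverse-mode differentiation as composition of lenses. All names here (`Lens`, `fwd`, `rev`) are illustrative assumptions for exposition, not catgrad's actual API:

```python
# Toy sketch: reverse-mode autodiff as lens/optic composition.
# Names are illustrative, NOT catgrad's actual API.

class Lens:
    """A lens pairs a forward map with a reverse map that
    propagates a gradient back, given the saved input."""
    def __init__(self, fwd, rev):
        self.fwd = fwd          # x -> y
        self.rev = rev          # (x, dy) -> dx

    def __rshift__(self, other):
        # Sequential composition: run forward maps in order,
        # then thread the gradient back in reverse order.
        def fwd(x):
            return other.fwd(self.fwd(x))
        def rev(x, dz):
            y = self.fwd(x)
            dy = other.rev(y, dz)
            return self.rev(x, dy)
        return Lens(fwd, rev)

# Primitive lenses for two smooth maps
square = Lens(lambda x: x * x, lambda x, dy: 2 * x * dy)
double = Lens(lambda x: 2 * x, lambda x, dy: 2 * dy)

# f(x) = 2 * x^2, so f'(x) = 4x; at x = 3 that's 12
f = square >> double
print(f.fwd(3.0))       # 18.0
print(f.rev(3.0, 1.0))  # 12.0
```

The "free autodiff" claim then amounts to: any language whose primitives come equipped with such forward/reverse pairs gets the composed derivative by the same composition rule.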
Second, catgrad itself is more like an IR right now - we're using it at hellas.ai to serve as a serverless runtime for models.
More philosophically, the motto is "write programs as morphisms directly".
Rather than writing a term in some type theory which you then (maybe) give a categorical semantics, why not just work directly in a category?
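One way to read "work directly in a category" in code: represent programs as morphisms carrying explicit domain and codomain, with composition as the only way to build bigger programs. This is a hedged illustration of the idea, not catgrad's representation:

```python
# Sketch: programs as morphisms with explicit dom/cod,
# composed directly rather than written as terms in a type
# theory. Types and names here are illustrative assumptions.

class Morphism:
    def __init__(self, dom, cod, run):
        self.dom, self.cod, self.run = dom, cod, run

    def __rshift__(self, other):
        # Composition is only defined when the types line up, so
        # ill-typed programs are unrepresentable by construction.
        assert self.cod == other.dom, "type mismatch"
        return Morphism(self.dom, other.cod,
                        lambda x: other.run(self.run(x)))

parse = Morphism("str", "int", int)
succ  = Morphism("int", "int", lambda n: n + 1)
show  = Morphism("int", "str", str)

program = parse >> succ >> show
print(program.run("41"))  # "42"
```

Compiler passes would then be functors: mappings that send each morphism at one abstraction level to a morphism at another while preserving composition.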
Long term, the goal is to have a compiler which is a stack of categories with functors as compiler passes. The idea is that, in contrast to typical compilers where you are "stuck" at a given abstraction level, this would allow you to view your code at various levels of abstraction.
So for example, you could write a program, then write an x86-specific optimization for one function which you can then prove correct with respect to the more abstract program specification.
> Among other things, this means you can "plug your own language" into the syntax library to get a "free" autodiff algorithm.
Hello, as a compiler engineer I am interested in this area. Can you expand a little bit more? How would I be able to plug in my own language for example?
> So for example, you could write a program, then write an x86-specific optimization for one function which you can then prove correct with respect to the more abstract program specification.
So, what you are saying is that catgrad allows me to write a program and then also plug in a compiler pass? I.e., the application author can also be the compiler developer?
I get it, but once you’ve compiled your program down to a computational graph…well those are basically morphisms in a monoidal/multi category, no? And then you can have a stack of MLIR dialects that represent different levels of abstraction, where optimization passes are functors between the multicategory determined by that dialect, then you hand it off to LLVM for CPU execution, or maybe you want to run it with WebGPU, go down to CUDA, etc.
can you expand on "write programs as morphisms directly"
I'm someone super interested in category theory and ITT, but I can't quite parse what you are trying to convey even though I think I have the prereqs to understand the answer.
I think about cones all the time when doing machine learning. If I have an object O that can be mapped to A and B, I can learn a function from A to B, or from B to A, if I can generate Os. That’s my view of self-supervised learning.
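That cone picture can be sketched in a few lines of NumPy: generate objects O, project each one to two views A and B, then fit a map A → B from the self-generated pairs. This is an illustrative toy (a linear "view" and a least-squares fit), not a reference implementation of any particular method:

```python
# Toy sketch of the "cone" view of self-supervised learning:
# generate objects O, project them to views A and B, then learn
# A -> B from the paired views. Illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

# Objects O: latent vectors; the two legs of the cone project them.
O = rng.normal(size=(1000, 4))
to_A = lambda o: o[:, :2]                                # view A: first two coords
M = np.array([[1.0, 2.0], [3.0, 4.0]])
to_B = lambda o: o[:, :2] @ M                            # view B: linear function of the same coords

A, B = to_A(O), to_B(O)

# Learn A -> B by least squares on the self-generated pairs.
W, *_ = np.linalg.lstsq(A, B, rcond=None)
print(np.allclose(W, M))  # True: the map between views is recovered
```

Since every (A, B) pair comes from a common O, no external labels are needed; that's the self-supervision.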
https://www.youtube.com/watch?v=B-eh2SD54fM