Intuitive guide to convolution (betterexplained.com)
63 points by abhi9u on Dec 4, 2023 | 24 comments



IMO 3blue1brown presents this in a very easy-to-understand way: https://www.youtube.com/watch?v=KuXjwB4LzSA


I wonder if we will ever be at a stage where LLMs can generate videos like that as an ELI5.


https://github.com/360macky/generative-manim :

> Generative Manim is a prototype of a web app that uses GPT-4 to generate videos with Manim. The idea behind this project is taking advantage of the power of GPT-4 in programming, the understanding of human language and the animation capabilities of Manim to generate a tool that could be used by anyone to create videos. Regardless of their programming or video editing skills.

"TheoremQA: A Theorem-driven [STEM] Question Answering dataset" (2023) https://github.com/wenhuchen/TheoremQA#leaderboard

How do you score memory retention and video-watching comprehension? The classic educators' optimization challenge.

"Khan Academy’s 7-Step Approach to Prompt Engineering for Khanmigo" https://blog.khanacademy.org/khan-academys-7-step-approach-t...

"Teaching with AI" https://openai.com/blog/teaching-with-ai

Photomath: https://en.wikipedia.org/wiki/Photomath

"Pix2tex: Using a ViT to convert images of equations into LaTeX code" > latex2sympy: https://news.ycombinator.com/item?id=38138020

SymPy Beta is a WASM fork of SymPy Gamma: https://github.com/eagleoflqj/sympy_beta

SymPy Gamma: https://github.com/sympy/sympy_gamma

TIL i^(4x) == e^(2iπx)

  import unittest
  import sympy as sy

  test = unittest.TestCase()

  x = sy.symbols("x")
  lhs = sy.I ** (4 * x)                 # i^(4x)
  rhs = sy.E ** (2 * sy.I * sy.pi * x)  # e^(2*i*pi*x)

  # Structural equality (==) can fail for equivalent sympy expressions,
  # so rewrite in exponential form and compare the simplified difference.
  test.assertEqual(sy.simplify(lhs.rewrite(sy.exp) - rhs), 0)
https://www.wolframalpha.com/input/?i=I%5E%284x%29+%3D%3D+e%...


Reminds me of Setosa's interactive explanation of image kernels:

https://setosa.io/ev/image-kernels/


This is a great post that makes the concept of image kernels way more accessible, thanks for sharing!


The flip-and-slide thing that electrical engineers do has always struck me as unintuitive; it makes more sense to think of it as a blurring operation.

Consider a function f: R^2 -> R (or Z^2 -> Z if you like) that represents a grayscale image. So f(0,0) is the pixel at the origin, f(1,0) is the pixel at (1,0), etc. Think of g: R^2 -> R as a blurring function, e.g. a gaussian.

What convolution does is it turns every pixel of f into a copy of the blur g, weighted by and centered on the pixel being blurred. So f(0,0) gets turned into a blurred image h(x,y) = f(0,0)g(x,y), and f(1,0) gets turned into a blurred image h(x,y) = f(1,0)g(x-1,y). Note that the subtraction just recenters g so the blur applies in the right position. In general, the pixel at (a,b) gets blurred into the function h(x,y) = f(a,b)g(x-a,y-b).

Now sum up all the blurred pixels, so you get the final image (f*g)(x,y) = integral_(a,b) f(a,b)g(x-a,y-b).

The same thing can be done in the time domain instead of the spatial domain, or you can write it in vector form: (f*g)(x) = integral_(r) f(r)g(x-r).
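
Here's a minimal numpy sketch of that "sum of weighted, shifted copies" view, checked against scipy's textbook definition (the small random image and gaussian-ish kernel are just placeholders):

  import numpy as np
  from scipy.signal import convolve2d

  rng = np.random.default_rng(0)
  f = rng.random((8, 8))                   # a small grayscale "image"
  v = np.exp(-np.linspace(-2, 2, 5) ** 2)
  g = np.outer(v, v)                       # gaussian-ish blur kernel

  # Turn every pixel f(a,b) into a copy of g, weighted by f(a,b) and
  # centered on (a,b), then sum up all the blurred pixels.
  out = np.zeros((f.shape[0] + g.shape[0] - 1,
                  f.shape[1] + g.shape[1] - 1))
  for a in range(f.shape[0]):
      for b in range(f.shape[1]):
          out[a:a + g.shape[0], b:b + g.shape[1]] += f[a, b] * g

  assert np.allclose(out, convolve2d(f, g))  # matches the integral form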

Note that you can also write it in a more symmetric way as (f*g)(c) = integral_(a+b=c) f(a)g(b). This makes it clear that you can do this for any semigroup (you recover the usual definition when you have a group by noting that b = c-a), it makes commutativity obvious, and it makes it clear why polynomial multiplication is convolution of coefficients: x^n comes from summing over all the x^i*x^j with i+j=n, so the coefficient of x^n is the sum over all coefficient pairs indexed by i,j with i+j=n, which is exactly the symmetric way to write convolution.
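
The polynomial claim is easy to check numerically, since np.convolve computes exactly that sum over i+j=n:

  import numpy as np

  # (1 + 2x)(3 + 4x + 5x^2), coefficients in increasing degree order
  p = np.array([1, 2])
  q = np.array([3, 4, 5])

  # Coefficient of x^n in the product = sum of p[i]*q[j] over i+j=n,
  # which is exactly discrete convolution.
  print(np.convolve(p, q))  # [ 3 10 13 10] -> 3 + 10x + 13x^2 + 10x^3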


> it makes more sense to think of it as a blurring operation.

It's not a blurring operation; it's not even an operation on images. And even when applied to images, only some convolutions result in what could be described as a "blur." This is also tautological, because you can't really define why a gaussian filter "blurs" an image without first understanding the convolution of the image with the filter kernel.

imho you should throw out the notion of "intuition" and stick with the mathematical definition and then explore its properties and relationships.

The simplest and most intuitive description I've heard is "convolution is a mathematical operation that describes mixing two things together" (those things being vectors, matrices, continuous functions, etc.).


I don't need to know about convolution to tell you what a Gaussian is; it's just e^(-x^2). You can look at a grayscale image with a single bright point pixel (like a star in the sky) and say it's "sharp", and an image with a gaussian and say it's a "blur" or a blob. That's just a qualitative description of what a point or a gaussian look like. Then, as I described, you can define convolution by saying you're replacing each point pixel with an image that is your blur with its brightness weighted by the point pixel and located at that pixel, and then add them all up.

That intuition also works fine in the time domain, where your signal is "blurred" at each instant by the impulse response.

But yeah, more generally, all you get is that (f*g)(c) is the sum over all a (**) b = c of f(a)g(b), where (**) is... some operation.

I guess ultimately my point is that looking at g(t-tau) and seeing -tau as "flip the signal" and t as "move to the desired time" seems absolutely bizarre to me. g(t-tau) is g(t), centered at tau. So f(tau)g(t-tau) is g(t), centered at tau, weighted by f(tau). Then sum over all tau. I can never see where the flip comes in; it seems backwards to me. Nor can I see how this flipping thing would intuitively arise in physical systems. Obviously they're equivalent, but it just doesn't make sense to me.
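
For what it's worth, the two readings compute the same numbers; here's a small sketch with made-up signals showing flip-and-slide agreeing with sum-of-shifted-copies:

  import numpy as np

  f = np.array([1.0, 0.5, 0.25])   # input signal
  g = np.array([0.2, 0.6, 0.2])    # impulse response / "blur"
  n = len(f) + len(g) - 1

  # Flip-and-slide reading: for each output time t, flip g, slide it
  # to t, and dot it with f.
  flip = np.array([sum(f[a] * g[t - a] for a in range(len(f))
                       if 0 <= t - a < len(g)) for t in range(n)])

  # Sum-of-copies reading: each sample f(tau) contributes a copy of g,
  # centered at tau and weighted by f(tau).
  copies = np.zeros(n)
  for tau in range(len(f)):
      copies[tau:tau + len(g)] += f[tau] * g

  assert np.allclose(flip, copies)
  assert np.allclose(flip, np.convolve(f, g))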


OK, but when I try this, I don't get any good intuitions.

Polynomial multiplication is convolution, so what am I supposed to do here when you say "g is a blurring function"?

When I think of multiplying term by term and summing over the diagonal, I can clearly see what the f(tau)g(t-tau) is doing: it's collecting terms with like exponents.

This also makes sense when I think of probability distributions: I'm multiplying the densities and summing "over the diagonal".
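
Concretely: the pmf of the sum of two independent dice is the convolution of the individual pmfs (a toy sketch):

  import numpy as np

  die = np.ones(6) / 6              # pmf of one fair die (faces 1..6)

  # P(sum = s) multiplies the densities and sums over the "diagonal"
  # i + j = s, i.e. convolves the two pmfs.
  two_dice = np.convolve(die, die)  # pmf over sums 2..12
  print(two_dice * 36)              # [1. 2. 3. 4. 5. 6. 5. 4. 3. 2. 1.]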


If you plot your coefficients as if they were a discrete-time signal, you ought to see multiplication appear as a blur, I guess, but yeah, IMO it's hard to interpret for polynomials because polynomials are very abstract.

I'd certainly look at the diagonal (or what I called the "symmetric" way to write it) as the proof of "why" convolution appears in the sum of random variables, and I've never considered the blurring thing in that setting, but it actually makes a lot of intuitive sense for looking at "what" happens when you add distributions. Just picturing repeated blurrings of a uniform distribution at the origin (i.e. start with a square and blur it with a square over and over), it seems intuitively plausible that it will converge to something like a gaussian (i.e. that you'll get a bump of some kind), so that's neat.
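
You can watch that convergence numerically; repeatedly blurring a box with itself quickly produces something very close to a gaussian (this is just the central limit theorem in disguise):

  import numpy as np

  box = np.ones(9) / 9        # uniform "square" distribution
  dist = box
  for _ in range(4):          # blur the square with a square, repeatedly
      dist = np.convolve(dist, box)

  # Compare against a gaussian with the same mean and variance.
  x = np.arange(len(dist))
  mu = (x * dist).sum()
  var = ((x - mu) ** 2 * dist).sum()
  gauss = np.exp(-(x - mu) ** 2 / (2 * var))
  gauss /= gauss.sum()
  print(np.abs(dist - gauss).max())  # already tiny after a few rounds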


Convolution, or the art of giving hard names to easy things.

Convolution is something you learn when you're 7.

It's called a weighted average.


Weighted sum, sure. (Average is normalized).

But it's not just one sum, but a _collection_ of weighted sums, and those have a particular order for reasons.


It just smears out a signal.


Convolution: The time domain operation that corresponds to multiplication in the frequency domain.
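
Which is a one-liner to verify with numpy (zero-padding to the full output length so circular convolution agrees with linear):

  import numpy as np

  f = np.array([1.0, 2.0, 3.0])
  g = np.array([0.5, 0.25])
  n = len(f) + len(g) - 1      # pad so circular conv == linear conv

  via_fft = np.fft.ifft(np.fft.fft(f, n) * np.fft.fft(g, n)).real
  assert np.allclose(via_fft, np.convolve(f, g))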


And which you can use to do actual multiplication, using the DFT and inverse DFT, if you let $f$ and $g$ be polynomials over, say, $Z/(2^k + 1)$, where the modulus is selected such that $2^k$ is the word size of your machine.


Isn't this just a complicated way of saying a computer can do multiplication?


No. It's saying a computer can do multiplication really fast by doing a bunch of additions, without having to worry about carries, instead.
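
A toy version of the idea in base 10, with plain np.convolve standing in for the modular DFT: the digit products land in the right place values with no carrying, and the carries are propagated once at the end.

  import numpy as np

  def multiply(a: int, b: int) -> int:
      # Digits in little-endian order, e.g. 123 -> [3, 2, 1].
      da = [int(d) for d in str(a)[::-1]]
      db = [int(d) for d in str(b)[::-1]]
      # Convolving the digit vectors multiplies the polynomials
      # a(10) and b(10); no carries are needed during this step.
      cols = np.convolve(da, db)
      # Propagate carries once, at the very end.
      result, carry = 0, 0
      for place, c in enumerate(cols):
          carry, digit = divmod(int(c) + carry, 10)
          result += digit * 10 ** place
      return result + carry * 10 ** len(cols)

  assert multiply(12345, 6789) == 12345 * 6789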


Computers can do multiplication really fast without doing additions; you just have to use the SIMD instructions that are already on the CPU.


Any explanation should start by saying a convolution is just a weighted average.


Well it is, sort of, but treating one of the functions as weights and the other as the thing that is averaged misses out on the symmetry f*g = g*f.


Averages are normalized. Convolutions are only normalized if the kernel integrates to 1.
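
E.g. a box kernel that sums to 1 is a moving average, while the same kernel unnormalized is a moving sum (a trivial illustration):

  import numpy as np

  signal = np.array([1.0, 4.0, 2.0, 8.0, 5.0])
  avg_kernel = np.ones(3) / 3   # integrates to 1: a true moving average
  sum_kernel = np.ones(3)       # integrates to 3: a moving sum

  print(np.convolve(signal, avg_kernel, mode="valid"))  # ~[2.33 4.67 5.]
  print(np.convolve(signal, sum_kernel, mode="valid"))  # [ 7. 14. 15.]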


What are you using non-normalized weighted averages for?


I'm not! Averages should normalize the weights. But there are plenty of reasons to define convolutions where neither signal is normalized.


What are they?



