Nice article, but a tiny bit weird that it calls affine transforms categorically non-linear, and then shows an affine matrix, which is linear by definition.
Affine transforms are linear intermediate transforms in one dimension higher than the source & target spaces. They're intuitively (rather than algebraically) non-linear because we move from 2d to 3d then back, or from 3d to 4d then back.
It can be helpful to think of the translation as a (linear) skew in the added extra dimension.
For linear functions (in the article's second, linear algebra sense), the image of zero must be zero, and that doesn't hold for (most) affine functions, so that makes sense.
Note that he shows the trick of making affine functions linear by tacking on one more dimension.
> the image of zero must be zero, and that doesn't hold for (most) affine functions
It does hold, always, in the higher dimension. And I feel like that's the most important thing to clearly understand about affine transforms. The entire beauty of the augmented matrix is that you get a class of non-linear transforms in 3d by using linear transforms in 4d. The article was nice, I'm nitpicking something that was nearly there. It'd just be nice to be one teensy bit more explicit about what's going on here.
But affine transformations are indeed not linear. The augmentation trick creates a new, linear transform in n+1 dimensions, which is related to, but different from, the affine transform in question.
Yes, exactly, you're right. The article is correct, it's just not telling quite the whole story. Affine transforms aren't linear, and their augmented matrices that do the same thing are linear. In practice, it's both, and it's precisely cool because it's both.
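To spell the augmentation out in block form (just making the above explicit):

    [ A  b ] [ x ]   [ Ax + b ]
    [ 0  1 ] [ 1 ] = [   1    ]

so the affine map x -> Ax + b in n dimensions is exactly what the genuinely linear (n+1)-dimensional map does on the slice where the last coordinate is 1 -- and that bigger map really does send the (n+1)-dimensional zero vector to zero.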
Really liked this post. Although I knew the "correct" definition of linearity, I'd never considered that linear regression is not in fact linear (though you can transform it as blt says).
Yeah, it's one of those handy hacks. I like these because they allow changing the input data to be a substitute for a different algorithm/formulation. Transforming the input is often easier to do under a deadline than deploying a well-tested new algorithm.
All that said, the typical regularized version of linear regression with the 'append 1' trick is no longer equivalent to the affine version one may have in mind. The difference is that the weight corresponding to the appended dimension also gets regularized by a typical implementation of regularized linear regression -- unless, of course, special care is taken to remove the regularization on that dimension.
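A rough numpy sketch of what I mean (toy numbers of my own, not from the article): with the 'append 1' trick, plain ridge shrinks the bias weight too unless you zero out its penalty.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 10.0       # true bias of 10

    Xa = np.hstack([X, np.ones((100, 1))])          # 'append 1' trick
    lam = 5.0

    # naive ridge: penalizes all four weights, including the appended-1 (bias) weight
    w_naive = np.linalg.solve(Xa.T @ Xa + lam * np.eye(4), Xa.T @ y)

    # same thing with the penalty removed from the bias coordinate
    P = lam * np.eye(4)
    P[3, 3] = 0.0
    w_free = np.linalg.solve(Xa.T @ Xa + P, Xa.T @ y)

    print(w_naive[3], w_free[3])                    # shrunk bias vs. bias close to 10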
w.r.t. the "affine regression" comment - you can reduce affine regression to linear regression by adding a constant feature whose value is 1 for every data point, or by centering the data on its mean. So it's a fairly mild misnomer :)
edit: of course, I agree that we should not say "linear function" when we mean "affine function" in general.
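A quick numpy check of that reduction (toy numbers of my own): fit with no intercept on centered data, then recover the intercept from the means.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 2))
    y = X @ np.array([2.0, -1.0]) + 7.0             # affine: slopes (2, -1), intercept 7

    Xc, yc = X - X.mean(axis=0), y - y.mean()       # center the data on its mean
    w, *_ = np.linalg.lstsq(Xc, yc, rcond=None)     # plain linear regression, no intercept
    b = y.mean() - X.mean(axis=0) @ w               # intercept recovered afterwards
    print(w, b)                                     # ~[ 2. -1.] and ~7.0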
I really love the translation "hack". The 2D plane is mapped onto the z=1 plane, and then we apply an x-y shear in 3D space. This keeps the 3D origin at the origin, but shifts around the z=1 plane without distorting it, so that when we project back to 2D it looks like a translation.
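Here's a tiny numpy sketch of exactly that picture (my own numbers): lift (x, y) to (x, y, 1), apply a 3x3 shear, and read the first two coordinates back off.

    import numpy as np

    tx, ty = 3.0, -2.0                   # desired 2D translation
    shear = np.array([[1.0, 0.0, tx],    # last column shears x and y
                      [0.0, 1.0, ty],    # proportionally to z
                      [0.0, 0.0, 1.0]])

    p = np.array([5.0, 7.0, 1.0])        # a 2D point embedded in the z=1 plane
    print(shear @ p)                     # [8. 5. 1.]  -> the plane point moved by (3, -2)
    print(shear @ np.zeros(3))           # [0. 0. 0.]  -> the 3D origin stays put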
In geometry an "affine vector" (c,v) is said to be the vector v tangent to c -- no longer just a vector, but a "tangent vector". In your neural network this would be the weights tangent to each bias. If you keep c fixed,
Tc[V] = {(c,v) | c fixed, forall v in V}
is clearly a vector space.
Here's the fun part: if you tried to consider
T[V] = {(c,v) | forall c in R, v in V}
this is no longer a vector space. But it's a fun structure: at each c, Tc[V] is sorta like (homeomorphic to) the Cartesian product
{c} x V
this is a fiber bundle. For example: a torus is, near each c, the Cartesian product of a point and a circle; a "circle squared" is a donut.
Now, we don't need to restrict ourselves to c+vx -- we can consider sigmoid(c+vx), which will have tangents (derivatives) in v at each c. For fixed c we still have a vector space with the derivatives dsigmoid/dv, and varying c you get a sigmoid fiber bundle.
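(For concreteness, at a fixed c the tangent in the v direction works out to

    dsigmoid/dv = x * sigmoid(c + vx) * (1 - sigmoid(c + vx))

so each fixed c still carries its own vector space of derivatives, and letting c vary sweeps out the fibers.)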
Going on the thought that interaction aids learning, I recently put together a very crude interactive jsfiddle to help visualize 2-d matrix transforms here: https://jsfiddle.net/holoopooj/31yt1ytp/6/
You click and drag in the left graph area and see the transformed vector on the right. The matrix values can be changed at the top. I've only tested in chrome on desktop. You may have to vertically expand the lower right pane to see all of the drawing area.
A nice discussion of the extension of the linear transformation matrix. The classic example and use case of this is 3D transformations that include Translation (as well as the linear shear, scale, rotate, and reflect).
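For the translation part specifically, the homogeneous 4x4 matrix is just (standard graphics convention, not anything specific to the article):

    [ 1 0 0 tx ]
    [ 0 1 0 ty ]
    [ 0 0 1 tz ]
    [ 0 0 0  1 ]

with the upper-left 3x3 block carrying the shear/scale/rotate/reflect part when it isn't the identity.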
I vaguely remember that Minkowski space can also be written as an affine space, but not particularly how or why, since it should be translation independent? It seemed that the raising/lowering tensor notation is always used.
Can someone unpack the definition of an affine subspace for me (it's been a while since I took point set topology):
A subset U ⊂ V of a vector space V is an affine space if there exists a u ∈ U such that U - u = {x - u | x ∈ U} is a vector subspace of V.
I'm unpacking this to read
A subset U of a vector space V is an affine space if there exists an element u such that U - u, which is exactly equal to x - u for all x in U, is a vector subspace of V.
If I'm reading that right, the right side of the equation is a parenthetical expression, so is it necessary?
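(A concrete instance may help: in V = R^2 take U = { (t, 1) | t in R }, the horizontal line at height 1. U itself isn't a subspace since it misses 0, but choosing u = (0, 1) gives U - u = { (t, 0) | t in R }, the x-axis, which is a subspace. The set-builder on the right-hand side is just spelling out what the set U - u means -- translate every element of U by -u -- so it's there for precision rather than as an extra condition.)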
So. In calculus both the derivative and integral are linear operators:
D[a*f(x)] = a*D[f(x)]
D[f(x) + g(x)] = D[f(x)] + D[g(x)]
and the indefinite (without limits) integral is an "antiderivative", right? I.e. y(x) = I(f(x)) is the solution to
D[y(x)] = f(x)
Here's the problem: there are multiple solutions already in the case where f(x) = 0. Indeed, for any constant function y(x) you have
D[y(x)] = 0
This is why you're drilled in engineering classes to always add a + C to your indefinite integral. The solution to an indefinite integral is always a class of functions -- the part without the +C continues to be linear, but you have to tag the +C along.
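A quick sympy illustration of that class-of-functions point (toy example of my own, not from the thread):

    import sympy as sp

    x, C = sp.symbols('x C')
    F = sp.integrate(2*x, x)     # sympy returns x**2 and silently drops the + C
    print(F)                     # x**2
    print(sp.diff(F + C, x))     # 2*x for every C, so the whole family x**2 + C solves D[y] = 2*x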
I'm tempted to go through this carefully given that affine transformations tagged with cost functions are used in MR image registration/normalization all the time.