Isn't this just doing the integer log to base 2 of x (or put another way, determining the highest set bit). Search for bit manipulation hacks and you'll find people have already thought about how to do that as efficiently as possible. The point of the article is perhaps more to do with how gcc optimizes templates but it'd be better if the initial non-template code was less contrived.
As the commenter ploxiln on the site noted, it is questionable if the author presents the codes which are actually equivalent.
As far as I understand, at least if the less sign from one function should become greater or equal in another.
It also appears to me that the starting function by definition has a fixed end limit number over which the f() isn't called, and that this limit gets set by every Foo<x> differently.
A more compelling example would have been nice, considering you can write a single line constexpr function that calculates "My_Size". You can do some pretty neat (and actually useful) things with parameter packs and recursive templates.