Wouldn't this only apply to neural networks used for classification? I mean, the general paradigm of deforming curves until they're separated by a hyperplane seems pretty obvious now that I see it in front of me, but what about neural networks used to approximate continuous functions?
I'll take a stab at this (I'm a decade out from my last machine learning class, so no guarantees on correctness): the only reason it's fitting a hyperplane is that one class is mapped to the continuous value -1.0 and the other class to 1.0, with a thresholding step at the end (the hyperplane perpendicular to the line onto which the continuous values are projected) to decide the class. If you're doing regression instead of classification, your training targets will take a whole range of values rather than just -1.0 and 1.0, and you omit the thresholding at the end, but otherwise the behavior and intuition should be the same as in the article.
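To make that concrete, here's a minimal NumPy sketch (the weights are arbitrary placeholders I made up, not anything from the article): the same little network produces a continuous value either way; classification just bolts a threshold onto the end, regression uses the value directly.

    import numpy as np

    # Toy 2-layer network: the hidden layer "deforms" the input space,
    # the output layer projects the deformed point onto a line.
    # Weights are arbitrary placeholders, purely for illustration.
    W1 = np.array([[1.0, -0.5],
                   [0.3,  2.0]])
    b1 = np.array([0.1, -0.2])
    w2 = np.array([0.8, -1.2])
    b2 = 0.05

    def network(x):
        h = np.tanh(W1 @ x + b1)  # nonlinear deformation of the input
        return w2 @ h + b2        # continuous projection onto a line

    def regress(x):
        # Regression: keep the continuous output as-is.
        return network(x)

    def classify(x):
        # Classification: threshold the same continuous output at 0,
        # i.e. ask which side of the hyperplane the deformed point landed on.
        return 1.0 if network(x) >= 0 else -1.0

    x = np.array([0.5, -1.0])
    print(regress(x), classify(x))

The only difference between the two functions is that last comparison, which is exactly the thresholding step described above.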