I suspect it has to be in the kernel for efficiency reasons. Applications access network devices through kernel interfaces, so with a userspace VPN the kernel has to context switch to the VPN process and copy each packet's data to and from it. This can be quite slow.
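To make that overhead concrete, here is a rough counting sketch for a single outgoing packet. It assumes the userspace VPN receives packets through a TUN device and forwards them over a UDP socket, which is a common design but not something stated above. The `Step` type and the per-step counts are my own simplified model, not measurements, and they ignore batching and zero-copy tricks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    description: str
    syscalls: int  # system calls made in this step (user/kernel crossings)
    copies: int    # copies of the packet across the user/kernel boundary


# Assumed model: the VPN runs inside the kernel.
IN_KERNEL_VPN = [
    Step("app send(): payload copied into the kernel, encrypted and sent there", 1, 1),
]

# Assumed model: the VPN is a userspace process behind a TUN device.
USERSPACE_VPN = [
    Step("app send(): payload copied into the kernel, routed to the TUN device", 1, 1),
    Step("VPN read() on the TUN fd: packet copied out to the VPN process", 1, 1),
    Step("VPN sendto() on a UDP socket: encrypted packet copied back in", 1, 1),
]


def cost(path):
    """Return (total syscalls, total boundary copies) for one packet."""
    return sum(s.syscalls for s in path), sum(s.copies for s in path)


if __name__ == "__main__":
    for name, path in (("in-kernel", IN_KERNEL_VPN), ("userspace", USERSPACE_VPN)):
        syscalls, copies = cost(path)
        print(f"{name}: {syscalls} syscalls, {copies} copies per packet")
```

Under these assumptions the userspace path costs three system calls and three copies per packet where the in-kernel path costs one of each, and the VPN process also has to be scheduled before it can read the packet, which is where the context switch comes in.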
On a microkernel system, the standard way of connecting to the network stack might be through direct shared memory, and on such a system you could run the VPN in userspace at essentially no extra cost.
However, on Linux, most of the network stack is in the kernel, and the ABI that applications use to reach the network is stable, so existing binaries expect to go through the kernel. So while in theory you could write an efficient userspace VPN, it would require you to modify or at the very least recompile your applications.