GSO improves performance because you can send a ~64kB "datagram" through the stack, and then it will be split up either by the network card (if it supports GSO) or by the kernel. This is a lot more efficient than sending a pile of 1500 byte packets through the stack. It also saves a lot of context switches with the kernel.
By way of comparison, when you send data over TCP, the system call interface lets you write very large chunks, so a single system call can write gigabytes of data.
By way of comparison, when you send data over TCP, the system call interface lets you write very large chunks, so a single system call can write gigabytes of data.