Hm, I wonder if the symmetric encryption with a sophisticated cipher is really necessary in this scenario. The aim is require an attacker to read 16Kb of memory in addition to the much smaller key data, e.g. for each bit of the key, the attacker needs X additional, random bits.
Wouldn't it be possible to block-wise xor the random data onto the key?
Maybe use a windowing mechanism, where the window is moved forward depending on the random data (e.g. for a 16 bit key, xor random bits 0 to 15, then move forward 4 to 16 bits, depending on the current window; iterate until the maximum number of window movements necessary to go over the whole random data is reached [leak less bits via timing]; if the end of the data is reached, again start at the beginning, but with some offset to avoid result_bit_0 = secret_bit_0 xor random_bit_0 xor random_bit_0).
You want every single bit error to avalanche to the whole key. You also don't want bit errors at some offsets modulo X to cancel each other out. Cryptographic hashes provide these properties. For any custom solution you would first have to prove that it has the security properties needed.
The asymmetric crypto of a connection setup takes the lion's share of CPU cycles. I don't think it's worth the risk just to beat the cheap (relatively speaking) symmetric algorithms.