A shift of 16 is enough for every 8-bit numerator, i.e. x/a is (u32(x)*b)>>16 for some b depending only on a (b = ceil(65536/a) works). You could precompute b for each a and store it in a lookup table. The largest b is 65536 for a=1 and the smallest is 258 for a=255, so b fits in a u16 if stored with an offset of 1 (i.e. store b-1). Not sure it's worth it unless you reuse the denominator many times, though.
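
For what it's worth, here is a rough sketch of the table idea in C. It assumes b = ceil(65536/a), which matches the 65536 and 258 endpoints above. The names `recip`, `init_recip`, `div_u8` and `check_all` are made up for illustration. Entry 0 of the table is unused since a=0 has no quotient.

```c
#include <assert.h>
#include <stdint.h>

/* recip[a] holds b - 1, where b = ceil(65536 / a); entry 0 is unused. */
static uint16_t recip[256];

static void init_recip(void) {
    for (uint32_t a = 1; a < 256; a++) {
        uint32_t b = (65536u + a - 1) / a;   /* ceil(65536 / a), 258..65536 */
        assert(b - 1 <= UINT16_MAX);         /* the 1-offset makes it fit */
        recip[a] = (uint16_t)(b - 1);
    }
}

/* x / a for 8-bit x and a in 1..255, using one multiply and one shift. */
static uint8_t div_u8(uint8_t x, uint8_t a) {
    uint32_t b = (uint32_t)recip[a] + 1;     /* undo the 1-offset */
    return (uint8_t)(((uint32_t)x * b) >> 16);
}

/* Exhaustively compare against the ordinary divide; returns 1 if all match. */
static int check_all(void) {
    init_recip();
    for (uint32_t a = 1; a < 256; a++)
        for (uint32_t x = 0; x < 256; x++)
            if (div_u8((uint8_t)x, (uint8_t)a) != x / a)
                return 0;
    return 1;
}
```

The input space is only 256 x 255 pairs, so the brute-force check in `check_all` is cheap and avoids having to trust the rounding argument. The table costs 512 bytes, which is part of why it probably only pays off when the same denominator is reused a lot.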