This helps if and only if you suffer from instruction starvation. Given that the nginx binary is around 1 MB, which fits comfortably in cache, it is very unlikely that BOLT would help with nginx.
I don't think that is true. Even without memory bandwidth constraints, jumping to instructions that aren't in cache is going to incur memory latency. If the instructions are all packed together, the next instructions could be prefetched and already be in cache.
Also, the lower cache levels have lower latency but are much smaller; the L1 instruction cache is often just 32 KB. Linear memory access lets the hardware prefetch ahead, which minimizes the latency of memory access.
AMD's original Zen architecture uses 32 KB for L1 data and 64 KB for L1 instructions; Zen 2 cut the instruction cache to 32 KB. (I was curious whether AMD and Intel designs differ in their L1 caches.)
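If you want to check the L1 split on your own machine, here is a minimal sketch. It assumes Linux and the usual sysfs layout under /sys/devices/system/cpu/cpu0/cache (one index* directory per cache, each with level, type, and size files). The function name read_cache_sizes is just something made up for illustration, and on other systems it simply returns nothing.

```python
from pathlib import Path


def read_cache_sizes(cpu_cache_dir="/sys/devices/system/cpu/cpu0/cache"):
    """Return {(level, type): size_string} for the caches sysfs reports for one CPU.

    Assumes the Linux sysfs layout; returns an empty dict if it is not present.
    """
    sizes = {}
    base = Path(cpu_cache_dir)
    if not base.is_dir():
        return sizes  # not Linux, or cache info is not exposed here

    for index_dir in sorted(base.glob("index*")):
        try:
            level = int((index_dir / "level").read_text().strip())
            cache_type = (index_dir / "type").read_text().strip()
            size = (index_dir / "size").read_text().strip()
        except (OSError, ValueError):
            continue  # skip entries that are incomplete or unreadable
        sizes[(level, cache_type)] = size
    return sizes


if __name__ == "__main__":
    # Prints one line per cache, e.g. the L1 Data and L1 Instruction sizes.
    for (level, cache_type), size in sorted(read_cache_sizes().items()):
        print(f"L{level} {cache_type}: {size}")
```

The L1 entries with type Instruction and Data are the ones to compare; L2 and L3 normally show up as Unified.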