Nearly any real-world Huffman encoder is trivially optimal, i.e., given a set of symbols with probabilities, it is easy to construct the optimal set of output bits for each symbol. (There are some exceptions: if you have a huge number of symbols, or some that are extremely rare, you may want to cap the maximum code length, which gives a slightly non-optimal code. This matters very little in practice, i.e., less than 0.1% in a typical setting. And of course, if you allow fractional bits or probabilities that change based on context, then you can do better, but then it's no longer Huffman.)
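To illustrate why construction is easy: the classic greedy algorithm repeatedly merges the two lowest-weight nodes, and every symbol under a merge gains one bit of code length. A minimal sketch (function name and tie-breaking by insertion order are my own choices, not from any particular encoder):

```python
import heapq
from collections import Counter

def huffman_code_lengths(freqs):
    """Return the optimal code length in bits for each symbol,
    given a dict of symbol -> frequency."""
    # Heap entries: (weight, tiebreak_id, tuple_of_symbols_under_this_node)
    heap = [(w, i, (sym,)) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    depth = {sym: 0 for sym in freqs}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = a + b
        for sym in merged:  # every symbol under the merge gains one bit
            depth[sym] += 1
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return depth

lengths = huffman_code_lengths(Counter("abracadabra"))
# 'a' (5 occurrences) gets a 1-bit code; the rare symbols get 3 bits each
```

Assigning actual bit patterns from these lengths (e.g., as a canonical Huffman code) is then purely mechanical.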
BWT is orthogonal to Huffman; like LZ, it exploits that some symbol _sequences_ are more common than others, while Huffman is about the optimality of coding each symbol on its own.
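The naive way to see what BWT does: sort all rotations of the input (with a sentinel) and take the last column, which tends to cluster identical symbols together so a downstream per-symbol coder benefits. A toy sketch (quadratic, for illustration only; real implementations use suffix arrays):

```python
def bwt(s):
    # Append a sentinel so the transform is invertible,
    # then sort all rotations and read off the last column.
    s = s + "\0"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

bwt("banana")  # the n's and a's end up grouped in runs
```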