Another problem is that if you have an array of identical values, Hoare contrived things so that his two partitioning pointers meet in the middle, ensuring O(n log n) behavior. Lomuto produces a degenerate partition, resulting in the quadratic worst case.
My algorithm pdqsort fixes this problem fundamentally which turns many identical values into an ideal O(n log k) case where k is the number of unique values, even if it were to use Lomuto.