You don't need a bytecode interpreter to keep undefined behavior out of your language. E.g. instead of unchecked addition and unchecked array access, use checked addition and bounds-checked access. There are even efforts to do this for C: https://github.com/pizlonator/llvm-project-deluge/blob/delug... achieves a ~50% overhead, far, far better than Python.
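To make "checked" concrete, here's a rough sketch in plain C. The names checked_add and checked_get are made up for illustration and have nothing to do with how the linked project actually does it; the point is only that the failure case becomes a defined result instead of UB.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: adds a and b, returning false instead of
   overflowing (signed overflow is undefined behavior in C). */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false; /* the sum would not fit in an int */
    }
    *out = a + b;
    return true;
}

/* Hypothetical helper: reads arr[i] only if i is within bounds,
   returning false instead of reading out of range. */
bool checked_get(const int *arr, size_t len, size_t i, int *out) {
    if (i >= len) {
        return false; /* index is past the end of the array */
    }
    *out = arr[i];
    return true;
}
```

A compiler that inserts checks like these everywhere pays for a compare and a branch per operation, but it still emits native code, so no interpreter loop is involved.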
And even among languages that do have a full virtual machine, Python is slow. Slower than JS, slower than Lisp, slower than Haskell by far.