1. UTF-8 encoding is trivial and everything already speaks it by default nowadays; 2. it is useful to explicitly differentiate byte arrays from text; 3. if you change your mind later on and decide you do want to support Japanese (like here: https://100r.co/site/niju.html), you haven't dug yourself into a hole.
1. UTF-8 is not an unreasonable encoding, but it is an encoding of an unreasonable character set, and it is unnecessary here. It is better to avoid conversions that would then have to be implemented in both programs when neither should need them. Not everything uses UTF-8 and Unicode; I continue to use (and write) programs that do not use Unicode. (And if conversion is necessary, a conversion program can also be written in uxn; the fact that uxn does not use Unicode does not prevent this, and you can implement whatever character sets and encodings you want.)
2. Sometimes it might, but that has nothing to do with uxn. Sometimes the distinction isn't helpful anyway, and sometimes differentiating byte arrays from text causes problems of its own (this isn't really so uncommon).
3. The niju program does not use Unicode and does not need it; it works better without it. If you do want more sophisticated Japanese text, even then there are better ways than using Unicode.
UTF-8 is an extremely simple and lightweight text encoding. Check out Plan 9's man page on UTF; it would fit on a t-shirt: https://plan9.io/magic/man2html/6/utf
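To give a sense of how little code that is, here is a rough sketch of an encoder in C. It follows the modern definition of UTF-8 (RFC 3629, up to four bytes, surrogates rejected), which may differ in range from what the linked man page describes. The function name `utf8_encode` and its return convention are made up for illustration and are not taken from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8 into buf, which
 * must have room for at least 4 bytes. Returns the number of bytes
 * written, or 0 if cp is a UTF-16 surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *buf)
{
    assert(buf != NULL);

    if (cp < 0x80) {                        /* 0xxxxxxx */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx followed by 3 x 10xxxxxx */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

ASCII passes through unchanged as a single byte, so a program that only ever emits ASCII is already emitting valid UTF-8.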
Unicode is also just a representation for text plus a handful of common operations: you work with arrays of characters rather than arrays of bytes. It was worth its cost on 1992 hardware; the Nintendo DS is over a decade more recent.
I recommend studying libutf in sbase[0]. It's not a single header file solution (although utf.h[1] is an excellent place to start reading), but it does provide a fairly comprehensive implementation. There's also a good introduction to Unicode in Plan 9's C programming guide[2]. Even if you choose to only support runes that fit in a single byte, you gain the ability to tell byte blobs apart from text, which is useful both for reasoning about your program and for future-proofing it, in case you need to put places like Łódź or Πάτρα on your map.
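As a small illustration of the byte/text distinction, here is a sketch of counting runes rather than bytes. It assumes the input is already valid UTF-8 and does no validation; the name `count_runes` is hypothetical and is not libutf's API, which has its own routines for this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the runes in a NUL-terminated string.
 * Assumes s is valid UTF-8. Every rune has exactly one byte that is
 * not a continuation byte (10xxxxxx), so counting those is enough. */
size_t count_runes(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For "Łódź" this gives 4, while `strlen` reports 7, because Ł, ó and ź each take two bytes. Code that draws one glyph per byte would get the label wrong; code that walks runes would not.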