[flagged]


Like many, I appreciated seeing Antirez's kilo project, and went my own direction with a trivial fork: embedding Lua as a scripting language, allowing multiple buffers, adding flexible syntax highlighting for different languages, etc.

For me one of the challenges was getting UTF-8 support, since I live in Finland and am exposed to ä, ö, and other characters. It was a fun learning experience, even though I never intended it to become a "real editor" and I continue to use Emacs on a daily basis.

Quickly looking over the (closed) bug reports, I see the discussion I had with myself back in 2016, which largely caused me to rewrite the core in C++ so I could take advantage of modern facilities to make UTF-8 work more easily:

https://github.com/skx/kilua/issues/49


I think you're reading intent that isn't there with this project. The author states it's a WIP, and is writing it as a learning experience for building text interfaces without ncurses.


The lack of non-ASCII support is not mentioned on GitHub, so this warning about a basic capability (one that is often taken for granted) is certainly useful.


Can Unicode be implemented in a thousand lines or so of C?


Define implementing Unicode. If you want to support rtl/bidi and grapheme clusters and every little detail, probably not.

But 99% of the utility for most people is there if you can find the right column, move left and right by character instead of by byte, and output UTF-8 sequences correctly. In C it's a minor pain, but not impossible.
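To make that concrete, here is a rough sketch of the byte-level bookkeeping involved. It is not taken from the project under discussion; the function names are made up for illustration, it assumes the buffer already holds valid UTF-8, and it treats every code point as one column wide (so no combining or double-width handling).

```c
#include <assert.h>
#include <stddef.h>

/* True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int utf8_is_cont(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: byte offset of the code point following the one
 * that starts at pos, clamped to len. Assumes s holds valid UTF-8. */
static size_t utf8_next(const char *s, size_t len, size_t pos)
{
    assert(s != NULL);
    if (pos >= len)
        return len;
    pos++;
    while (pos < len && utf8_is_cont((unsigned char)s[pos]))
        pos++;
    return pos;
}

/* Hypothetical helper: byte offset of the code point that precedes pos. */
static size_t utf8_prev(const char *s, size_t pos)
{
    assert(s != NULL);
    if (pos == 0)
        return 0;
    pos--;
    while (pos > 0 && utf8_is_cont((unsigned char)s[pos]))
        pos--;
    return pos;
}

/* Hypothetical helper: number of code points in the first pos bytes,
 * i.e. the cursor column if every code point is one cell wide. */
static size_t utf8_column(const char *s, size_t pos)
{
    size_t col = 0;
    assert(s != NULL);
    for (size_t i = 0; i < pos; i++)
        if (!utf8_is_cont((unsigned char)s[i]))
            col++;
    return col;
}
```

With the four-byte string "h\xC3\xA4n" ("hän"), moving right from byte offset 1 lands on offset 3 rather than in the middle of the "ä", and the column at offset 3 is 2. Output needs no special work as long as the bytes of a sequence are written out together.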


Libgrapheme will get you there for the most part. It will let you find characters, words, and sentences. It’s nice because it returns byte offsets, so you can use them directly with C data structures. I wish there were a way to get the number of characters along with the byte offsets, which would help with things like line breaking.

https://libs.suckless.org/libgrapheme/


Thank you for this reference. Deriving a freestanding C99 implementation from the standard sounds great. That can probably be rendered as a single source file to drop into a project that otherwise deals only in ASCII.


Yeah, but you've added a dependency, which the author doesn't seem to want. If you're willing to add dependencies there are multiple other options too.


Rudimentary BMP support, probably. You basically have to account for combining characters and double-width characters.

Emojis and multi-character* presentation form code points like U+FDFD would take a lot more work to do correctly.

* I'm using "character" here in the linguistic sense. Unicode did not invent the word "character" and it doesn't only mean code points.
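As a sketch of what "rudimentary" could look like for the combining and double-width cases: a width lookup on an already-decoded code point. The function name is invented for illustration, and the ranges are a deliberately small, incomplete subset of the ones found in typical wcwidth-style tables; it makes no attempt at emoji or at presentation forms like U+FDFD.

```c
#include <stdint.h>

/* Hypothetical helper: terminal cell width of one code point.
 * Illustrative subset only, BMP only; a real table is much larger. */
static int cp_width(uint32_t cp)
{
    /* Combining Diacritical Marks: drawn on top of the previous cell. */
    if (cp >= 0x0300 && cp <= 0x036F)
        return 0;

    /* A few East Asian wide ranges: two cells each. */
    if ((cp >= 0x1100 && cp <= 0x115F) ||                 /* Hangul Jamo initials */
        (cp >= 0x2E80 && cp <= 0xA4CF && cp != 0x303F) || /* CJK, kana, Yi */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||                 /* Hangul syllables */
        (cp >= 0xF900 && cp <= 0xFAFF) ||                 /* CJK compatibility ideographs */
        (cp >= 0xFF00 && cp <= 0xFF60) ||                 /* fullwidth forms */
        (cp >= 0xFFE0 && cp <= 0xFFE6))                   /* fullwidth signs */
        return 2;

    /* Everything else is assumed to occupy one cell. */
    return 1;
}
```

Summing this over the code points to the left of the cursor would give a screen column that accounts for those two cases; anything outside the listed ranges is simply assumed to be one cell.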



