I use NFKC form for scripts that seem to require it, such as Arabic, and NFC for...

espeed · on Dec 17, 2016

Thanks for the info. I'm looking at this from the perspective of designing a backend datastore and query engine for a knowledge system. The idea is to encode a spatial data structure (similar to Google's S2 Geometry Library [0]) that enables content-based addressing of non-spatial data types for data fusion.

One idea is to build a lattice of Unicode characters that composes up into combinations of words, a la Formal Concept Analysis [1]. On one level, the characters compose into words that represent properties (key/value pairs), and the KV pairs then compose into higher-level objects. Each property and higher-level object is encoded as an integer derived from its constituent objects/properties, in such a way that those constituents can be determined algorithmically from the integer without having to traverse the structure [2]. ANS encoding [3] embedded into a space with a VI metric (https://en.wikipedia.org/wiki/Variation_of_information) [4] might make this work. Have you played with this type of design?
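
To make the "constituents recoverable from the integer" part concrete, here is a toy sketch. It is not ANS and not necessarily the scheme in [2]. It assumes a simple prime-product encoding: each distinct property gets its own prime, and an object's code is the product of its properties' primes. The names `PropertyCodec` and `contains` are made up for illustration. The codes grow quickly, so treat this as a way to think about the lattice, not a storage format.

```python
from itertools import count
from math import gcd


def primes():
    """Yield 2, 3, 5, ... by trial division (fine for a toy example)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


class PropertyCodec:
    """Hypothetical helper: maps each property to a prime, objects to products."""

    def __init__(self):
        self._gen = primes()
        self._prime_of = {}  # property -> prime
        self._prop_of = {}   # prime -> property

    def prime_for(self, prop):
        # Assign the next unused prime the first time a property is seen.
        if prop not in self._prime_of:
            p = next(self._gen)
            self._prime_of[prop] = p
            self._prop_of[p] = prop
        return self._prime_of[prop]

    def encode(self, props):
        # Sort so prime assignment is deterministic for a given input.
        code = 1
        for prop in sorted(set(props)):
            code *= self.prime_for(prop)
        return code

    def decode(self, code):
        # Recover the property set by divisibility alone, no traversal.
        return {prop for p, prop in self._prop_of.items() if code % p == 0}


def contains(obj_code, part_code):
    """True if every property in part_code is also in obj_code."""
    return obj_code % part_code == 0


codec = PropertyCodec()
a = codec.encode([("color", "red"), ("shape", "round")])
b = codec.encode([("color", "red"), ("size", "small")])
shared = gcd(a, b)            # lattice meet: properties common to both
combined = a * b // shared    # lattice join: union of both property sets
```

Here `codec.decode(shared)` gives back the single shared property and `codec.decode(combined)` gives all three, so gcd and lcm play the role of meet and join in the FCA-style lattice.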

[0] http://blog.christianperone.com/2015/08/googles-s2-geometry-...

[1] https://www.youtube.com/watch?v=Xuxm929tIRY

[2] http://www09.sigmod.org/sigmod/record/issues/0506/p47-articl...

[3] https://en.wikipedia.org/wiki/Asymmetric_Numeral_Systems

[4] http://www.sciencedirect.com/science/article/pii/S0047259X06...