Edit: Reading some comments here - this doesn't seem to be about 'recognising' or 'predicting' existing characters, but using a dataset of characters to create a character by itself (which probably isn't an existing character).
This is quite 'clever'.
I don't understand the example shapes at the beginning. They're not correct strokes. How does that work?
The about page has some neat made up characters.
But after trying a few strokes, and doing so more carefully, it seems if you put in a clear radical, the character is well formed, kinda; if you put in a squiggle, all you get is a doodle... that makes sense.
Inputting 口 or 艹 for example, vs a random squiggle. Take care to make it reasonably accurate.
About characters, in case anyone doesn't know: a character is basically a 2x2 grid where 4 radicals get placed (there are about 201 radicals in modern Chinese; Japanese kanji too, I guess?). Sometimes 'cells get merged', so the left column of 2 rows is merged to contain 1 radical and the right contains 1 or 2 radicals. Or 'add a row' can happen at the top, for example adding a 艹 above the 2x2.
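To make that composition idea a bit more concrete, here's a rough TypeScript sketch (entirely my own made-up types, nothing from the demo) of a character as a small layout tree of radicals, where a 'merged cell' is just a cell that holds a single radical instead of splitting further:

```typescript
// Hypothetical model of the "2x2 grid with merged cells" idea:
// a character is a tree of cells, each either a single radical
// or a horizontal/vertical split into smaller cells.

type Radical = string; // e.g. "口", "艹", "木"

type Cell =
  | { kind: "radical"; radical: Radical }
  | { kind: "split"; direction: "horizontal" | "vertical"; parts: Cell[] };

// Made-up example: 艹 added as a row on top, with two radicals
// side by side underneath (the 'add a row above the 2x2' case).
const example: Cell = {
  kind: "split",
  direction: "vertical",
  parts: [
    { kind: "radical", radical: "艹" },
    {
      kind: "split",
      direction: "horizontal",
      parts: [
        { kind: "radical", radical: "口" },
        { kind: "radical", radical: "木" },
      ],
    },
  ],
};
```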
"a character is basically a 2x2 grid"...
Do you have any reference for this type of classification/composition?
N.B.: I am not implying that it is incorrect - I am a dabbler in Japanese Brush Calligraphy (without any fluency in the language, only as an art) and I would like to read more about this so if you have links or books (preferably in English) I would be very happy to learn more.
The practice paper given to children learning kanji is commonly partitioned into a 2x2 grid, to help with proportions -- kind of like how children learning to write English are given paper with guidelines for baseline, ascenders, descenders, and mean line. You can see an example of the 2x2 grid in the Nintendo kanji game on the article's "Info" page, or by searching for "kanji practice paper".
The lines are meant more to help relative placement than to exactly divide characters, but many characters do divide into left-and-right or top-and-bottom portions -- see Jack Halpern's SKIP system for example:
This was related to some work I did a few years ago but recently had time to retrain models to make it work inside the browser in an interactive setting.
The dataset used to train the network had to be refined a bit as well to match how humans write on a tablet.
Some previous discussion from a while back on the original non-interactive TensorFlow version:
Hi Jason! Please get in touch with me too; I have a lot of Chinese data to share and discuss.
I made https://pingtype.github.io - a program to break up sentences into words, pinyin and parallel translation, and typing characters by breaking them into glyphs.
I also just finished making a large dataset of glyph images of 52,000 characters from 1200 fonts - see my other comment for the download link.
Hit or miss for me. It doesn't want to fill in anything if I draw the enclosure of "wind": 風. It also fails to find some of the simplest kanji, while completions for just a stroke or two are quite complex. E.g. a simple downstroke suggests mouth 口 (just two more strokes), or 土 and whatnot, but the simple ones I'm after just won't come out. :) It is also stumped if I draw the first two strokes of 山; it apparently wasn't trained on this character enough to complete that middle stroke, and it basically doesn't want to do anything else, either.
Some out-of-order inference would be cool. E.g. draw the bottom four dots (fire) of 煎, and have various top parts emerge. For that purpose, it would be good if there were a reference frame. That is to say: an underlying square box to serve as a target for the supplied input. If you draw something near the bottom of the empty box, then it's understood by the neural network to be a bottom part of the kanji requiring a top. I think the whole concept could really benefit from a precise agreement between the user and the neural net about the bounding box.
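As a rough illustration of that bounding-box agreement (purely my own sketch, not how the demo actually works), the strokes could be mapped into a shared unit square, preserving where in the box the ink falls, and a coarse positional hint derived from that:

```typescript
// Hypothetical sketch: map raw canvas strokes into an agreed 1x1
// reference square, then classify which vertical band the ink sits in,
// so a model could be conditioned on "this is a bottom part".

type Point = { x: number; y: number };
type Stroke = Point[];

// Canvas coordinates -> unit square, keeping relative position intact.
function toReferenceBox(strokes: Stroke[], canvasSize: number): Stroke[] {
  return strokes.map(s =>
    s.map(p => ({ x: p.x / canvasSize, y: p.y / canvasSize }))
  );
}

// Crude positional hint: where inside the box did the user draw?
function verticalRegion(strokes: Stroke[]): "top" | "bottom" | "whole" {
  const ys = strokes.flat().map(p => p.y);
  const top = Math.min(...ys);
  const bottom = Math.max(...ys);
  if (bottom < 0.5) return "top";    // all ink in the upper half
  if (top > 0.5) return "bottom";    // all ink in the lower half, e.g. the four dots of 煎
  return "whole";
}
```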
I don't think it's your kanji writing; I can't figure out what it's supposed to do. I'm assuming it's supposed to autocomplete my kanji, but it failed at that for 国, 本, and 三. I'm pretty confident I wrote them right, especially the last one :)
It seems to just write random kanji based on the last stroke or something. But, writing random kanji has a certain coolness factor. Especially if I could copy-paste a kanji and get it to write it for me so that I know what the stroke order is supposed to be.
I think this demo was created recently, and the old article linked in the demo is there only for background info, as the author explained in one of the comments below.
TensorFlow.js only came out this year and the interactive sketch-rnn JavaScript browser demo that this was based off of is also quite recent.
I think the reason you believe the JS code is obfuscated is that part of the code contains the “weights” of the neural network: 4-5 million floating point numbers of an LSTM recurrent neural network.
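For anyone curious why that reads as "obfuscation", here's a hypothetical sketch (not the demo's actual loading code) of the pattern: the bulk of the bundle is one enormous flat array of float literals that gets sliced and reshaped into tensors at load time.

```typescript
import * as tf from "@tensorflow/tfjs";

// In the real bundle this literal would be millions of entries long,
// which is what makes the file look like machine-generated gibberish.
const flatWeights = new Float32Array([0.0123, -0.4567, 0.89 /* ... */]);

// Slice the flat array into named parameters, assuming known layer shapes
// (the shape table here is made up for illustration).
function loadWeights(flat: Float32Array, shapes: Record<string, number[]>) {
  const tensors: Record<string, tf.Tensor> = {};
  let offset = 0;
  for (const [name, shape] of Object.entries(shapes)) {
    const size = shape.reduce((a, b) => a * b, 1);
    tensors[name] = tf.tensor(flat.slice(offset, offset + size), shape);
    offset += size;
  }
  return tensors;
}
```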
Thanks for bringing this to my attention, although a link to the Stallman article which someone linked below would have been helpful. I've installed LibreJS, I'm curious to see how many things stop working now.