
> But I'm not sure why the table is 4×4. The fourth transition from every state is unreachable since the symbols are all in [0, 2].

I think the fourth transition represents illegal Unicode. From there it stays in the illegal state until it hits legal UTF-8, then goes back to counting.




To handle ASCII, the table needs 4 states × 3 character classes. Why is it defined as 4×4?

To handle illegal chars, it would need a 4th class but also a 5th state, so that's not the reason.

Could it be to replace a 'modulo' operation with a bitwise 'and' when indexing the table?
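To make that guess concrete, here is a rough sketch of what a padded table could look like. The state names, character classes, and table contents are all made up for illustration and are not taken from the article. The only point is that with a row stride of 4 the index can be built with a shift and a mask, and the fourth column is filler that no input ever selects.

```python
# Hypothetical states and character classes; NOT the article's actual table.
WAS_SPACE, NEW_LINE, NEW_WORD, WAS_WORD = range(4)
WORD, SPACE, NEWLINE = range(3)  # only classes 0..2 are ever produced


def classify(byte):
    """Map a byte value to one of the three character classes."""
    if byte == 0x0A:
        return NEWLINE
    if byte in b" \t\r\v\f":
        return SPACE
    return WORD


# Flat 4x4 table: each row is padded to 4 entries, so the row stride is a
# power of two. The last entry in each row is padding and is never read.
TABLE = [
    # WORD     SPACE      NEWLINE   (padding)
    NEW_WORD, WAS_SPACE, NEW_LINE, WAS_SPACE,  # from WAS_SPACE
    NEW_WORD, WAS_SPACE, NEW_LINE, WAS_SPACE,  # from NEW_LINE
    WAS_WORD, WAS_SPACE, NEW_LINE, WAS_SPACE,  # from NEW_WORD
    WAS_WORD, WAS_SPACE, NEW_LINE, WAS_SPACE,  # from WAS_WORD
]


def count(data):
    """Return (lines, words, bytes) for a bytes object."""
    counts = [0, 0, 0, 0]
    state = WAS_SPACE
    for byte in data:
        cls = classify(byte)
        # Shift and mask instead of multiply and modulo; the mask also
        # keeps the index inside the 16-entry table for any class value.
        state = TABLE[(state << 2) | (cls & 3)]
        counts[state] += 1
    return counts[NEW_LINE], counts[NEW_WORD], len(data)
```

With this made-up table, `count(b"hello world\nfoo bar\n")` returns `(2, 4, 20)`. A 4×3 layout would give the same counts, so the padding only changes how the index is computed, not what the machine does.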


That isn't relevant in the ASCII example, and the UTF-8 one uses a 256×256 table.


Half of all bytes are illegal ASCII though.


And actually there isn't a state for illegal ASCII so it makes no sense to have a transition to/from it. And it still can never receive a 3 as input and use that transition.



