
hmmm... 1000x? how does it scale though?



The explanation is actually very simple; you should just read it! But basically they generate variants of the word by deletions only, not insertions or substitutions as well. So each increase in "edit distance" multiplies the search space by roughly n (the length of the word) instead of by n + an + a(n+1) (where a is the size of the alphabet). There's a cost in storing a larger dictionary of words, but since lookup is logarithmic or better, that should be fine. So yes, it scales better (at a memory cost).
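
For intuition, here's a toy sketch of the deletes-only idea in Python. This is my own illustration, not the actual SymSpell code; the names (deletes, build_delete_index, lookup) are made up, and the real library additionally verifies true edit distance and ranks candidates by word frequency:

  # Toy sketch of the symmetric-delete idea; NOT the actual SymSpell code.

  def deletes(word, max_distance):
      """All strings reachable from `word` by up to `max_distance` deletions."""
      variants = {word}
      frontier = {word}
      for _ in range(max_distance):
          frontier = {w[:i] + w[i+1:] for w in frontier for i in range(len(w))}
          variants |= frontier
      return variants

  def build_delete_index(dictionary, max_distance):
      """Precompute: map every delete-variant of every dictionary word
      back to the word(s) it came from."""
      index = {}
      for word in dictionary:
          for variant in deletes(word, max_distance):
              index.setdefault(variant, set()).add(word)
      return index

  def lookup(query, index, max_distance):
      """Candidates: any dictionary word sharing a delete-variant with the
      query. (Real SymSpell then verifies the true edit distance.)"""
      candidates = set()
      for variant in deletes(query, max_distance):
          candidates |= index.get(variant, set())
      return candidates

  index = build_delete_index(["hello", "help", "world"], max_distance=2)
  print(lookup("helo", index, max_distance=2))  # {'hello', 'help'} (order may vary)

The symmetry is the trick: if a delete-variant of the query matches a delete-variant of a dictionary word, the two are within the distance bound, because a deletion on one side plays the role of an insertion (or half of a substitution) on the other. That's why a deletes-only index still catches insertion and substitution errors.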


The best way to know is to look at the code itself. https://github.com/wolfgarbe/symspell/blob/master/symspell.c...


It's easy to say "1000x" when they don't mention the baseline :)

Actually, even after reading the article I'm not completely certain where that 1000x comes from; I suppose it's compared to the naive approach.


I think it's compared to Peter Norvig's approach:

> This is three orders of magnitude less expensive (36 terms for n=9 and d=2)

Three orders of magnitude is 1000. Peter Norvig's approach was described above with:

> still expensive at search time (114,324 terms for n=9, a=36, d=2)

So 114,324 / 36 ≈ 3,176, which is indeed "three orders of magnitude", and I suppose he went conservative by saying "1000x" rather than "3000x".
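
Note that those figures count generated terms; counting distinct strings gives somewhat smaller numbers, but the same order of magnitude. A quick back-of-the-envelope check in Python (my own throwaway code, using the a=36 alphanumeric alphabet from the quote and leaving out transpositions):

  # Rough check of the ratio, counting DISTINCT strings, so the numbers come
  # out a bit below the article's term counts; the order of magnitude holds.
  import string

  ALPHABET = string.ascii_lowercase + string.digits  # a = 36

  def edits1(word):
      """Norvig-style distance-1 candidates: deletes, replaces, inserts."""
      splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
      deletes = {L + R[1:] for L, R in splits if R}
      replaces = {L + c + R[1:] for L, R in splits if R for c in ALPHABET}
      inserts = {L + c + R for L, R in splits for c in ALPHABET}
      return deletes | replaces | inserts

  def deletes1(word):
      return {word[:i] + word[i+1:] for i in range(len(word))}

  word = "something"  # n = 9, all letters distinct
  full = set().union(*(edits1(w) for w in edits1(word)))      # d <= 2
  dels = set().union(*(deletes1(w) for w in deletes1(word)))  # exactly d = 2
  print(len(dels))  # 36, matching the article's figure for deletes-only
  print(f"{len(full)} full candidates, ratio ~{len(full) / len(dels):.0f}x")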



