
hmmm... 1000x? how does it scale though?



The explanation is actually very simple; you should just read it! But basically they generate variants of the word by deletions only, not insertions or substitutions as well. So each increase in "edit distance" multiplies the search space by roughly n (the length of the word) instead of by n + an + a(n+1) (where a is the size of the alphabet). There's a cost in storing a larger dictionary of words, but since lookup is logarithmic or better, that should be fine. So yes, it scales better (at a memory cost).
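
For intuition, here's a toy sketch of the deletes-only idea in Python. This is my own illustration, not the actual SymSpell code; the names (deletes, build_delete_index, lookup) are made up, and the real library additionally verifies true edit distance and ranks candidates by word frequency:

  # Toy sketch of the symmetric-delete idea; NOT the actual SymSpell code.

  def deletes(word, max_distance):
      """All strings reachable from `word` by up to `max_distance` deletions."""
      variants = {word}
      frontier = {word}
      for _ in range(max_distance):
          frontier = {w[:i] + w[i+1:] for w in frontier for i in range(len(w))}
          variants |= frontier
      return variants

  def build_delete_index(dictionary, max_distance):
      """Precompute: map every delete-variant of every dictionary word
      back to the word(s) it came from."""
      index = {}
      for word in dictionary:
          for variant in deletes(word, max_distance):
              index.setdefault(variant, set()).add(word)
      return index

  def lookup(query, index, max_distance):
      """Candidates: any dictionary word sharing a delete-variant with the
      query. (Real SymSpell then verifies the true edit distance.)"""
      candidates = set()
      for variant in deletes(query, max_distance):
          candidates |= index.get(variant, set())
      return candidates

  index = build_delete_index(["hello", "help", "world"], max_distance=2)
  print(lookup("helo", index, max_distance=2))  # {'hello', 'help'} (order may vary)

The symmetry is the trick: if a delete-variant of the query matches a delete-variant of a dictionary word, the two are within the distance bound, because a deletion on one side plays the role of an insertion (or half of a substitution) on the other. That's why a deletes-only index still catches insertion and substitution errors.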


The best way to know is to look at the code itself. https://github.com/wolfgarbe/symspell/blob/master/symspell.c...


It's easy to say "1000x" when they don't mention the baseline :)

Actually, even after reading the article I'm not completely certain where that 1000x comes from; I suppose it's compared to the naive approach.


I think it's compared to Peter Norvig's approach:

> This is three orders of magnitude less expensive (36 terms for n=9 and d=2)

Three orders of magnitude is 1000. Peter Norvig's approach was described above with:

> still expensive at search time (114,324 terms for n=9, a=36, d=2)

So 114,324 / 36 ≈ 3,176, which is indeed "three orders of magnitude", and I suppose he went conservative by saying "1000x" rather than "3000x".
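
Note that those figures count generated terms; counting distinct strings gives somewhat smaller numbers, but the same order of magnitude. A quick back-of-the-envelope check in Python (my own throwaway code, using the a=36 alphanumeric alphabet from the quote and leaving out transpositions):

  # Rough check of the ratio, counting DISTINCT strings, so the numbers come
  # out a bit below the article's term counts; the order of magnitude holds.
  import string

  ALPHABET = string.ascii_lowercase + string.digits  # a = 36

  def edits1(word):
      """Norvig-style distance-1 candidates: deletes, replaces, inserts."""
      splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
      deletes = {L + R[1:] for L, R in splits if R}
      replaces = {L + c + R[1:] for L, R in splits if R for c in ALPHABET}
      inserts = {L + c + R for L, R in splits for c in ALPHABET}
      return deletes | replaces | inserts

  def deletes1(word):
      return {word[:i] + word[i+1:] for i in range(len(word))}

  word = "something"  # n = 9, all letters distinct
  full = set().union(*(edits1(w) for w in edits1(word)))      # d <= 2
  dels = set().union(*(deletes1(w) for w in deletes1(word)))  # exactly d = 2
  print(len(dels))  # 36, matching the article's figure for deletes-only
  print(f"{len(full)} full candidates, ratio ~{len(full) / len(dels):.0f}x")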



