Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

While the interview question is a fun exercise, the complexity analysis presented is wrong.

For part 1 using a hashtable to look up the dictionary words, the complexity is not O(n^2) where n is the length of the input string. It should be O(w^2) where w is the length of the longest word in the dictionary.

The hash key material of the dictionary words has at most w characters. When computing the hash key for the input string, it can be truncated at w, as any longer input won't lead to a hash key making a match in the hashtable.

For part 1 using a trie, the complexity is O(w) rather than O(n). This is actually more obvious since walking down a trie tree will stop at the last character of a dictionary word.

Again for part2, the complexity is O(w 2^n) using bruteforce with hashtable, O(w^2 n) using DP with hashtable, and O(w n) using DP with trie.

Edit: Also for short strings like dictionary words, O(w) is kind of meanless since with SIMD or SWAR the short word can be folded into one or two DWORD. You can effectively treat it as O(1).



You are absolutely right. I should have been more clear about the dictionary size being large compared to n, and similar about the length of the longest word in it for all my analysis to follow.


Also, while trie looks better on paper than hashtable in regarding to complexity analysis, in reality hashtable can be 2X~3X faster than trie. This is because the hash key can be computed very fast with the key data fitted in a L1 cache line and with much less branching jumps in the hashing function, while trie does a separate memory load on each node going down the tree plus each character comparison causes a branching jump. It's always good to benchmark before settling on a scheme.


Both of your answers in this thread are great, but it doesn't sound like what behdad was looking for. I think the unfortunate reality is that you're supposed to try to figure out what he wants to hear, which is that his answer is the most correct one in his mind to this awesome question that's so great he violates his company's rules in order to keep asking it. This just lead to cheating, where a smart person with connections learns from a friend what behdad wants to hear, and is able to give it to him, while people who don't have these connections give a more correct answer like yours and get rejected.


This feels really weird to consider in terms of "O(..)" notation as presented to be honest. Once you get into the trie and other lookups the "n" vs "w" is not something you can ignore. The real effects of cache usage and implementation details render it true-but-meaningless as written.

I don't think the function "inDict" is even defined as either local or remote query (unless I missed it), which really influences the answers. Querying a short string against in-memory structure will be based on "n", but querying a long string against any remote database will be based on "w".




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: