Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I'm not actually sure this makes for a good interview question. Doesn't it mostly just test whether you've heard of the alias method?

Btw, a slightly related question:

Supposed you have a really long text file, how would you randomly sample a line? Such that all lines in the text file have the exactly same probability. Ideally, you want to do this without spending O(size of file) time preprocessing.

(I don't think this is a good interview question, but it is an interesting question.)

One way: sample random characters until you randomly hit a newline. That's the newline at the end of your line.




It’s a retired question (so I’m not really disagreeing that it wasn’t very good), and no one was expected to get the alias tables (if they did, just ask for updateable weights) and in fact there isn’t even much point in telling people about them as they can then get the impression they failed the interview. The point is more to get some kind of binary search and understanding of probability.

The Monte Carlo method you propose probably works for files where there are many short lines but totally fails in the degenerate case of one very long line. It also may not really work that well in practice because most of the cost of reading a random byte is reading a big block from disk, and you could likely scan such a block in ram faster than you could do the random read of the block from disk.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: