
There are papers that omit useful info; this one certainly isn't one of them. They mention in the FAQ on their website homepage that image embedding takes 0.15 seconds on an A100.

The SAM team released the entire codebase, the model weights, the full dataset, and the details of their training recipe. I can't believe you're calling it a bad paper for not stating the embedding generation time in the paper itself. Seriously? It's a model with a few hundred million parameters producing a 256x64x64 embedding; of course it's going to take some time.
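
If you want the number on your own GPU rather than from the paper, it takes a minute to measure yourself. Here's a rough timing sketch using the public segment-anything package; the checkpoint filename and image path are placeholders for whatever you have locally, and it assumes a CUDA device:

    import time

    import cv2
    import torch
    from segment_anything import sam_model_registry, SamPredictor

    # Placeholder paths; adjust for your setup.
    CHECKPOINT = "sam_vit_h_4b8939.pth"  # ViT-H weights from the official release
    IMAGE_PATH = "frame.png"
    DEVICE = "cuda"

    sam = sam_model_registry["vit_h"](checkpoint=CHECKPOINT).to(DEVICE)
    predictor = SamPredictor(sam)

    image = cv2.cvtColor(cv2.imread(IMAGE_PATH), cv2.COLOR_BGR2RGB)

    # set_image runs the heavy ViT image encoder once and caches the
    # 256x64x64 embedding; all subsequent prompts reuse it.
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    predictor.set_image(image)
    torch.cuda.synchronize()
    print(f"image embedding: {time.perf_counter() - t0:.3f} s")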




OK, I can see everybody is bothered that I said the paper was "bad". A more productive way to put it: "Many great papers still omit information that implementors would like to see spelled out."

0.15 seconds (150 ms) would be almost good enough for me (I aim for 25 FPS on my microscope, and I achieve that with full high-quality object detection), but it's 2 seconds on my 2080 (and 2 seconds on my 3080).

These sorts of things are important for implementors (even the 0.15 seconds would have been useful to include in the paper), because we can read the paper and reject it as a solution without having to download any code or run any experiments. It even took a while to get a good answer to why inference is so slow (I was using the Python scripts, not the notebook, which mentions that image embedding is super slow).

Note they report 55 ms inference; I guess that's also on an A100. My 2080 and 3080 both take over a second to do inference after the embedding. Comparing the inference performance of the A100 vs. the 3080, it doesn't seem to make sense that the A100 is that much faster (I wonder if they are running many batches and then dividing by the batch size?).
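
For what it's worth, here is roughly how I'd time the decode step in isolation to rule out batching effects. It assumes a `predictor` set up as in the snippet upthread with set_image() already called, and the prompt point is just a placeholder; the warm-up runs and torch.cuda.synchronize() calls are there so we measure single-prompt latency rather than batched throughput:

    import time

    import numpy as np
    import torch

    # Hypothetical prompt: one foreground point (label 1) at an arbitrary location.
    point = np.array([[512, 512]])
    label = np.array([1])

    # Warm-up runs so one-off CUDA initialization doesn't skew the numbers.
    for _ in range(3):
        predictor.predict(point_coords=point, point_labels=label)

    torch.cuda.synchronize()
    t0 = time.perf_counter()
    n = 50
    for _ in range(n):
        predictor.predict(point_coords=point, point_labels=label)
    torch.cuda.synchronize()
    print(f"mask decode: {(time.perf_counter() - t0) / n * 1000:.1f} ms per prompt")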

As an ex-scientist, I've come across many well-regarded papers that omitted stating something explicitly, and it's only once the first or second reimplementor finishes that we learn something important was left out. I don't think the authors were being intentionally misleading, and I'm sure this product is nice, but so far, in my hands, it has not been that great, and it would have been nice if they had prominently stated the image embedding time, since it absolutely must run before prompt and mask decoding.


"It even took a while to get a good answer to why inference is so slow" No it didn't, if you read their paper they mention multiple times what parts take most of the time.

There is only so much one can write in a paper. The Meta team did a great job writing down every detail that matters to people trying to reproduce the results or build on their architecture. Not a single researcher I know has complained that the authors left out important details.

You are picking on tiny, trivial details that anyone in the field can figure out in a few minutes and making a big deal out of them. It is a research paper, not a product with detailed documentation or a spec sheet.



