
The purpose of running SHA is to determine whether the input can be trusted though. But it is true that relatively few files circulate at or above that size.



And even if they are, you probably aren't going to do a digest on the whole thing all at once.


IDK, I see a lot of "non-production" code that likes to read whole files into an in-memory buffer rather than bothering with streaming. With modern memory sizes it is entirely possible to read a 4GiB file into memory without issues.
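For contrast, a minimal sketch of the streaming approach: hash the file in fixed-size chunks so memory use stays constant regardless of file size. The chunk size and function name here are illustrative, not from any particular codebase.

```python
import hashlib

def sha256_streaming(path, chunk_size=1 << 20):
    """Digest a file in 1 MiB chunks instead of f.read()-ing it whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # The walrus operator keeps the loop tight: read until EOF.
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

This produces the same digest as `hashlib.sha256(f.read()).hexdigest()`, just without materializing the whole file.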


Thinking about it, it's strange that hashlib doesn't have a function that accepts a file object as input, to make the efficient method the default pit of success. Naive implementations like hash(f.read()) are all too common. Such an API would also allow the entire tight loop to run outside of slow interpreted code.
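For what it's worth, Python 3.11 did add `hashlib.file_digest(fileobj, digest)`, which takes exactly such a file object. A rough back-port sketch for older versions might look like this (the buffer size and function name are illustrative):

```python
import hashlib

def file_digest(fileobj, algorithm="sha256", _bufsize=2**18):
    """Sketch of a hashlib.file_digest-style helper: digest a binary
    file object chunk by chunk, reusing one preallocated buffer."""
    h = hashlib.new(algorithm)
    buf = bytearray(_bufsize)
    view = memoryview(buf)
    while True:
        # readinto avoids allocating a fresh bytes object per chunk.
        n = fileobj.readinto(view)
        if not n:
            break
        h.update(view[:n])
    return h
```

Usage would be e.g. `file_digest(open(path, "rb")).hexdigest()`; anything with a `readinto` method (including `io.BytesIO`) works.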

My guess at the rationale: if you're reading a file, you usually also want the content for something else, not just its hash.



