
One reason is that some ML libraries are really slow to import, so you don't want to put them at top-level unless you definitely need them. E.g. if I had just one function that needed to use a tokenizer from the Transformers library, I wouldn't want to eat a 2 second startup cost every time:

    In [1]: %time import transformers
    CPU times: user 3.21 s, sys: 7.8 s, total: 11 s
    Wall time: 1.91 s
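The usual workaround is to push the import down into the function that needs it, so the cost is only paid if and when that code path runs. Roughly like this (the function and model name are just placeholders, not anything from the thread):

    def count_tokens(text):
        # Deferred import: only pay the ~2 s transformers startup cost
        # if this function is actually called.
        from transformers import AutoTokenizer
        tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
        return len(tokenizer.encode(text))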


I didn't think about lazy loading, and I also didn't know they were scoped differently! I thought it was some sort of organisation to keep imports close to usage. Thanks!


The scoping also has a performance advantage: locals are accessed by index in the bytecode, with all name resolution done at compile time, whereas globals require a string lookup in the module dictionary every time they're accessed.
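You can see the difference with dis (a rough sketch; the exact bytecode varies across CPython versions, but the LOAD_GLOBAL vs LOAD_FAST distinction holds):

    import dis

    x = 1

    def via_global():
        return x + x        # LOAD_GLOBAL: looked up by name in the module dict

    def via_local():
        y = 1
        return y + y        # LOAD_FAST: indexed access into the frame's locals array

    dis.dis(via_global)
    dis.dis(via_local)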

This isn't something that should matter even a little in typical ML code. But in generic Python libraries, there are cases when this kind of micro-optimization can help. Similar tricks include turning methods into pre-bound attributes in __init__ to skip all the descriptor machinery on every call.
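For illustration, the pre-binding trick looks something like this (class and method names are made up):

    class Counter:
        def __init__(self):
            self.n = 0
            # Cache the bound method in the instance dict so obj.step()
            # skips the class lookup and descriptor protocol on every call.
            self.step = self._step

        def _step(self):
            self.n += 1
            return self.n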


Curious, in what cases might this help? The compute would have to be Python-bound (not C-library-bound), and the module lookups would have to be frequent enough to be in the ballpark of the other dictionary lookups. I wonder if cases like this exist in the real world.


The case where I've seen those tricks used with measurable effect was a Python debugger - specifically, its implementation of the sys.settrace callback. Since that gets executed on every line of code, every little bit helps.

(It's much faster if you implement the callback in native code, but then that doesn't work on IronPython, Jython, etc.)
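A rough sketch of what such a callback looks like (not the actual debugger code; the default-argument trick is one way to turn module-level names into locals on the hot path):

    import sys

    WATCHED_FILE = "example.py"

    def tracefunc(frame, event, arg, _watched=WATCHED_FILE, _print=print):
        # _watched and _print are locals here, so the per-line hot path
        # avoids global and builtin dict lookups.
        if event == "line" and frame.f_code.co_filename == _watched:
            _print(frame.f_lineno)
        return tracefunc

    sys.settrace(tracefunc)   # every subsequent Python line now goes through tracefunc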


They are also scoped differently, but the lazy loading is the key thing here.



