
re: 3, Python has a native numeric array type https://docs.python.org/3/library/array.html
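For anyone who hasn't used it, here is a minimal sketch of what the `array` module gives you: a typed, homogeneous sequence whose elements are stored as raw C values rather than as individual Python objects. The variable names are just for illustration.

```python
from array import array

# 'd' is the type code for C double; elements are stored as raw machine values
values = array('d', [1.0, 2.5, 4.0])

values.append(8.0)            # list-like mutation still works
print(values.typecode)        # 'd'
print(values.itemsize)        # bytes per element (platform C double size)
print(len(values), sum(values))

# Storing a value of the wrong type is rejected instead of silently accepted
try:
    values[0] = "not a number"
except TypeError as exc:
    print("rejected:", exc)
```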


We should probably get rid of that. It is old (predating numpy) and has limited functionality. In almost every case I can think of, you would be better off with numpy.


If you don't want to add a dependency on numpy (which is a big, complex module), then it's nice to have a stdlib option. So there are certainly at least some cases where you're not better off with numpy.


Even better would be for Python to add a mainline, pandas/numpy-like, C-based table structure with a very small subset of the pandas/numpy functionality, one that's also convertible to pandas/numpy/etc.


What kind of subset would you have in mind? I think that any kind of numeric operation would be off the table, for the reasons given in PEP 465:

"Providing a quality implementation of matrix multiplication is highly non-trivial. Naive nested loop implementations are very slow and shipping such an implementation in CPython would just create a trap for users. But the alternative – providing a modern, competitive matrix multiply – would require that CPython link to a BLAS library, which brings a set of new complications. In particular, several popular BLAS libraries (including the one that ships by default on OS X) currently break the use of multiprocessing."
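To make the "trap" in that quote concrete, here is a sketch of the kind of naive nested-loop matrix multiply the PEP is warning against, written over plain lists of lists. It is correct, but every multiply-add is a Python-level operation, with none of the cache blocking or SIMD a BLAS routine relies on. `naive_matmul` is a hypothetical name, not anything in the stdlib.

```python
def naive_matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix (lists of lists) with nested loops."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    result = [[0.0] * p for _ in range(n)]
    # i-k-j loop order walks rows of b and of the result sequentially
    for i in range(n):
        row_out = result[i]
        for k in range(m):
            aik = a[i][k]
            row_b = b[k]
            for j in range(p):
                row_out[j] += aik * row_b[j]
    return result

print(naive_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```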


Numpy is incredibly widespread and basically a standard, so I would propose: it should have exactly the same memory layout as a numpy array. It's fine if it has a very limited set of operations out of the box, maybe something like get, set, and elementwise arithmetic. Work with the numpy project to make it possible to cast it to a numpy array, to help the common case where someone is fine with a dependency on numpy and wants the full set of numpy operations.


The best they can do without BLAS. Doesn't have to be as fast as numpy, just faster and more memory efficient than doing it in native Python, without the dependency.
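Roughly the API shape of the "get, set, elementwise arithmetic" subset being proposed, sketched on top of today's `array.array`. The `elementwise` helper is hypothetical; it still loops in Python, so it only gets the compact storage, not the speed. A real stdlib version would presumably do the loop in C.

```python
import operator
from array import array


def elementwise(op, x, y):
    """Apply a binary operator pairwise to two same-typed arrays (hypothetical helper)."""
    if len(x) != len(y):
        raise ValueError("length mismatch")
    if x.typecode != y.typecode:
        raise TypeError("typecode mismatch")
    # array() accepts any iterable initializer, so map() feeds it without a temp list
    return array(x.typecode, map(op, x, y))


a = array('d', [1.0, 2.0, 3.0])
b = array('d', [10.0, 20.0, 30.0])

total = elementwise(operator.add, a, b)    # array('d', [11.0, 22.0, 33.0])
product = elementwise(operator.mul, a, b)  # array('d', [10.0, 40.0, 90.0])
a[1] = 5.0                                 # plain indexed get/set
```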


It should just be native support for Apache Arrow.


A performant table data structure with powerful and convenient syntax for interaction is one great feature Q has that Python lacks.


array is for serializing to/from binary data. It isn't useful for returning from a library because the only way a python programmer can consume it is by converting into python objects, at which point there is no efficiency benefit. numpy has a library of functions for operating directly on the referenced data, as well as a cottage industry of libraries that will take a numpy array as input. Obviously someone might end up casting it to a list anyways, but there is at least the opportunity for them to not do that.
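A small sketch of that serialization use case with the stdlib: `tobytes()`/`frombytes()` move raw machine values in and out, and `byteswap()` pins a byte order for data leaving the machine. Choosing little-endian as the "wire" order here is just an assumption for the example.

```python
import sys
from array import array

# 'h' is a C signed short (2 bytes on common platforms)
samples = array('h', [0, 1000, -1000, 32767])

# Serialize: raw values in the host's native byte order, no per-element objects
payload = samples.tobytes()

# Deserialize: reinterpret the same bytes as shorts again
decoded = array('h')
decoded.frombytes(payload)

# For a fixed wire format (assumed little-endian here), swap on big-endian hosts
wire_values = array('h', samples)
if sys.byteorder == "big":
    wire_values.byteswap()
wire = wire_values.tobytes()
```

Note that actually using `decoded` from Python code (indexing, `tolist()`) creates Python int objects, which is the efficiency point made above.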


multiprocessing.shared_memory.ShareableList can be useful in some circumstances, even if you don’t intend on sharing it across processes. It allows direct access to the data, elements are mutable (to an extent; you can’t increase the size of either the overall list or its elements once built), and since the underlying shm is exposed, you can get memoryviews for zero-copy.

The downside is they’re on the more esoteric side of Python, so people may not be as familiar with them as other structures.
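A minimal sketch of using `ShareableList` locally, assuming Python 3.8+ and a platform where shared memory blocks can be created. It shows in-place mutation, the fixed-size limitation, and a zero-copy `memoryview` slice of the backing block (which has to be released before the block is closed).

```python
from multiprocessing import shared_memory

# Items may be int, float, bool, str, bytes or None; slot sizes are fixed at creation
sl = shared_memory.ShareableList([1, 2.5, "abc", None])
try:
    sl[0] = 42        # in-place mutation of an existing slot
    sl[2] = "xyz"     # fine: fits in the storage allocated for "abc"
    first, third, length = sl[0], sl[2], len(sl)

    # Existing str/bytes slots cannot grow
    try:
        sl[2] = "a much longer string than the original one"
        grow_error = None
    except ValueError as exc:
        grow_error = str(exc)

    # sl.shm.buf is a memoryview over the shared block; slicing it does not copy
    view = sl.shm.buf[:16]
    raw = bytes(view)  # copy out only when you actually need to
    view.release()     # derived views must be released before close()
finally:
    sl.shm.close()
    sl.shm.unlink()    # free the block; only do this once nobody else needs it
```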


Cython implements a C API for accessing the underlying data structure.

Arrays implement the buffer interface so they can be used efficiently with tools like numpy.
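A quick sketch of what the buffer interface means in practice, using only the stdlib: a `memoryview` refers to the array's own memory, writes go straight through, and the array refuses to resize while views are exported. A buffer-protocol consumer such as `numpy.frombuffer` can wrap the same memory in the same way, though that call is not shown here so the example stays dependency-free.

```python
from array import array

data = array('i', [1, 2, 3, 4])   # 'i' = C signed int

view = memoryview(data)           # no copy: the view refers to the array's buffer
view[0] = 100                     # writes through the view land in the array itself

raw = view.cast('B')              # reinterpret the same memory as unsigned bytes
raw_len = len(raw)

try:
    data.append(5)                # resizing is refused while buffers are exported
    resize_blocked = False
except BufferError:
    resize_blocked = True

raw.release()                     # release every exported view...
view.release()
data.append(5)                    # ...after which the array can grow again
```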



