
Can you explain what it means to have native support? Are you saying you hook into Python's file I/O to implement remote I/O? Or is the filesystem mounted via FUSE or another technology? I don't think Pandas has any HopsFS-specific code in it.



"In Pandas, the only change we need to make to our code, compared to a local filesystem, is to replace open_file(..) with h.open_file(..), where h is a file handle to HDFS/HopsFS."

HopsFS is a drop-in replacement for HDFS. Here are some more details on native HDFS connectors in Python by Wes McKinney: http://wesmckinney.com/blog/python-hdfs-interfaces/


I think what you are saying is that you're wire-compatible with HDFS, and Pandas has access libraries (special file objects) that support HDFS.
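To make the "special file objects" point concrete, here is a minimal sketch. It assumes the parsing code only needs a file-like object. `read_rows` and `fake_hdfs_open` are hypothetical names, and the in-memory `io.BytesIO` merely stands in for the handle an HDFS/HopsFS client would return. No real HDFS library is called. The sketch uses the stdlib `csv` module, but `pandas.read_csv` likewise accepts any object with a `read()` method.

```python
import csv
import io
from typing import IO, Callable, List


def read_rows(open_file: Callable[[str], IO[bytes]], path: str) -> List[List[str]]:
    """Parse a CSV from whichever opener is supplied.

    The parsing code never learns where the bytes live; only the opener
    differs between a local filesystem and a remote client.
    """
    # Wrap the binary stream as text for the csv module; closing the wrapper
    # also closes the underlying stream.
    with io.TextIOWrapper(open_file(path), encoding="utf-8", newline="") as text:
        return list(csv.reader(text))


# Hypothetical stand-in for a remote client's open_file(..): it serves bytes
# from memory instead of talking to HDFS/HopsFS.
_FAKE_STORE = {"/data/train.csv": b"x,y\n1,2\n3,4\n"}


def fake_hdfs_open(path: str) -> IO[bytes]:
    return io.BytesIO(_FAKE_STORE[path])


rows = read_rows(fake_hdfs_open, "/data/train.csv")
print(rows)  # [['x', 'y'], ['1', '2'], ['3', '4']]
```

Locally you would pass `lambda p: open(p, "rb")` instead. Nothing else in the parsing code changes, which is what "replace open_file(..) with h.open_file(..)" amounts to. The catch is that any code which does not accept a file object, or which needs more than the client library exposes, still has to be adapted.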


In my opinion, CephFS and GlusterFS have much better support for Pandas and similar tools: you don't need to change any code between local development and distributed training, other than a path.

Second, the Python library for HDFS provides only a subset of the filesystem API, which limits what you can do with it.
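A minimal sketch of the mounted-filesystem approach, assuming CephFS or GlusterFS is mounted as an ordinary POSIX directory. The `DATA_ROOT` variable, the `/mnt/cephfs/data` mount point, and the helper names are hypothetical. The code only ever sees a path and uses the built-in `open()` and `pathlib`, so moving between a laptop and the cluster means changing one setting. A temporary directory stands in for the mount in the demo.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical setting: "./data" on a laptop, e.g. "/mnt/cephfs/data" on a
# cluster where CephFS/GlusterFS is mounted as a normal POSIX directory.
DATA_ROOT = Path(os.environ.get("DATA_ROOT", "./data"))


def dataset_path(name: str, root: Path = DATA_ROOT) -> Path:
    """Build a dataset path; the root is the only thing that differs per environment."""
    return root / name


def count_lines(path: Path) -> int:
    """Read with the built-in open(); no filesystem-specific client is needed."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for _ in f)


# Demo: a temporary directory stands in for either a local folder or a mount.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    p = dataset_path("train.csv", root)
    p.write_text("x,y\n1,2\n3,4\n", encoding="utf-8")
    n = count_lines(p)
    # Ordinary directory operations work too, because it is just a directory.
    names = sorted(child.name for child in root.iterdir())

print(n, names)  # 3 ['train.csv']
```

Because the mount behaves like a local directory, the rest of the standard library (listing, renaming, seeking) is available as well, rather than only what a client library chooses to expose.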



