>What would be your language of choice for the next gen, stable distributed file system?
Here's my heavily biased subjective opinion on this entirely hypothetical software:
I think we should do one or both of two things:
A) Do it in very clean, fast, simple C. Put an emphasis on speed and simplicity.
B) Do it in very reliable, secure, simple Haskell. Put an emphasis on correctness and simplicity.
With some effort, the C one could be correct and the Haskell one could be fast.
I mention these two languages because they compile to native code and have very good cross-platform support. You won't have any trouble running either of these on embedded devices (which I can't say for Java or Go. Go has some weird compiler bugs on ARM platforms, and the JVM is frequently too memory intensive for embedded). C has an advantage of allowing the absolute minimal implementation, and Haskell has an advantage of allowing a massively concurrent implementation. Yada yada yada
Of course, it could be that the question is completely irrelevant. Just define a spec for a DFS, and then let different implementations pop up in whatever language is best suited to that implementation's specific details.
>You won't have any trouble running either of these on embedded devices (which I can't say for Java or Go. Go has some weird compiler bugs on ARM platforms, and the JVM is frequently too memory intensive for embedded).
Why is this important in this use-case? If the DFS is being used for data processing then presumably the nodes are reasonably capable machines.
There may well be a different use case for a DFS aimed at embedded and resource-constrained devices. That's not what Google or Hadoop is doing, though.
The biggest limiting factor, even in our relatively low-density rack, is heat and power. With off-the-shelf servers and relatively low density, I can trivially exceed the highest power allocation our colo provider will normally allow per rack. The more power you waste on inefficient CPU usage, the less you can devote to putting more drives in.
The OP's claim is that memory is the limiting factor in the case of Java. I don't entirely agree, but even if I did it would almost certainly be a fixed overhead per machine, and unlikely to be a problem on server-class machines.
Also, the read/processing characteristics of compute nodes often means the CPU is underutilized while filesystem operations are ongoing.
I will leave with an elliptical meta-comment: those whose competitive advantage lies in others not getting it right have little interest in correcting misconceptions. You might be interested in this anecdote: https://news.ycombinator.com/item?id=7948170
But how much of that is Java, and how much is Hadoop?
Spark runs on the JVM, and runs much, much faster than Hadoop on similar workloads (yes, I understand it isn't just doing Map/Reduce, but the point is that Java doesn't seem to be a performance limitation in itself).
Indeed, and as I said, it did surprise me that Hadoop was so much slower. But the buck really stops at resources consumed per dollar of usable results produced, and by that measure Java is going to consume a whole lot more. At large scales, running costs far exceed development costs. BTW, my point was not only about Java but also about your assessment of the hardware.
CPU and memory resources spent on an inefficient filesystem implementation are just wasted resources, not available for your workload. Keep in mind that the inefficiencies are multiplied over all your cluster nodes.
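To put rough, purely illustrative numbers on that (mine, not measured from any real system): an extra 2 GB of heap and half a core of overhead per node works out to 2 TB of RAM and 500 cores across a 1,000-node cluster, all capacity you paid for but can never hand to the actual workload.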
I don't think large-scale distributed file systems written in C are hypothetical. I'm pretty sure this is exactly what MapR has done: replace the Java-based HDFS with a C implementation while retaining the API. GlusterFS, now maintained by Red Hat, is another DFS written in C.
As someone who is currently implementing a next-gen distributed file system, I can highlight one aspect: you have a lot of concurrency and asynchronous processing. Thus you need at least reference counting.
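To make that concrete, here is a minimal sketch (my own illustration with hypothetical names like block_ref/block_unref, not code from any existing DFS) of atomic reference counting in C11: every asynchronous callback that may outlive its caller takes a reference, and whichever context drops the last one frees the block.

    /* Hypothetical sketch: a shared block touched by many concurrent
       I/O callbacks, kept alive by an atomic reference count. */
    #include <stdatomic.h>
    #include <stdlib.h>

    struct block {
        atomic_int refcount;
        size_t len;
        unsigned char *data;
    };

    struct block *block_new(size_t len) {
        struct block *b = malloc(sizeof *b);
        if (!b) return NULL;
        atomic_init(&b->refcount, 1);   /* creator holds the first reference */
        b->len = len;
        b->data = malloc(len);
        if (!b->data) { free(b); return NULL; }
        return b;
    }

    void block_ref(struct block *b) {
        atomic_fetch_add_explicit(&b->refcount, 1, memory_order_relaxed);
    }

    void block_unref(struct block *b) {
        /* acq_rel so the freeing thread sees all writes made by other holders */
        if (atomic_fetch_sub_explicit(&b->refcount, 1, memory_order_acq_rel) == 1) {
            free(b->data);
            free(b);
        }
    }

Each async operation would call block_ref before being queued and block_unref in its completion handler. In Haskell the GC absorbs exactly this bookkeeping, which is part of the control-vs-correctness trade-off discussed above.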
Can you really do Haskell on embedded? I thought that abstracting so far away from memory as a concern made it pretty much a non-starter for the foreseeable future.
Embedded meaning "ARM running an OS", yes. Embedded meaning "OS-less microcontroller", not so much. You'd have to use an embedded programming DSL for that, which isn't really Haskell anymore.
ATS will probably be an interesting best-of-both-worlds third option soon, though from what little I've seen of it, it is currently harder to write code in than either Haskell or C. But once you do put in the work to write your proofs etc., both correctness and speed should fall out naturally.