Hacker News

Very cool! Seems like a Rust version of something like Bento? [1] Have you done any benchmarking against similar stream processing tools?

[1] https://github.com/warpstreamlabs/bento




I haven’t benchmarked this, but I have recently benchmarked Spark Streaming vs self-rolled Go vs Bento vs RisingWave (which is also in Rust) and RW matched/exceeded self-rolled, and absolutely demolished Bento and Spark. Not even in the same ballpark.

Highly recommend checking RisingWave out if you have real time streaming transformation use cases. It’s open source too.

The benchmark was some high throughput low latency JSON transformations.


Thanks for your recommendation.


Yes, they are similar. ArkFlow is mainly based on DataFusion, while Bento is a fork of Benthos. ArkFlow is still at an early stage and we have not run any performance comparisons yet, but I believe it will outperform them in the long run.

Benthos: https://github.com/redpanda-data/benthos

DataFusion: https://github.com/apache/datafusion


What we found with RPCN (Redpanda Connect)/old Benthos is that most systems it talks to are very slow, and only CPU-intensive components need manual CPU instruction optimizations, like the Snowflake connector we wrote (https://docs.redpanda.com/redpanda-connect/components/output...). The bulk of the work is just about completeness. Go feels like the Perl of the 2020s: cool little libs for just about everything.


Yes, RPCN (Redpanda Connect)/old Benthos is very cool and can handle most scenarios. Between you and me, I am using it too.


Arroyo is another one based on DataFusion


Yes, Arroyo is entirely based on DataFusion, but ArkFlow is not. In the future, ArkFlow will build a plugin ecosystem that lets anyone process data through plugins, not only through DataFusion.


This isn't quite correct (I'm the creator of Arroyo). We use DataFusion to implement parts of our SQL support (in particular the planner and the expression interpreter) but we have our own dataflow and operators. By contrast Synnada[0] is directly built on DF.

A contrast between Arroyo and systems like Benthos and from what I can tell ArkFlow, is that Arroyo is a "stateful" stream processing engine, which means that we can support things like windows, aggregates, and joins, with exactly-once semantics and fault tolerance, at the cost of significant additional complexity[1].

[0] https://www.synnada.ai/

[1] https://www.arroyo.dev/blog/stateful-stream-processing
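To make the stateless/stateful distinction concrete, here is a minimal Python sketch. It is not Arroyo's, ArkFlow's, or Benthos's API; the event shape and the names `stateless_transform` and `TumblingWindowCount` are hypothetical. A stateless step handles each record in isolation, while a windowed aggregate must keep per-window counts across records. This sketch keeps that state only in memory and omits the checkpointing a real engine needs for exactly-once semantics and fault tolerance.

```python
import json
from collections import defaultdict


def stateless_transform(line: str) -> str:
    """Stateless: each JSON record is handled on its own; nothing is remembered."""
    event = json.loads(line)
    event["user"] = event["user"].lower()  # hypothetical field
    return json.dumps(event)


class TumblingWindowCount:
    """Stateful: counts events per key in fixed, non-overlapping time windows.

    The per-window counts are state that must survive between records. A real
    engine would checkpoint it; this hypothetical sketch keeps it in memory only.
    """

    def __init__(self, window_size: int):
        self.window_size = window_size
        # window start -> key -> count
        self.state = defaultdict(lambda: defaultdict(int))

    def process(self, key: str, timestamp: int) -> None:
        # Assign the event to the window containing its timestamp.
        window_start = timestamp - timestamp % self.window_size
        self.state[window_start][key] += 1

    def close_windows_before(self, watermark: int) -> list:
        """Emit and drop every window that ends at or before the watermark."""
        closed = []
        for start in sorted(self.state):  # sorted() copies keys, so deleting is safe
            if start + self.window_size <= watermark:
                for key, count in sorted(self.state[start].items()):
                    closed.append((start, key, count))
                del self.state[start]
        return closed


if __name__ == "__main__":
    print(stateless_transform('{"user": "Alice", "ts": 3}'))  # {"user": "alice", "ts": 3}

    windows = TumblingWindowCount(window_size=10)
    for key, ts in [("a", 1), ("b", 4), ("a", 7), ("a", 12)]:
        windows.process(key, ts)
    print(windows.close_windows_before(10))  # [(0, 'a', 2), (0, 'b', 1)]
```

The window for timestamp 12 stays open after the first call because its end (20) is past the watermark; that lingering state is exactly what a stateful engine has to persist and recover.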


Arroyo's design is more comprehensive.


Sorry, I wasn't fully familiar with Arroyo.


No worries! We definitely rely heavily on DF (it’s an incredible project!). Part of what makes it so great is its modularity—it’s a toolkit for building sql systems, which is extremely cool.


Yes, whether it's DataFusion, Arroyo, or Benthos, I have profited a lot from these open source projects.


That made me chuckle. May you profit in all senses.

