I am not aware of any such implicit connection between "ledger" and the commutative property, and I couldn't find anything either; my google-fu is letting me down. Is there anything I can refer to? I'm generally curious about uses of the term "ledger" outside of accounting and blockchains.
I have seen it used to mean a WAL (write-ahead log) before, so I am taking this with a dose of skepticism.
It is refreshing to see multiple Arrow/DataFusion projects banking on Spark's existing, user-friendly API instead of reinventing the API all over again.
There are the likes of Comet and Blaze, which replace Spark's execution backend with DataFusion, and then you have single-process alternatives like Sail trying to settle into the "not so big data" category.
I am watching the evolution of DataFusion-powered, Spark-compatible projects with a keen eye. Early days, but quite exciting.
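To make the single-process angle concrete, here is a minimal sketch of what such an engine looks like from the client side, assuming it exposes a Spark Connect endpoint (the address and port below are made up for illustration); the appeal is that ordinary PySpark code runs against it unchanged:

    # Sketch: ordinary PySpark client pointed at a Spark Connect endpoint
    # served by a DataFusion-backed engine (e.g. Sail). The address
    # "sc://localhost:50051" is an assumption for illustration.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.remote("sc://localhost:50051").getOrCreate()

    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    df.groupBy("label").count().show()  # same Spark API, different backend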
There is Ibis[0], a fairly mature package. They recently adopted DuckDB as the default execution engine, and it gives you a nice Python dataframe API on top of DuckDB, with hot-swappability towards heavier engines.
With tools like this providing a comprehensive Python API and the ability to always fall back to raw SQL, I am not sure the DuckDB devs should focus on the Python API at all beyond basic (to_table, from_table) features.
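Rough sketch of what that workflow looks like (the table and column names are invented): the same expression API runs on the embedded DuckDB backend, and you can always drop to raw SQL against the same connection:

    # Sketch: Ibis dataframe API on the default DuckDB backend, with a
    # raw-SQL escape hatch. Table/column names here are invented.
    import ibis

    con = ibis.duckdb.connect()  # in-memory DuckDB
    t = con.read_parquet("events.parquet", table_name="events")

    # Dataframe-style expression, compiled to SQL and executed by DuckDB
    daily = t.group_by("event_date").aggregate(n=t.count())
    print(daily.execute())

    # Fall back to raw SQL against the same backend when needed
    print(con.sql("SELECT count(*) AS n FROM events").execute())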
Impressive progress and a real chance to shake up the data tool market, but still a way to go:
There is still much to do, especially on large table formats (Iceberg/Delta) and memory management when running on bigger boxes in the cloud. E.g. the elusive "Failed to allocate ..." bug[1] is an inhibitor to the claim that big data is dead[2]. As it is, we tried and abandoned DuckDB as a cheaper replacement for some Databricks batch jobs.
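For context, the standard knobs on that front are DuckDB's memory cap and spill directory; a minimal sketch follows (memory_limit and temp_directory are real DuckDB settings, the sizes and paths are placeholders), which helps with some workloads but not all, since out-of-core support varies by operator:

    # Sketch: cap DuckDB's memory and allow spilling to disk.
    # memory_limit / temp_directory are standard DuckDB settings;
    # the specific values are placeholders.
    import duckdb

    con = duckdb.connect("analytics.db")
    con.execute("SET memory_limit = '8GB'")
    con.execute("SET temp_directory = '/mnt/scratch/duckdb_spill'")

    # Large aggregations can now spill to the temp directory instead of
    # failing outright, though not every operator can run out-of-core.
    con.execute(
        "SELECT some_key, count(*) FROM read_parquet('big/*.parquet') GROUP BY some_key"
    )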
It's great to have a single entry point for multiple backends. What I am trying to understand, and couldn't find much information on: how does the use of multiple engines in Ibis impact the consistency of results for the same input and query, particularly given the semantic differences among the engines?
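To make the concern concrete, here is the kind of engine-level semantic difference I mean, using two engines that are easy to run locally (DuckDB and SQLite); the same literal query gives different answers because the engines disagree on what "/" means:

    # Sketch of an engine-level semantic difference: division.
    # DuckDB's "/" performs float division; SQLite's "/" truncates to an integer.
    import duckdb
    import sqlite3

    print(duckdb.sql("SELECT 1/2 AS x").fetchone())                           # (0.5,)
    print(sqlite3.connect(":memory:").execute("SELECT 1/2 AS x").fetchone())  # (0,)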
Is there a way for the general public to see the status of open sluice gates and water levels in various parts of the Netherlands? A live data stream might be best!
Rijkswaterstaat publishes quite a lot of information online. Water levels can be seen on a map[1] on waterinfo.rws.nl. It also offers access to historical data in CSV format, but that is handled through a wizard and you apparently get the data by email.
Information about sluices, bridges, etc. can be found on vaarweginformatie.nl, including the live status of many of them. The IJmuiden sluice complex doesn't seem to have a live status, though. See [2] for the map.
I guess 99% of the population would not be qualified to tell whether an open sluice gate is problematic under given circumstances. If it were as easy as "sluice gate open && water level > X", I am 100% sure there would already be an automation for it.
Druid has quite a bit of intelligence baked in to handle scaling by default. I am curious how ClickHouse does in all of those aspects.
When we did a PoC, ClickHouse's operational story and performance were severely lacking compared to Druid, even though ClickHouse had more resources at its disposal during the PoC.
If they could improve the operational side and introduce sensible defaults, so that users don't have to wade through 10,000 configuration options to work with data in ClickHouse, I'm sure I would give it a go for another use case. It is simple on the surface, but the devil is in the details. Druid is much simpler and saner at the scale I need to operate.