Hacker News | ClaireGz's comments

We've heard of Julius a lot, but didn't know about Cipher42; there are a few other folks around. We feel there's real pain here, and data teams are a bit abandoned at the moment when it comes to working with AI, so it makes sense. Curious to hear feedback about your journey building Cipher42: did you stop working on it?


Oh yes! That should be fairly easy: DuckDB is coming in the next release, and we can add SQLite too. I'm guessing you use SQLite to develop locally?


I can't really tell which databases are coming soon; a hover tooltip over the icons would be nice. Is SQL Server coming anytime soon? My coworkers are doing some data integrity work right now, and this might be a nice tool for them.


It's Databricks, Iceberg, and Redshift, which were the most requested in the first survey we did. But judging by this post and a broader audience, it seems SQLite wins! We'll also add SQL Server to the list.


Yes, I second this. I use SQLite locally and for prototyping data designs, so SQLite support is very useful indeed - not a deal breaker, but definitely a tick item.


Will also give nao a shot as soon as this is shipped. A LOT of non-corp data work happens in SQLite and DuckDB.
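The kind of local, non-corp work the comment describes is typically a few ad-hoc queries against a SQLite file. A minimal sketch with Python's stdlib `sqlite3` (the table and column names are made up for illustration):

```python
import sqlite3

# In-memory database standing in for a local SQLite file.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 120.0), (2, "US", 80.0), (3, "EU", 40.0)],
)

# A quick aggregate, the kind of ad-hoc local analysis in question.
rows = con.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('EU', 160.0), ('US', 80.0)]
```

DuckDB covers the same workflow for analytical queries over Parquet/CSV files, which is why the two so often appear together in local data work.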


Yes, just local, but I'd love to use Nao to quickly analyze datasets.


We support Postgres (and DuckDB is coming very soon), so yes, probably, since Hydra is a mix of both - but I'll have to try it.


Sweeeet. Let's give it a go!


Thank you!


We say "data vibing" so it feels unique to the data community! But in all seriousness, this is already an issue, people are already asking ChatGPT (or Cursor/whatever else) to generate SQL for them, but the next steps do not exist, if you "vibe code" for data you want to have the easiest feedback loop you can get to check if the output is good, and that's what we are working on: identifying the downstream impacts in the IDE and proposing fixes, a table diff view, new UI/UX to test your outputs.

The goal for us is to be the best way to do data with AI.


Ok, but how do you know it’s good?

With data I think that is very hard. I once wrote a SQL query (without AI) which ran and showed what looked like correct numbers, only to realise years later that it was incorrect.

When doing more complex calculations, I am not clear how to check if the output is correct.


Usually what we've seen is data people keeping notebooks/worksheets on the side with a bunch of manual SQL queries that they run to validate data consistency. The process is highly manual and time consuming. Most of the time, teams know what kind of checks they want to run to validate the data; our goal is to give them the best toolbox to do it, in the IDE.

That said, I'd say this is like writing tests in software: you can't catch everything the first time (even at 100% code coverage), especially in data, where most of the time things break because of upstream producers.

In the near future it will still require observability tools monitoring the data live.
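The "worksheet of manual checks" pattern described above can be sketched as a handful of SQL queries, each expected to return zero violating rows. The table, column, and check names here are illustrative, not from any real project:

```python
import sqlite3

# Sample data containing two deliberate problems: a NULL email
# and a duplicated id.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER, email TEXT, signup_date TEXT)")
con.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "a@x.com", "2024-01-01"),
        (2, None, "2024-01-02"),
        (2, "b@x.com", "2024-01-03"),
    ],
)

# Each consistency check counts violating rows; 0 means it passes.
checks = {
    "no_null_emails": "SELECT COUNT(*) FROM users WHERE email IS NULL",
    "unique_ids": (
        "SELECT COUNT(*) FROM "
        "(SELECT id FROM users GROUP BY id HAVING COUNT(*) > 1)"
    ),
}

failures = {
    name: count
    for name, sql in checks.items()
    if (count := con.execute(sql).fetchone()[0]) > 0
}
print(failures)  # {'no_null_emails': 1, 'unique_ids': 1}
```

Running such a dictionary of checks after every model change is exactly the loop that stays manual today; tooling can run it automatically but, as the comment notes, can't catch breakage originating upstream.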


Thank you!


Great! Let us know if it's how you imagined it when you try it.


Thanks for the kind comments, he's surely a great guy :)


When it comes to SQL writing, we are more relevant. When it comes to speed, it's hard to benchmark exactly against Cursor and Windsurf, but we are obviously a bit slower (around 600ms on average), and we know what we have to improve to speed it up.

Next on the list is a next-edit suggestion dedicated to data work, especially with dbt (or SQL transformations), where changing one query means you have to change the downstream queries as well.
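The downstream-impact problem above can be shown in miniature: an upstream table changes, and a query written against the old schema breaks at run time. A minimal sketch using SQLite (3.25+ for `RENAME COLUMN`); the staging-table and column names are illustrative, not dbt-specific:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stg_orders (order_id INTEGER, amount REAL)")

# Upstream change: a column is renamed.
con.execute("ALTER TABLE stg_orders RENAME COLUMN amount TO amount_usd")

# Downstream query, still referencing the old column name.
downstream_sql = "SELECT SUM(amount) FROM stg_orders"
try:
    con.execute(downstream_sql)
except sqlite3.OperationalError as e:
    print("downstream query broke:", e)  # no such column: amount
```

Surfacing this breakage at edit time, rather than at run time, is the point of a data-aware next-edit suggestion.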


Thank you! Actually, this is exactly what we target: we've seen that data teams often have a longer feedback loop than software engineers. Our goal is to shorten it and bring data as close as possible to your dev flow.

