Sammy and Lucas here. We are building an open-source framework that monitors your metrics, sends alerts when anomalies are detected and automates root cause analysis.
Think of Datadrift as a simple & open-source Monte Carlo for the semantic layer era. The repo is at https://github.com/data-drift/data-drift
Datadrift started as an internal tool built at our former company, a large European B2B Fintech. We had data reliability challenges impacting key metrics used for financial and regulatory reporting.
However, when we tried existing data quality tools, we were always frustrated. They provide row-level static testing (e.g. uniqueness or nullness checks), which does not address time-varying metrics like revenue. And commercial observability solutions cost $manyK a month and bring compliance and security overhead.
We designed Datadrift to solve these problems. You simply add a monitor where your metric is computed. Datadrift then understands how your metric is computed and which upstream tables it depends on. When an issue occurs, it pinpoints exactly which rows were updated and introduced the change.
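The row-pinpointing idea boils down to diffing two snapshots of an upstream table. Here is a minimal generic sketch using pandas (this is not Datadrift's actual API; the table and column names are made up for illustration):

```python
import pandas as pd

def diff_snapshots(before: pd.DataFrame, after: pd.DataFrame, key: str) -> pd.DataFrame:
    """Compare two snapshots of a table and flag rows that were
    added, deleted, or updated between them."""
    merged = before.merge(after, on=key, how="outer",
                          suffixes=("_before", "_after"), indicator=True)
    merged["change"] = merged["_merge"].map(
        {"left_only": "deleted", "right_only": "added", "both": "updated"}
    )
    # For rows present in both snapshots, keep only those whose values changed.
    unchanged = merged["change"].eq("updated")
    for col in before.columns:
        if col != key:
            unchanged &= merged[f"{col}_before"].eq(merged[f"{col}_after"])
    return merged.loc[~unchanged, [key, "change"]]

# Example: one row updated, one deleted, one added.
before = pd.DataFrame({"id": [1, 2, 3], "revenue": [100, 200, 300]})
after = pd.DataFrame({"id": [1, 2, 4], "revenue": [100, 250, 400]})
print(diff_snapshots(before, after, key="id"))
```

A real monitor would store daily snapshots (or use warehouse time travel) and run this diff whenever the downstream metric moves.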
You can also set up and customise alerting. For example, you can open and assign a GitHub issue to the analyst who owns the revenue metric when a +10% change is detected. We tried to make it developer friendly and easy to customise.
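The +10% rule above can be sketched as a simple threshold check that fires a callback. This is a hypothetical illustration, not Datadrift's real configuration; the placeholder would be replaced by a call to the GitHub API in practice:

```python
def check_metric(previous: float, current: float, threshold: float = 0.10):
    """Return the relative change and whether it breaches the threshold."""
    change = (current - previous) / previous
    return change, abs(change) >= threshold

def alert(metric_name: str, previous: float, current: float) -> None:
    change, breached = check_metric(previous, current)
    if breached:
        # Placeholder: in practice this could open a GitHub issue and
        # assign it to the metric's owner.
        print(f"ALERT {metric_name}: {change:+.1%} change")

alert("monthly_revenue", previous=100_000, current=112_000)  # +12% -> fires
```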
We are thinking of adding features around root cause analysis automation and issue pattern analysis to help data teams improve metric quality over time. We’d love to hear your feature requests.
I work as a data lead at a fintech company based in the EU.
I built a simple observability tool for key data assets in a data warehouse. It's a Python monitor you add to a given table; it checks that table daily and tells you when there is an issue & which rows introduced it.
We used static testing frameworks like Great Expectations, but that was not enough. We did not have the budget for the big data observability players like Monte Carlo, so we kept it simple.
Congrats on this - love the bitemporal aspect. It was a real struggle for me in past analytics roles, where we spent a lot of time recomputing key metrics 'as of' certain dates for reporting / auditing.
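The 'as of' recomputation problem can be sketched with a bitemporal table that keeps both the business date and the time each row was recorded. A minimal pandas sketch (column names are illustrative assumptions):

```python
import pandas as pd

# Each row records a metric value for a business day, plus when we learned it.
rows = pd.DataFrame({
    "business_date": ["2023-01-01", "2023-01-01", "2023-01-02"],
    "recorded_at":   ["2023-01-02", "2023-01-15", "2023-01-03"],  # Jan 15: late correction
    "revenue":       [100, 120, 90],
})

def metric_as_of(df: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Replay the table as it looked on a given date: for each business day,
    keep the latest value recorded on or before `as_of`."""
    known = df[df["recorded_at"] <= as_of]  # ISO dates compare lexicographically
    latest = known.sort_values("recorded_at").groupby("business_date").tail(1)
    return latest[["business_date", "revenue"]]

print(metric_as_of(rows, "2023-01-05"))  # Jan 1 -> 100 (correction not yet known)
print(metric_as_of(rows, "2023-01-31"))  # Jan 1 -> 120 (after the correction)
```

Keeping the `recorded_at` axis around is what makes auditable 'as of' reports a query instead of a recomputation job.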
It's more about engineering management, but 'Accelerate: Building and Scaling High Performing Technology Organizations' by Nicole Forsgren (GitHub VP), Jez Humble and Gene Kim is a must-read.
Currently doing it for an open-source metrics observability and troubleshooting tool (15 PoCs in production, no revenue yet). Committed about 30% of the amount so far, but it's tough and expectations seem to keep increasing (revenue, community traction, etc.). Curious to hear others' experiences as well!
One that comes top of mind is Swedish "startup" H2 Green Steel (https://www.h2greensteel.com/). They're building a steel plant powered by a giga-scale electrolyser to produce hydrogen (rather than using coal).
Aligned with the humble way. Have you tried the user-research angle, like "hey, I'm building XXX, thought it might be useful for you because YYY. Would you be open to trying it and giving us your feedback"? I've been doing this for a dev tool for data analysts and it works pretty well. Anyway, keep trying and good luck - been there and it's not easy.
- "automates root cause analysis" -> it means (1) showing which rows have affected the metric and (2) providing some automated context (is it an update? a delete? a dimension that changed? etc.). But it is still very early for (2).
- Metrics are defined by users in their usual "data" repository (using dbt, for example). The metric computation is not defined in Datadrift; we only read it.
- No, it's really for batch processing in a data warehouse (like hourly / daily computations)
- That's not something we had in mind (I know some dbt package can help you do this)