
I'm building a new tool for end-to-end data validation and reconciliation in ELT pipelines, especially for teams replicating data from relational databases (Postgres, MySQL, SQL Server, Oracle) to data warehouses or data lakes.

Most existing solutions only validate at the destination (dbt tests, Great Expectations), rely on aggregate comparisons (row counts, checksums), or generate too much noise (alert fatigue from observability tools). My tool:

* Validates every row and column directly between source and destination
* Handles live source changes without false positives
* Eliminates noise by distinguishing in-flight changes from real discrepancies
* Detects even the smallest data mismatches without relying on thresholds
* Performs efficiently with an IO-bound, bandwidth-efficient algorithm (a rough sketch of the idea follows this list)
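
To give a rough idea of the shape of the comparison (not my exact algorithm), here is a toy sketch of a chunked, digest-based diff. The `users` table, the chunk size, and the SQLite stand-ins for the source and destination are purely illustrative:

    # Toy sketch: compare a "source" and a "destination" table chunk by chunk.
    # SQLite stands in for both ends; a real pipeline would point these
    # connections at e.g. Postgres and the warehouse.
    import hashlib
    import sqlite3

    CHUNK_SIZE = 2  # tiny for the demo; real chunks would span thousands of rows

    def make_table(rows):
        """Build an in-memory table loaded with (id, email) rows."""
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
        conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
        conn.commit()
        return conn

    def fetch_rows(conn, lo, hi):
        """Full rows whose primary key falls in [lo, hi)."""
        return conn.execute(
            "SELECT id, email FROM users WHERE id >= ? AND id < ? ORDER BY id",
            (lo, hi),
        ).fetchall()

    def chunk_digest(conn, lo, hi):
        """Digest of rows whose primary key falls in [lo, hi).
        In a real setup this would be computed in-database so that only
        the digest, not the rows, crosses the network."""
        return hashlib.sha256(repr(fetch_rows(conn, lo, hi)).encode()).hexdigest()

    source = make_table([(1, "a@x.com"), (2, "b@x.com"), (3, "c@x.com"), (4, "d@x.com")])
    dest = make_table([(1, "a@x.com"), (2, "b@x.com"), (3, "c@x.com"), (4, "STALE")])

    suspect_rows = []
    for lo in range(1, 5, CHUNK_SIZE):
        hi = lo + CHUNK_SIZE
        if chunk_digest(source, lo, hi) == chunk_digest(dest, lo, hi):
            continue  # whole chunk matches; skip it without pulling any rows
        # Only mismatching chunks pay the cost of a full row-by-row comparison.
        dest_by_id = dict(fetch_rows(dest, lo, hi))
        for row_id, email in fetch_rows(source, lo, hi):
            if dest_by_id.get(row_id) != email:
                suspect_rows.append(row_id)

    print("rows to re-check:", suspect_rows)  # -> rows to re-check: [4]
    # A production validator would re-read suspect rows after the replication
    # lag window to separate in-flight changes from genuine discrepancies.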

If you're dealing with data integrity issues in ELT workflows, I'd love to hear about your challenges!



This sounds interesting. Is this meant to run in pipelines or be used interactively?

Are you building something open source? Link to the repo?


It’s meant to run in or alongside the pipeline continuously.

Not planning to open source it; I'm working on a commercial offering but haven't launched anything publicly yet.

Would love to hear any more thoughts on the concepts here; my email is in my bio.



