I'm building a new tool for end-to-end data validation and reconciliation in ELT pipelines, especially for teams replicating data from relational databases (Postgres, MySQL, SQL Server, Oracle) to data warehouses or data lakes.
Most existing solutions only validate at the destination (dbt tests, Great Expectations), rely on aggregate comparisons (row counts, checksums), or generate too much noise (alert fatigue from observability tools). My tool:
* Validates every row and column directly between source and destination
* Handles live source changes without false positives
* Eliminates noise by distinguishing in-flight changes from real discrepancies
* Detects even the smallest data mismatches without relying on thresholds
* Performs efficiently, with an IO-bound, bandwidth-efficient comparison algorithm (rough sketch after this list)
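To give a rough sense of the row-and-column comparison idea (this is a minimal illustrative sketch, not the tool's actual implementation; the data, table shape, and helper names are hypothetical, and the real algorithm also has to account for in-flight source changes and streaming hashes over key ranges to stay bandwidth-efficient):

```python
# Sketch: compare per-row digests from a source and a destination and report
# only rows that are missing, extra, or mismatched. Assumes rows are tuples
# whose first element is the primary key.
import hashlib
from typing import Dict, Iterable, List, Tuple

Row = Tuple  # (primary_key, col1, col2, ...)

def row_digest(row: Row) -> str:
    """Hash all column values of a row into a compact digest."""
    payload = "\x1f".join("" if v is None else str(v) for v in row)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def hash_map(rows: Iterable[Row]) -> Dict[object, str]:
    """Map primary key -> digest of the full row."""
    return {row[0]: row_digest(row) for row in rows}

def diff_tables(source_rows: Iterable[Row], dest_rows: Iterable[Row]) -> Tuple[List, List, List]:
    """Return keys missing from, extra in, or mismatched at the destination."""
    src, dst = hash_map(source_rows), hash_map(dest_rows)
    missing = sorted(k for k in src if k not in dst)
    extra = sorted(k for k in dst if k not in src)
    mismatched = sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k])
    return missing, extra, mismatched

if __name__ == "__main__":
    source = [(1, "alice", 10.0), (2, "bob", 20.0), (3, "carol", 30.0)]
    dest = [(1, "alice", 10.0), (2, "bob", 25.0)]  # one drifted value, one missing row
    print(diff_tables(source, dest))
    # -> ([3], [], [2]): row 3 never arrived, row 2 has a value mismatch
```

In practice you would pull digests in primary-key ranges from each side rather than full rows, so only hashes cross the network, which is where the bandwidth efficiency comes from.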
If you're dealing with data integrity issues in ELT workflows, I'd love to hear about your challenges!