
Meta does it another way. Instead of one giant data warehouse or a collection of DW silos, build a data platform API stack with heterogeneous storage adapters underneath, plus privacy policies, regional-locality policies, and retention policies, supporting heterogeneous D*L operations. This sidesteps duplicating and denormalizing data and maximizes data discovery, reporting, and reuse. And while GraphQL can't be all things to all people, it's pretty damn good. If you need {MySQL,PostgreSQL,{{other_thing}}}-compatible or REST APIs, build them the same way.
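
Roughly what I mean by the adapter layer, as a sketch in TypeScript (all the names here are made up for illustration, not Meta internals):

    // Illustrative adapter layer: one platform API over heterogeneous storage.
    type Row = Record<string, unknown>;

    interface StorageAdapter {
      read(entity: string, id: string): Promise<Row | null>;
      write(entity: string, id: string, fields: Row): Promise<void>;
    }

    // Stand-in adapter; a real one would wrap MySQL, a column store, a KV service, etc.
    class InMemoryAdapter implements StorageAdapter {
      private rows = new Map<string, Row>();
      async read(entity: string, id: string): Promise<Row | null> {
        return this.rows.get(`${entity}:${id}`) ?? null;
      }
      async write(entity: string, id: string, fields: Row): Promise<void> {
        this.rows.set(`${entity}:${id}`, fields);
      }
    }

    // The API layer (e.g. a GraphQL resolver) only sees the adapter registry,
    // so callers never know or care which engine actually holds an entity.
    const adapters: Record<string, StorageAdapter> = {
      user_profile: new InMemoryAdapter(),
      ad_metrics: new InMemoryAdapter(),
    };

    async function resolveRecord(entity: string, id: string): Promise<Row | null> {
      const adapter = adapters[entity];
      if (!adapter) throw new Error(`no adapter registered for ${entity}`);
      return adapter.read(entity, id);
    }

The point is that swapping the engine behind an entity is an adapter change, not a schema migration for every consumer.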

ETL should be minimized (except for external data, which is itself a bad sign: data owned or managed by a third party) and replaced with the equivalent of dynamic or materialized "views". Prefer to create hygienic "views" over the original data rather than mutating and destroying that original data with destructive transformations.
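
Concretely, something like this (TypeScript sketch; assumes a PostgreSQL-style warehouse and a hypothetical QueryRunner client, table names invented):

    // Express a reporting dataset as a materialized view over the original
    // tables, instead of an ETL job that copies and rewrites them.
    interface QueryRunner {
      execute(sql: string): Promise<void>;
    }

    // The original orders/users rows are never touched; the derived dataset
    // can always be rebuilt from them.
    async function defineDailyRevenueView(db: QueryRunner): Promise<void> {
      await db.execute(`
        CREATE MATERIALIZED VIEW IF NOT EXISTS daily_revenue AS
        SELECT o.order_date, u.region, SUM(o.amount) AS revenue
        FROM orders o
        JOIN users u ON u.id = o.user_id
        GROUP BY o.order_date, u.region
      `);
    }

    // Refreshing is a recompute from the source of truth, not a destructive
    // transform of it.
    async function refreshDailyRevenueView(db: QueryRunner): Promise<void> {
      await db.execute(`REFRESH MATERIALIZED VIEW daily_revenue`);
    }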

Finally, have a deeply integrated, robust, enterprise-wide, fine-grained ACL system and privacy policy that keeps everyone (including system/service accounts) from accessing anything without a specific business-purpose need, with an approval audit record stored via some sort of blockchain-like tech.
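
A toy version of the purpose check plus tamper-evident audit trail might look like this (TypeScript; the hash-chained log is my reading of "blockchain-like", and every name here is hypothetical):

    import { createHash } from "crypto";

    interface AccessRequest {
      principal: string;          // human or service account
      dataset: string;
      purpose: string;            // declared business purpose
      approvalId: string | null;  // reference to a recorded approval, if any
    }

    interface AuditEntry {
      request: AccessRequest;
      allowed: boolean;
      timestamp: string;
      prevHash: string;           // hash-chained so rewriting history is detectable
      hash: string;
    }

    const approvedPurposes: Record<string, Set<string>> = {
      ad_metrics: new Set(["revenue_reporting", "fraud_investigation"]),
    };

    const auditLog: AuditEntry[] = [];

    function checkAccess(request: AccessRequest): boolean {
      const purposes = approvedPurposes[request.dataset];
      const allowed =
        !!purposes && purposes.has(request.purpose) && request.approvalId !== null;

      const prevHash = auditLog.length ? auditLog[auditLog.length - 1].hash : "GENESIS";
      const timestamp = new Date().toISOString();
      const hash = createHash("sha256")
        .update(prevHash + timestamp + JSON.stringify(request) + String(allowed))
        .digest("hex");

      auditLog.push({ request, allowed, timestamp, prevHash, hash });
      return allowed;
    }

Deny by default, require a declared purpose and an approval reference, and log every decision in a way that can't be quietly rewritten later.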



This sounds really awesome. I'll note that I put this data stack together by myself in about a week, when we were just ten people in the company.

Obviously very different resource constraints than Meta, so worth considering which situation you may be closer to when picking an implementation plan.



