
Yes it is :)

You have to normalize data taken from various sources of various age and complexity. So you really have to understand the data. You also have to really understand the questions.

I've worked with (and on) lots of these tools and projects; the complexity is never in the frontend, it's dominated by getting the data, getting the data right and into the right format.

If all you want in the end is a good-looking dashboard on a website, then you might as well build it yourself; because of the cost structure, that can even cost less than buying one of the BI frontend tools (there's not a lot of difference in development time, and the BI frontenders are more expensive because they are rarer and the licencing is high).




In my humble experience, if you have a sales or product team that keeps pumping out spreadsheets in weird formats, you need someone dedicating a few hours to build a proper ETL, and if they are constantly changing the format or adding new things, you need a dedicated person just for that. Modern tools like Python or Power Query are not enough for this eternal war.
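To make the "weird formats" problem concrete, here is a minimal sketch of the kind of header normalization such an ETL step usually starts with. The `CANONICAL` mapping, the column names, and the `load_rows` helper are all hypothetical and only for illustration; a real pipeline would need far more than this, which is rather the point.

```python
import csv
import io
import re

# Hypothetical mapping from header variants seen in the wild to canonical names.
CANONICAL = {
    "customer": "customer",
    "customer_name": "customer",
    "client": "customer",
    "amount": "amount",
    "amount_usd": "amount",
    "total": "amount",
}


def normalize_header(raw):
    """Lowercase a header, collapse punctuation/whitespace to '_', then look it up."""
    key = re.sub(r"[^a-z0-9]+", "_", raw.strip().lower()).strip("_")
    return CANONICAL.get(key)  # None means "never seen this header before"


def load_rows(text):
    """Parse CSV text and return rows keyed by canonical column names.

    Raises ValueError on unknown headers, so a format change fails loudly
    instead of silently dropping a column.
    """
    reader = csv.DictReader(io.StringIO(text))
    mapping = {}
    unknown = []
    for header in reader.fieldnames or []:
        name = normalize_header(header)
        if name is None:
            unknown.append(header)
        else:
            mapping[header] = name
    if unknown:
        raise ValueError(f"unrecognized columns: {unknown}")
    return [
        {mapping[h]: (record[h] or "").strip() for h in mapping}
        for record in reader
    ]


if __name__ == "__main__":
    sample = " Customer Name ,Amount (USD)\nAcme, 10 \n"
    print(load_rows(sample))
```

Every new header variant means another entry in the mapping, so someone still has to own that table when the format keeps changing.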


It's not that, it's the systems. 15 years ago I built a data warehouse, pretty sophisticated for its time, for a company that ran call centers. The amount of data that came off of the call systems was staggering, and the format arcane. Every vendor patch had the potential to wreck the ETL process. Then there was account data from clients, and other internal systems.

The people and their spreadsheets were the easy part to control.


This basically reads like, "You need to have a data engineer." Or half an engineer and half an analyst.


Let’s say you have 20,000 tables in total for a company. They are in 10 different databases. You have no overview of the data and no comments. You don’t have a starting point for where information x is.

Welcome to my reality.

Would I love a data architect and a domain expert in my team? Yeah.

Will I run around booking meetings with everyone that even hints at working with data like a headless hen? Yeah.

Is this the normal procedure for Data Scientists in big and old companies? More so than I would like.

Oh! And I forgot that the security department will constantly deny your access to data you need (until you force their hand).


Everything you mention is true and is compounded if the data is healthcare-related. Privacy concerns, data from different systems that claim to be the same. Preventing reidentification.


If you can get your data safely to S3, Athena can handle a lot of reporting and analysis use cases. The table or view definition can handle the normalization process. Full-on ETL pipelines are sometimes (but not always) more engineering than necessary.

(Disclaimer: I work in data engineering at Amazon and use those tools in my day to day)
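
The idea of letting a view definition do the normalization can be sketched locally. This is not Athena's API: it uses Python's built-in `sqlite3` as a stand-in SQL engine, and the `raw_orders` table, the `orders_clean` view, and the sample rows are all made up for illustration. The raw data stays exactly as it landed, and the view cleans it at query time.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with messy raw data and a normalizing view."""
    conn = sqlite3.connect(":memory:")
    # Raw table: everything is text, just as it arrived (hypothetical schema).
    conn.execute(
        "CREATE TABLE raw_orders (customer TEXT, country TEXT, amount TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [
            (" Acme ", "us", "10.50"),
            ("Globex", "US ", "4"),
            ("Initech", "de", "7.25"),
        ],
    )
    # The view holds the normalization rules: trim, unify case, cast types.
    conn.execute(
        """
        CREATE VIEW orders_clean AS
        SELECT TRIM(customer)        AS customer,
               UPPER(TRIM(country))  AS country,
               CAST(amount AS REAL)  AS amount
        FROM raw_orders
        """
    )
    return conn


def revenue_by_country(conn):
    """Report against the view, never against the raw table."""
    return conn.execute(
        "SELECT country, SUM(amount) FROM orders_clean "
        "GROUP BY country ORDER BY country"
    ).fetchall()


if __name__ == "__main__":
    print(revenue_by_country(build_demo()))
```

Changing a normalization rule means redefining the view rather than re-running a pipeline, which is the trade-off the comment describes; it works only as long as the cleanup can be expressed in SQL.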



