
The businesses are coming with requests that require complex SQL on millions of records that normally sit in various sources (warehouse, Salesforce, etc.). Unless you hire expensive data engineers, you can't do this type of work reliably. You can stick things together with expensive GUI-oriented prep tools like Alteryx, but you pay in reliability and, quite frankly, sleep. And forget IT: IT is so stuck in their ways that you'd be waiting years for each analysis, and you'd spend 10x what you should.



Isn't the problem space here genuinely complex in business terms? Is there some better alternative that doesn't entail some other massive tradeoff such as managing your own servers, creating ingress mechanisms from multiple systems, building your own version of Salesforce, etc.?

In short, is there any solution that "does everything you could possibly want" while ensuring you _never_ need to hire a data engineer? This is a holy grail that I don't think exists.


Yes it is :)

You have to normalize data taken from various sources of various age and complexity. So you really have to understand the data. You also have to really understand the questions.

I've worked with (and on) lots of these tools and projects; the complexity is never in the frontend, it's dominated by getting the data, getting the data right and into the right format.

If all you want in the end is a good-looking dashboard on a website then you might as well build it yourself; because of the cost structure, that can even cost less than buying one of the BI frontend tools (there's not a lot of difference in development time, the BI frontenders are more expensive because they are rarer, and the licensing is high).


From my humble experience, if you have a sales or product team that keeps pumping out spreadsheets in weird formats, you need someone dedicating a few hours to get a proper ETL in place, and if they are constantly changing the format or adding new things, you need a dedicated person just for that. Modern tools like Python or Power Query are not enough for this eternal war.
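To make the "constantly changing format" problem concrete, here is a minimal Python sketch of the kind of guard such an ETL step tends to need. The `HEADER_ALIASES` table, the column names, and `normalize_rows` are all hypothetical, invented for illustration; the idea is only that known header variants map to one canonical name and anything unrecognized fails loudly at load time.

```python
import csv
import io

# Hypothetical mapping from header variants to canonical column names.
HEADER_ALIASES = {
    "customer": "customer",
    "customer name": "customer",
    "client": "customer",
    "amount": "amount",
    "amount (usd)": "amount",
    "total": "amount",
}


def normalize_rows(text):
    """Parse CSV text and return rows keyed by canonical column names.

    Raises ValueError if a header is not listed in HEADER_ALIASES, so a
    format change is noticed at load time rather than inside a report.
    """
    reader = csv.DictReader(io.StringIO(text))
    headers = reader.fieldnames or []
    unknown = [h for h in headers if h.strip().lower() not in HEADER_ALIASES]
    if unknown:
        raise ValueError(f"unrecognized columns: {unknown}")

    rows = []
    for raw in reader:
        rows.append({
            HEADER_ALIASES[key.strip().lower()]: (value or "").strip()
            for key, value in raw.items()
            if key is not None  # ignore stray extra cells in a row
        })
    return rows
```

This does not remove the need for a person: someone still has to decide what each new column means and extend the alias table. It only turns a silent breakage into an immediate, readable error.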


It's not that, it's the systems. 15 years ago I built a data warehouse, pretty sophisticated for its time, for a company that ran call centers. The amount of data that came off of the call systems was staggering, and the format was arcane. Every vendor patch had the potential to wreck the ETL process. Then there was account data from clients, and other internal systems.

The people and their spreadsheets were the easy part to control.


This basically reads like, "You need to have a data engineer." Or half an engineer and half an analyst.


Let’s say you have 20000 tables in total for a company. They are in 10 different databases. You have no overview of the data and no comments. You don’t have a starting point for where information x is.

Welcome to my reality.

Would I love a data architect and a domain expert in my team? Yeah.

Will I run around booking meetings with everyone that even hints at working with data like a headless hen? Yeah.

Is this the normal procedure for Data Scientists in big and old companies? More so than I would like.

Oh! And I forgot that the security department will constantly deny your access to data you need (until you force their hand).
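One cheap starting point in that situation is to search the column names across every database you can reach. A minimal sketch, using SQLite purely as a stand-in so the example runs anywhere; `find_columns` and the table names in the usage below are hypothetical. Many server databases expose similar metadata through `information_schema.columns`.

```python
import sqlite3


def find_columns(connections, keyword):
    """Return sorted (database, table, column) triples whose column
    name contains `keyword`, case-insensitively.

    `connections` maps a label for each database to an open sqlite3
    connection.
    """
    needle = keyword.lower()
    hits = []
    for db_name, conn in connections.items():
        tables = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        for (table,) in tables:
            # PRAGMA table_info returns one row per column;
            # index 1 of each row is the column name.
            for column in conn.execute(f'PRAGMA table_info("{table}")'):
                if needle in column[1].lower():
                    hits.append((db_name, table, column[1]))
    return sorted(hits)
```

For example, with two in-memory databases:

```python
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE accounts (id INTEGER, customer_email TEXT)")
billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (id INTEGER, Email TEXT, total REAL)")

print(find_columns({"crm": crm, "billing": billing}, "email"))
```

A name search only tells you where to start asking questions; it says nothing about whether two columns called `email` actually mean the same thing.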


Everything you mention is true and is compounded if the data is healthcare-related: privacy concerns, data from different systems that claim to be the same, preventing reidentification.


If you can get your data safely to S3, Athena can handle a lot of reporting and analysis use cases. The table or view definition can handle the normalization process. Full on ETL pipelines are sometimes (but not always) more engineering than necessary.

(Disclaimer: I work in data engineering at Amazon and use those tools in my day to day)
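The "let the view do the normalization" idea can be sketched without any AWS account. The example below uses Python's built-in `sqlite3` as a stand-in, so it is not Athena syntax or the Athena API; the `raw_orders` table, the `orders` view, and the sample rows are made up for illustration. The raw table keeps everything as untyped text, the way files landed in a bucket often look, and the view does the trimming and casting.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a raw table and a normalizing view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Raw data as it arrives: all text, inconsistent spacing and casing.
        CREATE TABLE raw_orders (customer TEXT, amount TEXT, ordered_at TEXT);
        INSERT INTO raw_orders VALUES
            ('  Acme ', '10.50', '2020-01-03'),
            ('acme',    '4',     '2020-01-04'),
            ('Globex',  '7.25',  '2020-01-04');

        -- The view is the only place that knows how to clean the data.
        CREATE VIEW orders AS
        SELECT lower(trim(customer)) AS customer,
               CAST(amount AS REAL)  AS amount,
               date(ordered_at)      AS ordered_at
        FROM raw_orders;
    """)
    return conn


def totals_by_customer(conn):
    """Report query that reads the cleaned view, never the raw table."""
    return conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
```

Reports query `orders` and never touch `raw_orders`, so when the cleaning rules change, only the view definition changes and there is no separate pipeline to redeploy.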


The tools and stack of Salesforce make building your own version extremely appealing.


Expensive data engineer here, I see nothing wrong with this >:)


I am hiring one in Krakow! Seriously though, in a team of 10 business analysts I can barely afford 1 data engineer. Business analysts tend to cost less and also be more "business focused", so they are an easier sell to management.


You seem to have an engineering problem, so hire engineers and perhaps fire some of those analysts. Don't make your business depend on someone else's tailored IT products, they will reap the profits you could be making.


Yes, I am a bit confused by that statement too. Isn't it good that complex tasks are handled by specialized people? Maybe it is my bias as an ex-big data engineer / current data scientist, but it seems to me that a lot of the tooling is about as simple as it can be (yes yes, complacency is the enemy of good; I mean there are no obvious low-hanging fruit left to improve).


Tableau has an ETL tool called “Prep” that helps with this problem, but it only goes so far. I think that’s where the problem truly requires a data engineer.


There are plenty of modern-day ETL tools like Funnel, Improvado or Dataddo to help with that part of the puzzle, though it does mean you have to pay for another SaaS each month on top of Tableau.


Exactly. Instead of ETL, start writing your own Perl and various logical, reusable components. Roll your own ETL, however you want it, in a terminal. So what, you have to learn vim, big deal! Mouse driven interfaces are a huge part of the dysfunction.


Yeah, I was a little confused here until I realized I would just write a bash, Python, or Perl script where some would advocate for complicated tools.


And after a few years you leave your job, a new person comes in and gets stuck with your script soup and lack of documentation.

Companies prefer well known products like Alteryx or Tableau because, despite the cost, it makes people easier to replace.

But I can't blame you for writing your own things. I'm currently replacing a large SSIS-based ETL process with Python, because I'm sick of SSIS randomly breaking.



