I don't hear a lot about "data science" anymore. And judging from the shrinking number of job postings, I suspect it was a bit overhyped a few years ago. What do you think?
I think what companies really want is smart generalists with advanced math, programming, and modeling skills coupled with domain knowledge. That skill set will always carry high value in technical companies.
The reason it carries value is that the skills are difficult to acquire. I think the recent decline in interest reflects the rise of new data science candidates who are taking the path of least resistance to a career in data science. Rather than pursuing problem solving, people are pursuing "data science", which is a nebulous term in and of itself.
I am wary when people wax lyrical about all of the ways they love using machine learning on data. It makes me nervous because I worry that they have a hammer and can't wait to use it on anything vaguely nail-shaped.
Yep, that's why I make sure to set time aside for toy problems. Creating contrived problems can sometimes scratch the itch to use a certain technology that doesn't really fit into what I'm doing at work.
Machine learning is an area where you need to be able to produce results. Fake it ‘til you make it isn’t going to cut it for long. Either these people produce something that works, or they don’t.
> Machine learning is an area where you need to be able to produce results.
Having to produce results is one thing. Mindlessly throwing TensorFlow/PyTorch at problems is an entirely different problem.
It's like those front-end devs who mindlessly insist that they need heavy JavaScript frameworks with convoluted build processes, such as React/Angular, to churn out a static web page with a couple of paragraphs and images.
You can certainly produce some plots and numbers, and possibly even plots and numbers that look good to your boss/clients/investors. The (multi)million dollar question is whether those numbers are actually meaningful. I think this is where a lot of ‘data science’, both in industry and academia, falls down.
Some state-of-the-art models don’t even generalize to test sets drawn from the same database, let alone similar data sources or the actual business problem. Unless you run a pet shop, telling breeds of dog apart, a la ImageNet, is probably not your goal.
Well, no. ML solves the classification problem, not the prediction problem.
E.g.: The "is this a cat picture" problem is effectively solved, but we _still_ can't reliably predict something as primitive as a simple binary proportion.
>I think what companies really want is smart generalists with advanced math, programming, and modeling skills coupled with domain knowledge. That skill set will always carry high value in technical companies.
What companies want is to be "in" on the data science hype, while they have no clue what they are doing and the most advanced "data science" they need are simple graphs, boxplots, and linear regressions.
Yes. Unlike software development, data science is not completely business-agnostic, and a fair amount of business understanding is required. For example, if you work with sales data, you need to be aware of seasonality, purchasing patterns, etc. to understand the trends you observe and to tell a true outlier apart from something explainable.
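A minimal sketch of that idea, assuming evenly spaced data with a known season length (12 for monthly figures with a yearly cycle). The function name `seasonal_outliers` and the median/MAD rule are illustrative choices, not a method anyone in the thread described: each point is compared only against other points from the same season, so a December spike that happens every year is treated as normal while a one-off jump is flagged.

```python
from statistics import median

def seasonal_outliers(sales, period=12, threshold=3.0):
    """Return indices of points far from their own season's typical level.

    sales: one number per time step (e.g. monthly totals).
    period: assumed season length (12 = monthly data, yearly cycle).
    threshold: how many median absolute deviations (MADs) count as an outlier.
    """
    flagged = []
    for season in range(period):
        # All observations from the same season, e.g. every December.
        idx = list(range(season, len(sales), period))
        values = [sales[i] for i in idx]
        if len(values) < 3:
            continue  # too little history for this season to judge
        base = median(values)
        mad = median([abs(v - base) for v in values])
        if mad == 0:
            continue  # no spread in this season, so the ratio is undefined
        for i, v in zip(idx, values):
            if abs(v - base) / mad > threshold:
                flagged.append(i)
    return sorted(flagged)

# Hypothetical data: 4 years of monthly sales, December always doubles.
sales = []
for year in range(4):
    for month in range(12):
        base = 200 if month == 11 else 100
        sales.append(base + year)
sales[29] = 180  # hypothetical anomaly: June of the third year

print(seasonal_outliers(sales))  # → [29]
```

The Decembers are not flagged even though they sit at twice the usual level; a single global threshold over all months would have flagged every one of them.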
What makes me cringe the most is seeing flashy presentations with claims akin to 'Data Science will change your world'. For sure, it can automate decisions (think credit scores), assist in decision-making (think sales trends) and detect anomalies (think security systems), and it has proven itself doing so. But so many of the data scientists I interview are hung up on the esoteric techniques they employ, and they often fail to even explain why those techniques were useful or how they helped their businesses.
What has been transformational and path-breaking is the breaking of enterprise monopolies in this space (e.g. SAS/IBM SPSS): a variety of open-source frameworks have made it easy and convenient, and have opened it up to software developers who want to build these skills. It's important, though, not to lose sight of the fact that data science sits at the sweet spot of expertise in domain, data and technology.
Yes I'd be quite keen to see that also, I wasn't aware there was a decline already (although it's obvious the interest can't just keep going up forever).
There's a lot of hype in software. Just like you were hearing about Blockchain startups only when Bitcoin started trading at 5 bazillion USD and now you don't hear nearly as much about it. During 2008-2015ish we went through an insane churn of new JavaScript frameworks and NoSQL database hype.
Yeah it seems to have calmed, but I don't think data science was just hype because it comes from (and somewhat is) probability and statistics, and the rate of data/information that's being produced by and extracted from people seems to be ever increasing. But it absolutely was prone to a hype cycle as with almost anything else in tech. IMO this is a phenomenon exacerbated by venture capital.
Most of what we call “data science” is repackaged “data mining” — a skill that goes easily back to the mid-90s. Sure, open source tooling makes it all a lot more accessible today; but IBM / Oracle / etc. have offered similar packages (at MUCH higher price points, of course) for decades.
I think once the hype calmed down, people started to realize that it was largely the same old shit in a much cheaper package — evolutionary rather than revolutionary. Ultimately I think the hype cycle was driven by Moore’s Law more than anything; the fact you could run this type of analysis in a manageable amount of time without needing a huge IBM mainframe was the real innovation.
That hype in software is the reason I wrote a media literacy guide for software engineers. Hype makes it easy to get free marketing and also drives clicks for media and social networks.
I've always had this suspicion that open source software is some kind of propaganda, that it's not purely organic. The work is hard, the benefits are huge, yet the contributors are paid zilch. If I were a big company I would for sure want to win "the hearts of engineers" as you say, to sway them to contribute software critical to my needs, and to pay nothing for it.
I'm a "data scientist" and couldn't agree more. There are a lot of companies that benefit from hyping emerging tech and careers to a point of saturation, these include bootcamps, consulting firms and service providers. The market has calmed but I wouldn't say it was just hype (obviously I'm biased!)
> IMO this is a phenomenon exacerbated by venture capital.
Not just VCs. It's a whole mafia gang consisting of tech reporters and founders also. They all have their vested interests - reporters want new stories and founders want funding and growth.
Slack, VR, AR - they all went through this cycle. Sometimes, it's a bit annoying.
Right, I don't mean the VCs themselves or any particular parties, but rather the expanded role of venture capital in tech startups (esp. the hunt for unicorns). It allows companies to ride out net losses for long periods of time, which distorts our perception of technology and business in a variety of ways, including this hype problem.
UC Berkeley has an undergraduate data science major. The choices and bureaucracy involved in creating new majors are also prone to hype, though, so it might not be the best indicator.
Agreed - many state universities are cashing in on industry trends, just at a slower pace. You can see it with the various weirdly specialized masters programs and especially certain online degree programs.
For us, “data science” is just doing research on the company’s data. We hire trained researchers - PhDs or PhD candidate dropouts - regardless of specialization. One of our most prolific and respected researchers trained in political science.
It wasn't 'just' hype, but it was over-hyped. There are companies that have their act together from a data standpoint and can make use of data scientists, whatever that term actually means in the context of their organization, but most can't. So the companies who spun up a data science initiative but had no business doing so are now likely saying things like 'what do you mean we don't have the necessary data?' and 'what do you mean our data is a mess?' etc. and will quietly back off over time. Likewise, the companies who can take advantage of it will quietly do so. No different than the hype surrounding every other buzzword in tech... there is no silver bullet.
Sometimes it can't be cleaned up, or the process of cleaning it up takes too long or is more expensive than the company wants to spend. Sometimes the cleanup process is error-prone, or leaves you with too little usable data. Sometimes the data really is too noisy and no amount of cleanup is possible. It's definitely true that data cleanup is a problem data science can address, but it's not a magic wand.
It's absolutely part of the process... just not something companies always realize they're signing up for, or the degree to which they are, when they initiate projects. One problem is that the cost of cleaning it up can easily exceed the cost of, and benefit from, the work they planned on doing with the data. Another is that sometimes it's a mess (structurally or the company failed to record key data at point of collection that can't be determined after the fact etc.) to the point of being useless in terms of being able to glean meaningful insight from it.
Several of the Data Warehousing projects I've dealt with could be better described as Data Landfills. One can't just dump data into a hole for years, let it rot, and expect goodness when you go back to look at it.
Usually the companies able to benefit from data science are also the ones best positioned to benefit from digitalization. They have their processes under control. I worry that all the others will just be relegated to ... wherever.
The hype was that you could take a huge pile of data and turn it into hugely valuable insights.
The realization is that any random pile of data likely doesn't have anything in it that is worth paying for:
Here's our analysis!
We already do/knew that.
Some people have really good, valuable data sets. Most people don't.
It’s an epistemology / ontology question, as folks familiar with the humanities would spot in no time. In other words, it’s not “data” until something gives the created metric a meaning.
I think the point is that "data" is useless until it becomes "knowledge," and that turning data into knowledge is a complicated and philosophical act. Business being business, most of the people running the show were never interested in the deeper questions of how to create knowledge. They just wanted a new arrow in the quiver. Once they realized the cost of actually doing the work, it became much less appealing.
My guess is that data science roles will merge with business analyst roles. Python and R will slowly join Excel as tools of choice for making tables and charts to stick in PowerPoint slides and PDF reports. Meanwhile the machine learning side of things will be the domain of _something_ engineers, with candidates more likely to come from the computer science/math/engineering world than from the sciences. (Other than those with degrees in physics, who seem able to perpetually land wherever they feel like.)
> My guess is that data science roles will merge with business analyst roles.
Data Scientist is a buzzword for Statistician. Business Analyst is a buzzword for Industrial Engineer. For example, 10 years ago at my university, some Statistics students took a second major, mostly in Industrial Engineering, and vice versa. The fields have been related for many years, but the average Joe has no idea.
Eh. I'd say "industrial engineer" has more bullshit in it than "business analyst" by a long shot. The twenty or early thirty something spending all day in powerpoint and excel to make presentations and handouts for a fifty or sixty something to present to other fifty and sixty somethings is more likely to have majored in economics, business, and/or have an MBA than to have spent any time anywhere near an engineering department.
I still don't really know what a business analyst role truly entails. In my office they seem more like mini project managers w/o the management. They talk to internal stakeholders a bunch, handle a lot of our UAT, and I guess do some reports stuff? One has moderate tech skills (but no programming) and the other is just really good at Excel.
I'm still not sure what their actual formal responsibilities are.
I mean, a business analyst should do just that: analyze the business. It's usually their job to translate strategic areas of improvement in the business unit into specific outcomes.
They ask questions to identify areas of improvement; translate those into functional (and sometimes technical) requirements for other areas (not just IT) to fulfill; and then coordinate the efforts to implement those requirements, potentially as PMs, product owners, Scrum masters, UAT leads, or just a SME.
The best BAs (paraphrasing the data science JD) know more tech than the business and more business than the tech.
It's really not. The skill set we need in terms of some software, system design, and a rich knowledge of modern data science libraries and trends is not something you should expect a statistician to have.
Similarly, I certainly cannot prove asymptotic theorems like a statistician.
Or design patterns, software, modern database architectures, cloud services, how to write production code...
I'm not trying to slight the Dept. of Stats. I love statistics. I've found studying PhD stats textbooks more valuable for my data science career than learning the latest deep net framework. I'm just noting there is a need for other tools.
We're past the early peak of the hype cycle, but now that the marketers have calmed down, the real-world applications are maturing.
If you think of Data Science as AI, sure, but if you frame it as applied statistics + good software engineering practices + cloud scale, I think things are in a good place.
This. Any good data science person needs to understand how distributed systems work. They need to be decent at applied statistics. I think an average engineering grad can easily work with the simple statistics they learn in linear algebra and intro-to-ML courses. The good software engineering part is what has been brought in over the last few years.
I once saw some code written by a "data scientist". The logic wasn't complex, but the Java/Scala code was the stuff of my worst nightmares. Additionally, I think other engineers have also matured enough to understand that underneath the veneer of data science, the fundamentals do not change much.
"AI" (or whatever rebranding it gets) always works in cycles, with a phase of excitement and overpromising followed by a phase of apparent underdelivering and skepticism. But what actually happens is that the innovations just become part of the normal tooling, and stop being called "AI".
At some point there is no need to hire a "data scientist", as any python programmer is already expected to know how to use numpy, pandas, sklearn and keras, just like before it was already expected for them to do any kind of data manipulation with SQL without requiring a dedicated database expert.
The value in a "Data Scientist" isn't that they know how to use a tool - it's that they know why to use _that_ tool (technique) and not this other one.
I'm definitely not denying the worth of the particular skillset, just like that of good DBAs. But as with any skillset, there are diminishing returns. You can fairly easily get someone to the point where they can merge data from multiple sources, create automatic reports with graphs, and build simple similarity clustering, regressions and expert systems, even if in suboptimal ways, and most companies don't really need more than that. They can even learn to integrate cloud/black-box solutions for image/speech recognition without having any idea of how to write one from scratch.
Of course, if a company wants to truly innovate in the area it will need PhDs or people with great dedicated knowledge in ML/Statistics/Particular Domain, if it needs to scale it will need good data engineers to create the data pipeline together with DBAs and experts in each tools (like Spark/Flink), but for most companies the basic above is already a great improvement to what they had before.
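As a rough illustration of the "basic" level described above (merging data from two sources and fitting a simple regression), here is a stdlib-only sketch. The helper names `merge_on_key` and `fit_line` and the ad-spend/revenue data are hypothetical; in practice this would usually be done with something like pandas `merge` and a statistics library.

```python
def merge_on_key(left, right, key):
    """Inner join of two lists of dicts on a shared key, like SQL's JOIN."""
    lookup = {row[key]: row for row in right}
    # Rows whose key is missing from `right` are dropped, as in an inner join.
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical exports from two systems, keyed by week number.
ad_spend = [{"week": w, "spend": 10 * w} for w in range(1, 6)]
revenue = [{"week": w, "revenue": 20 * w + 5} for w in range(1, 5)]  # week 5 missing

rows = merge_on_key(ad_spend, revenue, "week")
slope, intercept = fit_line([r["spend"] for r in rows], [r["revenue"] for r in rows])
print(len(rows), slope, intercept)  # → 4 2.0 5.0
```

Week 5 is silently dropped by the inner join, which is exactly the kind of detail that separates a report that merely runs from one whose numbers mean something.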
It wasn't hype, however people got very confused about what they actually needed vs what they thought they wanted.
When someone says they want a "Data Scientist" what they really mean is "I want a Data Scientist who is also a Data Engineer".
I have seen so many companies spend a really decent chunk of money on a data scientist and then be shocked to find that this data scientist doesn't know how to deploy models, set up Spark clusters, or work out how many GPUs, and of what type, they need to get the job done.
After all - that is not their purpose.
We were in a similar situation, but what we needed was a Data Engineer - we had a rough idea of where we wanted to go and what we wanted to achieve, he was doing a Masters in Data Science so he had that background as context.
We will look at adding a Data Scientist to our ranks in the future - but they will be working side by side with a Data Engineer who can action their requirements!
The "data scientist" where I recently contracted struggles with generating basic reports. I saw a full page SQL query with the caption "yeah, data science is hard" posted to slack. Terrifying.
I think the term "data science" is often misused. It seems to make management feel like they are on the cutting edge. They were talking about AI and an R&D department the other day. They aren't even making use of simple heuristics yet! I guess that talk helps with fundraising, though.
Over the past several months I keep seeing people trying to equate data science with machine learning, and it made me wonder if the people doing this are trying to salvage (or perhaps enhance) the investment they made in data science by trying to blur the lines between the two.
Isn't the line between the two indeed blurry? Maybe deep learning is machine learning, but modern statistical methods such as elastic net, SVM, and random forests are things data scientists should know about.
At my previous company, data science had become synonymous with data analysis, to the point that data scientists on staff were starting to outnumber data analysts. I think it's more a sign of a maturing field than anything else.
The more narrow view of data science as big data, models, and machine learning is probably less a thing now, but data analysis overall is only getting bigger.
No it isn’t a fad. Data collection by every man and his dog is really happening. The need for people that can use the data in order to improve business outcomes is the consequence of it.
Data collection will become more prominent IMO because:
1. Data driven business preference, competitive advantage and FOMO. Already dominates sales and marketing. Starting to dominate in product and dev. Already dominates production.
2. IoT, and more data marketplaces resulting from it.
3. Extensions of the global SaaS value chains (usually connected by data).
I would like to hear more from my European fellows about how GDPR affected their ability to muster domain knowledge.
I used to work for a small start-up and the CTO was very strict on data access, making my life as feature developer and "data scientist wanna be" almost impossible.
He, on the other hand, not only had access to all the data but also used the product as a consumer (which didn't make sense for ICs, so we ended up just playing with sales demo accounts). I ended up leaving the company because of that.
I’ve been in a similar situation, and it was really hard to be effective in product / high-level planning meetings because it is easy to be blindsided. At the time I was still a young engineer at a big co, so I just thought it was my lack of technical experience, but in reality there was nothing technical about it, just BS politics.
What it boils down to is that people with any extra data access privilege will have the lead.
Most of the insights will come from aggregate data, so I think companies could work around privacy concerns but I am no GDPR expert.
Back in my days in academia, there was a saying “if you have the trace, you have the paper”.
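On the point about aggregate data: one common way to share insights without exposing individuals is to report only group-level figures and suppress groups that are too small. The sketch below assumes a hypothetical `min_group_size` threshold of 5 and hypothetical field names; whether any particular threshold is sufficient under GDPR is a legal question this sketch does not answer.

```python
from collections import defaultdict

def aggregate_with_suppression(records, key, value, min_group_size=5):
    """Average `value` per `key`, omitting groups with too few members.

    min_group_size is a hypothetical threshold: groups smaller than this
    are dropped so no reported figure describes only a handful of people.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {
        group: sum(values) / len(values)
        for group, values in groups.items()
        if len(values) >= min_group_size
    }

# Hypothetical per-customer records; only the aggregate leaves this function.
records = (
    [{"region": "north", "spend": s} for s in (10, 20, 30, 40, 50)]
    + [{"region": "south", "spend": s} for s in (70, 90)]
)
print(aggregate_with_suppression(records, "region", "spend"))
# → {'north': 30.0}
```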
In my professional experience this is a little misguided. A pure PhD statistician isn't going to be able to hack it working on fast-paced production software environments and building end-to-end pipeline/software/ML systems. I mean, no doubt a PhD statistician could learn and be good at it, but the average statistician isn't geared up for this type of work.
On the other hand, your standard tech data scientist may find themselves out of their element if needing to design a very rigorous randomized trial for testing a new drug, and making careful inference (I mean I'm sure plenty could, but I'm not going to trust a 25 year old with two years work experience to do that).
With the cloud came a lot of data, so 'big data' was the wave in which engineers had to deal with it, and 'data science' was the wave in which we tried to leverage it.
The reality is that most insights from 'big data' are optimizations. They're not going to move the needle on the business as perhaps we would have hoped.
Data Science focused on ad targeting - now that might move the needle.
And of course, maybe some data scientists working alongside AI engineers will make a breakthrough that moves the needle.
But from a CEO's high-level view, all of these things have trendy undercurrents; the trick is to figure out how much of it really matters to the business.
The 'wins' for consumers will be slow: maybe better product search, better ads. Maybe they figure out how to send flights around ugly weather or how to slot landing times for an x% decrease in flight delays. Or slot road fixing/lights for an x% decrease in traffic delays.
I work on a data science team that's growing pretty aggressively. I also frequently hear from recruiters hiring at companies big and small for analyst and scientist positions.
I'm not sure what it would mean for data science to be "just hype." I see DS work on this website alone all the time.
It is "different now". The DS bubble popping was covered well by Vicki Boykis:
> Since academia is typically a lagging indicator in adoption to new trends in the work place, it’s been long enough that it’s truly worrying for junior data scientists, all of who are hoping to find data science positions. It can be very hard for someone with a new degree in data science to find a data science position, given how many new people they’re competing with in the market.
It wasn't 'just' hype, but the real problem was that very few companies had any data you could actually extract much value from. Garbage in, garbage out is as true as ever.
That being said I've observed that the data science techniques and tools that have developed over the past few years have been absorbed and adopted by a lot of people that aren't "data scientist". So while companies are hiring a lot less "data scientists", a lot of "data science" is now done by domain experts and analysts as part of their work.
The best quote I've heard for ML is "ML promises to deliver what Computer Science promised to deliver in the decades before as AGI will promise to deliver in the future"
I think they are evolving to more specific roles. A couple of years ago some "Data Scientists" were actually doing Data Engineering. Now that distinction is more clear. You also have Machine Learning Engineers, who can help with deploying models or you can even see things like "Deep neural network engineer" or NLP data scientists.
It’s been subsumed inside traditional BI teams, who don’t interact with “generalist” developer teams and are an entirely different audience.
So the field keeps growing, but in more specialized forums. Increases in Python adoption, for instance, mostly come from re-skilling initiatives in BI teams inside my customers.
A lot of it was. Like VR, blockchain. Companies didn't know whether they needed this new tool. A lot of them didn't. Like VR, the long tail on this might be huge. It will just be new technologies that weren't possible before, not huge improvements on existing ones like some companies thought.
Two thoughts:
1) Much of this is the bias of this community towards newer and more technical companies. At many of our workplaces Data Science is just part of the landscape and is now baked in. But there is a wide gap between that and legacy companies where the idea of using these techniques this way isn't new at all. In other words, we've been through the initial wave of the hype curve and are now into mainstream growth of the idea.
2) Branding matters. In many companies I get to interact with (Fortune 100s), run-of-the-mill analysts exist for ass-covering the non-data-based decisions of higher-ups: post-hoc decision justifying. But DS teams (currently) actually get a seat at the table and get listened to. If nothing else, that makes DS way more effective than just BI or whatever. And as long as we get things right, that should turn into a virtuous cycle of being heard, being seen to contribute, and being asked to contribute more.
I work for one of the major bootcamps and I can tell you that interest has significantly declined over the past year. The market for data science has indeed shrunk. The reason? I'm not sure; it could be that it didn't live up to the hype.
From my perspective it seems like even 5 years ago data scientists were more about creating and validating ways to look at the data whereas today it is more about processing the data with the standard tools and methodologies. Data processing is as big as ever but takes a fraction of the effort to apply it.