Hacker News new | past | comments | ask | show | jobs | submit login

My thoughts (a little of a Rant) on R as the Lead Engineer at a data science focused company is that R is a great statistical language, but a poor programming language. I use the term programming language as a language which is very versatile for a variety of needs (web app, commandline app) such as python, ruby, etc. R has the capabilities to use as a programming language like a climbing rope can be used as a belt. It can, but shouldn't because of some points I have below.

It is great for exploratory analysis, as it is forgiving and easy to use in the console for testing things; but once it needs to be put into practice, it has issues. For a non-programmer, grasping R isn't too hard thanks to some great developers in the community.

There is a lot of good in the R community, but people are focused on making it isn't. Just look at deploying R into production, that can be a nightmare. I've spent days looking over code to figure out where an error in production lies. One of the errors was a package of a package which was updated for the first time in years. That package depended on another package which my package called another function that called the first one; basically it was a mess of dependencies. And there are some misconceptions, while doing the engineering work in R and learning I learned not to use for loops. Then one day I timed it and the for loop was 10x+ faster than any apply/plyr function including using a gpu.

The things that separate a programming language from a statistical language are a programming language have more than one of these:

* Good dependency management

* Easy deployment into production environment

* A clear way to setup environment (e.g. naming, folder conventions)

* Ability to do most of the things you want with the base packages

* Good documentation about the above.

Basically, I believe a good data scientist is someone who can use R (or something else) to explore data and then create the algorithm in a compiled language to be put in production. And for someone who just needs to create analysis for research or a paper, R is the perfect use case. R is an excellent language for its use cases, just don't think about using it for general programming. It has caused a lot of extra dev hours working on issues with it.

Little plug, we wrote a piece on hiring data scientists.[0]

[0]: https://gastrograph.com/blogs/gastronexus/interviewing-data-...




To contrast this - I'm the lead data scientist at the same company, and head over heels in love with R....

It is the only language I can quickly and efficiently jump from algebraic topology for novel pre-processing, straight into model building and validation - with just about every potential variation of every major algorithm freely available and packaged on a well curated package manager (CRAN), and then ensemble them.

I _agree_ that it's a bit difficult to use in production, and that dependency management needs work (Packrat is trying to do that), and that blindly trusting packages on CRAN can cause errors - but 98% of the time - it just works. Graphics, models, crazy niche things that are currently only used by one post-doc locked away in a top secret research lab... it all just works.

Of course, take this with a grain of salt: this is coming from a guy who's built web-servers (HTTP responses and all) in R.


R's server sockets can't select(), so unless you've reimplemented that as well, don't count on handling concurrent requests with that web server.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: