My own personal rant: I think the specific feeling I get is that the conceptual idea of R has long since outpaced the reality of R.

People like to fetishize data, and R sure lets you do that. The data science landscape, however, is growing such that R is really just a one-trick pony; that one trick, for better or worse, is somehow being the gold standard of statistics and modeling.

But everything else wants to sugar coat the software surrounding the statistics, and leaves you no room to grow.

This is a very bad, over-simplified example, but you sort of can't learn much about graphic design or good communication skills by using ggplot2 ... you can make something look very, very nice, hopefully, in the general case, sure. And you can definitely do all kinds of hacks and crazy code to make it do whatever you want, but by doing that you produce ever more fragile and environment-dependent code. You'd be better off learning just about anything else for graphics (straight SVG, D3, Processing, Cairo directly, etc.): it is of course harder to get started, but you end up with a generalized skill set that allows you to grow.

You also learn pretty much nothing about web development from Shiny. Shiny is a wonderful idea, but ultimately it prevents a statistician from implementing what it promises, which is an analytic application. At some point, you have to ditch it and learn more traditional web stacks. It is also something of a sales funnel into a server solution that's a DDoS or security nightmare just waiting to happen.

So instead of just griping, I guess I have some ideas... it would be nice to have a Ruby/JS/Java/Python service generator. It would be nice to have a D3/React/whatever-based generator. It would be nice for there to be a data munging solution (or even whole models, more PMML-type stuff) that can be generalized into something that could be compiled or could generate Python/Java/Bash/JS/whatever code.

Ultimately you start thinking along those lines, and you realize that the promises R is making about empowering the analyst are just teasing them rather than helping.

R could do with less magic and more concentration on being simply a great statistics engine that integrates better. I guess it is that to some degree, but it sure fails the rest of the technology world that tries to live with it.




I disagree on multiple fronts!

1) ggplot does exactly what it is supposed to do: create data visualizations. It made no promises for interactivity or display, and in fact, it was originally designed for creating publication quality charts, which it continues to do well.

1.5) ggvis offers a ggplot-like grammar that renders through Vega (which is built on D3) and allows for interactive graphics. Do you want to pay your data scientists for creating production-ready graphics or let them focus on what they're best at?

2) R has been growing - outside of neural networks (which R needs to catch up on), R gets almost every pre-processing and modeling algorithm first, and distributes it for free. Furthermore, it has better sampling options, metric options, augmentation options, and model ensembling tools (stacking or meta-models) vs. any other language or framework - it is the gold standard.

3) I don't think there's any "magic" in R. It's just a language with a learning curve and a lack of opinions.

4) Last point: R is really not built for the web (it's older than Python!) - it's built for data science. There's no reason you need to run your modeling stack in the same language as your application server. R is perfectly capable of writing to databases or sending API responses in JSON or PMML.
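
To make point 4) a bit more concrete, here is a minimal sketch of the application-server side, assuming the R job exports a fitted logistic regression as JSON (for example with jsonlite::toJSON). The payload shape, the field names, and the coefficient values are all invented for illustration, and Python just stands in for whatever language the application server happens to use.

```python
import json
import math

# Hypothetical payload an R job might write out after fitting a model.
# The structure and the numbers are assumptions, not a real export format.
MODEL_JSON = """
{
  "type": "logistic_regression",
  "intercept": -1.5,
  "coefficients": {"age": 0.03, "visits": 0.4}
}
"""


def load_model(payload: str) -> dict:
    """Parse the exported model and reject anything we cannot score."""
    model = json.loads(payload)
    if model.get("type") != "logistic_regression":
        raise ValueError(f"unsupported model type: {model.get('type')!r}")
    return model


def predict_proba(model: dict, features: dict) -> float:
    """Score one record: inverse logit of intercept + sum(coef * value)."""
    z = model["intercept"] + sum(
        coef * features[name] for name, coef in model["coefficients"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


model = load_model(MODEL_JSON)
probability = predict_proba(model, {"age": 50, "visits": 0})
```

The point of the split is that R only has to run when the model is refit; the request path needs nothing but the JSON file and a few lines of arithmetic.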

/endrant

Not trying to start a flame-war - but this type of difference in opinion is important to see when thinking about hiring data scientists or deploying models.


I sincerely appreciate your thorough and well reasoned response, and thank you for taking the time with it.

1) I agree; perhaps I was trying to allude to visualizations being more than charts. ggplot's charts are absolutely gorgeous and simpler to make than even I remember them being in Lotus 1-2-3 for DOS. However, there are some things in the periodic table of visualizations (http://www.visual-literacy.org/periodic_table/periodic_table...) that it can't do. And what about hybrid combinations in the same chart? Could I have a bar chart where the bars are also mini-spectrograms? I can instantly think of how to do this in SVG or Processing, but I'm not sure where to begin with ggplot ... maybe it is possible. Of course, why would you?

1.5) I guess I don't want to pay someone else to do the custom thing in D3 that the statistician can almost do with their code, or try to get a regular web stack developer familiar with D3 to actually get it working the way the statistician says.

2) Yup, totally agree!

3) I think Shiny takes a lot of liberties and makes a lot of assumptions, and its users can't even tell me whether those assumptions matter to them, because Shiny completely hides the underlying concepts of how it is implemented. I guess I would definitely call that magic. You're right, there's quite a lack of magic in most of the language and packages, however.

4) I think data science can/should/does embrace the web. I think the modeling stack shouldn't be on the application server, but a trained model perhaps should be? I also wouldn't trust the stability of R for performance-critical API calls without a lot of redundant instances and a lot of load balancing.
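
For what it's worth, the bar-chart-of-mini-spectrograms idea from point 1) can be sketched as raw SVG in a few lines. This is only an illustration under assumptions: Python is just the string generator here, and the function names, the sample data, and the grayscale mapping are all made up.

```python
from xml.sax.saxutils import escape


def gray(intensity: float) -> str:
    """Map an intensity in [0, 1] to a hex gray (0 = white, 1 = black)."""
    clamped = min(max(intensity, 0.0), 1.0)
    level = round(255 * (1.0 - clamped))
    return f"#{level:02x}{level:02x}{level:02x}"


def spectrogram_bar_chart(bars, bar_width=40, gap=10, scale=2.0):
    """Render bars whose fill is a small heatmap.

    bars: list of (label, value, matrix). The value sets the bar height;
    the matrix (rows = frequency bins, columns = time steps) is stretched
    to fill the bar, one <rect> per cell.
    """
    chart_height = scale * max(value for _, value, _ in bars)
    parts = []
    for i, (label, value, matrix) in enumerate(bars):
        x0 = gap + i * (bar_width + gap)
        bar_height = scale * value
        y0 = chart_height - bar_height  # SVG y grows downward
        cell_h = bar_height / len(matrix)
        cell_w = bar_width / len(matrix[0])
        for r, row in enumerate(matrix):
            for c, intensity in enumerate(row):
                parts.append(
                    f'<rect x="{x0 + c * cell_w:.1f}" y="{y0 + r * cell_h:.1f}" '
                    f'width="{cell_w:.1f}" height="{cell_h:.1f}" '
                    f'fill="{gray(intensity)}"/>'
                )
        parts.append(
            f'<text x="{x0 + bar_width / 2:.1f}" y="{chart_height + 15:.1f}" '
            f'text-anchor="middle" font-size="10">{escape(label)}</text>'
        )
    width = gap + len(bars) * (bar_width + gap)
    height = chart_height + 20
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height:.0f}">' + "".join(parts) + "</svg>"
    )


# Invented sample data: two bars, each with a 2x3 intensity matrix.
bars = [
    ("a", 30, [[0.1, 0.9, 0.3], [0.7, 0.2, 0.5]]),
    ("b", 50, [[0.8, 0.4, 0.0], [0.2, 1.0, 0.6]]),
]
svg = spectrogram_bar_chart(bars)
```

Writing `svg` to a file and opening it in a browser shows two bars of different heights, each filled with its own grid of gray cells.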

Anyway... the real problem is that you're also absolutely right. There is quite a bit of difference in opinion between the tooling of an analysis effort and the robustness expected by IT.

Thanks again!


I'd divide my work into three categories:

1) Exploratory plots just for me.

2) Plots that I'm going to show to my boss or coworkers.

3) Things we're going to distribute to the whole world.

Plots in category #1 are often quick and dirty--I just want to see if an idea worked and don't really care about communicating that idea cleanly.

I could show these plots internally, but it often helps to clean them up a bit first. This avoids us getting bogged down in whether we should be comparing the red/blue lines here or the circle/diamond points there.

This is where ggplot shines--I can go from #1 to #2 with minimal effort. The final version usually still needs some tweaking, but only a small fraction of the plots ever get this far and some of this customization really needs a human in the loop (e.g., in Illustrator or something).

Similarly, while you can use Shiny as a final product, it's actually great for letting moderate-sized groups play "what-if" with the data. It's certainly easier than sending them a huge PowerPoint deck with "choose your own adventure" style instructions.


Except that you miss a very important point - nobody cares about the stuff you listed. I build models and use Shiny to create a front end for clients to interact with them. They are very happy and pay me very handsomely. I can assure you that this is the case across the board. R is for analysts, not for programmers. It seems like programmers feel intimidated because analysts now code their solutions themselves.



