
The popularity of R amazes me. I took a one-week class in R and left with a vow to avoid it at all costs. I have never seen a more confusing, hard to understand, inconsistent software product in my 30 years as an IT professional. It's apparently targeted at scientists and sociologists who are non-programmers. I have no idea how they manage to use it.



Scientific programming is just different. Much of the culture of scientific programming is different, and for good reasons it's not easy for outsiders to understand.

It's something like how baking cookies at home and running a cookie factory are very different. To a person doing each, the behaviors and priorities of the other seem strange, and it's easy for one to think “we are both just making cookies, why don’t they do what I do, which is obviously superior?”

Scientific programmers are solving problems first, not writing programs. They are solving problems in a way that is useful only to them or peers who know a whole lot about the problem being solved. The problem-first tools look strange because they deemphasize the programming niceness in favor of problem niceness.

You find the same sort of confusion when programmers encounter business types and their Excel usage.

There are certainly times when a piece of code starts needing the programming touch, but the right tool for the job depends on the job.


As someone currently writing scientific programs for scientists: much of the culture of scientific programming is bad, scientists are having rings run around them by kids who build websites for a living, and it's embarrassing.

You think web developers are just writing programs for the hell of it? They have operational constraints around their work as well; it's just that there's a rad open online culture of continuous process improvement that's leveling up the tooling and practices, making what was once hard look easy.

Things scientists using computers can learn from people working in the software industry:

- use a version control oriented workflow

- write extensive tests (see the sketch after this list)

- data management, automated data processing/cleaning pipelines, backups

- build generic frameworks

- use continuous integration

- logging and error reporting for long running tasks

- use modern tooling for automating infrastructure and big jobs, rather than manually submitting Slurm jobs via SSH
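
To make the testing point concrete, here's a minimal sketch using the testthat package (the normalize helper is a made-up example, not from any particular project):

  library(testthat)

  # hypothetical analysis helper under test
  normalize <- function(x) (x - mean(x)) / sd(x)

  test_that("normalize centers and scales", {
    z <- normalize(c(1, 2, 3))
    expect_equal(mean(z), 0)
    expect_equal(sd(z), 1)
  })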

#notallscientists but the average level of competence is not good and I think the idea that writing software for scientific research is somehow special is backwards and counterproductive.


None of those get you closer to tenure, I guess... There used to be research engineers in 'good' labs; a friend of mine was one. They'd take your software weekly, clean and slice it, help you write tests, connect you with other colleagues or orgs, show you how you could achieve 30x perf when needed, help you make your shit distributed, and get budget for CI and stuff. That was a dream job for me for some time, since I'd been mentored by a demi-god of non-destructive-control (radiography, ultrasound...) and radioprotection research who was also a C++-expert software engineer with the most beautiful and clean/clear/modular/malleable codebase I've ever seen. I saw the yuge potential of even passable software engineering, especially once I understood that his software was used in so many different operational contexts and advanced research projects without his input, while his colleagues' software was just toys, toys and more toys.

I don't see that many job posts doing this anymore. And even then it's just not valued that much.


I think the problem is that these academics literally do not understand or appreciate the benefits of having, say, a robust test suite with CI. Software development and DevOps are an alien culture, and to be fair a good proportion of software developers themselves don't really get it. It's like a professional lumberjack who regularly sharpens their saw vs a forestry researcher who doesn't realise how slowly they're chopping down trees for samples because their saw is always blunt.


Are you really building generic frameworks and writing extensive tests every time you do some data exploration and analysis in R Studio?


No. You'd be using pre-existing high-level tools for this purpose (in Python land this would be pandas, matplotlib, jupyter etc).

Not all software development should involve writing your own framework, and it's a bit of a trope that junior SWEs try to write DSLs and frameworks where they should just write some throwaway code. That said, if there's a common modelling or simulation method in your field and there's no great open source generic framework for it, then you should take a hard look at what you are all spending your time on (endlessly reimplementing the wheel).


> use modern tooling for automating infrastructure and big jobs, rather than manually submitting Slurm jobs via SSH

Is that going to be robust to whatever new cluster comes along five years down the line, or will it all have to be re-written?


Yes. Rewriting this stuff is much easier when you've already automated it the first time, vs scraping together institutional folklore on how to run a job.


There's no good reason for something to be difficult to understand, especially a product that's been designed to be general purpose.

You give good reasons for why a certain situation exists in scientific communities, but I see no reason why it has to be that way.


Being "difficult to understand" is a perspective of an individual observer, not some fundamental property of the thing itself. Japanese is "difficult to understand" if you only know English, but not if you grew up speaking Japanese.

The various conventions around "accepted" programming paradigms like OO, inheritance, scoping, etc are natural if you use them every day for years, but if you're more interested in optimizing some finances or solving a physics problem, something "primitive" or "messy" like a spreadsheet formula or fortran might actually be more understandable.


I couldn't agree more with OP. I got my graduate degree in Statistics and, after working for several years in such a role, made a similar vow to Never Again™ touch R (or SAS). This effectively forced a career change. (I'm now "officially" a software developer and couldn't be happier.) My distaste for Statistics stems directly from the commonly used tools.

I appreciate your point. In most contexts, such as your comparison between Japanese and English, I agree with it. However, paradigms like OO, inheritance, scoping, etc. are hard-won, intellectual accomplishments; they're not arbitrary. They're purposefully designed to solve specific problems. My experience with R showed it to be rife with problems that have been avoidable for decades. AFAICT, it boils down to the tragic view of "I'm a (statistician|engineer|mathematician) not a coder so I don't need to care". The unfortunate truth is that despite such a view, doing analysis with R makes the analyst a programmer by definition. And so the language, ecosystem, and, consequently, users suffer from half-baked, poorly designed workarounds to problems which have long been solved (and abundantly documented in the software literature). To reduce it to a matter of perspective feels to me like a tautology: it's easy once you get it.

Clearly, I'm triggered by this. I hope I've expressed myself respectfully. My point is, I don't feel it's arbitrary. The design and complexity of R has real consequences in terms of cost and reproducibility.


I use R regularly. I love it. Doing the maths is very easy and general. The programming makes me think differently. I like how it's high level, reasonably fast, rarely involves loops or inline functions directly, and, above all, is the Swiss army chainsaw of statistical analysis. The fact that the journal of statistical software exists is a good thing!


Inheritance was invented for performance reasons [1]. It was not conceived for pure code organization, so in a way it is as arbitrary as any other solution that could have brought performance gains for the original garbage collector. Inheritance is an "intellectual accomplishment" like other accomplishments, one that will run into issues if applied blindly, so not having it is not an issue per se. On the contrary, today's widely accepted view is to use composition instead of inheritance [2].

R does have inheritance, by the way, though not in the form you will frequently see in "general purpose" programming languages, which R is not.

[1] http://catern.com/inheritance.html [2] https://books.google.pt/books?id=ka2VUBqHiWkC&pg=PA81&lpg=PA...


I would say that "inheritance" was much developed in Smalltalk, and there it was not really a tool for improving performance. Rather it is a conceptual tool.

"Inheritance" is really the expression of Abstraction in code. Superclass is more general, and thus the abstraction of its subclasses.

Although Smalltalk did not have a specific keyword for "Abstract Class", it used the convention of calling "subclassResponsibility" in methods defined in abstract superclasses which had to be implemented in subclasses.

Abstraction, generalization, these are conceptual tools. I'm not sure how "composition" models abstraction if at all. Only OOP does.

And one could think that abstraction is a tool valued by scientists. No?


I don't know why you are saying inheritance was invented for performance reasons. Is there any evidence behind that?

Everything I was taught when OOP became mainstream in the late 80s was that it was a performance trade-off to afford code-organization insight while preserving encapsulation.

This statement about “performance reasons” is quite baffling.


> Being "difficult to understand" is a perspective of an individual observer, not some fundamental property of the thing itself.

I disagree. There are inherent qualities associated with systems that happen to be easier to understand. If you or I were designing such a system, and wanted to ensure it embodied similar qualities, I believe it would be possible to do so.

> Japanese is "difficult to understand" if you only know English, but not if you grew up speaking Japanese.

Needing a priori knowledge to be able to understand something is different.


> I disagree. There are inherent qualities associated with systems that happen to be easier to understand. If you or I were designing such a system, and wanted to ensure it embodied similar qualities, I believe it would be possible to do so.

Can you formalize these inherent qualities? Until you can, it's hard to prove that a quality is or is not essential.


> Can you formalize these inherent qualities? Until you can, it's hard to prove that a quality is or is not essential.

There is an obvious one, though it's tricky to fully formalize and quantify: locality (also known as "coupling"). You ask yourself a question, "if I were to make a small change to a single piece of this program, how much of the program would I have to change/retest/keep in mind?". If the answer is, "typically, just the area around the change", it's a high locality (loosely coupled) program, a good design. If the answer is, "usually, most/all of it", then it's low locality (tight coupling), a bad design.

The reason is, of course, that human mental capacity is limited and, in general, the number of interactions grows superlinearly with the number of interacting components. So this here is an objective measure of "easy/hard to understand": the more code you have to keep track of when investigating a random piece of the program, the more mental effort you have to exert, and the more difficult to understand it is.
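
If it helps, here's a toy R contrast (the names are made up): the first version forces you to track shared global state from every call site; the second keeps the blast radius of a change local.

  # low locality: reads and mutates a global, so changing how `total`
  # works means re-checking every caller in the program
  total <- 0
  record_sale <- function(x) total <<- total + x

  # high locality: everything flows through arguments and return values,
  # so a change here stays here
  sum_sales <- function(sales) sum(sales)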


Of course difficult and easy exist. It's hard to run a marathon; it's easy to eat popcorn. It's hard to learn Japanese; it's easy to watch TV.


I'm wondering what your opinion is of this talk: https://www.youtube.com/watch?v=7jiPeIFXb6U

It has some comments about why good software engineering leads to good software, how good software engineering is hard right now because there's a lot of bad examples which cause people to make more bad code, and some suggestions on how things could be done.


Love this point


It's no more difficult to understand for researchers coming at a programming language for the first time, and they need to learn significantly less about the language to get up & running with analysis. Two lines of code will get me data loaded and run a regression analysis & plot the results.
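
Something in the spirit of this, say (a sketch; the file and column names are hypothetical):

  df <- read.csv("trial_results.csv")      # load the data
  summary(lm(outcome ~ dose, data = df))   # fit a regression, print the results
  plot(outcome ~ dose, data = df)          # plot it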

Especially consider that much of the audience for R comes from proprietary and extremely expensive languages like SAS or SPSS Syntax, neither of which is any less difficult to work with, and both of which are significantly more limited.

Something more intuitive would be nice, but I'm not aware of another option that has the depth of libraries for practically every type of data analysis and that can also begin getting a researcher meaningful results in fewer lines of code than they have fingers on their hands. (Assuming there was no catastrophic accident resulting in the loss of fingers.)


R is easy to understand if you use it as intended.


Having worked a bit with candy making and baking, from home-sized to restaurant-sized to a regional factory, I'll say this analogy doesn't make sense.

Almost all the progression in that industry is relatively straightforward and would make sense to a layperson.


I sometimes quip that the difference between cooking and process engineering is that unlike the former, the latter actually gives a shit about the quality of outcome.

The issue isn't just specialized knowledge - knowing the principles underlying the process, knowing that there are principles - but also access to tooling. I'm sure plenty of lay cooks would appreciate appliances that let them be more consistent, even without knowing any related formal theory - but these are not available. As it is, an oven that's not shit is a major capital investment. It only makes sense to get one if you're more likely to call your work output a "formulated product" than "cookies".

However, software development is a peculiar occupation - one in which the best tools are free. It's like being able to buy a production line for less than a consumer-grade stove. Access to quality tools is not a limiting factor. Knowledge and awareness is.

Of course, non-programmers that code have more interesting things to do than to study software engineering. Which is why it's doubly important to make sure tools that are being popularized aren't shit. On the contrary, tooling is the perfect place to encode good practices and principles, so that even most lay (programming-wise) users succeed by default.


> Scientific programmers are solving problems first, not writing programs.

And they don't value writing programs. Or Software Engineers.

That makes productizing some research interesting. It also makes trying to get the same result as something published by your own lab a year ago an uphill battle: "it worked on John's old laptop, the huge Alienware. Never worked on any of our machines".


My description of R: It makes hard things easy and easy things hard. Ergonomically it is the absolute worst "programming language" I've touched in my life. However somehow it managed to become the official language of statistics research and has packages to do any type of analysis you can dream of.

I think the reason you and I dislike R is because we just work differently than non-programmers. Non-programmers think in purely imperative, straightforward semantics. They write one-off unmaintainable code tying together libraries that solves their immediate problem. Programmers try to write R code as if it were a proper programming language and immediately run into walls. Non-programmers never see the walls because they don't even know there's another way.


R > Matlab. But both suck as programming languages.

But that's because neither R nor Matlab are primarily programming languages. They're primarily mathematical exploration tools.


Right. Julia is looking fairly promising as a "real" programming language that is still an excellent Matlab replacement, and possibly to a lesser extent an R replacement, and it does show that there is nothing about filling the exploratory math programming niche that requires it to have the warts that the incumbents have.

Fundamentally, R and Matlab are probably best comparable to Perl in terms of how they got popular. While they are rather ugly, they were the first to provide a simple solution to a few specific problems that a lot of people had, which snowballed them into popularity and network-effect advantages.


The fact that Julia requires compilation seems like such a huge hurdle to adoption in light of the above. When the alternatives are a REPL and the users aren't programmers, to use Julia they have to first learn the difference between code and compiled code.


Julia is JIT compiled and most people use it through a REPL. For a beginner there's hardly a difference in the experience.


No, there is a difference. My day-to-day use case is: start the REPL, start typing some code, wanna see a graph. I might literally only want to write a single command. If that takes a minute to compile, it's a problem. Julia is pushing hard to address this and I'm really interested to see where it goes - I tried it for the first time recently and saw a lot to like.


Project Jupyter's name is a reference to the three core programming languages supported by Jupyter: Julia, Python and R.

https://en.wikipedia.org/wiki/Project_Jupyter


Right, compilation time went down a lot with 1.6 by avoiding method invalidations during compilation. The other thing they did is ensure that the package manager properly compiles all dependencies in topologically sorted order when you install a package, instead of doing a lazy compile on package load where it would often waste time recompiling the same method.


Julia isn’t compiled tho it has a REPL?


One isn't precluding the other. For example, SBCL is an implementation of Common Lisp that compiles everything ahead-of-time down to native code, and like all Common Lisp implementations, it offers a powerful REPL suited for interactive development. The compiler has low overhead, so you don't even notice that it compiles everything you type into your REPL.

Another common misconception is that AOT compilers can only be used to build libraries/executables, which then get executed. Again, SBCL is an example of an AOT compiler that works at function-level granularity[0], in accordance with Lisp heritage of image-based development, where you write your program by starting it, and then adding/removing/modifying code into a running instance. SBCL achieves this by having the compiler being a part of your application runtime.

I don't know how Julia is implemented internally, but since it's essentially a Lisp clothed in syntax that's more pleasant to the masses, it inherits a lot of Lisp family heritage. That could easily include a fast, to-native, in-process AOT compiler.

--

[0] - Or maybe even lower.


Julia's JIT is a just-ahead-of-time method JIT, just like SBCL's, and the language is designed around making idiomatic code as fast as possible with that approach.

So every function is a multimethod that gets compiled the first time any combination of types is encountered, and at that point the type information is propagated to all function calls inside the method body, to eliminate dynamic dispatch on those & possibly inline the appropriate method. Julia has parametric polymorphism as well, so it's common for all types in the body to be inferable.

As a result, Julia code tends to make much more heavy use of polymorphism & multimethods than common lisp (where CLOS has a significant runtime cost instead of being a near-zero cost abstraction), since the language and the runtime were designed carefully together to make that fast.


A quick look at Julia's getting-started guide shows it does not require compilation...

https://docs.julialang.org/en/v1/manual/getting-started/


Luckily, Python >> R > Matlab, and the fight between R and Python is core to data science. Last I did any real investigation, R still had the edge in highly theoretical and new techniques; for whatever reason it was much more used by academics, but Python was eating R's lunch for non-academic data science.


I've given SciPy a try, but its performance is grossly lower than R's and Matlab's in my experience. SciPy also seems to be missing plenty of things. You get pretty far (basic filter design seems easier in SciPy than Matlab), but Matlab covers so many different mathematical fields that Matlab and Python are just incomparable in terms of mathematical usability. Matlab has way more features.

R is statistical-slant, so you have all sorts of distributions and statistical features.

Matlab is general mathematics (but especially matrices). So you can grab, say, Galois fields from 2^2 through 2^16 and just start working with them immediately.

Matlab itself is decently replicated by Octave (which is a great project), but it should be noted that Octave aims to be largely compatible with Matlab (so all of my complaints about how awful that programming language is still apply).


> somehow it managed to become the official language of statistics research

There's zero mystery to this. The intended audience is people that want to get stuff done. Professional software developers commenting on R are like this: "He's such a good salesman. He does everything the right way. He dresses right. He talks right. He has the best smile of any salesman I've ever seen. He fills out his reports on time. Granted, he doesn't make many sales, but why would you hire one of those other guys over someone that's perfect?"


Mmmm, more like they try to put the salesman in an operations role and wonder why the company has broken down.

If you tried to use R for an actual engineering project you'd wind up with a system that is always on fire. Whereas for these one-off analyses it's fine, because it's just a one-off.

We want to get stuff done too, but we need it to hold up for more than a few hours so that's the mode we think in.


Every HN discussion: I do real programming. Everyone else works on toy problems.


It's all real programming and it has all the real consequences, but whether something needs to live for a day, a few months, or years, whether multiple people will work on it, and whether it communicates with other systems all change the level of investment and structure you need to put into it.

Programming has a very low barrier to entry and that is incredibly powerful but there is a big gap between someone writing one off scripts to solve immediate problems and people writing highly connected systems that withstand scrutiny.

I've always personally been for accreditation for software engineers much like pretty much every other engineering discipline.


He wasn't calling that one-off use case a toy, just that it's a different world than most of the people here on HN function in.


they used to split stuff like R off into the "scripting" category. But now most "real programmers" are apparently doing python or javascript so that little hierarchy quietly disappeared.


FWIW, R started to make sense to me (as a programmer) once I read Advanced R by Wickham.


I have a colleague that uses R as a general purpose language (I'm in science), and it's horrible. He runs into problems all the time and usually the answer is "more RAM".

For the things it's good at, it's great. For everything else I avoid it like the plague. More often I find it easier to use a quick Python script to generate a data table that I can then read into R to perform whatever stats or plots I need. It's almost always faster than if I had just run everything in R to begin with.

But I think your description is spot-on. Non programmers just want to get something done and if it works in R, then great. For those of us that think in terms of software engineering, good practices can be difficult in R.

The vanilla-R vs Tidyverse split has made this all worse too. These are two separate dialects that, while still the same language, are completely different in practice.

It's like the R folks took the "there's more than one way to do it" mentality from perl and said "challenge accepted!".


I agree overall. In my particular case, my employer won't let me have Python for fear I will do things with it. But it is perfectly ok for me to use RStudio with 2 million useful extensions that will do similar things. For better or worse, sometimes the tools we use are the tools we are left with.

And I am saying this as a recent RStudio convert. I love how easy it makes some otherwise hard things. I hate the sheer amount of hoops you have to jump through to make some stuff work.

But it works and I am definitely not an engineer.


Just like any other programming language, it DOES take a good programmer to make a decent library in R that people can use on their own problems.

R has had a very long evolution. It is a very different beast today in its most common usage than it was in the earlier days, 10-20+ years ago. Even the Tidyverse has some libraries that are very much crafted with a programmer mindset, like purrr and tidyr. These tools are decidedly non-imperative and not straightforward in their semantics.
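
For a taste of that non-imperative style, a small purrr sketch on the built-in mtcars data: fit one model per cylinder group and pull out a coefficient, with no explicit loops or mutable state.

  library(purrr)

  mtcars %>%
    split(.$cyl) %>%                     # one data frame per cylinder count
    map(~ lm(mpg ~ wt, data = .x)) %>%   # fit a model to each
    map_dbl(~ coef(.x)[["wt"]])          # pull the wt coefficient out of each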

What makes R difficult for experienced programmers, I think, is the inconsistency of paradigms that are the result of its long history. This complicates how one writes library code.

There is, however, a "sweet-spot" for R and that would be as a "notebook" based programming language much like Mathematica, Matlab, and Julia. Which one you like, I guess, depends on your taste, your own history, and the killer libraries you want to use.

Whenever I have to describe what R is all about to excel jockeys at work, I just say it's "excel on steroids". I think that's a fair (albeit reductive) description. To be honest, I probably would have never learned R if Julia had existed when I started picking up R. I think I would have preferred a more ahistorical language with less "baggage" than R. But it's always worked out for me, so I am sticking to it at least for now.


> It makes hard things easy and easy things hard.

People also say it for k8s. And it kind of explains why k8s is creating so many jobs.


> And it kind of explains why k8s is creating so many jobs.

Is it 'creating' jobs? I think it's merely making it easier to specialize.

A few problems are unique because of container usage, but by and large K8s is trying to do what's otherwise a difficult job. Try assembling your own distributed container system with scheduling and whatnot and see if you can build something easier to understand or that works better. Maybe you can, but there's inherent complexity.

The criticism of K8s should really be criticism of indiscriminate container usage and the attempts to ship a company's organization chart as microservices. Many applications should really be monoliths and would work better that way. Some should be split on different "services" (not _micro_) along obvious interface points. Just a minority should be architected as microservices from the get go. Distributed systems are _hard_


I imagine that you could get rid of Kubernetes in 90% of the projects it is used in. We have it at my work; it must have taken around six months to a year for one dev. Sure, we can autoscale now, but we never actually need to. It saves us a bit in server costs, but costs more in maintenance / dev time.


What's the previous system you had which required less maintenance than k8s?


One fairly big machine.


I'd generally argue that, if you're trying to do easy things with Kubernetes, there's a good chance you're using the wrong tool.


> R: It makes hard things easy and easy things hard

Maybe you'd like this post: http://bioinfomofo.blogspot.com/2014/01/r-hard-things-are-ea...


I was a mathematics major in college, and didn't have much training in programming when I graduated. R was the first language I learned when I started my career as an actuary, and it was a breeze. Things “just work”. Want to add 2 vectors of different dimensions together? R knows what you’re getting at, and makes it work. Comparatively, learning Python was harder.

Now that I’m used to both languages, I find it funny how much R is hated by “true” programmers.


This is the key. R shouldn't be seen as a general programming language, but a domain specific language that's still open-ended. I started with SAS in my job, which was fine for statistics and handling tables. But anything beyond that, even supposedly simple things like reusing code or listing all files in a folder, was not simple. With R, it was.
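
For instance, the folder-listing case is a base-R one-liner (the "data" directory here is hypothetical):

  csv_files <- list.files("data", pattern = "\\.csv$", full.names = TRUE)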

R only had to be ergonomically better than the competition, and they weren't very good.


What are you "getting at" by adding two vectors of different dimensions? It's not obvious to me.

Off-by-one dimensionality errors are so common in programming. If the language does something like zero-extending instead of raising an error, it will lead to an "it runs but gives the wrong answer" bug. These are much more painful in numerical code than in logic-based code.


For instance, if you wanted to add 1 to every other element in the vector (1,2,3,4,5,6). To someone who has no programming experience, this may be a daunting task. But simply doing (1,0) + (1,2,3,4,5,6) works in R.


The only problem with this is that I don't personally find it obvious/intuitive that (1,0) + (1,2,3,4,5,6) = (2,2,4,4,6,6).

In fact, I would probably expect the output to be (1,0,1,2,3,4,5,6).


Indeed, no mathematician or statistician would think of `+` as concatenation. List concatenation isn't a relevant problem to most mathematicians. This is where the above comments about prior knowledge and context come into play. Likewise, when I multiply two vectors together, or multiply a scalar with a vector, I have a definite idea what I "want" out of that multiplication. For many programmers, they think of this more as an exercise in data-types than an expression of linear algebra.


Statisticians might see it differently, but I still don’t think it’s obvious that (1,0) represents effectively a repeating pattern that’s extended to the size of the other vector.

For instance, for the statement (a,b,c,d,e) + (a,b,c,d,e,f) are we really all saying that it’s clear, obvious and unambiguous that this means (a+a, b+b, c+c, d+d, e+e, f+a).

Personally if you showed me that and asked me to describe what would happen before reading this thread, I would have said it would either throw an error, concatenate the vectors or only add the matching elements. I wouldn’t personally guess that the first vector would be treated as a repeating sequence, but different people might make different assumptions - I just don’t think it’s particularly clear, and I struggle to believe it’s clear, unambiguous and obvious to statisticians too.


> I would have said it would either throw an error, concatenate the vectors or only add the matching elements

Heh this itself is common evidence of an (experienced) programmer's way of thinking. Remember, when dealing with mathematics, there's no machine (or runtime or similar abstraction) there lurking in the background, enforcing conditions, or even lending a physical reality. Operations in mathematics are defined. Notation in mathematics is just an operation encoded as an "arbitrary" symbol. Which leads to

> I just don’t think it’s particularly clear, and I struggle to believe it’s clear, unambiguous and obvious to statisticians too.

This happens to be convention in both a subset of textbooks and many programming environments. It's mostly an artifact of the notation.

If you want to explore what math notation "feels like" a bit more without learning math (which I _wholeheartedly_ recommend as it's incredibly useful), try out the APL programming language a bit. It evokes a similar atmosphere of notation conveying the idea of well-defined operations.


I guess that's the difference between people who prefer R vs. people who use more traditional languages. '+' means addition in the scientific world, if I'm trying to figure out how to add numbers the first thing I try is '+'. To me, those vectors are just that - vectors of numbers, not instances of a class (even though that is kind of true under the hood). I have a problem I want to solve, and R does a good job of giving me what I want.


In what branch of maths or science is (1,0) + (1,2,3,4,5,6) even defined? That operation shouldn't make sense to anyone; it is adding vectors of different dimensions.

The result should probably be vector promotion then (2,2,3,4,5,6). (2,2,4,4,6,6) is not a good answer.


Sure, we can argue about the syntax and what ‘+’ should do, but the point is that many people find the built-in behavior of R to be intuitive and easy to learn, myself included.


How can something be intuitive if it maps to a totally arbitrary abstract concept? Does that operation even have a name?

That is the plus symbol and it is in the context of two vectors. The intuitive thing to do is to fail with an error if it is a mathematics operation, concatenate if it is a programming context or treat the two vectors as the same length by appending 0s to the shorter one if the goal is to be unhelpfully helpful.

Repeating a vector until the dimensions match then element-wise adding them may be convenient for you, and you may like it. Maybe even lots of people like it. But it is a tough sell for me to believe it is intuitive.


> Does that operation even have a name?

numpy calls it 'broadcasting', although this particular operation doesn't work. For example, this is valid in numpy:

`np.array([1,2]) > np.array([[1],[2]])`

The idea is that an attempt is made to handle the operation even if the arrays / matrices are incompatible. Extending the operations of matrices and vectors in this way allows for extremely concise operations that would otherwise be a pain, like in the original R example.


Pretty sure it isn't broadcasting, because

  > c(1, 0) + c(1, 2, 3, 4, 5, 6)
  [1] 2 2 4 4 6 6
succeeds but in numpy

  >>> numpy.array([1,0]) + numpy.array([1, 2, 3, 4, 5, 6])
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  ValueError: operands could not be broadcast together with shapes (2,) (6,)
ie, numpy correctly recognises that totally incompatible dimensions can't be added.


Both languages allow for incompatible arrays to be operated on in this manner, though; the fact that it fails in numpy just means that numpy hasn't implemented that type of broadcasting. For instance, this operation:

np.array([1,2]) + np.array([[1],[2]])

is not broadly defined and understood in pure mathematics. Looking at it, it's not immediately obvious what it would do. You are adding a (1,2) matrix to a (2,1) matrix. It turns out, numpy extends the rules of matrix multiplication to matrix addition through the rule of broadcasting.

R is just doing the same thing, in a different way. These types of implementations are common in scientific computing because it creates easy ways to do these operations, programming robustness and rigidity be damned.


Have you spent time with the community? The community is fantastic.

Also, the cheat sheets put out by the RStudio team are the best programming language cheat sheets I've seen for any language: https://www.rstudio.com/resources/cheatsheets/

I don't do R much anymore (at the end of the day I personally have the freedom to start fresh so I choose that over dealing with technical debt in the R language itself), but the R Studio product, team, and R community I found fantastic.


R got a simplicity boost with the "tidyverse", RStudio and ggplot, all driven by Hadley Wickham. At its core, R is a very straightforward language. However, it never had a benevolent dictator who gave it consistency, elegance and style. Hadley is compensating for that a bit.
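
For the unfamiliar, a small taste of that style, using the built-in mtcars data:

  library(ggplot2)

  # a full plot specification in one declarative expression
  ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE)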


A lot can be said in favor of R, but "straightforward" is a debatable description. The semantics of R were never really designed, and were only recently "discovered", post hoc. See "Evaluating the Design of the R Language" (http://janvitek.org/pubs/ecoop12.pdf).


I'd argue that tidyverse is entirely inconsistent with the rest of R, though. At least base R packages all operate in an "R-way," so learning this syntax helps you with other packages that others try to write in an "R-way," while tidyverse only operates in a tidyverse way, so you can't take your syntax knowledge with you to other packages.

I'd say the learning curve for making a sexy plot is a lot shorter with tidyverse, but overall, relying on it handicaps you versus spending the half hour longer to do the same thing with base graphics (or a base-like package).
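
To make the dialect split concrete, here's the same group-wise mean in each style (built-in mtcars data):

  # base R, the "R-way"
  aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

  # tidyverse way
  library(dplyr)
  mtcars %>% group_by(cyl) %>% summarise(mpg = mean(mpg))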


> I'd argue that tidyverse is entirely nonconsistent with the rest of R

Well, yes, but that is part of the reason why it is so popular. R may be written by world-leading language design experts - but it doesn't show!

> ...relying on it handicaps you versus spending...

There is almost no reason to drop out of tidyverse if the problem domain is visualising data. Most people could go an entire career as an analyst using just ggplot + tidyverse.

If something needs to be plotted and ggplot won't work it probably makes sense to drop out of R and go straight to TikZ or OpenGL.


I think these tools hurt beginners a lot, personally. You have to learn R no matter what you use to plot, but if you want to use tidyverse you have to now learn that too, and the logic you learn there might not transfer over to the rest of R that you are gonna have to learn no matter what, and can lead to confusion. It doesn't help that stackoverflow uses R and tidyverse code interchangeably and it's up to the beginner to spend time figuring out what is what. I think it even adds to the notion I see, even here on HN in this thread, that R is some awkward unlearnable monstrous thing that has no place. It is that, when you start adding all these different packages and syntax conventions without thinking whether your solution can be implemented trivially with base R. I find base R pretty similar to python, with small caveats like you don't use loops and you apply functions instead.


I completely agree, in opposition. This is all true, and makes strong case for deprecating base R and replacing big chunks of the language with tidyverse equivalents. I would certainly encourage the people maintaining the language to reflect on your points with that in mind as an option.


> tikz, OpenGL

or D3.js


I think it's somewhat more complex than that. I think tidyverse-R, which is a quasi-separate language, is only simple with complete buy-in, and involves a lot of magic, shorthand, and "these are the symbols I put into the machine to get X back out".


I am primarily a python programmer, but I sometimes use R.

You are either going to use a programming language (or library, etc.) made by a programmer pretending they know about statistics, or a statistician pretending they know about programming. Oftentimes, as a programmer, the right choice is the former, but not uncommonly (because statistics is even less intuitive than programming), you really really need to know that the statistics have been done right. If someone has ported the relevant code from R to python, great. If not, bite the bullet and use R, it's where the statisticians hang out.

You know, I bet statisticians don't think any more kindly of how programmers make stuff. Our use of the '=' sign, for example. We're just used to that kind of thing, so it doesn't look like a problem to us.


R programming is fundamentally different at a conceptual level. You are operating on datasets rather than individual values. Also, the GUI mechanism uses reactive programming if you are using R Shiny. R is awesome for what it is designed for.
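
A minimal sketch of that reactive style in Shiny:

  library(shiny)

  ui <- fluidPage(
    sliderInput("n", "Sample size", min = 10, max = 1000, value = 100),
    plotOutput("hist")
  )

  server <- function(input, output) {
    # renderPlot re-runs automatically whenever input$n changes
    output$hist <- renderPlot(hist(rnorm(input$n)))
  }

  shinyApp(ui, server)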


Yeah, it's not so hard to grok. It's just a data-driven style all the way down. You get pros and cons. It's great at working on data sets.

That said, I feel like correctness should be given a higher priority in scientific computing and yet a dynamically typed, lazily evaluated language is used.


Other than using apply functions instead of loops, coding R is a lot like coding python, only you get a lot more of the data-science python package functionality already baked into base R. The syntax differences are slight enough that it's pretty easy to move between the two (or find relevant stackoverflow answers instantly for common annoyances). R generally inputs your data and outputs your statistical test results in less code with less headscratching than doing the same in python, in my experience. I prefer plotting in R as well.
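
For example, where python might reach for an explicit loop or comprehension, idiomatic base R applies a function over the structure directly (built-in mtcars data):

  sapply(mtcars, mean)                   # column means, no loop
  tapply(mtcars$mpg, mtcars$cyl, mean)   # group-wise means, no loop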


I've written some R, both for small interactive scripts and running in production; it wasn't the first language I learned. R gets some things done very well; it also has some idiosyncrasies, there is stuff that is clearly patched together and exists for backwards compatibility, and there are many ways to do the same thing in R. If you don't expect it to be perfect, it gets the job done - nothing to write home about and certainly not a language that you should avoid at all costs.


It does it well if there is a package out there for what you are trying to accomplish, and you hope that the package works for your use case...


There's a difference, in my mind, between "Programmers" and "Invokers of Code".

R is a terrible programming language.

It's not a bad language for invoking code, because for many of those people, they're not taught the concepts behind any language, so it's all semi-arbitrary symbols.

  model <- lm(outcome ~ variable1 + variable2 + variable3, data = data)
  summary(model)

Isn't any more complex than anything else. And what R does have is a network effect - at this point, for almost any statistical task I've ever encountered, there's R code for it.


They can use it because they don't come with preconceived notions of how programming normally works. And it lets them do powerful analysis, along with a library of packages for an enormous range of analytical methods and visualization, all without having to do much in the way of boilerplate coding.

Sure the syntax is going to seem alien to them, but so would any first encounter with a programming language.

The use case for R simply isn't a traditional programmer. That isn't the target user. Sure, if you need an application that might need significant scale you're not going to use R Shiny, but a lot of R work is one-off bespoke analysis projects. Models that do need to be deployed for use at scale in an application take their output parameters from R models and simply implement them in the app. I do this myself: taking coefficients etc., implementing a function call in a database, and then using the results on the front end.


I'm a professional programmer and I don't find R hard to understand. Have a look at R for Data Science[0], maybe you'll see why scientists and statisticians find it easy for their analysis and visualizations (and conversely find Pandas+Python very complex).

[0] https://r4ds.had.co.nz


Those are my thoughts, mostly, but in the end libraries are the killer feature of successful programming languages. I have got used to it and I'm more proficient now at data analysis tasks using R than Python/pandas, thanks to tidyverse+ggplot.

The object system mess though... I have no words.


it seems to me that the value is in hard-compressed optimized functions and to this crowd this is the most important factor

software engineering practices don't exist unless you have a gigantic program, programming language theory is either not interesting or too foreign for them

about how they manage.. it's easy, they get used to it


Machine learning is rife with similar examples (TensorFlow first and foremost).


R was supposedly the "FOSS" replacement for SAS, and Matlab to a degree. I had to support it and Bioconductor.

Now, most people just use Python.


Why the scare "quotes"?


Its selling point is that it's not Matlab. Its downfall is that it's not Python. The python scientific compute libraries have their problems, but people are using them and math people "get them". I think they have an awful design/API from a programming perspective, but most people don't care. Matlab is similarly a strange language, but all of its libraries/tools make what scientists do easy. "Click a button and your code can now run on a compute cluster".

R is very popular with stats people but new PhD candidates are beginning to write python implementations of R things (sort of how like DataFrames/pandas happened).


"R is very popular with stats people but new PhD candidates are beginning to write python implementations of R things (sort of how like DataFrames/pandas happened)."

People have been saying this since I was an undergraduate.

I'm submitting my tenure packet this year.


> People have been saying this since I was an undergraduate.

> I'm submitting my tenure packet this year.

That doesn't mean it isn't happening. If anything, the fact that the rate of adoption is slow over a long period of time would tell me that there's more staying power here. If suddenly 20% used python, that would be a warning sign. If there was 0.5% year-over-year growth for decades... that's different.


I'm just saying I have spent my entire career in "The year Python unseats R."

and also "The year R unseats Python."


> R dropped from 8th place in January 2018 to become the 20th most popular language... At its peak in January 2018, R had a popularity rating of about 2.6%. But today it’s down to 0.8%, according to the TIOBE index.

https://www.datanami.com/2019/08/15/is-python-strangling-r-t...


I'd suggest the TIOBE index really isn't the proper metric to be using.


A similar trend is visible on GitHub:

https://madnight.github.io/githut/#/pull_requests/2014/1

R has halved in percentage since 2018





