Apart from all the other considerations and problems with various types of package management, consider this:
"Update January 6th, 2021 A reader alerted me to a bug with the current groundhog (version 1.1.0) where you cannot set the groundhog library to be a folder containing spaces in the name."
So we are talking about software here that somehow made it to version 1.1 *without anyone ever using a directory with spaces in it with it*. This can be interpreted in two ways: either very few people have spaces in their paths, or very few people have actually ever even tried (not even really used, I'm only talking about the most basic trial use) this package. I'm not a betting man, but if I were, I know where I'd put my money...
As a Linux user I can relate to that. I always avoid spaces in folders and filenames as they make it more annoying to manipulate them using command line tools. Years later I carried this habit to whatever OS I am using.
Best way to know where every bit of code is: put it all in one source file.
Sarcasm aside, I've worked with codebases like that: thousand-line Java methods and classes and the like. The problem is that there's nothing that really forces modularity on a codebase. There isn't even a consensus, objective way to modularise code. Otherwise, a machine could do it and we wouldn't have this kind of problem. But a machine cannot, and so we do.
My guess is that people are encountering the situation, working around it, and calling it a day. Maybe a little note here and there, but I don't think anyone would report it, for a couple of reasons.
First, people don't report this type of thing because they don't know how to report it; second, they think the software doesn't need to support this use case anyway, since spaces are a latecomer to the naming and path game.
Don't remember the source and probably misquoting, but I like this truism: there's software that people complain about and software that nobody is using.
The original quote is from Bjarne Stroustrup, the creator of C++. The quote also doesn't apply here. (You can't just use it to excuse any problem with software that you come across.) The author of the article and the library in it just seem out of their depth in many ways.
> So we are talking about software here that somehow made it to version 1.1 without anyone ever using a directory with spaces in it with it.
This is extremely common, especially on Linux. Basically anything that uses things like Bash or CMake will almost certainly not work in directories containing spaces.
Developers don't use paths containing spaces because it causes so many issues with badly written Bash scripts, and as a result they don't test their code with paths containing spaces.
Bash and CMake and similar hacked together languages have very error-prone quoting rules that make it very easy to accidentally make something work with paths without spaces but fail on paths with spaces.
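To make the quoting point concrete, here is a minimal Python sketch (not from the thread) that uses the standard `shlex` module to mimic POSIX shell word splitting. The path and the `tar` command line are made up for illustration; the only point is how one unquoted path turns into two arguments.

```python
import shlex

# Hypothetical path containing a space, used only for illustration.
path = "/home/user/my library/pkg.tar.gz"

# Unquoted interpolation: POSIX-style splitting breaks the line on
# whitespace, so the tool would receive two arguments instead of one path.
unquoted = shlex.split(f"tar -xf {path}")

# Quoted interpolation: shlex.quote wraps the path so it stays one word.
quoted = shlex.split(f"tar -xf {shlex.quote(path)}")

print(unquoted)  # ['tar', '-xf', '/home/user/my', 'library/pkg.tar.gz']
print(quoted)    # ['tar', '-xf', '/home/user/my library/pkg.tar.gz']
```

A script that only ever sees paths without spaces behaves identically in both cases, which is presumably why this kind of bug survives for so long.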
> Developers don't use paths containing spaces because it causes so many issues with badly written Bash scripts, and as a result they don't test their code with paths containing spaces.
It is also a PITA to use when typing in a shell, as you need two characters ( \ + space ) instead of one. So even though my scripts can handle them, I still avoid them if possible.
Should be simple, but not with Gnome. I take the screenshot, Gnome creates a file "Screenshot from ...", but does not tell you where. Then I search for it in the file explorer, find it, copy the path. Then I paste the path in the mail program, file:///....Screenshot%20from%20. Then the mail program: "File not found"
If you start discarding software which has problems with a space in a directory name, you should start with libtool, at which point you can't build significant chunks of the Linux ecosystem.
I hit this when trying to test libgmp (as an example of an important library you would lose).
This means in practice you can't really build most software which uses configure scripts and libraries in a directory with a space -- this may well be what they are hitting.
No, I think you’re correct. Incomplete source is bad in any world.
Unfortunately, it’s that world we live in for pretty much everything.
Reproducibility? What if all of the source were to depend on part of a CPU instruction set that we stop using? How long must things be reproducible? We don’t even make lab equipment exactly like we used to with the experiments our current sciences are based on.
However, I give a thumbs up to Groundhog for trying to do the right thing.
Reproducibility down to CPU bit differences is a sign that you did something wrong. Usually calculation with insufficient precision and no thought given to the range of simulation error. Simulation must be treated like a measurement, there is a maximum precision for your instrument and you have to know and apply it.
And even if you might disagree for the single-threaded case, most things running in parallel will eat that free lunch of bit-identical results due to timing differences.
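A small Python sketch of why this happens, assuming ordinary IEEE 754 double-precision floats: addition is not associative, so the order in which partial results are combined (which a parallel run does not fix) can change the last bits. The tolerance below is an arbitrary choice for illustration, not a recommendation.

```python
import math

values = [0.1, 0.2, 0.3]

# Same three numbers, summed in two different groupings, as might happen
# when parallel workers finish in a different order from run to run.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

# The results are not bit-identical...
print(left_to_right == right_to_left)  # False

# ...but they agree within a tolerance, which is the "measurement with a
# known precision" view of a simulation result.
print(math.isclose(left_to_right, right_to_left, rel_tol=1e-12))  # True
```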
While this specific project does have a GitHub page, the R world is 'complete amateur, avoid avoid avoid'. It's not really a 'programming language' in the way software engineers would see it. It's more a loose collection of stats functionality that is tied together with text interfaces in a way that somewhat looks like programming to the uninitiated. I mean, batch scripting is technically 'programming', and Excel (even without VBA) is technically Turing complete, but neither of those would be considered 'programming' by software engineers, at least not under an intuitive understanding of what 'programming' is. (by that I mean, it's easy to be pedantic and argue that R and batch files and Excel files are 'programming' because of [xyz] where [xyz] will probably involve real 'definitions' and selection criteria etc; but despite those tools being useful, you can't do real software engineering in them, which you sometimes want/need).
This argument seems elitist. R is more than just technically Turing complete.
It's definitely a specialized language. It's not the go-to for managing servers or anything with a lot of I/O, but it has those capabilities because they're useful for managing projects. And I'd be hard-pressed to justify using a language for statistical analysis if it doesn't focus on statistical analysis. It'd be like rolling my own cryptography.
You need to differentiate between "base R" (everything that comes with a new install) and community-contributed packages. Base R is amazingly reliable. It has detailed documentation[0].
User-package land is more of a Wild West, that's true. I would personally not use anything that's not on CRAN unless I can walk up to the maintainer's desk (in non-pandemic times).
shrug. It's largely opinion-based, I guess. My pet peeve (which also illustrates my point, but again, in an opinion-based way): there is no documented, 'officially supported' way to get the path of the current script in R. That is not a problem for amateur programmers who don't think about things like robustness, distribution etc, and it's needlessly complicated and bolted on in SAS, too. But it's still silly and indicative of R's typical use cases. Excel is reliable and well documented too, and I still wouldn't call even complicated workbooks 'software engineering'.
And CRAN... well... let's just say that people used to point to CPAN as a strength of Perl, too... All archives of that sort, after the first few years in which the contributors are mostly people with deep knowledge who can produce high quality libraries, turn into dumping grounds for trivial half-assed 'libraries' under the guise of 'community contributions'. Example: try to do trivial compound interest simulations in R. So basic that it's barely worth calling 'finance'. There are (at least) three packages on CRAN that claim to do this, except that (depending on which variable in the equation you want to solve for) they all provide only part of the solution, in mostly incompatible ways. And this is because very few of the people putting code into CRAN know how to... well... write good code. This is not an indictment of those people; many of them are much more intelligent than a bunch of us combined. It's just that for them coding is a byproduct, and with good intentions they share what has been useful for them; it just leads to a situation of 'in the land of the blind one eye is king'.
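For reference, this is roughly what "solving for each variable" means, as a Python sketch under the simplest assumption: periodic compounding with FV = PV * (1 + r)^n. The function names are made up for illustration and do not correspond to any CRAN package.

```python
import math

def future_value(pv: float, rate: float, periods: float) -> float:
    """FV = PV * (1 + r)^n"""
    return pv * (1 + rate) ** periods

def present_value(fv: float, rate: float, periods: float) -> float:
    """PV = FV / (1 + r)^n"""
    return fv / (1 + rate) ** periods

def interest_rate(pv: float, fv: float, periods: float) -> float:
    """r = (FV / PV)^(1/n) - 1"""
    return (fv / pv) ** (1 / periods) - 1

def num_periods(pv: float, fv: float, rate: float) -> float:
    """n = ln(FV / PV) / ln(1 + r)"""
    return math.log(fv / pv) / math.log(1 + rate)

# Round trip: compute FV, then recover each of the other three variables.
fv = future_value(1000.0, 0.05, 10)
print(fv)
print(present_value(fv, 0.05, 10))
print(interest_rate(1000.0, fv, 10))
print(num_periods(1000.0, fv, 0.05))
```

Four one-line functions cover every variable in the equation, which gives a sense of how small a complete solution is.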
Of course you can. There's packages, with excellent software engineering structure, that are designed to include documentation and tests.
R has so much good software engineering, that clever people with no software engineering background can easily make their own packages!
And come on, the R language is a masterpiece. It's not cobbled together like JavaScript or bash. It's got impeccable functional programming language pedigree; you can even look at the AST of a function directly inside code.
I'm not sure how you came to any of your conclusions, other than not bothering to understand the language to start. It's a beautiful language with a messy, user contributed set of stats code.
> Of course you can. There's packages, with excellent software engineering structure, that are designed to include documentation and tests.
For me, the problem with R is that the language is inconsistent.
Many packages arose to address many problems, but they all feel like a hack on top of the core language.
Take the whole Tidyverse; it just does dataframes from R core but then from the ground up.
Now, users can choose between the core language dataframes and the Tidyverse dataframes.
Same holds for plotting.
The core issue, I think, is that the core language misses some essential features which other languages do have nowadays.
For example, a type system.
In R, since types are missing, everything is a table (dataframe) which I find just weird.
> It's not cobbled together like JavaScript or bash.
But also not as good as my favorite: Julia.
Comparing it to Bash is like saying that it's better than COBOL.
We all know Bash is quite old, but for certain situations it just works.
The tidyverse is the benefit and the curse of metaprogramming, something that R takes from lisp, and something that has cursed (helped?) C++ since it was added.
As far as type systems go, there are really two different kinds of "types". The first is individual typed objects that can have generic functions attached to them, etc. This is not as well known, and there are actually several object systems for typing:
But these sorts of objects are not quite as commonly created by programmers, because the second kind of "type" is much more useful: data frames, which are kind of a vectorization of structs. This is what would be used in data oriented design, which is apparently much more common in modern game design.
A further concern: the repository for this R package [1] doesn't include any test files. Am I right to think that we should be wary of R packages that don't have any unit tests?
Could also be that the package manager doesn't use spaces and most people use package managers?
I.e. Maven will create a folder structure like "/home/user/.m2/repository/com/example/example.jar" which will never have spaces unless the username has spaces (Can Linux usernames have spaces?).
On Unixy systems, spaces are uncommon because so little software can deal with them, so that people are trained from the very beginning to treat spaces like the plague. I do it too - I've been burned by treatment of spaces in shitty 0.x level software so many times (25+ years ago) that I now have an intuitive aversion to anything with spaces.
Spaces in filenames are a reality though, especially on Windows (where the home directory itself used to have spaces in it, and also where many home directories on corporate networks are on network drives and start with \\), and any software that can't deal with those kinds of paths has just not been exposed to much (if any) real world use. That was the point I was trying to make - software that can't handle anything but the most bog-standard path names in its core configuration is 'hey guys look at what I hacked up yesterday evening' quality at best. (yes yes it is possible to imagine exceptions, like software that is decades old and ported across platforms; I'm talking about something new that is meant to solve a general problem).
"Update January 6th, 2021 A reader alerted me to a bug with the current groundhog (version 1.1.0) where you cannot set the groundhog library to be a folder containing spaces in the name."
So we are talking about software here that somehow made it to version 1.1 *without anyone ever using a directory with spaces in it with it". This can be interpreted in two ways: either very few people have spaces in their paths, or very few people have actually ever even tried (not even really used, I'm only talking about the most basic trial use) this package. I'm not a betting man, but if I were, I know where I'd put my money...