Apart from all the other considerations and problems with various types of packa...

bayindirh · on Jan 7, 2021

As I can see from the researchers in our cluster and my own academic research, most people still avoid spaces in paths and files like the plague.

YMMV of course.

fjcp · on Jan 7, 2021

As a Linux user I can relate to that. I always avoid spaces in folders and filenames as they make it more annoying to manipulate them using command line tools. Years later I carried this habit to whatever OS I am using.

dstick · on Jan 7, 2021

If my own hobby python projects are anything to go by, there aren’t even folders ;-)

I have a friend who taught herself R for her research and it was basically one big procedural codebase.

YeGoblynQueenne · on Jan 7, 2021

Best way to know where every bit of code is: put it all in one source file.

Sarcasm aside, I've worked with codebases like that- thousand-line java methods and classes and the like. The problem is that there's nothing that really forces modularity on a codebase. There isn't even any consensus, objective way to modularise code. Otherwise, a machine could do it and we wouldn't have this kind of problem. But, a machine cannot, and so we do.

roel_v · on Jan 7, 2021

Of course, and so do I. But nobody ever even encountering the situation and/or bothering to report it, that's a whole different matter.

bayindirh · on Jan 7, 2021

My guess is people are encountering the situation, working around it and calling it a day. Maybe a little note here and there, but I don't think someone would report it, for a couple of reasons.

First of all, I don't think people report this type of stuff because they don't know how to report it, and secondly they think it doesn't need to support this use case anyway, since spaces are a latecomer to the naming and path game.

kristaps · on Jan 7, 2021

Don't remember the source and probably misquoting, but I like this truism: there's software that people complain about and software that nobody is using.

st1x7 · on Jan 7, 2021

The original quote is from Bjarne Stroustrup, the creator of C++. The quote also doesn't apply here. (You can't just use it to excuse any problem with software that you come across). The author of the article and the library in it just seems out of their depth in many ways.

cat199 · on Jan 7, 2021

> there's software that people complain about and software that nobody is using.

> The original quote is from Bjarne Stroustrup, the creator of C++

i find this ironic, given the 'popularity' (either way) of C++

st1x7 · on Jan 7, 2021

I don't think it's ironic, the quote directly addresses the many criticisms towards C++.

cat199 · on Jan 7, 2021

ah whoops- completely misread it

jcelerier · on Jan 7, 2021

> This can be interpreted in two ways: either very few people have spaces in their paths

it's been years since I've seen anyone doing that - a main reason is that a very widely used dev tool, make, does not handle spaces in paths:

http://savannah.gnu.org/bugs/?712

thus leading to inertia in the whole ecosystem - if make does not support spaces in paths, why bother

IshKebab · on Jan 7, 2021

> So we are talking about software here that somehow made it to version 1.1 without anyone ever using a directory with spaces in it with it.

This is extremely common, especially on Linux. Basically anything that uses things like Bash or CMake will almost certainly not work in directories containing spaces.

Developers don't use paths containing spaces because it causes so many issues with badly written Bash scripts, and as a result they don't test their code with paths containing spaces.

Bash and CMake and similar hacked together languages have very error-prone quoting rules that make it very easy to accidentally make something work with paths without spaces but fail on paths with spaces.
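To illustrate the failure mode, here is a minimal Python sketch that mimics shell word splitting with `shlex`. The path is a hypothetical example, and this is an analogy for an unquoted variable expansion, not a reproduction of any particular script.

```python
import shlex

# Hypothetical path containing a space.
path = "/home/user/my project/data.csv"

# Naive: paste the path into a command string, then split on whitespace.
# This is roughly what an unquoted $path expansion does in a shell.
naive_argv = shlex.split("cat " + path)

# Safer: quote the path before embedding it in the command string.
quoted_argv = shlex.split("cat " + shlex.quote(path))

print(naive_argv)   # the path is broken into two separate arguments
print(quoted_argv)  # the path survives as a single argument
```

With a path that has no spaces, both variants give the same result, which is why the bug goes unnoticed until someone uses a directory with a space in it.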

Sebb767 · on Jan 7, 2021

> Developers don't use paths containing spaces because it causes so many issues with badly written Bash scripts, and as a result they don't test their code with paths containing spaces.

It is also a PITA to use when typing in a shell, as you need two characters ( \ + space ) instead of one. So even though my scripts can handle them, I still avoid them if possible.

benibela · on Jan 7, 2021

Some programs also use URLs

Today I wanted to send a screenshot by mail.

Should be simple, but not with Gnome. I take the screenshot, Gnome creates a file "Screenshot from ...", but does not tell you where. Then I search for it in the file explorer, find it, copy the path. Then I paste the path in the mail program, file:///....Screenshot%20from%20. Then the mail program: "File not found"
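The likely mismatch here is that the pasted string is a percent-encoded file URL, not a filesystem path. A minimal sketch of the conversion, assuming a POSIX-style path; the helper name `file_url_to_path` and the example URL are hypothetical:

```python
from urllib.parse import urlparse, unquote

def file_url_to_path(url: str) -> str:
    """Convert a file:// URL to a plain POSIX-style path (sketch only)."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError("not a file URL: " + url)
    # Percent-decoding turns %20 back into a literal space.
    return unquote(parsed.path)

# Hypothetical URL, similar in shape to what a file manager might copy.
example_url = "file:///home/user/Pictures/Screenshot%20from%20today.png"
example_path = file_url_to_path(example_url)
print(example_path)
```

A program that passes the URL straight to an open-file call without decoding it will look for a file literally named with `%20` in it and report that it does not exist.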

CJefferson · on Jan 7, 2021

If you start discarding software which has problems with a space in a directory name, you should start with libtool, at which point you can't build significant chunks of the Linux ecosystem.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=193163

I hit this when trying to test libgmp (as an example of an important library you would lose).

This means in practice you can't really build most software which uses configure scripts and libraries in a directory with a space -- this may well be what they are hitting.

mattmanser · on Jan 7, 2021

It doesn't even seem to be on GitHub, in fact the source doesn't seem to be listed anywhere on the project website.

Which in our world would scream 'complete amateur, avoid, avoid, avoid', but perhaps it's different in the R world.

qwantim1 · on Jan 7, 2021

No, I think you’re correct. Incomplete source is bad in any world.

Unfortunately, it’s that world we live in for pretty much everything.

Reproducibility? What if all of the source were to depend on part of a CPU instruction set that we stop using? How long must things be reproducible? We don’t even make lab equipment exactly like we used to with the experiments our current sciences are based on.

However, I give a thumbs up to Groundhog for trying to do the right thing.

corty · on Jan 7, 2021

Reproducibility down to CPU bit differences is a sign that you did something wrong. Usually calculation with insufficient precision and no thought given to the range of simulation error. Simulation must be treated like a measurement, there is a maximum precision for your instrument and you have to know and apply it.

And even if you might disagree for the single-threaded case, most things running in parallel will eat that free lunch of bit-identical results due to timing differences.
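A small Python sketch of why bit-identical results are fragile: floating-point addition is not associative, so the order in which partial sums are combined (which can vary with thread timing) changes the last bits. The tolerance used below is an arbitrary illustrative choice.

```python
import math

values = [0.1, 0.2, 0.3]

# Same numbers, two different summation orders.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

# Exact comparison fails even though both are "the sum".
bit_identical = left_to_right == right_to_left

# Comparing within a stated tolerance treats the result like a measurement.
within_tolerance = math.isclose(left_to_right, right_to_left, rel_tol=1e-12)

print(bit_identical, within_tolerance)
```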

cowsandmilk · on Jan 7, 2021

Is it not on GitHub at https://github.com/CredibilityLab/groundhog ?

roel_v · on Jan 7, 2021

While this specific project does have a github page, the R world is 'complete amateur, avoid avoid avoid'. It's not really a 'programming language' in the way software engineers would see it. It's more a loose collection of stats functionality that is tied together with text interfaces in a way that somewhat looks like programming to the uninitiated. I mean, batch scripting is technically 'programming', and Excel (even without VBA) is technically Turing complete, but neither of those would be considered 'programming' by software engineers, at least not under an intuitive understanding of what 'programming' is. (by that I mean, it's easy to be pedantic and argue that R and batch files and Excel files are 'programming' because of [xyz] where [xyz] will probably involve real 'definitions' and selection criteria etc; but despite those tools being useful, you can't do real software engineering in them, which you sometimes want/need).

vharuck · on Jan 7, 2021

This argument seems elitist. R is more than just technically Turing complete.

It's definitely a specialized language. It's not the go-to for managing servers or anything with a lot of I/O, but it has those capabilities because they're useful for managing projects. And I'd be hard-pressed to justify using a language for statistical analysis if it doesn't focus on statistical analysis. It'd be like rolling my own cryptography.

You need to differentiate between "base R" (everything that comes with a new install) and community-contributed packages. Base R is amazingly reliable. It has detailed documentation[0].

User-package land is more of a Wild West, that's true. I would personally not use anything that's not on CRAN unless I can walk up to the maintainer's desk (in non-pandemic times).

[0] https://cran.r-project.org/manuals.html

roel_v · on Jan 7, 2021

shrug. It's largely opinion-based, I guess. My pet peeve (which also illustrates my point, but again, in an opinion-based way): there is no documented, 'officially supported' way to get the path of the current script in R. That is not a problem for amateur programmers who don't think about things like robustness, distribution etc, and it's needlessly complicated and bolted on in SAS, too. But it's still silly and indicative of R's typical use cases. Excel is reliable and well documented too, and I still wouldn't call even complicated workbooks 'software engineering'.

And CRAN... well... let's just say that people used to point to CPAN as a strength of Perl, too... All archives of that sort, after the first few years, which consist mostly of contributors with deep knowledge who can produce high quality libraries, turn into dumping grounds for trivial half-assed 'libraries' under the guise of 'community contributions'. Example: try to do trivial compound interest simulations in R. So basic that it's barely worth calling 'finance'. There are (at least) three packages on CRAN that claim to do this, except that (depending on which variable in the equation you want to solve for) they all provide only part of the solution, in mostly incompatible ways. And this is because very few of the people putting code into CRAN know how to... well... write good code. This is not an indictment of those people; many of them are much more intelligent than a bunch of us combined. It's just that for them coding is a byproduct, and with good intentions they share what has been useful for them, it just leads to a situation of 'in the land of the blind one eye is king'.
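For reference, the compound interest relation mentioned above can be solved for each variable in a few lines. This is a Python sketch assuming the simplest form, A = P(1 + r)^n with per-period compounding; the function names are hypothetical and are not taken from any CRAN package.

```python
import math

def future_value(principal: float, rate: float, periods: float) -> float:
    # A = P * (1 + r)^n
    return principal * (1 + rate) ** periods

def principal_needed(future: float, rate: float, periods: float) -> float:
    # P = A / (1 + r)^n
    return future / (1 + rate) ** periods

def rate_needed(principal: float, future: float, periods: float) -> float:
    # r = (A / P)^(1/n) - 1
    return (future / principal) ** (1 / periods) - 1

def periods_needed(principal: float, future: float, rate: float) -> float:
    # n = ln(A / P) / ln(1 + r)
    return math.log(future / principal) / math.log(1 + rate)

fv = future_value(1000.0, 0.05, 10)
print(fv, rate_needed(1000.0, fv, 10), periods_needed(1000.0, fv, 0.05))
```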

epistasis · on Jan 7, 2021

> you can't do real software engineering

This is completely, 100%, absolutely wrong.

Of course you can. There's packages, with excellent software engineering structure, that are designed to include documentation and tests.

R has so much good software engineering, that clever people with no software engineering background can easily make their own packages!

And come on, the R language is a masterpiece. It's not cobbled together like JavaScript or bash. It's got impeccable functional programming language pedigree, you can even look at the AST of a function directly inside code.

I'm not sure how you came to any of your conclusions, other than not bothering to understand the language to start. It's a beautiful language with a messy, user contributed set of stats code.

huijzer · on Jan 7, 2021

> Of course you can. There's packages, with excellent software engineering structure, that are designed to include documentation and tests.

For me, the problem with R is that the language is inconsistent. Many packages arose to address many problems, but they all feel like a hack on top of the core language. Take the whole Tidyverse; it just does dataframes from R core but then from the ground up. Now, users can choose between the core language dataframes and the Tidyverse dataframes. Same holds for plotting. The core issue, I think, is that the core language misses some essential features which other languages do have nowadays. For example, a type system. In R, since types are missing, everything is a table (dataframe) which I find just weird.

> It's not cobbled together like JavaScript or bash.

But also not as good as my favorite: Julia. Comparing it to Bash is like saying that it's better than COBOL. We all know Bash is quite old, but for certain situations it just works.

epistasis · on Jan 7, 2021

The tidyverse is the benefit and the curse of metaprogramming, something that R takes from lisp, and something that has cursed (helped?) C++ since it was added.

As far as type systems, there's really two different types of "types": the types of individual objects, which can have generic functions attached to them, etc. This is not as well known, and there are actually several object systems for typing:

http://adv-r.had.co.nz/OO-essentials.html

But these sorts of objects are not quite as commonly created by programmers, because the second type of "types" is much more useful: data frames, which are kind of a vectorization of structs. This is what would be used in data oriented design, which is apparently much more common in modern game design.
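To make "vectorization of structs" concrete, here is a small Python sketch contrasting a list of records with a column-wise layout, which is roughly the shape of a data frame. The `Sample` record and its fields are hypothetical, and a plain dict of lists stands in for a real data frame.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    name: str
    mass: float

# Array of structs: one object per row.
rows = [Sample("a", 1.0), Sample("b", 2.5), Sample("c", 4.0)]

# Struct of arrays: one sequence per field, rows aligned by index.
columns = {
    "name": [s.name for s in rows],
    "mass": [s.mass for s in rows],
}

# Column-wise operations touch a single contiguous field.
total_mass = sum(columns["mass"])

# A row is recovered by taking the same index from every column.
second_row = {key: col[1] for key, col in columns.items()}

print(total_mass, second_row)
```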

Hansi · on Jan 7, 2021

https://github.com/CredibilityLab/groundhog

jbullock35 · on Jan 7, 2021

A further concern: the repository for this R package [1] doesn't include any test files. Am I right to think that we should be wary of R packages that don't have any unit tests?

[1] https://github.com/CredibilityLab/groundhog

tpxl · on Jan 7, 2021

Could also be that the package manager doesn't use spaces and most people use package managers?

I.e. Maven will create a folder structure like "/home/user/.m2/repository/com/example/example.jar" which will never have spaces unless the username has spaces (can Linux usernames have spaces?).

roel_v · on Jan 7, 2021

On Unixy systems, spaces are uncommon because so little software can deal with them, so that people are trained from the very beginning to treat spaces like the plague. I do it too - I've been burned by treatment of spaces in shitty 0.x level software so many times (25+ years ago) that I now have an intuitive aversion to anything with spaces.

Spaces in filenames are a reality though, especially on Windows (where the home directory itself used to have spaces in it, and also where many home directories on corporate networks are on network drives and start with \\), and any software that can't deal with those kinds of paths has just not been exposed to much (if any) real world use. That was the point I was trying to make - software that can't handle anything but the most bog-standard path names in its core configuration is 'hey guys look at what I hacked up yesterday evening' quality at best. (yes yes it is possible to imagine exceptions, like software that is decades old and ported across platforms; I'm talking about something new that is meant to solve a general problem).
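As a sketch of the two path shapes described above, Python's `pathlib.PureWindowsPath` parses both without touching the filesystem, so this runs on any OS. The server, share and user names are hypothetical.

```python
from pathlib import PureWindowsPath

# Old-style Windows home directory with spaces in several components.
home = PureWindowsPath(r"C:\Documents and Settings\alice\My Documents\report.txt")

# Home directory on a network share: a UNC path starting with \\.
share = PureWindowsPath(r"\\fileserver\homes\alice\My Documents\report.txt")

# Components keep their spaces; nothing is split on whitespace.
home_parent = home.parent.name
share_drive = share.drive       # for UNC paths the "drive" is \\server\share
share_rest = share.parts[1:]    # components after the \\server\share\ anchor

print(home_parent, share_drive, share_rest)
```

Treating paths as structured objects rather than strings to be concatenated is one way to avoid most space-related and UNC-related surprises.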

nerdponx · on Jan 7, 2021

No, the R package manager can tolerate spaces in filenames.