> it's substantially more work to clean and organize the code for publishing, it will increase the surface for nitpicking and criticism (e.g. coding style, etc).
Matt Might has a solution for this that I love: Don't clean & organize! Release it under the CRAPL[0], making explicit what everyone understands, viz.:
"Generally, academic software is stapled together on a tight deadline; an expert user has to coerce it into running; and it's not pretty code. Academic code is about 'proof of concept.'"
> "Generally, academic software is stapled together on a tight deadline; an expert user has to coerce it into running; and it's not pretty code. Academic code is about 'proof of concept.'"
What do you know, it turns out the professional software developers I work with are actually scientists and academics!!
You don't put science on your name if you're a real science
Having flashbacks to when a close friend was getting an MS in Political Science, and spent the first semester in a class devoted to whether or not political science is a science.
"Information science" is basically long form of "informatics" so that breaks it I'd say. Also, "information" tends to imply a focus on state and places computational aspects (operations performed on information) as second hand.
I've yet to find a classification I really like but this is an interesting take. I still tend to like CIS (Computing and Information Sciences). The problem with CS is it focuses on computation and puts state as second class. The problem with IS is it focuses on state and puts computing as second class. To me, both are equally important.
Maybe a little OT, but, I'd rather it be called "computing science." Computers are just the tool. I believe it was Dijkstra who famously objected to it being called "computer science," because they don't call astronomy "telescope science," or something to that effect.
Peter Naur agreed with that which is why it is called "Datalogi" in Denmark and Sweden, his English term Datalogy never really caught on though.
He wrote it in a letter to the editor in 1966 and it has pretty much been used since then here as Naur founded the first Datalogy university faculty a few years later.
This also lead to a strange naming now we have data science as well where it is called "Data videnskab" which is just a literal translation of the English term.
I never thought about this, but it's right. However, to get more nitpicky, most of the uses of "Comput[er|ing] Science" should be replaced with "Computer Engineering" anyway. If you are building something for a company, you are doing engineering, not science research
The average developer isn't often doing "engineering". Until we have actual standards and a certification process, "engineer[ing]" doesn't mean anything.
The average software developer doesn't even know much math.
Right now, "software engineer" basically means "has a computer, -perhaps- knows a little bit about what goes on under the hood".
> The average software developer doesn't even know much math.
Well, I know stupid amounts of math compared to the average developer I've encountered, since I studied math in grad school. Other than basic graph traversal, I only remember one or two times I've gotten to actually use much of it.
Engineering is something like “deliberately configuring a physical or technological process to achieve some intended effect”. That applies whether you’re building a bridge or writing fizzbuzz
I'm not talking about the "average developer", I'm talking about college graduates having a "Computer Science" degreee but in practice being "Computer engineers"
College degrees aren't standardized and most of the time don't really mean anything. Ask some TAs for computer science courses about how confident they are in undergrads' ability to code.
There isn't a standard license that show that someone is proficient in security, or accessibility, or even in how computer hardware or networking work at a basic level.
So all we're doing is diluting the term "engineer", so as to not mean anything.
The only thing the term "software engineer" practically means is: they have a computer. It's meaningless, just a vanity title meant to sound better than "developer".
I agree. I can't find it right now, but there was an article on HN within the past few days talking about how software engineering is often more like writing and editing a book than engineering. That makes perfect sense to me. Code should be meant primarily to communicate with people, and only incidentally to give instructions to computers. It seems to be lost to the sands of time who said this first, but it is certainly true that code is read by humans many more times than it is written. Therefore, ceteris paribus, we should optimize for readability.
Readable code is often simple code, as well. This also has practical benefits. Kernighan's law[0] states:
> Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
Dijkstra says it this way:
> The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague.[1]
Knuth has a well-documented preference for "re-editable" code over reusable code. Being "re-editable" implies that the code must be readable and understandable (i.e. communicate its intention well to any future editors), if not simple.
I know that I have sometimes had difficulty debugging overly clever code. I am not inclined to disagree with the giants on this one.
Yes. Specifically, it's referring to what the academic discipline ought to be called.
Incidentally, I don't think I know any SWEs who actually majored in software engineering. I know one who didn't even bother to graduate high school and therefore has no academic credential whatsoever, a couple of music majors, a few math majors, and a lot of "computer science" majors, but I can't think of anyone who actually got a "software engineering" bachelor's degree. Hell, I even know one guy with a JD. I think I know 1 or 2 who have master's degrees in "software engineering," but that's it.
Please don't use this license. Copy the language from the preamble and put it in your README if you'd like, but the permissions granted are so severely restricted as to make this license essentially useless for anything besides "validation of scientific claims." It's not an open-source license - if someone wished to include code from CRAPL in their MIT-licensed program, the license does not grant them the permission to do so. Nor does it grant permission for "general use" - the software might not be legally "suitable for any purpose," but it should be the user's choice to decide if they want to use the software for something that isn't validation of scientific claims.
I am not a lawyer, just a hardcore open-source advocate in a former life.
I‘m a proponent of MIT and BSD style licenses normally, but this calls for something like AGPL: Allow other researchers and engineers to improve upon your code and build amazing things with it. If someone wants to use your work to earn money, let them understand and reimplement the algorithms and concepts, that’s fine too.
That's probably not viable under US copyright law, especially with the Bright Tunes Music v. Harrisongs Music precedent; if someone is going to reimplement the algorithms and concepts without a copyright license, they're better off not reading your code so they don't have to prove in court, to a computer-illiterate jury, that the aspects their code had in common with your code were really purely functional and not creative.
Yeah, this seems a bit verbose and overbearing to me. The open-source licenses I've used myself include something like this, which seems quite sufficient:
> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
From a non-academic's point of view I might include a brief disclaimer in the README too, explaining the context for the code and why it's in the state it's in, but there's no obligation to hold the user's hand. To be frank, anybody nitpicking or criticizing code released under these circumstances with the expectation that the author do something about it can go fuck themselves.
Competitive advantage, on the other hand, is a perfectly valid reason to hold code back. There may also be some cost in academia to opening an attack surface for any sort of criticism, even irrelevant criticism made in obvious bad faith. Based on what I've heard about academia, this wouldn't surprise me.
From the post: "The CRAPL says nothing about copyright ownership or permission to commercialize. You'll have to attach another license if you want to classically open source your software."
It is explicitly the point of the license that the code is not for those purposes, because it's shitty code that should not be reused in any real code base.
That's not a good excuse for putting your readers at legal risk of copyright infringement. A real, non-shitty code base could easily be a "derivative work" of the shitty code.
In the research industry, it is well established for anyone wanting to publish or utilize / include another's research in their own, to contact the source author and receive explicit permission to do so.
More often than not, they are more than willing to help.
It looks quite explicitly designed as a short-term temporary license for the period when the main paper is unpublished and you'd be expected to keep the code non-public (due to e.g. reviewing anonymity requirements), so the basic open source freedoms are explicitly not included.
I would expect that anyone wanting to actually publish their code should publish the code with a "proper" license after the reviewing process is done and the relevant paper is published.
> the permissions granted are so severely restricted as to make this license essentially useless
Indeed, also there're things like "By reading this sentence, You have agreed to the terms and conditions of this License.". That can't hold up in court! How can I know in advance what the rest of the conditions say before agreeing to them?
While I whole heatedly agree with you, I would seriously question anyone trying to reuse research code in production without completely reimplementing it from scratch.
I'd like to piggyback and say that increasing the surface for nitpicking and criticism is exactly why OP should release his code. It improves the world's ability to map data to the phenomenon being observed. It becomes auditable.
Certainly don't clean it up unless you're going to repeat the experiment with the cleaned up code.
Agreed on both points! As somebody who bridges research and production code, I can typically clean code faster than I can read & understand it. It really helps to have a known-good (or known-"good") example so that I can verify that my cleanup isn't making undesired changes.
And, yeah. I've found some significant mistakes in research code -- authors have always been grateful that I saved them from the public embarrassment.
The CRAPL is a "crayon license" — a license drafted by an amateur that is unclear and so will prevent downstream use by anyone for whom legal rigor of licensing is important.
I think this could be done much better by putting a very restrictive license like GPLv3 / AGPL and then in the README putting in that I don't support this project at all and ignore everything associated with wherever you are hosting it.
Using this license would actually make me suspect that your results aren't even valid and I don't trust many experiments that don't release source code.
In case OP, and others don't know, it is the copyright holders that can decide on the license. Copyright holders are the persons who contributed to the code. In this case, it sounds like OP is the sole author and therefore the sole copyright holder.
You cannot change the past, but as a copyright holder, you can always set a new license for future releases.
Thus, OP, if you're uncertain, I definitely was when I started out, go with a restrive license as recommended here (GPL). That, together with publishing the code online (e.g. GitHub, Gitlab, ...) as well as a with your article, will give you some protection against plagiarism. Anyone who use include parts of your code for their research code, will have to share theirs code the same. If you later on feel like you want to relax the license, you can always change it to, say, MIT.
Anecdotally most of the research papers I see and have worked on publish their code but don't really clean it up. Even papers by big companies like Microsoft Research. Still significantly better than not publishing the code at all.
A thousand times this. A working demonstrator of a useful idea that is honest about its limits is so valuable. Mover over most commercial code is garbage! :)
If only to help people who simply can't read academic papers because it's not a language their brain is wired to parse, but who can read code and will understand your ideas via that medium.
[EDIT]: to go further, I have - more than once - run research code under a debugger (or printf-instrumented it, whichever) to finally be able to get an actual intuitive grasp of the idea presented in the paper.
Once you stare at the actual variables while the code runs, my experience is it speaks to you in a way no equation ever will.
This has explicit usage limitations that matter in science land, which is very much the kind of thing that belongs in a license.
Eg:
You are permitted to use the Program to validate scientific claims
submitted for peer review, under the condition that You keep
modifications to the Program confidential until those claims have
been published.
Moreover, sure, lots of the license is text that isn't common in legal documents, but there's no rule that says legal text can't be quirky, funny or superfluous. It's just most practical to keep it dry.
In this particular case, however, there's very little risk of actual law suits happening. There is some, but the real goal of the license is not to protect anyone's ass in court (except for the obvious "no warranty" part at the end), but to clearly communicate intent. Don't forget that this is something GPL and MIT also do besides their obvious "will likely stand up in court" qualities. In fact I think that communicating intent is the key goal of GPL and MIT, and also the key goal of CRAPL.
From this perspective, IMO the only problem in this license is
By reading this sentence, You have agreed to the terms and
conditions of this License.
This line makes me sad because it makes a mockery of what's otherwise a pretty decent piece of communication. Obviously nobody can agree to anything just by reading a sentence in it. It should say that by using the source code in any way, you must agree to the license.
Again, this is not how a license work. You can express your intents, ideas and desires in a README file and in many other ways.
The license is nothing more than a contract that provides rights to the recipient under certain conditions. Standing up in court is its real power and only purpose.
That's why we should prefer licenses that stood up in court and have been written by lawyers rather than developers or scientists.
I strongly disagree. Contracts very much primarily communicate intent, ideally in such a way that they also stand up in court. People regularly argue over details in contracts, people regularly look up things in contracts, also when there is no court to be seen and no intention anywhere to go to court. The vast vast vast majority of contracts never make it to court.
Plenty of contracts aren't even written down. When you buy a loaf of bread at the bakery, you make an oral contract about that transaction.
The idea that contracts, or licenses, need to be written in dull legalese and be pretty much impenetrable to be useful or "valid" or whatever, is absolutely bonkers. Lawyers like you to think that but it's not true. It's an urban legend.
If you need to make sure that you can defend your rights in court, then sure, you're probably going to need some legalese (but even then there's little harm in also including some intent - it's just not very common). Clearly that's not the goal here. No scientist is gonna sue another scientist who asked for support and got angry about not getting any even though the code was CRAPL licensed.
That's a well known fact. And it's besides the point.
> Lawyers like you to think that but it's not true.
Is that a conspiracy theory? Writing long, detailed contracts on a persistent medium is safer: it lowers the risk of he-said-she-said scenarios and ambiguities.
That is meant to save you tons of legal expenses.
> No scientist is gonna sue another scientist
Then there is no need for such license in the first place. Just a readme file.
I agree. There's a lot of confusion surrounding even the most established ones, so there's no need to further muddy the situation with newer licenses. In my opinion a "fun" license, with its untested legal ambiguity, restricts usage more than a well established license with a similar level of freedoms.
For instance, the Java license explicitly forbids the use in/for real-time critical systems, and such limitations are good to stress in a license so that they may reach legal force, also to protect the author(s).
Incidentally, I've seen people violate the Java "no realtime" clause.
Used to, OpenJDK is licensened under GPLv2 with the classpath excemption that allows this for years. If not running an OpenJDK build it depends on your vendor license.
Yes, if you feel you have to make it "release ready" then you'll never publish it. I'm pretty sure a good majority of the code is never released because the original author is ashamed of it, but they shouldn't be. Everybody is in the same boat.
The only thing I would add is a description of the build environment and an example of how to use it.
4) You recognize that any request for support for the Program will be discarded with extreme prejudice.
I think that should be a "may" rather than a "will." If I find out someone is using my obscure academic code, and they ask for help, I'd be pretty pumped to help them (on easy requests at least).
The point of the license is to set your expectations as low as possible. Then, when you actually /do/ get support, you'll be ecstatic rather than non-plussed.
Discarding a request for support with extreme prejudice might entail using LinkedIn to look up the boss of the person who asked you for support, then phoning them up to complain about the request for support, or it might entail filing for a restraining order against the person requestings support. The point of this clause is to intimidate people out of making the request in the first place.
You should probably familiarize yourself with the meaning of the phrases it's alluding to, "dismissed with extreme prejudice" and "terminated with extreme prejudice".
Exactly. If nothing else, a request for support has a chance of being an indication that there's somebody else in the field that cares about some aspect of the problem. I might not act on it, but it is good to have some other human-generated signal that says "look over there."
Why does he think that but presumably not the same about the paper itself and the “equations”, plots, etc. contained within?
It’s really not that hard to write pretty good code for prototypes. In fact, I can only assume that he and other professors never allowed or encouraged “proof of concept” code to be submitted as course homework or projects.
I think you don't understand the realities of work in scientific/academic organisations. Unless you work in computer science you likely never received any formal education on programming except for some numerical methods in c/matlab/fortran course during your studies (which often also focus more on the numerical methods and not the programming). So pretty much every person, just learned by doing.
Moreover you are not being paid for writing reasonable programs you're paid for doing science. Nobody would submit "prototype" papers, because they are the currency of academic work. There is lots of time spend on polishing a paper before submission, but doing that for code is generally not appreciated because nobody will see this on your CV.
I understand it fine. Like I said, it’s really not that hard to write pretty good code for prototypes. I'm not saying the code needs to be architected for scale or performance or whatever else needless expectation. I don't have a formal education in programming or computer science and write clean code just fine, as do some other non-computer science people I've worked with in advanced R&D contexts. And then some (many?) don't. It's not really about educational background, it's more about just being organized. Even when someone is "just" doing science, a lot of times, the code explicitly defines what's going on and has major impacts on the results. (Not to mention that plenty of people with computer science backgrounds write horrible code.)
If code is a large part of your scientific work, then it's just as important as someone who does optics keeping their optics table organized and designed well. If one is embarrassed by that, then too bad. Embarrassment is how we learn.
Lastly, you're describing problems with the academic world as if they are excuses. They're reasons but most people know the academic world is not perfect, especially with what is prioritized and essentially gamified.
I'm not making excuses I'm just talking about the realities. For the significant majority of researchers I know about version control is still program_v1.py, program_v1.py, program_final.py, program_final2.py (and this is the good version, at least they are using python), so talking about clean code is still quite a bit away. I'm teaching PhD students and it's even hard to convince them, because they just look at how to get the next publishable result. For academics it becomes even more unrealistic, most don't even have time to go to a lab, they are just writing grants typically.
I'm actually a strong supporter of open science, release my code OSS (and actually clean it up) and data (when possible). But just saying it's easy and there is no cost, is just setting the wrong expectations. Generally unless you enjoy it, spending significant time on your code is not smart for your career. Hopefully things are changing, but it is a slow process.
Funny that you talk about optics I know some of the most productive people in my field and their optical tables are an absolute mess (as is their code btw). They don't work at universities though.
I still think it’s really not that hard, and actually, it’s really not even about code. It’s really just about organization, because as you point out, not everyone is great at it. But for example, messy optics tables, labs, or whatever do in fact cause problems, like efficiency and knowing what “state” of supporting tools yielded what result and several other derivative problems. I think my push would be just applying even a modicum of organization on supposedly ancillary things will go a long way rather than accepting them as reality.
I understand the realities and have even been a part of PowerPoint development where slides are emailed back and forth. Sometimes one just has to go with things like that. But I have also seen the reality of stubbornness. I have tried introducing source code control to scientists or even stuff like wikis, all supported and already up and running by IT and used by other groups. Scientists and engineers, especially those with PhDs, can a bit rejective and set in their ways. I have been told flat out by different people that they wouldn’t use tools like a wiki or that Python and C was all they ever needed. I have even noticed physicists saying “codes” for software instead of “code”. It’s fairly rampant, and I have seen it in research papers, published books, and in industry in person. I have never seen that use of “codes” anywhere else. That alone is evidence of a certain amount of culture and institutionalization of doing things incorrectly but viewed as acceptable within the field.
I have written code in some research contexts. I get the constraints. One just needs to take it seriously. But organization, in general, often takes a back seat in basically any field. The only way to change things like this are like anything, which is to push against culture.
I'm not a scientist so maybe I don't get it, but it seems like code could be analogized to a chemist's laboratory (roughly). If a chemist published pictures of their lab setup in their paper, and it turned out that they were using dirty glassware, paper cups instead of beakers, duct taping joints, etc etc, wouldn't that cast some doubt on their results? They would be getting "nitpicked" but would it be unfair? Maybe their result would be reproducible regardless of their sloppy practices, but until then I would be more skeptical than I would be of a clean and orderly lab.
MIT and BSD are established and well accepted licenses, literally named after the academic institutions where they originated. Licenses are legal documents, part of what makes them "explicit what everyone understands" is their legally recognized establishment.
If you want to set expectations, this can simply be done in a README. Putting this in a license makes no sense. Copyright licenses grant exceptions to copyright law. If you're adding something else to it, you're muddying the water, not making it better.
FWIW I basically did this: My thesis numbers were run on a branch based on the unstable version of an upstream project that was going through a major refactoring. I took a tarball of the VCS tree at that point in time and posted it online. Over the years 3-4 people have asked for the tarball; nobody has ever come back to me with any more questions. I can only assume they gave up trying to make it work in despair.
I think I tried to build it a couple of years ago, and it wouldn't even build because gcc has gotten a lot more strict than it used to be (and the upstream project has -Werror, so any warnings break the compilation).
I think it's definitely worth doing, but I think you need to be realistic about how much impact that kind of "publishing" is really going to have.
> Matt Might has a solution for this that I love: Don't clean & organize! Release it under the CRAPL[0]
> "Generally, academic software is stapled together on a tight deadline; an expert user has to coerce it into running; and it's not pretty code. Academic code is about 'proof of concept.'"
This is brilliant!
edit: it seems that, aside from that great snippet from the text, the license itself isn't so great. another comment [1] has a great analysis of the actual license and suggests using a superior solution (copying the preamble part yet still using MIT/(A)GPL).
You don't need an esoteric license, just use a standard license like MIT>
relevant section from MIT license:
"THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE."
Matt Might has a solution for this that I love: Don't clean & organize! Release it under the CRAPL[0], making explicit what everyone understands, viz.:
"Generally, academic software is stapled together on a tight deadline; an expert user has to coerce it into running; and it's not pretty code. Academic code is about 'proof of concept.'"
[0] https://matt.might.net/articles/crapl/