Rise of the Scientific Programmer (byterot.blogspot.com)
251 points by StylifyYourBlog on Jan 1, 2015 | 142 comments



As other people have mentioned - the problem is that academia rewards producing papers, not stable software libraries.

For example - scikit-learn is an amazing project, led mostly by a small group at INRIA with Gael at the helm - and in terms of academic prestige, scikit-learn is probably 'worth less' on your CV than a couple of Nature papers.

This is of course ridiculous - scikit-learn is used by a huge number of people and it takes an insane amount of work to run the project, yet the incentives are what they are.


> the problem is that academia rewards producing papers, not stable software libraries.

isn't this the good thing about academia? aren't you kids getting paid the big bucks to write and maintain the software libraries, while we work on novel problems for pennies?


I am an academic myself! Aside from that - it's actually a bad thing: poor software quality is incredibly harmful when trying to create reproducible research.

A few people are fighting this (Titus Brown, etc) but it's mostly swimming against the tide of bad incentives.


i have started using Docker for this kind of stuff. you can build an isolated environment for your software and experiments, where you can absolutely guarantee that anyone who wants to can easily replicate your experiments, since they don't need to create the environment themselves - just pull the docker image for conference-paper# and run the scripts.

if the experimental data is proprietary, or you want to keep it separate, you can set a mount point for it in the lxc.
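
as a rough sketch of what i mean (the image name, data path and script name below are just placeholders), the whole replication run can be a few lines of python wrapped around the docker cli:

    import subprocess

    IMAGE = "registry.example.org/conference-paper-image"   # placeholder: image published with the paper
    DATA_DIR = "/srv/experiment-data"                       # placeholder: host path holding the (proprietary) data

    # pull the exact image that was published alongside the paper
    subprocess.check_call(["docker", "pull", IMAGE])

    # run the experiment scripts inside the container, with the data mounted read-only at /data
    subprocess.check_call([
        "docker", "run", "--rm",
        "-v", DATA_DIR + ":/data:ro",
        IMAGE, "bash", "run_experiments.sh",
    ])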


"anyone who wants to can easily replicate your experiments"

Replicate the experiments, or just repeat the results?


How about, verify the published code (!) even produces the published results?


the stuff i work on is in the area of machine learning, so most published work involves one or more well-known data sets.

i would argue that the two are the same in this case.

the lxcs provide all the source code i write [plus of course the compiled version], all third-party libraries, and all scripts used to run and evaluate the experiments, and the data as well, where that is permitted.

it's still not perfect, but for my area, i honestly think it is the best, and most accountable way to do things that i have seen.


And hopefully one or more not-so-well-known, local data sets to check that the results are actually as claimed?


well, the idea is that you should be able to run any data set you have, and get good results relative to other solutions. but that is an open question with any research.

the point of the docker/lxc aspect is to provide a simple working environment to facilitate replication and validation.

so in comparison to the status quo, which is basically 'write a paper, include some high level equations, and results', i think this is a step forward in a better direction.


+1 for this. There is so much more to repeatability beyond "When I click run, does it give me the same number again?"


if it's an entirely computational experiment, which is not uncommon, then 'replicate the experiments' is correct.


I tend to worry that an error in the code will be baked into the theory for generations.

I don't deal with much scientific code myself, but at one point I dealt with a proof-of-concept cryptographic library from a reasonably well-respected researcher. The code behaved correctly from the outside, but when I dug into it, it deviated wildly from the published specification.


Recent European economic policy was based on a paper that relied on an Excel formula error http://theconversation.com/economists-an-excel-error-and-the...

It only lasted a few years, but I find the idea of long-lasting research founded on bad code to be a very real possibility.


A distressing number of runs on our HPC system simply aren't reproducible twice in a row anyway. They get repeated until, or in the hope that, they don't deadlock or segv, not that users typically believe in deadlock. They aren't debugged -- it's blamed on supposed system problems, not the code -- and it doesn't seem to worry the people publishing results from them. I doubt our users are unique.

Even for decent code, docker is being over-sold for this sort of thing. Serious large-scale calculations, in particular, simply aren't hardware-independent in practice. Consider a 1024-core PSM MPI job with Haswell-specific code or requiring some GPGPU, or a 128-core, 2TB SMP one; you can't run those just anywhere. Even if you can package and run in docker at another site, if you don't get the "right" results, what do you do about it if you don't have source?


source code should also be included as a matter of course...

i don't think it is an oversell, in the sense that it is still unusual to include source code and experimental setups [at least in my field]. a replicable environment with included source code is a large step forward.

sad as that might be.


This reminds me of Philip Guo's work; maybe this one?

http://pgbovine.net/publications/CDE-create-portable-Linux-p...


also it hadn't occurred to me that this might be something interesting to even publish a paper about. so thanks for that too [assuming someone else hasn't already done this too..]

edit: well no surprise there i guess! http://www.nextflow.io/blog/2014/nextflow-meets-docker.html


cool, i had not heard of this. i just started using docker for work and came to the conclusion that it was epically well-suited to this purpose as well. i think docker might be even nicer, since there are no special tools required [but i'll definitely take a closer look at this work]


Guo's work is a bit old; docker is a very new thing.


I agree that this is a serious issue in our community, but I am not sure I agree that stable libraries <=> reproducible experiments.


If those experiments involve automated data collection or computer models, then stable data collection or modeling libraries would be kind of important for reproducing them.


Try to reproduce the analysis published in a paper when all you have is a matlab script with one letter variable names and zero comments :)


If you're running someone else's code, imo that's not reproduction in the first place, just like re-running an experiment using the original experimenter's preparations and lab apparatus is not what's usually meant by "reproducing" an experiment. Too much undocumented stuff can creep in if you don't independently reproduce, with independent apparatus, preparations of samples, etc. (I don't think having someone's code is useless, and it can be especially useful for elaborating on the original experiment, but I would purposely avoid looking at it if I were aiming for an independent reproduction.)


You are always running someone else's code. It starts the moment you boot up your machine.


Not if you bootstrap from the silicon up.


That's if you even have access to the source code, a detailed algorithm, or even a matlab script. Often it's just a citation or a plain old equation.

Oftentimes, and especially from what I've seen in computer vision papers, the authors merely state what algorithm they used and how they combined it with their novel method. And that algorithm is in another paper, by the way, probably by the same author - and it's definitely not the implementation you're working with, if you have one at all.

It's almost as if they need a combined repository. And each paper that presents a novel algorithm, or an implementation of an existing one, is a "changeset" or "branch". And the citations to algorithms used in a paper would be changeset hashes, or branch names. Hey, it's the first thing that popped into my mind to solve this horrendous problem.


I certainly agree with this. The computer vision field is awash with papers proposing a 'new' algorithm which is then poorly compared to some select group of existing techniques under criteria chosen by the author. A paper is a very poor substitute for the code itself and really it should be mandatory for code to be submitted with the paper, especially in a field such as computer vision where the entire experimental apparatus could be packed into a zip file. That way any other group could take the code and independently evaluate the technique without reimplementation. Indeed my own experience is that often the maths described in the paper is not necessarily responsible for all the results! As you say, this could even become the start of collaborative improvement.

Unfortunately my experience is that too many academic groups believe that their source code is the route to untold riches.


Better than nothing. (Been there, done that).


I agree with you here, but I think 'stable libraries' is perhaps a good target for a few reasons. Right now the culture isn't just bad code, it's "There is no advantage or benefit to showing your code". I would say a difference between computer scientists and programmers is that frequently the work isn't just the code, but still, nurturing something like an open-source scientific community would accelerate a lot of learning.


The bad thing is academia is badly paid, as are the support staff, which is why I left a world-class R&D organisation to work in commercial software.

Have you read Cryptonomicon? Look at how Randy's first job at a university is described.


Prestige should accrue to anyone who does good work, of whatever kind. Stable software libraries can help science as much as producing papers (if nothing else, because of their effect on future production of papers!).


In the middle of my PhD I was offered the chance to work on some major Python projects. I had to turn it down since it would have hurt my already shaky PhD progress. I couldn't have used it to contribute to my degree.


It's just a standard consequence of underfunding. No one gets paid to develop scikit-learn full time. An academic gets paid by getting grants. When you look at a lot of the best libraries out there, how were the top contributors employed at the time?

Generally speaking, you'll find that most of them were employed by companies who were willing to let them burn money to develop the library. Universities don't have that luxury anymore.


There actually are some full time developers for scikit-learn - and they are amazingly talented.

There are lots of difficulties in securing that funding though. Getting funding for permanent staff is incredibly tough - usually you have 3-year contracts for 'postdocs' that pay around 40% of what a similarly skilled software developer could earn. The only stable positions in academia are professorships - and even those are going away in favour of positions where people are expected to pay increasing portions of their salary from their grants.

How can you credibly offer someone 40-50k dollars a year, with no job security, tied to some grants that might or might not be renewed and expect them to turn down offers from companies looking for data scientists?


"How can you credibly offer someone 40-50k dollars a year, with no job security, tied to some grants that might or might not be renewed and expect them to turn down offers from companies looking for data scientists?"

The same way that restaurant owners ask their servers to work for sub-minimum wage and expect them to bring that up to minimum wage via "tips"(Law mandates they bump that up to minimum wage of 7+ if tips don't cover it, but that's separate). Not entirely the same, but it's an alternate form of payment for services rendered. If you ask me, I think it's perfectly reasonable as long as there are no laws meddling and causing an increase of such "odd" remuneration schemes.


I work in academia and I have job security and full benefits. I can't work on projects like this all the time though. I'm responsible for more generalized infrastructure like webservers, but when I have free time I'm welcome to work on stuff like this. I think more academic IT departments should leverage their talent like this.


>>'worth less' on your CV than a couple of Nature papers.

Because Nature is for biologists? A couple of Nature papers won't get you a job at Google.


Nature is also important in Physics and Astrophysics.


Is this where I point out that Google employees have publications in Nature?


yes, it would almost seem so, if you had presented sources to back the claim :)



we were talking in plural form


The other night I was searching for Python science books and stumbled across this one titled "Python for Biologists: A complete programming course for beginners"...the homepage for the book is here: http://pythonforbiologists.com/index.php/introduction-to-pyt...

Admittedly, it only has two reviews on Amazon, but they're both five stars, and they both seem to come from biologists who are apparently thrilled at being able to leverage code for their work...the funny thing is, the book itself is not "advanced" as far as what most professional programmers would consider "advanced"...the Beginners' book ends with "Files, programs, and user input" and the Advanced book ends with Comprehensions and Exceptions...

I think we as programmers vastly underestimate how useful even basic programming would be to virtually anyone today. I work at Stanford and it continually astounds me when I run into non-programmers who are otherwise doing data-intensive research, who fail to see how their incredibly repetitive task could be digitized and implemented as a for-loop. It's not that they are dumb, it's that they've never been exposed to the opportunity. And conversely, it's not because I'm smart, but I literally can't remember what it was like not to break things down into computable patterns. And I've been the better for it (especially because I'm generally able to recognize when things aren't easy patterns).

Some time ago, I believe it was Stephen Hawking who speculated that the realm of human knowledge was becoming so vast that genetic engineering of intelligence might be required to continue our progression...that may be so, but I wonder if we could achieve the same growth in capacity of intellect by teaching more computational thinking (and implementation), as we do with general literacy and math. As Garry Kasparov said, "We might not be able to change our hardware, but we can definitely upgrade our software."

http://www.nybooks.com/articles/archives/2010/feb/11/the-che...


> I think we as programmers vastly underestimate how useful even basic programming would be to virtually anyone today.

I used to work in second-level support. Out of a department of maybe 12, I think 2 of us knew how to program. I hate to even begin to describe some of the things I saw people trying to do by hand, and we were a software development company. I remember pleading with this one guy, "Please, before you ever try to do something like that again, come find me and ask if there is a way to automate the job."

There are so many people who would stand to benefit if only they had enough exposure to programming to tell when it would be crazy not to grab the guy next to them and ask whether a quick script might turn a 2-day job into 1 hour's worth of work -- coding, testing, debugging, and execution all included. Sadly, even people (nominally) in technology jobs are ignorant of this.


From my experience, I've come across many individuals in academia to whom I've tried to suggest particular approaches to their problem, only to be rejected because they are too proud or too afraid to learn something new. I'm not trying to be an ass when I do it, because that would undermine my attempt to share information, but something as simple as suggesting "for i in list:" instead of "for i in range(len(list)):" is so offensive to them since they didn't learn it first when they learned how to program (and god forbid they learn it from a second-year graduate student).
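
For what it's worth, the difference in a nutshell (the list here is just an example):

    measurements = [0.3, 1.7, 2.2]

    # index-based loop, a habit often carried over from C or Matlab
    for i in range(len(measurements)):
        print(measurements[i])

    # idiomatic Python: iterate over the items directly
    for value in measurements:
        print(value)

    # if the index really is needed, enumerate gives both
    for i, value in enumerate(measurements):
        print(i, value)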

Maybe biologists or people at Stanford (or biologists at Stanford) are less proud, but my experience here has made me stop trying to relate basic programming concepts to fellow academics.


I had the same experience in grad school, and I can tell you that giving people (good!) style advice when they don't ask for it is pretty pointless. It's kind of like giving grammar advice that wasn't asked for.

People need to be in the right frame of mind to learn new things. Otherwise, if they're getting their point across, just let them keep talking (or coding).


You're probably right about it seeming rude; comparing it to grammar advice is a good analogy -- I hadn't thought about it like that.


Programmers are expected to give and receive style, performance, and idiomatic advice in every code review. Is there some way this sort of peer code review could be integrated into the academic process?


Principal Investigators (i.e. professors who run labs) could establish a code review process for code that their group produces, just like any other manager can establish such a process.

Many PIs don't have the expertise to do that well, and many of them don't especially value style, performance, or idiomatic code. You have to remember that most academic code gets used by 1-5 people, and is run something like 1-50 times total. In those cases, it's actually kind of ok that that code doesn't end up being maintainable.

The important risk, of course, is erroneous results. But good researchers generally find many independent ways to check their results, so ideally, bugs that affect results should get caught in that process.

I personally love writing high quality code, and I found the academic science attitude about code to be frustrating. But there are structural reasons for these attitudes, and it's wrong to imagine that you can change them by arguing in terms of things that software engineers find valuable. You'd need to make your case in terms of things that science PIs find valuable, and in the process of trying to do that, you might actually discover that what you wanted to argue for isn't so critical after all.


> You'd need to make your case in terms of things that science PIs find valuable, and in the process of trying to do that, you might actually discover that what you wanted to argue for isn't so critical after all.

This is a great sentence - thanks for contributing.


As part of my PhD, I developed a Maple package for deriving FEM element matrices from first principles. My supervisor, despite years of work related to FEM and having done a lot of Maple programming, didn't understand a lot of my code, as I used some Maple techniques he had never seen. Even after co-authoring a conference paper on it, he still doesn't understand the details.

Sometimes, it can be the tools we use that end up forcing one into poor practices. Maple supports doing multiple substitutions if you pass it a list, but I found that if you had enough, it would just ignore part of the list. So, I had to code a workaround that looped through the substitutions one at a time.

My supervisor was very meticulous about verifying results. So, I verified my code against various known analytical solutions and showed good convergence.

So often I saw the same thing as you that the code is written for a single project and never run again (I've written code like that), so the attitude becomes why spend too much time on it?

Of course, Maple, Matlab, and Mathematica's interactive environments don't help matters as you're often working through the problem and then turning that session into a function or a script.


"I think we as programmers vastly understimate how useful even basic programming would be to virtually anyone today."

There is also the attitude that when a non-professional programmer does assemble a few lines of code, automates a particular task, thus saving countless hours and creating lots of value, the programmers go: "Pffft, that's not real programming, that's only scripting"

and the coworkers possibly go "Why the hell did you do that, that's not your job. Slacker!"


Usually because we have to fix / maintain the crap after it has evolved into a hideous mess.


The opportunities to do this are widespread. On the business side, I've worked with both non-technical and technical (but non-developer) people (in multiple orgs) manually doing things like comparing data in files, joining columnar data together in word processors, and copy and pasting numbers into a spreadsheet to do calculations - and spending hours of their day on it. After getting sick of it, they would approach me and ask if there was any way I could think of to help them speed the process up. The answer was often extremely simple, and wound up completely removing them from the process except for seeing the output. They were absolutely thrilled, since they were able to focus on their actual jobs instead of messing around with a thousand line spreadsheets. Extremely gratifying to be able to do that.

Even though so much has been automated (by dint of simply using systems that have useful 'inherent' automation) over the past 15 years, there's still so much fruit out there to be picked like this. It's also interesting to note how once a non-developer needs to color outside the lines of a system they are working with when dealing with data processing, they may go right to the manual process since they don't have a way to extend a system themselves. This presents opportunities to help even as our systems get more sophisticated (or just more complicated. Perhaps especially as they get more complicated, since the path of least resistance may be "forget this incredibly convoluted junk! I'll just do it by hand!", which can be a completely rational course of action for them.)


>I think we as programmers vastly underestimate how useful even basic programming would be to virtually anyone today.

>It's not that they are dumb, it's that they've never been exposed to the opportunity. And conversely, it's not because I'm smart, but I literally can't remember what it was like not to break things down into computable pattern

I think that this dynamic can make for a great opportunity for those who really enjoy the research side of things, can write software, and are looking for alternatives outside of pursuing degrees, especially those who are entrepreneurial.

>…but I wonder if we could achieve the same growth in capacity of intellect by teaching more computational thinking (and implementation)

My experience working with neuro postdocs is in line with noobermin's and toufka's comments. I'm not bummed out by it, but it's kind of like seeing that there is untapped value there to be exploited by me and many others who see the opportunity being ignored by others.


This is true, but it's probably important to point out how unique the situation is for biologists.

We, quite suddenly, have the ability to generate large data sets to address very particular problems. Since each experiment is different, this process cannot be automated, but the basic workflow never changes that much. All we need is to ask some basic questions about the data; the more complicated stuff is generally not very interesting, or even harmful, as we're ultimately dealing with very blunt tools. We want to know about simple correlations, feed things into statistics libraries, and visualize the data with a few well-known tools.

The result is that anyone who takes the time to learn a little programming knows most of what they need to know. You learn a few good libraries and you're done. Bonus points if you write a script to automate your workflow a little.

I'm not saying this is uncommon, but it seems especially lucrative for biologists currently. Even if you're just putting libraries together like Lego pieces, you're getting a lot done. You don't need to proceed into programmatic thinking, though the reward is certainly there.
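
A loose sketch of that kind of workflow, assuming the usual pandas/matplotlib stack (the file and column names are invented for the example):

    import pandas as pd
    import matplotlib.pyplot as plt

    # hypothetical table: one row per sample, numeric measurement columns
    df = pd.read_csv("expression_measurements.csv")

    # simple pairwise correlations between a few columns of interest
    print(df[["gene_a", "gene_b", "gene_c"]].corr())

    # hand a pair of columns to a standard plotting tool
    df.plot.scatter(x="gene_a", y="gene_b")
    plt.savefig("gene_a_vs_gene_b.png")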


I have noticed a general upward trend in interest in scientific programming for a few months now, and the community (most specifically Hacker News) has driven my interest in that area as well. The idea of functional programming and thinking in mathematically sound ways really appeals to me, but my lack of a math and comp sci background is holding me back from going full speed at learning and getting better at it.

I feel many of us are lost swimming in a sea of opinions and juggling frameworks du jour, development methods, and business strategies, and it keeps us from focusing on improving our skills in areas that matter. This frustrates me and I've been looking for ways to get out of it. There is also this fear of another bubble mixed with trying to keep up with the trends and hipness of the industry, to remain gainfully employed.

I realize I am sort of just reiterating the author's point, so I guess what I'm saying is I agree.


HN tends to give this perception of rapid change because you're always hearing about hot new tech and how great it is from the various tech evangelists. It's a side effect of their faith/vision in their own tech that you feel like you're missing the train.

However, I've been following HN for 7.5+ years now, and change is hardly as fast as you think. There really is nothing new under the sun, and you see the same core principles (and human needs) being expressed as the underlying technology changes. The core principles of functional programming have been around since the 30s as lambda calculus, and since the 50s with their initial expression as LISP. It's better to learn FP to add a new way of thinking about problems to your toolbox, rather than treating it as a panacea of programming.

If you're feeling lost in the sea, define your lighthouse. HN will most certainly have you paralyzed by the many waves of every new choice if you haven't defined a clear vision for how you want to harness your finite energy.


"I feel many of us are lost swimming in a sea of opinions and juggling frameworks du jour, development methods, and business strategies, that it keeps us from focusing on improving our skills in areas that matter. "

I sympathize with this a lot. It's extremely hard to prioritize learning new things and improving existing skills over just trying to plow through all the things on your plate. However, I keep trying to remind myself that the best way to get more stuff accomplished is to keep "sharpening the saw"[1]. Making time now for improving your skills will pay dividends in the future.

[1] http://c2.com/cgi/wiki?SharpenTheSaw


Great perspective on the future; we need many more discussions like this. I love the main message of the post, i.e. "we need to grow up". I have the impression that while we are all optimistic that the future is owned by software developers, we don't realise that it isn't for all of us. There will certainly be more segmentation in our profession and there will be great demand for high-end developers. This requires a lot of learning, and I personally feel it's a tough challenge.

The post also made me realise how much we still think in terms of disciplines. E.g. we think a developer should learn more mathematics. If we were thinking in terms of problem solving, or "modelling reality" (at least in part with software), we couldn't separate these so easily. E.g. if you are writing software for vehicle condition monitoring you use a combination of engineering, physics, mathematics, and computer science - the less you try to - or need to - separate them, the better you do.

I can't quite put it simply, but in my mind I can see the future "developer", who got a BSc in Physics, went on to work as a software developer for a couple of years and then continued to learn maths, physics, and biochemistry every day, working on various projects where she could use all of these. She is neither a physicist, nor a software developer, nor a mathematician.


> you use a combination of engineering, physics, mathematics, computer science - the less you try to - or need to - separate them the better you do.

I disagree; I think mentally separating math, physics, and computer science is a good thing. A good scientific programmer should understand the science on its own, and then figure out how to model/approximate the science to do something useful on a computer. For instance, if you want to implement Reed-Solomon codes efficiently, you'll realize that understanding the algebra required to understand the code is more or less an orthogonal skill to designing efficient encoders and decoders.

As a personal anecdote, I had much better luck learning about waves and quantum mechanics after I knew about differential equations, orthogonal decompositions, and a fair amount of linear algebra than I did before I knew this math, even though the physics classes included all of the necessary math. I attribute this better understanding to having a cleaner mental map, because I knew which statements were true because of some sort of physical fact, and which statements were just mathematical results.

In line with the generally smiled-upon principle of decomposing problems, I think it's particularly critical that the scientific programmer can decompose these interdisciplinary problems. A scientific programmer should understand the science (at least mostly), understand the hardware, and figure out how to efficiently and cleanly map the scientific problem to something that can be solved by a program.


I started on the science/engineering side at BHRA (on campus at CIT); the only problem with technical/scientific vs commercial is that the pay is so poor.


And academically the prestige is poor. One is not granted 'research time' to develop software, but to 'get things done' (see u/Danso's comment). As such there's no one to take the first step in actually making software that would benefit anyone. And in the generous circumstance that one can be allotted time to write the software, the result is a pat on the head - 'good job' - for reducing everyone's workflow from weeks to minutes. Sometimes you get an acknowledgement. And no one will ever support/read your software when you leave - it will be used ritualistically until the lab's last computer's OS no longer supports it.

On the topic I see two other significant problems:

1) In basic research there is often a need for 'Every Option' style software - you're doing something that's never been done before and you need to be able to tweak it exactly how you need (but also be able to 'just hit run' for a first pass when coming from your native field). And those types of software are inherently a mess to design and build (ie. photoshop, CAD, 3D, programming languages).

2) Some of this software can only be written by those who directly do the research - or someone who very closely collaborates with them. Scientific software contains scientific assumptions that are very hard to evaluate if you're not part of the field. Deciding to go right-way-round, or rounding up, or leaving off the last element in an array, or any other of those programming tricks can really mess up scientific work. Or conversely, using the entire array, using a non-weighted 'avg', or treating the red channel mathematically the same as the blue channel is a very different way of designing software than in other industries - and it is not common and rarely given much thought.


There's a modest living to be made at the interface between "scientific programming" and "commercial implementation of programs scientists write". This is an under-appreciated niche because it takes a lot of work to get into it: you need to be a good developer and a good scientist with diverse experience. My way in was through experimental and computational physics, but there are certainly other avenues these days.

Modern statistics is the biggest piece of the picture that every interesting area has in common. If you're interested in scientific programming you need to understand Bayes as well as algorithms etc. I have friends in psychology, biology, etc and we can communicate surprisingly well because we all speak the same statistical language.

But more importantly you need to understand how scientists think. They are amazingly hard to pin down to the kind of specs developers need.

For example, a guy on my team once said after talking to one of the scientists we were working with for a couple of days, "I now have a much better grasp of the problem, but I still don't know what the default value of this parameter should be." I spent ten minutes talking to the scientist and came back and told the developer "5", because I could tell from the way the scientist was talking that he had no clue if the number should be 3 or 10, but seemed to be favouring the lower values. I didn't need to understand the problem domain in detail to make that judgement, but to have a reasonable grasp of the psychology of working scientists. So far as I know, there's no way to get that without working as a scientist yourself.


True, I was talking more about doing tech/scientific programming as a discipline - I started as an associate professional, what's sometimes called a professional apprentice, though at the time if you had called me or my peers apprentices we would have told you where to go in no uncertain terms.

I recall I was considered a bit flash because I used Mixed Case in my Hollerith statements for input prompts.


Yes. And this trend is reversing. Have a look here: http://www.itjobswatch.co.uk/jobs/uk/machine%20learning.do

Salaries have gone up by 16%


I know I am tempted to go back to the more technical side - though when a recruiter asked about working on the MET police's Registry I wasn't that keen :-)


The page you linked to only says a 9% increase? 55k -> 60k


Shameless plug: In January I'll be focusing on a project which deals specifically with scientific software: http://sciencetoolbox.org/ - the current version is a product of a hackathon, but this month I will be improving it and adding functionality which brings the scientific software developer and her efforts into focus. Scientific software is gaining importance, but the recognition its developers get is trailing behind - I want to raise the level of associated recognition/prestige (among other related things). Some other projects rely on data collected here, e.g. a recommendation engine for said software that enhances GitHub: http://juretriglav.si/discovery-of-scientific-software/

Shameless plug continues: if you'd like to keep track of what I'm doing I suggest you either follow the project on GitHub (https://github.com/ScienceToolbox/sciencetoolbox) or Twitter (https://twitter.com/sciencetoolbox).


Units. I'd love so much for a standard file format/interpreter/concept that contained SI units as a requirement for most datatypes. Much to be learned from my TI-89.


I loved the unit conversions on my '89. You might enjoy checking out the units command line tool, or the frink programming language.

http://linux.die.net/man/1/units

http://futureboy.us/frinkdocs/


I don't know what you mean by "as a requirement for most datatypes". I work with molecules, where distances are typically measured in Angstroms and masses in amu. I don't want to have factors of 1E−10 and 1.660538922E−27 hanging around my code.


I assume that he means that the type system knows that these are in useful units, and tracks them, so you can have it check that your calculation actually results in a value of the unit you expected.
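
A toy sketch of that idea in Python (not any particular library's API): track base-unit exponents alongside the number and refuse to combine incompatible quantities.

    class Quantity:
        """A number tagged with exponents of base units, e.g. {'m': 1, 's': -2}."""
        def __init__(self, value, units):
            self.value = value
            self.units = {u: p for u, p in units.items() if p}

        def __add__(self, other):
            if self.units != other.units:
                raise TypeError("incompatible units: %s vs %s" % (self.units, other.units))
            return Quantity(self.value + other.value, self.units)

        def __mul__(self, other):
            units = dict(self.units)
            for u, p in other.units.items():
                units[u] = units.get(u, 0) + p
            return Quantity(self.value * other.value, units)

        def __repr__(self):
            return "Quantity(%r, %r)" % (self.value, self.units)

    distance = Quantity(3.0, {"m": 1})
    duration = Quantity(2.0, {"s": 1})

    print(distance + Quantity(1.5, {"m": 1}))   # Quantity(4.5, {'m': 1})
    print(distance * duration)                  # Quantity(6.0, {'m': 1, 's': 1})
    # distance + duration                       # raises TypeError: incompatible units

Real unit libraries do this with far more care, plus scale conversions between units of the same dimension.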



Yeah. There's really no reason why our hot new scientific computing languages and libraries should all be lacking capabilities that graphing calculators had in the 1980s when they only had 2k of RAM. HP demonstrated that it doesn't even require a CAS to be extremely useful.


I know what unit libraries are, and why they can be useful. There are several units libraries for Python, the Boost C++ library includes one, etc.

I don't know why they should be a 'requirement for most datatypes'. Could someone please explain the requirement part?

As a further clarification, why should people in scientific computing, in fields which use non-SI units like eV, amu, Angstrom, barn, light year, and megaparsec, use a programming language which requires SI units? Quasar 3C 273 is 749 megaparsecs from us, or 2.31E25 meters away. I don't see why SI should be preferred.


Any programming language that makes units a first class part of its type system would allow for defining custom units just as easily as defining other data types. Nobody's saying that SI has to be the only units expressible, just that it has to be the foundation of the unit system. Likewise, unitless numerical quantities will necessarily still be expressible, but using those types for variables that represent a value with a unit should be considered extremely poor practice, just like when communicating those values on paper.

There's really not a good reason to argue against using only SI units for data interchange formats. It's trivial to map to the preferred units on import or display, if and only if you know what units the input data is in. I've dealt with too many bugs where interacting programs have differing assumptions about meters, centimeters, and millimeters to believe that the flexibility of storing different units on disk is ever worth the trouble.


I use Python. If I use one of the third-party packages for units then I can do what you say I can, but at extreme cost. Every single operation checks for unit conversion, on the off-chance that the values aren't compatible. The system, in trying to be nice to me, ends up making things invisibly slow.

(In practice, the performance code runs in C, so the Python/C boundary would have to negotiate the array types for full unit safety.)

In my work the base unit of length is angstroms. I've used nanometers a few times, and never used any other length unit, though I know that GROMAC's xtc format uses picometers. Saying something has a volume of 600 cubic angstroms is much more useful than 6E-28 cubic meters. While I can appreciate that other fields closer to human scale use may like to standardize through SI, I don't want your preferences enforced on my field. All I see is the chance to make things worse, and slower, and don't see any advantages.

One of my data formats has coordinates in angstroms, like "8.420 50.899 85.486". How would you suggest that I write that in an exchange format? As "8.420E-10 50.899E-10 85.486E-10"? (Or the last two normalized to E-11.) At the very least that's a lot of data for very little gain. It gets worse for trajectories, which might save 1 million time steps x 10,000 atoms/time step x 3 coordinates/atom = 3 billion coordinates to an exchange file. I see no advantage to doing that in SI units.

In practice those distance coordinates will likely be internally represented in angstroms. Consider that the Lennard-Jones potential is sometimes written as A/r^12 - B/r^6, with expected values of r around 1E-10 m. The denominator of the first term will go to 1E-120 in intermediate form, which is not representable in a 32-bit float. While not relevant for Python, which uses 64-bit floats, some molecular dynamics programs will use 32-bit floats (e.g. for older GPU machines, or to save space).
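
A quick numpy check of that representability point (the numbers are purely illustrative):

    import numpy as np

    r_m = 1.0e-10     # a typical interatomic distance, in metres
    r_ang = 1.0       # the same distance, in angstroms

    print(r_m ** 12)                  # 1e-120: still fine as a 64-bit float
    print(np.float32(r_m ** 12))      # 0.0: underflows a 32-bit float
    print(np.float32(r_ang ** 12))    # 1.0: no problem when working in angstroms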

My other example was the atomic mass unit, another non-SI unit. I have only used amu (for chemistry) or dalton (for biology) in my work, not kilograms. It seems pointless to require that I store the mass of a carbon as 1.9926467051999998e-26 kg instead of 12 amu.

I therefore disagree, and believe there are good reasons to argue against SI units for some data interchange formats. I agree that I want to store a single unit of each kind on disk, only those units are the non-SI angstrom and amu, and not the tremendously huge meter or kg.


The idea that you can export a CSV file or Excel worksheet full of unitless numbers is scientifically whacky. In the same way most would scoff at a graph without axis labels, why don't we also scoff at a data file without units?


What's whacky about implicit units for a given format?

I use data files without explicit units all the time. A format specification, which might be explicit or implicit, might say that a given file is stored in CSV format where the first column contains a molecule representation (in my case as a SMILES string), followed by an identifier, followed by the molecular weight (in amu, which is the only reasonable unit), followed by surface area (in Å², which is always the case), followed by volume (in Å³, again, always the case). You'll note that none of these are SI units.

I have another file containing a molecular structure in SDF format, which starts:

   16125001
     -OEChem-04231101242D
   
    44 45  0     1  0  0  0  0  0999 V2000
       4.5411    4.0194    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
       3.6750    2.5194    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
Line 4, the 44 is "number of atoms", the 45 is "number of bonds", and the (4.5411, 4.0194, 0.0000) is a coordinate in angstroms. I see a distinct lack of units in the data file.

I have another file, in PDB format, containing lines like:

    ATOM     34  N   GLY 1   6       8.420  50.899  85.486  0.50 51.30

The (8.420, 50.899, 85.486) is a coordinate in Å. The other numbers are identifiers of one sort or other, or the unitless occupancy and B-factor.

Are you really going to scoff at my entire field, for working with data files without explicit units since the late 1960s?

When working specifically with Excel files, an organization tends to stabilize on what certain column titles mean, so a title of "MW" means "molecular weight in AMU", etc. This is more complicated when applied to values which are parameter dependent (charge at pH 7.5 vs. 6.0), or depend on specific models and/or software version. You'll notice that pH is a unitless number.

Units are only a subset of the ontology usually omitted from a given file format. Others include unitless terms like B-factor, pH, "number of atoms", prediction model, and implementation version. This ontology is often instead made explicit in external format documentation or through shared knowledge of the users of that data file.

What advantage is there for me or my field in having 'SI units as a requirement for most datatypes', especially since we often deal with non-SI units like Å and amu? I don't see anything except more confusion, more chances for error, and the performance overhead of going from/to SI units instead of staying in the domain-specific preferred system.

The people who I've seen try to use a Semantic Web/Linked Data approach end up bogged down in verbose and slow to parse data formats that make it hard to do real work, because the software has to be wary that the input one moment might be in Å, the next in nm, the third in m, and the fourth in yoctoparsecs.


The PDB format is an agreed-upon format that most fields don't have the luxury of using. And its implicit units keep it from being significantly more useful - outside of its own particular field. But you are correct, it has units - and that is a HUGE advantage over nearly any other scientific data format.

If the PDB format really had explicit units you could start to use it in other fields, easily - without knowing anything about the format itself. But again, PDB is an example of a well-codified format.

It'd be great if every figure/table you saw in a paper had an associated <.xsciformat> which was united (interesting that I meant unit-ed, but that's exactly what it would do - it would unite). That way you could download files from a gel-shift assay and directly and computationally compare the data with the diffusion data from a microscopy assay, and utilize pHs estimated from PDB files, or any other such really interesting co-interactions with the raw data itself. Right now this kind of co-linking of data across disparate fields is impossible. And I think much of it could be clarified if the user couldn't print out a graph/dataset that didn't have units - implicit or otherwise.


You asked "why too don't we scoff at a data file without units". You just answered your own question: because it's an "agreed-upon format."

All of my response was to point out that an Excel spreadsheet, CSV file, etc. can equally be considered an "agreed-upon format" by those who use it, so don't need explicit units.

My original question was a simple one. Why should units be a requirement for most datatypes?

I know all of the reasons for why it's useful. I don't understand why it should be a requirement.

The PDB format is not an easy format to understand. Unit conversion is one of the least of the problems in using it outside of its field. Determining bond assignments is much harder, and bond type assignment harder still. In fact, I have a hard time figuring out an example where an explicit "this is in unit X" would make things appreciably easier, as compared to near useless data taking up space.

Could you give an example of how someone could start to use it in another field, easily, where they couldn't now? I can only see it occurring by completely replacing the format, since adding an "A" after each coordinate, or a comment at the top that the coordinates are in angstroms, can't be what you mean. (Nor would including the PDB spec as a comment in each record be what you mean either - though it would be self-documenting!)

For that matter, the X-ray resolution field in a PDB record contains significant digits, so "2.0Å" and "2.00Å" mean different things. The ontology of units is not easy. 2.00Å is 2.00E-10m, not 2E-10m.

In any case, I deal with a lot of unitless numbers as well: pH, molarity, number of atoms and bonds, number of rotatable bonds, ratio between elongation and fixed elongation, etc. The ontology of values is not easy, and a required SI-unit system looks much more like it would get in the way than be useful.


I agree, but in some fields this is already done. For example, this is exactly why geoscientists have standardized on the fully self-described netCDF file format. With netCDF, you can specify units, axis labels and other metadata very straightforwardly.
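
A minimal sketch with the netCDF4 Python bindings (the file name, variable, and values here are made up):

    from netCDF4 import Dataset
    import numpy as np

    ds = Dataset("sea_surface_temp.nc", "w")
    ds.createDimension("time", None)        # unlimited time dimension

    sst = ds.createVariable("sst", "f4", ("time",))
    sst.units = "K"                         # the units travel with the data
    sst.long_name = "sea surface temperature"
    sst[:] = np.array([271.3, 272.1, 273.8], dtype="f4")

    ds.close()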


How about sorting by stars?

And how do I add things? I am trying, but without success.


Like I said, the current minimal product is hacked together from various pieces (scrapers, graph database generators, API wrappers, etc.) and is far from what it will be in a few weeks. Things like adding were changed from manual to automatic (based on citations), and some of the interface remained the same. I suggest you check back with http://sciencetoolbox.org at the end of January.

Feature list:
- Full-text search through the software (also by metadata, tags, citations)
- More extensive/up-to-date indexing of citations
- User profiles (for owners/developers and users)
- Citation notifications
- Usage visualizations (graph network of cited software, percent citations in a given category and much more)

Should be fun :)


Good luck with this project! As an (ex?)scientist, the problem I remember is finding out whether there is a mature-enough library that is still in active development/support. (Usually there are many, but only a few are actually useful.)

A few other ways to go are to scan arXiv for citations of software, https://pypi.python.org/pypi for scientific packages, etc.


Computational and biological sciences will likely find a financial equivalent to commercial software applications at the intersection of epigenetics and pharmaceuticals in the next few decades.

When scientists begin to discover feasible methods to cure or manage previously incurable diseases (a more recent example of this has been attempts to cure Cystic Fibrosis), or more specifically to reverse some of the diseases that our older baby boomer populations are suffering from via epigenetic methods, you can bet your bottom dollar that there will be a huge influx of capital in the sector and a subsequent increase in demand for computational biologists.

Of course we could end up with a sort of quasi-understanding parallel to that of quantum mechanics and land in an epigenetic limbo, but the general feeling is one of high hopes.


Pair programming keeps it human, and transfers knowledge very well.

TDD is about reproducibility of results, which is very in line with the scientific method. Benchmark tests will show you when your solutions are getting out of hand on performance.

The sunk cost fallacy is a big problem. Moving to a new platform like HTML5/iOS/Android gives a short reprieve, but soon those proprietary code bases will age.

The other big problem is that usually a smaller portion of jobs goes towards management in flatter organizations. Managers want lots of layers for job security.

Erik Meijer is right that small teams which are given narrow mission objectives instead of detailed requirements, and which measure their problem domain instead of guessing, will be effective.

I'm curious if a Fat-Tree model of management will take hold, http://en.wikipedia.org/wiki/Fat_tree You get a flatness that improves communication latency, lots of bandwidth, and managers are happy because there are a lot of jobs at the top.


I think many in the comments are misunderstanding what the author meant to say in his post. He is not talking about working in academia or making scientific software. He talks about improving one's skills in basic science and in such fields of computer science as A.I., which have historically been entrenched more in academia than in industry.


The author of the post started by getting into programming/CS by wanting to earn more/better money, by picking up an Access book and working it out from there, and now that he's more established, he looks down at the young hungry people who picked up an Access book in the hopes of more/better money, running through all the tropes that people who got where they are through knowledge will use.

Yes, it's useful to understand things better, and to know math. And as always since the first CS degree started, CS people gripe that people should Know More Math. Sure it helps. Other, less prestigious, things also help but you don't hear people griping about it. TDD allows the idiots in. Yes, that's effectively why you want TDD, you want to get more mileage and solving more complex or more bug-sensitive problems using the same people. Building software is not about being smart (although that helps on occasions), it's about getting stuff done.

Yes, machine learning and AI are the new kids on the block, and like Web programming, they will see a bloom of increased customer demand, and like Web programming, we'll get a progression from bespoke boutique software to frameworks that make people's lives easier to frameworks that allow any person with the intelligence of a pet rock to do simple stuff productively. Why is that? Because building frameworks is the only way that the smartest people can earn money faster than programming the (N+1)th variation on that theme everyone follows -- frameworks are what make people more productive, or allow you to use a workforce that's more accessible.

As a Wizard With a Pointy Head (aka academic), I'd say that the need for Wizards With Pointy Heads in production work is often overestimated and/or idealized. There is a large number of PhD graduates, and the market happily gobbles them up (indeed, realizing that you can hire PhDs and have them do productive work is one of the things that made Google successful as a company back in the early 2000s).


I like the spirit of the article, but in my view, much of the author's writing falls prey to "either-or" thinking. See https://en.wikipedia.org/wiki/False_dilemma.

> We can argue about how to version APIs and how a service is such RESTful and such not RESTful. We can mull over pettiest of things such as semicolon or the gender of a pronoun and let insanely clever people leave our community. We can exchange the worst of words over "females in the industry" while we more or less are saying the same thing, Too much drama.

> But soon this will be no good. Not good enough. We got to grow up and go back to school, relearn all about Maths, statistics, and generally scientific reasoning.

The programming community is large and diverse. We can do all of these things, including the rigorous (what the author seems to call "scientific" [1]) ones and the UI/marketing ones too. I think this diversity is a strength, as long as managers and entrepreneurs find and retain the skill-sets necessary for their domains.

[1] I prefer to use "science" to mean falsifiability, preferably with strong experimental designs. Much of what the author talks about is mathematical rigor (from computer science), which is also important, but not experimental science. See also: https://en.wikipedia.org/wiki/Philosophy_of_science#Defining...


Mostly agree with the article. Being myself fascinated with machine learning and in the process of refreshing mathematical knowledge I haven't used since university (too many years ago) in order to dive deeper into it, I can definitely relate.

However, I think the main point is not that software developers should all hone their academic math skills (that would probably be pointless for many if not most software developers), but rather that it would be best if software developers would strive to follow the scientific mindset when developing software - in my experience, Occam's razor is just as important in software development (design, architecture, algorithms, testing, you name it) as it is in physics, chemistry, or other sciences, and it is this aspect (which I feel is the most basic and most important) that sometimes gets lost in the noise of software development trends and fashions.


The problem with scientific software is that the market is so small.

It is far more profitable to just write mainstream software.


As someone who makes a living writing scientific software, it is not that the market is small, but that scientists find it very hard to pay for software. They are happy to spend hundreds of thousands of dollars on some piece of hardware, but in most scientists' minds software should be free.


So install your software on some hardware and sell them that ;-)


This is exactly what I do :)


It's more profitable to do scientific research slowly and tediously than it is to automate it?


Disclaimer: I'm in the scientific equipment and software business. There are some issues to overcome. These may just be excuses from people who don't want to change their game, but nonetheless:

1. Sometimes, wages and equipment / software come from different pots of money. No matter how much sense it makes to replace labor with automation in the grand scheme of things, if you can't move money from one pot to the other, then you're stuck with the status quo.

2. I think that people sometimes underestimate the degree of customization and effort required to automate a specific process. Or sometimes overestimate, as in, "our process is so special that no commercial tools will work for it."

3. If anybody is going to work on automation, it will be the students themselves, as it's a way to learn a valuable skill. They may be doing it with minimal fanfare, and there is a strong movement towards open source tools. Students realize that the generous budgets and site licenses will vanish when they leave the academe, and are interested in preparing themselves for freelance work, startups, etc. This may also favor general purpose tools, rather than those designed specifically for science.

My anecdote, from 25+ years ago: I taught myself electronics and programming by automating my student projects, culminating in my thesis experiment.


I really think we need more required (good) computing courses in both math and science curricula.


I was really lucky that I took to programming pretty easily while still in high school (graduated '82). Likewise with math. As a result, I was able to integrate computing into my work with minimal guidance.

But it's my view that the teaching of math and science should involve computation, starting as early as possible. It still amazes me that a kid can go through high school without learning about something that has had so much impact on our society. In my utopian world, there would be at least one question in each physics homework assignment with the instructions: Solve this with computation. And it wouldn't be a big huge deal.


The people capable of the automation have financial incentives to move to industries with broader appeal. There are a lot of people doing great scientific computing work in academia (e.g. Sean Eddy, Titus Brown), but they are notable exceptions.


This of course is stereotyping, but for me it feels like there is truth to it:

It has more prestige to have a bunch of PhDs doing busywork in the lab than to pay for software which could replace them. Expensive equipment at least looks cool and impressive if the bosses go round with guests and want to show off where the money went; software can at the very best make pretty pictures on a screen somewhere.

Academic research is not necessarily oriented on profit on that level: It cares about getting grant money, but not always about using that money as efficiently as possible.


software can at the very best make pretty pictures on a screen somewhere.

The solution we have for that in the commercial software world is really big screens.


And report graphs. Lots of report graphs. Oh, look, a real-time dashboard. Do the metrics it's tracking mean anything to anybody reading it? Who cares?

This seems to be one of the main ways salesmen and execs communicate over enterprise software deals.


Amusing aside: Whenever I see anybody mention a "dashboard" in business, I'm reminded of the kiddie toy for long car trips, that has a brightly colored steering wheel, several interesting levers and knobs, a mirror, etc. The idea is that the tykes can pretend that they're driving.


Exactly this. Unless you find a smart way of making money out of your scientific software.


A smaller market means less competition so potentially higher prices and profit.


Not always. There has been commercial molecular modeling code since the '70s and '80s, but the number of players has noticeably shrunk over the years. The whole market might be $50-100M/year, and as pg said, there's no exit strategy for something that small. It's hindered by a perceived lack of impact and various scientific problems yet to be solved. There's also a disconnect between the proprietary data held by customers and the commercial and academic developers and researchers.

And bioinformatics is (or at least was) different in that there were good, open-source alternatives which made it financially impossible to sustain a commercial effort. There are players, but I don't think they've done better than molecular modeling. It's hard to compete with free and good.

The business needs some real scientific breakthroughs to jump forward. Currently, everything can be coded well but the tools are too blunt.


I'm having trouble understanding the definitions of these roles. I see the chart, but the terms are all vague to me. What does a data scientist do that a mathematician or scientist doesn't do, and what does a scientific programmer do that a data scientist doesn't do?

My impression was that "data scientist" was a colloquialism for "statistician that knows how to program." Is a scientific programmer just a programmer that knows some statistics? Why is the direction important? The author says he/she feels that a programmer that knows statistics can make "more robust software" than the other way around, but what exactly does that mean? Do they mean "doesn't crash as much", or do they mean "gives the right answer more often?"


Basically all of my Master's and PhD work involved scientific programming, but it was definitely not data scientist work. You're making a big assumption that science is statistics. There's scientific simulations (and deriving the models) and scientific visualization as well. Those are only two out of many possibilities.


The funny thing is: statisticians do know how to program, they just do it in R instead of some hipster language.


Data scientist - Here is a metric ton of data, find something useful from it

Scientific Programmer - Here is a set of physical laws and differential equations which govern this chemical reaction, write a simulation for it (see the short sketch after this list)

Mathematician - This looks like a fun theorem to prove

Scientist - This looks like a fun hypothesis to test
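
For the "scientific programmer" line, here's a minimal sketch of what that looks like in practice: a first-order reaction A -> B with rate law d[A]/dt = -k[A]. The rate constant and initial concentration are made-up illustration values, not anything from the article:

    # Minimal sketch: simulate a first-order reaction A -> B, d[A]/dt = -k[A].
    # k and the initial concentration are made-up illustration values.
    import numpy as np
    from scipy.integrate import odeint

    def reaction(a, t, k):
        return -k * a                     # d[A]/dt = -k[A]

    k = 0.5                               # hypothetical rate constant (1/s)
    t = np.linspace(0.0, 10.0, 101)       # time grid (s)
    a0 = 1.0                              # initial concentration of A (mol/L)

    a = odeint(reaction, a0, t, args=(k,)).ravel()

    # Sanity check against the analytic solution [A](t) = [A]0 * exp(-k*t)
    print("max abs error vs. analytic:", np.max(np.abs(a - a0 * np.exp(-k * t))))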


Robust software generally refers to it not crashing. "Correct software" would be how I described software that "gives the right answer more often".


For one thing, statistics doesn't really encompass AI, computer vision, etc.


Sorry but this is wrong. Statistics and probability theory underpin most of AI and machine vision.


One thing can underpin another without encompassing it.

Mathematics doesn't encompass physics and finance.


Here's my attempt at a TL;DR:

* The '90s "Access in 24 hours" programmer has been replaced by the latest anecdote-based technique/toolset preacher; e.g., TDD.

* Because deep learning is better than humans at finding useful patterns in data (whether concerning biochemistry or web site interaction), it is the best technique.

* Aesthetic (e.g., language) and social justice (e.g., feminism) issues distract from utilitarian effectiveness.

* Utility is only furthered by math and science (where for "science" read "patterns inferred from data"), and we should aspire to be "scientific programmers" who apply only math and science.


My biases may be guessed at in my summarization, but let me make them more explicit.

I think it odd that some of the techniques he rails against are inspired by mathematics: TDD tries to preserve invariants; RESTful design tries to impose the invariance of idempotency. If the question is whether those techniques make those who use computers more productive, then a scientific answer would involve a stunningly expensive human subjects experiment involving large numbers of people and complex problems. The likely result would be this: https://xkcd.com/1445/
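
To be concrete about "TDD tries to preserve invariants", here's a tiny sketch in plain Python; the dedupe function and the invariants it checks are hypothetical examples of mine, not anything from the article:

    # Sketch of a test that pins down invariants rather than single examples.
    # The function under test (dedupe) and its invariants are hypothetical.
    import random
    import unittest

    def dedupe(items):
        """Remove duplicates while preserving first-seen order."""
        seen = set()
        return [x for x in items if not (x in seen or seen.add(x))]

    class DedupeInvariants(unittest.TestCase):
        def test_invariants_hold_on_random_inputs(self):
            rng = random.Random(42)  # fixed seed, so the test is reproducible
            for _ in range(100):
                xs = [rng.randint(0, 20) for _ in range(rng.randint(0, 50))]
                ys = dedupe(xs)
                self.assertEqual(len(ys), len(set(ys)))   # no duplicates remain
                self.assertEqual(set(ys), set(xs))        # no elements lost
                self.assertEqual(ys, dedupe(ys))          # idempotent, REST-style

    if __name__ == "__main__":
        unittest.main()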

I'm a sysadmin doing my best to automate (e.g., puppet); I rarely have the luxury of collecting data sufficient to the immediate problem, so I rely on math and (unreliable) heuristics. I write perl/shell/puppet/ruby/anything in small fizz-buzz complexity chunks; an "artisan" if you will. I support CAE environments with low-latency and poor parallelism opportunities, and until that changes (e.g., becomes cloud compatible) I don't see my tactics changing significantly.


Your summary is indeed very biased. Maybe someone could use the first bullet point. Better to look at the last paragraph:

We got to grow up and go back to school, relearn all about Maths, statistics, and generally scientific reasoning. We need to man up and re-learn that being a good coder has nothing to do with the number of stickers you have at the back of your Mac. It is all scientific - we come from a long line of scientists, we have got to live up to our heritage.


A post about data science and scientific programming featuring a set of graphs with no y-axis scale and labels. At my gaff this kind of presentation of data leads to "scrap the whole analysis and start again".


Dude, have you ever used Google Trends? It's a cut-and-paste from the source. The y-axis is popularity, as the title says; it's so obvious that even Google has omitted it.


should have labelled it "Magic Google Unitless Dimension"


You can put all of the terms on the same graph; that would allow people to compare these on a like for like basis as opposed to "look at the shapes, here is an argument".


Deeplearning4j and ND4J contributor here: We've created a distributed framework for scientific computing on the JVM, ND4J, which is the linalg library behind Deeplearning4j, which includes ConvNets and other deep-learning algorithms. Contrary to the author, we believe Scala is the future of scientific computing. While Python is a wonderful language, the optimization and maintenance of scalable programs often happens in C++, which not a lot of people enjoy.


Thanks guys for your hard work.


A look at the trends for "R programming", compared to "Python programming", is quite interesting too: http://www.google.com/trends/explore#q=R%20programming%2C%20...

(Their curves are more or less parallel since 2011)


I believe the author is right - This is the reason why I'm spending my final ECTS points on statistics and machine learning.


What if we define intelligence not from an anthropomorphic view, but from a systemic view, in which all systems have intelligence?

What is "Artificial Intelligence"? The opposite of "Natural Intelligence"?


Saying that all systems have intelligence == anthropomorphizing all systems IF AND ONLY IF you consider intelligence to be a uniquely human trait.

A more useful distinction for the realm of intelligence could be 'designed intelligence vs. grown/evolved intelligence' instead of 'artificial intelligence vs. natural intelligence'; however, stuff like reinforcement learning is then neither, or perhaps a hybrid form of intelligence. In the end, the pragmatic value of the concept of intelligence is low, both for systems and for humans.


What is deep learning currently applied to besides object recognition in images?




So Machine Learning in general is an almost solved problem?


I think the author claims more like "it works", not that "it is solved".


Plug: I made a list of software that is useful for scientists: https://gist.github.com/stared/9130888.


First of all, I hate this "Agile" nonsense. I've seen it kill companies. It's truly awful, because it gives legitimacy to the anti-intellectualism that has infected this industry. It's that anti-intellectualism that, if you let it, will cause your mathematical and technical skills to rot. Before you know it, you've spent five years reacting to Scrum tickets and haven't written any serious code, and your math has gone to the dogs as well. It's insidious and dangerous, this culture of business-driven engineering mediocrity.

I hope that it'll be the fakes and the brogrammers who get flushed out in the next crash. Who knows, though? Obviously I can't predict the future better than anyone else.

To me, Python doesn't feel like a "scientific" language. Python's a great exploratory tool, and it's got some great libraries for feeling out a concept or exploring a type of model (e.g. off-the-shelf machine learning tools). That said, science values reproducibility and precision, which brings us around to functional programming and static typing... and suddenly we're at Haskell. (Of course, for a wide variety of purposes, Python is just fine, and may be a better choice because of its library ecosystem.) I do think that, as we use more machine learning, we're going to have a high demand for people who can apply rigor to the sorts of engineering that are currently done very quickly (resulting in "magic" algorithms that seem to work but that no one understands). I also agree that "deep learning" and machine learning in general are carrying some substance, even if 90% of what is being called "data science" is watered-down bullshit.

I still don't feel like I know what a "scientific programmer" is, or should be. And I'd love to see the death of business-driven engineering and "Agile" and all the mediocrity of user stories and backlog grooming meetings, but nothing has convinced me that it's imminent just yet. Sadly, I think it may be around for a while.


I think you may be under-weighting the value of Python being a great exploratory tool a bit.

For the (relatively small) amount of research I've done, the bulk of the time spent working with code was spent exploring large-ish sets of data to see what results came up. Python is a very nice language for doing this, and also has a great ecosystem of tools and libraries for doing it.
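
A (hypothetical) sketch of what that exploratory loop tends to look like; the file and column names below are made-up placeholders:

    # Typical exploratory loop: load a dataset, poke at it, plot something.
    # The file name and column names are made-up placeholders.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("measurements.csv")   # hypothetical data file

    print(df.shape)                        # how big is it?
    print(df.describe())                   # quick summary statistics
    print(df.isnull().sum())               # where are the missing values?

    # Eyeball one relationship before committing to any real analysis
    df.groupby("condition")["response"].mean().plot(kind="bar")
    plt.ylabel("mean response")
    plt.tight_layout()
    plt.show()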

I think it's also true that the benefits of FP and static typing are most pronounced on large codebases. If you're writing small specific bits of code you can mostly hold in your head all at once you don't gain nearly as much from a language like Haskell.

Which is not to say I wouldn't like to see more people using languages like Haskell (my primary programming language is OCaml now), but I think the bad habits people in scientific fields have around programming are a lot more of a detriment than the specific language they're using.


Curious, what are the software development processes that you do not consider nonsense?


I think it is the fake "Agile" processes that michaelochurch is referring to, not the real one.


I'm negative on "Agile" because it's an attempt to patch closed-allocation, business-driven development. If you have a high level of talent and you're solving interesting problems, you can do open allocation. If those aren't the case, and for some reason can't be, then you need to take different approaches entirely (but you should seriously question whether what you're working on is worth doing in the first place).


"Agile" in many shops means letting the passengers fly the plane (to borrow a phrase from one of your blog posts).


"Python's a great exploratory tool"

Is it great at that in ways Haskell isn't? If so, is that something we could capture and host within Haskell meaningfully?

I like doing exploratory work in Haskell, but if there's something I'm missing I'd like to capture that too where it's appropriate.



