During my PhD I wrote similar software to manage a large number of lab instruments and perform experiments in a reproducible fashion. The code is online on Github:

The general-purpose part: https://github.com/adewes/pyview

Specific components for my experiments: https://github.com/adewes/python-qubit-setup
The idea was to create an MVC-like framework where you could create and instantiate instruments, bundle them together into a system and use them to perform measurements. The data from the measurements would be saved in a text-based format and enriched with the meta-data about the state of the whole system (in order to make it reproducible).
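Roughly the pattern I mean, as a toy sketch (this is not the actual pyview API; the class and method names are made up for illustration):

```python
# Toy sketch of the instrument/system/measurement pattern described above.
# Not the actual pyview API; all names here are hypothetical.
import json
import time


class Instrument:
    """A single lab instrument with inspectable settings."""

    def __init__(self, name, **settings):
        self.name = name
        self.settings = settings

    def state(self):
        return {"name": self.name, "settings": self.settings}

    def measure(self):
        raise NotImplementedError("each concrete instrument implements this")


class System:
    """Bundles instruments and attaches the full system state to every measurement."""

    def __init__(self, *instruments):
        self.instruments = {i.name: i for i in instruments}

    def snapshot(self):
        return {name: inst.state() for name, inst in self.instruments.items()}

    def record(self, instrument_name, filename):
        data = self.instruments[instrument_name].measure()
        with open(filename, "w") as f:
            json.dump({"timestamp": time.time(),
                       "system_state": self.snapshot(),
                       "data": data}, f, indent=2)
```

The point being that every saved measurement carries enough metadata about the system state to be reproduced later.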
Your work seems to go in the same direction and seems to be very interesting! I think there is definitely a need for a system like this in Python, although it is a difficult problem since the requirements vary quite a bit as a function of the research field that you're in.
I've yet to see experiment software I've really liked. I've seen a few people use and struggle with OpenSesame recently; it seems like an incredibly featureful and useful project, but it's also awkward because of that. Pyexperiment seems like a nice lightweight solution, but it has the usual problem: you need some scripting and computer skills.
There's an argument which says that a modern researcher needs to be able to script/program, at least to a degree. But I don't like the idea of otherwise very able and very skilled scientists struggling to do good research because they're not great with computers.
I'm not sure what can be done about this. On a large scale, I'd love to see proper investment into the UX and UI of existing scientific codes (and I have a long list of where to start with that). On a personal level, I'd really like to make an alternative to something like OpenSesame; not a replacement, but something more lightweight for smaller scale/student studies, with a nice GUI and easy to use. I wouldn't really know where to start though: what the essential features are and what you can leave out, because I don't do that sort of study myself.
> I don't like the idea of otherwise very able and very skilled scientists struggling to do good research because they're not great with computers.
The solution already exists: skilled scientist hires grad student who's done some programming + some electronics classes. Happens everywhere already, works well.
Talking about UI and UX as a problem with scientific codes makes me giggle though. We have much bigger issues to fix first: reproducibility, provenance, mandating that codes be open source, even getting people to use a version control system.
Interesting, would love to hear some of your experiences! It sounds like the issues we face are slightly different. In my field, version control is pretty much a non-issue[1]; most of the widespread codes use it (typically svn, but that's fine). Build systems are often an issue, as is incomplete (or outdated) documentation and a lack of proper changelogs. A lack of proper tests too. Actually, maybe the issues we face aren't so different after all...
> The solution already exists: skilled scientist hires grad student who's done some programming + some electronics classes. Happens everywhere already, works well.
I've never actually seen this work well. Scripting an existing solution, sure. But writing a new one from scratch... Most scientists I have met personally who code write awful, awful code. Sometimes they just lack the time to do something better (I'm as guilty of this as anyone), other times they're just not very good at it. And it's almost never maintainable: when they leave the project, you might as well rewrite from scratch (a more cynical person would say that's by design, but I actually don't believe that). There are many, many exceptions to this, obviously, but as a rule...
I am optimistic though. It seems like things are changing for the better, slowly.
Edit: [1] That said, I've been trying to persuade a colleague to use version control for ages. I'm at a loss. Live and let live, I guess.
(What's your field? Mine's computational physics, but I collaborate a lot with experimentalists.)
When I said version control, I meant for in-house projects, which tend to live on someone's laptop and also tend to be a horrible mess, as you say. Most public codes are indeed under some VCS; git is very popular in my field. But reproducibility is a whole other matter. There's been some interesting work on provenance systems that can attach all the input needed to reproduce a result to, e.g., the figures in a paper, but until we have a good universal one that journals start caring about, they're not going to see wide adoption.
As for non-computer-savvy profs using grad students for programming: I see a lot of either LabView (for controlling experiments) or Python/Matlab in those cases, and it's often "write-only" code, but it mostly gets the job done.
I'm also optimistic, but there are too few people working on and caring about the software carpentry we need to back modern science in a good way.
Looks interesting, though I think the word "experiment" is what's throwing people off, as it usually has a research/science connotation. As you've explained it here, your project is mostly a "commons" library for reducing regular boilerplate you've encountered in your area of work. As such, I would consider renaming it to something more suitable.
Also, these types of projects aren't really reusable by the general public unless they fit your use case exactly. For example, I'd have no need for matplotlib or NumPy, and would want to output JSON logs (with something like [structlog][0]). That said, trying to accommodate everyone's use case is impossible, so as long as it solves your problem, mission accomplished.
Thanks for your feedback. I see your point, but by now I am kind of attached to the name; as a researcher, many of my scripts are related to some kind of experiment. This also explains the NumPy and matplotlib dependencies. As for structlog, if I understand its documentation correctly, you could easily add structlog to a project on top of pyexperiment, right?
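(For what it's worth, a minimal structlog setup that emits JSON log lines looks roughly like this; nothing in it depends on pyexperiment, and the event names are made up:)

```python
# Minimal structlog configuration that renders log entries as JSON lines.
import structlog

structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)

log = structlog.get_logger()
log.info("experiment_started", run_id=42, input_file="data.csv")
# -> {"run_id": 42, "input_file": "data.csv", "event": "experiment_started", "timestamp": "..."}
```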
I want to voice my support for keeping the driving use case "experiments".
I'm coming from the perspective of a software engineer here. To a software engineer, a "program" is a collection of stateless routines and behavior. Data is external and separate; the same program should be able to process a wide range of data. "Reproducibility", as much as that matters, is having a tested system that responds in a predictable and reliable way to inputs, and data is one such input.
When I first worked extensively with a scientist on an experiment, I was shocked how much common wisdom from computer science was turned on its head. One is expected to load up a Matlab workspace with data and code all in the same file? Scripts irreversibly mutate data, and often run exactly once? How could one possibly keep track of such an environment? How does one fix bugs in a series of commands typed into an interactive prompt? Reproducibility to a scientist is a log of actions that could be repeated by another human, but the environments used often just dropped such things on the floor, to be caught only by the most diligent researcher with an unusually well-kept notebook.
I think there is definitely a happy medium somewhere. Reproducibility as a scientist understands it; interactivity in a way that makes sense to a scientist writing a one-off script. Program state stored easily so that the scientist doesn't feel lost every time they restart their environment, as I imagine they must do when editing python scripts in vim as a software engineer might. But all this in a world where scripts can be maintained and versioned and fixed without their hair catching fire.
Thanks for the kind words. Until I left academia I worked with Matlab a lot, and pyexperiment is probably the result of trying to recreate that experience while making scripts that can easily be shared along with the data needed to run them.
The issue with irreversible mutation of data, for example, is addressed in pyexperiment with rotating state (and log) files: if you store the state of your experiment in one run and then change it in the next, you get a backup of the old state with a numerical extension (by default, up to 5 backups are rotated). Moreover, pyexperiment by default comes with commands to display stored state and configuration options (though they still need to be improved), and both are stored in formats compatible with a host of other software (including Matlab).
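(The rotation itself is the same idea as rotating log files. A rough sketch of the mechanism, not pyexperiment's actual code, with an illustrative file name:)

```python
import os


def rotate_backups(filename, keep=5):
    """Shift filename -> filename.1 -> filename.2 ..., keeping at most `keep` backups."""
    # Drop the oldest backup if it exists.
    oldest = "%s.%d" % (filename, keep)
    if os.path.exists(oldest):
        os.remove(oldest)
    # Shift the remaining backups up by one.
    for i in range(keep - 1, 0, -1):
        src = "%s.%d" % (filename, i)
        if os.path.exists(src):
            os.rename(src, "%s.%d" % (filename, i + 1))
    # The current file becomes backup number 1.
    if os.path.exists(filename):
        os.rename(filename, filename + ".1")


# Called before writing the new state in each run:
rotate_backups("experiment_state.h5", keep=5)
```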
Btw., along the same lines, I love ipython notebooks, but the way I use them makes them very hard to share, and compared to plain python scripts, version control is a pain (even with the usual hacks to make diffs readable).
That looks like it could be part of something more generic, like yeoman for node.js. I like the idea of a yeoman for Python (and for any language/technology in general).
I...I really don't understand what the primary use-cases are for this library. By "experiment", do you mean some kind of scientific experiment? A double-blind study on file input? That...doesn't make a whole lot of sense, but that's the closest thing that I can glean from a skim of the first documentation page that makes any sense at all...
> Motivating Example
>
> Let’s assume we need to write a quick and clean script that reads a couple of files with time series data and computes the average value. We also want to generate a plot of the data.
That...provides no motivation whatsoever. If I want to do that, I'll use Pandas and matplotlib. This example sheds absolutely no light on what pyexperiment does or why I would want to use it.
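(For the record, something like this, with a made-up file pattern and column name:)

```python
# The quoted "motivating example" with plain pandas + matplotlib.
# The file pattern and column name are made up.
import glob

import matplotlib.pyplot as plt
import pandas as pd

# Read a couple of time-series files and concatenate them.
frames = [pd.read_csv(path, index_col=0, parse_dates=True)
          for path in glob.glob("data/*.csv")]
data = pd.concat(frames)

print("Average value:", data["value"].mean())

# Plot the series and save the figure.
data["value"].plot()
plt.savefig("timeseries.png")
```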
Pandas and matplotlib are great, I use them all the time.
Yet whenever I write a python script for such experiments, beyond some point, I find myself writing code that parses command line arguments, handles simple configuration options, saves the results of my computation in a shareable way, etc.
Pyexperiment collects these bits and pieces into a library so that I can just write the relevant stuff; it's mainly solving my own pain point, and I thought I could share it.
> So...it's for configuration management? What do "experiments" have to do with this?
Yes and no: pyexperiment handles configuration management for you (using argparse, configobj, etc.), but its main point is that it saves you from having to set up these components yourself. My main goal was to reduce the overhead of writing the same "framework" code every time, so pyexperiment sets you up with simple one-command solutions for many common tasks.
> What does this package do that isn't done by click, docopt, or argparse?
It will set you up with a framework where, e.g., you don't have to write the same boilerplate code to get a multiprocessing-capable logger that logs to a file. You just call log.debug("bla") and that's it.
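To make that concrete, this is roughly the kind of per-script setup I mean; the details vary, but without a framework every script carries something like it (standard library only, names are illustrative):

```python
# Typical per-script setup that pyexperiment is meant to absorb:
# argument parsing, configuration, and a file logger.
import argparse
import configparser
import logging

parser = argparse.ArgumentParser(description="Run the experiment")
parser.add_argument("--config", default="experiment.ini")
parser.add_argument("--verbose", action="store_true")
args = parser.parse_args()

config = configparser.ConfigParser()
config.read(args.config)

logging.basicConfig(
    filename="experiment.log",
    level=logging.DEBUG if args.verbose else logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger(__name__)
log.debug("bla")
# Making this safe across multiple processes takes even more code
# (e.g. logging.handlers.QueueHandler/QueueListener).
```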
I think most people would refer to this as a "boilerplate" or "project template", even if it is not exactly so. You may have more luck explaining it to people using these terms.
Thank you for releasing this. I'll be trying it later today. The badges are very enticing :).
This project is in a crowded space (Python frameworks), and your README isn't selling it well. It's too heavy on the tell and too light on the show. Ideally your opening section would be something like:
## pyexperiment
<badges>
pyexperiment is ... <1 line, maximum 2>. Here's an example:
<less than 10 lines of python>
Here's the result of running that command, complete with <top 3 features>
<less than 10 lines of command output>
Additionally pyexperiment gives you <another 3 features>, and more.