Ultimately, we could have a full gene-expression-as-a-dynamical-system model which we could perturb and use to study the dynamical features of genetic networks.
Hi HN. I'm a relatively new coder, so I've only just begun to play around with more sophisticated topics like running code on clusters and networking (ports and the like).
I was hoping someone could kindly explain what a task queue is used for. If I understand it correctly, a task queue would be used for running automation code written during the development of a Python project? But that doesn't sound right.
Can someone please tell me what a "task queue for Python" means in this context?
A task queue or message queue is generally used to delegate jobs to worker processes.
Imagine you are GitHub and a user comments on an issue that has 1000 people participating. As soon as the user makes a comment you want to send an email to every user that there are new comments.
The straightforward way is to do that in the web application when the comment is posted, but that takes a lot of time and leaves the commenter staring at a loading screen until everyone is notified. (There are other reasons as well.)
With a message/task queue you would instead create 1000 messages (each saying "send an e-mail about this issue to this user"). Then you have worker processes that, in a loop, take one message at a time and send an e-mail. When the e-mail is sent, the worker marks (acks) the message as handled, and it is removed from the message/task queue.
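The worker loop above can be sketched with Python's standard-library `queue` module. This is a minimal in-process sketch, not a real deployment (in practice you'd use something like Celery or RQ backed by Redis or RabbitMQ so messages survive restarts and workers can run on other machines); `send_email` and the user names are hypothetical placeholders.

```python
import queue
import threading

sent = []  # stand-in for an email service, so we can see what was delivered

def send_email(user, issue):
    # Hypothetical placeholder: a real worker would call an email API here.
    sent.append((user, issue))

tasks = queue.Queue()

def worker():
    # The worker loop: take one message at a time, handle it, then ack it.
    while True:
        msg = tasks.get()          # blocks until a message is available
        if msg is None:            # sentinel value tells the worker to stop
            break
        send_email(*msg)
        tasks.task_done()          # the "ack": this message is now handled

# The web request handler just enqueues one message per subscriber
# and returns immediately, instead of sending 1000 e-mails inline.
for user in ["alice", "bob", "carol"]:
    tasks.put((user, "issue #42"))

t = threading.Thread(target=worker)
t.start()
tasks.join()      # wait until every enqueued message has been acked
tasks.put(None)   # shut the worker down
t.join()
```

The key design point is that the producer (the web app) and the consumer (the worker) only share the queue, so the slow work happens outside the request/response cycle.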
I'm an undergrad doing some neuroscience research; I think the existence of such tools is essential to progress. As soon as you can start doing CompBio on the web, more people (i.e. students, non-scientist programmers) will be encouraged to start playing around, which will lead to more progress in our understanding of complex biology and spur people's excitement about recent progress in the field. The beautiful thing about code is that it is tremendously empowering – your writing a computational biology library for the web will empower individuals to get into computational biology. Furthermore, it will enable individuals to write blog posts with live JS examples, etc., all to further broader progress. Lastly, I'm sure it will be a learning experience for you too (albeit probably of much lower magnitude than a PhD). I say do it!
It's a bit beyond me too, but here's my take from skimming: It's already known that you can do differential calculus in some pretty abstract spaces (not just functions from R^n to R^m). Automatic differentiation is a way of automatically keeping differentiability and computing derivatives when sticking other differentiable stuff together. This puts those ideas together, so you can automatically differentiate some more interesting things stuck together in some interesting ways related to programming. That was what I got from my rusty semi-layperson's skim, at least.
Karpathy's explanation in CS231n is great: for any computational unit that is a function, you can calculate a derivative. To propagate the derivative backward through the function, you just apply the chain rule. This is much easier with simpler functions, so break your algorithm down into the smallest computable units in the graph.
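A toy sketch of that idea: each unit only knows its own local derivative, and the chain rule multiplies the local derivatives together as you walk backward. The function and numbers here are just an illustration.

```python
import math

# Forward pass through two simple units: u = x**2, then y = sin(u).
x = 1.5
u = x * x
y = math.sin(u)

# Backward pass: each unit contributes only its local derivative;
# the chain rule multiplies them to get dy/dx.
dy_du = math.cos(u)   # local derivative of sin at u
du_dx = 2 * x         # local derivative of x**2 at x
dy_dx = dy_du * du_dx  # d/dx sin(x^2) = cos(x^2) * 2x
```

No unit ever needs to know about the whole expression, which is exactly why the decomposition into small computable units works.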
Does this only work for mathematical programs?
I don't understand how this would work for more general computations, involving loops, writes to files, etc.
This paper isn't really a great introduction to the concept of AD. That said, one way to visualize it is to see how a basic AD implementation works. There are generally two different varieties, operator overloading and source transformation. Let's focus on the first.
Say we have a function:
double myfun(std::vector<double> x) {
    auto y = double(0.);
    for (int i = 0; i <= 5; i++)
        y += x[0] * x[1];
    return y;
}
We may want to find the gradient of myfun, or simply the derivative of myfun with respect to x[0] and x[1]. What we're going to do is replace double with a special AD type, which requires us to code things like:
template <typename Real>
Real myfun(std::vector<Real> x) {
    auto y = Real(0.);
    for (int i = 0; i <= 5; i++)
        y += x[0] * x[1];
    return y;
}
Now, instead of using a double, we'll set Real to be our AD type. In C++, this'll be something like a class where all of the AD action occurs. What's in this class depends on the particular AD algorithm, which in turn depends on what derivatives we want and what kind of recomputations of the derivative we want to make. In general, though, it consists of some kind of tree. At its most basic level, the nodes of this tree are of the types variable, constant, unary function, or binary function, and they contain information that relates to the chain rule. When we want to compute a derivative, we traverse the tree in a very particular order to compute and accumulate the derivative information. How this is done can vary as well, which has implications for the efficiency of the algorithm. Most implementations use a tape that predefines the order of the tree traversal. It's also possible to do a topological sort to solve for a tree traversal. Anyway, traversing a tree can be expensive for a really, really large computational tree, so the goal is to touch every necessary node only once.
Returning to your original question, finding the derivative of a program involves building up this tree of computation. Control structures don't really matter in this context because we just keep adding to the tree regardless of loops or conditionals. This, by the way, can be really expensive, since the computational tree can be huge for a long-running loop. There's some game in determining how and when to collapse the tree in certain ways. Source transformation tools can leave the original, or lightly modified, control structures in place, which alleviates a lot of this expense and is partially why they're faster than operator overloading.
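To make the operator-overloading idea concrete, here is a minimal reverse-mode sketch, written in Python rather than C++ for brevity. Each overloaded operation records its parent nodes and the local (chain-rule) derivatives; `backward` does the topological sort mentioned above and accumulates derivatives in one pass. The class and function names are my own invention, not from the paper.

```python
class Var:
    """One node of the computational tree: a value plus chain-rule info."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def backward(output):
    # Topologically order the tree, then sweep it once in reverse,
    # accumulating each node's contribution via the chain rule.
    order, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)
    visit(output)
    output.grad = 1.0
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += local * v.grad

# The same myfun as the C++ snippet: the loop just keeps growing the tree.
def myfun(x):
    y = Var(0.0)
    for i in range(6):
        y = y + x[0] * x[1]
    return y

x = [Var(2.0), Var(3.0)]
y = myfun(x)
backward(y)
# y = 6*x0*x1 = 36, so dy/dx0 = 6*x1 = 18 and dy/dx1 = 6*x0 = 12
```

Note how the loop leaves no trace in the final tree: six product nodes simply got recorded, which is the "control structures don't really matter" point above, and also why long-running loops make the tree huge.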
Thanks for your explanation. So what this really does is enable one to train code using machine learning techniques such as gradient descent, traversing a code-space instead of traversing a space of numbers?
Well, from my perspective, there are three different components to machine learning techniques like multilayer perceptrons. Basically,
1. What's the model?
2. How do we define the error between the model and the data?
3. What algorithm do we use to fit the model to the data?
In the case of a multilayer perceptron with a single hidden layer, the model is
m(alpha,W,b,x) = alpha'*sigmoidal(Wx + b)
where
alpha : nhidden x 1
W : nhidden x ninput
b : nhidden x 1
' denotes transpose and sigmoidal is some kind of sigmoidal function like a logistic function or arc tangent. As far as defining the error, we tend to use the sum of squared errors since it's differentiable and easy:
J(alpha,W,b) = sum_i (m(alpha,W,b,x[i]) - y[i])^2
Finally, we need to apply an optimization algorithm to the problem
min_{alpha,W,b} J(alpha,W,b)
Most of the time, people use a nonglobalized steepest descent. Honestly, this is terrible. A matrix-free inexact-Newton method globalized with either a line-search or trust-region method will work better.
Anyway, all good optimization algorithms require the derivative of J above. Certainly, we can grind through these derivatives by hand if we're masochistic. Well, for one layer it's not all that bad. However, as we start to nest things, it becomes a terrible pain to derive. Rather than do this, we can just use automatic differentiation to find the gradient and Hessian-vector products of J. In fact, most AD codes already do this. I can never find a paper that runs through the details, but back propagation is basically just a reverse-mode AD algorithm applied to J in order to find the gradient.
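As a sanity check on the formulas above, here is the single-hidden-layer model m and error J in plain Python, with the hand-derived gradient of J with respect to alpha (dJ/dalpha_j = sum_i 2 (m - y_i) h_j(x_i), where h is the hidden-layer output) verified against central finite differences. The dimensions, parameter values, and data are made up for illustration, and only the alpha part of the gradient is shown; a full backprop would also produce dW and db.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Tiny instance: ninput = 2, nhidden = 2, hand-picked parameters and data.
W = [[0.5, -0.3], [0.8, 0.2]]   # nhidden x ninput
b = [0.1, -0.2]                 # nhidden x 1
alpha = [0.7, -0.4]             # nhidden x 1
data = [([1.0, 2.0], 0.5), ([-1.0, 0.5], -0.3)]

def hidden(x):
    # sigmoidal(Wx + b), elementwise
    return [sigmoid(sum(W[j][k] * x[k] for k in range(2)) + b[j])
            for j in range(2)]

def m(alpha, x):
    # m(alpha, W, b, x) = alpha' * sigmoidal(Wx + b)
    h = hidden(x)
    return sum(alpha[j] * h[j] for j in range(2))

def J(alpha):
    # sum of squared errors over the data
    return sum((m(alpha, x) - y) ** 2 for x, y in data)

# Hand-derived gradient: dJ/dalpha_j = sum_i 2 (m - y_i) h_j(x_i)
grad = [0.0, 0.0]
for x, y in data:
    r = 2.0 * (m(alpha, x) - y)
    h = hidden(x)
    for j in range(2):
        grad[j] += r * h[j]
```

Even for this one-layer case the bookkeeping is fiddly, which is the point: AD (reverse mode, i.e. backprop) does exactly this accumulation mechanically, for any nesting depth.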
What is the data that the machine learning algorithm would optimize with given an arbitrary program that is differentiated?
Is it just a faster program or something else?
For any changes, you would need some kind of 100%-coverage regression test proving that the optimized program is still correct and handles all cases, because the differentiation only recorded one possible path through the program.
Alright, so machine learning is a pretty broad term that encompasses a lot of things. Very specifically above, I was talking about things I call curve fitters, which include multilayer perceptrons. There are also classifiers, which people often create by using a curve fitter and then thresholding it. There are other techniques as well that people call machine learning that don't fit into either of these categories.
Anyway, for curve fitters, we literally need a set of points {(x_i,y_i)}_{i=1}^m given to us and our goal is to map the n-dimensional vectors x_i to the scalar output y_i. If we want a multidimensional output, we just do this more than once. Now, theoretically, we can try and match any function, or program, to this data. Let phi be this model, so that we want phi(alpha,x)=y. Here, alpha are the parameters we want to tune for the matching. Also note, these parameters can't be arbitrary. We really need them to be some kind of floating point number. Now, how we do the matching can vary, but sum of squared errors is easy, so we solve the optimization problem
min_{alpha} sum_i (phi(alpha,x_i)-y_i)^2
Basically, we want to find a parameter alpha so that the output from program phi matches our data {(x_i,y_i)}_{i=1}^m as closely as possible in a least-squares sense. Since all good optimization algorithms need a derivative, we use automatic differentiation to find the gradient, and probably Hessian-vector product, of
J(alpha) = sum_i (phi(alpha,x_i)-y_i)^2
Now, if phi is a program, then so is J. It's a program that takes alpha as an input, runs a for-loop to sum up the squared errors, and then it returns the result.
Alright, that was long winded, but the key is this. We're not optimizing a program when we do this. Rather, the optimization occurs when we minimize the difference between our inputs and our outputs, which were given to us ahead of time. In this context, we have two programs. One, the program that we're trying to fit to our data. Two, the program that determines the amount of misfit between our inputs and our outputs. We need the derivative of the composition of these two programs, which is why we use automatic differentiation.
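The whole setup in miniature: a model program phi, a misfit program J built on top of it, and an optimizer that needs the derivative of their composition. Everything here (the model, the data, the step size) is a made-up toy, and central finite differences stand in for the AD gradient just to keep the sketch short; a real AD tool would differentiate J's loop directly, and nonglobalized steepest descent is used here only for simplicity.

```python
# Model program phi(alpha, x) = alpha0 + alpha1 * x, and data
# generated from y = 1 + 2x, so the best fit is alpha = (1, 2).
data = [(x / 10.0, 1.0 + 2.0 * (x / 10.0)) for x in range(11)]

def J(alpha):
    # The misfit program: a for-loop summing squared errors.
    return sum((alpha[0] + alpha[1] * x - y) ** 2 for x, y in data)

def grad_J(alpha, eps=1e-6):
    # Stand-in for AD: central finite differences applied to the program J.
    g = []
    for j in range(2):
        ap = alpha[:]; ap[j] += eps
        am = alpha[:]; am[j] -= eps
        g.append((J(ap) - J(am)) / (2 * eps))
    return g

# Steepest descent on J: the optimization tunes alpha, not the program.
alpha = [0.0, 0.0]
for _ in range(2000):
    g = grad_J(alpha)
    alpha = [a - 0.05 * gj for a, gj in zip(alpha, g)]
# alpha should now be close to (1, 2)
```

Note that what got "optimized" is the parameter vector alpha, exactly as described: the two programs (phi and the misfit loop) are fixed, and the derivative of their composition drives the fit.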
It is very important that this research was shared. However, the very difficult and likely impossible part is to actually change the emotional and interpersonal ethos of society. Given the very intangible nature of the problem, no one in power will care enough to initiate a national program to motivate such social change. Therefore, it must come from us, the people. We ought to rise to the call and do something of value to nudge society toward a more interactive state.
Can we offload rationality to our subconscious without allowing ourselves to succumb to heuristics? What are some strategies we can use for relying less on heuristics and more on logic?
How can we avoid perceptual errors which influence our judgement?
Think about how hard you mentally push yourself on a day-to-day basis. It isn't a stretch to think that cognition can be trained to be stronger.
Do you ever wonder what an eagle's vision would be like if it lived inside a 10x15 room its whole life?
> At birth, the number of synapses per neuron is 2,500, but by age two or three, it’s about 15,000 per neuron. The brain eliminates connections that are seldom or never used, which is a normal part of brain development. [0]
How much do you think the average kid, or their adult counterpart, would need to use their brain while sitting in front of the TV or working a menial job / attending a boring school?
I believe we set our own cognitive limits and I hope that we can learn to reset them.
> Can we offload rationality to our subconscious without allowing ourselves to succumb to heuristics?
An almost textbook definition of cognitive bias right there.
> How can we avoid perceptual errors which influence our judgement?
You have to work incredibly hard to achieve that. Perceptual and bias errors are the most common and difficult to find in ourselves.
My personal take is to become really good at two things: understanding when and how it is appropriate to trust in someone else's domain authority/expertise, and whatever it is you'd like to be an expert on.
Don't expect this behavior out of your fellow citizens, or even yourself, most of the time. It requires battling with some pretty strong evolutionary drives.
Frankly, cultivating humility is probably easier (which is saying something, as that's pretty difficult for most humans as well).
I'm a college freshman now and I wish my middle and elementary school teachers were as devoted to their students as you seem to be. I would suggest going to ResearchGate, where a lot of scientists and grad students hang out. It's likely you'll be able to find one there who would be interested in helping them out.