Linear Programming: How Does It Work? (jadagul.tumblr.com)
200 points by luu on June 6, 2016 | hide | past | favorite | 57 comments



Here is a brief summary that gives some context both for the article, and for the discussion here.

In linear programming the solution space is a convex[0] polytope[1] - the optimum is always at a vertex.[2]

The Simplex Method proceeds by:

* Finding a vertex

* While there exists a neighbour that is better, move to it.

Done.

The usual implementation doesn't even bother trying to find the best neighbour; any improving one will do. The math tells us that you never get into a dead end. In other words, if you are not at the optimum, at least one of your neighbours is guaranteed to be better, and the pivot step of the Simplex Method will find one.

You can refine the technique slightly by checking several neighbours and choosing the one that gives the greatest improvement, but in practice it doesn't save a lot of work. There is no search, no back-tracking, and no queue of points to consider. It's not Dijkstra's algorithm in disguise - it's not any kind of search.
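The vertex-optimality claim is easy to see in practice. A quick sketch with scipy's LP solver (my choice of tool for illustration; the problem data here is invented): the solver's answer lands exactly on a vertex of the feasible polygon.

```python
from scipy.optimize import linprog

# Maximize 3x + 2y subject to x + y <= 4, x + 3y <= 6, x, y >= 0.
# linprog minimizes, so negate the objective coefficients.
res = linprog(c=[-3, -2], A_ub=[[1, 1], [1, 3]], b_ub=[4, 6])

# The feasible region is a quadrilateral with vertices
# (0,0), (4,0), (3,1), (0,2); the optimum sits at the vertex (4,0).
print(res.x, -res.fun)
```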

In pathological cases it can take exponential time, but there are two ways to avoid that. One is to add a little noise, the other is to choose, from among those neighbours that are better, one at random. Other comments in this discussion give some references that go into the detail of ellipsoidal and interior point methods.

Simplex method for the win - it's simple to implement, robust, efficient in practice, and applicable to a wide range of problems.

[0] https://en.wikipedia.org/wiki/Convex

[1] https://en.wikipedia.org/wiki/Polytope

[2] This depends crucially on the fact that this is "linear" programming. As you sweep a hyper-plane across a polytope, the last point of contact will contain at least one vertex.

[3] If you've read this far and found it useful, but want more, contact me, because I've been thinking of writing this up properly for some time.


the optimum is always at a vertex

As you point out in [2], it would be more correct to say that among the optimal solutions there is always one at a vertex of the polyhedron.

Also, simplex methods are very versatile when combined with other methods (see branch-and-bound for mixed integer linear optimization), but are sometimes easily beaten by barrier methods.

Excellent textbooks for LP: Chvátal [4] and Bertsimas and Tsitsiklis [5]. For mixed integer linear optimization: Wolsey [6], Nemhauser and Wolsey [7], and Conforti et al. [8].

[4] V. Chvátal, Linear Programming. W.H. Freeman, 1983.

[5] D. Bertsimas, J.N. Tsitsiklis. Introduction to linear optimization. Vol. 6. Belmont, MA: Athena Scientific, 1997.

[6] L.A. Wolsey (1998). Integer programming (Vol. 42). New York: Wiley.

[7] G.L. Nemhauser, L.A. Wolsey. Integer and combinatorial optimization. John Wiley & Sons, 2014.

[8] M. Conforti, G. Cornuéjols, G. Zambelli (2014). Integer programming (Vol. 271). Berlin: Springer.


Actually, choosing an improving neighbor at random (the random-edge pivot rule) also has expected superpolynomial worst-case behavior; see Friedmann, Hansen, and Zwick, "Subexponential lower bounds for randomized pivoting rules for the simplex algorithm".

More generally, it is not known whether a pivot rule exists whose (expected) worst-case running time is polynomial.

For the shadow vertex rule, it has been shown that the smoothed complexity (i.e. after perturbation of the input) is polynomial. However, the shadow vertex rule is totally impractical.


> It's not Dijkstra's algorithm in disguise - it's not any kind of search.

It's a kind of superficial statement to make: "this is not that". Obviously these things are different on the surface, but probing a bit deeper can sometimes bring similarities out. This is more interesting to me, although perhaps I am just getting old and too many neurons are interconnected and this is what makes me seek the similarities between things, instead of just saying "this is not that".

As has been noted by arghbleargh [0], the simplex method can be used to implement Dijkstra. How sure are you that the converse is not also true? Will you at least admit this is an interesting question? Or even the question of how to ask this question in such a way that it is provable/disprovable. I think this is worthwhile.

Continuing to probe: Dijkstra (and dynamic programming, etc.) hinges crucially on the idea of a semiring, or generalized distributivity [1], [2]. In the case of a minimum weight path this boils down to noticing that a + min(b, c) = min(a + b, a + c). The plus distributes over the min (think of the usual distributivity: a*(b+c) = a*b + a*c).

Convexity also has a kind of distributivity, which can be most easily seen by taking averages: a+(b+c)/2 = (a+b)/2 + (a+c)/2. So addition distributes additively over "averages". And this formula extends to arbitrary convex combinations. My spider sense is telling me that something interesting is going on here, and there may be some more ways to link linear programming up with dynamic programming.
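Both distributive laws — the min-plus (tropical) one behind shortest-path dynamic programming, and the "averaging" one for convexity — are easy to sanity-check numerically. A quick sketch (pure Python, random spot checks):

```python
import random

random.seed(0)
for _ in range(1000):
    a, b, c = (random.uniform(-10, 10) for _ in range(3))
    # Tropical (min-plus) semiring: + distributes over min,
    # which is what makes shortest-path DP recursions valid.
    assert abs((a + min(b, c)) - min(a + b, a + c)) < 1e-9
    # "Averaging" distributivity mentioned for convexity:
    assert abs((a + (b + c) / 2) - ((a + b) / 2 + (a + c) / 2)) < 1e-9
print("both distributive laws hold on 1000 random samples")
```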

[0] https://news.ycombinator.com/item?id=11845375

[1] Aji, Srinivas M., and Robert J. McEliece. "The generalized distributive law." Information Theory, IEEE Transactions on 46, no. 2 (2000): 325-343.

[2] Gondran, Michel, and Michel Minoux. Graphs, dioids and semirings: new models and algorithms. Vol. 41. Springer Science & Business Media, 2008.


"How sure are you that the converse is not also true? Will you at least admit this is an interesting question? Or even the question of how to ask this question in such a way that it is provable/disprovable. I think this is worthwile."

It's an interesting question, but one which has a sad answer: integer linear programming (and other restricted forms, such as 0-1 programming) are NP-Hard.

If you could use Dijkstra's algorithm to solve them in less than exponential time, you would have proved P=NP.

So I would go with "pretty sure".

(now, could you do it in an exponential number of Dijkstra invocations? Maybe? But that is, to me, not an interesting question :P Same with whether you could expand the linear programming problem into an exponential-size graph that was then solvable by Dijkstra's algorithm - we already know that EXPSPACE is a superset of P, etc)


  >> It's not Dijkstra's algorithm in disguise
  >> - it's not any kind of search.

  > It's a kind of superficial statement to make:
  > "this is not that".  Obviously these things
  > are different on the surface, but probing a
  > bit deeper can sometimes bring similarities
  > out.
Agreed, although I was trying to answer the question as asked, and not start writing an entire essay on how instances of one type can be encoded as instances of another. MineSweeper is NP-Complete, so we can encode Hamilton Cycle (for example) in it. MineSweeper is not an instance of Hamilton Cycle, they can each be encoded in the other, this is deeply interesting, but it doesn't help someone trying to get to grips with what the Simplex Method "is". Coming to understand what it "is" is the gateway to seeing how it might, or might not, then be used to encode, and solve, other problems.

  > This is more interesting to me, although
  > perhaps I am just getting old and too many
  > neurons are interconnected and this is what
  > makes me seek the similarities between things,
  > instead of just saying "this is not that".
I'm 55, and finding deep connections is (part of) what I do. Understanding that some things are genuinely different is, to me, an important place to start.

  > ... the simplex method can be used to
  > implement Dijkstra.  How sure are you
  > that the converse is not also true?
I'm pretty sure Dijkstra can be used to solve instances of Linear Programming, although the encoding is likely to be exponential. It might then be simplified by using the specific characteristics of linearity, etc, but that's not the same as saying that it is an instance of LP.

  > Will you at least admit this is an
  > interesting question?
Absolutely - I'm not even forced to "admit" this sort of thing. It's what I spend much of my time trying to convince other people about.

But Dijkstra is not the Simplex Method in disguise, not vice versa. Perfectly happy to say that each could (perhaps) be encoded in the other, and that doing so might turn out to be interesting. But just as finding a Hamilton Cycle is not the same as 3-Colouring a Graph, they are not the same.

  > My spider sense is telling me that
  > something interesting is going on
  > here, and there may be some more
  > ways to link linear programming up
  > with dynamic programming.
I'd love to see more detailed speculations and investigations.

In summary, I suspect we are in agreement, and that you're objecting to something I didn't actually say. I don't deny that there might be, or even probably are, deep connections. But for someone trying to understand what the Simplex Method is and does, I don't think that's particularly helpful. Maybe I'm wrong - I'm pleased you wrote what you did.


This is a really cool technique to have in your toolbelt. When you are able to model a problem in this way, you can get something that seems almost magical. I've found these kinds of constraint solvers to be a nice middle ground between naive algorithms and full-blown machine learning.

A couple examples:

* Building a scheduling system for work-shifts -- add constraints for number of hours per week, double shifts, minimum roles (manager/supervision) then use wages as the cost function to minimize

* Exercise recommendations -- add constraints for calories burned, time, etc and then minimize a heuristic for "difficulty"

* Fantasy sports optimizer -- this is very niche but something I built that optimizes a fantasy team's projected points based on position limits and player cost (for FanDuel)

The trouble I had was that almost all of the material and tools are very academic focused. It was difficult for me to grok and the libraries were not well documented.

The best I have found is Google's or-tools. The documentation is still lacking, but I was able to understand the basics from their examples. The best part is that there are bindings for a handful of languages, so you aren't necessarily forced to also learn something like R or C++ to work with it.

Start here: https://developers.google.com/optimization/lp/glop#output

and be sure to read through the Stigler Diet example to get a taste for how to transform real-world problems into solvable linear systems.
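To give a taste of the transformation, here is a toy diet-style LP (all foods, prices, and nutrient numbers are invented for illustration), solved with scipy rather than or-tools just to keep the sketch dependency-light:

```python
from scipy.optimize import linprog

# Toy diet problem: minimize cost subject to minimum calorie and
# protein requirements.  Two foods, all numbers made up.
#           food A  food B
cost = [0.5, 1.0]              # dollars per unit
calories = [100, 150]          # kcal per unit
protein = [2, 10]              # grams per unit

# linprog uses A_ub @ x <= b_ub, so flip the ">=" rows by negating.
A_ub = [[-c for c in calories], [-p for p in protein]]
b_ub = [-2000, -55]            # need >= 2000 kcal and >= 55 g protein

res = linprog(cost, A_ub=A_ub, b_ub=b_ub)  # x >= 0 is the default
print(res.x, res.fun)          # units of each food, total cost
```

The pattern is the same for the real Stigler diet: one variable per food, one constraint row per nutrient.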


Wait, so this is straight up gradient ascent on the vertices of the feasible set? I always thought the simplex method was more complicated than that. I guess you do need to maintain some kind of queue of current-best vertices.

If we squint a bit do we get Dijkstra's algorithm?


> Wait, so this is straight up gradient ascent on the vertices of the feasible set?

The simplex method literally crawls the edges of the feasible region, which is always a convex polytope. Sadly, the simplex method can take time exponential in the number of variables in the worst case. A more efficient alternative for large LP problems is interior point methods: https://en.wikipedia.org/wiki/Interior_point_method . As the name suggests, interior point methods traverse the interior of the feasible region.

To see why interior point methods are more efficient, suppose your feasible region is a convex n-gon in XY plane, for some very large n. Your initial point is the lowest vertex (least Y coordinate), and the objective function to maximize is Y, so you want to reach the highest vertex. The simplex method will literally crawl the vertices one by one until you reach the top. Interior point methods will take a more straightforward path through the interior of the polygon.
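The comparison is easy to try yourself: scipy's `linprog` exposes both a dual-simplex and an interior-point backend of the HiGHS solver (the `"highs-ds"`/`"highs-ipm"` method strings are an assumption on my part, available in recent scipy versions). Both take very different paths but agree on the optimum:

```python
from scipy.optimize import linprog

# Maximize x + 2y subject to x + y <= 10, y <= 6, x, y >= 0
# (linprog minimizes, so negate the objective).
c = [-1, -2]
A_ub = [[1, 1], [0, 1]]
b_ub = [10, 6]

ds = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs-ds")    # simplex-type
ipm = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs-ipm")  # interior point
print(ds.fun, ipm.fun)  # same optimal value from both methods
```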

> If we squint a bit do we get Dijkstra's algorithm?

No. Dijkstra's algorithm operates on fundamentally discrete structures (finite graphs), whereas the simplex method solves linear programming problems with continuous variables (conceptually ranging over the reals, although of course on a computer one uses floating-points or some other approximate representation).

Integer linear programming, where all or some of the variables range over the integers, is also possible, but it requires extending some base LP-solving algorithm with a means to “discretize” the feasible region, such as branch and bound methods: https://en.wikipedia.org/wiki/Branch_and_bound .
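A toy branch-and-bound on top of an LP solver shows the idea: repeatedly solve the LP relaxation, and when a variable comes back fractional, split the problem into floor/ceil sub-ranges. This is only a sketch of the general technique (variable and function names are mine, and real MILP solvers do far more pruning):

```python
import math
from scipy.optimize import linprog

def branch_and_bound(c, A_ub, b_ub, bounds):
    """Minimize c @ x s.t. A_ub @ x <= b_ub, bounds, x integer.

    Toy branch and bound: solve the LP relaxation, then branch on a
    fractional variable by tightening its bounds to floor/ceil."""
    best_val, best_x = math.inf, None
    stack = [list(bounds)]
    while stack:
        bnds = stack.pop()
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bnds)
        if not res.success or res.fun >= best_val:
            continue  # infeasible subproblem, or pruned by the LP bound
        frac = [(i, v) for i, v in enumerate(res.x)
                if abs(v - round(v)) > 1e-6]
        if not frac:  # relaxation happened to be integral: new incumbent
            best_val, best_x = res.fun, [round(v) for v in res.x]
            continue
        i, v = frac[0]
        lo, hi = bnds[i]
        if lo <= math.floor(v):   # branch x_i <= floor(v)
            left = list(bnds); left[i] = (lo, math.floor(v))
            stack.append(left)
        if math.ceil(v) <= hi:    # branch x_i >= ceil(v)
            right = list(bnds); right[i] = (math.ceil(v), hi)
            stack.append(right)
    return best_val, best_x

# maximize x + y s.t. 2x + 2y <= 7, 0 <= x, y <= 10, x, y integer:
# the relaxation gives x + y = 3.5, the integer optimum is 3.
val, x = branch_and_bound([-1, -1], [[2, 2]], [7], [(0, 10), (0, 10)])
print(val, x)
```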


>> If we squint a bit do we get Dijkstra's algorithm?

> No. Dijkstra's algorithm operates on fundamentally discrete structures (finite graphs), whereas the simplex method solves linear programming problems with continuous variables (conceptually ranging over the reals, although of course on a computer one uses floating-points or some other approximate representation).

I think you've misunderstood the grandparent. Try to think of the vertices of the simplex as vertices in a graph, and edges on the simplex as edges on the graph. The (primal) simplex method chooses an entering variable by saying, "which edge from this vertex improves the objective most quickly?"

The reason the graph search doesn't turn into Dijkstra's algorithm is that we would never have any reason to backtrack. The next basis is always at least as good as the last, so we don't "get stuck" and we don't ask whether we would rather have gone another way earlier on.

EDIT:

> To see why interior point methods are more efficient...

I know they exist, but I've never seen a problem for which interior point methods are actually faster in practice. My experience isn't too deep with very large LPs, mind -- mine tend to have tens or hundreds of thousands of variables, not millions.


> The reason the graph search doesn't turn into Dijkstra's algorithm is that we would never have any reason to backtrack. The next basis is always at least as good as the last, so we don't "get stuck" and we don't ask whether we would rather have gone another way earlier on.

I wonder if there is a sharper way of putting this.

Dijkstra's algorithm doesn't backtrack either, it is always merely propagating a "wave-front" of current best vertices (and perhaps how we got there). And the simplex method will also do this when presented with a connected component of vertices all of which have the same objective value.

I feel there are some deeper connections to "dynamic programming", of which Dijkstra's algorithm is a good example. Having trouble putting my finger on it.

One obvious difference is that the Simplex method doesn't know the target vertex, whereas we do know this in Dijkstra. So perhaps the (possibly non-existent) analogy I am searching for links vertices of the Simplex method with paths in Dijkstra.


There is a linear programming formulation of the shortest path problem: see https://en.m.wikipedia.org/wiki/Shortest_path_problem#Linear.... Indeed the vertices of the simplex method would correspond to paths, but I think running the simplex method on the dual formulation is more akin to Dijkstra's algorithm. Actually, I think the simplex method on the dual and Dijkstra's are equivalent, although I did not work through the details carefully.


Aha, yes, this is what I was looking for. Thank you!

Reminds me also of using linear programming to do belief propagation and decode sparse codes.


> best vertices (and perhaps how we got there). And the simplex method will also do this when presented with a connected component of vertices all of which have the same objective value.

It has been a while since I implemented the simplex method, but I don't think this is accurate. There is only ever one basis stored. You don't need a collection of them to fix stalling/cycling.

> Dijkstra's algorithm doesn't backtrack either

I suppose this was a poor choice of language. The algorithm doesn't backtrack, but the whole point of the wavefront is to allow you to be noncommittal about which way you're going -- you explore many paths at once, something that would be very expensive to do on a simplex (where the big cost is the number of edges explored, not the number of edges on the shortest path.)

Edit: I guess you could see the simplex method as A* with the heap key being just the heuristic (no "cost to this point" term.) Because of the structure of the graph, there's no need to keep the heap at all.


Worst case for the simplex algorithm is exponential. Average case is polynomial. It's possible to get out of the pathological exponential case by adding some noise.


> To see why interior point methods are more efficient, suppose your feasible region is a convex n-gon in XY plane, for some very large n.

The convex n-gon is a really bad counterexample since the encoding length for inequalities describing it is Omega(n) - thus the required number of simplex iterations is linear in the input length.

Better consider the Goldfarb cube (Nina Amenta and Günter M. Ziegler: Deformed products and maximal shadows of polytopes, Contemporary Math. 233 (1999), 57-90.).

> No. Dijkstra's algorithm operates on fundamentally discrete structures (finite graphs), whereas the simplex method solves linear programming problems with continuous variables (conceptually ranging over the reals, although of course on a computer one uses floating-points or some other approximate representation).

It is equivalent to optimize over a finite set of points in Z^m or over its convex hull. The convex hull of the integer points in a rational polyhedron is a rational polyhedron again. So we just have to know its facet-defining inequalities. Sometimes these are known - for example for the problem of finding the shortest path with nonnegative edge weights (here the Total Unimodularity (TU) property holds). Any step of the Dijkstra algorithm corresponds to some specific simplex iteration (or dual simplex iteration? - at least one of them). Yes, as an implementation it is faster to implement it on discrete data structures, but fundamentally it is a simplex algorithm in strong disguise.

Source: I currently do my PhD in discrete optimization.


Theoretically.

In practice, simplex is competitive with and often beats, interior point methods.


Depends on the size of your problem. If you have tens of thousands of variables, you definitely don't want to risk dealing with the simplex method's worst-case behavior.


> If you have tens of thousands of variables, you definitely don't want to risk dealing with the simplex method's worst-case behavior.

The risk is extremely low. In fact, the risk is so low that, for decades, it was an open question why the simplex algorithm was so good in practice on a wide variety of workloads despite having poor worst-case complexity (unlike, for example, pure quicksort, which is often problematic in practice).

Spielman solved that problem in 2001 and showed that even very small perturbations from a contrived worst-case input have good performance: http://arxiv.org/pdf/cs/0111050.pdf


It has been decades, but I think that no, you don't need a list of current-best vertices. The solution space is convex, so you move from one solution to a better one (or along an edge where all solutions are identical); the moment you get to a solution N that is worse than solution N-1, you know you have just moved past the maximum, and N-1 is what you were looking for.

Basically you visit the vertices in sequence, moving along the boundary of the solution space. As soon as the result starts decreasing, you are done.


You need a queue of current-best vertices when the objective function is equally good on all of those vertices. Obviously you don't need to search less optimal vertices.


Either you have a neighboring vertex that is strictly better than your current solution, or you have found an optimal solution. I don't think you need to store vertices with equal value, because you never backtrack.


Ok, I think I'm getting it. This line from the OP confused me:

'But there’s an obvious risk of winding up in a “cycle” where you get back to a vertex that you already hit earlier–in which case the program will never terminate.'


You only get a cycle if multiple vertices have the same objective function value. There's a similar problem of degeneracy, where you pivot back and forth between binding constraints on the same vertex and never go anywhere.


And a way to avoid cycles in simplex is known: https://en.wikipedia.org/wiki/Bland%27s_rule


You do not need to store vertices with equal value. Simplex is basically a greedy iterative algorithm - you keep moving forwards.


> If we squint a bit do we get Dijkstra's algorithm?

Turns out that it's actually the other way around! If you squint a bit you can solve the shortest-path problem with linear programming

https://en.wikipedia.org/wiki/Shortest_path_problem#Linear_p...
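A concrete sketch of that formulation (graph and weights invented for illustration): one flow variable per edge, unit supply at the source, unit demand at the sink, flow conservation everywhere else. Because the constraint matrix is totally unimodular, the LP optimum is automatically integral, i.e. an actual path.

```python
from scipy.optimize import linprog

nodes = ["s", "a", "b", "t"]
edges = [("s", "a", 1), ("s", "b", 4), ("a", "b", 1),
         ("a", "t", 5), ("b", "t", 1)]

cost = [w for _, _, w in edges]
A_eq = [[0] * len(edges) for _ in nodes]
for j, (u, v, _) in enumerate(edges):
    A_eq[nodes.index(u)][j] = 1    # edge leaves u
    A_eq[nodes.index(v)][j] = -1   # edge enters v
b_eq = [1, 0, 0, -1]               # ship one unit from s to t

res = linprog(cost, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * len(edges))
print(res.fun)  # length of the shortest s-t path (s -> a -> b -> t)
```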


Dunno from gradient ascent. Here's an example Simplex tableau:

https://www.utdallas.edu/~scniu/OPRE-6201/documents/LP06-Sim...

I'm not sure how Dijkstra's algorithm and Simplex tableaux marry.


Yeah every time I go to learn about the simplex method I end up staring at these monster tables and think "woah this is complicated". But I think maybe that's just not the way to learn about this stuff, and is effectively implementation details.


It's automatable. But the old saw is that anything you need Simplex to show is probably pretty obvious anyway.


Fun fact: recent notable rocket landings seem to use convex optimization to solve the soft landing problem, AFAICT.

See here for more info: http://web.mit.edu/larsb/www/iee_tcst13.pdf

> This convexification enables the planetary soft landing problem to be solved optimally by using convex optimization, and is hence amenable to onboard implementation with a priori bounds on the computational complexity and realtime execution time.


For further reading there is an excellent resource here http://web.mit.edu/15.053/www/ with a leitmotif example and both theoretical and practical approaches. It also explains in detail the theory behind the real-world implementation of the simplex method (aka the revised simplex method), with basis matrix inverse factorization methods, etc. This is a quite old book, but as far as I know even modern linear solvers use those techniques (at least the open-source ones for sure).

This is a very specific and small subset of convex optimization, and yet very powerful (and exciting).

Convex optimization methods can also be used in many non-convex situations, for example when you have quasi-convex or log-convex functions as objective or constraints. I recommend Boyd's excellent course https://lagunita.stanford.edu/courses/Engineering/CVX101/Win... and the online free book https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf (warning : very addictive)


When I hear linear programming I always remember this method: https://en.wikipedia.org/wiki/Ellipsoid_method. It is a nice example of an algorithm that is useful not for practical programming but for mathematicians (it was used to prove that linear programming is in P). I think most programmers would have a really hard time implementing it, and on average it is also slower than the simplex method, despite the fact that simplex is not known to be polynomial while this algorithm is.


It's been a while since I studied mathematics (and I wrote my thesis in optimization), but at the time it was not clear whether simplex is in P or not. I think no one has found a pivot rule for selecting variables that is always polynomial.


See https://en.wikipedia.org/wiki/Simplex_algorithm#Efficiency. But simplex is still widely used because real problems are usually solved in polynomial time. You would have to design your problem deliberately for it not to finish in polynomial time.

But I am not a mathematician; this opinion is just how I understand it.


I vaguely recall that Soviet planners used something based on n-dimensional ellipsoids (instead of Simplex) to optimize resource allocation for Aeroflot flights, but I cannot remember anything more specific, alas.

(I read this in the magazine of my country's equivalent of the ACM in the '80s, and it somehow stuck with me all these years)


Does anyone know of a good introduction to Karmarkar's algorithm? It finds optimal solutions way faster than simplex...


This is supposed to be true only for very large linear programs. The simplex method is the fastest thing in town for most LPs, even though it has exponential worst-case time where Karmarkar's algorithm is polynomial.


How does it work?

For the set of real numbers R and some positive integer n, the problem starts with a set F that is the intersection of finitely many closed half spaces in R^n. Each such half space is from a constraint that is a linear inequality with relation either >= or <=.

Then the set F is a closed, convex subset of R^n. Set F is called the feasible region.

In addition, we are given a linear function z: R^n --> R, the objective function. Then the problem is to find a point x in F that makes z(x) as large as possible. Such a point x in R^n is said to be optimal. Even if an optimal point x exists, it may not be unique. Indeed, if there is more than one optimal solution, then there are infinitely many, and the set of all optimal solutions forms a closed, convex subset of feasible region F.

If set F is empty, then the problem is infeasible and there is no optimal solution. Else set F is non-empty and the problem is feasible. Any point in set F is said to be a feasible point.

If the problem is feasible, then it may be that function z is unbounded above on F. Then the problem is unbounded and there is no optimal solution. If the problem is feasible and not unbounded, then it is feasible and bounded.

It is a theorem, not trivial to prove, that if the problem is feasible and bounded, then it has an optimal solution.

Given points x, y in R^n and real number t in the interval [0,1], the point

w = tx + (1-t)y

is a convex combination of the points x and y. Easily the set of all convex combinations of points x and y is closed and convex.

Suppose C is a convex subset of R^n, x is in C, and x is not a convex combination of two other points in C. Then point x is an extreme point of convex set C.

It is a theorem that if the linear program has an optimal solution, then it has at least one optimal solution that is an extreme point of the feasible region F. So, in looking for optimal solutions, it is sufficient to look only at extreme points of the feasible region F.

It is a theorem that the feasible region F has only finitely many extreme points.

The classic algorithm for finding optimal solutions is the Dantzig simplex algorithm.

Intuitively and geometrically the algorithm starts at an extreme point of the feasible region and moves, one iteration at a time, to adjacent extreme points (and adjusts what it regards as the basic variables) until it satisfies a sufficient condition for optimality. With appropriate refinement of the simplex algorithm, it is always able to achieve this sufficient condition.

Algebraically, the simplex algorithm is a small but clever modification of Gauss elimination for solution of systems of linear equations. Each iteration of the algorithm corresponds to some elementary row operations on the system of linear equations.

There are more technical details, but the above is a good outline and overview of the core math.
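A bare-bones tableau implementation makes the "Gauss elimination" point concrete: every iteration is literally "scale the pivot row, eliminate the pivot column elsewhere". This is a textbook sketch (standard-form maximization with all b_i >= 0, Dantzig's entering rule), not production code:

```python
def simplex(c, A, b):
    """Maximize c @ x subject to A @ x <= b, x >= 0 (all b_i >= 0).

    Textbook tableau form: each pivot is just an elementary
    row-reduction step on the augmented system."""
    m, n = len(A), len(c)
    # Tableau: objective row [-c | 0 | 0] on top, then [A | I | b].
    tab = [[-ci for ci in c] + [0.0] * m + [0.0]]
    for i in range(m):
        slack = [1.0 if j == i else 0.0 for j in range(m)]
        tab.append([float(v) for v in A[i]] + slack + [float(b[i])])
    while True:
        # Entering variable: most negative reduced cost (Dantzig's rule).
        col = min(range(n + m), key=lambda j: tab[0][j])
        if tab[0][col] >= -1e-9:
            return tab[0][-1]  # no improving column: optimal value
        # Leaving variable: minimum ratio test.
        ratios = [(tab[i][-1] / tab[i][col], i)
                  for i in range(1, m + 1) if tab[i][col] > 1e-9]
        if not ratios:
            raise ValueError("problem is unbounded")
        _, row = min(ratios)
        # Pivot: scale the pivot row, eliminate the column elsewhere.
        piv = tab[row][col]
        tab[row] = [v / piv for v in tab[row]]
        for i in range(m + 1):
            if i != row and abs(tab[i][col]) > 1e-12:
                f = tab[i][col]
                tab[i] = [v - f * w for v, w in zip(tab[i], tab[row])]

# maximize 3x + 2y  s.t.  x + y <= 4,  x + 3y <= 6,  x, y >= 0
print(simplex([3, 2], [[1, 1], [1, 3]], [4, 6]))  # -> 12.0
```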

Some of the consequences of linear programming and some of the properties of the simplex algorithm yield a nice collection of inequalities, theorems of the alternative, the saddle point theorem and optimal strategies of two-person game theory, etc. E.g., in the game of paper, scissors, rock, play each of the three moves with probability 1/3, independently of all the past. Then in the long run you will break even.

Is the simplex algorithm a polynomial algorithm? Apparently not: there is a classic example problem of Klee and Minty that shows that at least one version of the algorithm is exponential. In practice, however, the simplex algorithm runs in time linear in the number of half spaces. There are more details in some now classic work of K.-H. Borgwardt.

Is there a polynomial algorithm for linear programming? Yes.

In practice, the simplex algorithm has many refinements and extensions.

If we ask that the components of an optimal solution x be integers, then the problem is integer linear programming and is NP-complete. One solution approach is via branch and bound, where we solve an ordinary linear program at each node of a large tree. But that approach is not as bad as it might sound, since usually the next solution is just a little work beyond what was left over from the last solution -- the simplex algorithm has lots of such modifications.

Another problem is least cost flows on a directed graph (network) with on each arc a maximum flow and a cost per unit of flow. There the simplex algorithm takes on a special form that commonly permits astoundingly fast solutions of astoundingly large problems. Moreover, if the arc capacities are all integers, then, starting with an integer feasible solution, the simplex algorithm maintains integer solutions and will find an optimal integer solution -- we get integer programming for no extra effort.

We can also use linear programming as a tool in solving convex problems, nonlinear problems, make use of Lagrange multipliers, achieve the Kuhn-Tucker necessary conditions, etc.

Some of the promise of linear programming is from how common are linear expressions, especially in business, accounting, budgeting, business planning, etc.

Dantzig worked out his simplex algorithm in the late 1940s at Rand Corporation, working on the logistics problem of the USAF of how best to deploy a military force rapidly to a distant location.

An early commercial success was feed mixing for livestock: Given a list of feeds and the nutritional content and prices of each and given what total nutritional content want for the livestock, find how much of each feed to buy to feed the livestock at least cost. There are rumors that Ralston Purina runs linear programs daily when mixing feeds.

Another problem of early high interest was how to ship from some factories to some warehouses to get each warehouse what it needed, take from each factory only what it had, and minimize total shipping cost. Well, now we know that this is a special case of the least cost flow problem that we can do so well solving. An early statement of this problem, the transshipment problem, resulted in a Nobel prize in economics.

Another early success was operating an oil refinery: So, given crude oil supplies, which differ in chemistry and price, and refinery products, which differ in price, find what crude oil supplies to use and what products to make to maximize profit. Apparently eventually this problem was refined to make use of nonlinear programming, e.g., by C. Floudas.

One use of linear programming is for scheduling. E.g., consider the original FedEx: given 90 US cities with loads to be picked up and loads in Memphis to be delivered, and given 33 airplanes, say, all Dassault DA-20 Fanjet Falcons. Now, say which airplanes go to which cities in what order to move all the loads, obey all safety and engineering constraints, arrive within specified time windows at each of the airports, and minimize direct operating costs.

So, it looks like a huge, non-linear, integer problem. But, can attack the problem in two steps where the first is just some enumeration likely doable with some nonlinear arithmetic and the second is just some 0-1 integer linear programming which likely has an okay chance of good results.

So, the first step is just to get a list of what a single airplane might do, that is, what cities it might visit and in what order. So, just write the code: try all the reasonable sets of cities in reasonable order (e.g., don't waste time on evaluating absurd cases like having one plane serve NYC, Seattle, Miami, Chicago, and LA, in that order). Then for each reasonable candidate for what one plane might do, do the nonlinear arithmetic to find the costs. For a given set of cities served, keep only the one order of the cities that minimizes the costs.

The result will be a list of some 100,000 or so cases for what one plane might do, the cost of the case, and the cities served (right, cheat a little -- argue that if a plane stops at a city, then it delivers everything the city gets and picks up everything the city has to ship, etc.).

So, with 90 cities and 100,000 cases, set up a linear program with 100,000 variables and 90 constraints. That is, for each of the 90 cities, want some one of the 100,000 cases to serve that city. So, each of the 100,000 columns has just 0s and 1s, a 0 if that case does not serve that city and a 1 if it does.

This is an old approach and is called 0-1 integer linear programming set covering. Yes, likely it is NP-complete. But there is also a good chance of saving 5, 10, maybe 15% of direct operating costs over other approaches.
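The 0-1 set-covering structure described above can be sketched on a toy instance. (All data here is hypothetical; a real fleet problem has ~100,000 columns and needs an integer-programming solver, but brute force over a handful of candidate routes shows the formulation.)

```python
from itertools import combinations

# Toy set-covering instance: 4 cities, 5 candidate single-plane routes,
# each with a cost and the set of cities it serves (hypothetical data).
cities = ["A", "B", "C", "D"]
routes = [  # (cost, cities served)
    (10, {"A", "B"}),
    (8,  {"B", "C"}),
    (7,  {"C", "D"}),
    (12, {"A", "D"}),
    (18, {"A", "B", "C", "D"}),
]

def best_cover(cities, routes):
    """Brute-force the 0-1 program: choose a subset of routes of minimum
    total cost so that every city is served by at least one chosen route."""
    best = (float("inf"), None)
    for r in range(1, len(routes) + 1):
        for combo in combinations(range(len(routes)), r):
            covered = set().union(*(routes[i][1] for i in combo))
            if covered >= set(cities):  # every city covered?
                cost = sum(routes[i][0] for i in combo)
                if cost < best[0]:
                    best = (cost, combo)
    return best

cost, chosen = best_cover(cities, routes)
# Routes 0 and 2 together cover all four cities for a total cost of 17.
```

Each candidate route corresponds to one 0/1 column of the constraint matrix; the "choose a subset covering every city" condition is exactly the 90 row constraints in the FedEx example.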

Can do a lot of scheduling problems with such approaches.

For linear programming software, one can consider, say, R. Bixby and his work on Gurobi.

At times I have had good success with the old IBM Optimization Subroutine Library (OSL). E.g., once with some Lagrangian relaxation, I got a feasible solution within 0.025% of optimality to a 0-1 integer linear programming problem with 40,000 constraints and 600,000 variables in 905 seconds on a 90 MHz PC!


The OP and the discussion are around the technical implementation, but I am interested in the analysis part of LP.

I wanted to use LP to optimize a budget but my senior colleague said that LP is just too damn hard in the real world. Is this true? Is there a textbook that goes over practical applications of LP?


It could be that the current method (i.e. human solves it by making a decision for each item) could be proved to give an optimal solution. However, if that is not the case, the hardest part will be to get the right model.

There are very competitive open source solvers for LP with 10^5 or even 10^6 variables/constraints (https://projects.coin-or.org/Clp), so the true limit here is formulating the model correctly. There are various optimization modeling languages out there, also in the open-source arena.
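For intuition on what such a solver does, here is a toy budget LP solved by brute-force vertex enumeration, exploiting the fact that an LP's optimum lies at a vertex of the feasible polytope. (All numbers are made up; this is only workable for two variables, and a real model should go to Clp or a similar solver.)

```python
from itertools import combinations

# Toy LP: maximize 3x + 2y subject to
#   x + y <= 100  (total budget), x <= 60, y <= 80, x >= 0, y >= 0.
# Each constraint is stored as (a, b, c) meaning a*x + b*y <= c.
cons = [(1, 1, 100), (1, 0, 60), (0, 1, 80), (-1, 0, 0), (0, -1, 0)]

def solve(cons, obj):
    """Enumerate intersections of constraint boundaries (candidate
    vertices), keep the feasible ones, and return the best objective."""
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries, no unique intersection
        x = (c1 * b2 - c2 * b1) / det  # Cramer's rule
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + 1e-9 for a, b, c in cons):
            val = obj[0] * x + obj[1] * y
            if best is None or val > best[0]:
                best = (val, x, y)
    return best

val, x, y = solve(cons, (3, 2))
# Optimum is 260 at (x, y) = (60, 40): max out the better item, then
# spend the remaining budget on the other.
```

The simplex method reaches the same vertex without enumerating everything, by walking from vertex to improving neighbor.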

As for getting the model right, some trial-and-error approach should work: 1) if the solution violates a constraint you hadn't thought of, just add it to the formulation and re-solve; 2) if the solution is too conservative, then there might be a constraint that's too restrictive (you can check which constraints are active, i.e. restricting, at the optimal solution you found), so just eliminate it or relax it.


I am surprised that your senior colleague says it's "too damn hard." Are they referring to the conversion of your data into the right format, or are they referring to getting an implementation running? The latter is fairly straight-forward, the former depends on how clean and convertible your existing data are.

If your data are currently messy, you need to clean it up anyway. There will be political issues with doing work your senior colleague says is not possible, but if you can solve - or avoid - the political problems, learning how to do this, and then doing it, will be good for your personal development.

Warning: I suspect the political problems will actually be the most difficult part.


The data is squeaky clean, accounting-like data. Getting it into the right format is a few lines of R. What he says is too difficult is putting in place all the right constraints. He says that it is very easy to miss a necessary constraint, and this will obviously blow up your solution. How can I examine the solution to make sure it would be valid? The data consists of tens/hundreds of thousands of line items which compete for the same budget and need to meet some (fairly simple) metrics. Line items have hidden dependencies between them, so presumably the solver can scrap some items which will in turn make others (possibly the ones with the best metrics) tank too.


The concern about missing necessary constraints seems reasonable, but then how does the current budgeting process avoid the same concern? Perhaps it does so by changing in small steps, so that each period's budget is similar enough to the previous period's budget (which was known to have worked reasonably well) that the differences can be analyzed and the consequences understood.

So something you might try, is to locate the current budget in your linear system and then find a nearby point that improves the objective function. It should then be simple to explain the proposed improvement ("move $A from B to C and $D from E to F and the objective function goes up by $G") and not too complex to examine it for missing constraints ("hang on, you can't reduce B by $A because the supplier has a minimum order").
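That "propose a nearby improvement" idea can be sketched as a simple local search: start from the current budget, try moving a small amount between pairs of items, and propose the single best feasible transfer. (The rates and caps below are hypothetical; a real version would score moves against the actual LP.)

```python
# Current allocation, per-dollar return rates, and per-item caps
# (all hypothetical illustration data).
budget = {"B": 40.0, "C": 10.0, "E": 30.0, "F": 20.0}
rate   = {"B": 1.0,  "C": 1.5,  "E": 0.8,  "F": 1.2}
cap    = {"B": 60.0, "C": 50.0, "E": 60.0, "F": 50.0}

def objective(b):
    return sum(rate[k] * v for k, v in b.items())

def propose(b, step=5.0):
    """Return (src, dst) for the best feasible transfer of `step` dollars
    that improves the objective, or None if no transfer helps."""
    best, move = objective(b), None
    for src in b:
        for dst in b:
            if src == dst or b[src] < step or b[dst] + step > cap[dst]:
                continue  # infeasible move
            gain = objective(b) + step * (rate[dst] - rate[src])
            if gain > best:
                best, move = gain, (src, dst)
    return move

move = propose(budget)
# Proposes moving $5 from the worst-returning item E to the best one C.
```

Each accepted or rejected proposal is a chance to surface a missing constraint ("you can't cut E below $X because...") and add it to the model.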

The idea is that instead of using linear optimization to replace the current budgeting process (which is risky), to use it to generate proposals for submission to the current budgeting process, and to capture the constraints that are revealed when proposals are rejected.

(But note also that it's quite likely that a real objective function will be nonlinear due to economies of scale, in which case linear optimization methods won't suffice.)


Things like minimum orders are obvious enough that I will manage to figure them out.

My colleague's concern (and I see his point) is about what happens when the solver says "stop selling coffee machines", but when we stop, consumables drop off a cliff and consumables is where we make the profit.


So there's a missing constraint that M >= S/1000 or similar, where M=machines and S=supplies. And yes, that is an issue that needs to be considered. Over 10s or 100s of thousands of items that becomes difficult, but it's still the case that the system you describe now is not really under control.
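That linking constraint is linear, so it fits the model directly: M >= S/1000 is the row S - 1000*M <= 0. A quick feasibility check on a proposed plan might look like this (ratio and numbers are hypothetical):

```python
def violates_link(machines, supplies, ratio=1000):
    """True if the plan sells supplies without enough machines in the
    field, i.e. the linear constraint supplies - ratio*machines <= 0
    is broken."""
    return supplies - ratio * machines > 0

# The solver's "stop selling machines" plan breaks the constraint:
#   violates_link(machines=0, supplies=5000) -> True
# while a plan with enough machines passes:
#   violates_link(machines=5, supplies=5000) -> False
```

Once the relationship is written down as a row like this, the solver can never propose the "kill the machines, keep the consumables" plan again.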

But perhaps your senior colleague is right, and that's just the way it has to be. However, one person's time spent investigating this properly might be money well spent.


Yeah, but that's the problem. There is no way to know all the relationships. This was an obvious one, there are a lot more subtle ones. In any case, the comments have been very helpful. A gradual implementation is the only sane way to go. Politics are not too big of an issue, if I can make a concise case and what's a better way to do it than a small project.


Documenting these relationships would potentially be a really, really valuable thing to do "going forward". It helps to make explicit things that are currently implicit, and possibly unknown. It then opens up the possibilities of making optimisations like using Linear Programming, but at the very least it reduces risk and helps to share knowledge.

Good luck!


Now I'm less surprised. If you don't know all the constraints, explicitly and up front as part of your data, then it will be difficult.

How is it being done now? How close to optimal is the current version/solution?

The constraints form part of your data, and it sounds like they are not squeaky clean. With all computer optimisation, you need to make your data clean, clear, and complete. This can sometimes be done by simply trying it and seeing what the obvious flaws are in the proposed solutions, sometimes that's infeasible.


People examine data manually in reports currently. If an item turns a profit, leave it, no yield optimization. If it's in the red, stop it. If it's around break-even, schedule it for examination to see if you can turn it green. No idea how close to optimal it is. I'd imagine not at all.


OK, I'm looking in from the outside, and I know all too well how these sorts of situations arise. However, it sounds to me a lot like the managers have largely lost control of the situation, and they are limping along doing the best they can with something that has evolved from a simple system into a monster.

That happens. The challenge is how to make it better.

Thinking of it like software, there are no unit tests, and touching anything runs the risk of breaking everything. The usual way to proceed is to find a small section that you can write tests for, and then gradually transform into something you understand. The equivalent here might be to find a small collection of items that are largely independent of the rest of the product line, and see if you can bring that under control.

However, I don't know your context, and I do appreciate that wholesale changes might well be risky. The question is, can you even make a start on a small section? If not, all is lost and the system will limp along as is. But if progress is desirable, maybe there is a way to start.

Given that your senior colleague has actually spoken with you about this, maybe you can open that conversation, proposing it as a gradual way to make the system more efficient, profitable, and to reduce the existing risks.

Again, on the political side, make sure they don't feel threatened, bring them on board as a partner, make it clear that you want to learn from them, and couch it all in terms of reducing risk to the company in return for a small outlay in time and effort. See it as an opportunity for you to learn, them to gain credit, and the company to become stronger.

These are my thoughts - you must adapt them to your context.


Well, you will have to actually write code to create the constraints starting from the line items. This may optionally require a DSL to represent the line items and their relationships.

One question: assuming you manage to write an LP-backed solution, this will replace what? Manual inspection of the budget? Nothing at all? Some other kind of cross-reporting?


A ton of reporting and manual inspection. I imagine it would increase the ROI significantly too. Currently if it's green, it's good. I think there are spots which would be better for us if more budget is allocated there.

DSLs are WAY over my head though.


What if you started by "zooming out" so that the variables (and constraints) are limited and manageable in a semi-manual way (i.e. you can put together a high-level model with http://scpsolver.org/) and try to understand if there is anything that stands out and then try to decompose that?

(Note: this is not how LP is supposed to work - you do not "optimize" the whole by iteratively optimizing subsets - but it could give you a better understanding of the process, and maybe also some low-hanging fruit to motivate you and your colleagues to go forward).


I remember watching an Open University maths program on the practical application of linear programming when I was in high school - it was for blending food ingredients to meet the required nutritional requirements.

That was probably the late 70s!


I am not sure what your colleague means by "too damn hard" - I used it once in production code to optimize the distribution of lots on an assembly line, for example. That was a COBOL program dynamically preparing the equations and then calling a C routine (copied straight out of "Numerical Recipes in C"), and that was all.

Of course you have to get a bit of practice in how to represent your problem in terms of equations and an objective function, and yes, there are cases where it can become quite complicated and/or intractable due to variable explosion... but it is not necessarily so.

(I cannot suggest a book, sorry, but if you want to contact me in email we can go over your problem in general terms and see if I can provide some general advice, maybe?)


This is pretty good, although not business-oriented necessarily.

http://www.springer.com/us/book/9783540306979

