
Would be interested to see this compared to IPOPT.

https://coin-or.github.io/Ipopt/

IPOPT isn't tailored to NLEs specifically, but it does solve NLPs well. It also uses some amazing Fortran linear algebra routines from Harwell (MA57).




Definitely an interesting response I didn't see coming. This is one of the reasons to share preprints!

IPOPT is a nonlinear optimization tool, not a nonlinear system solver. Generally, from what I've seen, it's not a good idea to solve nonlinear systems with a nonlinear optimizer, but you can phrase the system as a "constraint satisfaction problem", i.e. a nonlinear optimization with a trivial `maximize 0` objective and equality constraints `f(u) = 0` to be satisfied, and let it solve that. IIRC the sparse symmetric linear solver is only used in the optimization process, since that's the part where IPOPT can guarantee a sparse symmetric linear system; the constraints `f` themselves would have to be symmetric to reuse it in the Newton method, which isn't true for any of the benchmarks. As such, I don't think you'd see most of IPOPT's advantages even show up in a constraint satisfaction problem at all. Don't get me wrong, IPOPT is an amazing piece of software for nonlinear optimization, but when it's not doing optimization it loses a lot of what makes it amazing.
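For concreteness, a minimal sketch of that "constraint satisfaction" phrasing using JuMP with Ipopt (the toy residuals and dimensions are made up purely for illustration, not one of the paper's benchmarks):

```julia
# Sketch: pose f(u) = 0 as a feasibility problem for Ipopt via JuMP.
using JuMP, Ipopt

model = Model(Ipopt.Optimizer)
@variable(model, u[1:2], start = 1.0)

# trivial objective ("maximize 0"); all the work is in the equality constraints
@objective(model, Max, 0)
@constraint(model, u[1]^2 + u[2]^2 == 1)   # f_1(u) = 0
@constraint(model, u[1] * u[2] == 0.25)    # f_2(u) = 0

optimize!(model)
value.(u)
```

Ipopt then treats this purely as feasibility: the objective contributes nothing, and all the information comes from the constraint Jacobian and the Lagrangian Hessian inside its barrier/KKT machinery.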

However, perfectly agreed that I cannot lay this to rest right now because I don't have a benchmark to point to. The best thing to do here would be to add it to the benchmarks and fully describe why in the paper. We'll definitely follow up with this in a revision.

In the meantime, if you have more requests for the benchmarks, please feel free to open issues at https://github.com/SciML/SciMLBenchmarks.jl so we can track them. All of our benchmarks run on this open platform and anyone can add things via a PR. In particular, the 23-benchmarks diagram in the paper is simply the result of this script https://github.com/SciML/SciMLBenchmarks.jl/blob/master/benc... which runs automatically on any PR (for new folks it requires that we click yes for security reasons, though). So please feel free to send any benchmark requests. We do plan to do much more comprehensive nonlinear optimization benchmarks in the near future. For this case with IPOPT, though, we're happy to add that in ourselves, hopefully in the next few weeks.


Yes, tribal knowledge says NLP solvers aren't tailored to NLEs the way NLE solvers are. There have been papers in the past showing this, but in my experience it's not obvious in practice. Nonlinear systems can often surprise us. The initial point (and randomness) has more influence on the solution time than almost anything else.

I forget the exact internal formulation of the problem in IPOPT, but for an NLE it's likely the efficiency of the linear algebra (MA57) will dominate, since it is, as you say, constraint satisfaction.


> The initial point (and randomness) has more influence on the solution time than almost anything else.

On that remark, a theoretical tool for choosing the initial point is the Newton Polygon. For univariate polynomials, this leads to some of the fastest solvers [1]. The idea of how to choose the initial point in this case is notably described in Section 6 of this paper [2].

A very rough intuition for the idea is that a polynomial p(z) doesn't vanish if one of its monomials is significantly larger than all the others. Conversely, the polynomial is more likely to vanish at values of z where two monomials have the same order of magnitude, so such a value is a good choice of initial point. Finally, deciding whether two monomials have the same order of magnitude can be done by taking their logs, and then it becomes geometry with polygons, hence the name Newton Polygon.
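To make that concrete, here is a rough sketch (my own illustration, not taken from [1] or [2]) of how "two monomials of the same order of magnitude" turns into candidate starting radii via the upper convex hull of the points (k, log|a_k|):

```julia
# Candidate initial root moduli for p(z) = sum(a[k+1] * z^k) from the Newton
# polygon. Two monomials i < j balance where |a_i| r^i = |a_j| r^j, i.e.
# r = (|a_i| / |a_j|)^(1 / (j - i)); the relevant pairs are the edges of the
# upper convex hull of the points (k, log|a_k|).
function newton_polygon_radii(a)
    pts = [(k - 1, log(abs(a[k]))) for k in eachindex(a) if a[k] != 0]
    hull = eltype(pts)[]
    for p in pts                      # points already sorted by exponent
        while length(hull) >= 2
            (x1, y1), (x2, y2) = hull[end-1], hull[end]
            # drop the middle point if it lies on or below the chord to p
            if (y2 - y1) * (p[1] - x1) <= (p[2] - y1) * (x2 - x1)
                pop!(hull)
            else
                break
            end
        end
        push!(hull, p)
    end
    # each hull edge gives one candidate modulus
    [(abs(a[i + 1]) / abs(a[j + 1]))^(1 / (j - i))
     for ((i, yi), (j, yj)) in zip(hull[1:end-1], hull[2:end])]
end

newton_polygon_radii([1.0, 0.0, 0.0, 1e6, 1.0])  # 1 + 1e6*z^3 + z^4  →  ≈ [1e-2, 1e6]
```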

[1]: https://numpi.dm.unipi.it/scientific-computing-libraries/mps...

[2]: https://woelen.homescience.net/science/math/exps/polynomials...


Isn't MA57 just a standard good sparse linear solver? For sparse problems, NonlinearSolve.jl (via LinearSolve.jl) will either use KLU or Cholmod, both of which are also quite fast. Also, even if the linear solver is the bottleneck, that doesn't mean different algorithms won't make a difference. Different nonlinear solve algorithms can lead to needing fewer iterations (e.g. fewer linear solves).
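As a rough illustration (a toy 2x2 system I made up, using the documented NonlinearSolve.jl/LinearSolve.jl keyword for picking the linear solver):

```julia
# Sketch: choose the linear solver used inside Newton's method.
# A sparse jac_prototype lets a sparse factorization like KLU be used.
using NonlinearSolve, LinearSolve, SparseArrays

f(u, p) = [u[1]^2 + u[2]^2 - p[1], u[1] - u[2]]
nf = NonlinearFunction(f; jac_prototype = sparse(ones(2, 2)))
prob = NonlinearProblem(nf, [1.0, 0.5], [2.0])

sol_klu     = solve(prob, NewtonRaphson(linsolve = KLUFactorization()))
sol_default = solve(prob, NewtonRaphson())   # let the default heuristic pick
```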


I don’t know. I’m curious is all.

MA57 seems generic, but I've tried different linear solvers with IPOPT, even ones that were supposed to be better, and MA57 still performed the best.

Nonlinear numerical problems involve a bit of art. Theory doesn’t always pan out in practice.


MA57 is a sparse symmetric solver. IPOPT uses this because it solves a modified KKT system that includes both the Hessian and the constraint information simultaneously. This differs from composite step methods, which split the optimization step into a step for feasibility (quasi-normal step) and a step for optimality (tangential step). An algorithm like KNITRO uses a composite step method and as a result uses a different kind of linear system solver.


You can also pose it as a minimization problem, minimizing ||g(u)||, but I assume that does not magically make anything more efficient.


That makes your Newton method require calculating the Hessian of the objective function, which is a second derivative. A normal Newton method for nonlinear systems uses the Jacobian, which is a first derivative (the connection is that the Hessian is the Jacobian of the gradient). But indeed, formulating it so that you introduce nonsmooth points (absolute values) and require second derivatives instead of first derivatives for the same algorithm is not the good kind of magic :-/. You could then try using only gradient-based methods, like Adam, but you can see that is actually using less problem structure. BFGS, for example, is related to Broyden's method and thus uses simple Hessian approximations, so for nonlinear system solving this is effectively less stable than just using Newton.
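Spelling that out for the smooth squared variant ½||g(u)||² (standard calculus, not specific to any particular solver):

```latex
\[
  \text{Rootfinding Newton on } f(u) = 0:\qquad
  J(u_k)\,\Delta u = -f(u_k),\qquad J = \partial f / \partial u .
\]
\[
  \text{Minimizing } \phi(u) = \tfrac12\|g(u)\|_2^2:\qquad
  \nabla\phi = J^\top g,\qquad
  \nabla^2\phi = J^\top J + \sum_i g_i(u)\,\nabla^2 g_i(u).
\]
```

So a full Newton step on the reformulated objective needs the second derivatives of each g_i, which the rootfinding formulation never touches.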

That's not to say that there's no reason to use this trick though. I have seen that some optimization tools can improve robustness in some cases, and this is a way to make use of heuristic methods (genetic algorithms, differential evolution, etc.) for globalizing the process, but it's probably not the first trick you'd want to do in most cases.

Raising the derivative order is one way to understand why this should be less efficient, but another is that it throws away specific structure. In the nonlinear system problem f(u)=0, any u that solves it is a solution. All solutions are the same; if you find one, you're good. In the optimization problem, you want to find "the" u that minimizes the objective. Now, if you are minimizing f(u)=||g(u)|| and you know there is a u that makes g(u) zero, then every u with g(u) = 0 is a solution, and all of them match the global optimum. But solvers don't necessarily know that. Solvers need to use second-order information to know that they have not hit a saddle point or some other stationary behavior. They don't know that "near zero = done", and they can't specialize on this information. This is another way in which you're just giving more sophisticated solvers more work to do, when you already know the answer by problem design.


This is not entirely true. A typical trick for accomplishing this is to simply use the Hessian approximation `g'(x)*g'(x)` (star denotes adjoint), thus cutting off the second-derivative evaluation of g, and then use a modified CG method like Newton-Krylov for line-search methods or Steihaug-Toint for trust-region methods. It's very robust, matrix-free, and doesn't require an excessive amount of code to implement. Further, the method can be preconditioned with a positive definite preconditioner.

And, to be clear, there's no guarantee that a zero residual will be found, only that `g'(x)*g(x) = 0`, but that tends to be the nature of nonlinear equations.
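For reference, that Hessian approximation is just the Gauss-Newton model (my summary of the standard construction, not of any particular implementation):

```latex
\[
  \min_{d}\ \tfrac12\,\| g(x) + g'(x)\,d \|_2^2
  \quad\Longleftrightarrow\quad
  \big(g'(x)^{*} g'(x)\big)\, d = -\,g'(x)^{*}\, g(x),
\]
\[
  \nabla\!\Big(\tfrac12\|g(x)\|_2^2\Big) = g'(x)^{*}\, g(x) = 0
  \quad\text{at any stationary point.}
\]
```

CG-type inner solvers then only need products with g'(x) and its adjoint, which is why the whole thing can stay matrix-free.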


> It's very robust, matrix-free, and doesn't require an excessive amount of code to implement. Further, the method can be preconditioned with a positive definite preconditioner.

It is not robust; in fact it's well known as a numerically unstable method. `g'(x)*g'(x)` is numerically unstable because it squares the condition number of the matrix. If you have a condition number of 1e10, which is true for many examples in scientific problems (like the DFN battery model in the paper), the condition number of J'J is 1e20, which effectively means a linear solve is unable to retrieve any digits of accuracy with 64-bit floating point numbers. So while I agree that you can use symmetric linear solvers as a small performance improvement in some cases, you have to be very careful when you do this as it is a very numerically unstable method. For some details, see https://math.stackexchange.com/a/2874902
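A toy illustration of the squaring (a made-up diagonal matrix, nothing to do with the battery model):

```julia
using LinearAlgebra

J = Matrix(Diagonal([1.0, 1e-10]))
cond(J), cond(J' * J)    # ≈ (1e10, 1e20)
eps(Float64)             # ≈ 2.2e-16, so a solve with J'J retains no accurate digits
```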

Also, if you look at Figure 5 in the paper, you will see that there is a way to choose operator conditioning https://docs.sciml.ai/LinearSolve/stable/basics/OperatorAssu..., and if you tell it to assume the operator is well-conditioned it will actually use this trick. But we do not default to that because of the ill-conditioning.


Yes, care must be taken to stabilize the algorithm. However, I do not believe it to be true that no digits of accuracy can be obtained. The algorithm falls back to the Cauchy (steepest-descent) point on the first iteration and this will give reduction. That said, I'm willing to solve this particular problem and see. Where's your code for the DFN battery model? The page here leads to a dead github link:

https://help.juliahub.com/batteries/stable/api/#JuliaSimBatt...


The full DFN has a lot more going on. Use the benchmark-ified version from the benchmarks:

https://docs.sciml.ai/SciMLBenchmarksOutput/dev/NonlinearPro...

But as Avik states, this same kind of thing is done as part of the trust region methods, so the trust region machinery with some ways of avoiding J'J is effectively doing the same thing. Of course, that cannot target a symmetric linear solver, because J is not generally symmetric and it needs to do the L2 (least-squares) solve, so it needs a QR/GMRES.

But indeed, let's get IPOPT in this interface and benchmarks and show it in the plots.


What's your convergence metric? It looks like x_sol gives a residual on the order of 1e-7. That's fine, but I'd like to measure a number equivalent to the benchmark's.


It varies; that's why it's measured as work-precision. It would be good to similarly produce a full work-precision line, as point estimates do not tell the full story.

Note that we are preparing an extension to easily use the optimizers https://github.com/SciML/NonlinearSolve.jl/pull/395 to add it to the benchmarks.


That is precisely how trust region methods work for nonlinear least squares problems and nonlinear systems. MINPACK, which both SciPy and Matlab use, defaults to this kind of TR scheme (without the matrix-free part). We do have TR comparisons in the paper.

Note, however, that we don't solve the normal-form J'J system, as that is generally bad (unless the user opts in to doing so); we instead use the least-squares formulation, which is more numerically stable.
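To spell out the distinction (standard linear algebra, not a description of the exact internals):

```latex
\[
  \text{normal form:}\qquad (J^\top J)\, d = -J^\top f,
  \qquad \kappa(J^\top J) = \kappa(J)^2,
\]
\[
  \text{least-squares form:}\qquad d = \arg\min_{d}\ \| J d + f \|_2,
\]
```

with the latter solved via a QR factorization of J (or an iterative least-squares method), avoiding ever forming J'J.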


I don't know much about this, but the fact that they did not mention IPOPT or Highs (which I would think are two of the most popular pieces of software used for these problems) made me question this too.


IPOPT, as mentioned in other posts, solves nonlinear optimization problems. Working around it to make it solve nonlinear equations or nonlinear least squares problems typically doesn't go well. We will add benchmarks for the NLLS part (which we don't talk about in the paper), and those will contain IPOPT comparisons.

For some comparison (this is not a benchmark that I or anyone on this paper wrote), see https://juliapackagecomparisons.github.io/comparisons/math/n.... Specialized solvers for NLLS pretty much always beat general optimization solvers.


Those are indeed very popular tools, but for a slightly different kind of problem than the one this package is attempting to solve.


Yes, Highs especially is completely unrelated, because it's for (mixed integer) linear programming problems and quadratic programming problems, so it's not able to solve the types of problems described in this manuscript.

IPOPT is a bit of a stretch, but I do know that there are some people in the mathematical programming space who use IPOPT's constraint satisfaction piece for solving a nonlinear system in a pinch. That's not the main use case of IPOPT, though, which is nonlinear optimization with nonlinear (in)equality constraints. This case has no optimization and is just nonlinear equality constraints, which you can then specialize on quite a bit. But while it is considered common knowledge in numerical circles not to use a nonlinear optimizer to solve nonlinear systems, I don't have a good canonical benchmark to point to on that, so we might as well make this it.


Thank you very much for your answer!


This article is about “solving” differential equations and not convex optimization.


> This article is about “solving” differential equations and not convex optimization.

This article is about solving nonlinear equations (not differential equations, not sure where you got that from). All NLP optimizers can solve nonlinear equations — it’s a special case where the objective is constant.

Ipopt is not a convex solver, so I'm not sure what convex optimization you are referring to. It is a general nonlinear solver, which covers nonconvex problems as well (I worked on nonconvex nonlinear programs for a decade and it was my primary solver).

Also, all nonlinear equation systems are nonconvex. (A convex program requires equality constraints to be linear.)


> all nonlinear equation systems are nonconvex

Maybe you have something more particular in mind when you say "systems", but not all nonlinear functions are non-convex. Least squares, for example, is nonlinear and convex.

Also note that IPOPT, while wonderful, is a local solver. It may not be limited to convex problems, but those are the only ones it's guaranteed to solve to optimality.


I was talking in terms of convex optimization. The criteria for convex optimization are a convex objective, convex inequality constraints, a convex feasible region, and linear (not just convex) equality constraints.

I’m aware that ipopt is a local solver.


> all nonlinear equation systems are nonconvex

> I was talking in terms of convex optimization

I'm not sure what subtlety you're pointing to here. As far as non-convex problems, though, my point is that IPOPT isn't special in this regard. Any convex solver can be a non-convex solver if you don't care about global optimality.


> Any convex solver can be a non-convex solver if you don't care about global optimality.

Aside: Structurally I’m not sure how this would be true.

Convex solvers have very specific forms. For instance, a QP solver requires a very particular form and does not admit an arbitrary non-convex form except for one: the non-PSD Hessian, which is the concave problem.

My point is that an NLE posed inside an optimization problem gives rise to a non-convex optimization problem with no guarantee of a global solution. So convex optimization is not applicable here.


The feasible region {x | f(x) = 0} is nonconvex no matter whether f is convex.


Consider claim:

> The feasible region {x | f(x) = 0} is nonconvex no matter whether f is convex.

For some positive integer n and the set of real numbers R, consider a closed (in the usual topology on R^n), convex set A that is a subset of R^n. Define a function f: R^n --> R so that f(x) = 0 for all x in A, f(x) > 0 for all x not in A, and f is infinitely differentiable at every x in R^n. It is a theorem that such an f exists.

Then { x | f(x) = 0 } = A and is both closed and convex in contradiction to the claim.


I'm assuming you're referring to nonlinear f(x) because this statement is trivially false for linear f(x).

But consider the function f(x) = max(0, g(x) - c) where the following holds:

* g(x) is nonlinear, positive definite in x, and convex.

* c > 0

Then f(x) is nonlinear and convex (it's the pointwise maximum of convex functions), and the set {x : f(x) = 0} is a convex set.
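A concrete instance of that construction (my example, just to make it tangible):

```latex
\[
  g(x) = \|x\|_2^2,\qquad c = 1,\qquad f(x) = \max\!\big(0,\ \|x\|_2^2 - 1\big)
  \;\Longrightarrow\;
  \{\,x \mid f(x) = 0\,\} = \{\,x \mid \|x\|_2 \le 1\,\},
\]
```

i.e. f is nonlinear and convex, and its zero set is the (convex) unit ball.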


It should be pretty easy to add to the benchmarks. At first glance, it looks like they have a lot of the necessary machinery to do well.



