Use a constrained optimization code or transform the function arguments?

If you have a constrained optimization problem

minimize f(x1,x2) with x1 >= 0

you can use an optimization code that handles bound constraints, or you can minimize a function of transformed variables that ensures x1 is non-negative, solving the unconstrained problem

minimize g(u,x2) where g(a,b) = f(exp(a),b)
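A minimal sketch of this transformation approach, assuming a toy quadratic f (my own example, not from the thread) and plain finite-difference gradient descent standing in for a real unconstrained optimizer:

```python
import math

# Hypothetical objective (not from the thread): unconstrained minimum
# at (x1, x2) = (2, -1), which happens to satisfy x1 >= 0.
def f(x1, x2):
    return (x1 - 2.0) ** 2 + (x2 + 1.0) ** 2

# Transformed objective: x1 = exp(u) is positive for every real u,
# so g can be minimized with no constraint at all.
def g(u, x2):
    return f(math.exp(u), x2)

# Plain finite-difference gradient descent (illustration only).
def minimize_g(u, x2, lr=0.1, steps=3000, h=1e-6):
    for _ in range(steps):
        du = (g(u + h, x2) - g(u - h, x2)) / (2 * h)
        dx2 = (g(u, x2 + h) - g(u, x2 - h)) / (2 * h)
        u, x2 = u - lr * du, x2 - lr * dx2
    return u, x2

u, x2 = minimize_g(0.0, 0.0)
x1 = math.exp(u)   # recover the original variable; automatically >= 0
```

Whatever the optimizer does in u, the recovered x1 = exp(u) can never violate the bound.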

or define a function with a penalty

h(x1,x2) = f(x1,x2) if x1 >= 0 or f(0,x2) + x1**2 if x1 < 0
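A sketch of that penalty function in Python, with a hypothetical f of my own choosing: outside the feasible region, f is evaluated on the boundary and a quadratic penalty grows with the violation, so h stays continuous (and its first derivative is continuous at x1 = 0 as well):

```python
# Hypothetical objective, for illustration only.
def f(x1, x2):
    return (x1 - 2.0) ** 2 + (x2 + 1.0) ** 2

# Penalty version: equal to f on the feasible side, boundary value
# plus a quadratic penalty on the infeasible side.
def h(x1, x2):
    if x1 >= 0.0:
        return f(x1, x2)
    return f(0.0, x2) + x1 ** 2
```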

Is it possible to make a general statement which approach tends to be faster?

Once I suggested to the authors of Numerical Recipes that they add codes for bound-constrained optimization (their Fortran library has codes for unconstrained optimization). One of the authors suggested that if a variable is constrained to the region [-1,1], you can use a sine or cosine transform of a function argument.
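For instance, a sketch of the sine version of that idea (the exact mapping is my assumption; the thread only names the general trick): x = sin(u) lies in [-1, 1] for every real u, so a variable bounded to [-1, 1] can be traded for an unconstrained u.

```python
import math

# sin maps the whole real line into [-1, 1], so minimizing
# f(sin(u)) over all real u is an unconstrained problem.
def to_bounded(u):
    return math.sin(u)

samples = [to_bounded(u) for u in (-10.0, -1.0, 0.0, 2.5, 100.0)]
```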


There is a third option in some cases: look to the physics, chemistry, etc. of the problem and recast it in variables and parameters that do not require (as many) constraints. In my experience this is always better than either of the “last resort” methods you mention, which are nevertheless often necessary.

The challenge IMHO is always to find a continuous representation of the constraint. That has historically been the case even before computers came along (think Lagrange multipliers). All optimizers use first-order (sometimes second- or quasi-second-order) local approximations of the merit function and proceed based on gradient/Hessian information, so discontinuities are going to be a problem whatever method you use.

Regarding which analytical trick / change of variables to use, that is extremely problem dependent, but it boils down to the numerical range of the unconstrained function and how large the penalty term is relative to it (it should be much larger, but far from overflowing).


This is a problem that happens very often in economics and finance, where e.g. x1 denotes the amount of labor supplied on the market, which of course has to be non-negative.

If you go for the transformation approach, shouldn’t the transformation function be one-to-one (i.e. invertible)? If so, the sine transformation that you mention in your post is not a good idea.

If your transformation is such that f is minimized at 0, pi, 2*pi, etc., and your initial guess is, say, x = 0.1, I think a local optimization algorithm should converge to the x = 0 solution. Since results will be reported for the untransformed parameters, the existence of multiple solutions for the transformed parameters is not necessarily a problem.


I don’t think there is a general statement about which approach is best. I just wanted to add a couple of comments about this problem. If the solution lies exactly on a constraint boundary, that sometimes causes difficulties for the transformation approach, while if the solution is in the interior things can work quite well with that approach. In the example given, with constraint x1 >= 0, the situation to watch for is when the optimal solution is (x1*, x2*) = (0, x2*) and the gradient at that point is nonzero: with the exp transform, for instance, x1 = 0 corresponds to u -> -infinity, so the unconstrained iterates can drift off without ever converging.

There is another approach when there is an equality constraint of some kind, say h(x1,x2) == 0. In these situations, you can introduce a new variable, a Lagrange multiplier x3, and then look for a stationary point of the Lagrangian in the three variables (x1, x2, x3).
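As a worked example (with a hypothetical quadratic f and linear h of my own choosing): minimize f(x1,x2) = x1**2 + x2**2 subject to h(x1,x2) = x1 + x2 - 1 = 0. Setting the gradient of the Lagrangian L = f + x3*h to zero gives a small linear system that can be solved directly:

```python
# Stationarity of L(x1, x2, x3) = x1**2 + x2**2 + x3*(x1 + x2 - 1):
#   dL/dx1 = 2*x1 + x3   = 0
#   dL/dx2 = 2*x2 + x3   = 0
#   dL/dx3 = x1 + x2 - 1 = 0
# Adding the first two equations: 2*(x1 + x2) + 2*x3 = 0, and the
# third gives x1 + x2 = 1, so x3 = -1 and x1 = x2 = 1/2.
x3 = -1.0
x1 = -x3 / 2.0   # 0.5
x2 = -x3 / 2.0   # 0.5
```

Note that (0.5, 0.5, -1) is a saddle point of L, not a minimum, which is why a root-finder on the stationarity conditions is typically used rather than a plain minimizer.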


Thanks for your answer! I just wanted to illustrate the problem in a general way, since it can be of interest to some users.

The problem

min f(x) subject to lb<=x<=ub

is equivalent to

min f(g(x_u))

where g is a continuous function that maps the real line to the interval [lb,ub]. In other words, g is a transformation that maps the unbounded variable x_u into the interval [lb,ub].
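One common choice for such a g (my assumption here; the thread does not name one) is a scaled logistic function. Note that it maps onto the open interval (lb, ub), so the bounds themselves are reached only asymptotically:

```python
import math

def g(x_u, lb, ub):
    # Scaled logistic: strictly increasing, hence one-to-one, with
    # g(x_u) -> lb as x_u -> -inf and g(x_u) -> ub as x_u -> +inf.
    return lb + (ub - lb) / (1.0 + math.exp(-x_u))

mid = g(0.0, -1.0, 3.0)   # midpoint of the interval [-1, 3]
```

Being strictly monotone, this g avoids the multiple-preimage issue of the sine transform.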

Now, if g is not one-to-one, you might have multiple values of x_u that give you the same x, and hence the same value of the objective function. In this way you might introduce additional minima in the transformed problem, but this doesn’t matter…

Another point: in your OP you propose the exp transformation, which however might overflow, no? So it might not be very stable.
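The overflow concern is easy to demonstrate in double precision, where exp overflows just above exp(709):

```python
import math

# math.exp raises OverflowError once the result exceeds the largest
# representable double (about 1.8e308, i.e. roughly exp(709.78)).
try:
    math.exp(1000.0)
    overflowed = False
except OverflowError:
    overflowed = True
```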

I would always add a Tikhonov regularization term in this case.
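A sketch of what that could look like combined with the exp transformation (both f and the weight lam are hypothetical choices of mine):

```python
import math

# Hypothetical objective, for illustration only.
def f(x1, x2):
    return (x1 - 2.0) ** 2 + (x2 + 1.0) ** 2

def g_reg(u, x2, lam=1e-4):
    # The Tikhonov term lam * u**2 penalizes large |u|, which also
    # keeps exp(u) away from overflow.
    return f(math.exp(u), x2) + lam * u ** 2
```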