Does the first new proposed move on the new temperature has to begin from the last x_opt?
It seems if so, it makes parallelization a little non-trivial here. Because even with many CPU cores, when new Temperature start, they have to start from the last unique x_opt. So, if someone try to use many cores and each core uses a shorter Markiv chain (so less parameter exploring per core, so total computing wall time can be decreased), it does not work well. Because all the cpus’ Markov chain are started from the same ``x_opt```, so they are more or less correlated which is bad because they are not exploring the parameters space enough.
I have used various algorithms for finding “an” optimum. They all have their quirks, but I would say that you can do worse than using BOBYQA by M. Powell and derivatives or indeed the routines in MINPACK. I have experimented with probabilistic methods (simulated annealing, SCE, PSO, differential evolution and then some), but my conclusion was that their claims of achieving the global optimum are only approached if you have sufficient patience.
I have generally used my Genetic Algorithm, developed during my Ph. D. (25 years ago!), coupled to a physical model (simulating light propagation in optoelectronics devices).
I have sometimes used a Simulated Algorithm, but seemingly simpler that what you describe. The advantage of SA is that it can be very simple to program: a loop with a decreasing T used to compute the probability of accepting a solution. If there is only a few parameters in the problem, it can be sufficient.
It depends of course of the complexity of the “landscape” of the “function” you want to optimize, and the number of parameters.
As with stochastic algorithm you can not be certain to obtain the global optimum, parallelization could consist to launch several Simulated Annealing algorithm, each one following its own trajectory in the “landscape”. And you take the best solution at the end (the best local optimum).
The documentation of the NAG library is informative even if you don’t have access to the software, since algorithms are described and references are given. You could look at section Global Optimization of a Function.
Since the word “convex” does not appear in the referenced ACM TOMS article, it is worth pointing out that for a convex problem, all local solutions are global solutions.
Thank you all @Arjen@vmagnin@Beliavsky@themos
I appreciate all of your insights! I learned besides SA, there are also other algorithms such as particle swarm, genetic algorithm, BOBYQA.
There is a book
The article on simulated annealing praises the Nelder-Mead algorithm for being quite robust and fast. That has prompted me to have a look at it again. On Netlib there was one implementation and I intend to modernise it a bit .
I can offer my experiences; when ever possible start with grid searches and plot contours of the objective function, look for pathological “crinkly bottoms” commonly seen fitting to noisy expt data, and if the objective function includes elements from a Monte Carlo simulation, planes consisting of 2 variances can contain very deep, narrow banana-shaped valleys.
I always run a random search routine and leave it running in the background, and leave it running for much longer than looks sensible, I posted a graph in our PhD office of one case where thousands of iterations had produced no obvious improvement, then suddenly there was a dramatic “new” better optimum found. This graph always raises the same question: why did you leave it running so long, did you already know the “new” solution existed and the answer is always the same: no, I did not know, but I was having a professorial moment and had forgotten it was still running.
The next step really depends on how many times you are going to run these programs and how expensive the objective function is, but the basics always apply. Do not believe an optimisation until you have found it multiple times starting from very different parts of the parameter space, always grid search the immediate vicinity, to check a much better solution is not lurking near-by. Be patient, be constructively cynical, and get lucky.
BTW I agree with @Arjen, a combination of simulated annealing and folding polygon have served me, in my domain, very well over the years.
Alan Miller’s site has several optimization codes. For Nelder-Mead there is minim.f90 with driver t_minim.f90, which compiles and runs with gfortran, ifort, and g95. For global optimization I slightly modified and uploaded Global_Optimization_Miller
global.f90 At Arnold Neumaier’s web site, this is recommended as the most successful of the global optimization packages. There is a sample program fit.f90 and the original documentation global.txt for the f77 version. I have included testfunc.f90 which will eventually contain all of Neumaier’s 30 test functions. N.B. Users of local optimization packages usually obtain satisfactory convergence after 10s or sometimes 100s of function evaluations. Global optimization routines usually require many 1000s of function evaluations.
I had a closer look at Alan Miller’s implementation - it has both the Nelder-Mead method and the quadratic surface approximation. Nice. The interface is a bit complicated because you need to specify a lot of parameters but that can be amended by a small wrapper.
I’ve come across this article, which seems to suggest that “DE” and “CS” perform relatively good for their test calculations (but I guess the conclusion may also depend pretty strongly on the problem to be studied)
Perhaps it will be a good idea if we can have a Fortran package which contain various global or local optimization algorithms. Probably I will start to do this, LOL.
For me the only advantage of Julia is it has a differentialequation.jl which contains some latest ode/sde/dde solvers. Fortran also have most of them, it is just that we need more new algorithms implemented in modern Fortran.
If the function (either analytic or numerical) “behaves” (meaning it is differentiable in a given interval,) I normally use an algorithm I found in this paper (and others by the same authors, but this is the first and probably the most detailed one; a freely downladable version can be found here.) They describe the theory behind their algorithm in detail, but if you are in a hurry they also give you a pseudo-code (Pascal-like) which is easy to follow and easy to implement in Fortran - I did that years ago and use since then. The algorithm is able to find all the roots (or extrema) of a function within a user-specified interval - not just some roots/extrema, all of them.
I think the idea behind that algorithm is very clever: if you can, somehow, find the total number of roots/extrema within a given interval, then you can easily find those roots/extrema as well; all you need is just how many of them are located in the interval. Using Kronecker-Picard theory, computation of the total number of roots or extrema is deduced to a definite integral (I use QUADPACK for the integration itself but you can use whatever else; it’s not a “badly-behaving” integral.) Once you have that number, computing the roots/extrema is trivial and fast: divide and conquer.
The neat thing about this approach is that it shouts “I am highly parallelizable” from afar (see, e.g., this paper.) Furthermore, the method works very well with numerical functions (for example, a cubic spline based on computed data.) I used such an algorithm to compute whatever I needed for a specific task: from simple roots, maxima or minima to eigenvalues/eigenfrequencies.
The drawback is what you probably noticed right from the beginning: the function must be continuously differentiable (at least two times) in the interval you are looking for al roots or extrema.