Overview
Simulated annealing is a Monte Carlo approach for minimizing such multivariate functions. The term simulated annealing derives from the roughly analogous physical process of heating and then slowly cooling a substance to obtain a strong crystalline structure. [Kirk83] (See Figure 1 and Figure 2 below)
Perfect = all atoms lined up on crystal lattice sites, no defects.
Perfect = this is the lowest energy “state” for this set of atoms.
Figure 1
How to reach the “low energy state”: “anneal” the material.
get it very hot: gives atoms energy to move around. cool it
very slowly; gently restricts range of motion till everything
freezes into a low energy configuration.
Figure 2
In simulation, a minima of the cost function corresponds to this ground state of the substance. The simulated annealing process lowers the temperature by slow stages until the system “freezes” and no further changes occur. At each temperature the simulation must proceed long enough for the system to reach steady state or thermal equilibrium. Metropolis algorithm outlines the whole process: [Foge94]

After you perturb an atom and compute
,
it tells you whether or not you should keep this new perturbation as new
configuration or discard it. If the energy goes down, it is a “better”
state. You’d better keep it. If the energy goes up, it is a
“worse state”. You may or may not keep it. It depends.
Compute
exp(-
/KT) and generate a
random number r. If r is less than exp(-
/KT),
then keep it. Otherwise, reject it. [Foge94]
Metropolis algorithm only iteratively visits configurations with “reasonably probable” energies at the given fixed temperature. In other words, we may risk neglecting the best configurations. In order to reduce this risk, we need to set the initial temperature high enough to ensure that we won’t trap in a local minimum. To find a minimum energy state, we need to add outer loop that starts with this high temperature, and slowly cools down. Besides, we need to do enough perturbation at each temperature in the sequence of cooling steps to get to thermal equilibrium; to do enough temperatures so that the problem actually freezes into a low energy state, and further cooling does not further lower energy. (See the pseudo code below) [Foge94]

An annealing algorithm needs four basic components: [Foge94]
1. Configurations: a model of what a legal placement (configuration)
is. These represent the possible problem solutions over which we
will search for an answer.
2. Move set: a set of allowable moves that will permit us to
reach all feasible configurations and one that is easy to compute.
These moves are the computations we must perform to move from configuration
to configuration as annealing proceeds.
3. Cost function: to measure how good any given placement configuration
is.
4. Cooling schedule: to anneal the problem from a random solution
to a good, frozen, placement. Specifically, we need a starting hot
temperature (or a heuristic for determining a starting temperature for
the current problem) and rules to determine when the current temperature
should be lowered, by how much the temperature should be lowered, and when
annealing should be terminated.
Balls & Hills and Landscape flattening mental models clearly explain
why simulated annealing works. Consider a simple cost function with
only 1 variable; imagine the cost function surface as a “landscape” with
hills and current configuration we are visiting as the “ball” on the “hill”.
(See figure 3 below)

Figure 4: Landscape Flattening
Idea: temperature T “hides” the obstacles when hot
Cooling restricts us to smaller and smaller “reasonable” areas until
T freezes.
Limitations:
Not all combinational optimization problems can be annealed to given
satisfactory solutions (e.g., the time taken to get a decent answer may
prove to be unreasonable). A configuration space that is mostly smooth,
with gradually flowing hills and valleys is relatively easy to anneal:
once we find a good global downhill track, we can proceed on it climbing
over the occasional hill or bump as we approach a good solution.
In contrast, consider a mostly flat landscape with numerous, densely packed
gopher holes, each of widely varying depth, but some of which lead to the
very best solution. If movements keep falling into these dead-end holes,
and as the temperature cools, it will become impossible to climb out of
them, thus trapping the solution in a bad local minimum. [Kirk83]