20170721_DE

<h1 align = "center">Differential Evolution - A simple and efficient adaptive scheme for global optimization over continuous spaces</h1>

by Rainer Storn and Kenneth Price TR-95-012
March 1995

Users generally demand that a practical optimization technique should fulfill three requirements. First, the method should find the true global minimum, regardless of the initial system parameter values. Second, convergence should be fast. Third, the program should have a minimum of control parameters so that it will be easy to use. In our search for a fast and easy to use "sure fire" technique, we developed a method which is not only astonishingly simple, but also performs extremely well on a wide variety of test problems. It is inherently parallel and hence lends itself to computation via a network of computers or processors. The basic strategy employs the difference of two randomly selected parameter vectors as the source of random variations for a third parameter vector. In the following, we present a more rigorous description of the new optimization method which we call Differential Evolution.


優(yōu)化算法使用者通常需要實際優(yōu)化算法滿足三個要求。第一個,算法應找到實際的全局最小,無論系統的初始參數是什么值。第二,收斂要迅速。第三,程序應該有最小控制參數數目以易于使用。在我們的研究中為了使得使用確定正確的方法更快捷簡便,我們推出了一種算法不僅非常簡單,而且對于各種測試問題表現很好。它內在可以并行因此適用于使用電腦和處理器網絡進行計算。最初的策略使用兩個隨機挑選的參數向量之間的差異作為第三個參數向量。在之后的文章中,我們將對于新的優(yōu)化方法--差分進化進行更嚴格的地描述。


Problem Formulation

Consider a system with the real valued properties
gm; m = 0, 1, 2, ... , P-1 (1)
which constitute the objectives of the system to be optimized. Additionally, there may be real valued constraints
gm; m = P, P+1, ... , P+C-1 (2)
which describe properties of the system that need not be optimized but neither shall be degraded. For example, one may wish to design a mobile phone with the dual objectives of maximizing the transmission power g1 and minimizing the noise g2 of the audio amplifier while simultaneously keeping the battery life g3 above a certain threshold. The properties g1 and g2 represent objectives to be optimized whereas g3 is a constraint. Let all properties of the system be dependent on the real valued parameters
xj; j = 0, 1, 2, ... , D-1. (3)


想象一個系統有P個實值特征,用g0到gP-1表示,他們是系統被優(yōu)化的目標。另外,也有C個實值約束,從gP到gP+C-1,不用被優(yōu)化,但是也不能不滿足。例如,有個人想要設計一個手機,有兩個目標,一個是最大化傳輸功率g1,最小化揚聲器噪音g2同時保持電池壽命g3超過特定閾值。特征g1和g2代表被優(yōu)化的目標函數然而g3是一個約束。使所有特征依賴D個實值參數x0到xD-1。


In the case of the mobile phone the parameters could be resistor and capacitor values. For most technical systems realizability requires
xj ∈ [xjl, xjh] . (4)
Usually, restrictions on the xj will be incorporated into the collection gm, m>=P, of constraints.
Optimization of the system means to vary the D-dimensional parameter vector
x = (x0, x1, ... , xD-1)T (5)
until the properties gm are optimized and the constraints gm, m>=P, are met. An optimization task can always be reformulated as the minimization problem
min fm(x) (6)


在手機的例子中參數可以是電阻和電容值,對于大部分技術體系的實現需要一個xj 在[xjl, xjh] 區(qū)間里的約束。通常,xj會參與下標大于等于P的約束。系統的優(yōu)化問題意味著改變D維參數向量x = (x0, x1, ... , xD-1)T 直到特征被優(yōu)化約束被滿足。一個優(yōu)化問題總能重新建模為最小化問題min fm(x) 。



用fm(x)表示特征 gm的計算函數,并且它的優(yōu)化或者限制維持可以表達為最小化fm(x)。所有的函數fm(x)可以結合為一個單目標目標函數z(x),它的形成或者通過計算加權平均或者最大加權fm(x)值(權重為正)。權重向量wm用于定義不同目標和限制的重要程度也可以用于標準化不同實際單位?,F在最優(yōu)化問題可以被表述成minz(x)。


The min-max formulation (8) and (10) guarantees that all local minima, the Pareto critical points, including the possibly multiple global minima, the Pareto points, can at least theoretically be found [2], [12]. For the objective function (7) and (10) this is true only if the region of realizability of x is convex [1], [2], which in general does not apply in most technical problems.


(8)和(10)的最小化最大化公式保證了所有的局部最小值(Pareto臨界點),包括可能的多個全局最小值(Pareto點),在理論上可以被找到。對于目標函數(7)和(10),只有當函數是凸函數才能確保找到,通常在多數實際問題達不到這樣的條件。


The Method of Differential Evolution

Differential Evolution (DE) is a novel parallel direct search method which utilizes NP parameter vectors xi,G, i = 0, 1, 2, ... , NP-1. (11)
as a population for each generation G. NP doesn't change during the minimization process. The initial population is chosen randomly if nothing is known about the system. As a rule, we will assume a uniform probability distribution for all random decisions unless otherwise stated. In case a preliminary solution is available, the initial population is often generated by adding normally distributed random deviations to the nominal solution xnom,0. The crucial idea behind DE is a new scheme for generating trial parameter vectors. DE generates new parameter vectors by adding the weighted difference vector between two population members to a third member.


差分進化是一種創(chuàng)新的并行直接搜索方法,利用NP個參數向量(并行NP次),xi,G表示第G代的分布。最小化過程中NP不變化。在對系統不了解時,初始分布隨機選擇。作為一個原則,我們通常假定所有隨機選擇都采用均勻分布除非特別說明。一旦賦了初值,初始分布通常通過把正態(tài)分布偏移量加到初值xnom,0上。DE背后的核心思想是一個新的產生試探參數向量的方法。DE通過將兩個點的差加權加到第三個點上產生新的參數向量。


If the resulting vector yields a lower objective function value than a predetermined population member, the newly generated vector replaces the vector with which it was compared. The comparison vector can, but need not be part of the generation process mentioned above. In addition the best parameter vector xbest,G is evaluated for every generation G in order to keep track of the progress that is made during the minimization process.
Extracting distance and direction information from the population to generate random deviations results in an adaptive scheme with excellent convergence properties. We tried several variants of DE, the two most promising of which we subsequently present in greater detail.


如果結果向量比之前的目標函數值低,新產生的向量代替和它比較的點。比較的向量可以是三個產生結果向量的向量之中的,也可以不是。另外,最佳參數向量xbest,G在每一代計算,為了在最小化過程中保留運算軌跡。從分布中提取距離和方向信息以產生隨機偏移產生了有很好收斂性質的適應性機制。我們試了DE的很多個變體,兩個最好的我們隨后詳細說明。


Scheme DE1

Our first variant of DE works as follows: for each vector xi,G , i = 0,1,2,...,NP-1, a trial vector v is generated according to
v = xr1,G + F *(xr2 ,G xr3,G ), (12)
with r1,r2,r3∈[0,NP-1],integer and mutually different,andF>0. (13)
The integers r1, r2 and r3 are chosen randomly from the interval [0, NP-1] and are different from the running index i. F is a real and constant factor which controls the amplification of the differential variation (xr2 ,G xr3,G ).Fig. 1 shows a two dimensional example that illustrates the different vectors which play a part in DE1.


我們第一個DE變種工作流程如下:對于每個向量xi,G,一個試探向量v根據v = xr1,G + F *(xr2 ,G xr3,G )產生,其中r1,r2,r3在[0,NP-1]區(qū)間內并且是不相同的整數,F>0。r1,r2,r3從區(qū)間[0,NP-1]中隨機選擇并且和當前序號i不同。F是一個實值并且常數因子,控制偏差的擴大縮小量。圖1演示了二維參與DE1的不同向量的例子。



I.e. a certain seuence of the vector elements of u are identical to the elements of v, the other elements of u acquire the original values of xi,G . Choosing a subgroup of parameters for mutation is similiar to a process known as crossover in evolution theory. This idea is illustrated in Fig. 2 for D=7, n=2 and L=3. The starting index n in (15) is a randomly chosen integer from the interval [0, D-1]. The integer L is drawn from the interval [0, D-1] with the probability Pr(L=v) = (CR)v . CR∈[0,1] is the crossover probability and constitutes a control variable for the DE1-scheme. The random decisions for both n and L are made anew for each trial vector v.


為了增加參數向量的多樣性,u向量產生自(15)函數。換句話說,一個特定序號的向量元素u和v相同,其他u需要xi,G的原始的值。選擇幾個參數進行變異和進化理論的交叉變異過程類似,這個思想在圖2 中演示,其中D=7, n=2 and L=3。(15)中的開始序號n是從[0, D-1]區(qū)間隨機選擇的,整數L是從區(qū)間[0, D-1]中通過概率 Pr(L=v) = (CR)v得出的???。CR是交叉概率并且是DE1算法中的控制變量。n和L的隨機選擇是為了再次更新每個試探向量v。



In order to decide whether the new vector u shall become a population member of generation G+1, it will be compared to xi,G . If vector u yields a smaller objective function value than xi,G, xi,G+1 is set to u, otherwise the old value xi,G is retained.


為了判斷新向量u能否成為G+1代的成員,它應當和xi,G比較。如果u產生更小的目標函數值,xi,G+1被賦值為u,否則,舊值xi,G保留。


Scheme DE2

Basically, scheme DE2 works the same way as DE1 but generates the vector v according to
v = xi,G + λ*(xbest,G - xi,G) + F (xr2,G - xr3,G), (16)
introducing an additional control variable λ. The idea behind λ is to provide a means to enhance the greediness of the scheme by incorporating the current best vector xbest ,G . This feature can be useful for non-critical objective functions. Fig. 3 illustrates the vector-generation process defined by (16). The construction of u from v and xi,G as well as the decision process are identical to DE1.


根本上來說,DE2方案和DE1方案運作方法類似但是產生v變量根據v = xi,G + λ*(xbest,G - xi,G) + F (xr2,G - xr3,G),引入額外控制變量λ。λ背后的思想是通過納入當前最優(yōu)向量xbest ,G提供一種提高方案貪心性的方法。這種方法對于非臨界目標函數?有用。圖3展示了一個(16)定義的向量產生過程。u從v和xi,G中的構造以及決策過程和DE1相同。


Competing minimization methods

In order to compare the DE method with other global minimizing strategies, we looked for approaches where the source code is readily available, which are known to be powerful and which are capable of coping with nonlinear and non differentiable functions. Two methods in particular piqued our interest. The first was the annealed version of the Nelder&Mead strategy (ANM) [10] which is appealing because of its adaptive scheme for generating random parameter deviations. When the annealing part is switched off, a fast converging direct search method remains which is especially useful for non-critical objective functions. The basic control variables in ANM are T, the starting temperature, TF, the temperature reduction factor and NV, the number of random variations at a given temperature level.


為了比較DE方法和其他全局最小策略,我們尋找源代碼可得的方法,并且這種方法要有影響力的并且可以處理非線性和不可導函數。兩種方法激起了我們的興趣。第一個是Nelder&Mead方法的退火版本(ANM),這個策略因為它產生隨機參數偏移的適應性方案而吸引人。當退火部分關閉,一個快速收斂的直接搜索方法保留,這個方法對非臨界目標函數特別有效。ANM的基本控制變量有初始溫度T,溫度遞減因子TF,給定溫度水平的隨機變量數目。


The second method of interest was Adaptive Simulated Annealing (ASA) [8] which claims to converge very quickly and to outperform genetic algorithms on the De Jong test suite [9]. Although ASA provides more than a dozen control variables, it turned out that just two of them, TEMPERATURE_RATIO_SCALE (TRS) and TEMPERATURE_ANNEAL_SCALE (TAS), had significant impact on the minimization process. We will compare both ANM and ASA to DE1 and DE2. During our research we also wrote an annealed version of the Hooke&Jeeves method [5] and tested two Monte Carlo methods [3] one of which used NP parallel vectors and the differential mutation scheme of DE. Although these approaches all worked, they quickly turned out not to be competitive.


第二個有趣的方法是自適應模擬退火(ASA),這個方法收斂迅速并且在De Jong測試中做的比遺傳算法好。即使ASA需要超過一沓變量,結果是只有兩個變量TRS和TAS對最小化過程有顯著影響。我們會比較ANM、ASA和DE1、DE2。在我們的研究中,我們也寫了Hooke&Jeeves的退火版本并且試了兩個蒙特卡洛方法,一個用了NP并行向量和DE的分變異方案。即使這些方案全部有效,他們都沒有競爭力。



最后編輯于
?著作權歸作者所有,轉載或內容合作請聯系作者
【社區(qū)內容提示】社區(qū)部分內容疑似由AI輔助生成,瀏覽時請結合常識與多方信息審慎甄別。
平臺聲明:文章內容(如有圖片或視頻亦包括在內)由作者上傳并發(fā)布,文章內容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務。

相關閱讀更多精彩內容

友情鏈接更多精彩內容