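When training a deep network, after a certain number of iterations the loss becomes nan, the accuracy quickly drops to 0.1, and training cannot continue. What causes this? One explanation I have seen is "scale-imbalanced initialization". What does that mean, and how can it be fixed?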
There are lots of things I have seen make a model diverge.
Too high a learning rate. You can often tell this is the case if the loss begins to increase and then diverges to infinity.
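A toy illustration of this symptom, assuming plain gradient descent on the one-dimensional loss f(w) = w**2 as a hypothetical stand-in for a real network: each update multiplies w by (1 - 2*lr), so any lr above 1.0 makes the loss grow every step, and once the values overflow the float range, inf - inf produces nan.

```python
import math

def gd_losses(lr, w0=1.0, steps=20):
    """Plain gradient descent on f(w) = w**2; returns the loss after each step."""
    w = w0
    losses = []
    for _ in range(steps):
        grad = 2.0 * w       # derivative of w**2
        w = w - lr * grad    # mathematically, w *= (1 - 2 * lr)
        losses.append(w * w)
    return losses

stable = gd_losses(lr=0.1)                # factor 0.8 per step: loss shrinks
diverging = gd_losses(lr=1.5)             # factor -2 per step: loss grows 4x
exploded = gd_losses(lr=1.5, steps=1100)  # overflows to inf, then inf - inf = nan
print(stable[-1], diverging[-1], math.isnan(exploded[-1]))
```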
I am not too familiar with the DNNClassifier, but I am guessing it uses the categorical cross-entropy cost function. This involves taking the log of the prediction, which diverges as the prediction approaches zero. That is why people usually add a small epsilon value to the prediction to prevent this divergence. I am guessing the DNNClassifier does this or uses the TensorFlow op for it, so this is probably not the issue.
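A minimal NumPy sketch of the epsilon trick, assuming the predictions are already softmax probabilities and the labels are integer class ids; the function names and eps = 1e-7 are illustrative choices, not the DNNClassifier's internals. The naive version returns inf as soon as the true class has probability 0, while clipping keeps the loss finite.

```python
import numpy as np

def naive_cross_entropy(probs, labels):
    """Mean categorical cross-entropy with no protection against log(0)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked)))

def safe_cross_entropy(probs, labels, eps=1e-7):
    """Same loss, but probabilities are clipped into [eps, 1 - eps] first."""
    probs = np.clip(probs, eps, 1.0 - eps)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked)))

probs = np.array([[1.0, 0.0],   # model is certain of class 0...
                  [0.5, 0.5]])
labels = np.array([1, 0])       # ...but the true class of row 0 is 1
with np.errstate(divide="ignore"):  # silence the log(0) warning for the demo
    print(naive_cross_entropy(probs, labels))  # inf
print(safe_cross_entropy(probs, labels))       # finite
```

In TensorFlow the usual alternative is to compute the loss from logits with a fused op such as tf.nn.softmax_cross_entropy_with_logits, which avoids taking the log of a probability that has already underflowed.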
Other numerical stability issues can exist, such as division by zero, where adding an epsilon can also help. A less obvious one is the square root, whose derivative can diverge if it is not properly simplified when working with finite-precision numbers. Again, I doubt this is the issue in the case of the DNNClassifier.
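The square-root case shows up, for example, in the gradient of an L2 norm: d/dx sqrt(sum(x**2)) = x / norm, which is 0/0 = nan at x = 0. A minimal sketch, assuming NumPy and an illustrative eps of 1e-12 placed inside the square root; safe_divide is a hypothetical helper for the plain division case.

```python
import numpy as np

def l2_norm_grad(x, eps=0.0):
    """Gradient of sqrt(sum(x**2) + eps) with respect to x, which is x / norm."""
    norm = np.sqrt(np.sum(x * x) + eps)
    return x / norm

def safe_divide(a, b, eps=1e-12):
    """Division with an epsilon in the denominator (assumes b >= 0, e.g. a norm or count)."""
    return a / (b + eps)

x = np.zeros(3)
with np.errstate(invalid="ignore"):  # silence the 0/0 warning for the demo
    print(l2_norm_grad(x))           # [nan nan nan]
print(l2_norm_grad(x, eps=1e-12))    # [0. 0. 0.]
print(safe_divide(1.0, 0.0))         # large but finite
```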
You may have an issue with the input data. Try calling assert not np.any(np.isnan(x)) on the input data to make sure you are not introducing the nan. Also make sure all of the target values are valid. Finally, make sure the data is properly normalized: you probably want the pixels in the range [-1, 1] rather than [0, 255].
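These checks can be bundled into one helper that runs on each batch before training. This is a sketch under assumptions: images arrive as uint8 arrays in [0, 255], labels are integer class ids, and the name check_and_normalize is hypothetical. Dividing by 127.5 and subtracting 1 maps 0 to -1 and 255 to 1.

```python
import numpy as np

def check_and_normalize(images, labels, num_classes):
    """Validate one batch, then rescale pixels from [0, 255] to [-1, 1]."""
    x = np.asarray(images, dtype=np.float32)
    y = np.asarray(labels)
    assert not np.any(np.isnan(x)), "inputs contain NaN"
    assert np.all(np.isfinite(x)), "inputs contain inf"
    # NaN labels also fail this check, because NaN >= 0 is False.
    assert np.all((y >= 0) & (y < num_classes)), "label outside [0, num_classes)"
    return x / 127.5 - 1.0

batch = np.array([[0, 128, 255]], dtype=np.uint8)
x = check_and_normalize(batch, labels=[3], num_classes=10)
print(x.min(), x.max())  # -1.0 1.0
```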
The labels must be in the domain of the loss function, so if you use a logarithm-based loss function, all labels must be non-negative (as noted by evan pu and in the comments below).
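One way to violate this, assumed here for illustration, is reusing labels encoded as {-1, +1} with binary cross-entropy, which expects y in [0, 1]. With y = -1 the loss becomes log(p) - 2*log(1 - p), which is unbounded below, so the optimizer can push it toward -inf; a negative cross-entropy value is a telltale sign. A minimal sketch with a hypothetical remapping helper:

```python
import numpy as np

def binary_cross_entropy(p, y, eps=1e-7):
    """Mean BCE; only meaningful (and >= 0) when every y is in [0, 1]."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def to_unit_labels(y):
    """Map {-1, +1} labels to {0, 1}, then check they lie in BCE's domain."""
    y = np.asarray(y)
    if np.all(np.isin(y, [-1, 1])):
        y = (y + 1) // 2
    assert np.all((y >= 0) & (y <= 1)), "labels must be in [0, 1] for BCE"
    return y

p = np.array([0.01, 0.99])
bad = np.array([-1, 1])
print(binary_cross_entropy(p, bad))                  # negative: labels are wrong
print(binary_cross_entropy(p, to_unit_labels(bad)))  # small and non-negative
```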
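This means training is not converging. A learning rate that is too large (steps so big that the gradients explode, and so on) is one possible cause; the problem can also be the network itself, i.e. a flawed architecture. My current approach is:
1. Simplify the scenario. Simplify your samples and use typical settings for the learning rate and other hyperparameters; for example, make all 100,000 samples copies of the same image and let the network fit them. If it cannot, the problem is the network; otherwise it lies in the hyperparameters.
2. If the network is the problem, keep increasing the complexity of the samples while adjusting the network (its fitting capacity).
3. Fine-tuning the hyperparameters is, in my view, for when the network's capacity already matches the sample complexity, i.e. the model trains to a reasonable level and you want to optimize further.
4. For that fine-tuning, the suggestions in the answers above are one line of attack; beyond them you have to build experience yourself. Visualizing the weights also helps with fine adjustment, and DIGITS and TensorFlow both ship tools for it.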
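Step 1 of this procedure can be sketched as follows. This is a minimal NumPy illustration under loud assumptions: a softmax-regression model stands in for the real network, the sample is random noise rather than an image, and the function name and hyperparameters are hypothetical. Because every training example is identical, a correctly implemented model and training loop should drive the loss close to 0; if your real network cannot do the same, suspect the network or training code before the hyperparameters.

```python
import numpy as np

def overfit_one_sample(num_copies=1000, dim=20, num_classes=10,
                       lr=0.5, steps=200, seed=0):
    """Train softmax regression on copies of a single sample; return the final loss."""
    rng = np.random.default_rng(seed)
    x = np.tile(rng.normal(size=(1, dim)), (num_copies, 1))  # one sample, copied
    y = np.full(num_copies, 3)                               # one label (class 3, arbitrary)
    rows = np.arange(num_copies)
    W = np.zeros((dim, num_classes))
    b = np.zeros(num_classes)
    loss = float("inf")
    for _ in range(steps):
        logits = x @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        loss = float(-np.mean(np.log(probs[rows, y])))
        dlogits = probs.copy()
        dlogits[rows, y] -= 1.0                      # gradient of softmax + cross-entropy
        dlogits /= num_copies
        W -= lr * (x.T @ dlogits)
        b -= lr * dlogits.sum(axis=0)
    return loss

print(overfit_one_sample())  # should be close to 0
```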
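A nan loss means your loss has diverged. What follows is personal experience without theoretical backing; criticism is welcome. Solutions:
1. Lower the global learning rate. With a large learning rate the parameters can overshoot and never find the minimum; a smaller learning rate lets them move toward it.
2. Change the network width. The parameters of the later layers may be updating abnormally, so try widening those layers.
3. Increase the number of layers.
4. Change the per-layer learning rates. Each layer can have its own learning rate; try lowering it for the later layers.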
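Item 4 amounts to scaling the base learning rate per layer. A minimal sketch of one SGD step, assuming parameters and gradients are stored in dicts keyed by hypothetical layer names; the multiplier plays the same role as Caffe's lr_mult or a per-parameter-group lr in PyTorch's optimizers.

```python
import numpy as np

def sgd_step(params, grads, base_lr, lr_mult):
    """Update params in place: each layer uses base_lr * lr_mult[name] (default 1.0)."""
    for name, value in params.items():
        value -= base_lr * lr_mult.get(name, 1.0) * grads[name]

params = {"conv1": np.ones(2), "fc_out": np.ones(2)}  # hypothetical layer names
grads = {"conv1": np.ones(2), "fc_out": np.ones(2)}
sgd_step(params, grads, base_lr=0.1, lr_mult={"fc_out": 0.1})  # later layer learns 10x slower
print(params)
```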
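1. Normalize the data (subtract the mean and divide by the standard deviation, or add normalization such as BN or L2 norm);
2. Change the parameter initialization method (for CNNs, Xavier or MSRA initialization is the usual choice);
3. Lower the learning rate and reduce the batch size;
4. Add gradient clipping.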
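Items 2 and 4 can be sketched in NumPy as follows. The formulas are the standard Glorot (Xavier) uniform limit sqrt(6 / (fan_in + fan_out)), the He (MSRA) normal standard deviation sqrt(2 / fan_in), and clipping by global norm; the function names and shapes are illustrative, and real frameworks ship their own versions (for example, TensorFlow's tf.clip_by_global_norm). Initializations like these aim to keep the scale of activations roughly constant from layer to layer, which is one reading of the "scale-imbalanced initialization" mentioned in the question.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform init: U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def msra_normal(fan_in, fan_out, rng):
    """He/MSRA normal init for ReLU layers: mean 0, std sqrt(2 / fan_in)."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together so their joint L2 norm is at most max_norm."""
    total = float(np.sqrt(sum(np.sum(g * g) for g in grads)))
    scale = max_norm / max(total, max_norm)  # 1.0 when already within the limit
    return [g * scale for g in grads], total

rng = np.random.default_rng(0)
w1 = xavier_uniform(256, 128, rng)
w2 = msra_normal(256, 128, rng)
clipped, norm = clip_by_global_norm([np.array([3.0, 4.0])], max_norm=1.0)
print(norm, clipped[0])  # 5.0 [0.6 0.8]
```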