Regularization

The Problem of Overfitting

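As shown in the figure above, the first plot fits the dataset with a univariate linear regression model, but the fit is poor; we call this underfitting, or high bias. The second fits the dataset with a quadratic polynomial and the fit is about right, so we call this case "just right". The third fits the dataset with a fourth-order polynomial; it matches the training data very well, but the curve swings up and down and is hard to trust for predictions on new data. We call this overfitting, or high variance.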

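Logistic regression models show the same behavior, as in the figure below:

By the same analysis used for linear regression, the first model underfits, the second is the most appropriate, and the third overfits.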

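Now let us look at the definition of overfitting:

If the dataset has many features and we fit it with a high-order polynomial, the hypothesis may appear to fit every training example very well and still handle new data poorly. In other words, it generalizes poorly (generalization is the ability of a hypothesis to apply to new examples). We call this overfitting.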
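The three cases can be reproduced numerically. The following is a minimal sketch, assuming NumPy and a small made-up dataset (the numbers are hypothetical, not taken from the figures): it fits polynomials of degree 1, 2, and 4 and reports the error on the training points and on a few held-out points.

```python
import numpy as np

# Hypothetical training data: a roughly quadratic trend with small fixed deviations.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([1.1, 2.8, 4.2, 4.9, 5.1])

# Hypothetical held-out points, not used for fitting.
x_new = np.array([0.5, 1.5, 2.5, 3.5])
y_new = np.array([2.0, 3.6, 4.6, 5.0])


def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


results = {}
for degree in (1, 2, 4):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_new, y_new))

for degree, (train_err, new_err) in results.items():
    print(f"degree {degree}: training MSE = {train_err:.4f}, new-data MSE = {new_err:.4f}")
```

The training error can only go down as the degree rises, and with five training points the degree-4 polynomial passes through all of them. A low training error therefore says little on its own; the error on the held-out points is the one that reflects generalization.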

Question:
Consider the medical diagnosis problem of classifying tumors as malignant or benign. If a hypothesis hθ(x) has overfit the training set, it means that:
A. It makes accurate predictions for examples in the training set and generalizes well to make accurate predictions on new, previously unseen examples.
B. It does not make accurate predictions for examples in the training set, but it does generalize well to make accurate predictions on new, previously unseen examples.
C. It makes accurate predictions for examples in the training set, but it does not generalize well to make accurate predictions on new, previously unseen examples.
D. It does not make accurate predictions for examples in the training set and does not generalize well to make accurate predictions on new, previously unseen examples.

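By the definition of overfitting, C is the correct answer.

There are two ways to address overfitting:

  1. Reduce the number of features:
    • Manually select which features to keep
    • Use a model selection algorithm to select features automatically
  2. Regularization: keep all the features, but reduce the values of the parameters θj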
Supplementary Notes
The Problem of Overfitting

Consider the problem of predicting y from x ∈ R. The leftmost figure below shows the result of fitting $y = \theta_0 + \theta_1 x$ to a dataset. We see that the data doesn't really lie on a straight line, and so the fit is not very good.

Underfitting, or high bias, is when the form of our hypothesis function h maps poorly to the trend of the data. It is usually caused by a function that is too simple or uses too few features. At the other extreme, overfitting, or high variance, is caused by a hypothesis function that fits the available data but does not generalize well to predict new data. It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the data.

This terminology is applied to both linear and logistic regression. There are two main options to address the issue of overfitting:

  1. Reduce the number of features:
    • Manually select which features to keep.
    • Use a model selection algorithm (studied later in the course).
  2. Regularization
    • Keep all the features, but reduce the magnitude of parameters θj.
    • Regularization works well when we have a lot of slightly useful features.
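Cost Function

If the hypothesis is $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2^2 + \theta_3 x_3^3 + \theta_4 x_4^4$, it overfits the dataset shown in the figure below.

Now suppose every feature x is important, so we cannot discard any of them. To deal with the overfitting, we use regularization to make the values of the parameters θj smaller.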

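To do this, we modify the cost function J(θ) by adding a large penalty on θ3 and θ4:

$$\min_\theta \; \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2$$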

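When we use gradient descent or another advanced algorithm to find the θ that minimizes J(θ), θ3 and θ4 end up with much less influence on predictions for new data than before. Why?

This is because, with the penalty in place, minimizing J(θ) drives θ3 and θ4 very close to 0. The hypothesis can then be written approximately as $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2^2$.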

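If a dataset has a very large number of features x and every one of them matters, we can avoid overfitting by changing the cost function J(θ) to:

$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$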

Here λ is called the regularization parameter, and the method as a whole is called regularization.

Note: θ0 is not penalized here; the penalty sum starts at j = 1.
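As a minimal sketch of this cost function, assuming NumPy, a design matrix `X` whose first column is the intercept feature x0 = 1, and hypothetical toy data, the penalty can be computed over every parameter except θ0:

```python
import numpy as np


def regularized_cost(theta, X, y, lam):
    """J(theta) = 1/(2m) * [sum of squared errors + lam * sum_{j>=1} theta_j^2]."""
    m = len(y)
    errors = X @ theta - y                  # h_theta(x^(i)) - y^(i) for every example
    penalty = lam * np.sum(theta[1:] ** 2)  # theta[0] is deliberately excluded
    return (np.sum(errors ** 2) + penalty) / (2 * m)


# Hypothetical toy data; the first column of X is the intercept feature x0 = 1.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.0, 1.0])  # fits the toy data exactly

print(regularized_cost(theta, X, y, lam=0.0))  # 0.0
print(regularized_cost(theta, X, y, lam=3.0))  # 0.5
```

With λ = 3 the squared-error part is still zero, so the whole cost of 0.5 comes from the penalty λ·θ1² / (2m) = 3 / 6.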

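The regularization parameter λ has to be chosen with care. If it is too large, θ1, θ2, θ3, and θ4 are all driven very close to 0, and the hypothesis reduces to approximately hθ(x) = θ0.

The result is the horizontal red line in the figure, which underfits the data.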

Supplementary Notes
Cost Function

If we have overfitting from our hypothesis function, we can reduce the weight that some of the terms in our function carry by increasing their cost.

Say we wanted to make the following function more quadratic:

$$\theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$$

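We'll want to eliminate the influence of $\theta_3 x^3$ and $\theta_4 x^4$. Without actually getting rid of these features or changing the form of our hypothesis, we can instead modify our cost function:

$$\min_\theta \; \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2$$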

We've added two extra terms at the end to inflate the cost of θ3 and θ4. Now, in order for the cost function to get close to zero, we will have to reduce the values of θ3 and θ4 to near zero. This will in turn greatly reduce the values of $\theta_3 x^3$ and $\theta_4 x^4$ in our hypothesis function. As a result, we see that the new hypothesis (depicted by the pink curve) looks like a quadratic function but fits the data better due to the extra small terms $\theta_3 x^3$ and $\theta_4 x^4$.

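We could also regularize all of our theta parameters in a single summation as:

$$\min_\theta \; \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$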

The λ, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated.

Using the above cost function with the extra summation, we can smooth the output of our hypothesis function to reduce overfitting. If lambda is chosen to be too large, it may smooth out the function too much and cause underfitting.
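The effect of λ can be seen on a small example. This is a sketch only, assuming NumPy, made-up data, and fourth-order polynomial features; it minimizes the regularized cost in closed form (the normal equation covered later in these notes) for a few values of λ and prints how large θ1 through θ4 remain.

```python
import numpy as np

# Hypothetical data, invented for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.8, 4.2, 4.9, 5.1])

# Polynomial features 1, x, x^2, x^3, x^4 (columns of increasing power).
X = np.vander(x, N=5, increasing=True)

# Penalty matrix: identity with a 0 in the top-left so theta_0 is not penalized.
L = np.eye(5)
L[0, 0] = 0.0


def fit(lam):
    """Minimize the regularized least-squares cost in closed form."""
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)


for lam in (0.0, 1.0, 1e8):
    theta = fit(lam)
    print(f"lambda = {lam:g}: theta_0 = {theta[0]:.3f}, "
          f"size of theta_1..theta_4 = {np.linalg.norm(theta[1:]):.3g}")
```

As λ grows, θ1 through θ4 shrink toward 0 and θ0 moves toward the mean of y, so the fitted curve flattens into a horizontal line. That is the underfitting case described above.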

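Regularized Linear Regression

The regularized cost function J(θ) is:

$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$

We now use the two methods covered earlier, gradient descent and the normal equation, to find the parameters θ that minimize J(θ).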

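Gradient Descent

Because regularization leaves θ0 untouched, the gradient descent update is split into two cases:

$$\begin{aligned}
&\text{Repeat } \{ \\
&\quad \theta_0 := \theta_0 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)} \\
&\quad \theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right] \qquad j \in \{1, 2, \dots, n\} \\
&\}
\end{aligned}$$

For j = 1, 2, 3, ..., the update can be rewritten as:

$$\theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

Here $1 - \alpha \frac{\lambda}{m} < 1$ always holds when α > 0 and λ > 0, so every iteration shrinks θj slightly before applying the usual gradient step.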
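A minimal sketch of this update, assuming NumPy, a design matrix `X` with a leading column of ones, and hypothetical toy data and hyperparameters: each step multiplies θ1, ..., θn by the shrink factor 1 − αλ/m, leaves θ0 unshrunk, and then takes the ordinary gradient step.

```python
import numpy as np


def gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient descent update for linear regression."""
    m = len(y)
    errors = X @ theta - y        # h_theta(x^(i)) - y^(i)
    grad = (X.T @ errors) / m     # unregularized gradient for every j
    shrink = np.full_like(theta, 1.0 - alpha * lam / m)
    shrink[0] = 1.0               # theta_0 is not shrunk
    return theta * shrink - alpha * grad


def fit(X, y, alpha=0.1, lam=1.0, iterations=2000):
    """Run a fixed number of update steps starting from theta = 0."""
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        theta = gradient_step(theta, X, y, alpha, lam)
    return theta


# Hypothetical toy data generated from y = 1 + 2x; first column is x0 = 1.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

print(fit(X, y, lam=0.0))  # approaches [1, 2]
print(fit(X, y, lam=1.0))  # approaches roughly [1.5, 1.667]: the slope is pulled toward 0
```

The learning rate and iteration count here are chosen for this toy data only; other datasets may need feature scaling or a smaller α to converge.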

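Normal Equation

The regularized normal equation is:

$$\theta = \left( X^T X + \lambda L \right)^{-1} X^T y, \qquad L = \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}$$

where L is an (n+1)×(n+1) matrix.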

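When the number of examples m is smaller than the number of features n, $X^T X$ is non-invertible (singular). In Octave, pinv() can still return its pseudo-inverse, whereas inv() cannot produce a valid inverse.

Note: when the number of examples m equals the number of features n, $X^T X$ may also be non-invertible (singular).

With a regularization parameter λ > 0, the matrix $X^T X + \lambda L$ is invertible even when m ≤ n and $X^T X$ itself is singular, so inv() can be used on it.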
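The invertibility claim can be checked on a small case. The sketch below assumes NumPy in place of Octave and uses hypothetical data with m = 2 examples and n = 3 features; `np.linalg.solve` is used rather than forming the inverse explicitly, which is a choice of this example and not something the notes prescribe.

```python
import numpy as np


def regularized_normal_equation(X, y, lam):
    """theta = (X^T X + lam * L)^(-1) X^T y, with L = diag(0, 1, ..., 1)."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0  # do not penalize theta_0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)


# Hypothetical data: m = 2 examples, n = 3 features plus the x0 = 1 column,
# so m < n and X^T X (4 x 4) is singular.
X = np.array([[1.0, 2.0, 0.0, 1.0],
              [1.0, 0.0, 3.0, 1.0]])
y = np.array([1.0, 2.0])

A = X.T @ X
print(np.linalg.matrix_rank(A))  # 2, which is less than n + 1 = 4

theta = regularized_normal_equation(X, y, lam=1.0)
print(theta)
```

Even though L has a 0 in its top-left entry, adding λL with λ > 0 makes the matrix full rank here, because the only direction L leaves unpenalized is θ0 and the column of ones in X already constrains it.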

Supplementary Notes
Regularized Linear Regression

We can apply regularization to both linear regression and logistic regression. We will approach linear regression first.

Gradient Descent

We will modify our gradient descent function to separate out θ0 from the rest of the parameters because we do not want to penalize θ0.

Normal Equation

Now let's approach regularization using the alternate method of the non-iterative normal equation.

To add in regularization, the equation is the same as our original, except that we add another term inside the parentheses:

$$\theta = \left( X^T X + \lambda \cdot L \right)^{-1} X^T y$$

L is a matrix with 0 at the top left and 1's down the diagonal, with 0's everywhere else. It should have dimension (n+1)×(n+1). Intuitively, this is the identity matrix (though we are not including x0), multiplied with a single real number λ.

Recall that if m < n, then $X^T X$ is non-invertible. However, when we add the term $\lambda \cdot L$, then $X^T X + \lambda \cdot L$ becomes invertible.

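Regularized Logistic Regression

The cost function J(θ) of the regularized logistic regression model is:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left( 1 - y^{(i)} \right) \log \left( 1 - h_\theta(x^{(i)}) \right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$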

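Gradient Descent

The update has the same form as for regularized linear regression:

$$\begin{aligned}
\theta_0 &:= \theta_0 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)} \\
\theta_j &:= \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right] \qquad j \in \{1, 2, \dots, n\}
\end{aligned}$$

where $h_\theta(x) = g(\theta^T x)$ and g is the sigmoid function.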

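Advanced Optimization Algorithms

First, create the file costFunction.m and write the function code in it as shown in the figure below:

Then, as described in the earlier post Logistic Regression (II), call fminunc() in Octave; see that post for the exact steps.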
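The costFunction.m listing itself is not reproduced in this text. As a rough Python analogue, and only a sketch assuming NumPy and hypothetical toy data, a function that returns both the regularized cost and its gradient could look like this:

```python
import numpy as np


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def cost_function(theta, X, y, lam):
    """Return (J, gradient) for regularized logistic regression."""
    m = len(y)
    h = sigmoid(X @ theta)
    reg_theta = theta.copy()
    reg_theta[0] = 0.0  # theta_0 is excluded from the penalty
    J = (-(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
         + lam / (2 * m) * np.sum(reg_theta ** 2))
    grad = X.T @ (h - y) / m + (lam / m) * reg_theta
    return J, grad


# Hypothetical toy data; the first column of X is x0 = 1.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = np.zeros(2)
J, grad = cost_function(theta, X, y, lam=1.0)
print(J)     # log(2), about 0.693, because every prediction is 0.5 at theta = 0
print(grad)
```

Returning the cost and the gradient together mirrors the role costFunction.m plays for fminunc(): the optimizer calls the function repeatedly and uses both values to choose its next step.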

Supplementary Notes
Regularized Logistic Regression

We can regularize logistic regression in a similar way that we regularize linear regression. As a result, we can avoid overfitting. The following image shows how the regularized function, displayed by the pink line, is less likely to overfit than the non-regularized function represented by the blue line:

Cost Function

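Recall that our cost function for logistic regression was:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left( 1 - y^{(i)} \right) \log \left( 1 - h_\theta(x^{(i)}) \right) \right]$$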

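We can regularize this equation by adding a term to the end:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left( 1 - y^{(i)} \right) \log \left( 1 - h_\theta(x^{(i)}) \right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$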
