Model Evaluation and Validation

You can find this article and source code at my GitHub

Testing

Two types of problems

Think about a simple case... How well is my model doing with a regression problem?

It seems that the line in the right graph fits the original data points better. But if we add one new data point for testing, the model on the left works better, since it generalizes better.

How do we measure the generalization?

For a regression problem...

For a classification problem...


Notice that both models fit the training set well, but once we introduce the testing set, the model on the left makes fewer mistakes than the model on the right.

Splitting the data into a training set and a testing set is easy with a Python package called "sklearn".

from sklearn.model_selection import train_test_split
# train_test_split returns the two X parts first, then the two y parts
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)  # 25% of all samples go into the test set

A golden rule is...

Never use your testing data for training.
That is, never let your model know anything about your testing data. Your model should not learn anything from the testing data.
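To make the rule concrete, here is a minimal sketch. The data is synthetic (y = 3x + 2 plus noise, my own assumption for illustration) and `random_state=0` is only there to make the split repeatable. The model is fitted on the training part only, and the testing part is touched once, for scoring.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3x + 2 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 1, size=100)

# Hold out 25% of the samples; random_state only makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The model is fitted on the training portion only.
model = LinearRegression()
model.fit(X_train, y_train)

# The testing portion is used once, for scoring (R2 for regressors).
test_score = model.score(X_test, y_test)
print(X_train.shape, X_test.shape, round(test_score, 3))
```

With 100 samples and `test_size=0.25`, 75 samples end up in the training set and 25 in the testing set.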


Evaluation

There is a metric for classification problems called the "confusion matrix".

Fill in the blanks yourself to check whether you understand this metric correctly.

The answers are 6, 1, 2 and 5 for True Positives, False Negatives, False Positives, and True Negatives, respectively.
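If you want to check the counting by code, here is a small sketch in plain Python. The labels are hypothetical: I made them up so that the counts match the answers above, they are not the actual points from the figure (1 means positive, 0 means negative).

```python
# Hypothetical labels chosen so the counts match the answers above
# (1 = positive, 0 = negative); they are not the points from the figure.
y_true = [1] * 7 + [0] * 7
y_pred = [1] * 6 + [0] * 1 + [1] * 2 + [0] * 5

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

print(tp, fn, fp, tn)  # 6 1 2 5
```

"sklearn" has `sklearn.metrics.confusion_matrix(y_true, y_pred)` for the same job. Its rows are the true labels and its columns the predicted labels, in sorted label order, so for 0/1 labels the layout is [[TN, FP], [FN, TP]].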


Accuracy

We have a very basic method to calculate the accuracy...

Again, "sklearn" can do this with just a couple of lines of code:

from sklearn.metrics import accuracy_score
accuracy_score(y_true, y_predict)
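As a worked example, accuracy is the number of correctly classified points divided by the number of all points. The sketch below simply reuses the confusion matrix answers from earlier in this post (6, 1, 2 and 5).

```python
# Counts taken from the confusion matrix answers earlier in this post.
tp, fn, fp, tn = 6, 1, 2, 5

# Accuracy = correctly classified points / all points.
accuracy = (tp + tn) / (tp + fn + fp + tn)
print(round(accuracy, 4))  # 0.7857
```

So 11 of the 14 points are classified correctly, which is about 78.6%.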

Regression metrics

from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import LinearRegression

# X_train, X_test, y_train, y_test come from train_test_split
regressor = LinearRegression()
regressor.fit(X_train, y_train)

guesses = regressor.predict(X_test)
error = mean_absolute_error(y_test, guesses)  # mean of |y_test - guesses|

One problem with the mean absolute error (MAE) is that its formula is not differentiable everywhere (the absolute value has a kink at zero), so it does not work well with some common methods we will use later, such as gradient descent.

An alternative method is the mean squared error (MSE).

from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression

# X_train, X_test, y_train, y_test come from train_test_split
regressor = LinearRegression()
regressor.fit(X_train, y_train)

guesses = regressor.predict(X_test)
error = mean_squared_error(y_test, guesses)  # mean of (y_test - guesses)^2
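To see what the two metrics actually compute, here is a plain Python sketch. The targets and predictions are hypothetical numbers I picked for illustration. MAE averages the absolute errors, while MSE averages the squared errors, so large errors weigh more in MSE.

```python
# Hypothetical targets and predictions, for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

errors = [t - p for t, p in zip(y_true, y_pred)]  # 1, 0, -2, -1

mae = sum(abs(e) for e in errors) / len(errors)  # (1 + 0 + 2 + 1) / 4
mse = sum(e ** 2 for e in errors) / len(errors)  # (1 + 0 + 4 + 1) / 4
print(mae, mse)  # 1.0 1.5
```

Notice that the single error of size 2 contributes half of the MAE total but two thirds of the MSE total.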

Another common metric we use here is the R2 score.

The formula is as below, and the error in the two figures is calculated with the MSE formula.

from sklearn.metrics import r2_score

y_true = [1, 2, 3]
y_pred = [3, 2, 3]

r2_score(y_true, y_pred)
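Here is the same example worked out by hand, as a sketch of the usual definition: R2 = 1 - (MSE of the model) / (MSE of a baseline that always predicts the mean of the targets). The helper `mse` is my own and not part of "sklearn".

```python
y_true = [1, 2, 3]
y_pred = [3, 2, 3]

def mse(targets, predictions):
    """Mean squared error of two equally long lists (hypothetical helper)."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

# Baseline model: always predict the mean of the targets.
mean_y = sum(y_true) / len(y_true)
baseline = [mean_y] * len(y_true)

# Model MSE is 4/3, baseline MSE is 2/3, so R2 = 1 - 2 = -1.
r2 = 1 - mse(y_true, y_pred) / mse(y_true, baseline)
print(r2)  # -1.0
```

A score of 1 means a perfect fit, 0 means the model is no better than predicting the mean, and a negative score, like here, means it is worse than that baseline.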

Types of Errors

Error due to bias (underfitting)

Error due to variance (overfitting)

There is the trade-off...


Model Complexity Graph
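
One rough way to draw such a graph yourself is to fit models of increasing complexity and compare the training error with the testing error. The sketch below is a toy under my own assumptions: synthetic noisy sine data, polynomial degrees 1, 3 and 9, and numpy's `Polynomial.fit` standing in for the model.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 40)

# Simple hold-out split: first 30 points for training, last 10 for testing.
x_train, y_train = x[:30], y[:30]
x_test, y_test = x[30:], y[30:]

def mse(targets, predictions):
    """Mean squared error of two numpy arrays (hypothetical helper)."""
    return float(np.mean((targets - predictions) ** 2))

degrees = [1, 3, 9]  # low, medium and high model complexity
train_errors, test_errors = [], []
for degree in degrees:
    # Least-squares polynomial fit on the training points only.
    poly = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    train_errors.append(mse(y_train, poly(x_train)))
    test_errors.append(mse(y_test, poly(x_test)))

for d, tr, te in zip(degrees, train_errors, test_errors):
    print(f"degree={d}  train MSE={tr:.3f}  test MSE={te:.3f}")
```

The training error never goes up as the degree grows, because a higher-degree polynomial can always reproduce a lower-degree one. The testing error is the one to watch: a model that is too simple underfits, and a model that is too complex tends to overfit. The exact numbers depend on the random data.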


K-Fold Cross Validation

This is a very useful way to recycle our data...

With this algorithm, for example in the graph above, we train our model 4 times, each time with a different split into training and testing data. Then we average the 4 results to get the final estimate of how well the model performs.

"sklearn" is awesome!

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(12).reshape(-1, 1)  # 12 samples

kf = KFold(n_splits=3)  # 3 folds
for train_idx, test_idx in kf.split(X):
    print(train_idx, test_idx)

If we want to reduce possible bias caused by the order of the data, we can also shuffle the samples before they are split into folds.

"sklearn" is awesome AGAIN!

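import numpy as np
from sklearn.model_selection import KFold

X = np.arange(12).reshape(-1, 1)  # 12 samples

kf = KFold(n_splits=3, shuffle=True)  # shuffle the samples before splitting
for train_idx, test_idx in kf.split(X):
    print(train_idx, test_idx)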
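
The averaging step can be sketched with `cross_val_score`, which trains and scores the model once per fold. The data is synthetic (y = 2x + 1 plus noise), and the choice of 4 folds, `LinearRegression` and MSE as the score are my own assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(12, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.5, size=12)

kf = KFold(n_splits=4, shuffle=True, random_state=0)

# One score per fold; each fold trains on 9 points and tests on 3.
scores = cross_val_score(
    LinearRegression(), X, y, cv=kf, scoring="neg_mean_squared_error"
)
fold_mse = -scores  # sklearn negates MSE so that larger is always better
print(fold_mse, fold_mse.mean())
```

Every sample is used for testing exactly once and for training in the other 3 rounds, which is why this is a good way to recycle the data.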

Thanks for reading. If you find any mistake or typo in this blog, please don't hesitate to let me know. You can reach me by email: jyang7[at]ualberta.ca
