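Laplace smoothing:
The simplest example:
China's men's football team vs. South Korea's: the tally over the previous 5 matches is 0:5 (no wins, five losses). Estimating China's chance of winning the sixth match as 0/5 = 0 is clearly unacceptable, so add 1 to both the numerator and the denominator, giving 1/6. (Strictly, add-one smoothing adds one count per possible outcome, so with two outcomes, win or not, the estimate would be 1/7.)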
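A minimal sketch of the general additive form, (count + alpha) / (total + alpha * K). The names `laplace_estimate`, `num_outcomes` (K) and `alpha` are introduced here for illustration and are not from the notes.

```python
from fractions import Fraction

def laplace_estimate(count, total, num_outcomes, alpha=1):
    """Additive (Laplace) smoothing: (count + alpha) / (total + alpha * num_outcomes)."""
    return Fraction(count + alpha, total + alpha * num_outcomes)

# 0 wins in 5 matches; assume two possible outcomes (win / not win).
raw = Fraction(0, 5)                                 # unsmoothed estimate: 0
smoothed = laplace_estimate(0, 5, num_outcomes=2)    # (0 + 1) / (5 + 2) = 1/7
```

The smoothed estimate is never exactly zero, so a single unseen event no longer rules an outcome out.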
Bayesian network:

p(a,b,c) = p(c|a,b)p(b|a)p(a)
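The factorization can be checked numerically. The sketch below assumes three binary variables and made-up conditional probability tables (all numbers are hypothetical); it only shows that multiplying the three factors yields a valid joint distribution.

```python
from itertools import product

# Hypothetical conditional probability tables for binary a, b, c.
p_a = {1: 0.3, 0: 0.7}
p_b_given_a = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}   # p_b_given_a[a][b]
p_c_given_ab = {                                           # p_c_given_ab[(a, b)][c]
    (1, 1): {1: 0.9, 0: 0.1},
    (1, 0): {1: 0.6, 0: 0.4},
    (0, 1): {1: 0.5, 0: 0.5},
    (0, 0): {1: 0.05, 0: 0.95},
}

def joint(a, b, c):
    """p(a, b, c) = p(c | a, b) * p(b | a) * p(a)."""
    return p_c_given_ab[(a, b)][c] * p_b_given_a[a][b] * p_a[a]

# Summing the joint over all 8 assignments should give 1 (up to rounding).
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
```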
Markov chain:
A Bayesian network stretched out into a single line, with the assumption that the probability of the current node depends only on the node immediately before it.
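Under that assumption, the probability of a whole sequence is the initial probability times a product of one-step transition probabilities. A small sketch, with a hypothetical two-state chain whose state names and numbers are invented for illustration:

```python
# Hypothetical two-state chain; all probabilities are made up.
initial = {"sunny": 0.6, "rainy": 0.4}
transition = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def sequence_probability(states):
    """p(s1, ..., sn) = p(s1) * product of p(s_t | s_{t-1})."""
    prob = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= transition[prev][cur]
    return prob

p = sequence_probability(["sunny", "sunny", "rainy"])   # 0.6 * 0.8 * 0.2
```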
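Time series:
Put simply, a time series is the sequence of values observed at successive points in time, and time series analysis predicts future values from historical data. One point worth stressing: time series analysis is not a regression on time; it mainly studies the series' own pattern of change (time series with exogenous variables are not considered here).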
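To illustrate "predicting from the series' own history", here is a sketch using a first-order autoregressive model x_t = phi * x_{t-1}. AR(1) is an example chosen here, not a model the notes prescribe, and `fit_ar1` is a hypothetical helper.

```python
def fit_ar1(series):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} (no intercept)."""
    num = sum(cur * prev for prev, cur in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den

history = [1.0, 2.0, 4.0, 8.0, 16.0]
phi = fit_ar1(history)            # the series doubles each step, so phi = 2.0
forecast = phi * history[-1]      # one-step-ahead prediction: 32.0
```

Note that time never appears as a regressor: the only input is the previous value of the series itself.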
Decision trees
SLIQ:
introduction:
SLIQ stands for Supervised Learning In Quest, where Quest is the Data Mining project at the IBM Almaden Research Center.
SLIQ is a decision tree classifier that can handle both numeric and categorical attributes.
advantages:
SLIQ uses the novel techniques of pre-sorting, breadth first growth, and MDL-based pruning.
pre-sorting:
SLIQ uses a pre-sorting technique in the tree-growth phase to reduce the cost of evaluating numeric attributes.
MDL-based pruning:
SLIQ also uses a new tree-pruning algorithm based on the Minimum Description Length principle [11]. This algorithm is inexpensive, and results in compact and accurate trees.
Best-first decision tree:
difference:
The only difference is that standard decision tree learning expands nodes in depth-first order, while best-first decision tree learning expands the "best" node first.
The "best" node is the one whose best split gives the maximal reduction of impurity.

In this example, considering the fully-expanded best-first decision tree, the benefit of expanding node N2 is greater than the benefit of expanding N3.
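The expansion order can be sketched with a priority queue keyed on impurity reduction. The tree below, its node names beyond N2 and N3, and every gain value are hypothetical; a real learner would compute each gain from the data reaching that node.

```python
import heapq

# Hypothetical impurity reduction of the best split at each node.
gain = {"N1": 0.40, "N2": 0.25, "N3": 0.10,
        "N4": 0.05, "N5": 0.02, "N6": 0.01, "N7": 0.01}
# Children created when a node is split (leaves are simply absent).
children = {"N1": ["N2", "N3"], "N2": ["N4", "N5"], "N3": ["N6", "N7"]}

def best_first_order(root, max_expansions):
    """Return nodes in the order a best-first learner would expand them."""
    frontier = [(-gain[root], root)]          # max-heap via negated gain
    order = []
    while frontier and len(order) < max_expansions:
        _, node = heapq.heappop(frontier)
        order.append(node)
        for child in children.get(node, []):
            heapq.heappush(frontier, (-gain[child], child))
    return order

order = best_first_order("N1", max_expansions=3)   # ['N1', 'N2', 'N3']
```

A depth-first learner would instead go N1, N2, N4. Both orders give the same fully-expanded tree, so the order only matters when the number of expansions is limited.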
two splitting criteria to measure impurity:
Gini gain, information gain
These splitting criteria were introduced to measure impurity of a node.
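A minimal sketch of both criteria for a binary split. The function names and the toy labels are illustrative; the formulas are the usual Gini index and Shannon entropy, with gain defined as parent impurity minus the size-weighted impurity of the children.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini index: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(impurity, parent, left, right):
    """Reduction of impurity achieved by splitting parent into left and right."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted

parent = ["yes"] * 4 + ["no"] * 4
left, right = ["yes"] * 4, ["no"] * 4            # a perfect split
gini_gain = gain(gini, parent, left, right)      # 0.5 - 0 = 0.5
info_gain = gain(entropy, parent, left, right)   # 1 bit - 0 = 1.0
```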
splitting rules:
split with the maximal reduction of impurity
For continuous/numeric variables, pre-sort the feature values and search for the best split point.
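One way to sketch this for a single node: sort once, then scan the midpoints between consecutive distinct values while updating class counts incrementally. This assumes Gini gain as the criterion and is a simplification; it is not SLIQ's attribute-list data structure, and `best_numeric_split` is a hypothetical name.

```python
from collections import Counter

def gini_from_counts(counts, n):
    """Gini index from class counts; an empty side counts as pure."""
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_numeric_split(values, labels):
    """Sort once, then scan candidate thresholds while updating class counts."""
    pairs = sorted(zip(values, labels))               # the pre-sorting step
    n = len(pairs)
    left, right = Counter(), Counter(lbl for _, lbl in pairs)
    parent = gini_from_counts(right, n)
    best_gain, best_threshold = 0.0, None
    for i in range(n - 1):
        value, label = pairs[i]
        left[label] += 1                              # move one instance to the left side
        right[label] -= 1
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue                                  # no threshold between equal values
        n_left, n_right = i + 1, n - (i + 1)
        weighted = (n_left * gini_from_counts(left, n_left)
                    + n_right * gini_from_counts(right, n_right)) / n
        if parent - weighted > best_gain:
            best_gain = parent - weighted
            best_threshold = (value + next_value) / 2
    return best_threshold, best_gain

threshold, split_gain = best_numeric_split([2.0, 7.0, 3.0, 8.0], ["a", "b", "a", "b"])
```

On this toy data the scan picks the midpoint 5.0 between 3.0 and 7.0, which separates the two classes completely.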
The method of dealing with missing values:
An instance with a missing value is sent down the different branches with different weights.
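The notes do not say how the weights are chosen. The sketch below assumes the C4.5-style rule: an instance whose split attribute is missing goes down every branch, weighted by the share of known-valued instances that took that branch. The `distribute` helper and the `age` data are hypothetical.

```python
def distribute(instances, attribute, threshold):
    """Split weighted instances; a missing value goes to both branches fractionally."""
    known = [(x, w) for x, w in instances if x.get(attribute) is not None]
    left = [(x, w) for x, w in known if x[attribute] <= threshold]
    right = [(x, w) for x, w in known if x[attribute] > threshold]
    known_weight = sum(w for _, w in known)
    left_frac = sum(w for _, w in left) / known_weight   # share of known weight going left
    for x, w in instances:
        if x.get(attribute) is None:
            left.append((x, w * left_frac))
            right.append((x, w * (1 - left_frac)))
    return left, right

data = [({"age": 20}, 1.0), ({"age": 30}, 1.0), ({"age": 60}, 1.0), ({"age": None}, 1.0)]
left, right = distribute(data, "age", threshold=40)
left_weight = sum(w for _, w in left)     # 2 known instances + 2/3 of the missing one
right_weight = sum(w for _, w in right)   # 1 known instance + 1/3 of the missing one
```

The total weight is preserved, so impurity measures computed from weighted counts remain comparable across nodes.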
pruning methods: pre-pruning and post-pruning:
Pre-pruning stops splitting when a further split cannot improve predictive performance.
Post-pruning grows the tree first and then prunes off branches which do not improve accuracy.
Gradient boosting (GB)
AnyBoost
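Let C be the loss function; C is a function of F.
F is the current combined classifier (a weighted sum of weak learners); ~F is the set of weak learners.
F' denotes the derivative (functional gradient) of C with respect to F. We look for an f in ~F
that maximizes <-F', f>, where <,> is an inner product; a larger inner product means f is more closely aligned with the negative gradient.
When the inner product is less than or equal to zero, stop iterating.
The inner product, the loss function, and the step size are chosen according to the particular case.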
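The loop can be sketched as follows. This is an illustration under stated assumptions, not the reference algorithm: it assumes the exponential cost C(F) = (1/m) * sum(exp(-y_i * F(x_i))), an inner product taken as the average over the training points, a fixed step size, and decision stumps on 1-D data as the weak learners. The names `anyboost` and `stump` are hypothetical.

```python
from math import exp

def anyboost(xs, ys, weak_learners, rounds=10, step=0.5):
    m = len(xs)
    F = [0.0] * m                      # values of the combined classifier on the sample
    chosen = []
    for _ in range(rounds):
        # Negative functional gradient of the assumed exponential cost at each x_i.
        neg_grad = [y * exp(-y * Fi) / m for y, Fi in zip(ys, F)]

        def inner(f):
            # Assumed inner product: average of the pointwise products on the sample.
            return sum(g * f(x) for g, x in zip(neg_grad, xs)) / m

        best = max(weak_learners, key=inner)
        if inner(best) <= 0:           # no weak learner points downhill: stop
            break
        chosen.append((step, best))
        F = [Fi + step * best(x) for Fi, x in zip(F, xs)]
    return chosen, F

def stump(threshold, sign):
    """Decision stump on a 1-D input; predicts sign above the threshold, -sign below."""
    return lambda x: sign if x > threshold else -sign

xs = [1.0, 2.0, 3.0, 4.0]
ys = [-1, -1, 1, 1]
learners = [stump(t, s) for t in (1.5, 2.5, 3.5) for s in (1, -1)]
ensemble, F = anyboost(xs, ys, learners, rounds=5)
predictions = [1 if v > 0 else -1 for v in F]
```

Swapping in a different cost, inner product, or step-size rule (for example a line search) changes the resulting boosting algorithm without changing the loop.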
A gradient descent view of voting methods
The inner product is defined as:

Formula for the negative gradient:
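For a margin-based cost, one common choice of inner product and the resulting negative gradient can be written as below. This is an assumption, since the notes do not spell the formulas out: m is the number of training examples, (x_i, y_i) are the examples with y_i in {-1, +1}, and c is a differentiable function of the margin y_i F(x_i).

```latex
\[
\langle F, G \rangle = \frac{1}{m} \sum_{i=1}^{m} F(x_i)\, G(x_i),
\qquad
C(F) = \frac{1}{m} \sum_{i=1}^{m} c\bigl(y_i F(x_i)\bigr)
\]
\[
-\nabla C(F)(x) =
\begin{cases}
-\dfrac{1}{m}\, y_i\, c'\bigl(y_i F(x_i)\bigr) & \text{if } x = x_i, \\[6pt]
0 & \text{otherwise.}
\end{cases}
\]
```

With the exponential cost c(z) = e^{-z}, the negative gradient at x_i becomes (1/m) y_i e^{-y_i F(x_i)}, so misclassified points (negative margin) get the largest weight.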


No free lunch theorem (NFL)
Without assumptions about the actual problem, no algorithm performs better than random guessing.
It is hopeless to dream for a learning algorithm which is consistently better than other learning algorithms.
Ensemble Methods
Boost
examples:



Bagging
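Boosting is a sequential ensemble method.
Bagging is a parallel ensemble method: the base learners are generated in parallel, exploiting their independence.
Bagging: Bootstrap AGGregatING
It uses bootstrap sampling of the training data (sampling with replacement).
Most common strategies: voting for classification, averaging for regression.
Bagging has a strong variance-reduction effect and is very effective for unstable learners (a common stable learner: the k-nearest neighbor classifier).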
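A minimal sketch of bagging with majority voting. The base learner here (`train_stump`, a threshold at the midpoint of the two class means on 1-D data, assuming class 1 has the larger values) and the toy data are hypothetical stand-ins for any unstable learner.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) examples with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Toy base learner: threshold at the midpoint of the two class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:                 # degenerate sample with one class only
        label = 1 if pos else 0
        return lambda x: label
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > threshold else 0

def bagging_fit(data, n_models, seed=0):
    """Train each base learner independently on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]

def bagging_predict(models, x):
    """Classification: majority vote over the base learners."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

data = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
models = bagging_fit(data, n_models=25)
pred_low = bagging_predict(models, 1.5)
pred_high = bagging_predict(models, 8.5)
```

For regression, the vote in `bagging_predict` would be replaced by the average of the base learners' outputs. Because each model depends only on its own bootstrap sample, the training loop could run in parallel.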