
single cell clustering

Key Point

Choosing a clustering strategy for scRNA-seq data analysis
Technical, biological and computational challenges of clustering
Biological interpretation of clusters

Before we begin

Development of single-cell experimental technologies

Applications of single-cell omics

Single-cell sequencing analysis workflow

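Flow cytometry is also a single-cell technology. The difference is that flow cytometry identifies cell populations by their surface proteins, whereas scRNA-seq quantifies the expression profile of each individual cell and identifies populations by the expression of their top genes.
Why cluster? Clustering on expression profiles is an unsupervised, data-driven and unbiased approach. It partitions cells into types, which helps greatly when studying cellular heterogeneity, development and evolution.
Many clustering methods implicitly assume that the data contain discrete clusters. For some developmental lineages, however, the trajectory itself may have to be considered, because the clusters are related to each other in time.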

Main text of the paper

Clustering strategies

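Characteristics of the scRNA-seq expression matrix:

High-dimensional (expression values for more than ten thousand genes)
Sparse (many expression values are 0 or close to 0)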
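
To make "sparse" concrete, here is a minimal sketch that computes the fraction of zero entries in a count matrix. The function name `sparsity` and the toy matrix are made up for illustration; real scRNA-seq matrices are far larger and are usually stored in a sparse format.

```python
def sparsity(counts):
    """Return the fraction of zero entries in a cells x genes count matrix."""
    total = sum(len(row) for row in counts)
    zeros = sum(1 for row in counts for value in row if value == 0)
    return zeros / total


# Hypothetical toy matrix: 4 cells (rows) x 6 genes (columns).
toy_counts = [
    [0, 0, 3, 0, 0, 1],
    [0, 2, 0, 0, 0, 0],
    [5, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 4],
]

print(f"fraction of zeros: {sparsity(toy_counts):.2f}")
```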

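Computing distances for clustering:

Using all features (i.e. all genes) easily runs into the 'curse of dimensionality': the differences between pairwise distances tend to become small, so near and far neighbours are hard to tell apart
Feature selection and dimensionality reduction: work in a feature space built from a subset or combination of genes, for example after PCA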
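
The distance-concentration effect can be illustrated with a small simulation. This is only a sketch on uniformly random points, not on expression data; `relative_contrast` is a hypothetical helper that measures how much the farthest point differs from the nearest one, relative to the nearest.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """(max - min) / min of the distances from one query point to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


# The contrast shrinks as the number of dimensions grows.
for dims in (2, 20, 200, 2000):
    print(dims, round(relative_contrast(200, dims), 3))
```

A contrast near zero means the nearest and the farthest neighbour are almost equally far away, which is why distances are usually computed after feature selection or PCA.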

Possible measures are Euclidean distance, cosine similarity, Pearson's correlation and Spearman's correlation. The last three consider relative differences between values, which makes them more robust to differences in library or cell size.
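
The robustness claim can be checked on a toy example. In the sketch below, `cell_b` is `cell_a` with every count multiplied by 3 (the same relative profile at three times the library size); the values are invented for illustration.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    # Pearson's correlation is the cosine similarity of the mean-centred vectors.
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


cell_a = [1.0, 0.0, 4.0, 2.0, 0.0]
cell_b = [3.0 * v for v in cell_a]  # same profile, 3x library size

print("euclidean:", round(euclidean(cell_a, cell_b), 3))
print("cosine:   ", round(cosine_similarity(cell_a, cell_b), 3))
print("pearson:  ", round(pearson(cell_a, cell_b), 3))
```

Euclidean distance treats the two cells as far apart, while cosine similarity and Pearson's correlation both report them as identical in shape.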

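A commonly used clustering method is k-means, whose computational complexity grows linearly with the number of points. However, ① k-means is usually run as a greedy algorithm and easily gets stuck in a local optimum, so it has to be repeated with different initial conditions, or combined into a consensus as SC3 does; ② it is biased towards identifying equal-sized clusters, which can cause rare cell types to be overlooked.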
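
The "repeat with different initial conditions" idea can be sketched as follows. This is a plain Lloyd's k-means with random restarts that keeps the run with the lowest within-cluster sum of squares; it is not SC3's consensus procedure, and the function names and toy data are assumptions for illustration.

```python
import math
import random


def lloyd(points, k, rng, n_iter=50):
    """One run of Lloyd's k-means; returns (labels, inertia)."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    inertia = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia


def kmeans_restarts(points, k, n_init=10, seed=0):
    """Run Lloyd's algorithm n_init times and keep the lowest-inertia run."""
    rng = random.Random(seed)
    runs = [lloyd(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[1])


# Toy data: three 2-D blobs of 30 points each.
data_rng = random.Random(1)
blob_centers = [(0, 0), (5, 5), (0, 5)]
points = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
    for cx, cy in blob_centers
    for _ in range(30)
]

labels, inertia = kmeans_restarts(points, k=3)
print("best inertia over 10 restarts:", round(inertia, 2))
```

Restarts only lower the chance of ending in a poor local optimum; they do nothing about the bias towards equal-sized clusters.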

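Another common method is hierarchical clustering, either top-down (divisive) or bottom-up (agglomerative). It is time and memory consuming, however: both grow quadratically with the number of data points.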
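
A back-of-the-envelope calculation shows what quadratic growth means in practice. The sketch assumes a naive implementation that stores all n(n-1)/2 pairwise distances as 8-byte floats; both the function name and that storage assumption are illustrative.

```python
def condensed_distance_bytes(n_cells, bytes_per_value=8):
    """Memory needed to store all n*(n-1)/2 pairwise distances."""
    n_pairs = n_cells * (n_cells - 1) // 2
    return n_pairs * bytes_per_value


for n in (1_000, 10_000, 100_000, 1_000_000):
    gib = condensed_distance_bytes(n) / 2**30
    print(f"{n:>9} cells -> {gib:,.2f} GiB")
```

Ten times more cells need roughly a hundred times more memory, so the approach becomes impractical well before a million cells.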

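A third common family is community-detection-based algorithms, i.e. graph-based methods. They first build a k-nearest-neighbours graph, and the choice of k strongly affects the size and number of the resulting clusters. Most graph-based clustering methods return only a single solution, and they do not require the number of clusters to be specified.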
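
The influence of k can be seen even without a real community-detection algorithm. The sketch below builds a k-nearest-neighbours graph and counts connected components as a crude stand-in for community detection (tools such as Seurat or scanpy use modularity-based methods like Louvain or Leiden instead). All names and the toy data are assumptions.

```python
import math
import random


def knn_graph(points, k):
    """Undirected k-nearest-neighbours graph as an adjacency dict of sets."""
    graph = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def count_components(graph):
    """Connected components: a crude stand-in for community detection."""
    seen, n_components = set(), 0
    for start in graph:
        if start in seen:
            continue
        n_components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph[node] - seen)
    return n_components


# Toy data: two well-separated 2-D blobs of 10 points each.
rng = random.Random(0)
points = [
    (center + rng.gauss(0, 0.5), center + rng.gauss(0, 0.5))
    for center in (0, 10)
    for _ in range(10)
]

for k in (3, 15):
    print(f"k={k}: {count_components(knn_graph(points, k))} component(s)")
```

With a small k the two blobs stay disconnected; once k exceeds the blob size every cell is forced to link into the other blob and the graph collapses into one component.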

Name	Year	Method type	Strengths	Limitations
scanpy ⁴	2018	PCA + graph-based	Very scalable	May not be accurate for small data sets
Seurat (latest)³	2016	PCA + graph-based	Very scalable	May not be accurate for small data sets
PhenoGraph³²	2015	PCA + graph-based	Very scalable	May not be accurate for small data sets
SC3 ²²	2017	PCA + k-means	High accuracy through consensus, provides estimation of k	High complexity, not scalable
SIMLR ²⁴	2017	Data-driven dimensionality reduction + k-means	Concurrent training of the distance metric improves sensitivity in noisy data sets	Adjusting the distance metric to make cells fit the clusters may artificially inflate quality measures
CIDR ²⁵	2017	PCA + hierarchical	Implicitly imputes dropouts when calculating distances
GiniClust ⁷⁵	2016	DBSCAN	Sensitive to rare cell types	Not effective for the detection of large clusters
pcaReduce ²⁷	2016	PCA + k-means + hierarchical	Provides hierarchy of solutions	Very stochastic, does not provide a stable result
Tasic et al.²⁸	2016	PCA + hierarchical	Cross validation used to perform fuzzy clustering	High complexity, no software package available
TSCAN ⁴¹	2016	PCA + Gaussian mixture model	Combines clustering and pseudotime analysis	Assumes clusters follow multivariate normal distribution
mpath ⁴⁵	2016	Hierarchical	Combines clustering and pseudotime analysis	Uses empirically defined thresholds and a priori knowledge
BackSPIN ²⁶	2015	Biclustering (hierarchical)	Multiple rounds of feature selection improve clustering resolution	Tends to over-partition the data
RaceID²³, RaceID2¹¹⁵, RaceID3	2015	k-Means	Detects rare cell types, provides estimation of k	Performs poorly when there are no rare cell types
SINCERA ⁵	2015	Hierarchical	Method is intuitively easy to understand	Simple hierarchical clustering is used, may not be appropriate for very noisy data
SNN-Cliq ⁸⁰	2015	Graph-based	Provides estimation of k	High complexity, not scalable

DBSCAN, density-based spatial clustering of applications with noise; PCA, principal component analysis; scRNA-seq, single-cell RNA sequencing.

Discrete versus continuous cell grouping

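Most partition-based clustering algorithms do not check whether biologically meaningful groups actually exist. If the data contain no discrete groups, these methods may not be appropriate. This is especially true when cells lie on a continuum of states, for example during differentiation; in that case a one-dimensional manifold ('pseudotime') is commonly used to order the cells.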
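
As a very rough illustration of ordering cells along a one-dimensional axis, the sketch below projects simulated cells onto their first principal component, computed by power iteration. Real pseudotime tools are considerably more elaborate; the function name, the simulated genes and the noise level are all assumptions.

```python
import math
import random


def first_pc_scores(cells, n_iter=100, seed=0):
    """Project cells onto the first principal component (power iteration)."""
    n_genes = len(cells[0])
    means = [sum(c[g] for c in cells) / len(cells) for g in range(n_genes)]
    centered = [[c[g] - means[g] for g in range(n_genes)] for c in cells]
    rng = random.Random(seed)
    axis = [rng.gauss(0, 1) for _ in range(n_genes)]
    for _ in range(n_iter):
        # Multiply by X^T X without forming the covariance matrix.
        scores = [sum(v * a for v, a in zip(row, axis)) for row in centered]
        axis = [sum(s * row[g] for s, row in zip(scores, centered)) for g in range(n_genes)]
        norm = math.sqrt(sum(a * a for a in axis))
        axis = [a / norm for a in axis]
    return [sum(v * a for v, a in zip(row, axis)) for row in centered]


# Simulated differentiation: gene 1 rises, gene 2 falls, gene 3 is pure noise.
sim = random.Random(0)
true_time = [sim.random() for _ in range(40)]
cells = [
    [3 * t + sim.gauss(0, 0.05), 1 - t + sim.gauss(0, 0.05), sim.gauss(0, 0.05)]
    for t in true_time
]

scores = first_pc_scores(cells)
order = sorted(range(len(cells)), key=lambda i: scores[i])
print("first five cells in pseudotime order:", order[:5])
```

The sign of a principal component is arbitrary, so the recovered ordering may run from late to early; deciding which end is the start needs prior knowledge such as a known progenitor marker.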

comparison of clustering and pseudotime methods

Technical challenges

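More dropouts (zero counts). Possible causes: the gene is not expressed; sequencing depth is low; the transcript was not captured during library preparation
Several statistical methods now exist to impute zeros.
Technical noise can be estimated by adding external spike-in RNA as a control
Batch effects are best avoided by a balanced experimental design
RNA degradation during library preparation also has to be considered
Doublets (droplets containing two cells)
Some highly expressed genes, such as ribosomal genes, can also influence clustering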

Biological challenges

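Cell cycle: scLVM and cyclone can deal with this effect
Identifying rare cell types: a divide-and-conquer strategy helps, but whether a large cluster should be split further is a problem in its own right.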

Computational challenges

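High dimensionality
Linear dimensionality reduction: PCA
Non-linear dimensionality reduction: tSNE and UMAP

Parameter choice, for example k in k-means and the number of nearest neighbours k in graph-based algorithms
How to validate a method, and how to establish gold-standard datasets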

Candidate gold standards are tissues that are very well studied and understood, or cells taken from the earliest stages of embryonic development
Many of the suitable data sets are quite small, making it difficult to test methods at the kinds of scale that are relevant for current experiments
Experimental spatial methods such as FISH and RNAscope can be used for validation.

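Biological interpretation and annotation

How to label the resulting clusters is a hard problem. Much as flow cytometry relies on cell-surface proteins, scRNA-seq treats the genes that are highly expressed in a cluster as marker genes, and the cluster is then labelled by consulting the literature, databases and so on.
GO enrichment analysis is another option; a Cell Ontology database is urgently needed here.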
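
The marker-gene step can be sketched as ranking genes by how much higher their mean expression is inside a cluster than outside it. This is a simplification: real pipelines normally use a statistical test rather than a raw difference of means. The expression values below are invented, and `top_markers` is a hypothetical helper.

```python
def top_markers(expression, labels, gene_names, cluster, n_top=2):
    """Rank genes by mean expression in `cluster` minus mean in all other cells."""
    inside = [row for row, lab in zip(expression, labels) if lab == cluster]
    outside = [row for row, lab in zip(expression, labels) if lab != cluster]
    scores = []
    for g, name in enumerate(gene_names):
        mean_in = sum(row[g] for row in inside) / len(inside)
        mean_out = sum(row[g] for row in outside) / len(outside)
        scores.append((mean_in - mean_out, name))
    return [name for _, name in sorted(scores, reverse=True)[:n_top]]


# Toy data: 4 cells x 3 genes, two clusters (values are made up).
gene_names = ["CD3E", "CD19", "ACTB"]
expression = [
    [5.0, 0.0, 8.0],
    [4.0, 0.5, 7.5],
    [0.0, 6.0, 8.0],
    [0.5, 5.0, 7.0],
]
labels = [0, 0, 1, 1]

for cluster in (0, 1):
    print(cluster, top_markers(expression, labels, gene_names, cluster, n_top=1))
```

A broadly expressed gene such as ACTB is high everywhere and therefore scores near zero, which is why the ranking uses a contrast against the other cells instead of raw expression.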

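How can new scRNA-seq data be integrated with existing data? Batch effects have to be considered here.
Integration can be done by ① merging the expression matrices first and then clustering, or ② a BLAST-like search: given the expression profile of a cell, find its nearest neighbours in the existing data.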
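
Option ② can be sketched as a nearest-neighbour lookup: correlate the query cell with every reference cell and take the label of the best match. Pearson's correlation is used here because it tolerates library-size differences; the reference profiles, labels and function names are invented, and no batch correction is attempted.

```python
import math


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def nearest_reference_cell(query, reference, reference_labels):
    """Return (label, correlation) of the reference cell most correlated with `query`."""
    best = max(range(len(reference)), key=lambda i: pearson(query, reference[i]))
    return reference_labels[best], pearson(query, reference[best])


# Hypothetical reference: 3 annotated cells x 4 genes.
reference = [
    [5.0, 0.0, 8.0, 1.0],
    [0.0, 6.0, 8.0, 0.5],
    [0.5, 0.0, 7.0, 9.0],
]
reference_labels = ["T cell", "B cell", "monocyte"]

# Query cell: close to the first profile, sequenced at roughly twice the depth.
query = [9.0, 0.5, 15.0, 2.0]

label, r = nearest_reference_cell(query, reference, reference_labels)
print(f"best match: {label} (r = {r:.3f})")
```

In practice a correlation threshold is also needed so that a query cell with no good counterpart in the reference is left unassigned rather than forced onto the closest label.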

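Beyond the RNA level there are data from other layers, i.e. multi-omics data, which can help with cell type identification. Experimental spatial staining methods can also help to validate how good the clustering is.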


2019-04-03 Literature reading: Challenges in unsupervised clustering of single-cell RNA-seq data
