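Load the data
This vignette highlights some example workflows for performing differential expression in Seurat. For demonstration purposes, we will use the 2,700 PBMC object created in the first guided tutorial. You can download the pre-computed object here.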
library(Seurat)
pbmc <- readRDS(file = "../data/pbmc3k_final.rds")
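Perform default differential expression tests
Most of Seurat's differential expression functionality is accessed through the FindMarkers function. By default, Seurat performs differential expression testing based on the non-parametric Wilcoxon rank sum test. This replaces the previous default test ("bimod"). To test for differential expression between two specific groups of cells, specify the ident.1 and ident.2 parameters.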
# list options for groups to perform differential expression on
levels(pbmc)
## [1] "Naive CD4 T" "Memory CD4 T" "CD14+ Mono" "B" "CD8 T"
## [6] "FCGR3A+ Mono" "NK" "DC" "Platelet"
# Find differentially expressed features between CD14+ and FCGR3A+ Monocytes
monocyte.de.markers <- FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono")
# view results
head(monocyte.de.markers)
| | p_val | avg_logFC | pct.1 | pct.2 | p_val_adj |
|---|---|---|---|---|---|
| FCGR3A | 0 | -2.617707 | 0.131 | 0.975 | 0 |
| LYZ | 0 | 1.812078 | 1.000 | 0.988 | 0 |
| RHOC | 0 | -1.611576 | 0.162 | 0.864 | 0 |
| S100A8 | 0 | 2.610695 | 0.975 | 0.500 | 0 |
| S100A9 | 0 | 2.286734 | 0.996 | 0.870 | 0 |
| IFITM2 | 0 | -1.445771 | 0.677 | 1.000 | 0 |
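The results data frame has the following columns:
- p_val: p-value (unadjusted)
- avg_logFC: log fold-change of the average expression between the two groups. Positive values indicate that the feature is more highly expressed in the first group.
- pct.1: the percentage of cells in the first group where the feature is detected (reported as a fraction between 0 and 1)
- pct.2: the percentage of cells in the second group where the feature is detected (reported as a fraction between 0 and 1)
- p_val_adj: adjusted p-value, based on Bonferroni correction using all features in the dataset.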
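To make the columns above concrete, here is a minimal base R sketch that computes each of them for a single toy feature. The expression values, the feature count `n_features`, and the exact fold-change formula (log of mean expression plus a pseudocount of 1) are illustrative assumptions, not a statement of how FindMarkers is implemented internally.

```r
# Toy log-normalized expression values for one feature in two groups of cells.
# All numbers here are made up for illustration.
group1 <- c(2.1, 1.8, 0.0, 2.5, 1.9, 2.2)
group2 <- c(0.0, 0.4, 0.0, 0.0, 0.7, 0.0)

# p_val: unadjusted p-value from a Wilcoxon rank sum test
# (ties in the toy data trigger a harmless warning, suppressed here)
p_val <- suppressWarnings(wilcox.test(group1, group2)$p.value)

# pct.1 / pct.2: fraction of cells in each group where the feature is detected
pct.1 <- mean(group1 > 0)
pct.2 <- mean(group2 > 0)

# avg_logFC: difference of log mean expression between groups
# (assumed formula with a pseudocount of 1; positive means higher in group 1)
avg_logFC <- log(mean(expm1(group1)) + 1) - log(mean(expm1(group2)) + 1)

# p_val_adj: Bonferroni correction over a hypothetical number of features
n_features <- 2000
p_val_adj <- p.adjust(p_val, method = "bonferroni", n = n_features)

result <- data.frame(p_val, avg_logFC, pct.1, pct.2, p_val_adj)
rownames(result) <- "toy_feature"
print(result)
```

Because the Bonferroni correction multiplies each p-value by the number of features tested (capped at 1), the adjusted value is never smaller than the unadjusted one.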
If the ident.2 parameter is omitted or set to NULL, FindMarkers will test for differentially expressed features between the group specified by ident.1 and all other cells.
# Find differentially expressed features between CD14+ Monocytes and all other cells, only
# search for positive markers
monocyte.de.markers <- FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = NULL, only.pos = TRUE)
# view results
head(monocyte.de.markers)
| | p_val | avg_logFC | pct.1 | pct.2 | p_val_adj |
|---|---|---|---|---|---|
| S100A9 | 0 | 3.860873 | 0.996 | 0.215 | 0 |
| S100A8 | 0 | 3.796640 | 0.975 | 0.121 | 0 |
| LGALS2 | 0 | 2.634295 | 0.908 | 0.059 | 0 |
| FCN1 | 0 | 2.352693 | 0.952 | 0.151 | 0 |
| CD14 | 0 | 1.951644 | 0.667 | 0.028 | 0 |
| TYROBP | 0 | 2.111879 | 0.994 | 0.265 | 0 |
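Prefilter features or cells to increase the speed of DE testing
To increase the speed of marker discovery, particularly for large datasets, Seurat allows pre-filtering of features or cells. For example, features that are very rarely detected in either group of cells, or features that are expressed at similar average levels in both groups, are unlikely to be differentially expressed. Example use cases of the min.pct, logfc.threshold, min.diff.pct, and max.cells.per.ident parameters are demonstrated below.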
# Pre-filter features that are detected at <50% frequency in either CD14+ Monocytes or FCGR3A+
# Monocytes
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", min.pct = 0.5))
| | p_val | avg_logFC | pct.1 | pct.2 | p_val_adj |
|---|---|---|---|---|---|
| FCGR3A | 0 | -2.617707 | 0.131 | 0.975 | 0 |
| LYZ | 0 | 1.812078 | 1.000 | 0.988 | 0 |
| RHOC | 0 | -1.611576 | 0.162 | 0.864 | 0 |
| S100A8 | 0 | 2.610695 | 0.975 | 0.500 | 0 |
| S100A9 | 0 | 2.286734 | 0.996 | 0.870 | 0 |
| IFITM2 | 0 | -1.445771 | 0.677 | 1.000 | 0 |
# Pre-filter features that have less than a two-fold change between the average expression of
# CD14+ Monocytes vs FCGR3A+ Monocytes
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", logfc.threshold = log(2)))
| | p_val | avg_logFC | pct.1 | pct.2 | p_val_adj |
|---|---|---|---|---|---|
| FCGR3A | 0 | -2.617707 | 0.131 | 0.975 | 0 |
| LYZ | 0 | 1.812078 | 1.000 | 0.988 | 0 |
| RHOC | 0 | -1.611576 | 0.162 | 0.864 | 0 |
| S100A8 | 0 | 2.610695 | 0.975 | 0.500 | 0 |
| S100A9 | 0 | 2.286734 | 0.996 | 0.870 | 0 |
| IFITM2 | 0 | -1.445771 | 0.677 | 1.000 | 0 |
# Pre-filter features whose detection percentages across the two groups are similar (within
# 0.25)
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", min.diff.pct = 0.25))
| | p_val | avg_logFC | pct.1 | pct.2 | p_val_adj |
|---|---|---|---|---|---|
| FCGR3A | 0 | -2.617707 | 0.131 | 0.975 | 0 |
| RHOC | 0 | -1.611576 | 0.162 | 0.864 | 0 |
| S100A8 | 0 | 2.610695 | 0.975 | 0.500 | 0 |
| IFITM2 | 0 | -1.445771 | 0.677 | 1.000 | 0 |
| LGALS2 | 0 | 2.049431 | 0.908 | 0.265 | 0 |
| CDKN1C | 0 | -1.007729 | 0.029 | 0.506 | 0 |
# Increasing min.pct, logfc.threshold, and min.diff.pct will increase the speed of DE testing,
# but features removed by the pre-filter are never tested and so may be missed
# Subsample each group to a maximum of 200 cells. Can be very useful for large clusters, or
# computationally-intensive DE tests
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", max.cells.per.ident = 200))
| | p_val | avg_logFC | pct.1 | pct.2 | p_val_adj |
|---|---|---|---|---|---|
| FCGR3A | 0 | -2.6177073 | 0.131 | 0.975 | 0 |
| LYZ | 0 | 1.8120776 | 1.000 | 0.988 | 0 |
| S100A8 | 0 | 2.6106955 | 0.975 | 0.500 | 0 |
| S100A9 | 0 | 2.2867339 | 0.996 | 0.870 | 0 |
| IFITM2 | 0 | -1.4457715 | 0.677 | 1.000 | 0 |
| RPS19 | 0 | -0.7563274 | 0.990 | 1.000 | 0 |
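The pre-filtering parameters used in this section can be pictured as simple threshold checks applied before any statistical test is run. The sketch below is a base R illustration on made-up per-feature summaries; the gene names, values, and exact comparison rules are assumptions for illustration and may differ in detail from what FindMarkers does.

```r
# Hypothetical per-feature summaries for two groups (values are made up)
features <- data.frame(
  pct.1 = c(0.90, 0.05, 0.60, 0.95),
  pct.2 = c(0.20, 0.02, 0.55, 0.97),
  avg_logFC = c(2.0, 0.1, 0.3, -0.9),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

min.pct <- 0.1
logfc.threshold <- log(2)
min.diff.pct <- 0.25

# min.pct: keep features detected in at least min.pct of cells in either group
keep_pct <- pmax(features$pct.1, features$pct.2) >= min.pct
# logfc.threshold: keep features whose absolute log fold-change is large enough
keep_fc <- abs(features$avg_logFC) >= logfc.threshold
# min.diff.pct: keep features whose detection fractions differ enough
keep_diff <- abs(features$pct.1 - features$pct.2) >= min.diff.pct

# Only features passing every filter would go on to be tested
tested <- rownames(features)[keep_pct & keep_fc & keep_diff]
print(tested)

# max.cells.per.ident: randomly downsample a group that is too large
set.seed(1)
cells <- paste0("cell", 1:500)
max.cells.per.ident <- 200
if (length(cells) > max.cells.per.ident) {
  cells <- sample(cells, max.cells.per.ident)
}
print(length(cells))
```

In this toy example only geneA survives all three filters: geneB is too rarely detected, geneC has too small a fold-change, and geneD is detected at nearly the same rate in both groups.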
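Perform DE analysis using alternative tests
The following differential expression tests are currently supported:
- "wilcox": Wilcoxon rank sum test (default)
- "bimod": Likelihood-ratio test for single cell feature expression (McDavid et al., Bioinformatics, 2013)
- "roc": Standard AUC classifier
- "t": Student's t-test
- "poisson": Likelihood ratio test assuming an underlying Poisson distribution. Use only for UMI-based datasets
- "negbinom": Likelihood ratio test assuming an underlying negative binomial distribution. Use only for UMI-based datasets
- "LR": Uses a logistic regression framework to determine differentially expressed genes. Constructs a logistic regression model predicting group membership based on each feature individually and compares this to a null model with a likelihood ratio test.
- "MAST": GLM framework that treats cellular detection rate as a covariate (Finak et al., Genome Biology, 2015) (installation instructions)
- "DESeq2": DE based on a model using the negative binomial distribution (Love et al., Genome Biology, 2014) (installation instructions)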
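As an illustration of the idea behind the "LR" option in the list above, the following base R sketch fits a logistic regression predicting group membership from a single simulated feature and compares it to an intercept-only null model with a likelihood ratio test. The data are simulated and the code is a conceptual sketch under those assumptions, not Seurat's own implementation.

```r
set.seed(42)

# Simulated data: 1 = cell in group 1, 0 = cell in group 2 (made-up values)
group <- rep(c(1, 0), each = 50)
expr <- c(rnorm(50, mean = 2), rnorm(50, mean = 1))

# Full model: group membership predicted from the feature's expression
full_model <- glm(group ~ expr, family = binomial)
# Null model: intercept only
null_model <- glm(group ~ 1, family = binomial)

# Likelihood ratio statistic: drop in deviance, with 1 degree of freedom
# because the full model has exactly one extra parameter
lr_stat <- null_model$deviance - full_model$deviance
p_val <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
print(p_val)
```

A small p-value here indicates that expression of the feature helps predict which group a cell belongs to, which is the sense in which the feature is called differentially expressed under this framework.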
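For MAST and DESeq2, please ensure that these packages are installed separately so that they can be used from within Seurat. Once installed, use the test.use parameter to specify which DE test to use.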
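Since MAST and DESeq2 are optional, it can be convenient to check whether they are available before choosing a value for test.use. The sketch below uses base R's requireNamespace for this; the fallback to "wilcox" and the variable names are illustrative choices, not something Seurat does automatically.

```r
# Check which optional DE packages are available in this R installation
optional_pkgs <- c("MAST", "DESeq2")
installed <- vapply(optional_pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Fall back to the default test when the requested package is missing
# (requested_test and test_to_use are hypothetical helper variables)
requested_test <- "MAST"
test_to_use <- if (isTRUE(installed[[requested_test]])) requested_test else "wilcox"
print(test_to_use)
```

The resulting `test_to_use` string could then be passed as the test.use argument of FindMarkers.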
# Test for DE features using the MAST package
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", test.use = "MAST"))
| | p_val | avg_logFC | pct.1 | pct.2 | p_val_adj |
|---|---|---|---|---|---|
| LYZ | 0 | 1.812078 | 1.000 | 0.988 | 0 |
| FCGR3A | 0 | -2.617707 | 0.131 | 0.975 | 0 |
| S100A9 | 0 | 2.286734 | 0.996 | 0.870 | 0 |
| S100A8 | 0 | 2.610695 | 0.975 | 0.500 | 0 |
| IFITM2 | 0 | -1.445771 | 0.677 | 1.000 | 0 |
| LGALS2 | 0 | 2.049431 | 0.908 | 0.265 | 0 |
# Test for DE features using the DESeq2 package. Throws an error if DESeq2 has not already been
# installed. Note that the DESeq2 workflows can be computationally intensive for large datasets
# and are incompatible with some feature pre-filtering options. We therefore suggest initially
# limiting the number of cells used for testing.
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", test.use = "DESeq2", max.cells.per.ident = 50))
| | p_val | avg_logFC | pct.1 | pct.2 | p_val_adj |
|---|---|---|---|---|---|
| S100A9 | 0 | 1.759457 | 0.996 | 0.870 | 0 |
| LYZ | 0 | 1.377950 | 1.000 | 0.988 | 0 |
| S100A8 | 0 | 1.929894 | 0.975 | 0.500 | 0 |
| FCGR3A | 0 | -2.044779 | 0.131 | 0.975 | 0 |
| RPS19 | 0 | -1.119358 | 0.990 | 1.000 | 0 |
| IFITM2 | 0 | -1.53??3646 | 0.677 | 1.000 | 0 |
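Acknowledgements
We thank the authors of the MAST and DESeq2 packages for their kind assistance and advice. We also point users to the following study by Charlotte Soneson and Mark Robinson, which performs a careful and extensive evaluation of methods for single cell differential expression testing.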