今天學(xué)習(xí)另外一個轉(zhuǎn)錄因子活性預(yù)測工具:DoRothEA。很多文章用的也是這個工具,看起來比pySCENIC好像快很多。其實,單細胞轉(zhuǎn)錄因子分析本質(zhì)上是計算一種specific的得分,DoRothEA計算的是 Viper 得分。
======安裝=====
官網(wǎng)地址是:
https://saezlab.github.io/dorothea/
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("dorothea")
或者用devtools安裝:
devtools::install_github("saezlab/dorothea")
還需要一個依賴包:
BiocManager::install("viper")
=====例子測試====
還是用經(jīng)典的pbmc的例子。
加載需要的庫:
library(Seurat)
library(dorothea)
library(tidyverse)
library(viper)
把pbmc的數(shù)據(jù)load進來,分析和前面的很多例子一樣
pbmc <- readRDS("pbmc.rds")
str(pbmc@meta.data)
下面,加載這個包自帶的human的數(shù)據(jù)庫
## We read Dorothea Regulons for Human:
dorothea_regulon_human <- get(data("dorothea_hs", package = "dorothea"))
## We obtain the regulons based on interactions with confidence level A, B and C
regulon <- dorothea_regulon_human %>% dplyr::filter(confidence %in% c("A","B","C"))
然后計算TF的活性,通過使用函數(shù) run_viper() 在 DoRothEA 的regulons上運行 VIPER 以獲得 TFs activity。 在 seurat 對象的情況下,該函數(shù)返回相同的 seurat 對象,其中包含一個名為 dorothea 的assay,其中包含slot數(shù)據(jù)中的 TFs activity。
pbmc <- run_viper(pbmc, regulon,options = list(method = "scale", minsize = 4,
eset.filter = FALSE, cores = 1, verbose = FALSE))
Assays(pbmc)

可以看到,這個pbmc數(shù)據(jù)集里面的約 三千個細胞都有自己的266個轉(zhuǎn)錄因子的活性得分。

下面,我們試試看FindAllMarkers函數(shù)獲取各個單細胞亞群特異性的轉(zhuǎn)錄因子。
DefaultAssay(object = pbmc) <- "dorothea"
table(Idents(pbmc))

pbmc.markers <- FindAllMarkers(object = pbmc,
only.pos = TRUE,
min.pct = 0.25,
thresh.use = 0.25)
DT::datatable(pbmc.markers)
write.csv(pbmc.markers,file='pbmc.markers.csv')
下面,我們嘗試熱圖展示:
library(dplyr)
pbmc.markers$fc = pbmc.markers$pct.1 - pbmc.markers$pct.2
top10 <- pbmc.markers %>% group_by(cluster) %>% top_n(10, fc)
pbmc@assays$dorothea@data[1:4,1:4]
top10 =top10[top10$fc > 0.3,]
pbmc <- ScaleData(pbmc)
DoHeatmap(pbmc,top10$gene,size=3,slot = 'scale.data')
從熱圖可以看出,其實還是有一大部分TF在不同的細胞類型中特異性表達的。

然后,我們嘗試利用這個包來找下特異性比較好的TF regulons。官網(wǎng)有詳細的文檔介紹,比如如何選擇TF activity per cell population,根據(jù) previously computed VIPER scores on DoRothEA’s regulons.
首先也是獲取Viper得分矩陣 ,根據(jù)不同細胞亞群進行歸納匯總 :
viper_scores_df <- GetAssayData(pbmc, slot = "scale.data", assay = "dorothea") %>%
data.frame(check.names = F) %>% t()
viper_scores_df[1:4,1:4]

## We create a data frame containing the cells and their clusters
CellsClusters <- data.frame(cell = names(Idents(pbmc)),
cell_type = as.character(Idents(pbmc)),
check.names = F)
head(CellsClusters)
## We create a data frame with the Viper score per cell and its clusters
viper_scores_clusters <- viper_scores_df %>%
data.frame() %>%
rownames_to_column("cell") %>%
gather(tf, activity, -cell) %>%
inner_join(CellsClusters)
## We summarize the Viper scores by cellpopulation
summarized_viper_scores <- viper_scores_clusters %>%
group_by(tf, cell_type) %>%
summarise(avg = mean(activity),
std = sd(activity))
# For visualization purposes, we select the 20 most variable TFs across clusters according to our scores.
head(summarized_viper_scores)
## We select the 20 most variable TFs. (20*9 populations = 180)
highly_variable_tfs <- summarized_viper_scores %>%
group_by(tf) %>%
mutate(var = var(avg)) %>%
ungroup() %>%
top_n(180, var) %>%
distinct(tf)
highly_variable_tfs
## We prepare the data for the plot
summarized_viper_scores_df <- summarized_viper_scores %>%
semi_join(highly_variable_tfs, by = "tf") %>%
dplyr::select(-std) %>%
spread(tf, avg) %>%
data.frame(row.names = 1, check.names = FALSE)
最后,整理好的數(shù)據(jù)如下所示:
summarized_viper_scores_df[1:4,1:4]

最終數(shù)據(jù)是挑選的top 20個TF,它們在不同的單細胞亞群里面的變化最大。這樣就可以可視化這些細胞類型特異表達的TF regulon了。
有了數(shù)據(jù),就很容易可視化:
palette_length = 100
my_color = colorRampPalette(c("Darkblue", "white","red"))(palette_length)
my_breaks <- c(seq(min(summarized_viper_scores_df), 0,
length.out=ceiling(palette_length/2) + 1),
seq(max(summarized_viper_scores_df)/palette_length,
max(summarized_viper_scores_df),
length.out=floor(palette_length/2)))
library(pheatmap)
viper_hmap <- pheatmap(t(summarized_viper_scores_df),fontsize=14,
fontsize_row = 10,
color=my_color, breaks = my_breaks,
main = "DoRothEA (ABC)", angle_col = 45,
treeheight_col = 0, border_color = NA)
