【DoRothEA】單細胞轉(zhuǎn)錄因子分析

今天學(xué)習(xí)另外一個轉(zhuǎn)錄因子活性預(yù)測工具:DoRothEA。很多文章用的也是這個工具,看起來比pySCENIC好像快很多。其實,單細胞轉(zhuǎn)錄因子分析本質(zhì)上是計算一種specific的得分,DoRothEA計算的是 Viper 得分。

======安裝=====

官網(wǎng)地址是:

https://saezlab.github.io/dorothea/

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("dorothea")

或者用devtools安裝:
devtools::install_github("saezlab/dorothea")

還需要一個依賴包:
BiocManager::install("viper")

=====例子測試====

還是用經(jīng)典的pbmc的例子。

加載需要的庫:
library(Seurat)
library(dorothea)
library(tidyverse)
library(viper)
把pbmc的數(shù)據(jù)load進來,分析和前面的很多例子一樣

pbmc <- readRDS("pbmc.rds")
str(pbmc@meta.data)

下面,加載這個包自帶的human的數(shù)據(jù)庫

## We read Dorothea Regulons for Human:
dorothea_regulon_human <- get(data("dorothea_hs", package = "dorothea"))

## We obtain the regulons based on interactions with confidence level A, B and C
regulon <- dorothea_regulon_human %>% dplyr::filter(confidence %in% c("A","B","C"))

然后計算TF的活性,通過使用函數(shù) run_viper() 在 DoRothEA 的regulons上運行 VIPER 以獲得 TFs activity。 在 seurat 對象的情況下,該函數(shù)返回相同的 seurat 對象,其中包含一個名為 dorothea 的assay,其中包含slot數(shù)據(jù)中的 TFs activity。

pbmc <- run_viper(pbmc, regulon,options = list(method = "scale", minsize = 4, 
                  eset.filter = FALSE, cores = 1, verbose = FALSE))
Assays(pbmc)
image.png

可以看到,這個pbmc數(shù)據(jù)集里面的約 三千個細胞都有自己的266個轉(zhuǎn)錄因子的活性得分。

image.png

下面,我們試試看FindAllMarkers函數(shù)獲取各個單細胞亞群特異性的轉(zhuǎn)錄因子。

DefaultAssay(object = pbmc) <- "dorothea"
table(Idents(pbmc))
image.png
pbmc.markers <- FindAllMarkers(object = pbmc, 
                              only.pos = TRUE, 
                              min.pct = 0.25, 
                              thresh.use = 0.25)                                                          
DT::datatable(pbmc.markers)
write.csv(pbmc.markers,file='pbmc.markers.csv')

下面,我們嘗試熱圖展示:

library(dplyr) 
pbmc.markers$fc = pbmc.markers$pct.1 - pbmc.markers$pct.2
top10 <- pbmc.markers %>% group_by(cluster) %>% top_n(10, fc)
pbmc@assays$dorothea@data[1:4,1:4]
top10 =top10[top10$fc > 0.3,] 
pbmc <- ScaleData(pbmc)
DoHeatmap(pbmc,top10$gene,size=3,slot = 'scale.data')

從熱圖可以看出,其實還是有一大部分TF在不同的細胞類型中特異性表達的。

image.png

然后,我們嘗試利用這個包來找下特異性比較好的TF regulons。官網(wǎng)有詳細的文檔介紹,比如如何選擇TF activity per cell population,根據(jù) previously computed VIPER scores on DoRothEA’s regulons.

首先也是獲取Viper得分矩陣 ,根據(jù)不同細胞亞群進行歸納匯總 :

viper_scores_df <- GetAssayData(pbmc, slot = "scale.data", assay = "dorothea") %>%
data.frame(check.names = F) %>% t()
viper_scores_df[1:4,1:4]

image.png
## We create a data frame containing the cells and their clusters
CellsClusters <- data.frame(cell = names(Idents(pbmc)), 
                            cell_type = as.character(Idents(pbmc)),
                            check.names = F)
head(CellsClusters)

## We create a data frame with the Viper score per cell and its clusters
viper_scores_clusters <- viper_scores_df  %>%
  data.frame() %>% 
  rownames_to_column("cell") %>%
  gather(tf, activity, -cell) %>%
  inner_join(CellsClusters)
  
  
## We summarize the Viper scores by cellpopulation
summarized_viper_scores <- viper_scores_clusters %>% 
  group_by(tf, cell_type) %>%
  summarise(avg = mean(activity),
            std = sd(activity))
            
# For visualization purposes, we select the 20 most variable TFs across clusters according to our scores.
head(summarized_viper_scores)


## We select the 20 most variable TFs. (20*9 populations = 180)
highly_variable_tfs <- summarized_viper_scores %>%
  group_by(tf) %>%
  mutate(var = var(avg))  %>%
  ungroup() %>%
  top_n(180, var) %>%
  distinct(tf)
highly_variable_tfs

## We prepare the data for the plot
summarized_viper_scores_df <- summarized_viper_scores %>%
  semi_join(highly_variable_tfs, by = "tf") %>%
  dplyr::select(-std) %>%   
  spread(tf, avg) %>%
  data.frame(row.names = 1, check.names = FALSE) 

                  

最后,整理好的數(shù)據(jù)如下所示:

summarized_viper_scores_df[1:4,1:4]
image.png

最終數(shù)據(jù)是挑選的top 20個TF,它們在不同的單細胞亞群里面的變化最大。這樣就可以可視化這些細胞類型特異表達的TF regulon了。

有了數(shù)據(jù),就很容易可視化:

palette_length = 100
my_color = colorRampPalette(c("Darkblue", "white","red"))(palette_length)
my_breaks <- c(seq(min(summarized_viper_scores_df), 0, 
                   length.out=ceiling(palette_length/2) + 1),
               seq(max(summarized_viper_scores_df)/palette_length, 
                   max(summarized_viper_scores_df), 
                   length.out=floor(palette_length/2)))
library(pheatmap)
viper_hmap <- pheatmap(t(summarized_viper_scores_df),fontsize=14, 
                       fontsize_row = 10, 
                       color=my_color, breaks = my_breaks, 
                       main = "DoRothEA (ABC)", angle_col = 45,
                       treeheight_col = 0,  border_color = NA) 

image.png
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容