002-尋找高變基因和降維畫tsne和umap

首先打開rstudio

打開后用getwd()查看當(dāng)前工作路徑

getwd()

如果路徑跟上次的不一樣,重新設(shè)置一下路徑

setwd("X:/xxx/xxxx")
setwd("F:/Rstudio_data/001_singlecell_code/raw/BC21/")

重新加載R包

library(Seurat)
library(tidyverse)
library(patchwork)
library(dplyr)

加載數(shù)據(jù)

scRNA <-load("scRNA1.Rdata")

官方推薦是2000個高變基因,很多文章也有設(shè)置30000的,這個因自己的實驗項目決定

scRNA1 <- FindVariableFeatures(scRNA1, selection.method = "vst", nfeatures = 3000) 

Identify the 10 most highly variable genes,把top10的高變基因挑選出來,目的是為了作圖

top10 <- head(VariableFeatures(scRNA1), 10) 

plot variable features with and without labels 畫出來不帶標(biāo)簽的高變基因圖

plot1 <- VariableFeaturePlot(scRNA1) 

把top10的基因加到圖中

plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE, size=2.5) 
plot <- CombinePlots(plots = list(plot1, plot2),legend="bottom") 
plot 

數(shù)據(jù)標(biāo)準(zhǔn)化(中心化)

如果內(nèi)存足夠最好對所有基因進行中心化

scale.genes <-  rownames(scRNA1)
scRNA1 <- ScaleData(scRNA1, features = scale.genes)

如果內(nèi)存不夠,可以只對高變基因進行標(biāo)準(zhǔn)化

scale.genes <-  VariableFeatures(scRNA)
scRNA <- ScaleData(scRNA, features = scale.genes)

scRNA對象中原始表達矩陣經(jīng)過標(biāo)準(zhǔn)化和中心化之后,已經(jīng)產(chǎn)生了三套基因表達數(shù)據(jù),可以通過以下命令獲得

原始表達矩陣

GetAssayData(scRNA,slot="counts",assay="RNA") 

標(biāo)準(zhǔn)化之后的表達矩陣

GetAssayData(scRNA,slot="data",assay="RNA")

中心化之后的表達矩陣

GetAssayData(scRNA,slot="scale.data",assay="RNA") 

細(xì)胞周期回歸:上一步找到的高變基因,常常會包含一些細(xì)胞周期相關(guān)基因。

它們會導(dǎo)致細(xì)胞聚類發(fā)生一定的偏移,即相同類型的細(xì)胞在聚類時會因為細(xì)胞周期的不同而分開。

cc.genes
CaseMatch(c(cc.genes$s.genes,cc.genes$g2m.genes),VariableFeatures(scRNA1))

細(xì)胞周期評分

g2m_genes = cc.genes$g2m.genes
g2m_genes = CaseMatch(search = g2m_genes, match = rownames(scRNA1))
s_genes = cc.genes$s.genes
s_genes = CaseMatch(search = s_genes, match = rownames(scRNA1))
scRNA1 <- CellCycleScoring(object=scRNA1,  g2m.features=g2m_genes,  s.features=s_genes)

查看細(xì)胞周期基因?qū)?xì)胞聚類的影響

scRNAa <- RunPCA(scRNA1, features = c(s_genes, g2m_genes))
p <- DimPlot(scRNAa, reduction = "pca", group.by = "Phase")
p
VlnPlot(scRNAa, features = c("nFeature_RNA", "nCount_RNA", "percent.mt","percent.HB","G2M.Score","S.Score"), ncol = 6)
ggsave("cellcycle_pca.png", p, width = 8, height = 6)

如果需要消除細(xì)胞周期的影響

scRNAb <- ScaleData(scRNA1, vars.to.regress = c("S.Score", "G2M.Score"), features = rownames(scRNA1))

PCA降維并提取主成分

PCA降維

scRNA1 <- RunPCA(scRNA1, features = VariableFeatures(scRNA1)) 
plot1 <- DimPlot(scRNA1, reduction = "pca", group.by="orig.ident") 
### 畫圖
plot1 
####確定數(shù)據(jù)的維度 Determine the ‘dimensionality’ of the dataset 
###ElbowPlot() 可以快速的檢查降維的效果
plot2 <- ElbowPlot(scRNA1, ndims=20, reduction="pca") 
##畫圖
plot2
###我們一般選擇拐點作為降維的度數(shù)。
plotc <- plot1+plot2
ggsave("pca.pdf", plot = plotc, width = 8, height = 4) 
ggsave("pca.png", plot = plotc, width = 8, height = 4)

后續(xù)分析要根據(jù)右圖選擇提取的pc軸數(shù)量,一般選擇斜率平滑的點之前的所有pc軸,此圖我的建議是選擇前13個pc軸。

可以看出大概在PC為13的時候,每個軸是有區(qū)分意義的。

pc.num=1:13

細(xì)胞聚類

Identify clusters of cells by a shared nearest neighbor (SNN) modularity optimization based clustering algorithm. First calculate k-nearest neighbors and construct the SNN graph. Then optimize the modularity function to determine clusters. For a full description of the algorithms, see Waltman and van Eck (2013) The European Physical Journal B. Thanks to Nigel Delaney (evolvedmicrobe@github) for the rewrite of the Java modularity optimizer code in Rcpp!

scRNA1 <- FindNeighbors(scRNA1, dims = pc.num) 
scRNA1 <- FindClusters(scRNA1, resolution = 0.5)

這個resolution(分辨率)是可以自定義的,當(dāng)我們的樣本細(xì)胞數(shù)較大時候resolution 要高一些,一般情況2萬細(xì)胞以上都是大于1.0的
查看每一類有多少個細(xì)胞

table(scRNA1@meta.data$seurat_clusters)
metadata <- scRNA@meta.data
cell_cluster <- data.frame(cell_ID=rownames(metadata), cluster_ID=metadata$seurat_clusters)
write.csv(cell_cluster,'cluster/cell_cluster.csv',row.names = F)

可視化降維有兩個方法tSNE和UMAP

非線性降維——這個目的是為了可視化,而不是特征提取(PCA),雖然它也可以用來做特征提取。

tSNE

scRNA1 = RunTSNE(scRNA1, dims = pc.num)
embed_tsne <- Embeddings(scRNA1, 'tsne')
write.csv(embed_tsne,'embed_tsne.csv')
plot1 = DimPlot(scRNA1, reduction = "tsne") 
##畫圖
plot1
###label = TRUE把注釋展示在圖中
DimPlot(scRNA1, reduction = "tsne",label = TRUE) 
###你會發(fā)現(xiàn)cluster都標(biāo)了圖中
ggsave("tSNE.pdf", plot = plot1, width = 8, height = 7)
##把圖片保存一下

UMAP---第二種可視化降維

scRNA1 <- RunUMAP(scRNA1, dims = pc.num)
embed_umap <- Embeddings(scRNA1, 'umap')
write.csv(embed_umap,'embed_umap.csv') 
plot2 = DimPlot(scRNA1, reduction = "umap") 
plot2
ggsave("UMAP.pdf", plot = plot2, width = 8, height = 7)

合并tSNE與UMAP

plotc <- plot1+plot2+ plot_layout(guides = 'collect')
plotc
ggsave("tSNE_UMAP.pdf", plot = plotc, width = 10, height = 5)

保存數(shù)據(jù)這節(jié)課的數(shù)據(jù)

saveRDS(scRNA1, file="scRNA1.rds")
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容