DESeq2跑差異基因

20210305 16:36
最近在處理single cell 數(shù)據(jù),老板要求用DESeq2跑差異基因,Seurat中自帶的DESeq2跑出來的有很多是0,很奇怪,也不懂是啥原因,沒有往下面細細研究,后來師姐用DESeq2直接跑,就是把每一個細胞當成一個樣本來跑的,細胞數(shù)目少的時候跑的時間用的少,但是細胞數(shù)目多的時候,時間就會用的多。之前都是換一組數(shù)據(jù),就得寫個代碼,很麻煩。今天給它包裝了下,跑起來方便多了,很開心。最近的粉絲漲了三個,開心開心,雖然離100的粉絲量還差好多,但是是在進步的就好。

1.函數(shù)
#!/~/bin/Rscript
## 第一行是Linux上Rscript上所在位置

# Function: This is a pipeline to find DEGs using DESeq2
# Input: 1.rds odject which includes counts variable and label variable; 2. outputname to save the DEGs
# Output: csv of DESeq2 DEGs
# Version: 2021-03-05, Yiyi Zheng

library(optparse)
library(getopt)
option_list <- list(
  make_option(c("-o", "--output"), type = "character", default = FALSE,
              action = "store", help = "This is the output directory."
  ),
  make_option(c("--rds_name"), type = "character", default = FALSE,
              action = "store", help = "This is the name of rds to input."
  ),
  make_option(c("--output_name"), type = "character", default = FALSE,
              action = "store", help = "This is the name of ouput file."
  )
)

# -help 
opt = parse_args(OptionParser(option_list = option_list, 
                              usage = "This is a script to find DEGs using DESeq2"))
print(opt)

# set the output path
library(DESeq2)
DESeq2_Object <- readRDS(opt$rds_name)

setwd(opt$output)
timestart<-Sys.time() 
print("The time begining:")
print(timestart)

##  Main function 
print("The level and number of input data:")
# The levels of condition need to assign by yourself. DESeq2 will use the first level as the condtion.
#condition<-factor(c(rep("Control",7104), rep("Obesity",6965)), levels = c("Control","Obesity"))

condition <- factor(DESeq2_Object$label)
print(levels(condition))
print(table(condition))

print("The contrl is")
print(table(condition)[1])
print("The treat is")
print(table(condition)[2])

count <- DESeq2_Object$counts
count <- as.matrix(count) # Need to transform to matrix.

coldata <- data.frame(row.names=colnames(count), condition)                    
count_2 <- count[which(rowSums(count) > 0), ] # Exclude the gene which is 0 in all cells.
count <-  count_2+1
dds <- DESeqDataSetFromMatrix(count,coldata,design=~condition)
dds <- DESeq(dds)
res <- results(dds)
outputname <- opt$output_name

print("Finished to calculate DEGs and now writing the DEGs results.")
write.csv(res, outputname)

timeend<-Sys.time()
print("The time ending:")
print(timeend)
runningtime<-timeend-timestart
print("The time DESeq2 used is ")
print(runningtime) 
2.使用

該函數(shù)的使用就是在Linux上直接輸入?yún)?shù)就可以了,但是之前你需要構建一個rds object, 其中兩個變量是counts和label,構建如下:

rm(list = ls())
list.files()
library(Seurat)
Female_exp  <- readRDS("20210305_Female_DESeq2_Exp.rds")
all <- list()
all$counts <- Female_exp$counts
all$label <- Female_exp$sample_label
print(table(all$label))
## DESeq2中會把第一個level 作為control,第二個level作為treat.這個需要確認好,是誰對誰找的差異基因
saveRDS(all , "Female_allcells.rds")

然后就可以掛在后臺上跑了

i=Female_allcells.rds
nohup RunDESeq2_Linux.R -o . --rds_name $i --output_name $i.csv > $i.output.file 2>&1 &

進一步有一步的歡喜??!
今天是前進的一天

最后編輯于
?著作權歸作者所有,轉載或內容合作請聯(lián)系作者
【社區(qū)內容提示】社區(qū)部分內容疑似由AI輔助生成,瀏覽時請結合常識與多方信息審慎甄別。
平臺聲明:文章內容(如有圖片或視頻亦包括在內)由作者上傳并發(fā)布,文章內容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務。

友情鏈接更多精彩內容