20210305 16:36
最近在處理single cell 數(shù)據(jù),老板要求用DESeq2跑差異基因,Seurat中自帶的DESeq2跑出來的有很多是0,很奇怪,也不懂是啥原因,沒有往下面細細研究,后來師姐用DESeq2直接跑,就是把每一個細胞當成一個樣本來跑的,細胞數(shù)目少的時候跑的時間用的少,但是細胞數(shù)目多的時候,時間就會用的多。之前都是換一組數(shù)據(jù),就得寫個代碼,很麻煩。今天給它包裝了下,跑起來方便多了,很開心。最近的粉絲漲了三個,開心開心,雖然離100的粉絲量還差好多,但是是在進步的就好。
1.函數(shù)
#!/~/bin/Rscript
## 第一行是Linux上Rscript上所在位置
# Function: This is a pipeline to find DEGs using DESeq2
# Input: 1.rds odject which includes counts variable and label variable; 2. outputname to save the DEGs
# Output: csv of DESeq2 DEGs
# Version: 2021-03-05, Yiyi Zheng
library(optparse)
library(getopt)
option_list <- list(
make_option(c("-o", "--output"), type = "character", default = FALSE,
action = "store", help = "This is the output directory."
),
make_option(c("--rds_name"), type = "character", default = FALSE,
action = "store", help = "This is the name of rds to input."
),
make_option(c("--output_name"), type = "character", default = FALSE,
action = "store", help = "This is the name of ouput file."
)
)
# -help
opt = parse_args(OptionParser(option_list = option_list,
usage = "This is a script to find DEGs using DESeq2"))
print(opt)
# set the output path
library(DESeq2)
DESeq2_Object <- readRDS(opt$rds_name)
setwd(opt$output)
timestart<-Sys.time()
print("The time begining:")
print(timestart)
## Main function
print("The level and number of input data:")
# The levels of condition need to assign by yourself. DESeq2 will use the first level as the condtion.
#condition<-factor(c(rep("Control",7104), rep("Obesity",6965)), levels = c("Control","Obesity"))
condition <- factor(DESeq2_Object$label)
print(levels(condition))
print(table(condition))
print("The contrl is")
print(table(condition)[1])
print("The treat is")
print(table(condition)[2])
count <- DESeq2_Object$counts
count <- as.matrix(count) # Need to transform to matrix.
coldata <- data.frame(row.names=colnames(count), condition)
count_2 <- count[which(rowSums(count) > 0), ] # Exclude the gene which is 0 in all cells.
count <- count_2+1
dds <- DESeqDataSetFromMatrix(count,coldata,design=~condition)
dds <- DESeq(dds)
res <- results(dds)
outputname <- opt$output_name
print("Finished to calculate DEGs and now writing the DEGs results.")
write.csv(res, outputname)
timeend<-Sys.time()
print("The time ending:")
print(timeend)
runningtime<-timeend-timestart
print("The time DESeq2 used is ")
print(runningtime)
2.使用
該函數(shù)的使用就是在Linux上直接輸入?yún)?shù)就可以了,但是之前你需要構建一個rds object, 其中兩個變量是counts和label,構建如下:
rm(list = ls())
list.files()
library(Seurat)
Female_exp <- readRDS("20210305_Female_DESeq2_Exp.rds")
all <- list()
all$counts <- Female_exp$counts
all$label <- Female_exp$sample_label
print(table(all$label))
## DESeq2中會把第一個level 作為control,第二個level作為treat.這個需要確認好,是誰對誰找的差異基因
saveRDS(all , "Female_allcells.rds")
然后就可以掛在后臺上跑了
i=Female_allcells.rds
nohup RunDESeq2_Linux.R -o . --rds_name $i --output_name $i.csv > $i.output.file 2>&1 &
進一步有一步的歡喜??!
今天是前進的一天