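Introduction: these study notes are mainly based on 《一文學會常規轉錄組分析》 ("Learn routine transcriptome analysis in one article"), http://m.itdecent.cn/p/bdeebd669eb8
1. Data acquisition and quality control
Install SRA Toolkit (sratoolkit), Prefetch, Aspera, FastQC and MultiQC beforehand.
Create a file listing the accession numbers of the data to download: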
#cat dir_6.txt
SRR3286802
SRR3286803
SRR3286804
SRR3286805
SRR3286806
SRR3286807
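Because the six accessions are consecutive, the list can also be generated instead of typed by hand. A minimal sketch, assuming the same file name dir_6.txt and the same `seq 2 7` range used by the scripts later in these notes:

```shell
# Write SRR3286802 ... SRR3286807 to dir_6.txt, one accession per line.
seq 2 7 | sed 's/^/SRR328680/' > dir_6.txt
cat dir_6.txt
```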
1.1 Download the SRA data with Aspera:
Download the Aspera Connect installer, unpack it, and run the install script:
sh ibm-aspera-connect_4.1.3.93_linux.s
#~/.aspera/connect/bin/ascp -i ~/.aspera/connect/etc/asperaweb_id_dsa.putty --mode recv --host ftp-private.ncbi.nlm.nih.gov --user anonftp --file-list dir_6.txt
Error 1:
ascp: destination required
Startup failed, exit
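Fix 1: ascp needs a destination argument. Use the absolute path to .aspera and append the destination; the trailing Data/ is the directory the files are downloaded to.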
#/home/radish/.aspera/connect/bin/ascp -i /home/radish/.aspera/connect/etc/asperaweb_id_dsa.putty --mode recv --host ftp-private.ncbi.nlm.nih.gov --user anonftp --file-list dir_6.txt Data/
Error 2:
ascp: Failed to open TCP connection for SSH, exiting.
Session Stop  (Error: Failed to open TCP connection for SSH)
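These notes do not record a fix for Error 2. One possible way around it, sketched here as an assumption and not as the original author's solution, is to skip ascp and fetch the runs with the SRA Toolkit installed earlier. The helper names `run` and `fetch_accessions` are hypothetical; `prefetch` and `fasterq-dump` are the real SRA Toolkit commands, and the sketch assumes a toolkit version recent enough to support `prefetch --output-directory`. With DRY_RUN=1 the commands are only printed.

```shell
# Fallback download via SRA Toolkit (assumption: prefetch and fasterq-dump are on PATH).
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: file with one accession per line, $2: download directory.
fetch_accessions() {
    list=$1
    outdir=$2
    while read -r acc; do
        [ -n "$acc" ] || continue
        run prefetch --output-directory "$outdir" "$acc"
        # prefetch normally stores the run as <outdir>/<acc>/<acc>.sra
        run fasterq-dump --split-files --outdir "$outdir" "$outdir/$acc/$acc.sra"
    done < "$list"
}
```

Calling `fetch_accessions dir_6.txt Data` prints two commands per accession. fasterq-dump writes uncompressed .fastq files, so they would still need to be gzipped to match the .fastq.gz names used in the alignment step.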
2. Download the gff/gtf annotation file and extract the gene/transcript intervals of interest
#less Arabidopsis_thaliana.TAIR10.42.gff3 | awk '{ if($3=="gene") print $0 }' > gene27655.gff
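To see what the filter keeps, here is a small sketch that runs the same test on a three-line GFF3 fragment. The fragment is illustrative only and is not taken from the real annotation file; the temporary file is an assumption made for the demo.

```shell
# Small illustrative GFF3 fragment (tab-separated): one gene, one mRNA, one exon.
tmp=$(mktemp)
printf '1\tTAIR10\tgene\t3631\t5899\t.\t+\t.\tID=gene:AT1G01010\n' > "$tmp"
printf '1\tTAIR10\tmRNA\t3631\t5899\t.\t+\t.\tID=transcript:AT1G01010.1\n' >> "$tmp"
printf '1\tTAIR10\texon\t3631\t3913\t.\t+\t.\tParent=transcript:AT1G01010.1\n' >> "$tmp"

# Keep only rows whose third column (the feature type) is exactly "gene".
genes=$(awk -F '\t' '$3 == "gene"' "$tmp")
echo "$genes"
rm -f "$tmp"
```

Only the first row survives, because the mRNA and exon rows have a different value in column 3.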
3. Install Hisat2
3.1 Installing as root, so nothing needs to be added to bashrc
#anaconda search -t conda hisat2
#anaconda show bioconda/hisat2
#conda install --channel https://conda.anaconda.org/bioconda hisat2
Run it to check:
#hisat2
It works.
3.2 As a regular user, the install directory has to be added to PATH in bashrc
#vi ~/.bashrc
#export PATH=/home/radish/bio_soft/hisat2-2.2.0:$PATH
#source ~/.bashrc
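Two details matter for the PATH entry: `~` already expands to the home directory, so a path starting with `~/home/radish` points to a directory that does not exist, and PATH entries must be directories, not the executable itself. A minimal sketch of a helper (the name `add_to_path` is hypothetical) that prepends a directory only once, so that sourcing bashrc repeatedly does not keep growing PATH:

```shell
# Prepend a directory to PATH only if it is not already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

add_to_path /home/radish/bio_soft/hisat2-2.2.0
# Show the first PATH entry to confirm the directory was added.
echo "$PATH" | tr ':' '\n' | head -n 1
```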
3.3 Align the SRA data to the reference genome:
3.3.1 Build the index:
#hisat2-build Arabidopsis_thaliana.TAIR10.dna.toplevel.fa Arabidopsis_thaliana &
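Since the build is started in the background with `&`, the prompt returns before the index is finished. A small sketch for checking that the index is complete before aligning, assuming the standard small-genome HISAT2 layout of eight files named prefix.1.ht2 to prefix.8.ht2 (the function name `index_complete` is hypothetical):

```shell
# Return 0 if all eight HISAT2 index files for the given prefix exist and are non-empty.
index_complete() {
    prefix=$1
    for n in 1 2 3 4 5 6 7 8; do
        if [ ! -s "$prefix.$n.ht2" ]; then
            echo "missing: $prefix.$n.ht2"
            return 1
        fi
    done
    echo "index OK: $prefix"
}

index_complete Arabidopsis_thaliana || true
```

In the same shell session, `wait` can also be used to block until the background build has exited.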
3.3.2 Align a single sample:
#hisat2 -p 6 -x Arabidopsis_thaliana -1 SRR3286802_1.fastq.gz -2 SRR3286802_2.fastq.gz -S SRR3286802.sam
3.3.3 Align all samples with a script:
#cat 3.sh
for i in `seq 2 7`
do
hisat2 -x ~/bio_soft/Arabidopsis_thaliana -p 8 \
-1 ~/bio_soft/SRR328680${i}_1.fastq.gz \
-2 ~/bio_soft/SRR328680${i}_2.fastq.gz \
-S ~/bio_soft/SRR328680${i}.sam
done
#sh 3.sh
According to the alignment reports, the alignment rate is high for every sample, above 97%.
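HISAT2 prints its alignment summary to stderr, ending with a line of the form "NN.NN% overall alignment rate". A sketch for collecting that number per sample, assuming the stderr of each run was saved to a file such as SRR328680${i}.log (these log file names are hypothetical; the script above does not create them):

```shell
# Print the overall alignment rate found in a saved HISAT2 summary.
overall_rate() {
    awk '/overall alignment rate/ { print $1 }' "$1"
}

# Report one line per sample for every log file that exists.
for i in $(seq 2 7); do
    log=SRR328680${i}.log
    if [ -f "$log" ]; then
        echo "SRR328680${i} $(overall_rate "$log")"
    fi
done
```

The logs could be produced by appending `2> SRR328680${i}.log` to the hisat2 call in the loop.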
4. Convert sam to bam and sort. Samtools reported an error when it was installed:
libncurses.so.5: cannot open shared object file
Fix:
#whereis libncurses.so.5
#ln -s /usr/lib64/libncurses.so.6.1 /usr/lib64/libncurses.so.5
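The symlink only helps if the target library really exists, and the version suffix (6.1 here) differs between systems. A cautious sketch, with a hypothetical helper `link_if_absent`, that lists what is installed and creates the link only when the target is present and the link name is still free:

```shell
# See which libncurses versions are actually installed (path is system-dependent).
ls /usr/lib64/libncurses.so.* 2>/dev/null || echo "no libncurses found in /usr/lib64"

# Create a symlink only when the target exists and the link name is unused.
link_if_absent() {
    target=$1
    link=$2
    if [ ! -e "$target" ]; then
        echo "target not found: $target"
        return 1
    fi
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "already present: $link"
        return 0
    fi
    ln -s "$target" "$link"
}

# Example call (needs root for /usr/lib64):
# link_if_absent /usr/lib64/libncurses.so.6.1 /usr/lib64/libncurses.so.5
```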
Install Samtools with the same conda commands used for Hisat2 above.
Run:
Convert and sort a single sample:
#samtools view -bS SRR3286805.sam > SRR3286805.bam
#samtools sort -n SRR3286805.bam > SRR3286805.n.bam
Convert and sort all samples with a script:
#cat 1.sh
for i in `seq 2 7`
do
samtools view -@ 8 -Sb SRR328680${i}.sam > SRR328680${i}.bam
samtools sort -@ 8 -n SRR328680${i}.bam > SRR328680${i}.n.bam
done
#sh 1.sh
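The intermediate unsorted BAM can be skipped. A sketch under the assumption that samtools is version 1.3 or later, where `samtools sort` reads SAM input directly and writes to the file given with `-o`; the function name `sort_sam` is hypothetical:

```shell
# Convert a SAM file to a name-sorted BAM in one step.
# foo.sam becomes foo.n.bam, matching the naming used above.
sort_sam() {
    sam=$1
    out=${sam%.sam}.n.bam
    samtools sort -@ 8 -n -o "$out" "$sam"
}

# Process every sample whose SAM file is present, if samtools is installed.
if command -v samtools >/dev/null 2>&1; then
    for i in $(seq 2 7); do
        if [ -f "SRR328680${i}.sam" ]; then
            sort_sam "SRR328680${i}.sam"
        fi
    done
fi
```

This saves disk space, since the SAM files from six paired-end samples are already large.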
5. Compute expression levels
5.1. Install FeatureCounts
#export PATH=/home/radish/bio_soft/subread-1.6.0-Linux-x86_64/bin:$PATH
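The notes stop before the counting step itself, so the following is only a sketch of how the pieces prepared so far might be combined, not the command used in the referenced article. It assumes the name-sorted BAM files from step 4 and the gene27655.gff file from step 2; the wrapper name `count_genes` and the output name counts.txt are hypothetical, and `-t gene -g ID` is an assumption about how the gene rows of this GFF3 file should be matched.

```shell
# Count read pairs per gene with featureCounts.
# -p: paired-end data, -T: threads, -t: feature type to count,
# -g: attribute used as the gene identifier, -a: annotation, -o: output table.
count_genes() {
    out=$1
    shift
    featureCounts -p -T 8 -t gene -g ID -a gene27655.gff -o "$out" "$@"
}

# Example call once the BAM files exist:
# count_genes counts.txt SRR328680*.n.bam
```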
5.2. Install Stringtie
#wget http://ccb.jhu.edu/software/stringtie/dl/stringtie-1.3.3b.Linux_x86_64.tar.gz
#tar -zvxf stringtie-1.3.3b.Linux_x86_64.tar.gz
#cd stringtie-1.3.3b.Linux_x86_64/
#pwd
Add the printed path to bashrc:
#vi ~/.bashrc
#export PATH=/home/radish/bio_soft/stringtie-1.3.3b.Linux_x86_64:$PATH
#source ~/.bashrc
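After editing bashrc it is worth confirming that every tool installed so far can be found. A minimal sketch, with a hypothetical helper `missing_tools` that prints the names the shell cannot resolve:

```shell
# Print each given command name that cannot be found on PATH.
missing_tools() {
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || echo "$t"
    done
}

# No output means all four tools are reachable.
missing_tools hisat2 samtools featureCounts stringtie
```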
To be continued