測序數(shù)據(jù)的質(zhì)控和過濾

見上一篇論文 http://m.itdecent.cn/p/0a8461b1ea7a

本文涉及到的軟件

RSEM軟件包 RSEM (RNA-Seq by Expectation-Maximization)
http://deweylab.github.io/RSEM/

安裝：
conda install -c bioconda rsem

Trinity軟件包

安裝：
conda install -c bioconda trinity

正文1.構(gòu)建參考基因組

a. 參考基因組.fa及注釋文件.gtf的下載
ensembl : ftp://ftp.ensembl.org/pub （我個(gè)人首選）
NCBI : ftp://ftp.ncbi.nih.gov/genomes/
UCSC：ftp://hgdownload.soe.ucsc.edu/goldenPath

注意：1. 參考基因組盡量下載dna.primary_assembly.fa.gz（僅包含組裝到染色體上的序列），如果沒有就下載dna.toplevel.fa.gz（包含未組裝到染色體上的scaffold），如下圖人的參考基因組：
人參考基因組：ftp://ftp.ensembl.org/pub/release-99/fasta/homo_sapiens/dna/
小鼠參考基因組：ftp://ftp.ensembl.org/pub/release-99/fasta/mus_musculus/dna/
大鼠參考基因組：ftp://ftp.ensembl.org/pub/release-99/fasta/rattus_norvegicus/dna/

image.png

2. 注釋文件下載要與參考基因組同一個(gè)網(wǎng)站，要不然后邊構(gòu)建參考基因組時(shí)會報(bào)錯(cuò)...，下載文件最好是chr.gtf.gz結(jié)尾.
人注釋文件：ftp://ftp.ensembl.org/pub/release-99/gtf/homo_sapiens
小鼠注釋文件：ftp://ftp.ensembl.org/pub/release-99/gtf/mus_musculus
大鼠注釋文件：ftp://ftp.ensembl.org/pub/release-99/gtf/rattus_norvegicus

image.png

下載完成后解壓，命令：“gunzip 文件.gz“.

image.png

b. 構(gòu)建參考基因組（以大鼠為例）
新建一個(gè)目錄，名字叫ref，將構(gòu)建好的索引放在這個(gè)目錄中。

rsem-prepare-reference --gtf /public/home/huangshsh/zxg/ref_genome/Rattus_norvegicus.Rnor_6.0.99.chr.gtf \
                                         --bowtie2 \
                                          -p 16   \
                                       /public/home/huangshsh/zxg/ref_genome/Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa  \
                                               ./rn6
#--gtf 參數(shù)后跟解壓后的注釋文件
#--bowtie2 利用bowtie2構(gòu)建索引，默認(rèn)bowtie2在path中，若沒有后邊要跟bowtie的路徑
#-p  參考基因組構(gòu)建索引的線程數(shù)，根據(jù)自己電腦實(shí)際情況來
#再跟上解壓后的參考基因組
#再跟上參考基因組索引的前綴（rn6 ）

索引構(gòu)建成功后如下圖

image.png

若出現(xiàn)錯(cuò)誤

image.png

這表明注釋文件和參考基因組不是同一來源，要換成同一來源的才行。

正文2.比對和表達(dá)定量

參考基因組索引構(gòu)建成功后，對每個(gè)樣本進(jìn)行比對和表達(dá)定量,以W2BR樣品為例

rsem-calculate-expression --strandedness reverse \
                                            --paired-end \
                                              -p 16  \
                                           --bowtie2  \
                                          ../cleandata/W2BR_P_R1.fq.gz  \
                                         ../cleandata/W2BR_P_R2.fq.gz   \
                                          /public/home/huangshsh/zxg/rat_trans_nn/rsem/ref/rn6 \
                                         W2BR


#--strandedness reverse  建庫方式，一般為illumina的方式，具體自己查資料
#--paired-end  雙尾測序
# -p 16程序運(yùn)行的線程數(shù)
# --bowtie2  比對使用bowtie2
#../cleandata/W2BR_P_R1.fq.gz 第一個(gè)測序文件
#../cleandata/W2BR_P_R2.fq.gz 第二個(gè)測序文件
# /public/home/huangshsh/zxg/rat_trans_nn/rsem/ref/rn6 參考基因組文件位置.../ref/,參考基因組索引前綴rn6
#W2BR 結(jié)果文件前綴（這個(gè)參數(shù)很容易遺漏，會報(bào)錯(cuò)的）

比對要一會，完成后結(jié)果如下：

image.png

一個(gè)文件夾和兩個(gè)文件，其中g(shù)enes.results是以基因?yàn)榛A(chǔ)的，包含了gene_id，對應(yīng)transcript_id, count,TPM和FPKM等信息。
isoforms.results是轉(zhuǎn)錄本為基礎(chǔ)的

正文3.生成表達(dá)量矩陣文件

利用trinity安裝包中的util/abundance_estimates_to_matrix.pl（一個(gè)perl腳本）

ls *.genes.results > genes.quant_files.txt   #將所有的genes.results文件的名字存到genes.quant_files.txt ，當(dāng)成一個(gè)目錄后邊要用

路徑到trinity安裝包下/util/abundance_estimates_to_matrix.pl \
      --est_method RSEM \
      --cross_sample_norm TMM \
       --gene_trans_map none  \
      --quant_files genes.quant_files.txt \
      --out_prefix rsem_matrix


#      or  /public/home/huangshsh/zxg/software/trinityrnaseq-2.9.0/util/abundance_estimates_to_matrix.pl --est_method <method> --quant_files file.listing_target_files.txt

#      Note, if only a single input file is given, it's expected to contain the paths to all the target abundance estimation files.

# Required:
#  --est_method <string>           RSEM|eXpress|kallisto|salmon  (needs to know what format to expect)
#  --gene_trans_map <string>           the gene-to-transcript mapping file. (if you don't want gene estimates, indicate 'none'.
  #                                                        可以從gtf中提取轉(zhuǎn)錄本信息，最后做成的文件要轉(zhuǎn)錄本-基因一一對應(yīng)，轉(zhuǎn)錄本一定要在前 
   #                                                        邊，兩者用\t分隔。
# Options:
#  --cross_sample_norm <string>         TMM|UpperQuartile|none   (default: TMM)
#  --name_sample_by_basedir             name sample column by dirname instead of filename
#      --basedir_index <int>            default(-2)
#  --out_prefix <string>                default: value for --est_method 前綴隨意命名，默認(rèn)為RSEM
#  --quant_files <string>              file containing a list of all the target files.

結(jié)果文件如下圖

image.png

我們后續(xù)要用到第一個(gè)，rsem_matrix.gene.counts.matrix

正文4.差異表達(dá)基因分析

要用到R包里邊的DESeq2或edgeR，所以R里邊要提前裝好DESeq2包或者edgeR包。
腳本命令用到trinity軟件目錄下的/Analysis/DifferentialExpression/run_DE_analysis.pl，所用到的腳本參數(shù)如下：

image.png

$TRINITY_HOME=~/zxg/software/trinityrnaseq-2.9.0    #trinity的安裝路徑

perl $TRINITY_HOME/Analysis/DifferentialExpression/run_DE_analysis.pl \
        --matrix ~/zxg/rat_trans_nn/rsem/rsem_matrix.gene.counts.matrix  \
        --method DESeq2 \
        --samples_file sample.list.txt

##sample.list.txt格式如上圖所示

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

2020-01-23 RSEM轉(zhuǎn)錄組RNA-seq測序分析實(shí)操

2020-01-23 RSEM轉(zhuǎn)錄組RNA-seq測序分析實(shí)操

測序數(shù)據(jù)的質(zhì)控和過濾

本文涉及到的軟件

正文1.構(gòu)建參考基因組

正文2.比對和表達(dá)定量

正文3.生成表達(dá)量矩陣文件

正文4.差異表達(dá)基因分析

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

2020-01-23 RSEM轉(zhuǎn)錄組RNA-seq測序分析實(shí)操

測序數(shù)據(jù)的質(zhì)控和過濾

本文涉及到的軟件

正文1.構(gòu)建參考基因組

正文2.比對和表達(dá)定量

正文3.生成表達(dá)量矩陣文件

正文4.差異表達(dá)基因分析

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av