miRNA Series | Data Filtering, Alignment & Target Gene Prediction

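This post mainly draws on the Zhihu post "Three ways to extract mature miRNA sequences".

There are three ways to extract the mature miRNA sequences of a species of interest:

  • Perl or Python scripts
  • R scripts
  • Regular-expression find and replace in Notepad++ or EmEditor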
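As a rough illustration of the script route, here is a minimal Python 3 sketch. It assumes the mature sequences come as a FASTA file whose identifiers begin with a species prefix such as `ath-` (the convention seen in the `ath-MIR773a` example later in this post). The function names and the two demo records are invented for illustration and are not taken from the referenced Zhihu post.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_species(lines: Iterable[str], prefix: str) -> List[Tuple[str, str]]:
    """Keep only records whose header starts with '<prefix>-', e.g. 'ath-'."""
    return [(h, s) for h, s in read_fasta(lines) if h.startswith(prefix + "-")]


# Two made-up records (hypothetical names and accessions) for demonstration.
example = [
    ">ath-demo1 DEMO0001",
    "UGACAGAAGAGAGUGAGCAC",
    ">osa-demo1 DEMO0002",
    "UGACAGAAGAGAGUGAGCAC",
]

for header, seq in extract_species(example, "ath"):
    print(">" + header)
    print(seq)
```

For a real file, pass an open file handle instead of the `example` list, since a file object is also an iterable of lines.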

miRNA analysis workflow

For data filtering I use cutadapt. The steps below follow the referenced post on miRNA data filtering, with some reorganization; thanks to its author for sharing.

I. miRNA data filtering (plants: 18~30 nt)

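cutadapt  -a  AGATCGGAAGAGCACACGTCT  -m  15  -q  20  --discard-untrimmed  -o  outname.fa  input.fastq
  • input.fastq is a placeholder name for the raw reads file; substitute your own.
  • --discard-untrimmed: discard reads in which no adapter was found.
  • -a: trim the 3' adapter (for paired-end data, from the first read). Appending $ to the sequence anchors the adapter at the 3' end of the read. The adapter sequence can be requested from the sequencing company.
  • -g: trim the 5' adapter (for paired-end data, from the first read). Prefixing the sequence with ^ anchors the adapter at the 5' end of the read.
  • -q: quality cutoff; low-quality bases (below 20 here) are trimmed from the 3' end of each read.
  • -m: discard reads shorter than 15 nt after trimming.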

This yields reads of a suitable length.
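The heading gives 18~30 nt as the size range for plant miRNAs, while the command above only sets a lower bound of 15. Below is a minimal Python 3 sketch of an explicit length filter on already-trimmed reads. The function name and the demo reads are hypothetical; cutadapt's own `-m` and `-M` (maximum length) options could serve the same purpose.

```python
from typing import Iterable, List, Tuple


def filter_by_length(
    records: Iterable[Tuple[str, str]], min_len: int = 18, max_len: int = 30
) -> List[Tuple[str, str]]:
    """Keep (name, sequence) pairs whose length lies in [min_len, max_len]."""
    return [(name, seq) for name, seq in records if min_len <= len(seq) <= max_len]


# Made-up trimmed reads of 14, 21 and 35 nt.
reads = [
    ("read1", "ACGT" * 3 + "AC"),        # 14 nt, too short
    ("read2", "UGACAGAAGAGAGUGAGCACA"),  # 21 nt, kept
    ("read3", "ACGTA" * 7),              # 35 nt, too long
]

kept = filter_by_length(reads)
print([name for name, _ in kept])
```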

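II. miRNA alignment

  • Option 1: align the reads to the ncRNAs in Rfam and remove snRNA, snoRNA, rRNA, tRNA and the like.
  • Option 2: align the reads to the reference genome of the target species and discard sequences that do not map.

To reduce alignment time, identical reads within each sample can be collapsed before alignment and written out as FASTA. Each record is named sample_rN_xM, where the number after r is the read's serial number and the number after x is how many times that read occurs.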
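The collapsing step can be sketched in a few lines of Python 3. This is only an illustration of the naming rule described above. Numbering the reads in order of decreasing abundance is my own assumption, since the text only says that the r number is a serial number. The sample name `S1` and the toy reads are made up.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def collapse_reads(seqs: Iterable[str], sample: str) -> List[Tuple[str, str]]:
    """Collapse identical reads into (name, sequence) pairs named sample_rN_xM.

    N is a running serial number (assigned here by decreasing abundance,
    which is an assumption) and M is the number of times the read occurred.
    """
    counts = Counter(seqs)
    return [
        (f"{sample}_r{i}_x{n}", seq)
        for i, (seq, n) in enumerate(counts.most_common(), start=1)
    ]


# Toy input: six reads, three distinct sequences.
reads = ["AAA", "CCC", "AAA", "GGG", "CCC", "AAA"]

for name, seq in collapse_reads(reads, "S1"):
    print(">" + name)
    print(seq)
```

With this layout the aligner sees each distinct sequence once, and the read count can be recovered later from the `_xM` suffix of the record name.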

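Using miR-PREFeR

Introduction: miR-PREFeR stands for microRNA PREdiction From small RNAseq data. This section mainly follows the tutorial on GitHub.
Reads are aligned to the reference genome, and miR-PREFeR is then used to identify novel miRNAs.

Workflow

1. Required programs

a. Install ViennaRNA beforehand, preferably version 1.8.5, 2.1.2, or 2.1.5 and later.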

wget  https://www.tbi.univie.ac.at/RNA/download/sourcecode/2_4_x/ViennaRNA-2.4.18.tar.gz
tar  zvxf  ViennaRNA-2.4.18.tar.gz
cd  ViennaRNA-2.4.18
./configure --prefix="/user/tools/ViennaRNA/" --without-perl
make
make  install

b. Install samtools (version 0.1.15 or later)

cd   /manager/biosoft/
tar  jfx  samtools-0.1.19.tar.bz2
cd  samtools-0.1.19
make 

Note: miR-PREFeR is written for Python 2, so running it under Python 3 raises errors.

The current version is only tested under Python 2.6.7, Python 2.7.2 and Python 2.7.3 and should work under Python 2.6. and Python 2.7.

2. Obtain and install the pipeline

git clone https://github.com/hangelwen/miR-PREFeR.git

If you cannot download it with git, you can get it from my network drive.
Link: https://pan.baidu.com/s/1UqkKYDOGcjv13dHm9pi9ew
Extraction code: volh

3. Test the pipeline (for checking the installation; can be skipped)

The authors thoughtfully provide test data (example/exampledata.tar.gz) and instructions for testing the whole pipeline (HOW_TO_RUN_EXAMPLE.txt).

The contents of HOW_TO_RUN_EXAMPLE.txt are reproduced below.

================================================================================
1. Test the pipeline.

# The package provides a small example dataset for testing the pipeline. The
# dataset is for Arabidopsis, chromosome 1. To run the example, first change
# directory to the example folder:

cd  example
# Then decompress the exampledata.tar.gz file:
tar  xvf  exampledata.tar.gz

# Then open the config.example file, change the PIPELINE_PATH to the path where
# you put the miR-PREFeR package folder. For example, if you put miR-PREFeR at
# /home/username/tools/miR-PREFeR-v0.09, then set PIPELINE_PATH as:
PIPELINE_PATH=/home/username/tools/miR-PREFeR-v0.09

# Save the config.example file. In the example folder, execute command:
python  ../miR_PREFeR.py  -L  -k  pipeline  config.example

# The -L option generates a log file in the output directory example-result. The
# -k option keeps the temp directory used to store the intermediate files. The
# temp directory is in the example-result directory.

# If you have python, samtools, RNALfold installed and in the PATH, you should be
# able to run the test program. It takes about one or two minutes to
# finish. You'll be able to see the result in the example-result folder.



================================================================================
2. Test how to do checkpointing.

# Before testing this, if you have run the pipeline with the config.example file
# in this folder, please remove the example-result folder first.

# Then change the 'CHECKPOINT_SIZE' option to a smaller value (30, for
# example). The reason to do this is that by default the pipeline makes a
# checkpoint after finishing folding every 3000 sequences, but the sample data is
# so small that the total number of sequences is smaller than the default.

# Then run the pipeline with 'pipeline' command:
python  ../miR_PREFeR.py  -L  -k  pipeline  config.example

# After running for a while (10 seconds, for example. You should let it run for
# enough time to do at least one checkpoint. A "Done" is shown when a checkpoint
# is applied), kill the process by "Ctrl-C". To check where the pipeline was stopped,
# run:
python ../miR_PREFeR.py -L check config.example

# This will show the checkpoint information.

# To restart the pipeline from where it was stopped, run:
python  ../miR_PREFeR.py  -L  recover  config.example

# The pipeline will continue to finish the job specified in the config.example
# file.
================================================================================

4. How to run the pipeline (now for the real work)

a. Prepare input data for the pipeline.
  1. A fasta file, which contains the genome sequences of the species under study.
  2. One or more SAM files which contain the alignments of small RNAseq data with the genome.
  3. (Optional) A GFF (http://www.sanger.ac.uk/resources/software/gff/spec.html) file which lists regions in the genome sequences that should be ignored from miRNA analysis.

a). Genome fasta file (details on item 1, the fasta file)

Fasta format specification can be found at http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml. In miR-PREFeR, for the string following ">", only the first word that is delimited by any white space characters (space, tab, etc.) is used. For example, for the following sequence, 'ath-MIR773a' is used as the identifier of the sequence. Thus, please ensure that all the sequences in the FASTA files have different identifiers.

>ath-MIR773a MI0005103
AGGAGGCAAUAGCUUGAGCAAAUAAUUGAUUGCAGAAGUCCAUCGACUAAAGCUGUCACCUGUUUGCUUCCAGCUUUUGUCUCCU
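
Because only the first word of each header counts, two records with different descriptions can still clash. The following Python 3 sketch shows one way to check a genome FASTA for such clashes before running the pipeline. It is a standalone helper, not part of miR-PREFeR; the function names, the second `ath-MIR773a` record with its `another-description` text, and the `chr1` record are made up for the demonstration.

```python
from collections import Counter
from typing import Iterable, List


def fasta_identifiers(lines: Iterable[str]) -> List[str]:
    """Return the first whitespace-delimited word of every '>' header line."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            ids.append(words[0] if words else "")
    return ids


def duplicated_identifiers(lines: Iterable[str]) -> List[str]:
    """Return identifiers that occur more than once, sorted alphabetically."""
    counts = Counter(fasta_identifiers(lines))
    return sorted(name for name, n in counts.items() if n > 1)


# The first header is from the text above; the other two are invented.
example = [
    ">ath-MIR773a MI0005103",
    "AGGAGGCAAUAGCUUGAGCAAAUAAUUGAUUGCAG",
    ">ath-MIR773a\tanother-description",
    "ACGU",
    ">chr1",
    "ACGUACGU",
]

print(fasta_identifiers(example))
print(duplicated_identifiers(example))
```

An empty result from `duplicated_identifiers` means every sequence has a distinct identifier under the first-word rule.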
b). SAM alignment files (details on item 2, the SAM files)

The miR-PREFeR pipeline takes SAM format alignment files. SAM alignment files can be generated by many aligners. Here we use Bowtie (http://bowtie-bio.sourceforge.net/index.shtml) as an example.

That's all for today; to be continued...
