1. Introduction
Snippy is a SNP-calling tool. It can identify core SNPs and produce an alignment from which a phylogenetic tree can be built.
Snippy finds SNPs between a haploid reference genome and your NGS sequence reads. It will find both substitutions (snps) and insertions/deletions (indels). It will use as many CPUs as you can give it on a single computer (tested to 64 cores). It is designed with speed in mind, and produces a consistent set of output files in a single folder. It can then take a set of Snippy results using the same reference and generate a core SNP alignment (and ultimately a phylogenomic tree).
2. Installation
Snippy can be installed with conda:
conda install -c bioconda snippy
Alternatively, install the latest version directly from GitHub (in several attempts, conda installed an older version that did not include snippy-multi):
cd $HOME
git clone https://github.com/tseemann/snippy.git
$HOME/snippy/bin/snippy --help
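To avoid typing the full path each time, the cloned bin directory can be added to PATH. The following is a minimal sketch that assumes the repository was cloned to $HOME/snippy as above; the SNIPPY_BIN variable name is only illustrative.

```shell
# Assumption: the repository lives in $HOME/snippy (see the git clone step).
SNIPPY_BIN="$HOME/snippy/bin"

# Prepend the directory only if it is not already on PATH.
case ":$PATH:" in
  *":$SNIPPY_BIN:"*) ;;                       # already present, nothing to do
  *) PATH="$SNIPPY_BIN:$PATH"; export PATH ;;
esac

# Report whether the shell can now find snippy.
if command -v snippy >/dev/null 2>&1; then
  echo "snippy found at: $(command -v snippy)"
else
  echo "snippy not found; check that $SNIPPY_BIN exists" >&2
fi
```

Putting the same lines in ~/.bashrc (or the equivalent for your shell) makes the change persist across sessions.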
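3. Running
Commonly used snippy options are: the output directory (--outdir); the reference genome file (--ref); the input, which can be single-end (--se) or paired-end (--R1, --R2) FASTQ files, a FASTA file of contigs (--ctgs), or a BAM file (--bam); and the number of CPUs (--cpus, default 8).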
snippy [options] --outdir <dir> --ref <ref> --R1 <R1.fq.gz> --R2 <R2.fq.gz> --cpus 10
snippy [options] --outdir <dir> --ref <ref> --se <R.fq.gz> --cpus 10
snippy [options] --outdir <dir> --ref <ref> --ctgs <contigs.fa> --cpus 10
snippy [options] --outdir <dir> --ref <ref> --bam <reads.bam> --cpus 10
The full list of options is as follows:
RESOURCES
--cpus N Maximum number of CPU cores to use (default '8')
--ram N Try and keep RAM under this many GB (default '8')
--tmpdir F Fast temporary storage eg. local SSD (default '/tmp')
INPUT
--reference F Reference genome. Supports FASTA, GenBank, EMBL (not GFF) (default '')
--R1 F Reads, paired-end R1 (left) (default '')
--R2 F Reads, paired-end R2 (right) (default '')
--se F Single-end reads (default '')
--ctgs F Don't have reads use these contigs (default '')
--peil F Reads, paired-end R1/R2 interleaved (default '')
--bam F Use this BAM file instead of aligning reads (default '')
--targets F Only call SNPs from this BED file (default '')
--subsample n.n Subsample FASTQ to this proportion (default '1')
OUTPUT
--outdir F Output folder (default '')
--prefix F Prefix for output files (default 'snps')
--report Produce report with visual alignment per variant (default OFF)
--cleanup Remove most files not needed for snippy-core (inc. BAMs!) (default OFF)
--rgid F Use this @RG ID: in the BAM header (default '')
--unmapped Keep unmapped reads in BAM and write FASTQ (default OFF)
PARAMETERS
--mapqual N Minimum read mapping quality to consider (default '60')
--basequal N Minimum base quality to consider (default '13')
--mincov N Minimum site depth to for calling alleles (default '10')
--minfrac n.n Minumum proportion for variant evidence (0=AUTO) (default '0')
--minqual n.n Minumum QUALITY in VCF column 6 (default '100')
--maxsoft N Maximum soft clipping to allow (default '10')
--bwaopt F Extra BWA MEM options, eg. -x pacbio (default '')
--fbopt F Extra Freebayes options, eg. --theta 1E-6 --read-snp-limit 2 (default '')
snippy-multi can also generate a shell script that runs many samples in batch:
snippy-multi abc.txt --reference ref.gbk --cpus 10 > run_snp.sh
nohup sh ./run_snp.sh &
The inputs to snippy-multi are:
- A tab-separated list file of sample names and file paths, in the following format:
abc.txt:
a /Absolute path/a.fq.gz
b /Absolute path/b.fq.gz
c /Absolute path/c.fq.gz
...
- The reference sequence file, which can be a FASTA file or a GenBank (gbk) file.
- The fq or fasta files to be analyzed.
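The list file described above can be written by hand, but for many samples it is easier to generate it. The sketch below assumes single-end reads named <sample>.fq.gz in one directory and a tab between the sample name and the path; the function name make_sample_list is hypothetical and not part of Snippy.

```shell
# make_sample_list DIR
# Print one "NAME<TAB>/absolute/path/NAME.fq.gz" line per *.fq.gz file in DIR.
make_sample_list() {
  dir=$(cd "$1" && pwd) || return 1      # resolve DIR to an absolute path
  for fq in "$dir"/*.fq.gz; do
    [ -e "$fq" ] || continue             # skip the literal pattern when nothing matches
    id=$(basename "$fq" .fq.gz)          # sample name = file name without .fq.gz
    printf '%s\t%s\n' "$id" "$fq"
  done
}

# Example (the ./reads directory is an assumption):
#   make_sample_list ./reads > abc.txt
```

Paired-end data would need both read files on each line, so this sketch would have to be adapted for that case.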
e.g. the generated script (more run_snp.sh):
snippy --outdir a --ref ref.fas --se a.fq.gz --cpus 10
snippy --outdir b --ref ref.fas --se b.fq.gz --cpus 10
snippy --outdir c --ref ref.fas --se c.fq.gz --cpus 10
...
snippy-core --ref 'a/ref.fa' a b c ...
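The generated run_snp.sh runs the commands one at a time. On a sufficiently powerful server, you can edit the script and add nohup ... & to each snippy command line so that several snippy commands run at the same time; once all snippy runs have finished, run the snippy-core command separately.
Once the commands above have finished, run the following to build the phylogenetic tree. Each step reads the output of the previous one, so start each command only after the one before it has completed: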
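As an alternative to editing the script by hand, the per-sample lines can be run in parallel and the snippy-core line held back until they are all done. This is only a sketch: run_snippy_parallel is a hypothetical helper, it assumes GNU xargs (for -d, -r and -P), and it assumes the script has the layout shown above, with per-sample lines starting with "snippy " and the final line starting with "snippy-core ".

```shell
# run_snippy_parallel SCRIPT [NJOBS]
# Run the per-sample snippy lines of SCRIPT, at most NJOBS at a time (default 2),
# then run the snippy-core line once every per-sample job has exited.
run_snippy_parallel() {
  script=$1
  njobs=${2:-2}

  # '^snippy ' (with the trailing space) matches per-sample lines but not snippy-core.
  # Each matching line is handed to its own "sh -c" as a single argument.
  grep '^snippy ' "$script" |
    xargs -r -d '\n' -n 1 -P "$njobs" sh -c || return 1

  # xargs returns only after all of its jobs have finished,
  # so the core step cannot start early.
  grep '^snippy-core ' "$script" | sh
}

# Example: nothing runs until the function is called.
#   run_snippy_parallel run_snp.sh 3
```

Each snippy line already requests its own --cpus, so the total core demand is roughly NJOBS times that value; choose NJOBS with the size of the server in mind.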
nohup snippy-clean_full_aln core.full.aln > clean.full.aln &
nohup run_gubbins.py -p gubbins clean.full.aln & # if this fails, try adjusting --filter_percentage, e.g. 50
nohup snp-sites -c gubbins.filtered_polymorphic_sites.fasta > clean.core.aln &
nohup FastTree -gtr -nt clean.core.aln > clean.core.tree &
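Because each of the four commands consumes the file written by the one before it, they can also be chained so that a step starts only when the previous one has succeeded. The sketch below wraps the same commands, with the same file names, in a hypothetical function build_tree; it assumes core.full.aln is in the current directory and that all four tools are on PATH.

```shell
# build_tree: run the four tree-building steps in order.
# "&&" stops the chain at the first step that fails, so a later step
# never reads a missing or half-written input file.
build_tree() {
  snippy-clean_full_aln core.full.aln > clean.full.aln &&
  run_gubbins.py -p gubbins clean.full.aln &&
  snp-sites -c gubbins.filtered_polymorphic_sites.fasta > clean.core.aln &&
  FastTree -gtr -nt clean.core.aln > clean.core.tree
}

# Example: call it from the directory that holds core.full.aln.
#   build_tree && echo "tree written to clean.core.tree"
```

Saved in a script that ends by calling build_tree, the whole chain can be sent to the background with a single nohup instead of one per command.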
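The snippy, snippy-core, snippy-multi, and snippy-clean_full_aln commands are in ~/snippy/bin/, and the snp-sites command is in ~/snippy/binaries/linux/. run_gubbins.py requires a separate Gubbins installation (conda install -c bioconda gubbins), and FastTree must also be installed separately if it cannot be found.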