Sambamba: a tool for removing duplicate reads

Preface

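Why use this tool?
Because I heard it is fast, and because I got burned by both samtools markdup and Picard. When I ran samtools markdup, it told me I first had to run fixmate and sort by read name, but I had already sorted my BAM the default way. Hmm. And after removing duplicates with GATK Picard, the output file was larger than the original. What on earth did it add?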

Here is the link to the tool's documentation:

http://lomereiter.github.io/sambamba/docs/sambamba-markdup.html

Getting started

gzip -d sambamba-0.6.8.gz
chmod a+x sambamba-0.6.8

./sambamba-0.6.8

Download it, decompress it, and put it on your PATH. It is that simple; no installation is needed.
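As a minimal sketch of the "put it on your PATH" step: the directory `$HOME/bin` and the renamed binary `sambamba` are assumptions chosen for illustration, not something the original post specifies. Any directory already on your PATH would work just as well.

```shell
# Hypothetical install directory (an assumption); override BIN_DIR if you prefer another.
BIN_DIR="${BIN_DIR:-$HOME/bin}"
mkdir -p "$BIN_DIR"

# Move the decompressed binary there only if it exists in the current directory.
# Renaming it to plain "sambamba" is a convenience so the version suffix can be dropped.
if [ -f sambamba-0.6.8 ]; then
    mv sambamba-0.6.8 "$BIN_DIR/sambamba"
fi

# Make the directory visible to the current shell session.
export PATH="$BIN_DIR:$PATH"
```

To make the change permanent, the `export` line would usually go into a shell startup file such as `~/.bashrc`.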

NAME

sambamba-markdup - finding duplicate reads in BAM file

SYNOPSIS

sambamba markdup OPTIONS <input.bam> <output.bam>

DESCRIPTION

Marks (by default) or removes duplicate reads. For determining whether a read is a duplicate or not, the same criteria as in Picard are used.

OPTIONS

-r, --remove-duplicates
remove duplicates instead of just marking them

-t, --nthreads=NTHREADS
number of threads to use

-l, --compression-level=N
specify compression level of the resulting file (from 0 to 9)

-p, --show-progress
show progressbar in STDERR

--tmpdir=TMPDIR
specify directory for temporary files; default is /tmp

--hash-table-size=HASHTABLESIZE
size of hash table for finding read pairs (default is 262144 reads); will be rounded down to the nearest power of two; should be > (average coverage) * (insert size) for good performance

--overflow-list-size=OVERFLOWLISTSIZE
size of the overflow list where reads, thrown away from the hash table, get a second chance to meet their pairs (default is 200000 reads); increasing the size reduces the number of temporary files created

--io-buffer-size=BUFFERSIZE
controls sizes of two buffers of BUFFERSIZE megabytes each, used for reading and writing BAM during the second pass (default is 128)
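The options above can be combined roughly as follows. This is only a sketch: the file names, thread count, coverage, and insert size are made-up values, and the sizing loop is one possible reading of the "should be > (average coverage) * (insert size)" hint. It picks a power of two so that rounding down does not shrink the value. The sambamba call itself only runs if the binary and the input file are actually present.

```shell
IN_BAM="sample.sorted.bam"     # hypothetical input file (no spaces in the path)
OUT_BAM="sample.rmdup.bam"     # hypothetical output file
THREADS=4                      # assumed thread count
TMP_DIR="${TMPDIR:-/tmp}"      # /tmp is the documented default

COVERAGE=30                    # assumed average coverage
INSERT_SIZE=500                # assumed insert size
NEEDED=$((COVERAGE * INSERT_SIZE))

# Start from the documented default (262144) and double until it exceeds NEEDED.
HASH_SIZE=262144
while [ "$HASH_SIZE" -le "$NEEDED" ]; do
    HASH_SIZE=$((HASH_SIZE * 2))
done

# Build the command as a string so it can be inspected before running.
CMD="sambamba markdup -r -t $THREADS -p --tmpdir=$TMP_DIR --hash-table-size=$HASH_SIZE $IN_BAM $OUT_BAM"
echo "$CMD"

# Run only when sambamba is installed and the input exists.
if command -v sambamba >/dev/null 2>&1 && [ -f "$IN_BAM" ]; then
    $CMD
fi
```

Dropping `-r` would mark duplicates instead of removing them, which is the default behaviour described above.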

Test

Duplicate removal is very fast: on a 3 GB BAM file it took only about 1 minute.
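One way to check the timing on your own data is sketched below. The file names and thread count are placeholders, not the ones used for the measurement above, and the elapsed time will depend on hardware and thread count.

```shell
IN_BAM="sample.sorted.bam"     # hypothetical input file
OUT_BAM="sample.rmdup.bam"     # hypothetical output file

START=$(date +%s)              # wall-clock seconds before the run

# Run only when sambamba is installed and the input exists.
if command -v sambamba >/dev/null 2>&1 && [ -f "$IN_BAM" ]; then
    sambamba markdup -r -t 4 "$IN_BAM" "$OUT_BAM"
fi

END=$(date +%s)                # wall-clock seconds after the run
ELAPSED=$((END - START))
echo "markdup took ${ELAPSED}s"
```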
