HiC-Pro + EndHiC (Part 2): Chromosome-Level Scaffolding

EndHiC

Compared with HiC-Pro, EndHiC is much easier to install: just download it and it is ready to use.

Installing EndHiC

git clone git@github.com:fanagislab/EndHiC.git

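All the scripts you need are in the cloned folder and can be called directly. (The SSH URL above requires an SSH key registered with GitHub; git clone https://github.com/fanagislab/EndHiC.git works without one.)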
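Because the scripts are run straight from the cloned folder, a quick check that the ones used in this post are present can save a failed run later. This is a minimal sketch assuming a bash-compatible shell; check_endhic_scripts is a hypothetical helper name, and the script list is only the set called later in this post, not an official EndHiC manifest. It also sets the executable bit, since the scripts below call matrix2heatmap.py and cluster2bed.pl directly rather than through perl or python.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the EndHiC scripts used in this post exist
# in the given folder and make them executable. Returns the number missing.
check_endhic_scripts() {
    local dir="$1" s missing=0
    for s in fastaDeal.pl endhic.pl cluster2agp.pl agp2fasta.pl \
             cluster2bed.pl matrix2heatmap.py; do
        if [ -f "${dir}/${s}" ]; then
            # matrix2heatmap.py and cluster2bed.pl are invoked without an
            # interpreter in front, so they need the executable bit
            chmod +x "${dir}/${s}"
        else
            echo "missing: ${dir}/${s}" >&2
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

Usage, with the install path from the script below: check_endhic_scripts /share/home/off_wenhao/biosoft/EndHiC && echo "EndHiC scripts found".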

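How do you use it? Frankly, the instructions on GitHub are rather sketchy.

It is more straightforward to read the script in the example data that ships with it.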

Using EndHiC

The example script provided with EndHiC

$ cat biosoft/EndHiC/z.testing_data/Arabidopsis_thalina/work.sh
##Atha.contigs.fa is generated by Hifiasm
##AthaHiC_100000_abs.bed, AthaHiC_100000.matrix, AthaHiC_100000_iced.matrix are generated by HiC-pro using Atha.contigs.fa as the reference genome

gzip -d Atha.contigs.fa.gz

##get contig length
perl ../../fastaDeal.pl -attr id:len Atha.contigs.fa > Atha.contigs.fa.len

##draw contig Hi-C heatmaps with 10*100000 (1-Mb) resolution
../../matrix2heatmap.py AthaHiC_100000_abs.bed AthaHiC_100000.matrix 10

##Run one round, when the contig assembly is quite good
perl ../../endhic.pl Atha.contigs.fa.len AthaHiC_100000_abs.bed AthaHiC_100000.matrix AthaHiC_100000_iced.matrix

ln Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster* ./


##convert cluster file to agp file
perl ../../cluster2agp.pl Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster Atha.contigs.fa.len > Atha.scaffolds.agp

##get final scaffold sequence file
perl ../../agp2fasta.pl Atha.scaffolds.agp Atha.contigs.fa > Atha.scaffolds.fa

##draw HiC heatmaps for scaffolds with 10*100000 (1-Mb) resolution
../../cluster2bed.pl AthaHiC_100000_abs.bed z.EndHiC.A.results.summary.cluster > clusterA_100000_abs.bed 2> clusterA.id.len
../../matrix2heatmap.py clusterA_100000_abs.bed AthaHiC_100000.matrix 10

##Here, Arabidopsis thalina has 5 chromosomes, and all these chromosomes can be successfully scaffolded by EndHiC
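The first step only needs a two-column table of contig ID and contig length. If fastaDeal.pl will not run on your system, the awk sketch below produces the same kind of table. It assumes the contig ID is the first word of each FASTA header and writes tab-separated ID/length pairs; it has not been compared byte-for-byte with fastaDeal.pl's output, so check the two against each other on a small file before substituting. fasta_lengths is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<contig_id><TAB><length>" for each FASTA record.
# Assumption: the ID is the first whitespace-separated word after '>'.
fasta_lengths() {
    awk '
        /^>/ {
            # flush the previous record before starting a new one
            if (id != "") print id "\t" len
            id = substr($1, 2); len = 0; next
        }
        {
            gsub(/[ \t\r]/, "")   # ignore stray spaces and CR line endings
            len += length($0)
        }
        END { if (id != "") print id "\t" len }
    ' "$1"
}
```

Usage: fasta_lengths contig.fa > contigs.fa.len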

The input data are exactly the files HiC-Pro produced in the previous step.
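Before launching EndHiC, it is worth confirming that the three HiC-Pro files it reads (the raw abs.bed and matrix, plus the ICE-normalized matrix) exist and are non-empty. A sketch, assuming the hic_results/matrix/<name>/raw|iced/<resolution>/ layout that the paths in the script below follow; check_hicpro_inputs is a hypothetical helper name and 100000 is just the default resolution used in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the HiC-Pro files EndHiC needs are present.
# $1 = HiC-Pro matrix dir (.../hic_results/matrix/<name>), $2 = name,
# $3 = resolution in bp (defaults to 100000). Returns the number missing.
check_hicpro_inputs() {
    local matrix_dir="$1" name="$2" res="${3:-100000}" f missing=0
    for f in "raw/${res}/${name}_${res}_abs.bed" \
             "raw/${res}/${name}_${res}.matrix" \
             "iced/${res}/${name}_${res}_iced.matrix"; do
        # -s: the file exists and is not empty
        if [ ! -s "${matrix_dir}/${f}" ]; then
            echo "missing or empty: ${matrix_dir}/${f}" >&2
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

Usage, near the top of the script: check_hicpro_inputs "$hic_pro_dir" "$name" 100000 || exit 1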

Improved script

contig=/share/home/off/Work/Genome_assembly/Assembly/contig.fa  ##contig file; must be the same contig file used as the HiC-Pro reference
endhic_dir=/share/home/off_wenhao/biosoft/EndHiC    ##EndHiC installation path
name=dlo    ##species name; must match the name used in HiC-Pro, i.e. the ${name} in the HiC-Pro output folder `${name}_outdir_new`

##get contig length
perl ${endhic_dir}/fastaDeal.pl -attr id:len ${contig} > contigs.fa.len

##draw contig Hi-C heatmaps with 10*100000 (1-Mb) resolution
hic_pro_dir=/share/home/off/Work/Genome_assembly/Assembly/08.EndHiC/01.hicprp/${name}_outdir_new/hic_results/matrix/${name}


${endhic_dir}/matrix2heatmap.py ${hic_pro_dir}/raw/100000/${name}_100000_abs.bed ${hic_pro_dir}/raw/100000/${name}_100000.matrix 10

##Run one round, when the contig assembly is quite good

perl ${endhic_dir}/endhic.pl contigs.fa.len ${hic_pro_dir}/raw/100000/${name}_100000_abs.bed ${hic_pro_dir}/raw/100000/${name}_100000.matrix ${hic_pro_dir}/iced/100000/${name}_100000_iced.matrix

ln Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster* ./

##convert cluster file to agp file
perl ${endhic_dir}/cluster2agp.pl Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster contigs.fa.len > scaffolds.agp

##get final scaffold sequence file
perl ${endhic_dir}/agp2fasta.pl scaffolds.agp ${contig} > ${name}.scaffolds.fa

##draw HiC heatmaps for scaffolds with 10*100000 (1-Mb) resolution
${endhic_dir}/cluster2bed.pl ${hic_pro_dir}/raw/100000/${name}_100000_abs.bed Round_A.04.summary_and_merging_results/z.EndHiC.A.results.summary.cluster > clusterA_100000_abs.bed 2> clusterA.id.len
${endhic_dir}/matrix2heatmap.py clusterA_100000_abs.bed ${hic_pro_dir}/raw/100000/${name}_100000.matrix 10
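The comments at the top of the script stress that the contig file and the name must match what HiC-Pro used. One cheap sanity check is to compare the contig IDs in contigs.fa.len with the sequence names in column 1 of the HiC-Pro abs.bed. The sketch below prints IDs found on only one side, so empty output means the two agree. It assumes both files are whitespace-delimited with the ID in the first column; compare_contig_ids is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report contig IDs present in only one of the files.
# $1 = contigs.fa.len (ID in column 1), $2 = HiC-Pro *_abs.bed (seq name in column 1)
compare_contig_ids() {
    local len_file="$1" abs_bed="$2" tmp
    tmp=$(mktemp -d) || return 2
    # collect unique IDs from column 1 of each file, skipping blank lines
    awk 'NF {print $1}' "$len_file" | LC_ALL=C sort -u > "$tmp/len.ids"
    awk 'NF {print $1}' "$abs_bed"  | LC_ALL=C sort -u > "$tmp/bed.ids"
    # comm needs both inputs sorted in the same collation order
    LC_ALL=C comm -23 "$tmp/len.ids" "$tmp/bed.ids" | awk '{print "only_in_fasta\t" $0}'
    LC_ALL=C comm -13 "$tmp/len.ids" "$tmp/bed.ids" | awk '{print "only_in_hicpro\t" $0}'
    rm -rf "$tmp"
}
```

Usage: compare_contig_ids contigs.fa.len ${hic_pro_dir}/raw/100000/${name}_100000_abs.bed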

Results

clusterA.id.len
clusterA_100000_abs.bed
clusterA_100000_abs.bed.pdf
endhic.100000.10.iced.sh
endhic.100000.20.iced.sh
endhic.100000.5.iced.sh
endhic.100000.10.raw.sh
endhic.100000.20.raw.sh
endhic.100000.5.raw.sh
endhic.100000.15.raw.sh
endhic.100000.25.raw.sh
endhic.Round_A.sh
endhic.100000.15.iced.sh
endhic.100000.25.iced.sh
endhic.log
EndHic.sh
dlo.scaffolds.fa
Round_A.01.contig_end_contact_results/
Round_A.02.GFA_contig_graph_results/
Round_A.03.cluster_order_orient_results/
Round_A.04.summary_and_merging_results/
scaffolds.agp
contigs.fa.len
z.EndHiC.A.results.summary.cluster
z.EndHiC.A.results.summary.cluster.GFA.v1.2.GFA
z.EndHiC.A.results.summary.cluster.GFA

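There are many output files, but the only two we really need are scaffolds.agp and ${name}.scaffolds.fa (here dlo.scaffolds.fa): the AGP file is the map describing how contigs are ordered and oriented into scaffolds, and the FASTA file holds the scaffold sequences.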
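To see at a glance how many scaffolds were built and how many contigs each one absorbed, the AGP can be summarized with awk. This sketch assumes cluster2agp.pl writes standard tab-separated AGP (column 1 object, column 3 object end, column 5 component type with W for contigs); summarize_agp is a hypothetical helper name. For a good assembly, the number of large scaffolds at the top of the list should be close to the chromosome number, as in the Arabidopsis example with its 5 chromosomes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-scaffold summary of a standard AGP file.
# Prints: scaffold <TAB> number_of_contigs <TAB> scaffold_length, longest first.
summarize_agp() {
    awk -F'\t' '
        /^#/ || NF < 8 { next }                    # skip comments / malformed lines
        {
            if ($3 + 0 > len[$1] + 0) len[$1] = $3   # largest object_end = length
            if ($5 == "W") n[$1]++                     # W = contig component
        }
        END { for (s in len) print s "\t" (n[s] + 0) "\t" len[s] }
    ' "$1" | sort -t "$(printf '\t')" -k3,3nr -k1,1
}
```

Usage: summarize_agp scaffolds.agp | head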
