三維基因組小記

Zero 全基因組比對(duì)

比對(duì)軟件參考:https://github.com/carbonelab/lastz-pipeline/tree/main
準(zhǔn)備文件:

target.fa  target.2bit target.size
quey.fa qury.2bit qury.size
  1. 更改lastz-pipe.R的比對(duì)參數(shù)
1. 若物種分化時(shí)間很近,例如人和猩猩,使用默認(rèn)distance="near"
2. 若物種分化時(shí)間超過(guò)100MYA,使用distance="far"
3. 其他使用distance="medium"
  1. 運(yùn)行
Rscript lastz-pipe.R -t  target.2bit -q qury.2bit -T target.size -Q qury.size -d /abo_path_to_output -p 32

輸出為 target.qury.all.chain

One 單基因組AB區(qū)室統(tǒng)計(jì)

Pan-3D genome analysis reveals structural and functional diferentiation of soybean genomesFigure 1

在這之前,生成每個(gè)物種的AB區(qū)室結(jié)果,接hic-pro的輸出
run_call_AB.sh prefix bin_size annoate.gff prefix.ref.fasta prefix.ref.fasta.fai /path_to_01.cal-corr.py
其中:
prefix :物種名稱
bin_size :分辨率
annoate.gff :注釋文件
prefix.ref.fasta :參考基因組
prefix.ref.fasta.fai : 索引
/path_to_01.cal-corr.py: 指向腳本的絕對(duì)路徑
eg.
run_call_AB.sh golani 100000 golani.chr.v4.gff Golani.Chr.v4.fasta Golani.Chr.v4.fasta.fai /path_to_01.cal-corr.py

最后結(jié)果為:prefix.combine.PC1.bed

輸入:
物種的染色體位置及PC1的值,如圖


微信截圖_20241121213921.png

其中,正值是經(jīng)過(guò)校正的PC1的值,代表A,反之代表B,空值為None

python3 01.py carmeli.PC1.bed > carmeli.AB.bed

output file format:

Chr1    0       50000   None
Chr1    50000   100000  B
Chr1    100000  150000  B
Chr1    150000  200000  A
Chr1    200000  250000  B
Chr1    250000  300000  None
...
python3 02.py 01.output species_name > prefix.AperEvChr.stat
eg. python3 02.py carmeli.AB.bed carmeli > carmeli.AperEvChr.stat

output file format: prefix.AperEvChr.stat

prefix  chr_name A_count  all_count  A_percentage

carmeli Chr1    2777    4991    0.5564015227409337
carmeli Chr2    1824    4570    0.39912472647702407
carmeli Chr3    1424    3590    0.3966573816155989
carmeli Chr4    1392    3576    0.38926174496644295
...
python3 03.py 02.output perfix.GC.stat > prefix.AperEvChr_GC.stat
eg. python3 03.py golani.AperEvChr.stat golani.GC.stat > golani.AperEvChr_GC.stat
perfix.GC.stat:
Chr1    40.31
Chr2    40.19
Chr3    40.52
Chr4    40.34
...
it can be got by :
seqkit fx2tab prefix.ref.fa -g -n > prefix.GC.stat

output file format: prefix.AperEvChr_GC.stat

prefix chr_name GC A_per
golani  Chr1    40.31   0.42178301093355763
golani  Chr2    40.19   0.36876640419947504
golani  Chr3    40.52   0.4832324173265021
python3 04.py prefix.1Mwin.bed prefix.1Mwin_TSS.output prefix.AperEvChr_GC.stat > prefix.AperEvChr_GC_TSS.stat
eg. python3 04.py carmeli.1Mwin.bed carmeli.1Mwin_TSS.output carmeli.AperEvChr_GC.stat > carmeli.AperEvChr_GC_TSS.stat
prefix.1Mwin.bed :
it can be got by :
bedtools makewindows -g carmeli.size -w 100000 > carmeli.1Mwin.bed

prefix.TSS.bed
it can be got by :
cat prefix.anno.gff | awk '$3=="gene"{print$0}' | awk '{print$1"\t"$4"\t"$4"\t"$9}' > prefix.TSS.bed

prefix.1Mwin_TSS.output
it can be got by :
bedtools intersect -a carmeli.1Mwin.bed -b carmeli.TSS.bed -wo > carmeli.1Mwin_TSS.output

output file format: prefix.AperEvChr_GC_TSS.stat

prefix chr_name GC A_per  TSS_density(/Mb)
carmeli Chr1    40.79   0.5564015227409337      0.7439903846153846
carmeli Chr2    40.17   0.39912472647702407     0.7452954048140044
carmeli Chr3    43.12   0.3966573816155989      0.8044568245125349
...
python3 05.py prefix.size prefix.Chr_repeat.locat prefix.AperEvChr_GC_TSS.stat > prefix.AperEvChr_GC_TSS_repeat.stat
eg. python3 05.py carmeli.size carmeli.Chr_repeat.locat carmeli.AperEvChr_GC_TSS.stat > carmeli.AperEvChr_GC_TSS_repeat.stat
prefix.size
chr_id  size
Chr1    249545545
Chr2    228452398
Chr3    179495747
it can be got by 
its easy you know ? : )

prefix.Chr_repeat.locat
it can be got by :
cat prefix.repeat.chr.gff  | cut -f 1,4,5 | bedtools merge -i - | grep "Chr" > carmeli.Chr_repeat.locat

output file format: prefix.AperEvChr_GC_TSS_repeat.stat

prefix chr GC A_per TSS_density(/Mb) repeat_density
carmeli Chr1    40.79   0.5564015227409337      0.7439903846153846      0.5028114727513969
carmeli Chr2    40.17   0.39912472647702407     0.7452954048140044      0.49054962863642165
carmeli Chr3    43.12   0.3966573816155989      0.8044568245125349      0.5070431835914195
...
python3 06.py prefix.AB.bed > prefix.AB_I.tmp
paste -d "\t" prefix.AB.bed prefix.AB_I.tmp > prefix.AB_I.bed
eg.
python3 06.py carmeli.AB.bed > carmeli.AB_I.tmp
paste -d "\t" carmeli.AB.bed carmeli.AB_I.tmp > carmeli.AB_I.bed

output file format: prefix.AB_I.bed

chr  start  end  A/B I_stat
Chr1    0       50000   None    I
Chr1    50000   100000  B       I
Chr1    100000  150000  B       I
Chr1    150000  200000  A       I
...
python3 07.py prefix.AB_I.bed prefix > prefix.AB_I.stat
eg.
python3 07.py carmeli.AB_I.bed carmeli > carmeli.AB_I.stat

output file format: prefix.AB_I.stat

prefix  chr  I_percentage
carmeli Chr1    0.4898817872169906
carmeli Chr2    0.11575492341356675
carmeli Chr3    0.2908077994428969
carmeli Chr4    0.8671700223713646
...
python3 08.py prefix.AB_I.bed prefix.50kwin_TSS.output prefix.50kwin.GC.bed prefix.50kwin.repeat.bed > prefix.AB_I_GC_repeat_TSS.stat
eg. 
python3 08.py carmeli.AB_I.bed carmeli.50kwin_TSS.output carmeli.50kwin.GC.bed carmeli.50kwin.repeat.bed > carmeli.AB_I_GC_repeat_TSS.stat
輸入文件準(zhǔn)備:
1. carmeli.50kwin_TSS.output:
bedtools intersect -a carmeli.50kwin.bed -b carmeli.TSS.bed -wo > carmeli.50kwin_TSS.output
2. carmeli.50kwin.GC.bed
bedtools nuc -fi Carmeli.Chr.v4.fasta -bed carmeli.50kwin.bed | cut -f 1-3,5 | grep-v "#" > carmeli.50kwin.GC.bed
3. carmeli.50kwin.repeat.bed
bedtools annotate -i carmeli.50kwin.bed -files carmeli.Chr_repeat.locat | bedtools sort -i - > carmeli.50kwin.repeat.bed
python3 09.py prefix.AB_I_GC_repeat_TSS.stat prefix > prefix.5type.stat
eg.
python3 09.py carmeli.AB_I_GC_repeat_TSS.stat carmeli > carmeli.5type.stat

Two 基于Pore-C的Pipeline

默認(rèn)擁有.mcool文件,默認(rèn)使用的分辨率為100000

1. mcool轉(zhuǎn)為cool
hicConvertFormat -m prefix.mcool://resolutions/100000 -o prefix_100000.cool --inputFormat cool --outputFormat cool
2. mcool轉(zhuǎn)為hic https://www.biostars.org/p/360254/
cooler dump --join prefix.mcool::/resolutions/100000 > prefix_bedpe
awk -F "\t" '{print 0, $1, $2, 0, 0, $4, $5, 1, $7}'  prefix_bedpe > prefix_bedpe.short
sort -k2,2d -k6,6d prefix_bedpe.short > prefix_bedpe.short.sorted
java -jar juicer_tools.jar pre -r 100000 prefix_bedpe.short.sorted prefix.hic prefix.size
3. cool轉(zhuǎn)為dense matrix
hictk dump prefix_100000.cool > prefix_100000.matrix
bedtools makewindows -g prefix.size -w 100000 -i winnum > prefix.100k.bed
{python sparseToDense.py -b prefix.100k.bed prefix_100000.matrix --perchr}  可以套用 run_call_AB.sh

Three 基于HiC-PRO的Pipeline

默認(rèn)擁有HiC-PRO的結(jié)果 分辨率為100000

1. allValidPairs轉(zhuǎn)hic
hicpro2juicebox.sh -i sample1.allValidPairs -g prefix.size -j juicer_tools_1.22.01.jar

Four 計(jì)算絕緣系數(shù)

h1d one IS prefix.cool 100000 Chrx -p 300000 -o output --datatype cool --gt prefix.size

Five 標(biāo)準(zhǔn)化過(guò)程

對(duì)h5格式表轉(zhuǎn)化
hicCorrectMatrix correct -m prefix.h5 --outFileName prefix.ice.h5 --filterThreshold -1.2 5
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容