Hi-C literature guide 1: basics

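I recently started studying the 3D genome, with Hi-C as its representative technique. Because chromosomes are highly folded, elements that lie far apart along the one-dimensional sequence of the same chromosome can still be close in space and therefore interact.
Hi-C is currently one of the main methods for exploring this kind of information. The paper discussed here introduced the Hi-C method for the first time and went on to show that chromosomes partition into compartments.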

  • Paper: Comprehensive mapping of long range interactions reveals folding principles of the human genome
  • PubMed ID: 19815776
  • GEO: GSE18199
  • Keywords: Hi-C method; A/B compartment

1. Building a Hi-C library

1-1

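The basic steps shown above are:

  • (1) cells are crosslinked with formaldehyde;
    Formaldehyde crosslinks the DNA and fixes it in place (where DNA segments interact, the associated proteins hold the interacting regions together).
  • (2) DNA is digested with a restriction enzyme that leaves a 5′-overhang;
    A restriction enzyme cuts the DNA and leaves sticky ends (a single-stranded overhang at each cut); interacting DNA fragments form an H shape.
  • (3) the 5′-overhang is filled, including a biotinylated residue;
    The sticky-end overhangs are filled in to blunt ends, incorporating a biotinylated nucleotide (the purple marker in the figure above).
  • (4) the resulting blunt-end fragments are ligated;
    The two ends of the H are ligated to each other, giving a θ shape.
  • (5) a Hi-C library is created by shearing the DNA and selecting the biotin-containing fragments with streptavidin beads;
    The DNA is sheared into fragments, and the biotin-containing fragments are enriched and purified to form the final Hi-C library for sequencing.
  • (6) parallel DNA sequencing, producing a catalog of interacting fragments.
    The library is sequenced paired-end. With the biotin mark as the junction, the two reads of each pair (read1, read2) map to two genomic positions, A and B.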

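Positions A and B are each a region of a chromosome. The unit length of such a region is the bin size, i.e. the resolution.
The shorter the bin, the more precisely a contact is located, but the fewer reads fall into each bin;
the longer the bin, the coarser the localization, but the more reads each bin collects.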
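
The bin-size trade-off can be illustrated with a minimal Python sketch. The function names, the three read-pair coordinates, and the 100 kb comparison bin size are assumptions made up for illustration; they are not taken from the paper.

```python
from collections import Counter

def bin_index(position_bp, bin_size_bp):
    """Return the 0-based index of the bin containing a genomic coordinate."""
    return position_bp // bin_size_bp

def count_bin_pairs(read_pairs, bin_size_bp):
    """Count read pairs per (bin of position A, bin of position B)."""
    return Counter(
        (bin_index(a, bin_size_bp), bin_index(b, bin_size_bp))
        for a, b in read_pairs
    )

# Hypothetical mapped positions (in bp) of three read pairs on one chromosome.
read_pairs = [
    (1_200_000, 7_900_000),
    (1_950_000, 7_100_000),
    (1_400_000, 7_300_000),
]

coarse = count_bin_pairs(read_pairs, 1_000_000)  # 1 Mb bins
fine = count_bin_pairs(read_pairs, 100_000)      # 100 kb bins

print(coarse)  # all three pairs land in the same pair of bins
print(fine)    # each pair lands in its own pair of bins
```

With 1 Mb bins the three pairs pile up in one cell, giving a stronger but coarser signal; with 100 kb bins they are located more precisely but each cell holds a single read pair.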

  • Sequencing yielded 8.4 million read pairs in total, of which 6.7 million corresponded to long-range contacts between segments more than 20 kb apart. In other words, over six million read pairs span more than 20 kb, which shows that many bins far apart in one dimension are close in space.

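2. Genome-wide contact matrix M heatmap

  • Goal: visualize the alignment results above.
  • Matrix cells: human chromosomes range from roughly 50 to 250 Mb; the chr14 used in the paper is taken as 104 Mb long here, and the bin size is set to 1 Mb.
  • Both axes span the same chromosome, with the bin size (1 Mb) as the tick unit. The corresponding contact matrix M is therefore 104 × 104 (see the figure below).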


    2-1
  • M_ij is defined as the number of ligation products between locus i and locus j (SOM).
    For example, M(2,8) = 10 means that 10 read pairs map with one end in the 2nd bin and the other end in the 8th bin of the chromosome; the count is then converted to the corresponding color level in the heatmap.
  • This matrix reflects an ensemble average of the interactions present in the original sample of cells;
  • It can be visually represented as a heatmap, with intensity indicating contact frequency.
    2-2

The cases discussed above, where read1 and read2 of a pair both map to the same chromosome, are called cis interactions. Trans interactions are those where the two reads of a pair map to different chromosomes.
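
As a rough sketch of how such a matrix could be filled in, the Python below counts cis read pairs per pair of 1 Mb bins and sets trans pairs aside. The function name, the tuple layout of a read pair, and the toy data are assumptions for illustration, not the paper's pipeline.

```python
import numpy as np

def build_contact_matrix(read_pairs, n_bins, bin_size_bp=1_000_000):
    """Count cis read pairs per pair of bins; the result is symmetric."""
    M = np.zeros((n_bins, n_bins), dtype=int)
    for chrom_a, pos_a, chrom_b, pos_b in read_pairs:
        if chrom_a != chrom_b:
            continue  # trans pair: not part of a single-chromosome matrix
        i, j = pos_a // bin_size_bp, pos_b // bin_size_bp
        M[i, j] += 1
        if i != j:
            M[j, i] += 1  # mirror the count so that M[i, j] == M[j, i]
    return M

# Hypothetical data: 10 cis pairs linking the 2nd and 8th bins of chr14,
# plus one trans pair between chr14 and chr1.
read_pairs = [("chr14", 1_500_000, "chr14", 7_500_000)] * 10
read_pairs.append(("chr14", 1_500_000, "chr1", 3_000_000))

M = build_contact_matrix(read_pairs, n_bins=104)
n_trans = sum(1 for ca, _, cb, _ in read_pairs if ca != cb)

print(M.shape)   # (104, 104)
print(M[1, 7])   # 0-based indices 1 and 7 are the 2nd and 8th bins
print(n_trans)
```

Note that the text counts bins from 1 while the array is 0-based, so the M(2,8) of the example corresponds to `M[1, 7]` here.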

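3. Average intrachromosomal contact probability

  • Question: how is it computed? (Based on the 1D component?)
  • Definition: I_n(s), for pairs of loci separated by a genomic distance s on chromosome n.
    That is, the average contact probability of two positions on chromosome n that are a (one-dimensional) distance s apart.
  • Result, as in the figure below: from top to bottom, the curves show contacts within chromosome 1, between chromosomes 1 and 10, the average between chromosome 1 and all other chromosomes, and between chromosomes 1 and 21.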
3-1
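  • Conclusions
    (1) Contact probability decreases monotonically on every chromosome; in general, the closer two loci are, the higher their contact probability (the stronger the interaction).
    (2) Chromosome territories: even for loci very far apart on the same chromosome (more than 200 Mb), the contact probability is still much higher than the average contact probability between positions on different chromosomes.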


    3-2
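
One plausible reading of the "how is it computed?" question is sketched below: for each separation s, average the contact counts over all bin pairs that lie s bins apart, i.e. over the s-th diagonal of the contact matrix. This is an assumption, not the paper's exact procedure (its normalization is described in the supplementary material and is not reproduced here); the function name and the 4-bin toy matrix are hypothetical.

```python
import numpy as np

def mean_contacts_by_distance(M):
    """Average count over all bin pairs at each separation s (in bins)."""
    n = M.shape[0]
    return np.array([np.diagonal(M, offset=s).mean() for s in range(n)])

# Hypothetical symmetric 4-bin contact matrix with distance decay.
M = np.array([
    [8, 4, 2, 1],
    [4, 8, 4, 2],
    [2, 4, 8, 4],
    [1, 2, 4, 8],
])

profile = mean_contacts_by_distance(M)
print(profile)  # one value per separation s = 0, 1, 2, 3 bins
```

In this toy matrix the profile falls steadily with s, which is the qualitative behavior stated in conclusion (1).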

4. Compartments

4.1 Observed contact matrix M heatmap

This is what we discussed in section 2.


4-1

4.2 Normalization: observed/expected

  • To correct for the fact that sequence proximity strongly influences contact probability, the raw data are normalized;
  • Normalization method: dividing each entry in the contact matrix by the genome-wide average contact probability for loci at that genomic distance (the quantity from section 3).
  • Result: the entries fall into two classes, red for values above 1 and blue for values below 1.
    The normalized matrix shows many large blocks of enriched and depleted interactions generating a ‘plaid’ pattern (a checkered pattern).


    4-2
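
The observed/expected step can be sketched in Python as follows. As a simplifying assumption, the expected value at each distance is taken from the same single chromosome (the mean of the corresponding diagonal) rather than from a genome-wide average as in the paper; the function name and toy matrix are hypothetical.

```python
import numpy as np

def observed_over_expected(M):
    """Divide each entry by the mean count at the same bin separation."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    expected = np.zeros_like(M)
    for s in range(n):
        mean_s = np.diagonal(M, offset=s).mean()
        idx = np.arange(n - s)
        expected[idx, idx + s] = mean_s  # upper diagonal at separation s
        expected[idx + s, idx] = mean_s  # matching lower diagonal
    with np.errstate(divide="ignore", invalid="ignore"):
        # Leave entries at 0 where no contacts are expected at that distance.
        return np.where(expected > 0, M / expected, 0.0)

# Hypothetical 4-bin matrix: bins 0-1 and bins 2-3 form two blocks.
M = np.array([
    [8, 6, 2, 1],
    [6, 8, 2, 2],
    [2, 2, 8, 6],
    [1, 2, 6, 8],
])

oe = observed_over_expected(M)
print(np.round(oe, 2))
```

Entries above 1 (such as bins 0 and 1) are enriched relative to what their distance alone predicts, and entries below 1 (such as bins 1 and 2) are depleted, which is what the red and blue cells of the heatmap encode.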

4.3 Correlation matrix C

  • If two loci (here 1 Mb regions) are nearby in space, we reasoned that they will share neighbors and have correlated interaction profiles. (I do not fully follow this sentence.)
  • Convert to the correlation matrix C
    C_ij is the Pearson correlation between the ith row and jth column of the normalized matrix (M* in the paper), which dramatically sharpened the plaid pattern (see the figure below).


    4-3
  • The plaid pattern suggests that each chromosome can be decomposed into two sets of loci (arbitrarily labeled A and B) such that contacts within each set are enriched and contacts between sets are depleted.
    That is, the chromosome is split into two sets of regions, and interactions between bins within a set are clearly stronger than interactions between the two sets.
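
The idea of "correlated interaction profiles" can be made concrete with a small Python sketch. Each row of the normalized matrix is one locus's interaction profile; two loci in the same set see the same neighbors enriched and depleted, so their rows correlate. The six-bin plaid matrix and its set labels are invented for illustration.

```python
import numpy as np

# Hypothetical set membership of six bins (+1 and -1 stand for the two sets).
labels = np.array([1, 1, -1, -1, 1, -1])

# Toy observed/expected matrix: 1.5 within a set, 0.5 between sets.
oe = 1.0 + 0.5 * np.outer(labels, labels)

# np.corrcoef treats each row as one variable, so C[i, j] is the Pearson
# correlation between the profiles of bins i and j. Because oe is symmetric,
# row j and column j are the same vector.
C = np.corrcoef(oe)

print(np.round(C, 2))
```

In this noise-free toy case the modest 1.5 versus 0.5 contrast becomes +1 versus -1 in C, a simple picture of how the correlation step sharpens the plaid pattern.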

4.4 First principal component of the PCA

  • Principal component analysis is run on the correlation matrix. In general, the first principal component (PC) clearly corresponded to the plaid pattern (positive values defining one set, negative values the other)
  • The entries of the PC vector reflected the sharp transitions from compartment to compartment observed within the plaid heatmaps.


    4-4
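
A minimal Python sketch of this step, assuming PCA is done by centering the columns of C and taking the first right singular vector; the function name and the six-bin toy correlation matrix are hypothetical, and the paper's own implementation may differ.

```python
import numpy as np

def first_principal_component(C):
    """First principal axis of C, treating rows as observations."""
    X = C - C.mean(axis=0)  # center each column
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[0]  # one entry per bin; the overall sign is arbitrary

# Hypothetical plaid correlation matrix for six bins in two sets.
labels = np.array([1, 1, -1, -1, 1, -1])
C = np.outer(labels, labels).astype(float)

pc1 = first_principal_component(C)

# Split bins by the sign of PC1. Which sign is called "A" is arbitrary here;
# naming the sets needs outside evidence, as discussed in section 4.5.
compartment = np.where(pc1 > 0, "A", "B")

print(np.round(pc1, 3))
print(compartment)
```

The sign changes along `pc1` mark the transitions from one compartment to the other along the chromosome.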

4.5 Compartments A and B

  • The Hi-C data imply that regions tend to be closer in space if they belong to the same compartment.
  • In addition, panel G of the figure and other evidence in the paper indicate that compartment A is more closely associated with open, accessible, actively transcribed chromatin, whereas compartment B corresponds to closed chromatin domains with comparatively low expression.
4-5