WGCNA: Choosing the Soft Threshold

Today we will settle two questions:
1. What is a scale-free distribution?
2. How do we calculate the soft threshold by hand?

Scale-free distribution

Suppose I ask you: if you wanted to get to know any given person in the world, how many intermediaries would you need?

Six.

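This is the "six degrees of separation" idea, and it describes a random network, in which most nodes have about the same number of links.
Yet when a completely random network is simulated on a computer, the average distance between two nodes can come out far larger than 6.
Still, I find the six-degrees idea believable. Two examples:
If you wanted to reach Trump, it shouldn't be that hard: first get to know someone at a local TV station, reach Jack Ma through them, and Jack Ma knows Trump.

What about reaching Pony Ma (Ma Huateng)? Not that hard either: get to know the local TV station, reach Jack Ma through them, and Jack Ma knows Pony Ma well.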

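You see, as soon as we add a few giants to the network, things get much simpler. What characterizes these giants is that there are very few of them, but each one has an enormous number of links to ordinary people.
These giants act as connection points, which are called hubs.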

Real-world networks are not random; they are networks with hubs, like the figure on the right. Ordinary nodes are the vast majority, and hub nodes are a small minority.

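In a random network, a single typical value describes most nodes: most have about the same number of links, and most pairs sit about the same distance apart, whether that distance is 6 or 60. Because one scale characterizes the whole network, you could call it a "scale" network.
A network with giants has no such typical value: the number of links per node ranges from a handful to an enormous number, so no single scale describes it. Its distribution of links is called scale-free, also known as a power-law distribution. The 80/20 rule and the "long tail" are everyday expressions of power laws.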

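That is how interactions between people look. What about interactions between proteins?
Current observations suggest they also follow a power law, that is, a scale-free distribution. In a cell, not every gene is expressed, and even an expressed gene does not necessarily play a role.
What drives a cell's differentiation and changes in function is a small set of important genes, which we call hub genes.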

WGCNA builds on exactly this regularity: it deliberately transforms the connections between genes so that they follow a scale-free distribution.

Calculating the soft threshold by hand

Let's walk through the process with real data.

load(file = "FemaleLiver-01-dataInput.RData")

The data object is named datExpr and is in the folder from the first lesson.
Its rows are samples (134) and its columns are genes (3600).

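WGCNA first computes the correlation between every pair of genes (with the cor function).
Ordinarily we would set a threshold: pairs with a correlation above 0.8 (say) count as connected, and pairs below it do not.
But such a hard cutoff is arbitrary: a pair at 0.79 is declared unconnected while a pair at 0.81 is connected, even though the two are practically indistinguishable.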
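To make the problem concrete, here is a minimal base-R sketch comparing a hard threshold (an edge exists only if |cor| is at least 0.8) with a soft, power-based weight. The helper names hard_adj and soft_adj, and the power of 6, are illustrative choices of ours, not WGCNA functions.

```r
# Illustrative sketch: hard vs soft thresholding of correlation values.
# hard_adj / soft_adj are hypothetical helpers, not part of WGCNA.

# Hard threshold: 1 if |r| >= tau (connected), otherwise 0 (not connected)
hard_adj <- function(r, tau = 0.8) as.numeric(abs(r) >= tau)

# Soft threshold: keep a continuous weight in [0, 1] by raising |r| to a power
soft_adj <- function(r, beta = 6) abs(r)^beta

r <- c(0.79, 0.81)
hard_adj(r)            # 0 1: a 0.02 difference switches the edge off/on
round(soft_adj(r), 3)  # 0.243 0.282: the two pairs get similar weights
```

With the soft version there is no cliff at 0.8; nearby correlations get nearby weights.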

So the authors raise the correlations to a power instead.

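In the figure, the green line shows the original correlations; after raising them to the 8th power they become the red line.
All correlations shrink, but the ones that were already small shrink far more.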
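A quick numeric check of this claim in base R, using the same power of 8 as the figure (the three correlation values are made up for illustration): every value shrinks, but the ratio between a strong and a weak correlation grows from 3 to 3^8 = 6561.

```r
# Raise three example correlations to the 8th power
r <- c(0.3, 0.6, 0.9)
signif(r^8, 3)        # about 0.0000656, 0.0168, 0.430

# Ratio of the strongest to the weakest correlation, before and after
r[3] / r[1]           # about 3
(r[3] / r[1])^8       # about 6561: weak links are suppressed much more
```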

Why do this? Because after the transformation, gene connectivity starts to follow a power law.
Let's compute it with a power of 10.

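power <- 10
# unsigned adjacency: |correlation| raised to the power (use = 'p': pairwise complete observations)
ADJ=abs(cor(datExpr,use = 'p'))^power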

We now have a 3600 × 3600 adjacency matrix: one row and one column per gene.

Summing each column gives, for that gene, the total of its adjacencies to all the other genes.

k=apply(ADJ,2,sum) -1

Subtracting 1 excludes the gene itself: the diagonal of ADJ is 1^10 = 1.

MMT00000044 MMT00000046 MMT00000051 MMT00000076 MMT00000080 MMT00000102 
 0.02094526  7.13906159  4.17142638  0.22300729  6.25771609  1.20155840
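To connect k back to the idea of hubs, here is a small simulated example (made-up data, not the liver data set): one "hub" gene drives five other genes, and fourteen genes are unrelated noise. Using the same recipe as above (absolute correlation to the power 10, column sums minus 1), the hub should get the largest connectivity. colSums is used here as an equivalent, faster form of apply(ADJ, 2, sum).

```r
set.seed(1)
n <- 50                                              # simulated samples
hub <- rnorm(n)                                      # the hub gene
followers <- sapply(1:5, function(i) hub + rnorm(n, sd = 0.5))  # 5 genes driven by the hub
noise <- matrix(rnorm(n * 14), nrow = n)             # 14 unrelated genes
datToy <- cbind(hub, followers, noise)               # rows = samples, columns = genes
colnames(datToy) <- c("hub", paste0("gene", 1:19))

power <- 10
ADJtoy <- abs(cor(datToy))^power                     # unsigned adjacency
kToy <- colSums(ADJtoy) - 1                          # connectivity, self (diagonal = 1) excluded

names(which.max(kToy))                               # "hub"
round(sort(kToy, decreasing = TRUE)[1:6], 2)         # hub first, then its followers
```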

This k is the connectivity of each node in the network formed by the 3600 genes. A histogram of it holds a pleasant surprise:

hist(k)

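The x-axis is connectivity, increasing to the right; the y-axis is how often connectivity falls in each range. Doesn't the shape look like a power law?
Of course, that is just eyeballing it.
A proper check works like this: split the connectivity values into bins, then take log10 of the mean connectivity in each bin and log10 of the fraction of genes in that bin. For a power law, the two are linearly related.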
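In symbols (γ and c are symbols introduced here for illustration): if the connectivity distribution follows a power law with exponent γ, taking log10 of both sides turns it into a straight line. The slope of the fitted line estimates −γ, which is why the slopes reported below are negative (−1.66 at power 10), and the R² of that fit is what the plots later call the scale-free topology model fit.

```latex
p(k) \propto k^{-\gamma}
\quad\Longrightarrow\quad
\log_{10} p(k) = c - \gamma \,\log_{10} k
```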

Let's show this with a plot.
First split the range of k into 10 equal-width intervals:

cut1=cut(k,10)

Compute the mean of each interval:

binned.k=tapply(k,cut1,mean)

The result:

(-0.0655,6.58]    (6.58,13.2]    (13.2,19.7]    (19.7,26.3]    (26.3,32.9]    (32.9,39.5]      (39.5,46]      (46,52.6]    (52.6,59.2]    (59.2,65.8] 
      2.151281       9.145864      15.670490      22.564169      30.466627      36.649594      42.028070      49.043907      55.893530      61.864996

Then compute the frequency (fraction of genes) in each interval:

freq1=tapply(k,cut1,length)/length(k)

The result:

(-0.0655,6.58]    (6.58,13.2]    (13.2,19.7]    (19.7,26.3]    (26.3,32.9]    (32.9,39.5]      (39.5,46]      (46,52.6]    (52.6,59.2]    (59.2,65.8] 
   0.792500000    0.130277778    0.036666667    0.011111111    0.001944444    0.003055556    0.003611111    0.008333333    0.005555556    0.006944444

Now the log of the bin means and the log of the frequencies should be linearly related:

plot(log10(binned.k),log10(freq1+.000000001),xlab="log10(k)",ylab="log10(p(k))")

Adding a fitted regression line (lm) and an annotation makes this clearer:

xx= as.vector(log10(binned.k))
lm1=lm(as.numeric(log10(freq1+.000000001))~ xx )
lines(xx,predict(lm1),col=1)
title(paste( "scale free R^2=",as.character(round(summary(lm1)$adj.r.squared,2)),", slope=", round(lm1$coefficients[[2]],2)))

R² reaches 0.81, which is already quite good.
But we still have to choose a power, so we can compute several powers at once and pick from the results.

Let's wrap the steps above into a function:

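mypick <- function(power, datExpr){
  cor <- stats::cor   # use stats::cor explicitly (WGCNA also exports a cor function)
  ADJ <- abs(cor(datExpr, use = 'p'))^power        # unsigned adjacency
  k <- apply(ADJ, 2, sum) - 1                      # connectivity, minus self
  cut1 <- cut(k, 10)                               # 10 equal-width bins
  binned.k <- tapply(k, cut1, mean)                # mean connectivity per bin
  freq1 <- tapply(k, cut1, length)/length(k)       # fraction of genes per bin (NA if empty)
  xx <- as.vector(log10(binned.k))
  lm1 <- lm(as.numeric(log10(freq1 + 1e-9)) ~ xx)  # empty (NA) bins are dropped by lm
  data.frame(Power = power,
             SFT.R.sq = round(summary(lm1)$adj.r.squared, 2),  # numeric, not character
             slope = round(lm1$coefficients[[2]], 2),
             mean.k = mean(k))
}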

Test it with a power of 10:

mypick(10,datExpr)

The result is as expected. mean.k is the average connectivity over all genes and summarizes how connected the current network is;
we will need it for plotting later.

Power SFT.R.sq slope   mean.k
1    10     0.81 -1.66 5.193521

Now compute a batch of powers:

powers = c(c(1:10), seq(from = 12, to=20, by=2))
do.call(rbind,lapply(powers,mypick,datExpr))

This gives a table with one row of fit statistics per power.

Based on this table, I would choose 6 as the power, because R² rises markedly from power 1 to power 6.

The official soft-threshold calculation

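The process above is only meant to help us understand scale-free networks and the soft threshold.
In practice, the WGCNA package provides the function pickSoftThreshold, which makes the calculation easy.
Its inputs are simply the expression matrix and the candidate power values: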

powers = c(c(1:10), seq(from = 12, to=20, by=2))
sft = pickSoftThreshold(datExpr, powerVector = powers)

因?yàn)樗褂昧朔謮K還有并行化的思想,所以計(jì)算速度十分快。

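It returns a list with two components.
The first is the soft threshold it picks on its own: if some powers reach a scale-free fit R² above 0.85 (the default RsquaredCut), it returns the smallest of them; otherwise it returns NA.
Here it returns 6,
which you can view with: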

sft$powerEstimate
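The selection rule described above is simple enough to apply to our own mypick table. The sketch below approximates that rule and is not pickSoftThreshold's source code: pick_power is a hypothetical helper, and it uses the signed R² (−sign(slope) × R², as in the plotting code in this post) so that a fit with a positive slope is never accepted, which is our own assumption. It converts SFT.R.sq with as.numeric so it also works if that column is stored as text.

```r
# Hypothetical helper approximating the rule described above:
# return the smallest power whose signed scale-free R^2 exceeds the cutoff, else NA.
pick_power <- function(fit, cutoff = 0.85) {
  # fit: data frame with columns Power, SFT.R.sq and slope (e.g. rows from mypick)
  signedR2 <- -sign(fit$slope) * as.numeric(fit$SFT.R.sq)
  ok <- fit$Power[signedR2 > cutoff]
  if (length(ok) == 0) NA else min(ok)
}

# Usage with made-up numbers (not the liver results):
fitToy <- data.frame(Power    = 1:8,
                     SFT.R.sq = c(0.10, 0.35, 0.60, 0.75, 0.82, 0.88, 0.90, 0.91),
                     slope    = c(0.5, -0.8, -1.1, -1.3, -1.5, -1.6, -1.7, -1.7))
pick_power(fitToy)   # 6
```

On the real data this would be called as pick_power(do.call(rbind, lapply(powers, mypick, datExpr))).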

The second component is a fit table like the one we built above:

sft$fitIndices

We picked 6 earlier by eye, which is not very intuitive, so plotting columns of the table is much more convenient.

sizeGrWindow(9, 5)
par(mfrow = c(1,2))
cex1 = 0.85
# Scale-free topology fit index as a function of the soft-thresholding power
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
     xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
     main = paste("Scale independence"));
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
     labels=powers,cex=cex1,col="red");
# this line corresponds to using an R^2 cut-off of h
abline(h=0.90,col="red")
# Mean connectivity as a function of the soft-thresholding power
plot(sft$fitIndices[,1], sft$fitIndices[,5],
     xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
     main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")

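This very common figure has two panels, both with the soft-thresholding power on the x-axis.
In the first panel, the y-axis is the signed R²; the horizontal line is drawn at 0.90 (h = 0.90 in the code).
At a power of 6, R² breaks through for the first time, reaching 0.9; at this point the network fits a scale-free distribution.
In the second panel, the y-axis is the mean connectivity. As we saw when computing mean.k, it keeps falling as the power rises: every adjacency is at most 1, so raising it to a higher power can only shrink it.
This is unavoidable, so just read it together with the first panel.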
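A small simulation on random data (not the liver set) confirms that this decline is built in: since every |cor| is at most 1, |cor|^β can only fall as β grows, so the mean connectivity falls too. mean_k is a hypothetical helper name.

```r
set.seed(2)
datRand <- matrix(rnorm(30 * 40), nrow = 30)   # 30 samples x 40 genes of pure noise

# Mean connectivity of the unsigned network for a given power
mean_k <- function(power, datExpr) {
  ADJ <- abs(cor(datExpr))^power               # unsigned adjacency
  mean(colSums(ADJ) - 1)                       # mean connectivity, self excluded
}

powers <- c(1:10, seq(12, 20, by = 2))         # same candidate powers as in this post
meanK <- sapply(powers, mean_k, datExpr = datRand)
all(diff(meanK) < 0)                           # TRUE: strictly decreasing in the power
```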

Some differences

If you look closely, you will notice that the R² we computed ourselves differs slightly from the official one.

Why?
Because every linear model fit reports two R² values:

summary(lm1)

We extracted the adjusted R², 0.81, whereas pickSoftThreshold reports the unadjusted R², 0.83. Switching is easy;
the following line extracts the unadjusted value:

summary(lm1)$r.squared
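The two numbers are linked by the standard adjusted-R² formula for a simple regression on n points, adj = 1 − (1 − R²)(n − 1)/(n − 2). With all 10 bins non-empty, as in the output above, 1 − (1 − 0.83) × 9/8 ≈ 0.809, which rounds to the 0.81 we reported. The sketch below checks the formula against summary.lm on synthetic points (not the liver bins).

```r
set.seed(42)
xx <- log10(1:10)                          # 10 synthetic "bins"
yy <- 1 - 1.5 * xx + rnorm(10, sd = 0.1)   # roughly linear on the log-log scale
fit <- lm(yy ~ xx)

r2    <- summary(fit)$r.squared            # unadjusted R^2
adjR2 <- summary(fit)$adj.r.squared        # adjusted R^2 (what mypick reports)
n <- length(yy)
all.equal(adjR2, 1 - (1 - r2) * (n - 1) / (n - 2))   # TRUE

# The numbers from this post: unadjusted 0.83 with 10 bins
round(1 - (1 - 0.83) * 9 / 8, 2)           # 0.81
```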

Some doubts

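Isn't making our data scale-free an artificial step?
Yes, it is.
The author says as much: the true relationships between genes should follow a scale-free distribution, but the correlations we compute are not a faithful representation of them,
because correlation is not causation.

So we simply choose a power that makes their relationships follow a scale-free distribution. That is essentially how the author puts it.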

Last but not least

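The network construction has an option for how correlations are treated: the default is unsigned, and it can be changed to signed.
What's the difference?
With unsigned, we take the absolute value of every correlation, whether it is positive or negative.
The genes we end up with therefore include both positively and negatively correlated pairs, and we treat them identically.

But that does not match reality either. As a compromise, the signed method dampens negative correlations:
it multiplies the correlation by 0.5 and then adds 0.5, i.e. (1 + cor)/2. A negative correlation thus still becomes a positive value, only a small one,
while a positive correlation becomes larger. Everything ends up in the interval [0, 1], with no negative values.
The author recommends signed networks, considering them closer to reality.
The modules obtained this way contain only positively correlated genes.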
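The three adjacency types can be written out as small R functions. This is a sketch of the formulas as commonly stated for WGCNA (unsigned |cor|^β, signed ((1 + cor)/2)^β, and signed hybrid cor^β for positive cor and 0 otherwise), not a call into the package; the function names are hypothetical. Note that under the signed transform a correlation of 0 maps to 0.5 rather than 0, which is presumably part of why signed networks are usually paired with a larger power.

```r
# Hypothetical helpers spelling out the three adjacency types
adj_unsigned <- function(r, beta) abs(r)^beta                 # sign ignored
adj_signed   <- function(r, beta) ((1 + r) / 2)^beta          # maps [-1, 1] onto [0, 1]
adj_hybrid   <- function(r, beta) ifelse(r > 0, r^beta, 0)    # negative correlations set to 0

r <- c(-0.8, 0, 0.8)
adj_unsigned(r, 6)             # about 0.262, 0, 0.262: -0.8 and 0.8 look identical
signif(adj_signed(r, 12), 3)   # about 1e-12, 0.000244, 0.282: -0.8 is pushed towards 0
adj_hybrid(r, 6)               # 0, 0, about 0.262
```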

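Of course, this is not absolute. He says that for single-cell data, where the observed pattern arises because a group of genes is regulated both positively and negatively, unsigned networks are more appropriate.
This is a biological question, not a statistical one.
You can reply "果子WGCNA" to the WeChat official account 果子學生信 to get the WGCNA author's nearly two-hour talk video and slides, and judge for yourself.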

With that in mind, this figure should now be easy to understand.

What if your data never reach a suitable threshold?
The author says: choose 6 for unsigned and 12 for signed networks, and don't agonize over it.
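This rule of thumb corresponds to the last row (more than 40 samples) of the sample-size table in the FAQ later in this post; with fewer samples the FAQ suggests larger powers, and in every row the signed value is twice the unsigned one. Below is a minimal lookup; default_power is a hypothetical helper, and the handling of boundary cases (exactly 20, 30 or 40 samples) is our own assumption because the table's ranges overlap.

```r
# Hypothetical helper: fallback power from the FAQ's sample-size table.
# Boundary handling (20, 30, 40 samples) is an assumption.
default_power <- function(nSamples, type = c("unsigned", "signed hybrid", "signed")) {
  type <- match.arg(type)
  base <- if (nSamples < 20) 9 else if (nSamples <= 30) 8 else if (nSamples <= 40) 7 else 6
  if (type == "signed") 2 * base else base   # signed column is double the unsigned one
}

default_power(134)             # 6  (the liver data have 134 samples)
default_power(134, "signed")   # 12
default_power(15, "signed")    # 18
```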

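Next we will cover how modules are obtained and how eigengenes are computed. Once that material is compiled, it will be posted to the Q&A group together with the multi-dataset WGCNA tutorial.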

WGCNA FAQ

  1. How many samples do I need?

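    We recommend at least 15 samples. With high-throughput data, correlations computed from fewer than 15 samples are noisy enough to undermine the biological meaning of the network. If possible, use at least 20 samples; as with any analysis method, more samples give more reliable and accurate results.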

  2. How should I filter the data?

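    Filtering by mean expression or by variance (or by robust analogs such as the median and the median absolute deviation, MAD) is fine, since low-expressed or non-varying genes usually add noise. Whether mean-expression or variance filtering is better is still debated; both have pros and cons, but more importantly, they tend to filter out similar sets of genes because mean and variance are usually correlated.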

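    We do not recommend filtering by differential expression. WGCNA is an unsupervised method based on gene expression profiles. A gene set obtained by differential-expression filtering may consist of one or a few modules of highly correlated genes. That would completely break the scale-free network assumption, and with it the scale-free-based choice of soft threshold.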

  3. What parameter settings do you recommend?

    Our default settings generally work well across many applications. In some cases, however, we keep "simple" or historical defaults for backward compatibility and reproducibility, and for new calculations we do not recommend those defaults. Some notes on settings follow.

  • Signed networks. Choosing between "signed" and "unsigned" networks is a complex question, but in general we recommend "signed" (or "signed hybrid") networks. A signed network can be built by setting the argument type = "signed" or type = "signed hybrid" in functions such as accuracyMeasures, adjacency, chooseOneHubInEachModule, chooseTopHubInEachModule, nearestNeighborConnectivity, nearestNeighborConnectivityMS, orderBranchesUsingHubGenes, softConnectivity and others (see the help files). Some functions use the argument **networkType** to select the network type; common ones include blockwiseModules, blockwiseConsensusModules, blockwiseIndividualTOMs, consensusTOM, intramodularConnectivity, modulePreservation, pickSoftThreshold, TOMsimilarityFromExpr, vectorTOM and others. If in doubt, read the help files.

  • Robust correlation. The default correlation method in all WGCNA functions is standard Pearson correlation. In general, unless there is good reason to believe that there are no outlier measurements, we recommend the biweight mid-correlation as a robust alternative. It is implemented in the WGCNA function bicor. Many WGCNA functions take an argument corFnc that lets the user choose between standard correlation (cor) and biweight mid-correlation (bicor). Additional arguments to the correlation function can be specified using the argument corOptions (depending on function, this argument may require one of two alternate forms, please see the help for each function for details). In certain functions, notably those of the blockwise family, the correlation function cannot be specified directly as a function; rather, one must use the argument corType to specify either Pearson or biweight mid-correlation.

    Important notes on using bicor. The biweight mid-correlation works very well in a variety of settings, but in some situations it will produce unwanted results.

    • Limiting the number of excluded outliers: argument maxPOutliers. The default version of the biweight mid-correlation, described in Langfelder and Horvath (2011), can produce unwanted results when the data have a bi-modal distribution (e.g., when a gene expression depends heavily on a binary variable such as disease status or genotype) or when one of the variables entering the correlation is itself binary (or ordinal). For this reason, we strongly recommend using the argument maxPOutliers = 0.05 or 0.10 whenever the biweight midcorrelation is used. This argument essentially forces bicor to never regard more than the specified proportion of samples as outliers.
    • Dealing with binary data. When relating high-throughput data x to a binary variable y such as sample traits, one can use the argument robustY = FALSE to turn off the robust treatment for the y argument of bicor. This results in a hybrid robust-Pearson correlation as described in Langfelder and Horvath (2011). The hybrid correlation can also be used when one of the inputs is numeric but known to not have any outliers.
  4. Can WGCNA be used to analyze RNA-seq data?

    Yes. As far as WGCNA is concerned, working with (properly normalized) RNA-seq data isn't really any different from working with (properly normalized) microarray data.

    We suggest removing features whose counts are consistently low (for example, removing all features that have a count of less than say 10 in more than 90% of the samples) because such low-expressed features tend to reflect noise and correlations based on counts that are mostly zero aren't really meaningful. The actual thresholds should be based on experimental design, sequencing depth and sample counts.

    We then recommend a variance-stabilizing transformation. For example, package DESeq2 implements the function varianceStabilizingTransformation which we have found useful, but one could also start with normalized counts (or RPKM/FPKM data) and log-transform them using log2(x+1). For highly expressed features, the differences between full variance stabilization and a simple log transformation are small.

    Whether one uses RPKM, FPKM, or simply normalized counts doesn't make a whole lot of difference for WGCNA analysis as long as all samples were processed the same way. These normalization methods make a big difference if one wants to compare expression of gene A to expression of gene B; but WGCNA calculates correlations for which gene-wise scaling factors make no difference. (Sample-wise scaling factors of course do, so samples do need to be normalized.)

    If data come from different batches, we recommend checking for batch effects and, if needed, adjusting for them. We use ComBat for batch effect removal but other methods should also work.

    Finally, we usually check quantile scatterplots to make sure there are no systematic shifts between samples; if sample quantiles show correlations (which they usually do), quantile normalization can be used to remove this effect.

  5. How can WGCNA be used with heterogeneous data?

    Heterogeneous data can affect any statistical analysis, and unsupervised analyses such as WGCNA in particular. What, if any, modifications should be made to the analysis depends crucially on whether the heterogeneity (or its underlying driver) is considered "interesting" for the question the analyst is trying to answer, or not. If one is lucky, the main driver of sample differences is the treatment or condition under study, in which case WGCNA can be applied to the data as is. Unfortunately, the drivers of heterogeneity are often uninteresting and should be adjusted for. Such factors can be technical (batch effects, technical variables such as post-mortem interval etc.) or biological (e.g., sex, tissue, or species differences).

    If one has a categorical source of variation (e.g., sex or tissue differences) and the number of samples in each category is large enough (at least 30, say) to construct a network in each category separately, it may be worthwhile to carry out a consensus module analysis (Tutorial II, see WGCNA Tutorials). Because this analysis constructs a network in each category separately, the between-category variation does not affect the analysis.

    If it is desired to construct a single network for all samples, the unwanted or uninteresting sources of large variation in the data should be adjusted for. For categorical (ordinal) factors we recommend using the function ComBat (from the package sva). Users who have never used ComBat before should read the help file for ComBat and work through the sva vignette (type vignette("sva") at the R prompt) to make sure they use ComBat correctly.

    For continuous sources of variation (e.g., postmortem interval), one can use simple linear regression to adjust the data. There may be more advanced methods out there that also allow the use of covariates and protect from over-correction.

    Whichever method is used, we caution the user that removal of unwanted sources of variation is never perfect and it can, in some cases, lead to removal of true interesting signal, and in rare cases it may introduce spurious association signal. Thus, only sources of relatively large variation should be removed.

  6. I can't get a good scale-free topology index no matter how high I set the soft-thresholding power.

    First, the user should ensure that variables (probesets, genes etc.) have not been filtered by differential expression with respect to a sample trait. See item 2 above for details about beneficial and detrimental filtering genes or probesets.

    If the scale-free topology fit index fails to reach values above 0.8 for reasonable powers (less than 15 for unsigned or signed hybrid networks, and less than 30 for signed networks) and the mean connectivity remains relatively high (in the hundreds or above), chances are that the data exhibit a strong driver that makes a subset of the samples globally different from the rest. The difference causes high correlation among large groups of genes which invalidates the assumption of the scale-free topology approximation.

    Lack of scale-free topology fit by itself does not invalidate the data, but should be looked into carefully. It always helps to plot the sample clustering tree and any technical or biological sample information below it as in Figure 2 of Tutorial I, section 1; strong clusters in the clustering tree indicate globally different groups of samples. It could be the result of a technical effect such as a batch effect, biological heterogeneity (e.g., a data set consisting of samples from 2 different tissues), or strong changes between conditions (say in a time series). One should investigate carefully whether there is sample heterogeneity, what drives the heterogeneity, and whether the data should be adjusted (see previous point).

    If the lack of scale-free topology fit turns out to be caused by an interesting biological variable that one does not want to remove (i.e., adjust the data for), the appropriate soft-thresholding power can be chosen based on the number of samples as in the table below. This table has been updated in December 2017 to make the resulting networks conservative.

    <center>

    | Number of samples | Unsigned and signed hybrid networks | Signed networks |
    | --- | --- | --- |
    | Less than 20 | 9 | 18 |
    | 20-30 | 8 | 16 |
    | 30-40 | 7 | 14 |
    | More than 40 | 6 | 12 |

    </center>

  7. The functions take many arguments! Are default settings always appropriate?

    Many of the WGCNA functions take multiple arguments that control various subtleties in network construction and module identification. In general we attempt to provide defaults that work reasonably well in most common situations. However, in some cases we, over time, find that a different setting is more appropriate. In most cases we keep the old default for reproducibility.

  8. Functions TOMsimilarity and TOMsimilarityFromExpr give slightly different results!

    The function TOMsimilarityFromExpr uses a slightly different default setting for TOM calculation in unsigned networks. This should produce TOM that's slightly easier to interpret but is slightly different from what one gets by calculating a standard unsigned adjacency and then TOM using TOMsimilarity. To get the same result, use the argument TOMType="unsigned" when calling TOMsimilarityFromExpr.
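The filtering advice above (filter low-count and low-variance genes, then log-transform RNA-seq data) can be sketched in base R. prepare_for_wgcna is a hypothetical helper: it assumes a raw count matrix with genes in rows and samples in columns, uses simple library-size scaling (counts per million) as a stand-in for proper normalization such as DESeq2's varianceStabilizingTransformation, and keeps the most variable genes by MAD. The thresholds (count below 10 in more than 90% of samples, top genes kept) are examples, not recommendations for every design.

```r
prepare_for_wgcna <- function(counts, minCount = 10, maxFracLow = 0.9,
                              nTop = 5000, by = c("mad", "var")) {
  by <- match.arg(by)
  libSize <- colSums(counts)                          # per-sample library size
  # 1. Drop genes with count < minCount in more than maxFracLow of the samples
  keep <- rowMeans(counts < minCount) <= maxFracLow
  counts <- counts[keep, , drop = FALSE]
  # 2. Library-size scaling (counts per million), then log2(x + 1)
  logExpr <- log2(t(t(counts) / libSize) * 1e6 + 1)
  # 3. Keep the nTop genes with the largest spread (MAD or variance)
  spread <- apply(logExpr, 1, if (by == "mad") mad else var)
  top <- order(spread, decreasing = TRUE)[seq_len(min(nTop, nrow(logExpr)))]
  # 4. WGCNA expects samples in rows and genes in columns
  t(logExpr[top, , drop = FALSE])
}

# Usage with simulated counts: 200 genes x 30 samples, first 20 genes never expressed
set.seed(7)
counts <- matrix(rpois(200 * 30, lambda = 50), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:30)))
counts[1:20, ] <- 0
datExprToy <- prepare_for_wgcna(counts, nTop = 100)
dim(datExprToy)   # 30 100
```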
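To see why the FAQ recommends the biweight mid-correlation when outliers are possible, here is a from-scratch sketch of its basic form: center each variable at its median, down-weight points with Tukey's biweight based on the unscaled median absolute deviation, and give zero weight to anything more than 9 MADs away. bicor_sketch is a hypothetical name; it omits maxPOutliers, robustY and the other safeguards of WGCNA's bicor (including the case of a zero MAD), so use the package function for real analyses.

```r
# Simplified biweight mid-correlation (no maxPOutliers / robustY / zero-MAD handling)
bicor_sketch <- function(x, y) {
  weighted_dev <- function(v) {
    med <- median(v)
    madRaw <- median(abs(v - med))     # unscaled MAD (no 1.4826 factor); assumed > 0
    u <- (v - med) / (9 * madRaw)
    w <- (1 - u^2)^2 * (abs(u) < 1)    # Tukey biweight; weight 0 beyond 9 MADs
    (v - med) * w
  }
  xt <- weighted_dev(x)
  yt <- weighted_dev(y)
  sum(xt * yt) / sqrt(sum(xt^2) * sum(yt^2))
}

set.seed(3)
x <- 1:100
y <- x + rnorm(100, sd = 5)            # strongly correlated with x
y[100] <- -5000                        # one wild outlier
cor(x, y)                              # Pearson: dragged below zero by the outlier
bicor_sketch(x, y)                     # robust: still close to 1
```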
