I hit the following error while estimating divergence times with mcmctree:
MCMCTREE in paml version 4.9, March 2015
Reading options from mcmctree.ctl..
finetune is deprecated now.
Reading master tree.
(Bmor, ((Achi, Rpro), Nlug));
Reading sequence data.. 3 loci
*** Locus 1 ***
ns = 4 ls = 621319
Reading sequences, sequential format..
Reading seq # 1: Achi
Error in sequence data file: B at 621301 seq 1.
Make sure to separate the sequence from its name by 2 or more spaces.
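At first the problem appeared to be that the sequence file did not conform to PHYLIP format: the names were not separated from the sequences.
1. Convert the OrthoFinder multiple sequence alignment from FASTA to PHYLIP format with the following commands. This failed.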
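# Put each record on one line as "name<two spaces>sequence" (PAML wants 2 or more spaces after the name)
cat SpeciesTreeAlignment.fa |tr '\n' '\t'|sed 's/>/\n/g' |sed 's/\t/  /'|sed 's/\t//g'| awk 'NF > 0' >Aa.phy.tmp
# Prepend the PHYLIP header line: number of sequences and alignment length
awk '{print " "NR" "length($2)}' Aa.phy.tmp|tail -n 1 | cat - Aa.phy.tmp > Aa.phy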
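The same conversion can also be done in a single awk pass. The following is only a minimal sketch, not the method used in this note: `fasta_to_phylip` is a hypothetical helper name, it assumes the input is an aligned FASTA file, and it keeps only the first word of each header as the sequence name.

```shell
# Hypothetical helper: convert an aligned FASTA file to sequential PHYLIP.
# Assumption: all sequences already have the same length (it is an alignment).
fasta_to_phylip() {
    awk '
        /^>/ {                                 # header line starts a new record
            n++
            name[n] = substr($0, 2)            # drop the leading ">"
            sub(/[ \t\r].*$/, "", name[n])     # keep only the first word as the name
            next
        }
        {
            gsub(/[ \t\r]/, "")                # remove stray whitespace in sequence lines
            seq[n] = seq[n] $0                 # join wrapped sequence lines
        }
        END {
            printf " %d %d\n", n, length(seq[1])       # header: count and alignment length
            for (i = 1; i <= n; i++)
                printf "%s  %s\n", name[i], seq[i]     # two spaces between name and sequence
        }
    ' "$1"
}
```

Usage would look like `fasta_to_phylip SpeciesTreeAlignment.fa > Aa.phy`. The header length is taken from the first sequence only, so it is worth checking separately that every sequence has that length.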
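2. Use
cat SpeciesTreeAlignment.fa |tr '\n' '\t'|sed 's/>/\n>/g' |sed 's/\t/\n/'|sed 's/\t//g'| awk 'NF > 0' > Aa1.phy.tmp
to convert the alignment to plain FASTA (line wrapping removed, one sequence line per record),
then convert the FASTA file to PHYLIP format with an R package: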
library(devtools)
library(ape)
library(phylotools)
data <- read.fasta("Aa1.phy.tmp")
dat2phylip(data, outfile = "out.phy")
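This still gave the same error.
Searching for the cause suggested that the protein sequences contain U (selenocysteine). U is not one of the 20 standard amino acids, so some programs do not recognize it and report an error.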
# Replace U with X on sequence lines only; header lines starting with ">" are left untouched
sed '/^>/!s/U/X/g' SpeciesTreeAlignment.fa > STA_delU.fa
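Then run the commands above on this U-free FASTA file to obtain the PHYLIP file. The number of sequences is correct. Note: never replace U with nothing, because that changes the sequence length and breaks the alignment. Replace it with X instead.
The run still failed, but it was close to succeeding: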
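Before rerunning mcmctree, it can help to check which unusual residues the alignment contains and whether every sequence still has the same length. The following is a rough sketch under stated assumptions: both function names are hypothetical, and the default allowed alphabet (the 20 standard amino acids plus X and the gap character) is an assumption, not PAML's documented character set, so pass a different one as the second argument if needed.

```shell
# Hypothetical helper: count residues outside an allowed alphabet in a FASTA file.
# Assumption: default alphabet = 20 standard amino acids + X + gap "-".
report_unusual_residues() {
    allowed=${2:-ACDEFGHIKLMNPQRSTVWYX-}
    awk -v allowed="$allowed" '
        /^>/ { next }                          # skip header lines
        {
            gsub(/[ \t\r]/, "")                # ignore whitespace
            n = length($0)
            for (i = 1; i <= n; i++) {
                c = substr($0, i, 1)
                if (index(allowed, c) == 0) bad[c]++
            }
        }
        END { for (c in bad) print c, bad[c] } # one "residue count" pair per line
    ' "$1" | sort
}

# Hypothetical helper: print "name length" for every FASTA record,
# so that unequal lengths are easy to spot.
fasta_lengths() {
    awk '
        /^>/ { if (name != "") print name, len; name = substr($0, 2); len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (name != "") print name, len }
    ' "$1"
}
```

For example, `report_unusual_residues SpeciesTreeAlignment.fa` lists each unexpected character with its count, and `fasta_lengths STA_delU.fa` should print the same length for every species if U was replaced rather than deleted.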
ns = 4 ls = 621319
Reading sequences, sequential format..
Reading seq # 4: Rpro
Sequences read..
Counting site patterns.. 0:00
56693 patterns at 621319 / 621319 sites (100.0%), 0:01
Counting frequencies..
56693 patterns, messy
*** Locus 2 ***
Error: seq err1: EOF.
seq file is not paml/phylip format. Trying nexus format.
It turned out that the test data shipped with the software contains 3 datasets (loci), while our data contains only one!
So the fix is very simple: edit mcmctree.ctl and set ndata = 1.
ndata = 1
seqtype = 2 * 0: nucleotides; 1:codons; 2:AAs
usedata = 3 * 0: no data; 1:seq like; 2:use in.BV; 3: out.BV
clock = 3 * 1: global clock; 2: independent rates; 3: correlated rates
RootAge = <1.0 * safe constraint on root age, used if no fossil for root.
The run succeeded.
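Since ndata has to match the number of alignment blocks in the sequence file, a quick count can catch this mismatch early. The following is a minimal sketch: `count_phylip_blocks` is a hypothetical helper, and it assumes each block starts with a header line made of exactly two integers (number of sequences and alignment length) with no extra option letters.

```shell
# Hypothetical helper: count alignment blocks in a PHYLIP/PAML sequence file.
# Assumption: every block begins with a line holding exactly two integers.
count_phylip_blocks() {
    awk '
        { gsub(/\r/, "") }                     # tolerate Windows line endings
        NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { n++ }
        END { print n + 0 }                    # print 0 when no header line is found
    ' "$1"
}
```

For example, `count_phylip_blocks Aa.phy` printing 1 would correspond to ndata = 1 in mcmctree.ctl.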