Article
Title: CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning
URL: https://www.nature.com/articles/s41592-023-01940-w
Journal: Nature Methods, 2023
Abstract
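Advances in sequencing technologies and bioinformatics tools have dramatically increased the recovery rate of microbial genomes from metagenomic data. Assessing the quality of metagenome-assembled genomes (MAGs) is a crucial step before downstream analysis. Here, we present CheckM2, an improved method of predicting the genome quality of MAGs using machine learning. Using synthetic and experimental data, we demonstrate that CheckM2 outperforms existing tools in both accuracy and computational speed. In addition, CheckM2's database can be rapidly updated with new high-quality reference genomes, including taxa represented by only a single genome. We also show that CheckM2 accurately predicts genome quality for MAGs from novel lineages, even for those with reduced genome size (for example, Patescibacteria and the DPANN superphylum). CheckM2 provides accurate genome quality predictions across bacterial and archaeal lineages, giving increased confidence when inferring biological conclusions from MAGs.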
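Improvements
1 Higher accuracy and computational speed
2 A database that can be updated quickly with new high-quality reference genomes
3 Accurate predictions for genomes from novel lineages
GitHub: https://github.com/chklovski/CheckM2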
bioconda: https://bioconda.github.io/recipes/checkm2/README.html
Installation
conda
conda create -n checkm2
conda activate checkm2
conda install -c bioconda -c conda-forge checkm2
mamba
mamba create -n checkm2 -c bioconda -c conda-forge checkm2
source /XX/huty/software/miniconda3/etc/profile.d/conda.sh
conda activate checkm2
checkm2 -h
# Set the database path through an environment variable
export CHECKM2DB="/hwfsxx1/ST_HN/PXXX/huty/databases/checkm2_db/uniref100.KO.1.dmnd"
# Or set the database path on the command line
checkm2 predict \
-i ./folder_with_MAGs \
-o ./output_folder \
--database_path /hwfsxx1/ST_HN/P18Z10200N0423/huty/databases/checkm2_db/uniref100.KO.1.dmnd
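
Both options above assume the DIAMOND database file already exists on disk. As a rough sketch, the helper below looks for a `.dmnd` file under a directory and, only if none is found, falls back to `checkm2 database --download --path DIR`, the download command described in the CheckM2 README. The function name `find_or_fetch_checkm2_db` and the directory layout are illustrative assumptions, not part of CheckM2.

```shell
# Hypothetical helper (not part of CheckM2): find a DIAMOND database under
# a directory, download one if none is present, then export CHECKM2DB.
find_or_fetch_checkm2_db() {
    db_dir="$1"

    # Look for an existing .dmnd file anywhere below db_dir.
    db_file=$(find "$db_dir" -name '*.dmnd' 2>/dev/null | head -n 1)

    if [ -z "$db_file" ]; then
        # No database found: download it (requires network access and checkm2).
        checkm2 database --download --path "$db_dir" || return 1
        db_file=$(find "$db_dir" -name '*.dmnd' 2>/dev/null | head -n 1)
    fi

    # Give up if there is still no database file.
    [ -n "$db_file" ] || return 1

    export CHECKM2DB="$db_file"
    echo "$db_file"
}

# Example call (the directory is a placeholder):
# find_or_fetch_checkm2_db "$HOME/databases/checkm2_db"
```

Call the function directly rather than inside `$(...)`, otherwise the `export` happens in a subshell and `CHECKM2DB` is not set in the current session.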

Usage
source /hwfsxx1/ST_HN/P18Z10200N0423/huty/software/miniconda3/etc/profile.d/conda.sh
conda activate checkm2
checkm2 predict \
--threads 24 \
-x fa \
-i 02_MAG/$infile/bins/ \
-o 02_MAG/$infile/bins_checkm2 \
--database_path /hwfsxx1/ST_HN/P18Z10200N0423/huty/databases/checkm2_db/CheckM2_database/uniref100.KO.1.dmnd
[02/22/2024 10:29:10 AM] INFO: Running CheckM2 version 1.0.1
[02/22/2024 10:29:10 AM] INFO: Custom database path provided for predict run. Checking database at /hwfsxx1/ST_HN/P18Z10200N0423/huty/databases/checkm2_db/Ch
[02/22/2024 10:29:17 AM] INFO: Running quality prediction workflow with 24 threads.
[02/22/2024 10:29:23 AM] INFO: Calling genes in 74 bins with 24 threads:
[02/22/2024 10:30:46 AM] INFO: Calculating metadata for 74 bins with 24 threads:
[02/22/2024 10:30:47 AM] INFO: Annotating input genomes with DIAMOND using 24 threads
[02/22/2024 10:33:43 AM] INFO: Processing DIAMOND output
[02/22/2024 10:33:44 AM] INFO: Predicting completeness and contamination using ML models.
[02/22/2024 10:33:52 AM] INFO: Parsing all results and constructing final output table.
[02/22/2024 10:33:52 AM] INFO: CheckM2 finished successfully.
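
After a successful run, the usual next step is to keep only the bins that pass a quality cutoff. The sketch below assumes the results table is written to `quality_report.tsv` in the output folder and has `Name`, `Completeness`, and `Contamination` columns; check the header of your own report before relying on it. The function name `filter_checkm2_bins` is hypothetical, and the default thresholds (at least 50% completeness, under 10% contamination) follow the commonly used medium-quality MAG convention rather than anything stated in this post.

```shell
# Hypothetical helper: print bins from a CheckM2 report that meet the given
# completeness and contamination thresholds. Columns are looked up by header
# name rather than by position.
filter_checkm2_bins() {
    report="$1"
    min_comp="${2:-50}"   # minimum completeness (%), assumed default
    max_cont="${3:-10}"   # maximum contamination (%), assumed default

    awk -F '\t' -v min="$min_comp" -v max="$max_cont" '
        NR == 1 {
            # Map each header name to its column index.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("Name" in col) || !("Completeness" in col) || !("Contamination" in col)) {
                print "missing expected column in header" > "/dev/stderr"
                exit 1
            }
            next
        }
        # Keep rows with completeness >= min and contamination < max.
        $(col["Completeness"]) + 0 >= min + 0 && $(col["Contamination"]) + 0 < max + 0 {
            printf "%s\t%s\t%s\n", $(col["Name"]), $(col["Completeness"]), $(col["Contamination"])
        }
    ' "$report"
}

# Example call, following the output path used above:
# filter_checkm2_bins "02_MAG/$infile/bins_checkm2/quality_report.tsv" 50 10
```

For a stricter high-quality set, pass `90 5` as the last two arguments.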