Standalone uchime was widely used for chimera removal a few years ago, but the official website now advises against installing uchime on its own and recommends downloading usearch directly. VSEARCH was released as an open-source alternative to usearch, runs at a comparable speed, and is the recommended method for chimera removal and clustering in mothur, so it is worth a try. ChimeraSlayer is comparatively slow, and DECIPHER is clearly outperformed in the comparisons published on the uchime website, so neither is recommended here.

05 OTU Clustering

1. UCLUST
UCLUST creates "seeds" of sequences which generate clusters based on percent identity. Uclust_ref, as uclust, but takes a reference database to use as seeds. New clusters can be toggled on or off.
Reference: PMID: 20709691
Download: http://www.drive5.com/uclust/downloads1_2_22q.html
Website: https://www.drive5.com/usearch/manual/uclust_algo.html

2. Uparse
UPARSE is a method for generating clusters (OTUs) from next-generation sequencing reads of marker genes such as 16S rRNA, the fungal ITS region and the COI gene.
Reference: PMID: 23955772
Download: http://www.drive5.com/usearch/manual/cmd_cluster_otus.html
Website: https://www.drive5.com/uparse/

3. CD-HIT
CD-HIT is a very widely used clustering program, which applies a "longest-sequence-first list removal algorithm" to cluster sequences.
Latest version: Version 4.6.8
Reference: PMID: 23060610
Download: https://github.com/weizhongli/cdhit/releases
Website: http://weizhongli-lab.org/cd-hit/

4. Mothur
For the Mothur method, the clustering algorithm may be specified as nearest-neighbor, furthest-neighbor, or average-neighbor. The default algorithm is furthest-neighbor.
See the introduction in Part 1 for details.

5. Oclust
A pipeline for clustering long 16S rRNA sequencing reads, or any sequences, into OTUs.
Reference: PMID: 26434730
Download: https://github.com/oscar-franzen/oclust/
Website: https://omictools.com/oclust-tool

Veteran's commentary: There are many OTU clustering methods, and they fall into two main groups: heuristic algorithms (uparse, uclust, CD-HIT and others) and hierarchical clustering algorithms (such as mothur and oclust). In practice, the mainstream clustering tools are still uparse, uclust, and mothur. Most of the software listed above is bundled with QIIME, where the default clustering method is uclust (pick_otus.py). Oclust, listed last, targets clustering of long third-generation PacBio reads; because second-generation sequencing still dominates, it has seen little use so far.

06 Taxonomic Annotation

1. Greengenes
A 16S rRNA gene database addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies.
Latest version: Version 13.5
Reference: PMID: 16820507
Download: http://greengenes.secondgenome.com/downloads/database/13_5
Website: http://greengenes.secondgenome.com/

2. Silva
SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya).
Latest version: SILVA 128
Reference: PMID: 23193283
Download: https://www.arb-silva.de/documentation/release-128/
Website: https://www.arb-silva.de/

3. RDP
RDP provides quality-controlled, aligned and annotated Bacterial and Archaeal 16S rRNA sequences, and Fungal 28S rRNA sequences, and a suite of analysis tools to the scientific community.
Latest version: Version 11.5
Reference: PMID: 24288368
Download: http://rdp.cme.msu.edu/misc/rel10info.jsp
Website: http://rdp.cme.msu.edu/index.jsp

4. Unite
UNITE is a user-friendly Nordic ITS Ectomycorrhiza Database designed to provide a stable and reliable platform for sequence-borne identification of ectomycorrhizal asco- and basidiomycetes, including only high-quality sequences of well identified fungi.
Latest version: Version 7.2
Reference: PMID: 15869663
Download: https://unite.ut.ee/repository.php
Website: https://unite.ut.ee/

5. FunGene
Functional Gene Pipeline Scripts contains a set of Python scripts that allow running one or more individual tools offered by the RDP FunGene Pipeline.
These tools are offered in a modular fashion, allowing researchers to choose the appropriate subset based on their needs.
Latest version: Version 9.3
Reference: PMID: 24101916
Website: http://fungene.cme.msu.edu/

Veteran's commentary: In amplicon analysis, 16S annotation relies mainly on Greengenes, Silva, and RDP. Greengenes was the most used early on, largely because it ships with QIIME, but it has not been updated since May 2013, and many analysts have switched to Silva, which is still updated roughly every year. Interestingly, of the two well-known function prediction tools covered later, PICRUSt needs to be used with Greengenes, while Tax4Fun is recommended for use with Silva. For fungal ITS annotation, the Unite database remains the main choice. Functional genes used to be annotated against the NT database, with very poor results; FunGene has improved steadily in recent years and is now essentially the go-to choice for taxonomic annotation of functional gene amplicons.

07 Sequence Alignment

1. PyNAST
PyNAST is a reimplementation of the NAST sequence aligner, which has become a popular tool for adding new 16S rRNA sequences to existing 16S rRNA alignments.
Latest version: PyNAST 1.0
Reference: PMID: 19914921
Download: http://biocore.github.io/pynast/install.html
Website: http://biocore.github.io/pynast/

2. Muscle
MUSCLE is an alignment method which stands for MUltiple Sequence Comparison by Log-Expectation. On average, MUSCLE is cited by ten new papers every day.
Latest version: Version 3.8.31
Reference: PMID: 15034147
Download: http://www.drive5.com/muscle/downloads.htm
Website: http://www.drive5.com/muscle/

3. Mafft
MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment of <∼200 sequences), FFT-NS-2 (fast; for alignment of <∼30,000 sequences), etc.
Latest version: Version 7.310
Reference: PMID: 12136088
Download: http://mafft.cbrc.jp/alignment/software/#Download%20and%20Installation
Website: http://mafft.cbrc.jp/alignment/software/

4. Infernal
Infernal ("INFERence of RNA ALignment") is for searching DNA sequence databases for RNA structure and sequence similarities.
Latest version: Version 1.1.2
Reference: PMID: 24008419
Download: http://eddylab.org/infernal/#Downloads
Website: http://eddylab.org/infernal/

Veteran's commentary: All of these alignment tools are bundled with QIIME and can be called from there. Among them, PyNAST and Infernal are similar in that both align against a reference database, but Infernal runs much more slowly and is used far less. Muscle and Mafft are both global aligners that do not depend on a reference database. Muscle claims to be cited by ten new papers a day; that figure is not limited to microbiome work, but its use is clearly broad. Mafft is similar: some benchmarks show Mafft to be more accurate, though it has no speed advantage. Both are used when no good reference database exists for the alignment (functional genes, for example).

08 Function Prediction

1. PICRUSt
PICRUSt (pronounced "pie crust") is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes.
Latest version: PICRUSt 1.1.2
Reference: PMID: 23975157
Download: https://github.com/picrust/picrust
Website: http://picrust.github.io/picrust/

2. Tax4Fun
Tax4Fun is an open-source R package that predicts the functional capabilities of microbial communities based on 16S datasets. Tax4Fun is applicable to output as obtained from the SILVAngs web server or the application of QIIME against the SILVA database.
Reference: PMID: 25957349
Download: http://tax4fun.gobics.de/#Download
Website: http://tax4fun.gobics.de/

3. FAPROTAX
FAPROTAX is a database that maps prokaryotic clades (e.g. genera or species) to established metabolic or other ecologically relevant functions, using the current literature on cultured strains.
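Latest version: FAPROTAX 1.1
Reference: PMID: 28812567
Download: http://www.zoology.ubc.ca/louca/FAPROTAX/lib/php/index.php?section=Download
Website: http://www.zoology.ubc.ca/louca/FAPROTAX/lib/php/index.php

4. FUNGuild
An open annotation tool for parsing fungal community datasets by ecological guild.
Reference: https://doi.org/10.1016/j.funeco.2015.06.006
Download: https://github.com/UMNFuN/FUNGuild.git
Website: http://www.stbates.org/guilds/app.php

Veteran's commentary: Amplicon sequencing is by nature a taxon-level analysis, so being able to predict function from it opens up many more scientific questions. PICRUSt is still the most widely used function prediction tool, but as attention to archaea, fungi, and other non-bacterial groups grows and annotation databases turn over, other tools are being used more. As noted above, Tax4Fun use has grown with the shift in annotation databases; other examples are FAPROTAX, which focuses on biogeochemical cycling in environmental samples, and FUNGuild for fungal function prediction.

09 Common Plotting and Statistics Software

1. Basic plotting
R ggplot2: https://cran.r-project.org/web/packages/ggplot2/
Perl SVG: https://metacpan.org/pod/SVG
Python matplotlib: https://matplotlib.org/
QIIME: http://qiime.org/

2. Taxon statistics and visualization
STAMP: kiwi.cs.dal.ca/Software/STAMP
LEfSe: http://huttenhower.sph.harvard.edu/galaxy/
Metastats: http://clovr.org/docs/metastats/
QIIME: http://qiime.org/

3. Diversity analysis
QIIME: http://qiime.org/
Mothur: https://www.mothur.org/
Usearch: http://drive5.com/usearch/

4. Phylogenetic tree visualization
GraPhlAn: http://huttenhower.org/galaxy/
iTOL: https://itol.embl.de/

5. Environmental factor analysis
R vegan: https://cran.r-project.org/web/packages/vegan/
Canoco5: http://www.canoco5.com/

6. Network interaction analysis
Cytoscape: http://www.cytoscape.org/
Gephi: https://gephi.org/

Veteran's commentary: This part lists some commonly used software. Generally, once you have the taxonomically annotated otu_table and the phylogenetic tree rep_phylo.tre built from the aligned sequences, the basic analysis is done. Downstream analysis mainly covers taxon statistics and visualization, between-group comparison (diversity: alpha_div; community structure: beta_div; and so on), and association analysis (network interactions, environmental factors, and so on), plus function prediction where needed, combined with other validation experiments to explain the scientific questions tied to changes in microbial diversity.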
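To make the alpha_div and beta_div comparison mentioned in the commentary above more concrete, here is a minimal sketch in plain Python. It is not how QIIME, Mothur, or vegan implement these metrics. The toy OTU table, the sample names, and the function names shannon and bray_curtis are all hypothetical and chosen only for illustration. The Shannon index here uses the natural log; tools differ in the log base they use.

```python
import math

# Hypothetical toy OTU table (assumption, not real data):
# sample name -> {OTU id -> read count}
otu_table = {
    "sample_A": {"OTU_1": 10, "OTU_2": 10, "OTU_3": 0},
    "sample_B": {"OTU_1": 5, "OTU_2": 0, "OTU_3": 15},
}


def shannon(counts):
    """Alpha diversity: Shannon index (natural log) from read counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:  # OTUs with zero reads contribute nothing
            p = c / total
            h -= p * math.log(p)
    return h


def bray_curtis(a, b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples,
    each given as an {OTU id: count} dict."""
    otus = set(a) | set(b)
    diff = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    total = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    return diff / total if total else 0.0


if __name__ == "__main__":
    for name, counts in otu_table.items():
        print(name, "Shannon:", round(shannon(list(counts.values())), 4))
    # |10-5| + |10-0| + |0-15| = 30, total reads = 40, so 30/40 = 0.75
    print("Bray-Curtis A vs B:", bray_curtis(otu_table["sample_A"], otu_table["sample_B"]))
```

In a real analysis these values would normally come from QIIME, Mothur, or R vegan, applied to a rarefied or otherwise normalized otu_table. The sketch only shows what the two kinds of metric measure: alpha diversity describes one sample, beta diversity compares two.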