Software Collection by Module: A Veteran's Pipeline Guide (Download Links, References, Official Websites, Latest Versions)

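16S sequencing, i.e., amplicon sequencing, is fast, simple, and inexpensive, which arguably makes it the most popular type of high-throughput sequencing among researchers today.
Because the data volume is small, more and more people without access to an HPC cluster can analyze their own data on a modest server or even a good laptop.
As a result, amplicon software keeps multiplying: from integrated, push-button analysis suites to tools and utilities that solve specific small problems, there are over a hundred of them.
Here is a roundup of the mainstream software and databases, with brief comments. Additions and corrections are welcome.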
01 Integrated Pipelines

1. QIIME
QIIME is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. QIIME includes demultiplexing and quality filtering, OTU picking, taxonomic assignment, phylogenetic reconstruction, and diversity analyses and visualizations.
Latest version: QIIME 2 (QIIME 1 will no longer be supported or updated after January 1, 2018)
Reference: PMID: 20383131
Download: see the installation instructions on the QIIME 2 website
Official websites: QIIME 2: https://docs.qiime2.org/2017.8/ ; QIIME 1: http://qiime.org/
Tutorial: https://docs.qiime2.org/2017.8/tutorials/moving-pictures/

2. Mothur
Mothur is currently the most cited bioinformatics tool for analyzing 16S rRNA gene sequences. Step inside the wiki and user forum and learn how you can use mothur to process data generated by Sanger, PacBio, IonTorrent, 454, and Illumina (MiSeq/HiSeq).
Latest version: Version 1.39.5
Reference: PMID: 19801464
Download: https://github.com/mothur/mothur/releases/tag/v1.39.5
Official website: https://www.mothur.org/
Tutorial: https://www.mothur.org/wiki/MiSeq_SOP

3. USEARCH
USEARCH is a unique sequence analysis tool with thousands of users worldwide, which combines many different algorithms into a single package with outstanding documentation and support.
Latest version: Version 10
Reference: PMID: 20709691
Download: http://drive5.com/usearch/download.html
Official website: http://drive5.com/usearch/

4. FunGene
Functional Gene Pipeline Scripts contains a set of Python scripts that allows users to run one or more of the individual tools offered by the RDP FunGene Pipeline. These tools are offered in a modular fashion, allowing researchers to choose the appropriate subset based on their needs.
Latest version: Version 9.3
Reference: PMID: 24101916
Official website: http://fungene.cme.msu.edu/
Tutorial: http://fungene.cme.msu.edu/FunGenePipeline/

5. SILVAngs
SILVAngs is a data analysis service for ribosomal RNA gene (rDNA) amplicon reads from high-throughput sequencing approaches based on an automatic software pipeline. It uses the SILVA rDNA databases, taxonomies, and alignments as a reference. It facilitates the classification of rDNA reads and provides a wealth of results (tables, graphs and sequence files) for download.
Latest version: Version 9.3
Reference: PMID: 23193283
Official website: https://www.arb-silva.de/ngs/
Demo: https://www.arb-silva.de/ngs/#demo

Veteran's take: The analyses involved in amplicon data are relatively mature, and the software is abundant: a full count runs to well over a hundred tools.
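Installing them one by one wastes both resources and time, so pipeline-style packages that bundle many tools are very popular.
The best known of these are QIIME and Mothur, which bundle most of the analyses you are likely to need.
The veteran clustering tool USEARCH has kept pace, bundling data preprocessing, OTU clustering, taxonomic annotation, diversity analysis, and more. It is not as feature-rich as QIIME, but it covers the basic analyses; the one drawback is that the 64-bit version is paid.
Databases such as RDP and SILVA have followed suit, for example the SILVAngs online analysis platform, the FunGene functional gene pipeline, and RDP's own pipeline (http://pyro.cme.msu.edu/); we won't list them all here.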
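02 Data Quality Control

1. FastQC
A quality control tool for high throughput sequence data.
Latest version: Version 0.11.5
Reference: PMID: 22312429
Download: https://www.bioinformatics.babraham.ac.uk/projects/download.html#fastqc
Official website: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

2. Trimmomatic
A flexible trimmer for Illumina sequence data.
Latest version: Version 0.36
Reference: PMID: 24695404
Download: http://www.usadellab.org/cms/?page=trimmomatic

3. QIIME split_libraries_fastq.py
Website: http://qiime.org/
Usage: http://qiime.org/scripts/split_libraries_fastq.html

Veteran's take: Amplicon QC comes up at several points in the analysis. Starting from the raw reads off the sequencer, adapters, barcodes, and primers are trimmed first, followed by quality assessment and filtering. The paired-end (PE) reads are then merged based on their overlap, and the merged sequences go through another round of QC to remove low-quality reads, reads containing N, and overly short reads before they are used for downstream clustering and annotation.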
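All of the QC steps are covered together here.
FastQC was introduced in our earlier article "A Brief History of the Evolution of NGS Data Formats" and is essentially the standard tool for raw-data QC.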
Trimmomatic is a sliding-window filtering and trimming tool, which is very useful for Illumina reads, whose quality drops markedly toward the 3' end.
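A minimal Python sketch of the sliding-window idea, assuming Phred+33 quality strings. The window size of 4 and mean-quality cutoff of 20 are illustrative values, the function names are hypothetical, and Trimmomatic's own SLIDINGWINDOW step may place the cut slightly differently:

```python
def phred33(qual):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def sliding_window_trim(seq, qual, window=4, min_avg=20):
    """Scan from the 5' end and cut the read at the start of the first
    window whose mean quality drops below min_avg (simplified sketch)."""
    scores = phred33(qual)
    cut = len(scores)  # default: keep the whole read
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_avg:
            cut = start
            break
    return seq[:cut], qual[:cut]


# 'I' is Q40 and '#' is Q2 in Phred+33, so the low-quality tail is removed
print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))  # → ('ACGTA', 'IIIII')
```

Reads shorter than the window are returned unchanged in this sketch.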
For filtering merged sequences, QIIME has its own scripts that can be called directly.
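The post-merge filter described above (drop low-quality reads, reads containing N, and overly short reads) can be sketched as follows. The function names and thresholds (minimum length 200, no Ns allowed, mean quality 25) are hypothetical and are not the defaults of any QIIME script:

```python
def passes_filter(seq, qual, min_len=200, max_n=0, min_mean_q=25):
    """Return True if a merged read survives the length, N and quality checks."""
    if len(seq) < min_len:
        return False                      # too short
    if seq.upper().count("N") > max_n:
        return False                      # contains ambiguous bases
    scores = [ord(c) - 33 for c in qual]  # Phred+33 decoding
    if not scores:
        return False                      # empty read
    return sum(scores) / len(scores) >= min_mean_q


def filter_reads(records, **kwargs):
    """records: iterable of (read_id, seq, qual) tuples; keep passing reads."""
    return [r for r in records if passes_filter(r[1], r[2], **kwargs)]
```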
03 Read Merging

1. FLASH
A very fast and accurate software tool to merge paired-end reads from NGS experiments.
Latest version: Version 1.2.11
Reference: PMID: 21903629
Download: https://sourceforge.net/projects/flashpage/files/
Official website: https://ccb.jhu.edu/software/FLASH/

2. PEAR
An ultrafast, memory-efficient and highly accurate pair-end read merger. It is fully parallelized and can run with as little as a few kilobytes of memory.
Latest version: Version 0.9.8
Reference: PMID: 24142950
Download: https://sco.h-its.org/exelixis/web/software/pear/downloads.html
Official website: https://sco.h-its.org/exelixis/web/software/pear/

3. PANDAseq
PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing.
Latest version: Version 2.11
Reference: PMID: 22333067
Download: https://github.com/neufeld/pandaseq/releases/tag/v2.11
Official website: http://neufeldserver.uwaterloo.ca/~apmasell/pandaseq_man1.html

4. fastq-join
Command-line tools for processing biological sequencing data
Reference: ea-utils: Command-line tools for processing biological sequencing data
Official website: https://github.com/ExpressionAnalysis/ea-utils/blob/wiki/FastqJoin.md

Veteran's take: FLASH is still the most widely used merger, but when the amplicon is too long or too short, FLASH may not merge well; in those cases PEAR or PANDAseq may give pleasantly better results.
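fastq-join is the merger bundled with QIIME: QIIME's join_paired_ends.py calls fastq-join by default and can optionally use other tools such as SeqPrep (https://github.com/jstjohn/SeqPrep). SeqPrep still appears relatively rarely in the literature; both tools are reasonably fast.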
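To illustrate what "merging based on the overlap" means, here is a toy Python sketch, not FLASH's or fastq-join's actual code: R2 is reverse-complemented, every overlap length down to a minimum is tried, and the overlap with the lowest mismatch fraction is kept (longest wins ties). The function names and the thresholds (min_overlap=10, max_mismatch_frac=0.1) are hypothetical; quality scores are ignored, R1's base is kept inside the overlap, and the "staggered" case where the amplicon is shorter than the reads is not handled.

```python
COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMP)[::-1]


def merge_pair(r1, r2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge R1 with reverse-complemented R2 at the best-scoring overlap.

    Returns the merged sequence, or None if no acceptable overlap exists."""
    r2rc = revcomp(r2)
    best = None  # (overlap_length, mismatch_fraction)
    # Try overlaps from longest to shortest; strict '<' keeps the longest on ties
    for olen in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        tail, head = r1[-olen:], r2rc[:olen]
        frac = sum(x != y for x, y in zip(tail, head)) / olen
        if frac <= max_mismatch_frac and (best is None or frac < best[1]):
            best = (olen, frac)
    if best is None:
        return None
    return r1 + r2rc[best[0]:]


amplicon = "ACGTTGCAAGGCTTAACCGGATCGATCGTA"
r1, r2 = amplicon[:20], revcomp(amplicon[8:])  # 12-bp overlap
print(merge_pair(r1, r2) == amplicon)  # → True
```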
04 Chimera Removal

1. DECIPHER
DECIPHER is a software toolset that can be used for deciphering and managing biological sequences efficiently using the R programming language. DECIPHER's Find Chimeras web tool can be used to uncover chimeras hidden in 16S rRNA sequences.
Latest version: Version 2.2.0
Reference: PMID: 22101057
Download: http://decipher.cee.wisc.edu/Download.html
Official website: http://decipher.cee.wisc.edu/index.html

2. ChimeraSlayer
ChimeraSlayer uses BLAST to identify potential chimera parents and computes the optimal branching alignment of the query against two parents. Input of PyNAST-aligned representative sequences is suggested.
Latest version: Version 2.2.0
Reference: PMID: 21212162
Download: https://sourceforge.net/projects/microbiomeutil/files/
Official website: http://microbiomeutil.sourceforge.net/#A_CS

3. VSEARCH
VSEARCH supports de novo and reference-based chimera detection, clustering, full-length and prefix dereplication, rereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering, conversion and merging of paired-end reads.
Latest version: Version 2.4.4
Reference: PMID: 27781170
Download: https://github.com/torognes/vsearch/releases
Official website: https://github.com/torognes/vsearch

4. UCHIME2
UCHIME2 and UCHIME are algorithms for detecting chimeric sequences.
Latest version: Version 4.2
Reference: doi: https://doi.org/10.1101/074252
Download: http://drive5.com/uchime/uchime_download.html
Official website: http://drive5.com/usearch/manual/uchime_algo.html

5. usearch61
usearch61 performs both de novo (abundance-based) and reference-based chimera detection. With usearch61, unclustered sequences should be used as input rather than a representative sequence set, as these sequences need to be clustered to get abundance data.
Reference: PMID: 20709691
Download: http://drive5.com/usearch/download.html
Official website: http://drive5.com/usearch/usearch_docs.html

Veteran's take: Chimera removal relies on two main approaches, de novo and reference-based. usearch61, which combines both, is bundled in QIIME (identify_chimeric_seqs.py) and is currently one of the mainstream methods.
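Note, however, that as mentioned above, the 64-bit version of USEARCH is paid.

A few years ago, standalone UCHIME was also widely used for chimera removal, but the official website now advises against installing UCHIME separately and recommends downloading USEARCH instead.
VSEARCH was released as an open-source alternative to USEARCH and runs about as fast; it is the recommended method for chimera removal and clustering in mothur, and is well worth a try.
ChimeraSlayer is slow, and DECIPHER was clearly outperformed in the comparisons on the UCHIME website, so neither is recommended here.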
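To make the de novo (abundance-based) idea concrete, here is a toy Python sketch in that spirit; it is not UCHIME's actual scoring. Unique sequences are processed from most to least abundant, and a query is flagged if a two-parent model (the left part of one more abundant sequence joined to the right part of another) explains it with clearly fewer mismatches than any single parent. It assumes equal-length, pre-aligned reads; the function names, the parent-abundance ratio `abskew=2.0`, and `min_gain=2` are hypothetical illustration values.

```python
def mismatches(a, b):
    """Count positional differences between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def chimera_gain(query, parents):
    """Mismatches saved by the best two-parent model over the best single parent."""
    best_single = min(mismatches(query, p) for p in parents)
    best_two = best_single
    for i, left in enumerate(parents):
        for j, right in enumerate(parents):
            if i == j:
                continue
            for cut in range(1, len(query)):  # one breakpoint per model
                model = left[:cut] + right[cut:]
                best_two = min(best_two, mismatches(query, model))
    return best_single - best_two


def denovo_filter(uniques, abskew=2.0, min_gain=2):
    """uniques: {sequence: abundance}. Returns (non_chimeras, chimeras)."""
    kept, chimeras = [], []
    for seq, ab in sorted(uniques.items(), key=lambda kv: (-kv[1], kv[0])):
        # Candidate parents must be at least abskew-fold more abundant
        parents = [s for s, a in kept if a >= abskew * ab]
        if len(parents) >= 2 and chimera_gain(seq, parents) >= min_gain:
            chimeras.append(seq)
        else:
            kept.append((seq, ab))
    return [s for s, _ in kept], chimeras
```

Reference-based detection follows the same logic, except that the candidate parents come from a curated database instead of the more abundant sequences in the sample.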
05 OTU Clustering

1. UCLUST
UCLUST creates "seeds" of sequences which generate clusters based on percent identity. Uclust_ref works like uclust but takes a reference database to use as seeds; new clusters can be toggled on or off.
Reference: PMID: 20709691
Download: http://www.drive5.com/uclust/downloads1_2_22q.html
Official website: https://www.drive5.com/usearch/manual/uclust_algo.html

2. UPARSE
UPARSE is a method for generating clusters (OTUs) from next-generation sequencing reads of marker genes such as 16S rRNA, the fungal ITS region and the COI gene.
Reference: PMID: 23955772
Download: http://www.drive5.com/usearch/manual/cmd_cluster_otus.html
Official website: https://www.drive5.com/uparse/

3. CD-HIT
CD-HIT is a very widely used clustering program, which applies a "longest-sequence-first list removal algorithm" to cluster sequences.
Latest version: Version 4.6.8
Reference: PMID: 23060610
Download: https://github.com/weizhongli/cdhit/releases
Official website: http://weizhongli-lab.org/cd-hit/

4. Mothur
For the Mothur method, the clustering algorithm may be specified as nearest-neighbor, furthest-neighbor, or average-neighbor. The default algorithm is furthest-neighbor.
See Part 01 for details.

5. Oclust
A pipeline for clustering long 16S rRNA sequencing reads, or any sequences, into OTUs.
Reference: PMID: 26434730
Download: https://github.com/oscar-franzen/oclust/
Official website: https://omictools.com/oclust-tool

Veteran's take: There are many OTU clustering methods, falling mainly into two families, heuristic and hierarchical clustering algorithms: the former include UPARSE, UCLUST, and CD-HIT, the latter mothur and Oclust.
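In practice, the mainstream clustering tools are still UPARSE, UCLUST, and mothur.
Most of the tools above are bundled in QIIME, whose default clustering method is UCLUST (pick_otus.py).
Oclust, listed last, targets clustering of long third-generation PacBio reads; since second-generation sequencing currently dominates, it is still rarely used.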
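A minimal sketch of abundance-sorted greedy clustering, the general heuristic idea behind UPARSE/UCLUST-style OTU picking (CD-HIT sorts by length instead). It assumes reads have already been trimmed to equal length so identity can be computed position by position without alignment; real tools align each query to the centroids. The function names are hypothetical, and the 97% threshold is the conventional OTU cutoff.

```python
def identity(a, b):
    """Fraction of identical positions (assumes equal-length, trimmed reads)."""
    same = sum(x == y for x, y in zip(a, b))
    return same / max(len(a), len(b))


def greedy_cluster(uniques, threshold=0.97):
    """uniques: {sequence: abundance}. Returns {centroid: [members]}.

    Sequences are visited from most to least abundant; each joins the first
    existing centroid it matches at >= threshold, otherwise it founds a new OTU."""
    clusters = {}
    for seq, _ in sorted(uniques.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            clusters[seq] = [seq]  # no centroid close enough: new OTU
    return clusters
```

Hierarchical methods such as mothur's furthest-neighbor instead compute a full pairwise distance matrix and merge clusters bottom-up, which is more expensive but does not depend on the input order.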
06 Taxonomic Annotation

1. Greengenes
A 16S rRNA gene database that addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies.
Latest version: Version 13.5
Reference: PMID: 16820507
Download: http://greengenes.secondgenome.com/downloads/database/13_5
Official website: http://greengenes.secondgenome.com/

2. SILVA
SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya).
Latest version: SILVA 128
Reference: PMID: 23193283
Download: https://www.arb-silva.de/documentation/release-128/
Official website: https://www.arb-silva.de/

3. RDP
RDP provides quality-controlled, aligned and annotated Bacterial and Archaeal 16S rRNA sequences, and Fungal 28S rRNA sequences, and a suite of analysis tools to the scientific community.
Latest version: Version 11.5
Reference: PMID: 24288368
Download: http://rdp.cme.msu.edu/misc/rel10info.jsp
Official website: http://rdp.cme.msu.edu/index.jsp

4. UNITE
UNITE is a user-friendly Nordic ITS Ectomycorrhiza Database designed to provide a stable and reliable platform for sequence-borne identification of ectomycorrhizal asco- and basidiomycetes, including only high-quality sequences of well identified fungi.
Latest version: Version 7.2
Reference: PMID: 15869663
Download: https://unite.ut.ee/repository.php
Official website: https://unite.ut.ee/

5. FunGene
Functional Gene Pipeline Scripts contains a set of Python scripts that allows users to run one or more of the individual tools offered by the RDP FunGene Pipeline. These tools are offered in a modular fashion, allowing researchers to choose the appropriate subset based on their needs.
Latest version: Version 9.3
Reference: PMID: 24101916
Official website: http://fungene.cme.msu.edu/

Veteran's take: In amplicon analysis, 16S annotation relies mainly on Greengenes, SILVA, and RDP. Early on, Greengenes was the most widely used, which owes a lot to its being bundled with QIIME, but it has not been updated since 2013, so analysts have been moving to SILVA, which is still updated roughly every year. Interestingly, of the two well-known function prediction tools covered later, PICRUSt must be used with Greengenes, while Tax4Fun is recommended for use with SILVA.
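In addition, fungal ITS annotation mainly uses the UNITE database.
For functional genes, annotation against the NT database used to give dismal results; with FunGene steadily improving in recent years, it is now essentially the go-to choice for taxonomic annotation of functional-gene amplicons.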
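A common way to assign taxonomy against these reference databases is the RDP naive Bayesian classifier, which scores a query by its k-mer "words" (8-mers in the original). The Python sketch below is a stripped-down version of that idea, assuming the word-probability estimate P(w|g) = (m + p_w) / (M + 1) with prior p_w = (n_w + 0.5) / (N + 1), where m is the number of genus-g references containing word w, M the number of genus-g references, n_w the number of all references containing w, and N the total number of references. It omits the bootstrap confidence step and classifies at a single rank; the function names, toy references, and genus labels X/Y are hypothetical.

```python
import math
from collections import defaultdict


def kmers(seq, k):
    """Set of distinct k-mer words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(refs, k=8):
    """refs: list of (sequence, genus) pairs."""
    n_w = defaultdict(int)                         # references containing word w
    m_gw = defaultdict(lambda: defaultdict(int))   # per-genus word counts
    M_g = defaultdict(int)                         # references per genus
    for seq, genus in refs:
        M_g[genus] += 1
        for w in kmers(seq, k):
            n_w[w] += 1
            m_gw[genus][w] += 1
    return {"k": k, "N": len(refs), "n_w": n_w, "m_gw": m_gw, "M_g": M_g}


def classify(model, query):
    """Return the genus with the highest naive Bayes log-likelihood."""
    k, N = model["k"], model["N"]
    best_genus, best_score = None, -math.inf
    for genus, M in model["M_g"].items():
        score = 0.0
        for w in kmers(query, k):
            prior = (model["n_w"].get(w, 0) + 0.5) / (N + 1)
            score += math.log((model["m_gw"][genus].get(w, 0) + prior) / (M + 1))
        if score > best_score:
            best_genus, best_score = genus, score
    return best_genus


toy_refs = [("ACGTACGTACGGTTAACC", "X"), ("ACGTACGTACGGTTAACG", "X"),
            ("TTTGGGCCCAAATTTGGG", "Y")]
model = train(toy_refs, k=4)  # small k only because the toy reads are short
print(classify(model, "ACGTACGTACGG"))  # → X
```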
07 Sequence Alignment

1. PyNAST
PyNAST is a reimplementation of the NAST sequence aligner, which has become a popular tool for adding new 16S rRNA sequences to existing 16S rRNA alignments.
Latest version: PyNAST 1.0
Reference: PMID: 19914921
Download: http://biocore.github.io/pynast/install.html
Official website: http://biocore.github.io/pynast/

2. MUSCLE
MUSCLE is an alignment method which stands for MUltiple Sequence Comparison by Log-Expectation. On average, MUSCLE is cited by ten new papers every day.
Latest version: Version 3.8.31
Reference: PMID: 15034147
Download: http://www.drive5.com/muscle/downloads.htm
Official website: http://www.drive5.com/muscle/

3. MAFFT
MAFFT is a multiple sequence alignment program for Unix-like operating systems. It offers a range of multiple alignment methods, such as L-INS-i (accurate; for alignment of <∼200 sequences) and FFT-NS-2 (fast; for alignment of <∼30,000 sequences).
Latest version: Version 7.310
Reference: PMID: 12136088
Download: http://mafft.cbrc.jp/alignment/software/#Download%20and%20Installation
Official website: http://mafft.cbrc.jp/alignment/software/

4. Infernal
Infernal ("INFERence of RNA ALignment") is for searching DNA sequence databases for RNA structure and sequence similarities.
Latest version: Version 1.1.2
Reference: PMID: 24008419
Download: http://eddylab.org/infernal/#Downloads
Official website: http://eddylab.org/infernal/

Veteran's take: All of these alignment tools are bundled in QIIME and can be invoked directly.
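Of these, PyNAST and Infernal are similar in that both align against a reference, but Infernal runs much more slowly and is used far less.
MUSCLE and MAFFT are reference-free global multiple aligners. MUSCLE claims to be cited by ten new papers every day; that figure is not limited to microbiome applications, but it still shows how widely the tool is used. MAFFT is similar: some benchmarks indicate that MAFFT aligns more accurately, though it has no real speed advantage. When no good reference is available (e.g., for functional genes), both are currently in use.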
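For readers new to "global alignment", here is the classic pairwise global alignment dynamic program (Needleman-Wunsch) with a linear gap penalty, the textbook starting point for global aligners. MUSCLE and MAFFT themselves use considerably more elaborate progressive and refinement strategies, so treat this only as a sketch of the underlying idea; the scoring values (match +1, mismatch -1, gap -1) are illustrative.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Traceback from the bottom-right corner to recover one optimal alignment
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # → (2, 'ACGT', 'A-GT')
```

Reference-based aligners such as PyNAST instead place each query against a fixed template alignment, which is why they need a good reference.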
08 Function Prediction

1. PICRUSt
PICRUSt (pronounced "pie crust") is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes.
Latest version: PICRUSt 1.1.2
Reference: PMID: 23975157
Download: https://github.com/picrust/picrust
Official website: http://picrust.github.io/picrust/

2. Tax4Fun
Tax4Fun is an open-source R package that predicts the functional capabilities of microbial communities based on 16S datasets. Tax4Fun is applicable to output as obtained from the SILVAngs web server or the application of QIIME against the SILVA database.
Reference: PMID: 25957349
Download: http://tax4fun.gobics.de/#Download
Official website: http://tax4fun.gobics.de/

3. FAPROTAX
FAPROTAX is a database that maps prokaryotic clades (e.g. genera or species) to established metabolic or other ecologically relevant functions, using the current literature on cultured strains.
Latest version: FAPROTAX 1.1
Reference: PMID: 28812567
Download: http://www.zoology.ubc.ca/louca/FAPROTAX/lib/php/index.php?section=Download
Official website: http://www.zoology.ubc.ca/louca/FAPROTAX/lib/php/index.php

4. FUNGuild
An open annotation tool for parsing fungal community datasets by ecological guild.
Reference: https://doi.org/10.1016/j.funeco.2015.06.006
Download: https://github.com/UMNFuN/FUNGuild.git
Official website: http://www.stbates.org/guilds/app.php

Veteran's take: Because amplicon data are inherently taxonomic, being able to predict function greatly expands the range of scientific questions that can be addressed.
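At present, PICRUSt remains the most widely used function prediction tool, but as attention turns to archaea, fungi, and other non-bacterial groups and as annotation databases are updated, other tools are being used more.
For example, as mentioned above, use of Tax4Fun has grown as the preferred annotation database has shifted; there is also FAPROTAX, which focuses on biogeochemical cycling in environmental samples, and FUNGuild for fungal function prediction.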
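PICRUSt's core arithmetic is to divide each OTU's read count by its predicted 16S copy number (turning reads into approximate cell counts) and then multiply by that OTU's predicted gene content, summing over OTUs. A minimal Python sketch with hypothetical tables (the OTU IDs, gene names, and copy numbers are made up; in PICRUSt these come from precalculated files keyed to Greengenes OTUs):

```python
def predict_metagenome(otu_counts, copy_number, gene_content):
    """otu_counts: {otu: reads}; copy_number: {otu: predicted 16S copies};
    gene_content: {otu: {gene: predicted copies per genome}}.
    Returns {gene: predicted abundance} for one sample."""
    predicted = {}
    for otu, reads in otu_counts.items():
        cells = reads / copy_number[otu]  # copy-number normalization
        for gene, copies in gene_content.get(otu, {}).items():
            predicted[gene] = predicted.get(gene, 0.0) + cells * copies
    return predicted


# Hypothetical example: otu1 has 4 16S copies, otu2 has 1
print(predict_metagenome({"otu1": 40, "otu2": 10},
                         {"otu1": 4, "otu2": 1},
                         {"otu1": {"geneA": 1, "geneB": 2}, "otu2": {"geneA": 3}}))
# → {'geneA': 40.0, 'geneB': 20.0}
```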
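09 Common Plotting and Statistics Software

1. Basic plotting
R ggplot2: https://cran.r-project.org/web/packages/ggplot2/
Perl SVG: https://metacpan.org/pod/SVG
Python matplotlib: https://matplotlib.org/
QIIME: http://qiime.org/

2. Taxon statistics and visualization
STAMP: kiwi.cs.dal.ca/Software/STAMP
LEfSe: http://huttenhower.sph.harvard.edu/galaxy/
Metastats: http://clovr.org/docs/metastats/
QIIME: http://qiime.org/

3. Diversity analysis
QIIME: http://qiime.org/
Mothur: https://www.mothur.org/
USEARCH: http://drive5.com/usearch/

4. Phylogenetic tree visualization
GraPhlAn: http://huttenhower.org/galaxy/
iTOL: https://itol.embl.de/

5. Environmental factor analysis
R vegan: https://cran.r-project.org/web/packages/vegan/
Canoco5: http://www.canoco5.com/

6. Network interaction analysis
Cytoscape: http://www.cytoscape.org/
Gephi: https://gephi.org/

Veteran's take: This part lists some commonly used software. Generally, once you have the taxonomy-annotated OTU table (otu_table) and the phylogenetic tree built from the sequence alignment (rep_phylo.tre), the basic analysis is done. Downstream analyses mainly cover taxon statistics and visualization, between-group comparisons (diversity: alpha_div; community structure: beta_div; etc.), and association analyses (network interactions, environmental factors, etc.), possibly plus function prediction depending on your needs; these are combined with other validation experiments to interpret the scientific questions associated with changes in microbial diversity.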
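For the alpha and beta diversity comparisons mentioned above, here are minimal Python implementations of three common indices, as a sketch rather than a replacement for QIIME or vegan: Shannon (natural log here; some tools use log base 2), Gini-Simpson (1 minus the sum of squared proportions), and Bray-Curtis dissimilarity between two samples' count vectors over the same OTUs. The function names are hypothetical.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over non-zero OTU counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p^2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors (same OTU order)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den


print(gini_simpson([10, 10]))            # → 0.5
print(bray_curtis([1, 2, 3], [1, 2, 3]))  # → 0.0
print(bray_curtis([1, 0], [0, 1]))        # → 1.0
```

In practice samples are usually rarefied or otherwise normalized to comparable depth before these indices are compared between groups.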
