神器系统ape(神器系统函数序列线图)

来看看能被我称为"神器"的,到底是什么?APE:在R语言中分析系统发育和进化 (e.g. 在获得全基因组Alignment之后)Emmanuel Paradis, Julien Claude & Korbinian Strimmer法国蒙彼利埃大学 古生物生物学和系统发育实验室;慕尼黑大学统计学系翻译APE发表的1篇原始文章,穿插一些代码、辅助信息,来了解APE的概况:DOI: 10.1093/bioinformatics/btg412APE (Analysis of Phylogenetics and Evolution) 是1个用R语言编写的用于分子进化和系统发育分析的免费软件包,提供了读写数据和操作系统发育树的实用函数,以及几种用于系统发育和进化分析的高级方法 (e.g. 比较和群体遗传方法)
APE利用了许多用于统计和图形的R函数,并为开发、实现进一步的进化过程分析的统计方法,提供了灵活的框架
可从官方的R包存档 (http://cran.r-project.org/src/contrib/PACKAGES.html#ape,现在可能为 https://cran.r-project.org/web/packages/ape/index.html)获得
从广义上说,系统发育分析涵盖了非常广泛的方法,从计算进化距离、重建基因树、估计分叉(Divergence)日期,到比较数据分析、进化速率估计和多样性(Diversification)分析
所有这些不同的任务都有一个特殊的共同点:严重依赖计算统计 (Computational statistics)
R系统是一个免费的独立于平台的开源分析环境,最近(2004)已经形成统计计算和图形(Graphics)事实上的标准 (Ihaka & Gentleman, 1996)
R的1个优点是,通过编写专门的包,可以很容易地针对特定应用领域进行定制(Tailored to a particular application area)
特别是,R在生物信息学中的有用性已经在基因表达数据(即转录组)的分析中得到了令人印象深刻的证明(http://www.bioconductor.org)
APE是首次在系统发育和进化数据的分析中发挥R的优势
APE专注于使用系统发育和系谱(Genealogical)树作为输入的统计分析
在v1.1版中,APE提供了读取、写入、绘制、操作系统发育树,并在系统发育框架中分析比较数据(Comparative data)、多样化分析、计算等位基因和核苷酸数据的距离、读取核苷酸序列和其它几个工具的功能,如Mantel测试、最小生成树的计算、群体遗传学参数的估计
表1概述了APE中目前可用的函数
注意,有些方法 (如比较法、天际线图/Skyline plot等) 以前只能在专门的软件中使用
外部的树重建程序(如PHYLIP)可以通过标准的Shell命令从R调用
表 1. APE v1.1版中可用的特定函数mst 最小生成树 - Minimum Spanning Treephylo 进化树在ape包中的数据对象的类(Class)名称 - A single phylogenetic tree may have several representations in the Newick format and in the "phylo" class of objects used in ‘ape’.skyline estimate of effective population size from an estimated phylogeny. ltt.plot Lineages Through Time Plot. coalescent 溯祖. hivtree 193个HIV-1序列的系统发育树数据. opsin 视紫蛋白数据.R语言的另1个优点是可以直接获得具有发表质量的图形输出,特别是使用PostScript设备时
例如,APE中的系统发育绘图功能处理颜色、线的粗细、字体、标签的间距,这些可为每个分枝(Branch)单独定义,因此可在单个系统发育图上表示3个不同的变量
APE还产生复杂的群体遗传学图,如广义的天际线图(Generalized skyline plot, Strimmer和Pybus, 2001),只需1个命令
与任何R包一样,APE是命令行驱动的 (重现、维护、共享非常方便)
函数由用户调用,可能带有参数和选项
在R中使用APE的任何会话(Session)都从此命令开始(当然,需要提前安装):library(ape)这使得APE的函数(执行特定的统计、绘图等功能)在R环境中可用
下面的命令可显示这些函数的列表 (名称和简述):library(help = ape)APE中所有可用的R函数(表1)都以R超文本格式记录,有关它们使用的信息可以通过应用help命令检索,例如:help(read.tree)以标准Newick插句格式保存在磁盘上的文本文件(e.g. tree1.txt)中的进化树可被读取:tree1<-read.tree('tree1.txt')这会将系统进化树存储在名为"tree1"的类(Class)'phylo'对象中
该对象中存储的信息(e.g. 分枝长度)可通过输入`tree1`(命令)来检查,并且可通过执行以下(命令),来获得进化分枝图(Cladogram)形式的图形输出:plot(tree1)这实际上调用了APE的plot.phylo函数,绘制了系统发育树tree1
由于R的面向对象的性质,命令plot(x) (泛型函数?)可能会给出1个完全不同的结果,取决于对象所属的类 (class of x)
默认情况下,树被绘制在图形窗口上(Graphical window),但可以根据操作系统以各种文件格式导出
除了这些简单的示例之外,在面向对象的结构中表示系统发育树可以直接操作进化分析中使用的各种计算(方法)的系统发育数据
目前在APE中实施的一些方法:系统发育独立对比(Felsenstein, 1985; Harvey and Pagel, 1991),拟合生卒模型 (Fitting birth–death models. Nee et al., 1994; Pybus and Harvey, 2000),群体遗传分析(Nee et al., 1995;Strimmer and Pybus, 2001),进化速率的非参数平滑 (Sanderson, 1997),利用Klastorin的方法在系统发育树中估计基因分组组 (Misawa and Tajima, 2000)
此外,在R函数hclust中实现的基于距离的聚类方法(Distance based clustering)可被APE使用,进而通过函数转换并生成类'phylo'类对象和'hclust'类对象
APE中的类和方法 (如'phylo') 可以很容易地进一步扩展,来包括其它功能,例如:注释系统发育树
因此,APE不仅是一个数据分析包,也是一个开发和实现新方法的(开发)环境
此外,用C、C++或Fortran77编写的程序可从R中链接和调用,对于计算机密集型计算很有用
(附) APE绘制天际线图 (示例,未美化)# mcmc.popsize函数 - 可逆跳跃MCMC (马尔可夫链蒙特卡罗) 推断群体数量史 (Demographic History)# get treedata("hivtree.newick") # example tree in NH formattree.hiv <- read.tree(text = hivtree.newick) # load tree# run mcmc chainmcmc.out <- mcmc.popsize(tree.hiv, nstep=100, thinning=1, burn.in=0,progress.bar=FALSE) # toy run#mcmc.out <- mcmc.popsize(tree.hiv, nstep=10000, thinning=5, burn.in=500) # remove comments!!# make list of population size versus timepopsize <- extract.popsize(mcmc.out)# plot and compare with skyline plotsk <- skyline(tree.hiv)plot(sk, lwd=1, lty=3, show.years=TRUE, subst.rate=0.0023, present.year = 1997)lines(popsize, show.years=TRUE, subst.rate=0.0023, present.year = 1997)mcmc.popsize函数 - 可逆跳跃MCMC (马尔可夫链蒙特卡罗) 推断群体数量史 (Demographic History)# skyline函数 - 使用天际线图估计有效种群大小(Effective Population Size)# get treedata("hivtree.newick") # example tree in NH formattree.hiv <- read.tree(text = hivtree.newick) # load tree# corresponding coalescent intervalsci <- coalescent.intervals(tree.hiv) # from tree# collapsed intervalscl1 <- collapsed.intervals(ci,0)cl2 <- collapsed.intervals(ci,0.0119)#### classic skyline plot ####sk1 <- skyline(cl1) # from collapsed intervalssk1 <- skyline(ci) # from coalescent intervalssk1 <- skyline(tree.hiv) # from treesk1plot(skyline(tree.hiv))skylineplot(tree.hiv) # shortcutplot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997)#### generalized skyline plot ####sk2 <- skyline(cl2) # from collapsed intervalssk2 <- skyline(ci, 0.0119) # from coalescent intervalssk2 <- skyline(tree.hiv, 0.0119) # from treesk2plot(sk2)# classic and generalized skyline plot together in one plotplot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997, col=c(grey(.8),1))lines(sk2, show.years=TRUE, subst.rate=0.0023, present.year = 1997)legend(.15,500, c("classic", "generalized"), col=c(grey(.8),1),lty=1)# find optimal epsilon parameter using AICc criterionfind.skyline.epsilon(ci)sk3 <- skyline(ci, -1) # negative epsilon also triggers estimation of epsilonsk3$epsilonskyline函数 - 使用天际线图估计有效种群大小(Effective Population Size)# skylineplot函数 - 绘制天际线图# get treedata("hivtree.newick") # example tree in NH formattree.hiv <- read.tree(text = hivtree.newick) # load tree#### classic skyline plotskylineplot(tree.hiv) # shortcut#### plot classic and generalized skyline plots and estimate epsilonsk.opt <- skylineplot.deluxe(tree.hiv)sk.opt$epsilon#### classic and generalized skyline plot ####sk1 <- skyline(tree.hiv)sk2 <- skyline(tree.hiv, 0.0119)# use years rather than substitutions as unit for the time axisplot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997, col=c(grey(.8),1))lines(sk2, show.years=TRUE, subst.rate=0.0023, present.year = 1997)legend(.15,500, c("classic", "generalized"), col=c(grey(.8),1),lty=1)#### various skyline plots for different epsilonslayout(mat= matrix(1:6,2,3,byrow=TRUE))ci <- coalescent.intervals(tree.hiv)plot(skyline(ci, 0.0));title(main="0.0")plot(skyline(ci, 0.007));title(main="0.007")plot(skyline(ci, 0.0119),col=4);title(main="0.0119")plot(skyline(ci, 0.02));title(main="0.02")plot(skyline(ci, 0.05));title(main="0.05")plot(skyline(ci, 0.1));title(main="0.1")layout(mat= matrix(1:1,1,1,byrow=TRUE))skylineplot函数 - 绘制天际线图(附) 关于程辑包‘ape v5.7-1’的信息library(help = ape)Version: 5.7-1Date: 2023-03-13Built: R 4.3.0; x86_64-w64-mingw32; windowsDescription: Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized/罚分 likelihood, dating trees with non-contemporaneous/非同期 sequences, translating DNA into AA sequences, and assessing sequence alignments.Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ, BIONJ, MVR, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.全部索引AAbin == Amino Acid SequencesCADM.global == Congruence/一致 among distance matricesDNAbin == Manipulate DNA Sequences in Bit-Level FormatDNAbin2indel == Recode Blocks of IndelsFastME == Tree Estimation Based on the Minimum Evolution AlgorithmInitialize.corPhyl == Initialize a 'corPhyl' Structure ObjectLTT == Theoretical Lineage-Through Time PlotsMPR == Most Parsimonious ReconstructionMoran.I == Moran's I Autocorrelation IndexSDM == Construction of Consensus Distance Matrix With SDMace == Ancestral Character Estimationadd.scale.bar == Add a Scale Bar to a Phylogeny Plotadditive == Incomplete Distance Matrix Fillingalex == Alignment Explorer With Multiple Devicesall.equal.DNAbin == Compare DNA Setsall.equal.phylo == Global Comparison of two Phylogeniesalview == Print DNA or AA Sequence Alignementape-package == Analyses of Phylogenetics and Evolutionapetools == Tools to Explore Filesas.alignment == Conversion Among DNA Sequence Internal Formatsas.bitsplits == Split Frequencies and Conversion Among Split Classesas.matching == Conversion Between Phylo and Matching Objectsas.phylo == Conversion Among Tree and Network Objectsas.phylo.formula == Conversion from Taxonomy Variables to Phylogenetic TreesaxisPhylo == Axis on Side of Phylogenybalance == Balance of a Dichotomous/二歧分枝 Phylogenetic Treebase.freq == Base frequencies from DNA Sequencesbd.ext == Extended Version of the Birth-Death Models to Estimate Speciation and Extinction Rates (估算物种形成和灭绝速率)bd.time == Time-Dependent Birth-Death ModelsbinaryPGLMM == Phylogenetic Generalized Linear Mixed Model for Binary Databind.tree == Binds Treesbionj == Tree Estimation Based on an Improved Version of the NJ Algorithmbird.families == Phylogeny of the Families of Birds From Sibley and Ahlquistbird.orders == Phylogeny of the Orders of Birds From Sibley and Ahlquistbirthdeath == Estimation of Speciation and Extinction Rates With Birth-Death Modelsboot.phylo == Tree Bipartition and Bootstrapping Phylogeniesbranching.times == Branching Times of a Phylogenetic Treec.phylo == Building Lists of Treescarnivora == Carnivora/食肉类 body sizes and life history traitscheckAlignment == Check DNA AlignmentscheckLabel == Checking LabelscheckValidPhylo == Check the Structure of a "phylo" Objectcherry == Number of Cherries and Null Models of Treeschiroptera == Bat PhylogenychronoMPL == Molecular Dating With Mean Path Lengthschronopl == Molecular Dating With Penalized Likelihoodchronos == Molecular Dating by Penalised Likelihood and Maximum Likelihoodclustal == Multiple Sequence Alignment with External Applicationscoalescent.intervals == Coalescent/溯祖 Intervalscollapse.singles == Collapse Single Nodescollapsed.intervals == Collapsed Coalescent Intervalscompar.cheverud == Cheverud's Comparative Methodcompar.gee == Comparative Analysis with GEEscompar.lynch == Lynch's Comparative Methodcompar.ou == Ornstein-Uhlenbeck Model for Continuous CharacterscomparePhylo == Compare Two "phylo" Objectscompute.brlen == Branch Lengths Computationcompute.brtime == Compute and Set Branching Timesconsensus == Concensus Treescophenetic.phylo == Pairwise Distances from a Phylogenetic Treecophyloplot == Plots two phylogenetic trees face to face with links between the tips.corBlomberg == Blomberg et al.'s Correlation StructurecorBrownian == Brownian Correlation StructurecorClasses == Phylogenetic Correlation StructurescorGrafen == Grafen's (1989) Correlation StructurecorMartins == Martins's (1997) Correlation StructurecorPagel == Pagel's "lambda" Correlation Structurecorphylo == Correlations among Multiple Traits with Phylogenetic Signalcorrelogram.formula == Phylogenetic Correlogram/相关图data.nex == NEXUS Data Exampledbd == Probability Density Under Birth-Death Modelsdef == Definition of Vectors for Plotting or Annotatingdegree == Vertex Degrees in Trees and Networksdel.gaps == Delete Alignment Gaps in DNA Sequencesdelta.plot == Delta Plotsdist.dna == Pairwise Distances from DNA Sequencesdist.gene == Pairwise Distances from Genetic Datadist.topo == Topological Distances Between Two Treesdiversi.gof == Tests of Constant Diversification Rates (恒定多样化比率)diversi.time == Analysis of Diversification with Survival Modelsdiversity.contrast.test == Diversity Contrast Testdnds == dN/dS Ratiodrop.tip == Remove Tips in a Phylogenetic Treeedges == Draw Additional Edges on a Plotted Treeevonet == Evolutionary NetworksewLasso == Incomplete distances and edge weights of unrooted topologygammaStat == Gamma-Statistic of Pybus and HarveygetAnnotationsGenBank == Read Annotations from GenBankhivtree == Phylogenetic Tree of 193 HIV-1 Sequenceshowmanytrees == Calculate Numbers of Phylogenetic Treesidentify.phylo == Graphical Identification of Nodes and Tipsimage.DNAbin == Plot of DNA Sequence Alignementis.binary == Test for Binary Treeis.compatible == Check Compatibility of Splitsis.monophyletic == Is Group Monophyletic(单源)is.ultrametric == Test if a Tree is Ultrametric/超度量/超矩阵kronoviz == Plot Multiple Chronograms(编年/计时图) on the Same Scalelabel2table == Label Managementladderize == Ladderize(阶梯化) a Treelatag2n == Leading and Trailing Alignment Gaps to Nlmorigin == Multiple regression through the originltt.plot == Lineages Through Time PlotmakeLabel == Label ManagementmakeNodeLabel == Makes Node Labelsmantel.test == Mantel Test for Similarity of Two Matricesmat3 == Three Matricesmat5M3ID == Five Treesmat5Mrand == Five Independent Treesmatexpo == Matrix Exponential/指数mcconwaysims.test == McConway-Sims Test of Homogeneous Diversificationmcmc.popsize == Reversible Jump MCMC to Infer Demographic/群体数量 HistorymixedFontLabel == Mixed Font Labels for Plottingmrca == Find Most Recent Common Ancestors Between Pairsmst == Minimum Spanning Treemulti2di == Collapse and Resolve Multichotomies/多歧分枝multiphylo == Manipulating Lists of Treesmvr == Minimum Variance Reductionnj == Neighbor-Joining Tree Estimationnjs == Tree Reconstruction from Incomplete Distances With NJ or bio-NJnode.dating == node.datingnode.depth == Depth and Heights of Nodes and Tipsnodelabels == Labelling the Nodes, Tips, and Edges of a Treenodepath == Find Paths of Nodesparafit == Test of host-parasite coevolution (宿主-寄主共进化)pcoa == Principal Coordinate Analysisphydataplot == Tree Annotationphymltest == Fits a Bunch of Models with PhyMLpic == Phylogenetically Independent Contrastspic.ortho == Phylogenetically Independent Orthonormal/标准正交 Contrastsplot.correlogram == Plot a Correlogram/相关图plot.phylo == Plot Phylogeniesplot.phylo.extra == Extra Fuctions to Plot and Annotate Phylogeniesplot.varcomp == Plot Variance ComponentsplotTreeTime == Plot Tree With Time Axisprint.phylo == Compact/密集 Display of a PhylogenyrDNAbin == Random DNA SequencesrTraitCont == Continuous Character SimulationrTraitDisc == Discrete Character SimulationrTraitMult == Multivariate/多变 Character Simulationread.GenBank == Read DNA Sequences from GenBank via Internetread.caic == Read Tree File in CAIC Formatread.dna == Read DNA Sequences in a Fileread.gff == Read GFF Filesread.nexus == Read Tree File in Nexus Formatread.nexus.data == Read Character Data In NEXUS Formatread.tree == Read Tree File in Parenthetic Formatreconstruct == Continuous Ancestral Character Estimation (连续祖先特征估计)reorder.phylo == Internal Reordering (内部重排) of Treesrichness.yule.test == Test of Diversification-Shift With the Yule Processrlineage == Tree Simulation Under the Time-Dependent Birth-Death Modelsroot == Roots Phylogenetic Treesrotate == Swapping/交换 Sister Cladesrtree == Generate Random Treesrtt == Root a Tree by Root-to-Tip Regressionseg.sites == Find Segregating/分隔 Sites in DNA Sequencesskyline == Skyline Plot Estimate of Effective Population Sizeskylineplot == Drawing Skyline Plot Graphsslowinskiguyer.test == Slowinski-Guyer Test of Homogeneous DiversificationsolveAmbiguousBases == Solve Ambiguous Bases in DNA SequencesspeciesTree == Species Tree Estimationstree == Generates Systematic Regular Trees (系统规则树)subtreeplot == Zoom on a Portion of a Phylogeny by Successive/连续 Clickssubtrees == All subtrees of a Phylogenetic Treesummary.phylo == Print Summary of a Phylogenytrans == Translation from DNA to Amino Acid SequencestreePop == Tree Popping/爆开trex == Tree Explorer With Multiple DevicestriangMtd == Tree Reconstruction Based on the Triangles/三角 Methodunique.multiPhylo == Revomes Duplicate TreesupdateLabel == Update LabelsvarCompPhylip == Variance Components with Orthonormal Contrastsvarcomp == Compute Variance Component Estimatesvcv == Phylogenetic Variance-covariance or Correlation Matrixvcv2phylo == Variance-Covariance Matrix to Treeweight.taxo == Define Similarity Matrixwhere == Find Patterns in DNA Sequenceswhich.edge == Identifies Edges of a Treewoodmouse == Cytochrome b Gene Sequences of Woodmicewrite.dna == Write DNA Sequences in a Filewrite.nexus == Write Tree File in Nexus Formatwrite.nexus.data == Write Character Data in NEXUS Formatwrite.tree == Write Tree File in Parenthetic Formatyule == Fits the Yule Model to a Phylogenetic Treeyule.cov == Fits the Yule Model With Covariatesyule.time == Fits the Time-Dependent Yule Modelzoom == Zoom on a Portion of a Phylogeny(附) 实战及示例数据简要测试R包apeape包用于系统发育和进化分析,提供了以下功能:1.读、写、操作、分析、模拟系统发育树及DNA序列;2.计算DNA距离;3.翻译为AA(氨基酸?)序列;4.基于距离法估计(进化)树;5.比较及多样性分析;6.自编程新的系统发育方法
文档:用library(help = ape)显示完整函数列表;案例代码 x:/Program Files/R/R-x.x.x/library/ape/doc;更多信息 https://cran.r-project.org/web/packages/ape/ape.pdf DOI: 10.1093/bioinformatics/btg412 doi: 10.1093/bioinformatics/bty633 https://hal.ird.fr/ird-01887318 http://ape-package.ird.fr/, https://github.com/emmanuelparadis/ape http://ape-package.ird.fr/ 等library(ape)# library(help = ape)ape包的read.dna函数:从文件中读取DNA序列,返回1个矩阵或DNA序列列表,分别将读取的分类群(Taxa)的名称作为rowname或names序列以二进制格式返回(默认),或小写字母(选项as.character = TRUE)# 读取序列ex.dna <- ape::read.dna("./result/pairsnp/core.Temporal.aln.fasta", format = "fasta", as.character=F)ex.dna.sub <- ape::read.dna("./result/pairsnp/core.Temporal.aln.noREF.fasta", format = "fasta", as.character=F)# 选项format:"fasta", "sequential", "clustal", "interleaved"(序列名含空格时), 或任何明确的缩写# 查看str(ex.dna)## 'DNAbin' raw [1:22, 1:984] g g g g ...## - attr(, "dimnames")=List of 2## ..$ : chr [1:22] "ERR2245277" "ERR2245279" "ERR2245288" "ERR2245293" ...## ..$ : NULLex.dna[1:5,1:5,drop=F]## 5 DNA sequences in binary format stored in a matrix.## ## All sequences of same length: 5 ## ## Labels:## ERR2245277## ERR2245279## ERR2245288## ERR2245293## ERR2245294## ## Base composition:## a c g t ## 0.0 0.6 0.4 0.0 ## (Total: 25 bases)alview(ex.dna, file = "", uppercase = TRUE, showpos = TRUE)## 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111122222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222223333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444455555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555556666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888889999999999999999999999999999999999999999999999999999999999999999999999999999999999999## 000000000111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999000000000011111111112222222222333333333344444444445555555555666666666677777777778888888888999999999900000000001111111111222222222233333333334444444444555555555566666666667777777777888888888899999999990000000000111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999000000000011111111112222222222333333333344444444445555555555666666666677777777778888888888999999999900000000001111111111222222222233333333334444444444555555555566666666667777777777888888888899999999990000000000111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999000000000011111111112222222222333333333344444444445555555555666666666677777777778888888888999999999900000000001111111111222222222233333333334444444444555555555566666666667777777777888888888899999999990000000000111111111122222222223333333333444444444455555555556666666666777777777788888## 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234## ERR2245277 GCCCGTCGCTGGTGGGTAATAATGAAGAGCGTGGTGCAGTCTAAGGGCGTATGGACCCTTGCGCCCACTCTACCGTGCGGACGTCTCACTCTATATGCTCGACACCGTTCGTTCGACTCGCCCCCGTAGGCTCGGCCACAAGTGCCGTCGCAGGATTCAGCTCGAGCGCCCGAGCAGATCCAACCCCTTCTTCCTTGGGGTTCCTCGGTTGCTCGAATTCGCTACGCCCCCAGGGCAGATCCAGCGCTCGCAGGGTAACGGTCGTACCAGGGGTAAGTGGCACCAAGCAGCAGGGTCCGGGACCTGCACGCCGGACCGGCGCTACCTCGCTTCTGCGTTGGTCGTCTCATTGATCGCCGTTCCACGTAACCCAGCTATGCTAGCAGACTCACAAACTGGGCGACATCGATACCCCCAGACTTTTCACTGGCTCAGATGTCAAGCACGGACTGGTGACTCAGACTACCATTTTTTTGGCTCCTTGGTGTCCTAACCGCGCACGAAGCCGGGCTTACTGACACATGGTGCGCAGCTACGACGCAAGAAACATCGGAACCGCGCGCCTACGCGTAAAGAAGGTGGGGAAGGCCGTTCAGGCGTGGCTGTGACGCCATAAGAGGTCCAGGCGCAGCATGGGCCCTGCGTACAGAGGACCCCAAAAAAGAAAGGTGTAAGCCCCTCGTGATTGGCGTGTTCACAGCAACCGGGAGGGCGTACCGTCGGGACTCCAAACATCCCTGGATGTTCGTGTGCGCCCGGGAGTGCCCCTAGCGGCGCATAGTCCTCGCGGCTCAGTCAATTTCACTTCCTTGCATCTATGGGGGTCCCGCATTTTTGGTTAGCCGCGCGAGGGTCACCCAACGTTTGATATTGCCGTAGCTAGTGAGTGATGTGCCTTCCCCTGTCAACGATGCGAATCGTACGTAGGGGCGGGGTGCTGCTAGTAGGGAACATTGAAAAAAAATGCAATGTACTGGCCAAAGC## ERR2245279 ...........................................................C.....................................................................................................................G.............................................G..............................................................................................................................................................C.....................................................................................C...................................................G........................................................................................................A......................................G........G...............................A............A..................................................G...........................................................G...............A..............................................C...........................................................................................## ERR2245288 ........A.....................AC......A...G..........T.....C...G.......................G.............G...............C...............A...G...................T...................G.......G.G......................C............G...T..............G.......................T................................AC...............T...............T.......................A.A...A.......T...........C.......................G......A.....A................................A.C.............C..............................G....................G...........................C................G...............A.............A.............A..............CA..G.............................G.....GG.....C.G...TT.....G.............G......A...A........A......T...........................................G..T.T.......................G...........C....T..C..........G...............A.........G..A.......A..G..C...................C.......G..C.T..................................................G..........C................## ERR2245293 ........A......................C...........................C...GT...........T..........G.............G....A..........C...............A...G..................G....................G.........G......................C............GT............A....G.......................T..A.............................AC...............................T.......A...............A.A..TA.......T...........C.......................G............A..................................C.............C......A.......................G....................G..A........................C................G...............................TA..........A...............AA.G.............................G.....GG.....C.G...TT.....G.............G....T.A...A........A..............G................................T..G..T.T..G..............T.....G...........C....T..C......T...G.......A...A...AA........G.............G..C.....A.....T.......C.......G..C.T...........................A......................G..........C................## ERR2245294 ...............................C...........................C...G.....................................G...........T...............................................................G.....................T.......................G.A..............T..............................................................................................T......................A.......................C.....................C................................................C..............C...........T.......................................G...........................................C............................................................A......................................G......C.G..........G............TG......A............A...............C..................................G................................................C..........G...............A..........................C...................C.......G.............................................A....................C................## ERR2245295 ...........................................................C.....................................................................................................................G.............................................G................................................................................................................................................C.............C.....................................................................................C...................................................G........................................................................................................A......................................G........G...............................A............A..................................................G...........................................................G...............A..............................................C...........................................................................................## ERR2245344 ..........................AC.T.C.........................T.C...G.....................................G.......T.......C........................................G...........T......G..G......G.T.................................G............................T.............T..................G..........................TT.......T.......T...................C....A...A.......................C..............A....G.................................C.................C.............C..............................G....................G...............A..........T.......................................T.....................................A....T..................G..............G......C.G....T.....G.............G......A.......A....A..................................................G.G.................A............................C..........G...............A......................CG..C......T............C.......G..C...................................A.A..............G..........C................## ERR2245378 ...............................C...........................C.....................................................................................................................G.............................................G......................................................................................................................................A.....G.................C.............................T.......................................................C...................................................G......................A.......T.......A.................................................................A......................................G........G...............................A......G.....A..................................................G....................T................T.....................G...............A..........................C...................C.........................................................................................C.## ERR2245380 ........................G......C.C.........................C...G..................................G..G...............C.........................A...........................A.....G.........G...................................G.................AG.......T...............T......................A...................................T................................A...A..........A........C.......G.........G..................A....T.........T...................C.............C.................A..........A.G....................G...................T...............................................A....................................A..G...................................GG.....C.G....T.....G......G......G......A............A..................................................G..T.....................A.A.............C.......C..........G...............A...................G...G..C...................C.......G..C........A...........................................G...C......C................## ERR2245386 ...............................C...........................CA..G.................T...................G...........................................................................G.............................A...............G..............................................A............G..........................................................................A.......................C..........................G................T.........................................C...................................................G..............................................A.....................................................T...A......................................G......C.G..........G...A.........G......A.A..........A..................................................G................................................C....A.....G...............A...............G..........C...................C.......G............................A..A..................................C................## ERR2245388 ...........................................................C.....................................................................................................................G.............................................G..............................................................................................................................................................C.....................................................................................C...................................................G........................................................................................................A......................................G........G...............................A............A..................................................G...........................................................G...............A..............................................C...........................................................................................## ERR2245393 T.......A......................C...........................C...G....................T..G.............G...............C...............A...G...................T...................G.......G.G......................C.......G....G..................G.................A.....T................................AC...............................T.......................A.A...A.......T...........C.......................G......A.....A.................A..............A.C.............C..............................G....................G...........................C................G...T..........T............................A.........A.....A..G..........................A..G.....GG.....C.G...TT.....G.............G......A...A........A........T..............G..........................G..T.T.......................G...........C....T..C..........G..C............A.........G.............G..C..................AC.......G..C.T..................................................G..........C........A.......## ERR2245400 ...............................C...........................C.....................................................................................................................G.............................................G......................................................................................................................................A.....G.................C.............................T......................................A................C.................................................T.G..............................T.......A.................................................................A......................................G........G...............................A......G.....A..................................................G....................T................T.....................G...............A..........................C...................C.........................................................................................C.## ERR2245401 ........................G......C.C.........................C...G..................................G..G...............C.........................A...........................A.....G.........G...................................G.................AG.......................T......................A....................................................................A...A...................C.......G.........G..................A..............T...................C..........T..C............................A.G....................G..............................................................T.........................................A..G......................T............GG.....C.G....T.....G......G......G......A............A..................................................G..T.....................................C.......C..........G...............A.......................G..C...................C.......G..C........A...........................................G..........C................## ERR2245402 .....C.........................CA..........................C...G...............A.....................G...............................................................A...........G.............................................G.........................................................................A.........................................C...............A..A.......................CA......................................................G.............................C..........T........A...............................G.....................................A..................................................................A.....T................................G...T..C.G..........G.............G......A............A...........................T....C...........A.....G......G.....A...................................C...A.C....G...............A..........................C...................C.......G..................................................................C................## ERR2245406 ........................G......C.C.........................C...G..................................G..G...............C.........................A...........................A.....G.........G...................................G.................AG.......................T......................A....................................................................A...A...................C.......G.........G..................A..............T...................C..........T..C............................A.G....................G..............................................................T..........................T..............A..G......................T............GG.....C.G....T.....G......G......G......A............A..................................................G..T.....................................C.......C..........G...............A.......................G..C...................C.......G..C........A...........................................G..........C................## ERR2245411 .............................T.C.........................T.C...G.........T...............C...........G.......T.......C...........................................................G..G......G.T.................................G............................T.............T..................G...................T.......T.......T.....T.....................C....A...A.......................C...........G.......G.................................C.................C.............C..............................G....................G..........................T.............................................................................A....T.................................G......C.G....T.....G.............G.....TA............A........................A........G................G.G.................A.............G..............C..........G...............A......................CG..C...................C.......G..C...................................A................G..........C................## ERR2245418 .............................T.C.........................T.C...G.........T...............C...........G.......T.......C...........................................................G..G......G.T.................................G............................T.............T..................G...................T.......T.......T...........................C....A...A.......................C...........G.......G.................................C.................C.............C..............................G....................G..........................T.............................................................................A....T.................................G......C.G....T.....G.............G.....TA............A........................A........G................G.G.................A.............G..............C..........G...............A......................CG..C...................C.......G..C...................................A................G..........C................## ERR2245420 ........................G......C.C.........................C...G..................................G..G...............C.........................A...........................A.....G.........G...................................G.................AG.......................T......................A....................................................................A...A...................C.......G.........G..................A..............T...................C..........T..C............................A.G....................G..............................................................T.........................................A..G......................T............GG.....C.G....T.....G......G......G......A............A..................................................G..T.....................................C.......C..........G...............A.......................G..C...................C.......G..C........A...........................................G..........C................## ERR2245425 .............................T.C.........................T.C...G.........T...............C...........G.......T.......C...........................................................G..G......G.T.................................G............................T.............T..................G...................T.......T.......T...........................C....A...A.......................C...........G.......G.................................C.................C.............C..............................G....................G..........................T.............................................................................A....T.................................G......C.G....T.....G.............G.....TA............A........................A........G................G.G.................A.............G..............C..........G...............A......................CG..C...................C.......G..C...................................A................G..........C................## ERR2512377 ....................C......................................C...........C..C.............................T..............................................G.........................G.............................................G..................G.T.............................G..CA..............................A................................T.......C........C....................C.C.....C................C............G.........................G.....................T.C...............................................T...G...A.............G.........C................................................................C...........A................A...........T.........G........G...............................A............A..................................................G.............T.............................................G...............A..............................................C................A..........C............................G.........G....................C...## Reference .TGGA.TC.GCCCCAAGGGCCGGA.C..T..C..CATG.CTC.GCCATACGGT.GTT.C..TCG.TGTAACCG.CC.TC.C.AG.CTGG.GCCCGGCA.TAGTGTT.CG.ACC.TGACGCGAAGTTGGTCACG.CGT.TCCCG.TAACTCTGTTGC..GTTCTAG.TCTT..GATCA.CT.CGTG.TGC.CCGTCCCAA.GGTTGTC.CC.TGTCCGG.TATC...G.GGAGCCATG.GC..GCTATCTA.C.ATCTGTT.CTAGCTGC..TACGTACAATCT.T.AAG.ACAAACT.A.CGGTC.TGGATA..GG.AATT.CCT.C.T.CC.CC.ACC..CTCCACAG.CACG...TACC.AC.TCG.T.AG.TCGCCGCC..CAGTCA.TGG.TCAAA.A.T.C.AG.GG..TTGCGAGCAC.C.CCCTCGC.G...GGGATGGATGGC...CTTCAGTGTCG.TG.GCACCC.AAC..CCCTC.C.AGCGTAAA.GGAACCTGTACTACCCTCTG.G.GC.AGATAGGCGCGG.CGC.GG.GGC.CGG.CGGAGC..GATG..T.T.CGGCAGCAAC..A.CCA..TACCGG.T..CACTCTCCGA.TTGCCCAGA.C.TC...AGG..CCAAATGGCAAACGTGC.T.TTT.GGGTGGGT.GCC.GACG.ATA.TCTACAGCCA.TC.ACAGG.GCGGG...A.C.TA..GTGA.ATAAAC.C.AGGCA..AGGCCAG..CC.CCAC..TGTGCACGAC.GG.AC.A.TC.A..GGAC.TCTAAT..G.G.C.GGCGCG.GGC.GCCTTGGTCGCAAT....GAAGCG.ACCA.CCG.TGA..TAGCTACG.TT.GG.ACC..G.GCCCGTACG..CGTC.GAGAGG..CTTCGTTGTCCG.GGGACC.TAGGGGCCGTAACCAA.GA..CCATC.T.GACG.CAGGTGCCAGCGG.GGGGCACGGCCCTGC.TGGCC.Gimage(ex.dna[, 1:100], cex.lab = 0.5, cex.axis = 0.7)checkAlignment(ex.dna, check.gaps = TRUE, plot = TRUE, what = 1:4)## ## Number of sequences: 22 ## Number of sites: 984 ## ## No gap in alignment.## ## Number of segregating sites (including gaps): 984## Number of sites with at least one substitution: 984## Number of sites with 1, 2, 3 or 4 observed bases:## 1 2 3 4 ## 0 984 0 0由fasta到DNAbin的另1个可选方法# adegenet包的fasta2DNAbin函数:读取fasta格式的(序列)对齐,扩展名 .fasta .fa .fas# 输出DNAbin对象 (来自ape包的高效DNA表示),含完整的序列对齐(Alignment) 或 SNP(参数`snpOnly`)# 此函数可提高内存效率,并且可以读取比Ape包的read.dna函数更大的数据集ex.dna.f2d <- adegenet::fasta2DNAbin("./result/pairsnp/core.Temporal.aln.fasta", quiet=FALSE, chunkSize=10, snpOnly=FALSE)## ## Converting FASTA alignment into a DNAbin object... ## ## ## Finding the size of a single genome... ## ## ## genome size is: 984 nucleotides ## ## ( 2 lines per genome )## ## Importing sequences... ## ............................................................## Forming final object... ## ## ...done.str(ex.dna.f2d) # DNA sequences in binary format stored in a matrix## 'DNAbin' raw [1:22, 1:984] g g g g ...## - attr(, "dimnames")=List of 2## ..$ : chr [1:22] "ERR2245277" "ERR2245279" "ERR2245288" "ERR2245293" ...## ..$ : NULLidentical(ex.dna.f2d, ex.dna)## [1] TRUEall.equal.DNAbin(ex.dna.f2d, ex.dna, plot=T)## [1] TRUEidentical(ex.dna.sub, ex.dna)## [1] FALSEall.equal.DNAbin(ex.dna.sub, ex.dna, plot=T)## [1] "Number of sequences different:" ## [2] "21 sequences in 1st object; 22 sequences in 2nd object."## [3] "Subset your data for further comparison."ape的更多函数:可在文档(ape.pdf)中搜索更多的应用,e.g. 搜索”DNAbin” (查看能分析DNAbin的所有函数) https://cran.r-project.org/web/packages/ape/ape.pdf# 其它函数:read.fastq可下载、读取Fastq;read.GenBank;read.gff;as.DNAbin;dist.dna;clustal/muscle(多序列比对)# dnds dN/dS Ratio# dnds(ex.dna, code = 1, codonstart = 1, quiet = FALSE, details = FALSE, return.categories = FALSE)# Error: sequences are not unique# data(woodmouse) # Woodmice的细胞色素b的基因序列# res <- dnds(woodmouse, quiet = TRUE) # NOT correct# 多序列比对图# image.DNAbin Plot of DNA Sequence Alignement# image(woodmouse)# rug(seg.sites(woodmouse), -0.02, 3, 1)# image(woodmouse, "n", "blue") # show missing data# image(woodmouse, c("g", "c"), "green") # G+C# par(mfcol = c(2, 2))# ### barcoding style:# for (x in c("a", "g", "c", "t"))# image(woodmouse, x, "black", cex.lab = 0.5, cex.axis = 0.7)# par(mfcol = c(1, 1))# ### zoom on a portion of the data:# image(woodmouse[11:15, 1:50], c("a", "n"), c("blue", "grey"))# grid(50, 5, col = "black")# ### see the guanines on a black background:# image(woodmouse, "g", "yellow", "black")# 多序列比对图 注释到 进化树上# phydataplot(x, phy, style = "bars", offset = 1, scaling = 1, continuous = FALSE, width = NULL, legend = "below",# funcol = rainbow, ...)# ring(x, phy, style = "ring", offset = 1, ...)# If you want to plot a DNA alignment in the same way than image.DNAbin, try style = "mosaic".## plot an alignment with a NJ tree:data(woodmouse)trw <- nj(dist.dna(woodmouse))plot(trw, x.lim = 0.1, align.tip = TRUE, font = 1)phydataplot(woodmouse[, 1:100], trw, "m", 0.02, border = NA)# (序列)标签管理:character、DNAbin、phylo、multiPhylo# makeLabel(x, ...)# ## S3 method for class 'character'# makeLabel(x, len = 99, space = "_", make.unique = TRUE,# illegal = "():;,[]", quote = FALSE, ...)# ## S3 method for class 'phylo'# makeLabel(x, tips = TRUE, nodes = TRUE, ...)# ## S3 method for class 'multiPhylo'# makeLabel(x, tips = TRUE, nodes = TRUE, ...)# ## S3 method for class 'DNAbin'# makeLabel(x, ...)# updateLabel - Update Labels# where - Find Patterns in DNA Sequences# nexus2DNAbin - Convert the output from the previous function into the class "DNAbin".# write.nexus.data - Write Character Data in NEXUS Format# 获得 class ‘phylo’# tree1 <- read.tree(‘tree1.txt’)# This stores the phylogenetic tree is in an object named tree1 of class ‘phylo’. # write.tree Write Tree File in Parenthetic Formattrw <- bionj(dist.dna(ex.dna, model = "raw"))plot(trw)str(trw)## List of 4## $ edge : int [1:41, 1:2] 23 25 26 37 37 26 25 27 28 32 ...## $ edge.length: num [1:41] 6.78e-05 1.76e-04 1.95e-02 7.50e-01 1.42e-02 ...## $ tip.label : chr [1:22] "ERR2245277" "ERR2245279" "ERR2245288" "ERR2245293" ...## $ Nnode : int 20## - attr(, "class")= chr "phylo"## - attr(, "order")= chr "cladewise"write.tree(trw, file = "./tree.txt", append = FALSE, digits = 10, tree.names = FALSE)# phy - an object of class "phylo" or "multiPhylo".# mcmc.popsize函数 - 可逆跳跃MCMC (马尔可夫链蒙特卡罗) 推断群体数量史 (Demographic History)# get treedata("hivtree.newick") # example tree in NH formattree.hiv <- read.tree(text = hivtree.newick) # load tree# run mcmc chainmcmc.out <- mcmc.popsize(tree.hiv, nstep=100, thinning=1, burn.in=0,progress.bar=FALSE) # toy run#mcmc.out <- mcmc.popsize(tree.hiv, nstep=10000, thinning=5, burn.in=500) # remove comments!!# make list of population size versus timepopsize <- extract.popsize(mcmc.out)# plot and compare with skyline plotsk <- skyline(tree.hiv)plot(sk, lwd=1, lty=3, show.years=TRUE, subst.rate=0.0023, present.year = 1997)lines(popsize, show.years=TRUE, subst.rate=0.0023, present.year = 1997)# skyline函数 - 使用天际线图估计有效种群大小(Effective Population Size)# get treedata("hivtree.newick") # example tree in NH formattree.hiv <- read.tree(text = hivtree.newick) # load tree# corresponding coalescent intervalsci <- coalescent.intervals(tree.hiv) # from tree# collapsed intervalscl1 <- collapsed.intervals(ci,0)cl2 <- collapsed.intervals(ci,0.0119)#### classic skyline plot ####sk1 <- skyline(cl1) # from collapsed intervalssk1 <- skyline(ci) # from coalescent intervalssk1 <- skyline(tree.hiv) # from treesk1## $time## [1] 0.021161 0.049822 0.050058 0.054444 0.057580 0.057620 0.058582 0.059336## [9] 0.059726 0.061847 0.063012 0.065150 0.065542 0.067767 0.068246 0.068285## [17] 0.069341 0.069631 0.069725 0.070130 0.070506 0.071398 0.077168 0.077774## [25] 0.078085 0.078439 0.078488 0.079791 0.079902 0.080093 0.080206 0.080907## [33] 0.080919 0.080923 0.081266 0.081732 0.082433 0.082434 0.082720 0.083124## [41] 0.083172 0.083436 0.083818 0.084541 0.084566 0.084629 0.085711 0.086004## [49] 0.087462 0.087519 0.087787 0.088609 0.088644 0.088785 0.088823 0.089216## [57] 0.089479 0.089540 0.090237 0.090248 0.090579 0.090777 0.091414 0.091575## [65] 0.092297 0.092865 0.092881 0.093773 0.094006 0.094130 0.094174 0.094384## [73] 0.094521 0.094683 0.094749 0.095068 0.095107 0.095333 0.095334 0.095369## [81] 0.096163 0.096270 0.096432 0.096562 0.097150 0.097278 0.097341 0.097342## [89] 0.099386 0.099467 0.099582 0.099726 0.099948 0.101084 0.101878 0.101892## [97] 0.101961 0.101962 0.102399 0.102590 0.103060 0.103714 0.103878 0.103926## [105] 0.104386 0.104484 0.104691 0.105017 0.105018 0.105020 0.105181 0.106054## [113] 0.106195 0.107063 0.107287 0.107629 0.108329 0.108551 0.108702 0.109333## [121] 0.109996 0.111546 0.111869 0.112580 0.112666 0.113317 0.113663 0.113664## [129] 0.113731 0.113986 0.114132 0.114392 0.114561 0.114632 0.114774 0.116003## [137] 0.116109 0.117194 0.117310 0.117514 0.117728 0.118538 0.118650 0.118782## [145] 0.118990 0.119083 0.120265 0.120408 0.120491 0.120494 0.122054 0.122422## [153] 0.122572 0.123109 0.124283 0.125240 0.127902 0.127903 0.127904 0.127999## [161] 0.129007 0.130366 0.130367 0.130470 0.131217 0.131462 0.132291 0.135095## [169] 0.135794 0.135795 0.135914 0.140307 0.143055 0.144389 0.146425 0.148331## [177] 0.150189 0.151362 0.155704 0.164770 0.166301 0.166846 0.176701 0.178108## [185] 0.179167 0.181334 0.192334 0.196999 0.197000 0.204899 0.204900 0.209112## ## $interval.length## [1] 0.021161 0.028661 0.000236 0.004386 0.003136 0.000040 0.000962 0.000754## [9] 0.000390 0.002121 0.001165 0.002138 0.000392 0.002225 0.000479 0.000039## [17] 0.001056 0.000290 0.000094 0.000405 0.000376 0.000892 0.005770 0.000606## [25] 0.000311 0.000354 0.000049 0.001303 0.000111 0.000191 0.000113 0.000701## [33] 0.000012 0.000004 0.000343 0.000466 0.000701 0.000001 0.000286 0.000404## [41] 0.000048 0.000264 0.000382 0.000723 0.000025 0.000063 0.001082 0.000293## [49] 0.001458 0.000057 0.000268 0.000822 0.000035 0.000141 0.000038 0.000393## [57] 0.000263 0.000061 0.000697 0.000011 0.000331 0.000198 0.000637 0.000161## [65] 0.000722 0.000568 0.000016 0.000892 0.000233 0.000124 0.000044 0.000210## [73] 0.000137 0.000162 0.000066 0.000319 0.000039 0.000226 0.000001 0.000035## [81] 0.000794 0.000107 0.000162 0.000130 0.000588 0.000128 0.000063 0.000001## [89] 0.002044 0.000081 0.000115 0.000144 0.000222 0.001136 0.000794 0.000014## [97] 0.000069 0.000001 0.000437 0.000191 0.000470 0.000654 0.000164 0.000048## [105] 0.000460 0.000098 0.000207 0.000326 0.000001 0.000002 0.000161 0.000873## [113] 0.000141 0.000868 0.000224 0.000342 0.000700 0.000222 0.000151 0.000631## [121] 0.000663 0.001550 0.000323 0.000711 0.000086 0.000651 0.000346 0.000001## [129] 0.000067 0.000255 0.000146 0.000260 0.000169 0.000071 0.000142 0.001229## [137] 0.000106 0.001085 0.000116 0.000204 0.000214 0.000810 0.000112 0.000132## [145] 0.000208 0.000093 0.001182 0.000143 0.000083 0.000003 0.001560 0.000368## [153] 0.000150 0.000537 0.001174 0.000957 0.002662 0.000001 0.000001 0.000095## [161] 0.001008 0.001359 0.000001 0.000103 0.000747 0.000245 0.000829 0.002804## [169] 0.000699 0.000001 0.000119 0.004393 0.002748 0.001334 0.002036 0.001906## [177] 0.001858 0.001173 0.004342 0.009066 0.001531 0.000545 0.009855 0.001407## [185] 0.001059 0.002167 0.011000 0.004665 0.000001 0.007899 0.000001 0.004212## ## $population.size## [1] 392.071008 525.528096 4.282220 78.750630 55.714176 0.703120## [7] 16.730142 12.972570 6.637800 35.709156 19.400745 35.214998## [13] 6.385680 35.844750 7.630949 0.614367 16.448256 4.466000## [19] 1.431150 6.095655 5.594128 13.117752 83.866950 8.705190## [25] 4.414956 4.965912 0.679189 17.844585 1.501830 2.552906## [31] 1.491939 9.141741 0.154560 0.050880 4.308423 5.779798## [37] 8.584446 0.012090 3.413410 4.759524 0.558144 3.029664## [43] 4.326150 8.079525 0.275650 0.685314 11.610942 3.101405## [49] 15.221520 0.586872 2.721004 8.229042 0.345450 1.371930## [55] 0.364458 3.715029 2.450108 0.559980 6.304365 0.098021## [61] 2.905518 1.711908 5.424055 1.349985 5.960832 4.616704## [67] 0.128016 7.024500 1.805750 0.945624 0.330132 1.550010## [73] 0.994620 1.156680 0.463386 2.202057 0.264654 1.507420## [79] 0.006555 0.225435 5.024432 0.665112 0.989010 0.779350## [85] 3.460968 0.739584 0.357273 0.005565 11.160240 0.433836## [91] 0.604095 0.741744 1.121100 5.623200 3.851694 0.066542## [97] 0.321264 0.004560 1.951205 0.834861 2.010660 2.737644## [103] 0.671580 0.192240 1.801360 0.375144 0.774387 1.191530## [109] 0.003570 0.006972 0.547883 2.899233 0.456840 2.742880## [115] 0.690144 1.027026 2.048200 0.632700 0.419025 1.704331## [121] 1.742364 3.961800 0.802655 1.717065 0.201756 1.482978## [127] 0.765006 0.002145 0.139360 0.514080 0.285138 0.491660## [133] 0.309270 0.125670 0.242962 2.031537 0.169176 1.670900## [139] 0.172260 0.291924 0.294892 1.074060 0.142800 0.161700## [145] 0.244608 0.104904 1.277742 0.148005 0.082170 0.002838## [151] 1.408680 0.316848 0.123000 0.418860 0.869934 0.672771## [157] 1.772892 0.000630 0.000595 0.053295 0.532224 0.674064## [163] 0.000465 0.044805 0.303282 0.092610 0.290979 0.911300## [169] 0.209700 0.000276 0.030107 1.014783 0.577080 0.253460## [175] 0.348156 0.291618 0.252688 0.140760 0.455910 0.825006## [181] 0.119418 0.035970 0.542025 0.063315 0.038124 0.060676## [187] 0.231000 0.069975 0.000010 0.047394 0.000003 0.004212## ## $parameter.count## [1] 192## ## $epsilon## [1] 0## ## $logL## [1] 1408.966## ## $logL.AICc## [1] NA## ## attr(,"class")## [1] "skyline"plot(skyline(tree.hiv))skylineplot(tree.hiv) # shortcutplot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997)#### generalized skyline plot ####sk2 <- skyline(cl2) # from collapsed intervalssk2 <- skyline(ci, 0.0119) # from coalescent intervalssk2 <- skyline(tree.hiv, 0.0119) # from treesk2## $time## [1] 0.021161 0.049822 0.061847 0.077168 0.089216 0.101878 0.113986 0.127902## [9] 0.140307 0.155704 0.176701 0.192334 0.209112## ## $interval.length## [1] 0.021161 0.028661 0.012025 0.015321 0.012048 0.012662 0.012108 0.013916## [9] 0.012405 0.015397 0.020997 0.015633 0.016778## ## $population.size## [1] 392.07100800 525.52809600 26.43747675 18.16241385 4.32071145## [6] 2.19342354 1.06974257 0.55211856 0.27727433 0.33138171## [11] 0.38060475 0.09827875 0.02431880## ## $parameter.count## [1] 13## ## $epsilon## [1] 0.0119## ## $logL## [1] 1239.478## ## $logL.AICc## [1] 1225.456## ## attr(,"class")## [1] "skyline"plot(sk2)# classic and generalized skyline plot together in one plotplot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997, col=c(grey(.8),1))lines(sk2, show.years=TRUE, subst.rate=0.0023, present.year = 1997)legend(.15,500, c("classic", "generalized"), col=c(grey(.8),1),lty=1)# find optimal epsilon parameter using AICc criterionfind.skyline.epsilon(ci)## Searching for the optimal epsilon... epsilon = 0.01191938## [1] 0.01191938sk3 <- skyline(ci, -1) # negative epsilon also triggers estimation of epsilon## Searching for the optimal epsilon... epsilon = 0.01191938sk3$epsilon## [1] 0.01191938# skylineplot函数 - 绘制天际线图# get treedata("hivtree.newick") # example tree in NH formattree.hiv <- read.tree(text = hivtree.newick) # load tree#### classic skyline plotskylineplot(tree.hiv) # shortcut#### plot classic and generalized skyline plots and estimate epsilonsk.opt <- skylineplot.deluxe(tree.hiv)## Searching for the optimal epsilon... epsilon = 0.01191938sk.opt$epsilon## [1] 0.01191938#### classic and generalized skyline plot ####sk1 <- skyline(tree.hiv)sk2 <- skyline(tree.hiv, 0.0119)# use years rather than substitutions as unit for the time axisplot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997, col=c(grey(.8),1))lines(sk2, show.years=TRUE, subst.rate=0.0023, present.year = 1997)legend(.15,500, c("classic", "generalized"), col=c(grey(.8),1),lty=1)#### various skyline plots for different epsilonslayout(mat= matrix(1:6,2,3,byrow=TRUE))ci <- coalescent.intervals(tree.hiv)plot(skyline(ci, 0.0));title(main="0.0")plot(skyline(ci, 0.007));title(main="0.007")plot(skyline(ci, 0.0119),col=4);title(main="0.0119")plot(skyline(ci, 0.02));title(main="0.02")plot(skyline(ci, 0.05));title(main="0.05")plot(skyline(ci, 0.1));title(main="0.1")layout(mat= matrix(1:1,1,1,byrow=TRUE))# 其它有关R包:Felsenstein, J. (1993) Phylip (Phylogeny Inference Package) version 3.5c. http://evolution.genetics.washington.edu/phylip/phylip.htmlape包的dist.dna函数:计算DNA序列成对距离 (Pairwise Distances from DNA Sequences)输入含DNA序列的矩阵或列表(List),类(Class)必须是DNAbin
若DNA序列被存储为字符(Character),可使用as.DNAbin函数来转换dna.bin.dist = ape::dist.dna(ex.dna, model = "N", variance = FALSE, gamma = FALSE, pairwise.deletion = FALSE, base.freq = NULL, as.matrix = F)dna.bin.dist.sub = ape::dist.dna(ex.dna.sub, model = "N", variance = FALSE, gamma = FALSE, pairwise.deletion = FALSE, base.freq = NULL, as.matrix = F)dna.bin.dist.f2d = ape::dist.dna(ex.dna.f2d, model = "N", variance = FALSE, gamma = FALSE, pairwise.deletion = FALSE, base.freq = NULL, as.matrix = F)# as.matrix:对角线或方矩阵# model:"raw", "N", "TS", "TV", "JC69", "K80" (the default), "F81", "K81", "F84", "BH87", "T92", "TN93", "GG95", "logdet", "paralin", "indel", or "indelblock"# raw, N: 每对序列之间差异位点的比例,或数量
有助于绘制“饱和图(Saturation plots)”
选项variance(方差)和gamma对其没有影响,但pairwise.deletion可以# 故model为"N"时,返回整数 (SNP距离绝对值)str(dna.bin.dist) # 'dist'
## 'dist' num [1:231] 15 80 90 38 16 67 27 66 41 15 ...## - attr(, "Size")= int 22## - attr(, "Labels")= chr [1:22] "ERR2245277" "ERR2245279" "ERR2245288" "ERR2245293" ...## - attr(, "Upper")= logi FALSE## - attr(, "Diag")= logi FALSE## - attr(, "call")= language ape::dist.dna(x = ex.dna, model = "N", variance = FALSE, gamma = FALSE, pairwise.deletion = FALSE, base.freq| __truncated__## - attr(, "method")= chr "N"dna.bin.dist # 是否为汉明距离 (序列之间核苷酸差异的数量)?## ERR2245277 ERR2245279 ERR2245288 ERR2245293 ERR2245294 ERR2245295## ERR2245279 15 ## ERR2245288 80 65 ## ERR2245293 90 75 42 ## ERR2245294 38 23 66 76 ## ERR2245295 16 1 66 76 24 ## ERR2245344 67 52 77 87 53 53## ERR2245378 27 12 71 81 29 13## ERR2245380 66 51 62 72 52 52## ERR2245386 41 26 69 79 27 27## ERR2245388 15 0 65 75 23 1## ERR2245393 82 67 26 44 68 68## ERR2245400 28 13 72 82 30 14## ERR2245401 58 43 54 64 44 44## ERR2245402 47 32 75 85 33 33## ERR2245406 59 44 55 65 45 45## ERR2245411 63 48 73 83 49 49## ERR2245418 62 47 72 82 48 48## ERR2245420 58 43 54 64 44 44## ERR2245425 62 47 72 82 48 48## ERR2512377 48 33 94 104 56 34## Reference 772 785 784 794 786 786## ERR2245344 ERR2245378 ERR2245380 ERR2245386 ERR2245388 ERR2245393## ERR2245279 ## ERR2245288 ## ERR2245293 ## ERR2245294 ## ERR2245295 ## ERR2245344 ## ERR2245378 58 ## ERR2245380 63 57 ## ERR2245386 56 32 55 ## ERR2245388 52 12 51 26 ## ERR2245393 79 73 64 71 67 ## ERR2245400 59 3 58 33 13 74## ERR2245401 55 49 14 47 43 56## ERR2245402 62 38 61 36 32 77## ERR2245406 56 50 15 48 44 57## ERR2245411 22 54 59 52 48 75## ERR2245418 21 53 58 51 47 74## ERR2245420 55 49 14 47 43 56## ERR2245425 21 53 58 51 47 74## ERR2512377 85 45 82 59 33 96## Reference 793 791 782 789 785 786## ERR2245400 ERR2245401 ERR2245402 ERR2245406 ERR2245411 ERR2245418## ERR2245279 ## ERR2245288 ## ERR2245293 ## ERR2245294 ## ERR2245295 ## ERR2245344 ## ERR2245378 ## ERR2245380 ## ERR2245386 ## ERR2245388 ## ERR2245393 ## ERR2245400 ## ERR2245401 50 ## ERR2245402 39 53 ## ERR2245406 51 1 54 ## ERR2245411 55 51 58 52 ## ERR2245418 54 50 57 51 1 ## ERR2245420 50 0 53 1 51 50## ERR2245425 54 50 57 51 1 0## ERR2512377 46 74 65 75 81 80## Reference 792 774 795 773 793 792## ERR2245420 ERR2245425 ERR2512377## ERR2245279 ## ERR2245288 ## ERR2245293 ## ERR2245294 ## ERR2245295 ## ERR2245344 ## ERR2245378 ## ERR2245380 ## ERR2245386 ## ERR2245388 ## ERR2245393 ## ERR2245400 ## ERR2245401 ## ERR2245402 ## ERR2245406 ## ERR2245411 ## ERR2245418 ## ERR2245420 ## ERR2245425 50 ## ERR2512377 74 80 ## Reference 774 792 752# δ plotdelta.plot(dna.bin.dist, k = 20, plot = TRUE, which = 1:2)# Holland, B. R., Huber, K. T., Dress, A. and Moulton, V. (2002) Delta plots: a tool for analyzing phylogenetic distance data. Molecular Biology and Evolution, 12, 2051–2059.ex.dna=ex.dna.sub# Tree Estimation Based on an Improved Version of the NJ Algorithm, Gascuel (1997)# trw <- bionj(dist.dna(ex.dna.sub))# trw <- bionj(dist.dna(ex.dna)) # 默认参数时 报错,可能是因为存在距离较远的Taxa或毒株# trw <- bionj(dist.dna(ex.dna, model = "N")) # Error - at least one distance was greater than 100trw <- bionj(dist.dna(ex.dna, model = "raw"))plot(trw)# f <- function(x) nj(dist.dna(x, model = "raw"))# tr <- f(ex.dna.sub)# ### Are bootstrap values stable?# for (i in 1:5)# print(boot.phylo(tr, ex.dna.sub, f, quiet = TRUE))# ### How many partitions in 100 random trees of 10 labels?...# TR <- rmtree(100, 10)# pp10 <- prop.part(TR)# length(pp10)# ### ... and in 100 random trees of 20 labels?# TR <- rmtree(100, 20)# pp20 <- prop.part(TR)# length(pp20)# plot(pp10, pch = "x", col = 2)# plot(pp20, pch = "x", col = 2)fun <- function(x) as.phylo(hclust(dist.dna(x, model="raw"), "average")) # upgma() in phangorntree <- fun(ex.dna.sub)## get 100 bootstrap trees:bstrees <- boot.phylo(tree, ex.dna.sub, fun, trees = TRUE)$trees## Running bootstraps: 100 / 100## Calculating bootstrap values... done.## get proportions of each clade:clad <- prop.clades(tree, bstrees, rooted = TRUE)## get proportions of each bipartition:boot <- prop.clades(tree, bstrees)layout(1)par(mar = rep(2, 4))plot(tree, main = "Bipartition vs. Clade Support Values")drawSupportOnEdges(boot)nodelabels(clad)legend("bottomleft", legend = c("Bipartitions", "Clades"), pch = 22,pt.bg = c("green", "lightblue"), pt.cex = 2.5)d <- dist.dna(ex.dna.sub, model="N")delta.plot(d)layout(1)delta.plot(d, 40, which = 1)# Tree Estimation Based on the Minimum Evolution Algorithm, Desper and Gascuel (2002).trw <- fastme.bal(dist.dna(ex.dna, model="N"))plot(trw)rt <- dist.dna(ex.dna.sub, model="raw", variance = TRUE)v <- attr(rt, "variance")tr <- mvr(rt, v) # Phylogenetic tree construction based on the minimum variance reduction.plot(tr, "u")## Warning in plot.phylo(tr, "u"): 39 branch length(s) NA(s): branch lengths## ignored in the plot# Neighbor-Joining Tree Estimation, Saitou and Nei (1987).trw <- nj(dist.dna(ex.dna, model="N"))plot(trw)## NJ树+多序列比对 - plot an alignment with a NJ tree:trw <- nj(dist.dna(ex.dna.sub, model="raw"))# 绘制带虚线、且右端对齐的树plot(trw, x.lim = 0.1, align.tip = TRUE, font = 1)# Tree Annotationphydataplot(ex.dna[, 1:50], trw, "m", 0.05, border = NA)## from ?boot.phylo:f <- function(x) nj(dist.dna(x, model="raw"))tw <- f(ex.dna) # NJ tree with K80 distanceset.seed(1)## bootstrap with 100 replications:(bp <- boot.phylo(tw, ex.dna, f, quiet = TRUE))## [1] NA 23 44 88 90 100 89 23 100 100 100 100 95 26 100 100 29 100 100## the first value relates to the root node and is always 100## it is ignored below:plot(tw, "u") # 辐射状的树drawSupportOnEdges(bp)## more readable but the tree is really unrooted:plot(tw)drawSupportOnEdges(bp)op <- par(mfcol = 1:2)par(op)# tr <- rtree(10)tr=tw# tr$edge.length[c(1, 18)] <- 100op <- par(mfcol = 1:2)plot(tr); axisPhylo()# 绘制长枝打断的树plotBreakLongEdges(tr, 2); axisPhylo()# tr <- triangMtd(dist.dna(ex.dna.sub, model="raw"))# plot(tr)
神器系统ape(神器系统函数序列线图)
(图片来源网络,侵删)

联系我们

在线咨询:点击这里给我发消息