A simple parsimony-based approach to assess ancestor-descendant relationships
Ukr. Bot. J., 2017, 74(2). doi: 10.15407/ukrbotj74.02.103

One of the main goals of systematics is to reconstruct the tree of life. Half a century ago, the breakthrough of cladistics was a major step towards this objective because it allowed us to assess relatedness patterns among species, an abstract kind of relationship. Unfortunately, the philosophy of cladism forbade going further and seeking more realistic relationships, such as the ancestor-descendant relationship, which according to Darwinian evolution is the fundamental kind of relationship expected in the tree of life. Here, I describe a simple parsimony-based procedure that can be used to transform a classical cladogram into a genuine phylogenetic tree, i.e. a caulogram. It consists of deleting as many unobserved and unnamed nodes as possible and replacing them with observed and named species. A new Bayesian non-stochastic weighting scheme is used to assess character reliability, both for this procedure and for classical cladogram construction. I illustrate the whole process by assessing the relationships between the species of the moss genus Didymodon sensu lato (Pottiaceae), and I discuss the resulting caulogram by comparing it with the previous methodology from the evolutionary literature. I finally argue that strictly adhering to cladist epistemology is untenable and that we must seek new formal methods to find ancestral species as well as ancestral higher taxa.

Background and Motivation
Since the sixties, cladistics has aimed to resolve the relationships between species (Hennig, 1950, 1966). These relationships are depicted by cladograms, i.e. hierarchical treelike diagrams where clusters show which two of any three species are more closely related to each other than either is to the third one (Hennig, 1966; Hull, 1979). This type of relationship is characterized as "relatedness" and is supposed to represent the relative recency of the "hypothetical last common ancestors" (or, more rigorously, the order of emergence of evolutionary novelties). It is, however, dubious that such a relationship corresponds to a true natural process. Unless one believes that a mother species always disappears when it speciates (Lee, 1995), relatedness may ambiguously refer either to a true sister-group relationship (SGR) or to an ancestor-descendant relationship (ADR) (Aubert, 2015). In the first case, the two species are descendants of another, unobserved third species (so a true SGR is in fact two hidden ADRs), while in the second case one of them is a descendant of the other. If a cladogram is interpreted as a true phylogenetic tree (here "true" means sensu Hennig, see below) or caulogram (the word "caulogram" designates a tree that emphasizes serial macroevolutionary transformations, i.e. ADRs; see Zander, 2013), this second type of relationship is nonetheless misleadingly represented as a false SGR through the artificial introduction of unobserved species on internal nodes (see Fig. 1). The internal nodes may be avoided by representing a cladogram as a set of nested parentheses, but the introduction of unobserved species is still logically implied by the assumption that none of the known species is actually an ancestor (Podani, 2013). Cladograms are thus ambiguous, not faithful pictures of evolutionary history.
Furthermore, even in the case of a true SGR, unnamed nodes would imply that relatedness is a fundamental phylogenetic relationship. This poses epistemological problems, because sisters have always been independent entities; they may, for example, be born or speciate at very different times. The fact that one sister would never have existed does not necessarily imply the non-existence of the other one, while the non-existence of a mother species necessarily implies the non-existence of all its daughter species. In fact, a true SGR (contrary to a false SGR) only means that both species share an ADR with the same third species. ADRs are thus far more fundamental evolutionary (i.e. truly "phylogenetic") relationships than SGRs. From a more biologically grounded perspective, ADRs generally represent peripatric (or "budding") speciation. This means that a mother species tends to disperse and invade geographically isolated locations. There, local populations evolve new traits through genetic bottlenecks and directional selection. Thus, they transform into daughter species while the principal population remains morphologically unchanged, because the latter benefits from the stabilizing effects of a larger gene pool and of purifying selection. One may call this phenomenon "phylogenetic niche conservatism" (Pyron et al., 2015). It is very unlikely that the mother species transforms entirely into two daughter species through the gradual divergence of two subspecies at the same time (Levin, 1993). Of course, budding speciation is especially widespread on islands, but it is not limited to them. As an example, the endemic liverwort Leptoscyphus azoricus of Macaronesia has been shown to have originated through this process from a parental population of continental Leptoscyphus porphyrius (Vanderpoorten and Long, 2006). Moreover, this mode of speciation is not restricted to plants. For instance, a recent palaeontological study of pterocephaliid trilobites has shown, through the implementation of a probabilistic model, that the main mode of speciation (if not the only one) is indeed budding cladogenesis, rather than bifurcating cladogenesis or pure anagenesis (Bapst and Hopkins, 2017). This result is largely consistent with the literature in other fields, such as studies of foraminifera (Aze et al., 2011). Last but not least, the cladistic axiom that, of any three species, two must be more closely related to each other is false, since a single mother species may give birth to more than two daughters. Unfortunately, cladistic algorithms force the data to fit a dichotomous tree, which is like trying to hammer a square peg into the round hole of an ideal (Zander, 2013). Such a propensity to use axiomatized synchronous (ahistorical) structures as a fundamental framework is referred to as structuralism (Zander, 2011; Aubert, 2015).
All of the above reasons motivate the search for new methods able to transform cladograms, i.e. Hennig's "phylogenetic diagrams", into true phylogenetic trees (or caulograms) reflecting ADRs, i.e. the real genealogical relationships between species (Prothero and Lazarus, 1980; Paul, 1992; Alroy, 1995; Crawford, 2010; Aze et al., 2011; Tsai and Fordyce, 2015). Contrary to what is sometimes assumed (Gee, 2000), the probability of encountering an ancestral species in the fossil record or among extant species is far from negligible (Rieseberg and Brouillet, 1994; Crisp and Chandler, 1996; Foote, 1996; Funk and Omland, 2003; Aldous and Popovic, 2005; Aldous et al., 2011; Ross, 2014). A phylogenetic analysis that only results in the publication of a new cladogram is therefore merely preliminary work, which demands a post-cladistic treatment in order to eliminate the ad hoc virtual ancestors and to clarify the nature of the relationships. The feasibility of this objective will be demonstrated by the study of the North American species of the moss genus Didymodon sensu lato (Pottiaceae), which has recently been divided into six segregate genera: Vinealobryum, Didymodon sensu stricto, Trichostomopsis, Geheebia, Exobryum and Fuscobryum (Zander, 2016). This work builds on the studies of Zander (2013, 2014a, b, c, 2016), which introduce means of diagramming the serial evolution of taxa as caulograms and suggest support values in terms of decibans. Although Zander mentioned that variation in the occurrence of shared traits affects credibility, he did not detail explicit means of formally measuring and incorporating this variation. This paper introduces the consistency index and successive weighting in cladistic analysis as a means of evaluating the variability of traits, less variable traits being more important. This study is restricted to morphological traits.
Fig. 1. The artefactual entities introduced by cladistic analysis. The evolutionary model (dichotomous splitting) used by cladistic analysis forces us to hypothesize many unobserved entities in order to optimize the number of transformation events.

Rationale for the Post-Cladistic Analysis
In the cladist framework, synapomorphies are considered the only evidence of common ancestry. Mapping morphological characters onto the resulting cladogram allows us to infer the phenotype of this common ancestor. If a branch connecting such an internal node to a terminal species bears no character transformation, and hence has a length of zero, then the phenotype of the common ancestor is exactly the same as that of the terminal species. However, cladists generally do not regard this as evidence that the terminal species and the common ancestor are the same entity, and prefer to systematically hypothesize that they are different (note that cladism is not the same thing as cladistics; see Aubert, 2015). They argue that only shared character transformations can provide evidence of relationship, and that the lack of a transformation is only a lack of evidence, not evidence per se. At best, a species characterized by the absence of autapomorphy is termed "metaphyletic" (Donoghue, 1985; de Queiroz and Donoghue, 1988). This means that we do not know whether this species is holophyletic or paraphyletic (respectively, all descendants included or not; see Ashlock, 1971).
I consider, however, that this interpretation is unscientific. The cladist argument is that, since one cannot positively disprove that the putative common ancestor and the terminal species are distinct entities, they are distinct (or at least one cannot decide). Here, the burden of proof has been unjustifiably reversed. Indeed, the burden of proof lies upon the person making scientifically unfalsifiable claims. The very existence of an unobserved common ancestor is an unfalsifiable claim because, even if we had sampled a species matching its reconstructed phenotype, one could argue that, since the species has been observed, it is not the common ancestor we were looking for. In this framework, common ancestors are not only unobserved but also unobservable entities. Yet the principle of Occam's razor tells us that we should minimize the number of such ad hoc entities. By contrast, the claim that an observed species is the same entity as the predicted common ancestor is falsifiable: finding a single autapomorphy would theoretically suffice to disprove it. The existence and the observation of common ancestors are both expected and likely under the theory of evolution. Science must therefore favour the simplest explanation: if an observed species matches the phenotype of a predicted species, then both species are the same entity. In other words, this is the null hypothesis we must test against the alternatives. The very concept of "metaspecies" is therefore unneeded; all so-called unresolved species must be considered paraspecies.
Now, not all morphological characters are equally reliable. Characters in a phylogenetic data set that transform as shared traits (synapomorphies) only once in a cladogram are reputed to be quite stable, and are therefore reliable indicators of relationships. But characters that transform many times are rather labile and create many misleading homoplastic relationships. What, then, if an observed species nearly matches the phenotype of a predicted species? Are they the same? Here we must leave naïve Popperian hypothetico-deductivism, i.e. unweighted parsimony optimization. It is rather obvious that if the observed autapomorphies are several stable characters, then the null hypothesis must be rejected in favour of the alternative. But if the observed autapomorphy is only a single, very labile character, then the null hypothesis cannot be convincingly rejected. The objective evaluation of the null hypothesis therefore demands a probabilistic quantification of the characters' reliability. As we will see, weighted parsimony can be interpreted as a form of non-parametric (i.e. not "model-based") Bayesian approach.

A Bayesian Interpretation of the Consistency Index
We consider a binary morphological character x in a matrix of OTUs. We denote by s the actual number of changes of this character in the most parsimonious dichotomous unrooted cladogram (or at least in the chosen one), and by m the minimum number of changes that it may require in any such cladogram (i.e. one, in this case). The consistency index is thus c = m/s. Let us now consider four OTUs A, B, C and D. We know the fact F = "A and B share the same state of x, while C and D share another state of this character". The reliability of x can be regarded as the increase in the probability that the relationship R = "{A, B} and {C, D} are two mutually exclusive monophyletic groups" is true. We are only interested in monophyly, not holophyly, because rooting a topology is a different task from reconstructing it ("monophyletic" means that the most recent common ancestor is a member of the group, which can be tested without rooting; different rootings of the tree can then make this group "holophyletic" or "paraphyletic", i.e. containing all its descendants or not, respectively).
The prior probability of R, i.e. not knowing F, is theoretically p = 1/3, because there are exactly three possible unrooted four-taxon trees and only one is compatible with R. The posterior probability q, knowing F, would be one if and only if both character states are homologous, in A and B and in C and D, respectively. If either of the two states is homoplastic in these pairs (for example, if it evolved independently in A and B), then R would be true only by chance, so its probability would be only 1/3. We must therefore evaluate the probability z that the first case occurs.
The character x divides the whole tree into s + 1 monophyletic parts. There may be u monophyletic groups with state 1 and v monophyletic groups with state 0, so that u + v = s + 1. The probability that A has been randomly picked in the same monophyletic group as B, and C in the same one as D, is therefore z = 1/(uv). Any u and v are theoretically possible, but since convergences and reversions are here considered equally probable, we should generally get u ≈ v. More precisely, this is like tossing a coin s - 1 times, because of the constraint that u ≥ 1 and v ≥ 1. Consequently, we have a simple binomial distribution (with u = i + 1 and v = s - i):
$$z = \sum_{i=0}^{s-1} \binom{s-1}{i} \frac{1}{2^{s-1}} \cdot \frac{1}{(i+1)(s-i)}$$
Once z has been computed, we can easily estimate q as the weighted sum of both cases, i.e. q = z + (1 - z)/3 = 2z/3 + 1/3. Therefore the posterior odds (ratio of probabilities), knowing F, are q/(1 - q) = (2z/3 + 1/3)/(2/3 - 2z/3) = (2z + 1)/(2 - 2z). Since, without knowing F, the relationship R can only be true by chance, the prior odds were (1/3)/(2/3) = 1/2 (i.e. 0.5:1). Hence, the evidence provided by F can be evaluated as the ratio of the odds, also known as the Bayes factor:
$$k = \frac{q/(1-q)}{p/(1-p)} = \frac{(2z+1)/(2-2z)}{1/2} = \frac{2z+1}{1-z}$$
This Bayes factor is independent of the prior probability of R, which means that q/(1 - q) = kp/(1 - p) remains true even if p ≠ 1/3 because of other sources of knowledge (stratigraphy or biogeography, for example). Thus, k measures the amount of knowledge that F adds to our previous knowledge. This evidence provided by the consistency index is more appropriately expressed in the logarithmic unit of bans or decibans (dB), because this allows us to interpret evidence intuitively and makes it possible to literally add units of knowledge. Thus, if we get several independent sources of evidence from different characters, we can mentally add up units of evidence instead of doing complicated computations. A deciban (or decihartley) is a tenth of a ban, a unit used by Bayesian statisticians to represent Bayes factors in hypothesis testing. The deciban scale is here calculated with the formula w = 10 × log10 k (the letter w stands for "weight of evidence"). This scale goes from 0 to infinity, but 13 dB can be interpreted as strong evidence (> 95% chance of being true; see Table I). Note that the smallest intuitively detectable evidence is roughly 1 dB, which approximately corresponds to the difference we perceive between the odds 5:4 (around 55-56%) and the totally equivocal 1:1 (exactly 50%) (Good, 1979, 1985).
The computation of z is a little more complicated for multistate characters, but the problem can be reduced to a weighted average of the reliability of each possible pair of states. There are $\binom{m+1}{2}$ such pairs, m + 1 being the number of states. For example, if we consider a three-state character, A and B may be in state 0 and C and D in state 1 or 2, or A and B may be in state 1 and C and D in state 2, or inversely. We can therefore evaluate independently the three possible pairs 0/1, 0/2 and 1/2. If s = 3, then one of the states is represented by two separate monophyletic groups instead of just one, so two of the three pairs have z = 0.5 whereas the third one has z = 1, hence a global value z = 2/3.
If s = 4, then either two states are each represented by two separate monophyletic groups, or one state is represented by three such groups. In the first case, two pairs have z = 0.5 and the third one z = 0.25, while in the second case two pairs have z = 1/3 and the third one z = 1. Provided that these supplementary monophyletic partitions are distributed randomly among the different states, the first case has a probability of 2/3 and the second a probability of 1/3. Hence, the global value is z = 2/3 × (1/2 + 1/2 + 1/4)/3 + 1/3 × (1/3 + 1/3 + 1)/3 = 25/54. Here, we observe that (m, s) = (2, 4) implies z ≈ 0.463, which is slightly lower than in the case (m, s) = (1, 2), where z = 0.5, although c = 0.5 in both cases. Thus, the consistency index does not accurately take into account the number of distinct states. In the general case we have: …, but the value of z is generally strictly lower than that of c. The values of z are presented in Table II, together with the corresponding values of w. Note that the consistency index cannot reach 0, even if the character is cladistically completely uninformative. We could denote by g the maximum number of transformations that the character x may undergo among all possible cladograms in order to explain its state distribution, i.e., for a binary character, the minimum of the number of OTUs in state 0 and the number in state 1; c then cannot go below m/g. This has led some systematists to conclude that c must be rescaled between 0 and 1 (Farris, 1989). However, this would mean that the amount of evolution needed for the transformation of a cladistically uninformative character is exactly zero, i.e. equivalent to no transformation at all. Thus, I do not recommend using the rescaled consistency index (RCI) to calculate the evidence that a character provides for assessing the nature of a transformation of shared traits.
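The computations of z, of the Bayes factor and of the deciban weight can be checked with the short Python sketch below. All function names are hypothetical. Exact fractions are used so that the worked values in the text can be reproduced: z = 1/2 for (m, s) = (1, 2), and z = 2/3 and 25/54 for a three-state character with s = 3 and s = 4. The general multistate formula is not legible in this copy, so `reliability_z` is my own generalization of the worked cases. It assumes that the extra monophyletic groups are assigned to the states uniformly and independently; for a binary character this is the coin-tossing (binomial) scheme described above. The Bayes factor is taken as the posterior odds (2z + 1)/(2 - 2z) divided by the prior odds 1/2.

```python
from fractions import Fraction
from itertools import combinations, combinations_with_replacement
from math import factorial, inf, log10, prod


def reliability_z(n_states: int, s: int) -> Fraction:
    """Probability z that shared states of two OTU pairs are homologous.

    Assumption (generalizing the worked cases): the s - m extra monophyletic
    groups, with m = n_states - 1, are assigned to the states uniformly and
    independently (a fair coin tossed s - 1 times for a binary character).
    """
    if n_states < 2:
        raise ValueError("a character needs at least two states")
    extra = s - (n_states - 1)
    if extra < 0:
        raise ValueError("s cannot be smaller than m = n_states - 1")
    z = Fraction(0)
    # Each multiset of assignments, weighted by its multinomial probability.
    for assignment in combinations_with_replacement(range(n_states), extra):
        counts = [assignment.count(state) for state in range(n_states)]
        weight = Fraction(factorial(extra), prod(factorial(c) for c in counts))
        weight /= n_states ** extra
        groups = [1 + c for c in counts]  # monophyletic groups per state
        pairs = list(combinations(groups, 2))
        # A pair of states with u and v groups gives z = 1/(uv); average the pairs.
        z += weight * sum(Fraction(1, u * v) for u, v in pairs) / len(pairs)
    return z


def bayes_factor(z) -> float:
    """k = [(2z + 1)/(2 - 2z)] / (1/2) = (2z + 1)/(1 - z)."""
    if z >= 1:
        return inf  # a non-homoplastic character gives unbounded evidence
    return float((2 * z + 1) / (1 - z))


def decibans(z) -> float:
    """Weight of evidence w = 10 log10 k, in decibans (dB)."""
    k = bayes_factor(z)
    return inf if k == inf else 10 * log10(k)


def probability_from_db(w: float, prior_odds: float = 1.0) -> float:
    """Probability after adding w dB to the prior odds (1:1 by default)."""
    odds = prior_odds * 10 ** (w / 10)
    return odds / (1 + odds)


if __name__ == "__main__":
    for n_states, s in [(2, 2), (2, 3), (3, 3), (3, 4)]:
        z = reliability_z(n_states, s)
        print(n_states, s, z, round(decibans(z), 2))
```

For instance, a binary character with s = 2 gives z = 1/2, k = 4 and about 6.0 dB. The text assigns a fixed 13 dB to non-homoplastic characters, whereas `decibans(1)` returns infinity. The even (1:1) prior odds in `probability_from_db` are an assumption; they match the reading of 13 dB as roughly 95% and of 1 dB as roughly 5:4.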

Data Source
To demonstrate the feasibility of the analysis, I used the matrix of 20 characters for 23 OTUs from Zander (1998), comprising 22 North American species of the genus Didymodon sensu lato plus the outgroup species Barbula unguiculata. Two additional species and 22 supplementary characters were included after reviewing the most recent literature (Zander, 2013, 2014a, b, c, 2016) and the website <www.efloras.org>. The data are entirely neontological, all species being extant (see Table III and Annex).

Weighted Parsimony Cladistic Analysis
The Bayesian pieces of evidence provided by putative character transformations can be used as weights in a weighted parsimony cladistic analysis, and as a branch-length scale in a phylogrammatic representation of the amounts of evolution separating species. Indeed, additivity is an expected property of distances on a phylogenetic tree, and, contrary to the raw consistency index, Bayesian evidence measured in decibans is additive. This makes sense intuitively: if a transformation of a stable character is as probable as the combined occurrence of several transformations of more labile characters, then they must be represented by the same length. Therefore, labile character transformations should be represented by shorter branches. The (patristic) distances on such a phylogram would represent, in decibans, the probability of the character transformations accumulated between species. This is an intuitive measure of the "amount of evolution" between any pair of species.
Character transformations were generally considered unordered and of equal weight, except for discretized morphometric characters and for characters containing intermediate, variable or ambiguous states. These were treated as ordered characters and rescaled so that a transition across their full range counts as a single transformation (see Annex).
Heuristic searches for the most parsimonious cladograms were carried out with PAUP* version 4.0a150. Each search used 100 replicates starting from random trees, held 10 trees at each step, swapped on all trees with the TBR algorithm, left the reconnection limit at its default of 8, and saved multiple trees. After the initial search, non-homoplastic characters were assigned 13 dB, while the others were reweighted according to the Bayesian interpretation of the mean consistency index (see Table II) over all retained trees. These weights were used in the next step to search again for the most parsimonious trees and then to compute new weights, and so on iteratively (Farris, 1969). Computed weights were rounded to two decimal places. Branches were systematically collapsed (creating polytomies) when their minimum length was zero (parameter "amb-").

Bremer Support
The most parsimonious tree is not always the true tree. In fact, optimizing the fit of the data to a model can result in overfitting, which is a serious bias (see Fig. 2 for a simple illustration of this notion). In classical cladistic analysis, the Bremer support of a clade in the most parsimonious tree is the minimum number of extra steps required to draw a near-most-parsimonious tree that does not contain this clade (Bremer, 1988, 1994).
In order to evaluate the support of the putative clades, the trees less than 12.99 dB away from the most parsimonious one were sampled, and a strict consensus tree was built from them. Branches whose loss costs less than this amount were therefore not retrieved, which means that their Bremer support is strictly below 13 dB (i.e. < 95%). Conversely, branches that were retrieved have a Bremer support greater than or equal to 13 dB.

Post-Cladistic Analysis
Character changes can be mapped using either the ACCTRAN or the DELTRAN algorithm, so that the mean length of every branch can be estimated. Branch length is simply the sum of the weights of all character transformations along the branch. These weights are measured in dB, as assigned by the weighted parsimony cladistic analysis, and PAUP* can compute the sum automatically (see above). Nodes joined by very short branches correspond to clades supported by very labile characters. Intuitively, the support of such branches is not very strong. In fact, if the length is below 13 dB, we should generally conclude that the branch does not exist and that the two nodes represent the same entity. Such a deletion has consequences for the neighbouring branches (see Fig. 3). Any procedure that eliminates unnecessary entities may be called superoptimization (Zander, 2013). However, we cannot perform these deletions directly on the most parsimonious cladogram, because we would then rely on the assumption that its topology is not strongly distorted by cladistic overfitting. That would be possible only if each and every ancestral species gave birth to only one or two derived species, which is a very strong assumption that I do not hold.
For each putative ADR, the strict consensus cladogram computed from all near-most-parsimonious cladograms was pruned of the other competing ADR hypotheses. Its total length was then compared between two cases: the two retained taxa in a sister-group relationship (with an unobserved common ancestor), or in an ancestor-descendant relationship (without any unobserved ad hoc entity). The ADR was forced in PAUP* by copying the putative ancestral species several times into a basal polytomy (see Fig. 4). If the difference in total length was below 13 dB, the null hypothesis (ADR) could not be rejected and was therefore accepted. Since there can be only one mother species (unless we assume that hybridization is likely), when several possible ancestors could not be rejected, the less costly competing hypothesis was accepted. Its credibility, however, corresponds only to the difference between the two costs, which is unfortunately necessarily below 13 dB.
Fig. 4. … This software (PAUP*) is able to map character transformations according to several optimization algorithms (ACCTRAN, DELTRAN or MINF), but a particular branch cannot be directly forced to have a length of zero. However, suppose we add a new OTU identical to the putative ancestor (here «anc1» has exactly the same character states as G. maxima) in an unresolved trichotomy. PAUP* is then obliged to infer that the last common ancestor of these three OTUs had the same character states as the majority of them (that is, «anc1» and G. maxima), thus drawing two branches with a length of zero and increasing the length of the remaining branches. The tests must be conducted in the context of the tree, not in isolation. Because the result may change according to this very context, huge polytomies (like the one including G. fallax, G. ferruginea, etc.) require trying many rearrangements of outgroups (not just pairwise tests like those of Table V) in order to find the best superoptimization.

Cladistic Analysis
In the initial step of the analysis, only one most parsimonious tree was found, with a length of 155.67 steps. The consistency indices were computed for each character, and the characters were reweighted accordingly using the Bayesian interpretation described above. The most parsimonious trees were then searched for again, and new weights were computed again. This second iteration also yielded only one most parsimonious tree, with a length of 774.56 steps. The third iteration led to the same tree and the same weights (see Table IV). The tree is described in Fig. 5A. I obtained 1474 trees with a score below 787.55 steps (i.e. less than 12.99 dB above 774.56 steps). A strict consensus tree was built from them (see Fig. 5B). Clades that appear on the first tree but not on the second have a Bremer support strictly below 13 dB (i.e. < 95%) and are therefore not retained.
Comparing the two trees of Fig. 5, the data may seem rather noisy. However, an evaluation of 10 million random trees with PAUP* shows that none of them approaches the score of the most parsimonious tree. The mean score was 1422.71 steps, with a standard deviation of 46.34 steps and a skewness index g1 = -0.5873 (or -0.4365 without weighting). This is far more negative than the critical values required for such numbers of taxa and characters (Hillis and Huelsenbeck, 1992). According to Table IV, the best estimates of the characters' weights may be somewhat doubtful for only five characters: 24, 26, 31, 32 and 36. However, eliminating them completely does not substantially alter either the most parsimonious topology or the consensus tree obtained from all near-most-parsimonious trees (data not shown).
The instability of the cladogram can thus be attributed to the unstable phylogenetic positions of mother species relative to their daughter species, which may be wrongly grouped together because of convergent evolution or reversions. In other words, these are hard polytomies: they are not resolvable, because they imply that a single ancestor gave birth to several derived relatives. They are not soft polytomies that could be resolved by using more and more data. The less resolved tree of Fig. 5B is therefore certainly more accurate, i.e. closer to the truth, than the more precise tree of Fig. 5A. One should not force the data into an artificial dichotomous scheme (Hull, 1979). The data are well structured, but not cladistically so.
Table IV. The stable weights obtained after successive weighting. The best estimate of each weight corresponds to the consistency index in the single most parsimonious tree found at the end of the iterative search. The minimum, mean and maximum weights correspond to the minimum, mean and maximum consistency indices found among all trees less than 12.99 dB away from the most parsimonious one (the mean weight is computed by rounding the mean value of s to the nearest integer). Only two iterations were necessary to obtain stable weights. The five highlighted characters are those whose best estimates differ from their mean estimates.
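The clade-retention rule used for Bremer support above can be sketched in Python. Only the clades found in every tree scoring less than 12.99 dB above the best tree are kept. The sketch assumes that each tree has already been scored (for instance by PAUP*) and is represented as a set of clades, each a frozenset of taxon names on a fixed rooting. The function names and the toy scores in the test are hypothetical, and this is not how PAUP* computes support internally. Because only a finite sample of trees is inspected, `bremer_estimate` gives an upper bound on the true Bremer support.

```python
from math import inf
from typing import FrozenSet, List, Set, Tuple

Clade = FrozenSet[str]
ScoredTree = Tuple[float, Set[Clade]]  # (weighted length in dB, clades of the tree)


def near_optimal(trees: List[ScoredTree], window: float = 12.99) -> List[ScoredTree]:
    """Keep the trees whose score is less than `window` dB above the best score."""
    best = min(score for score, _ in trees)
    return [(score, clades) for score, clades in trees if score - best < window]


def strict_consensus(trees: List[ScoredTree]) -> Set[Clade]:
    """Clades present in every tree of the list (empty set for no trees)."""
    clade_sets = [clades for _, clades in trees]
    return set.intersection(*clade_sets) if clade_sets else set()


def bremer_estimate(clade: Clade, trees: List[ScoredTree]) -> float:
    """Extra length of the best sampled tree lacking `clade`.

    Returns inf when no sampled tree lacks the clade.
    """
    best = min(score for score, _ in trees)
    lacking = [score for score, clades in trees if clade not in clades]
    return min(lacking) - best if lacking else inf
```

Representing trees as clade sets keeps the consensus step a plain set intersection. A clade survives only if no tree within the window contradicts it, which is exactly what a strict consensus means.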

Reconstruction of ADRs
As a first example, let us discuss the relationship between G. maxima and G. gigantea. The total length of the strict consensus tree of all near-most-parsimonious trees is 947.25 steps. If we suppose that G. maxima is the ancestor of G. gigantea, we get a length of 954.87 steps (see Fig. 4), whereas we get a length of 958.24 steps if we force G. gigantea to be the ancestor of G. maxima. In the first case, our ADR hypothesis costs only 7.62 dB, while in the second case it costs 10.99 dB. Neither hypothesis exceeds the threshold, but the first one is less costly and is therefore accepted, while the second one is rejected. The fact that the resulting phylogenetic tree seems less parsimonious is an illusion caused by the lack of a penalty for unobserved ancestors. We should actually subtract 13 dB and realize that we have just gained 5.38 dB. Simple ADRs like this one should always be resolved first, before tackling more complex cases.
The second example I now detail is the genus Fuscobryum, which comprises three species. This case is simple to resolve because there are no polytomies: all dichotomies are supported at 13 dB or more. There are therefore only three tests to conduct: is any of these three species the same as the node it is supposed to derive from? The three trees corresponding to F. perobtusum, F. nigrescens and F. subandreaeoides have lengths of 969.46 steps, 963.66 steps and 970.15 steps, respectively. Compared to the previous best tree of 954.87 steps, these hypotheses cost 14.59 dB, 8.79 dB and 15.28 dB. The first and third hypotheses exceed the threshold and are therefore rejected, but the second is well below it and is accepted. This means that F. nigrescens is the extant ancestor of F. subandreaeoides, but the last common ancestor of the three species remains unknown. This scenario is not exactly the one favoured by Zander (2014c), who inferred that F. nigrescens was the last common ancestor of the other two species.
The case of the genus Trichostomopsis is also simple to resolve. Only four tests are needed to assess potential ancestral species. All of them were rejected at around 25 dB, except that of T. australasiae, whose status as ancestor of T. umbrosa costs nothing at all. Zander (2014c) concluded that T. australasiae is the ancestor of both T. umbrosa and T. revoluta, but my test rejected this hypothesis at 24.34 dB. Unless the cladogram was misleadingly distorted by long-branch attraction, we must conclude that T. australasiae and T. revoluta are derived from a shared unknown common ancestor.
The case of D. acutus, D. rigidulus and D. icmadophilus is more ambiguous. Any of the three species may be the ancestor of the other two, with respective costs of 5.51 dB, 3.64 dB and 9.15 dB. There is, however, a small hint in favour of D. rigidulus, so we accept this hypothesis. These three species may be better considered subspecies than distinct species, but more data on morphology and possible reproductive isolation are needed to reach a definitive conclusion.
The Geheebia-Exobryum clade is a rather large polytomy and requires many tests to resolve it. I estimated the cost of an ADR for each pair of species (excluding G. gigantea, which we already know is directly derived from G. maxima). The results presented in Table V show that G. ferruginea is certainly the ancestor of G. maxima (with a cost of 0 dB). In fact, both G. ferruginea and G. fallax could be the ancestor of all the other species. However, G. ferruginea seems to be the direct ancestor of E. asperifolium, while G. fallax would be that of G. leskeoides and G. maschalogena. The species G. tophacea seems slightly closer to G. ferruginea but has clear affinities with both G. leskeoides and G. maschalogena, which suggests convergent evolution. Both G. ferruginea and G. fallax were tested as the potential direct ancestor of all the remaining species, but these hypotheses were rejected at 26.41 dB and 20.76 dB, respectively. This means that there are certainly at least two independent lineages. G. fallax may be the most primitive species in the genus Geheebia. Indeed, if E. asperifolium is assumed to be descended from G. ferruginea, then G. fallax is no longer rejected as a potential ancestral species (8.89 dB).
We need, however, to keep in mind that a hypothesis that is not rejected is not necessarily the best solution. Specifically, the pairwise tests are carried out in particular phylogenetic contexts, so any modification of the neighbouring topology may change their results. All possible rearrangements were tried in order to place G. maschalogena, G. tophacea and G. leskeoides in the right phylogenetic positions. Surprisingly, the best score was obtained with G. tophacea as the sister species of G. fallax, both descended from an unknown founding mother species of the genus. G. maschalogena is finally best considered a direct descendant of G. tophacea, while G. leskeoides is probably a direct descendant of G. fallax, just like G. ferruginea. The genera Vinealobryum and Didymodon sensu stricto have also been studied, but the detailed calculations are not shown here because the approach is exactly the same as above. Many rearrangements were tried, and the best caulogram found is presented in Fig. 6.

The Meaning of Parsimony
The length of the best caulogram found is 873.11 steps, which is 98.55 steps more than the most parsimonious cladogram (774.56 steps). However, there are only 7 unobserved species instead of the 23 internal nodes required by the cladogram. Thus, we economized by eliminating 16 ad hoc entities. Since the procedure we used is equivalent to assigning each additional entity a value of 13 dB, we can say that we economized 208 dB, which compensates a length of …
… 28, 31, 36, and 41. Three of them (24, 28, 31) concern the shape of different cells. This indicates that cell shape is generally not a good phylogenetic marker. The general shape of the leaf (9) should also be considered a poorly reliable character. However, in both cases it is hard to know whether this comes from a real tendency to evolve frequently or from the lack of a precise and reproducible morphometric measure (in which case the dataset should be corrected and re-analysed). The absence of a sporophyte (36) is not a reliable character either; the loss of sexuality therefore seems very easy to evolve in Pottiaceae. Finally, two character transformations are unique to one species each (autapomorphies). These are character 8 for F. subandreaeoides, which uniquely has dimorphic leaves, and character 22 for V. …
nevadense which uniquely have multi-layered photosynthetic cells on the ventral costal surface.These two unique traits strongly indicate that these two species cannot be an ancestral to another one, which is also the case in Zander's analysis (see below).

Comparison with Zander's Results and Methods
These results only slightly differ from those of Zander (1998Zander ( , 2013Zander ( , 2014aZander ( , b, c, 2016)).For example, the phylogenetic position of E. asperifolium in these previous studies, basal to the genus Geheebia, is arguably due a long branch artefact.The construction of a UPGMA tree the previous loss.Our caulogram is therefore 109.45 steps more parsimonious than the most parsimonious cladogram.

Character reliability
In the dataset, although many characters may be considered quite reliable at about 8 or 10 dB, very few seem to be extremely reliable.Indeed, only three non autapomorphic characters have a weight of 13 dB (see Table IV).These are characters 14, 20 and 21, or respectively margin ornamentation of the leaf, the presence or absence of a bulge on the abaxial face, and the presence or absence of a thin-walled pad of cells on the adaxial face.The first one is a synapomorphy defining the large clade made of the genera Vinealobryum, Didymodon sensu stricto, Geheebia, Exobryum and Trichostomopsis, but not Fuscobryum.The second one defines the clade of the genus Didymodon sensu stricto.And the third one defines the clade made of the genus Trichostomopsis and the species V. nevadense.Thus, this species may be better considered a member of this genus (morphological convergence seems unlikely in this context because this trait has only evolved once with a support of at least 13 dB).
On the other hand many characters are very labile and are accordingly weighted at less than 2 dB (see Table IV).These are the characters 9, 16, 23, 24,  Укр. бот. журн., 2017, 74(2) confirmed here).In its spirit, Zander's methodology is quite similar to continuous track analysis (Alroy, 1995), but it is far more holistic.
We must always seek a way to formalize our implicit (expert) reasoning into an explicit one so as to make it reproducible by others.My methodology necessitates no aprioristic expertise because it is more "mechanical", i.e. more algorithmically constrained.It may thus be qualified as more reductionist because I do not use some kind of informations like distribution or environment, and also because the measures of lability cannot be nuanced by some kind of a priori complex knowledge.It is therefore perhaps more reproducible, but above all completely doable by a computer.Yet, expertise is still needed a posteriori in order to interpret the results and evaluate their plausibility.Indeed, an expert can suspect a bias if for example the results are nonsensical even though they are numerically strongly supported (Hołyński, 2010).

Perspectives on Post-Phylogenetic Systematics (aka Modern Evolutionary Systematics)
Phylogenetic reconstruction methods are classically classified as either model-based or not.In the first case we have maximum likelihood (ML) and Bayesian inference (BI), while in the second case there exist maximum parsimony and compatibility technique, for example (Felsenstein, 1978(Felsenstein, , 1984;;Farris, 1983).However, the term "model-based" is really ill-chosen.It misleadingly suggests that the classical cladistic analysis does not assume any evolutionary hypothesis and is therefore model-free as opposed to other techniques.This is certainly false (Friday, 1989), but we may still distinguish between those that explicitly specify a parametrized evolutionary dynamics and call them stochastic models, and those that do not and therefore call them non-parametric methods.
The main assumptions that all the above cited techniques share are that no ancestor was sampled and that speciation is strictly dichotomous, which are very strong assumptions.Even if the latter is not always lucidly claimed, it is a rather obvious consequence of the cladistic algorithm: since one cannot (in this framework) distinguish between a genuine polytomy and a lack of resolution then the data are always forced into artificial dichotomies (Hull, 1979).The method presented in this paper, as well as Zander's, may be considered non-parametric like the classical cladistic analysis.Even if weights are used here, they do not quantify a part of the evolutionary process but our confidence in our inferences.However, the two assumptions that no indeed revealed that this species does not cluster within the genus Geheebia (data not shown), which is probably due to an accelerated evolution.The species V. vineale is here revealed as being derived, not ancestral to all the other species, but its rather conservative morphology explains the previous conclusion.The prime ancestor of this complex seems extinct or pseudoextinct, i.e. anagenetically transformed into another species.It is really not surprising given that this species is supposed to be quite old: the more time passes, the less likely a species remains unchanged.On the contrary, the more recent ancestors of this complex of species are still alive.In fact, exactly half of them ( 12) have unobserved direct ancestor.
The other differences with Zander's results are minor and are certainly due to the different set of data I used.They may merit a careful re-examination but I shall not comment them any further since the purpose of this study is primarily methodological.I will therefore not make any formal taxonomic decision.However, it seems now unclear that E. asperifolium deserves its own genus.The results also suggest that V. nevadense may be better treated as belonging to Trichostomopsis.A patrocladistic analysis may be conducted in order to test the consistency of the remaining genera (Stuessy and König, 2008).It seems that they fit more or less the definition of dissilience (Zander, 2013), that is a core species with several radiative species bursting from this core.The scheme seems however more complex than previously thought, including not only distinct lineages or stirps radiating from the same core, but also stirps arising from otherstirps.
The main difference between Zander's methodology (Zander, 2014a, b, c) and mine is that he attempts to assess ADRs by seeking among available species which one is the more likely candidate to the status of ancestral species.The contrast between SGR and ADR hypotheses is not done explicitly.However, the ancestral species are not found directly, but through the successive elimination of the less probable candidates, that is those with obvious derived traits.The weighting of the different traits is also done in a Bayesian framework; however, he follows an intuitive scheme which ultimately relies on expertise, i.e. a long-standing experience with regard to trait stability.Moreover, he does not use only morphological data, but also other kinds of information like distribution or environment.For example T. umbrosa occurs in human environments contrary to the case with the other Trichostomopsis species, which indicates that it is probably not the ancestral species we are seeking (as it is Ukr. Bot. J., 2017, 74(2) evolutionary dynamics of a genus or a family (Sepkoski, 1996).

Conclusion
The results of this study support the following major conclusions: Another obvious consequence of this study is that autapomorphies or "uninformative" cladistic characters and character states should never be pruned from matrices.This would strongly bias the data for any post-cladistic analysis.In the same manner, labile and even very labile characters should be studied and added to matrices so that the data are as complete as possible.These requirements are also needed in order to not bias stochastic cladogram reconstructions such as likelihood methods or Bayesian inference.
My procedure, as well as Zander's, are limited heuristics and are not guaranteed to find optimal solutions.They are rather constraining guides that help organize the data, so that the systematists can reconstruct an evolutionary scenario and make taxonomic decisions accordingly.My study clearly revealed that the length of a cladogram is not the sole parameter we need to minimize, but that the minimization of unobserved entities is also an important parameter to take into account.This naturally led to an equivalence relationship between these two parameters, which can be translated into a new objective function that a specialized algorithm could minimize by trying many rearrangements of the possible topologies.This function is simply as it follows: S = L + 13n S is the score we want to minimize, L is the total length of the tree (on the deciban scale) and n is the number of unobserved ancestral species required by the tree topology.The minimization of this function is not ancestor was sampled and that speciation is strictly dichotomous are explicitly rejected because they are not realistic.Our post-phylogenetic analyses are therefore based on a distinct evolutionary model which is more empirically grounded (Zander, 2013).A stochastic approach that would also reject these two assumptions is conceivable and should actively be sought.
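The objective function S = L + 13n can be evaluated directly for any candidate tree. The following Python sketch is illustrative only and is not the author's software. The names `PENALTY_DB` and `tree_score` are hypothetical. The inputs are the values reported for the most parsimonious cladogram (774.56 steps, 23 internal nodes) and for the best caulogram (873.11 steps, 7 unobserved species).

```python
# Minimal sketch of the objective S = L + 13n (names are hypothetical).
PENALTY_DB = 13.0  # deciban value charged for each unobserved ancestral species


def tree_score(length_db: float, n_unobserved: int, penalty_db: float = PENALTY_DB) -> float:
    """Return S = L + penalty_db * n; a lower score is preferred."""
    return length_db + penalty_db * n_unobserved


# Values reported for Didymodon s. l.
cladogram = tree_score(774.56, 23)  # every internal node is an unobserved ancestor
caulogram = tree_score(873.11, 7)   # only seven unobserved ancestors remain

print(f"cladogram:  S = {cladogram:.2f}")
print(f"caulogram:  S = {caulogram:.2f}")
print(f"difference: {cladogram - caulogram:.2f}")
```

The printed difference reproduces the 109.45-step advantage stated in the text: 16 × 13 = 208 dB saved, minus 98.55 extra steps. A search program built on this idea would compute the score for each rearranged topology and keep the lowest one.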
Some cladists have insisted that ancestral species cannot be recognized as such, and that ADRs are therefore unknowable (Nelson, 1973; Farris, 1976). These claims seem to be fundamentally based on a particular version of nominalist epistemology (Aubert, 2015). Consequently, anyone who does not endorse this philosophy, scientific realists for example (Sankey, 2001), could simply reject these assertions without further justification. Hull (1979) warned: "In general, I think it is very bad strategy for proponents of a particular scientific research program to stake their future on epistemological considerations, especially on our inability to know something." Indeed, epistemology should not be seen as an a priori set of constraints dictating what science can or cannot do. It should rather be handled in a more empirical manner, mainly to take a global, a posteriori view of the achievements of science. A one-way relationship between epistemology and science is really a kind of sterilizing metaphysics, which has its modern origins in German Idealism (especially Kant's Critique of Pure Reason). Only a genuine dialogue between the two can be fruitful: this is dialectics, not metaphysics.

Fig. 2. The concept of overfitting. With only two parameters, a linear function can only approximate the six depicted points. With six parameters, a polynomial function can pass exactly through each of the six points. The function is thus more precise, but obviously less accurate. In our case, each unnecessary ancestor can be regarded as a supernumerary parameter.
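The trade-off shown in Fig. 2 can be reproduced numerically. This Python sketch uses six invented points, not data from the study. A two-parameter least-squares line leaves some residual error. A six-parameter interpolating polynomial (in Lagrange form) fits every point exactly but extrapolates differently. The function names are hypothetical.

```python
# Overfitting illustration with six hypothetical points (not data from the study).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 1.2, 1.8, 3.3, 3.9, 5.2]


def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x (two parameters)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def fit_interpolant(xs, ys):
    """Lagrange polynomial of degree len(xs) - 1 (one parameter per point)."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


def sse(model, xs, ys):
    """Sum of squared residuals of a model on the observed points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))


line = fit_line(xs, ys)
poly = fit_interpolant(xs, ys)
print(f"line SSE: {sse(line, xs, ys):.4f}")  # positive: the line only approximates the points
print(f"poly SSE: {sse(poly, xs, ys):.4f}")  # zero: the polynomial hits every point
print(f"prediction at x = 6: line {line(6.0):.2f}, polynomial {poly(6.0):.2f}")
```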

Fig. 3. The principle of superoptimization. The goal of superoptimization is to remove unnecessary entities by deleting insufficiently supported terminal or internal branches. As in classical character mapping, several solutions may exist (ACCTRAN or DELTRAN). The letters "A", "B" and "C" indicate species, the numbers indicate the characters being transformed, and the letter "R" indicates a reversion.

Fig. 4. How to force PAUP* to draw a caulogram. These are parts of phylograms drawn by PAUP*. This software can map character transformations according to several optimization algorithms (ACCTRAN, DELTRAN or MINF), but a particular branch cannot be directly forced to have a length of zero. However, if we add a new OTU identical to the putative ancestor (here «anc1» has exactly the same character states as G. maxima) in an unresolved trichotomy, PAUP* is obliged to infer that the last common ancestor of these three OTUs had the same character states as the majority of them (that is, «anc1» and G. maxima), thus drawing two branches with a length of zero and increasing the length of the remaining branches. The tests must be conducted in the context of the tree, not in isolation. Because the result may change with this very context, huge polytomies (like the one including G. fallax, G. ferruginea, etc.) require trying many rearrangements of outgroups (not just pairwise tests like those of Table V) in order to find the best superoptimization.
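The logic exploited in Fig. 4 can be illustrated on a single unresolved trichotomy. The Python sketch below is a simplification and does not reproduce PAUP*'s optimization. It assumes unordered characters and ignores the rest of the tree. The character states and the OTU name `descendant` are hypothetical. On a star of three OTUs, the cheapest ancestral state for each character is its most frequent state. An OTU duplicated from the putative ancestor therefore forces two branches of length zero.

```python
from collections import Counter


def star_ancestor(otus):
    """Most parsimonious ancestral states at the centre of a star tree.

    For unordered characters, the cheapest state for each character is its
    most frequent state among the OTUs; ties (e.g. three different states)
    are broken by first occurrence, which leaves the total length unchanged.
    """
    n_chars = len(next(iter(otus.values())))
    return [Counter(seq[k] for seq in otus.values()).most_common(1)[0][0]
            for k in range(n_chars)]


def branch_lengths(otus):
    """Number of state changes between the inferred ancestor and each OTU."""
    anc = star_ancestor(otus)
    return {name: sum(a != b for a, b in zip(anc, seq)) for name, seq in otus.items()}


# Hypothetical character states (not taken from Table III).
otus = {
    "G_maxima":   "0110201",
    "anc1":       "0110201",  # dummy OTU identical to the putative ancestor
    "descendant": "1011200",
}
print(branch_lengths(otus))
```

Because «anc1» and G. maxima agree on every character, they form the majority everywhere, so both receive zero-length branches and all changes fall on the third branch.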

Fig. 5. The consensus trees. (A) Most parsimonious tree found after a heuristic search with successive weighting (stable after two iterations). (B) Strict consensus tree built from all 1474 trees that are less than 12.99 dB away from the most parsimonious score.

Fig. 6. The commagram depicting the ancestor-descendant relationships among the species of Didymodon sensu lato. The corresponding caulogram has a length of 873.11 steps. The seven unknown predicted ancestral species are represented by question marks.

Table I. The deciban scale of the Bayesian weight of evidence.
Only the most salient values are psychologically interpreted.
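The deciban scale in Table I is the logarithmic scale of the Bayesian weight of evidence, w = 10 log10(odds). The following Python sketch converts between decibans, odds and probabilities. The function names are hypothetical. The conversion to a probability assumes even prior odds. Under that assumption, 13 dB corresponds to odds of about 20:1, i.e. just over 95 %, which matches the rejection threshold given for Table V.

```python
import math


def db_to_odds(w_db: float) -> float:
    """Odds corresponding to a weight of evidence of w decibans."""
    return 10 ** (w_db / 10)


def db_to_probability(w_db: float) -> float:
    """Probability implied by w decibans, assuming even prior odds."""
    odds = db_to_odds(w_db)
    return odds / (1 + odds)


def probability_to_db(p: float) -> float:
    """Inverse conversion: w = 10 * log10(p / (1 - p))."""
    return 10 * math.log10(p / (1 - p))


for w in (0, 3, 10, 13, 20):
    print(f"{w:>2} dB -> odds {db_to_odds(w):7.2f}:1, p = {db_to_probability(w):.3f}")
```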

Table II. The consistency index understood as Bayesian evidence.
The letters m and s indicate, respectively, the minimal and the actual number of transformations of a character on a particular cladogram. The consistency index is defined as c = m/s. The values of z are probabilities, while the values of w are expressed in decibans (see text for formulas).

Table III. The character matrix of Didymodon s. l.
The species Barbula unguiculata is the outgroup. The 42 characters used are presented in the annex.

Table V. The evaluation of ADRs among species of the genera Geheebia and Exobryum.
Each ADR hypothesis is tested against the corresponding SGR hypothesis. Rejections are expressed in decibans, with a rejection threshold of 13 dB (> 95%). ADRs that were not rejected are highlighted.
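The decision rule behind Table V can be sketched as follows. This is a hypothetical Python helper, not the author's code. The cost of an ADR hypothesis is taken to be the increase in tree length (in decibans) over the reference tree. The hypothesis is rejected when this cost exceeds 13 dB. The inputs are the three Fuscobryum hypotheses reported in the text: 969.46, 963.66 and 970.15 steps against a best tree of 954.87 steps. The hypothesis labels are placeholders.

```python
THRESHOLD_DB = 13.0  # rejection threshold (> 95 %)


def adr_cost(adr_length: float, reference_length: float) -> float:
    """Cost in dB of forcing an ancestor-descendant relationship."""
    return adr_length - reference_length


def evaluate(hypotheses, reference_length, threshold=THRESHOLD_DB):
    """Map each hypothesis to (cost in dB, verdict)."""
    results = {}
    for name, length in hypotheses.items():
        cost = adr_cost(length, reference_length)
        verdict = "rejected" if cost > threshold else "not rejected"
        results[name] = (round(cost, 2), verdict)
    return results


# Tree lengths reported for the three Fuscobryum ADR hypotheses (labels are placeholders).
fusco = {"hypothesis 1": 969.46, "hypothesis 2": 963.66, "hypothesis 3": 970.15}
results = evaluate(fusco, 954.87)
for name, (cost, verdict) in results.items():
    print(f"{name}: {cost:.2f} dB, {verdict}")
```

With these inputs, only the second hypothesis stays below the threshold, in line with the conclusion that F. nigrescens is the extant ancestor of F. subandreaeoides.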