### New Website and Blog

I have a new professional website at gandenberger.org. I will resume my blogging activity there.


In his book-length argument for likelihoodism, Royall (1997, 3) distinguishes the following three questions:^{[1]}

(1) What should you believe?

(2) What should you do?

(3) What does the present evidence say?

Likelihoodist methods are only intended to answer Question (3). In my view, the trouble with likelihoodism is that an answer to Question (3) is useful only insofar as it aids in answering Question (1) or (2), and likelihoodism does not provide an alternative to Bayesian and frequentist methods for answering Questions (1) and (2). Thus, while likelihoodism may be true, it is not a viable genuine alternative to Bayesian and frequentist methodologies.

I provide a more detailed argument for this view here. The next step in my project is to further develop this argument, primarily by considering possible likelihoodist responses. I currently have on my radar the following three responses:

(I) Characterizing one’s data as evidence is valuable in itself.

(II) The Law of Likelihood does in fact answer Royall’s questions (1) and (2), but only *ceteris paribus*.

(III) The Law of Likelihood provides a “frequentist methodology,” that is, a methodology that is justified by its operating characteristics.

My next series of posts will develop objections to these responses.

[1] The wording I use for these questions differs from Royall’s and comes from Sober (2008, 3).

Royall, R.M. 1997. *Statistical Evidence: A Likelihood Paradigm*. Monographs on Statistics and Applied Probability. London: Chapman & Hall.

Sober, Elliott. 2008. *Evidence and Evolution: The Logic Behind the Science*. Cambridge University Press.

A draft of the second chapter of my dissertation is available here. This chapter provides new responses to purported counterexamples to the Likelihood Principle due to Fitelson, Armitage, and Stein that I take to be stronger in some respects than previous responses.

A draft of my complete response to Fitelson’s counterexample to the Law of Likelihood is available ~~here~~.

**Update:** This draft has been superseded by the version presented here.

In my previous post, I suggested that we might distinguish between two “evidential favoring” relations. One is a “competitive” notion, appropriate only for mutually exclusive hypotheses, which the Law of Likelihood explicates. The other is a “comparative” notion, appropriate for any pair of hypotheses, which is explicated by a measure of confirmation together with the following “Bridge Principle”:

(†) Evidence E favors hypothesis H_{1} over hypothesis H_{2} if and only if it confirms H_{1} more than H_{2}.

I now prefer a cleaner approach.

In a previous post, I argued for restricting the Law of Likelihood[1] to mutually exclusive hypotheses and claimed that this restriction would not exclude cases of genuine scientific interest because scientists don’t test hypotheses against one another that are not mutually exclusive. This restriction seems natural and is sufficient to address a counterexample due to Fitelson (2007). Steel (2007) considers the same restriction but rejects it for reasons that I consider inadequate, as I explain in this post. Chandler (2013) argues for the restriction and defends it against an objection. I suggest an amendment to that defense in this post.

Fitelson criticizes the proposal to restrict the Law of Likelihood to mutually exclusive hypotheses in his (2012). One worry he expresses is that in the context of statistical model selection, it is supposed to be one of the strengths of likelihoodism, as opposed to Bayesianism, that it is capable of testing nested models against each other. Thus, a likelihoodist who restricts the Law of Likelihood to mutually exclusive hypotheses thereby gives up a key support for his or her view.

In a previous post, I argued for restricting the Law of Likelihood[1] to mutually exclusive hypotheses and claimed that this restriction would not exclude cases of genuine scientific interest because scientists don’t test hypotheses against one another that are not mutually exclusive.

Chandler (2013) argues for this restriction as well. He considers the objection that Forster and Sober (2004) claim that in the context of model selection, scientists sometimes say that data favor a given model over a logically weaker one. For instance, they might say that a set of observations of a variable Y taken for different X values that fall roughly along a straight line favor the model (LIN) according to which Y is a linear function of X plus a noise term over the model (QUAD) according to which Y is a quadratic function of X plus a noise term. But (LIN) is a special case of (QUAD) obtained by setting the coefficient of the X^{2} term in (QUAD) to zero. Thus, the scientists who speak in this way are speaking of an evidential favoring relation between compatible hypotheses. As a result, restricting the Law of Likelihood to apply only to mutually exclusive hypotheses excludes cases of genuine scientific interest.
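The nesting point can be illustrated with a quick sketch of my own (the simulated data and parameter values are illustrative, not from Forster and Sober). Since (LIN) is the special case of (QUAD) with a zero X^{2} coefficient, the best-fitting member of (QUAD) can never fit the data worse than the best-fitting member of (LIN); under Gaussian noise with fixed variance, a lower residual sum of squares means a higher maximized likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 30)
# Simulated observations: Y roughly linear in X plus noise (illustrative values).
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.shape)

# Best-fitting members of (LIN) and (QUAD) by least squares.
lin = np.polyfit(x, y, 1)
quad = np.polyfit(x, y, 2)

rss_lin = float(np.sum((y - np.polyval(lin, x)) ** 2))
rss_quad = float(np.sum((y - np.polyval(quad, x)) ** 2))

# (LIN) is (QUAD) with the x^2 coefficient fixed at zero, so the best
# member of (QUAD) always fits at least as well as the best member of (LIN).
assert rss_quad <= rss_lin + 1e-9
```

Comparing the raw maximized likelihoods of nested models would therefore never favor the smaller model, which is one reason talk of evidence favoring one nested model over another calls for reinterpretation.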

Chandler’s response to this objection is that, “as Forster and Sober use the term, ‘E favors model M_{1} over model M_{2}’ is actually shorthand for ‘E favours the likeliest (in the technical sense) disjunct L(M_{1}) of model M_{1} over the likeliest disjunct of model M_{2},’ with L(M_{1})∩L(M_{2})=∅.” This reply may be true, but it does not address the whole problem. Forster and Sober may use the phrase “E favors model M_{1} over model M_{2}” in this way, but scientists use it in other ways as well. For instance, they might use the phrase “E favors (LIN) over (QUAD)” to mean that their data favor the claim that the element of (QUAD) that is closest to the true model in some sense has a zero coefficient for the X^{2} term over its negation. If they are a bit more sophisticated, then they will realize that it is implausible in typical applications that the coefficient of the X^{2} term is *exactly* zero. What they might mean instead is that their data favor over its negation the hypothesis that a nonzero coefficient for the X^{2} term is not necessary for producing a statistically adequate curve, meaning roughly a curve the residuals of which look like white noise to a degree that is adequate for their aims.

On both of these interpretations, “E favors (LIN) over (QUAD)” actually means “E favors (LIN) over (QUAD)\(LIN).” In general, when scientists talk about evidence favoring one model over another when those models are nested, they typically mean either that it favors the best element of the first over the best element of the second or that it favors the smaller model over the set-theoretic difference of the larger model minus the smaller model, or vice versa. Unless there are cases in which scientists clearly mean what they say when they speak of evidence favoring one model over another when those models are nested, such examples do not indicate that there are cases of scientific interest that involve testing compatible hypotheses against one another.

In the previous post, I discussed a counterexample to the Law of Likelihood due to Fitelson (2007). Again, the Law of Likelihood (LL) says that datum *x* favors hypothesis *H*_{1} over *H*_{2} if and only if the likelihood ratio *k*=Pr(*x*;*H*_{1})/Pr(*x*;*H*_{2}) is greater than 1, with *k* measuring the degree of favoring. Fitelson’s counterexample is as follows:

…we’re going to draw a single card from a standard (well-shuffled) deck…. E=the card is a spade, H_{1}=the card is the ace of spades, and H_{2}=the card is black. In this example… Pr(E|H_{1})=1>Pr(E|H_{2})=1/2, but it seems absurd to claim that E favors H_{1} over H_{2}, as is implied by the (LL). After all, E guarantees the truth of H_{2}, but E provides only non-conclusive evidence for the truth of H_{1}.

I argued for blocking this counterexample by building into the Law of Likelihood the requirement that H_{1} and H_{2} be mutually exclusive.

Steel (2007) suggests a similar maneuver in response to a variant on the “tacking paradox.” The tacking paradox has been presented as an objection to theories of confirmation which imply that E confirms H to the same degree that it confirms H conjoined with an irrelevant proposition A. The Law of Likelihood is compatible with the “contrastivist” view that there is no such thing as confirmation for a single hypothesis in isolation, so it is not subject to the tacking paradox in this form. However, it is subject to a slight variant of the paradox because it implies that E is evidentially neutral between H and H conjoined with an irrelevant proposition A.

Fitelson’s example is not an instance of the tacking paradox because neither H_{1} nor H_{2} is the conjunction of the other with an irrelevant proposition: H_{1} is the conjunction of H_{2} with H_{1} itself, which is not irrelevant. But there are instances of the tacking paradox that are like Fitelson’s example in that they show that the Law of Likelihood without a restriction to mutually exclusive hypotheses violates the following intuitive restriction on accounts of evidential favoring:

(*) If E provides conclusive evidence for H_{1}, but non-conclusive evidence for H_{2} (where it is assumed that E, H_{1}, and H_{2} are all contingent claims), then E favors H_{1} over H_{2}.

For instance, change H_{1} in Fitelson’s example to “the card is black and the price of tea in China was higher on January 1, 2013 than it was on January 1, 2012.” Then (adapting Fitelson’s words above), we have Pr(E|H_{1})=Pr(E|H_{2})=1/2, but it seems absurd to claim that E is evidentially neutral between H_{1} and H_{2}, as is implied by the (LL). After all, E guarantees the truth of H_{2} but provides only non-conclusive evidence for the truth of H_{1}.
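The neutrality claim can be checked by brute enumeration. The sketch below is my own; it models the tea-price claim A as an event independent of the card drawn, with an arbitrary illustrative probability a. The likelihood ratio comes out exactly 1, even though E entails H_{2} but not H_{1}.

```python
from fractions import Fraction

SUITS = ["spades", "clubs", "hearts", "diamonds"]
# Joint outcome space: (rank, suit, truth value of the tea-price claim A),
# with the card uniform and A independent of the draw. The value of a is
# an arbitrary illustrative choice; the result does not depend on it.
a = Fraction(1, 3)
outcomes = {}
for rank in range(1, 14):
    for suit in SUITS:
        outcomes[(rank, suit, True)] = Fraction(1, 52) * a
        outcomes[(rank, suit, False)] = Fraction(1, 52) * (1 - a)

def pr(event):
    return sum(p for o, p in outcomes.items() if event(o))

E = lambda o: o[1] == "spades"                  # the card is a spade
H2 = lambda o: o[1] in ("spades", "clubs")      # the card is black
H1 = lambda o: H2(o) and o[2]                   # black AND the tea-price claim

pr_E_given_H1 = pr(lambda o: E(o) and H1(o)) / pr(H1)
pr_E_given_H2 = pr(lambda o: E(o) and H2(o)) / pr(H2)

# Likelihood ratio = 1: the unrestricted LL calls E neutral between H1 and
# H2, although E (spade) entails H2 (black) but not H1 (black and A).
assert pr_E_given_H1 == pr_E_given_H2 == Fraction(1, 2)
```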

Steel points out that restricting the Law of Likelihood to mutually exclusive hypotheses prevents violations of (*). However, he claims that an additional restriction is needed to address fully the concern raised by the tacking paradox because one could tack an irrelevant proposition A onto one of a mutually exclusive pair of hypotheses H_{1} and H_{2}. The restriction he proposes is that H_{1} and H_{2} be “structurally identical,” meaning that they assign values to the same set of random variables. That restriction prevents tacking on because A would not be irrelevant if its conjunction with one of H_{1} and H_{2} were structurally identical to H_{1} and H_{2}.

Steel points out that statisticians typically work with sets of hypotheses that are mutually exclusive and structurally identical (partitions of some possibility space). However, he claims that “there are many scientifically interesting cases that do not involve the comparison of structurally identical alternatives,” such as the comparison between Newtonian mechanics and Einstein’s general theory of relativity, which is often discussed in the Bayesian confirmation literature. In addition, there are cases that involve comparing mutually consistent hypotheses. For instance, one might want to know whether the evidence supports incorporating A as part of the hypothesis H. For the purpose of deciding this issue, one might consider whether E better confirms H or the conjunction of H and A.

I personally don’t find the tacking paradox paradoxical, and I regard the question of whether it is paradoxical as uninteresting because it does not arise in either scientific practice or everyday life. If the strongest objection to a given theory of confirmation or evidential favoring is that it is susceptible to the tacking paradox, then I would regard that theory as a success. For that reason, I will not restrict the Law of Likelihood to structurally identical alternatives.

Let us then pass on to Steel’s claim that one might want to consider whether E better confirms H or the conjunction of H and A in order to assess whether the evidence supports incorporating A as part of the hypothesis H. What is immediately relevant is actually a slightly different claim, namely that one might want to consider whether E favors H over the conjunction of H and A in order to assess whether the evidence supports incorporating A as part of the hypothesis H.

A real example may help in eliciting intuitions. Let H be Darwin’s theory of evolution, A be the claim that birds are descended from dinosaurs, and E be the first archaeopteryx fossil discovered. Suppose you accept Darwin’s theory and want to assess whether the first archaeopteryx fossil discovered supports incorporating the claim that birds are descended from dinosaurs as part of that theory. Is the relevant question for this purpose whether or not the fossil favors Darwin’s theory over the conjunction of Darwin’s theory with the claim that birds are descended from dinosaurs?

I think it’s clear that the answer is “no.” If there is a relevant question here that the Law of Likelihood addresses, it is not about the “competition” between Darwin’s theory and Darwin’s theory conjoined with the claim that birds are descended from dinosaurs, but rather about the competition between Darwin’s theory conjoined with the claim that birds are descended from dinosaurs and Darwin’s theory conjoined with *the negation of* the claim that birds are descended from dinosaurs. If this competition is inconclusive, then one may wish to accept Darwin’s theory while remaining agnostic about the ancestry of birds.

In summary, Steel considers restricting the Law of Likelihood to mutually exclusive and structurally identical hypotheses but argues that those restrictions exclude cases of scientific interest. I see no need for the restriction to structurally identical hypotheses because I am not bothered by the tacking paradox. I do see a need for the restriction to mutually exclusive hypotheses to block counterexamples like Fitelson’s. It is fortunate, then, that I find Steel’s argument for the claim that that restriction excludes cases of genuine scientific interest unconvincing.

[1] A frequentist would say “parameters” rather than “random variables.”

The Law of Likelihood (LL) says that datum *x* favors hypothesis *H*_{1} over *H*_{2} if and only if the likelihood ratio *k*=Pr(*x*;*H*_{1})/Pr(*x*;*H*_{2}) is greater than 1, with *k* measuring the degree of favoring. Fitelson (2007) offers the following as a counterexample to the Law of Likelihood:

…we’re going to draw a single card from a standard (well-shuffled) deck…. E=the card is a spade, H_{1}=the card is the ace of spades, and H_{2}=the card is black. In this example… Pr(E|H_{1})=1>Pr(E|H_{2})=1/2, but it seems absurd to claim that E favors H_{1} over H_{2}, as is implied by the (LL). After all, E guarantees the truth of H_{2}, but E provides only non-conclusive evidence for the truth of H_{1}.

I agree with Fitelson that it seems absurd to claim that E favors H_{1} over H_{2} in this case. However, it also seems odd to speak in any way of evidence favoring one hypothesis over another when those hypotheses are not mutually exclusive. The Law of Likelihood is supposed to address questions about “what the evidence says about the competition between two hypotheses“ (Sober 2008, 34), but hypotheses that are not mutually exclusive are not truly in competition with one another. Thus, a natural response to this counterexample is to stipulate that the Law of Likelihood only applies to pairs of hypotheses that are mutually exclusive. There is no threat of a slightly modified version of the counterexample with mutually exclusive hypotheses: if H_{1} and H_{2} are mutually exclusive, then E cannot guarantee the truth of H_{2} if it has nonzero probability on H_{1} (assuming that nothing else guarantees the falsity of H_{1}).
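Both the counterexample and the way the mutual-exclusivity restriction blocks it can be verified by enumerating the deck. This is an illustrative sketch of my own:

```python
from fractions import Fraction

SUITS = ["spades", "clubs", "hearts", "diamonds"]
deck = [(rank, suit) for rank in range(1, 14) for suit in SUITS]  # 52 cards

E = [c for c in deck if c[1] == "spades"]              # the card is a spade
H1 = [c for c in deck if c == (1, "spades")]           # the ace of spades
H2 = [c for c in deck if c[1] in ("spades", "clubs")]  # the card is black

def pr_given(event, hyp):
    """Pr(event | hyp), treating all 52 cards as equally likely."""
    return Fraction(len([c for c in hyp if c in event]), len(hyp))

assert pr_given(E, H1) == 1                # Pr(E|H1) = 1
assert pr_given(E, H2) == Fraction(1, 2)   # Pr(E|H2) = 1/2
# Unrestricted LL: likelihood ratio 2 > 1, so E "favors" H1 over H2.

# But H1 and H2 overlap (the ace of spades is black), so the Law of
# Likelihood restricted to mutually exclusive hypotheses is simply
# silent about this pair, blocking the counterexample.
assert any(c in H2 for c in H1)
```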

One reason the Law of Likelihood is usually stated without reference to a restriction to mutually exclusive hypotheses is that statisticians typically have in mind *statistical* hypotheses to the effect that some observable random variable X has a particular (objective) probability distribution. Two such hypotheses are automatically mutually exclusive if they are distinct.[1]

The hypotheses in the example, by contrast, are *substantive* hypotheses that generate probability distributions by conditioning but are not themselves probability distributions. I see no objection to extending the Law of Likelihood to substantive as well as statistical hypotheses provided that the substantive hypotheses are mutually exclusive.

One might think that requiring mutually exclusive hypotheses is unduly restrictive because scientists often test hypotheses that are not mutually exclusive against one another. For instance, according to Machery (2013), in Greene et al. (2001), Greene and his colleagues are best understood as using a likelihoodist methodology to test the following hypotheses against one another:

H_{1}: People respond differently to moral-personal and moral-impersonal dilemmas because the former elicit more emotional processing than the latter.

H_{2}: People respond differently to moral-personal and moral-impersonal dilemmas because the single moral rule that is applied to both kinds of dilemmas (for example, the doctrine of double effect) yields different permissibility judgments.

H_{1} and H_{2} thus stated are highly ambiguous. There are plausible ways of fleshing them out that make them mutually consistent. For instance, one could understand H_{1} as the claim that the true causal graph for the set of variables {Dilemma Type [personal, impersonal], Emotional Processing Elicited [0-10], Judgment Type [consequentialist, deontological]}, say, has arrows from Dilemma Type to Emotional Processing Elicited to Judgment Type, and similarly for H_{2}. There are other ways of fleshing them out that make them mutually exclusive. For instance, one could understand H_{1} as asserting that appealing to differences in the amount of emotional processing elicited will allow one to account for a wider range of possible experiments concerning differences in responses to moral-personal and moral-impersonal dilemmas in a more satisfactory way than appealing to moral rules, and understand H_{2} to assert the opposite.

My claim is that for any disambiguations H_{1}’ and H_{2}’ of H_{1} and H_{2}, respectively, it makes sense to talk about testing H_{1}’ against H_{2}’ only if H_{1}’ and H_{2}’ are mutually exclusive. Perhaps it would be best to think of H_{1} and H_{2} themselves not as hypotheses, but rather as merely programmatic expressions of the stances of their respective research programs; and to think of particular experiments in this domain not as testing a disambiguation of H_{1} against a disambiguation of H_{2} directly, but rather as testing a specific hypothesis in the spirit of H_{1} against an incompatible hypothesis in the spirit of H_{2}. H_{1} and H_{2} themselves are not tested against one another directly, but rather judged by their fruitfulness, empirical adequacy, and so on, in the light of many such experiments.

In any case, my claim that the Law of Likelihood does not apply to H_{1} and H_{2} themselves does not seem to be in serious conflict with Machery’s treatment of Greene et al.’s work. I would merely add to Machery’s treatment that, properly speaking, what Greene et al. in fact test against one another are not H_{1} and H_{2} themselves but rather more specific, incompatible pairs of hypotheses that are “affiliated with” H_{1} and H_{2}, respectively, in possibly various ways. I conjecture that a similar treatment will be possible whenever scientists are doing something sensible that looks like testing non-mutually-exclusive hypotheses against one another, and thus that restricting the scope of the Law of Likelihood in the way I suggest will not limit its applicability.

Unsurprisingly, I am not the first to propose restricting LL to mutually exclusive hypotheses. I will consider previous presentations of this proposal and Fitelson’s objections to it in subsequent posts.

[1] Probability density functions that differ only on sets of measure zero are mathematically distinct but “empirically compatible” in the sense that they imply all the same probabilities for observable events. Such probability density functions are not distinct in the sense that matters for statistical practice.

I argued in this post that likelihoodism fails to provide a viable alternative to Bayesian and frequentist methodologies because likelihoodists have not provided a way to use likelihood functions for purposes of inference or decision that has an attractive justification yet lies outside both Bayesian and frequentist frameworks.

A possible likelihoodist response to this argument is that likelihoodism provides a *ceteris paribus* norm of inference and decision: all else being equal, one should prefer H_{1} to H_{2} upon learning x if and only if the Law of Likelihood says that x favors H_{1} over H_{2} (i.e., Pr(x;H_{1})/Pr(x;H_{2})>1).

An obvious objection to this claim is that it is extremely vague. Moreover, the obvious way to precisify it is to spell out “all else being equal” in terms of prior probabilities and perhaps utilities in a way that a Bayesian would accept, which would not vindicate likelihoodism as a genuine alternative to Bayesianism.

There is another objection to this claim that is less obvious but perhaps more conclusive: if having background knowledge and utilities that are symmetric with respect to H_{1} and H_{2} is sufficient to satisfy the *ceteris paribus* clause, then *ceteris paribus* likelihoodism implies that one should be indifferent between a pair of estimators one of which dominates the other. For there is a hypothetical scenario in which for any value x of some random variable X whose probability distribution is parameterized by θ, one’s background knowledge and utilities are completely symmetric with respect to the hypotheses θ=θ*(x) and θ=θ’(x), and Pr(x;θ*(x))/Pr(x;θ’(x))=1, yet θ*(X) dominates θ’(X).

One could claim in response that being indifferent between the estimates θ*(x) and θ’(x) of θ for all values x of X is not the same as being indifferent between the estimators θ*(X) and θ’(X), but those two attitudes are indistinguishable behaviorally and have the same bad pragmatic consequences.

A hypothetical scenario in which this phenomenon arises is Stone’s (1976) “Flatland” example. The gist of the example is as follows (see this blog post by Larry Wasserman and Stone’s paper for more details). A sailor takes a number of steps along a two-dimensional grid, buries a treasure, takes one more step in a direction determined by the outcome of a roll of a fair four-sided die, and then dies. He carried with him a string that he kept taut. One’s datum x is the path of that string. The parameter one wishes to estimate is the location of the treasure θ. Pr(x;θ)=1/4 for θ one step north, south, east or west of the end of the string and 0 for all other values of θ. Thus, the Law of Likelihood implies that x is neutral between the hypothesis θ*(x) that θ is one step back along the path and the hypothesis θ’(x) that θ is one step forward along the path in the direction of the final step. In addition, one’s background knowledge is symmetric with respect to θ*(X) and θ’(X) because the scenario says nothing about how θ is generated, and we can simply stipulate that one’s utilities are symmetric with respect to θ*(X) and θ’(X), say payoff 0 or 1 according to whether one’s estimate is true or not. However, for any sequence of θs (random or nonrandom), the estimator θ*(X) which says that θ is one step back along the path gets the right answer ¾ of the time in the long run, while the estimator θ’(X) that θ is one step forward along the path in the direction of the final step gets the right answer ¼ of the time in the long run: Pr(θ*(X)=θ;θ)=3/4, while Pr(θ’(X)=θ;θ)=1/4.
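The operating characteristics of the two estimators can be checked with a Monte Carlo sketch (my own; for simplicity the sailor’s initial walk is taken to be a fixed straight path east, which is a stipulation of mine and does not affect the 3/4 versus 1/4 long-run frequencies):

```python
import random

random.seed(1)
STEPS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def trial(n=5):
    # Taut string from the origin to the burial point: n steps east.
    string = [(i, 0) for i in range(n + 1)]
    treasure = string[-1]
    dx, dy = STEPS[random.choice("NSEW")]      # roll of the four-sided die
    nxt = (treasure[0] + dx, treasure[1] + dy)
    if len(string) >= 2 and string[-2] == nxt:
        string = string[:-1]                   # taut string retracts
    else:
        string = string + [nxt]                # string extends
    # theta*: one step BACK along the observed string.
    back = string[-2]
    # theta': one step FORWARD, continuing the string's final direction.
    fx, fy = string[-1][0] - string[-2][0], string[-1][1] - string[-2][1]
    forward = (string[-1][0] + fx, string[-1][1] + fy)
    return back == treasure, forward == treasure

N = 100_000
results = [trial() for _ in range(N)]
p_back = sum(b for b, _ in results) / N
p_forward = sum(f for _, f in results) / N

# theta* is right whenever the final step extends the string (3/4 of the
# time); theta' is right only when the final step retraces (1/4).
assert abs(p_back - 0.75) < 0.01 and abs(p_forward - 0.25) < 0.01
```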

The argument just given does not work against orthodox (countably additive) Bayesianism because orthodox Bayesians must give higher posterior probability to θ*(x) than to θ’(x) for some possible values x of X. Thus, it does not work against the Law of Likelihood understood simply as the claim that it is appropriate to use the phrase “the degree to which *x* favors *H*_{1} over *H*_{2}” for the ratio of the posterior odds of *H*_{1} to *H*_{2} given *x* to the prior odds of *H*_{1} to *H*_{2}. Of course, the acceptability of the Law of Likelihood understood in that way does nothing to vindicate likelihoodism as a genuine alternative to Bayesianism.

The weakness of this argument is that examples like “Flatland” require infinite sample spaces. Real sample spaces (unlike our idealized models of them) are finite. Even if nature is continuous and/or unbounded, our measuring instruments have finite precision and range. Thus, a likelihoodist could reasonably claim that the Law of Likelihood is a *ceteris paribus* norm of inference for data from any experiment that we could actually perform. The fact that this claim encounters difficulties in idealized thought experiments is irrelevant to practical methodology.