T = the vacuous (logically true) proposition.
A probability function P on an algebra of propositions is an assignment of numbers to propositions such that:
1. P(T) = 1.
2. P(A OR B) = P(A) + P(B) whenever A and B are mutually exclusive.
Bayes' Theorem: P(h|e) = P(e|h)P(h)/P(e).
Proof:
P(e|h) = P(e & h)/P(h) [definition of conditional probability].
P(e & h) = P(e|h)P(h) [multiply both sides by P(h)].
P(h|e) = P(h & e)/P(e) [definition of conditional probability].
P(h|e) = P(e|h)P(h)/P(e) [substitution].
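The derivation above can be checked numerically. The following sketch assumes a made-up joint distribution over a hypothesis h and evidence e; all numbers are illustrative, not data.

```python
# A minimal numeric check of Bayes' theorem on an assumed joint
# distribution over h (hypothesis) and e (evidence).

# Joint probabilities of the four conjunctions (they sum to 1).
P = {
    ("h", "e"): 0.30,
    ("h", "~e"): 0.10,
    ("~h", "e"): 0.15,
    ("~h", "~e"): 0.45,
}

P_h = P[("h", "e")] + P[("h", "~e")]   # marginal P(h) = 0.40
P_e = P[("h", "e")] + P[("~h", "e")]   # marginal P(e) = 0.45

P_e_given_h = P[("h", "e")] / P_h      # definition of conditional probability
P_h_given_e = P[("h", "e")] / P_e      # definition of conditional probability

# Bayes' theorem: P(h|e) = P(e|h)P(h)/P(e).
assert abs(P_h_given_e - P_e_given_h * P_h / P_e) < 1e-12
```

Any joint distribution would do; the theorem holds whenever P(h) and P(e) are nonzero.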
Theorem of Total Probability: if the hi are mutually exclusive and jointly exhaustive, then P(e) = SUMi P(e|hi)P(hi).
Proof:
e = ORi (e & hi) [logic].
P(ORi (e & hi)) = SUMi P(e & hi) [second axiom of probability].
P(e|hi) = P(e & hi)/P(hi) [definition of conditional probability].
P(e & hi) = P(e|hi)P(hi) [multiply both sides by P(hi)].
P(e) = P(ORi (e & hi)) = SUMi P(e & hi) = SUMi P(e|hi)P(hi) [preceding lines].
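The theorem is easy to check with a small example. The sketch below assumes a three-hypothesis partition with illustrative priors and likelihoods.

```python
# Quick check of total probability for an assumed partition h1, h2, h3.

priors = [0.5, 0.3, 0.2]       # P(hi); mutually exclusive, jointly exhaustive
likelihoods = [0.9, 0.4, 0.1]  # P(e|hi), illustrative numbers

# P(e) = SUMi P(e|hi)P(hi) = 0.45 + 0.12 + 0.02 = 0.59
P_e = sum(l * p for l, p in zip(likelihoods, priors))
assert abs(P_e - 0.59) < 1e-12
```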
The preceding definition seems rather technical. The following recharacterization makes intuitive sense. Probabilistic independence is irrelevance: learning one event wouldn't change your belief in the other.
Theorem: probabilistic independence is informational irrelevance:
e is independent of h (i.e., P(h & e) = P(h)P(e)) if and only if P(h|e) = P(h).
Proof: Suppose P(h|e) = P(h). Then P(h & e) = P(h|e)P(e) = P(h)P(e), so e is independent of h.
Conversely, suppose P(h & e) = P(h)P(e). Then P(h|e) = P(h & e)/P(e) = P(h)P(e)/P(e) = P(h).
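The equivalence can be seen concretely. The sketch below builds a joint probability that satisfies the product definition by construction (the marginals 0.4 and 0.25 are assumed for illustration) and checks that conditioning on e then leaves the probability of h unchanged.

```python
# Independence as irrelevance: if P(h & e) = P(h)P(e), then P(h|e) = P(h).

P_h, P_e = 0.4, 0.25          # assumed marginals
P_h_and_e = P_h * P_e         # independent by construction: 0.10

P_h_given_e = P_h_and_e / P_e # definition of conditional probability
# Learning e leaves the degree of belief in h unchanged.
assert abs(P_h_given_e - P_h) < 1e-12
```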
By Bayes' theorem, the new degree of belief in h after seeing e is P(h|e) = P(h)P(e|h)/P(e).
Prior probability of h = P(h). This may be quite subjective, reflecting a theory's initial "plausibility" prior to scientific investigation. This plausibility depends on such factors as intelligibility, simplicity, and whether the mechanism posited by the theory has been observed to operate elsewhere in nature (e.g., uniformitarian vs. catastrophist geology). In the 19th c. it was proposed that only causes observed to operate in nature could be invoked in new theories. This reflects prior probability.
Prior probability of e = P(e). This is subjective and very hard to specify. Using total probability, P(e) = SUMi P(e|hi)P(hi), so P(e) can be assembled from the priors and likelihoods of the competing hypotheses.
P(h|e)/P(h'|e) = [P(h)/P(h')][P(e|h)/P(e|h')]. The ratio [P(h)/P(h')] is the prior ratio and the ratio [P(e|h)/P(e|h')] is the likelihood ratio. Changes in relative probability between competing theories are governed entirely by the likelihood ratio, since the prior ratio is a fixed constant.
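This dynamics can be sketched numerically. The example below assumes two rivals with equal priors and three data items that are conditionally independent given each hypothesis (an assumption, needed so that successive updates just multiply the likelihood ratios); all ratios are illustrative.

```python
# Posterior ratio between two rival theories: the prior ratio is fixed,
# and each new datum multiplies the ratio by its likelihood ratio
# P(e_i|h)/P(e_i|h') (assuming the data are conditionally independent
# given each hypothesis).

prior_ratio = 0.5 / 0.5               # equally plausible rivals
likelihood_ratios = [2.0, 3.0, 1.5]   # assumed P(e_i|h)/P(e_i|h')

posterior_ratio = prior_ratio
for lr in likelihood_ratios:
    posterior_ratio *= lr

print(posterior_ratio)  # 1 * 2 * 3 * 1.5 = 9.0
```

After the three data, h is nine times as probable as h', however the shared evidence term P(e) comes out, since it cancels in the ratio.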
Refutation is fatal: If e is itself consistent (so P(e) > 0) but inconsistent with h, then P(e & h) = 0, so P(h|e) = 0.
Diminishing returns of repeated testing: Once e is already expected (P(e) is high), the confirming factor P(e|h)/P(e) is close to 1, so by the preceding argument the confirmation e provides is correspondingly reduced.
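The shrinking boost is easy to see numerically. The sketch below assumes h strongly predicts the test outcome and that repeated successes push P(e) upward; the numbers are illustrative.

```python
# Diminishing returns: confirmation is governed by the factor
# P(e|h)/P(e). As repeated successes make e expected (P(e) climbs
# toward P(e|h)), the factor falls toward 1 and the test stops
# confirming. All numbers are illustrative assumptions.

P_e_given_h = 0.95                       # h strongly predicts the outcome
expectations = [0.50, 0.80, 0.90, 0.95]  # P(e) rising with each repetition

factors = [P_e_given_h / P_e for P_e in expectations]
assert all(a > b for a, b in zip(factors, factors[1:]))  # shrinking boost
assert abs(factors[-1] - 1.0) < 1e-12    # once P(e) = P(e|h), no boost
```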
Strong explanations are good, initial plausibilities being similar: The ratio P(h1|e)/P(h2|e) changes through time entirely as a function of the relative strength of explanation P(e|h1)/P(e|h2), for
P(h1|e)/P(h2|e) = [P(h1)/P(h2)][P(e|h1)/P(e|h2)] = k[P(e|h1)/P(e|h2)].
Unification is good, initial plausibilities being similar: A unified theory explains some regularity that the disunified theory does not. For example, Copernicus' theory entails that the total number of years must equal the total number of synodic periods + the total number of periods of revolution. To see this, suppose that data e, e' are independent a priori, so
P(e & e') = P(e)P(e').
Now suppose that e and e' remain independent given h1 but are completely dependent given h2, so that
P(e & e'|h1) = P(e|h1)P(e'|h1) and
P(e & e'|h2) = P(e|h2).
So
P(h1|e & e')/P(h2|e & e') = [P(h1)/P(h2)][P(e & e'|h1)/P(e & e'|h2)] = k[P(e & e'|h1)/P(e & e'|h2)] = k[P(e|h1)P(e'|h1)/P(e|h2)].
Now there is no reason to suppose that P(e|h1), P(e'|h1), and P(e|h2) are high, so the disunified theory has to overcome the effect of a product of low numbers while the unified theory does not. The more disunified phenomena a theory unifies compared to a competitor, the bigger this advantage becomes (suppose the likelihoods are all less than .5; then the degree of belief drops exponentially in the number of disunified phenomena).
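The exponential advantage can be made vivid with a toy computation. The sketch below assumes equal priors (k = 1), n phenomena, and a uniform likelihood of 0.5 per phenomenon, so the disunified theory's likelihood is a product of n factors while the unified theory's is a single factor.

```python
# Toy model of the unification advantage. Assumptions: equal priors
# (k = 1); the disunified theory h1 treats the n phenomena as
# independent; the unified theory h2 makes them completely dependent,
# so one phenomenon carries the rest; each likelihood is 0.5.

n = 10                      # number of phenomena unified
lik = 0.5                   # assumed likelihood per phenomenon

P_data_given_h1 = lik ** n  # independent given h1: product of n factors
P_data_given_h2 = lik       # dependent given h2: one factor suffices

# With equal priors, the posterior ratio is the likelihood ratio.
ratio = P_data_given_h2 / P_data_given_h1
print(ratio)  # 0.5 / 0.5**10 = 2**9 = 512.0
```

Each additional unified phenomenon doubles the unified theory's advantage in this toy setup, which is the exponential drop the text describes.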
Saying more lowers probability: h entails h' (but not conversely) ==> P(h) <= P(h'), with strict inequality unless the extra content of h has probability 1 given h'.
Conflict turns explanatory strength into an asset: Didn't we just say that strong explanations are good? That is true if the initial plausibilities are similar. But if one theory entails the other, they won't be: the stronger theory starts out with the lower prior. Thus, unification-style arguments only work if the competing theories are mutually contradictory!
Scientific method should not consider subjective, prior plausibilities. Response: science conducted without prior plausibilities is just the kind of blind, pre-paradigm science Kuhn ridicules as sterile. Without prior plausibilities to guide inquiry, no useful experiments would ever be performed.
Priors should be flat. But what is flat? If we are uncertain about the size of a cube, should we be indifferent about its edge length or about its volume? A prior that is flat over edge length is not flat over volume, so "flatness" depends on an arbitrary choice of parameter.
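The cube puzzle can be sketched by simulation. The setup below is an illustrative assumption: edge length uniform on [1, 2], so volume ranges over [1, 8]; a flat prior on volume would assign the midpoint volume 4.5 a cumulative probability of 0.5, but the edge-flat prior does not.

```python
# The cube puzzle, sketched: a flat prior over edge length induces a
# non-flat prior over volume. Ranges and sample size are assumptions.
import random

random.seed(0)
edges = [random.uniform(1.0, 2.0) for _ in range(100_000)]  # flat over edge
volumes = [e ** 3 for e in edges]                           # induced on [1, 8]

# Under the edge-flat prior, P(volume <= 4.5) = P(edge <= 4.5**(1/3))
# which is about 0.65 -- not the 0.5 a volume-flat prior would give.
frac = sum(v <= 4.5 for v in volumes) / len(volumes)
print(frac)
```

So "be indifferent" fails to pick out a unique prior: indifference over one parametrization contradicts indifference over another.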
It isn't clear that numbers like P(e) even exist. One can respond with a protocol for eliciting such numbers, but in practice it doesn't always work. One can say that the subjects are "irrational", but the audience can always blame Bayesianism instead of the subjects.
The old evidence problem. If e is already known, then P(e) = 1 and P(e|h) = 1, so P(h|e) = P(h)P(e|h)/P(e) = P(h). So old evidence never "confirms" a hypothesis.
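The problem in one computation, with an assumed prior for h:

```python
# Old evidence, numerically: once e is certain, conditioning on e
# cannot move P(h). The prior 0.3 is an illustrative assumption.

P_h = 0.3
P_e = 1.0           # e is old evidence: already known for certain
P_e_given_h = 1.0   # given P(e) = 1, e is certain under any hypothesis

P_h_given_e = P_h * P_e_given_h / P_e
assert P_h_given_e == P_h   # no confirmation from old evidence
```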
Responses: