Chapter 2

Insufficient reason
Purpose of probability numbers
Overview of key developments on probability
On measure theory
Footnotes
https://manyworlds784.blogspot.com/p/footnotes.html

Insufficient reason
A modern "insufficient reason" scenario:

Alice tells you that she will forward an email from Bob, Christine or Dan within the next five minutes. In terms of prediction, your knowledge can then be summarized as p(X) = 1/3 for each X in {B, C, D}. Whether Alice has randomized the order is unknown to you, and so to you any potential permutation is just as good as any other. The state of your knowledge is encapsulated by the number 1/3. In effect, you are assuming that Alice has used a randomization procedure that does not favor a "common" permutation, such as BCD, though you can argue that you are assuming nothing. This holds on the 0th trial.

In this sort of scenario, it seems justifiable to employ the concept of equiprobability, which is a word reflecting minimum knowledge. We needn't worry about what Alice is or isn't doing when we aren't looking. We needn't worry about a hidden influence yielding a specific bias. All irrelevant in such a case (and here I am ignoring certain issues in physics that are addressed in the noumena sections (Part VI, see sidebar) and in Toward).

We have here done an exercise in classical probability and can see how the principle of insufficient reason (called by John Maynard Keynes "the principle of indifference") is taken for granted as a means of describing partial and, one might say, subjective knowledge. We can say that in this scenario, randomness4 is operative. (Keynes agreed with German scientists that this principle contains an inconsistency, though I suggest the problem was inadequate definition of parameters. See Appendix A on Keynes and an incongruity.)

However, once one trial occurs, we may wish to do the test again. It is then that "maximum entropy" or well-shuffling may be called for. Of course, before the second trial, you have no idea whether Alice has changed the order. If you have no idea whether she has changed the permutation, then you may wish to look for a pattern that discloses a "tendency" toward nonrandom shuffling. This is where Pierre-Simon Laplace steps in with his controversial rule of succession, which is intended as a means of determining whether an observed string is nonrandom; there are of course other, modern tests for nonrandomness.
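For the reader who wants the rule in concrete form: Laplace's rule of succession estimates the probability of a "success" on the next trial as (s + 1)/(n + 2), given s successes observed in n trials. A minimal sketch in Python (the example counts are hypothetical):

    from fractions import Fraction

    def rule_of_succession(successes, trials):
        # Laplace: estimated chance that the next trial is a success,
        # given `successes` observed in `trials` trials.
        return Fraction(successes + 1, trials + 2)

    # Hypothetical: the same permutation has led off 4 trials in a row.
    print(rule_of_succession(4, 4))   # 5/6
    # With no trials at all, the rule falls back to the uniform 1/2.
    print(rule_of_succession(0, 0))   # 1/2

Note that with zero observations the rule simply restates the principle of insufficient reason for a binary event.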

If, however, Alice tells you that each trial is to be independent, then you are accepting her word that an effective randomization procedure is at work. We now enter the realm of the frequentist. Nonparametric tests or perhaps the rule of succession can test the credibility of her assertion. It is here -- in the frequentist doctrine -- where the "law of large numbers" enters the picture. So here we suggest randomness1 and perhaps randomness2; randomness3 concerning quantum effects also applies, but is generally neglected.

We should add that there is some possibility that, despite Alice's choice of permutation, the emails will arrive in an order different from her choice. There are also the possibilities that either the sender's address and name are garbled or that no message arrives at all. Now these last possibilities concern physical actions over which neither sender nor receiver has much control. So one might argue that the subjective "principle of insufficient reason" doesn't apply here. On the other hand, in the main scenario, we can agree that not only is there insufficient reason to assign anything but 1/3 to any outcome, but that also our knowledge is so limited that we don't even know whether there is a bias toward a "common" permutation, such as BCD.

Thus, application of the principle of insufficient reason in a frequentist scenario requires some care. In fact, it has been vigorously argued that this principle and its associated subjectivism entail flawed reasoning, and that only the frequentist doctrine is correct for science.

We can think of the classical probability idea as the 0th trial of a frequency scenario, in which no degree of expertise is required to obtain the number 1/3 as reflecting the chance that you will guess right as to the first email.

Purpose of probability numbers
Though it is possible, and has been done, to sever probability conceptions from their roots in the decision-making process, most of us have little patience with such abstractions, though perhaps a logician might find the effort worthwhile. We start, then, from the position that the purpose of a probability assignment is to determine a preferred course of action. So there may be two courses of action, and we wish to know which, on the available evidence, is the better. Hence what is wanted is an equality or an inequality. We estimate that we are better off following course A (such as crediting some statement as plausible) than we are if we follow course B (perhaps we have little faith in a second statement's plausibility). We are thus ordering, or ranking, the two proposed courses of action, and plan to make a decision based on the ranking.

This ranking of proposed actions is often termed the probability of a particular outcome, such as success or failure. The ranking may be made by an expert, giving her degrees of confidence, or it may be made by recourse to the proportions of classical probability, or to the frequency ratios of repeated trials of an "equivalent" experiment. In any case, probability is some process of ranking, or prioritizing, potential courses of action. (We even have meta-analyses, in which the degrees of confidence of several experts are averaged.) Even in textbook drills, this purpose of ranking is implied. Many think it "obvious" that the frequency method has a built-in objectivity and that observer bias occurs only when insufficient care has been taken to screen it out. Hence, as it is seemingly possible to screen out observer bias, what remains of the experiment must be objective. And yet that claim is open to criticism, not least concerning how a frequency ratio is defined and established.

In his 1955 paper, "How Does the Brain Do Plausible Reasoning?", the physicist E.T. Jaynes wrote:
If a communications engineer says, "The statistical properties of the message and noise are known," he means only that he has some knowledge about the past behavior of some particular set of messages and some particular sample of noise. When he infers that some of these properties will hold also in the future and designs a communication system accordingly, he is making a subjective judgment of exactly the type accounted for in Laplacian theory, and the sole purpose of statistical analysis of past events was to obtain that subjective judgment.
What we really have here is the old subjectivity/objectivity bugbear of philosophers and quantum physicists. It just turns up everywhere.

In any case, we must not overlook the concept of risk, to which we turn our attention. Gerd Gigerenzer in his Calculated Risks (9) presents a number of cases in which medical professionals make bad decisions based on what Gigerenzer sees as misleading statements of risk.

He cites a 1995 press release from a Scottish coronary prevention study, which claimed:

  • "People with high cholesterol can rapidly reduce" their risk of death by 22% by taking a specific drug.
  • Of 1,000 people with high cholesterol who took the drug over a five-year period, 32 died.
  • Of 1,000 people who took a placebo over the five-year period, 41 died.
There are three ways, says Gigerenzer, to present the benefit (a short computational check follows the list):

1. Absolute risk reduction: (41 - 32)/1,000 = 0.9%.

2. Relative risk reduction: (number of deaths avoided)/(number who die without treatment) = 9/41 ≈ 22%.

3. Number needed to treat (NNT): the number of people who must participate in a treatment in order to save one life. In this case, 9/1,000 ≈ 1/111, meaning about 111 people must be treated to save one life.
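These three figures follow directly from the raw counts in the press release; here is a minimal sketch of the arithmetic (the function and variable names are mine):

    def risk_summary(deaths_treated, deaths_untreated, group_size):
        # Express the same benefit three ways, per Gigerenzer's discussion.
        saved = deaths_untreated - deaths_treated
        arr = saved / group_size            # absolute risk reduction
        rrr = saved / deaths_untreated      # relative risk reduction
        nnt = group_size / saved            # number needed to treat
        return arr, rrr, nnt

    arr, rrr, nnt = risk_summary(deaths_treated=32, deaths_untreated=41, group_size=1000)
    print(f"absolute risk reduction: {arr:.1%}")   # 0.9%
    print(f"relative risk reduction: {rrr:.0%}")   # 22%
    print(f"number needed to treat:  {nnt:.0f}")   # about 111

The relative figure (22 percent) sounds far more impressive than the absolute figure (0.9 percent), which is precisely the kind of framing Gigerenzer warns about.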

When such reasoning was used to encourage younger women to get breast X-rays, the result was an imposition of excessive anxiety, along with radiation risk, on women without a sufficient reason, Gigerenzer writes.

John Allen Paulos on the mammogram controversy
http://www.nytimes.com/2009/12/13/magazine/13Fob-wwln-t.html?_r=1&emc=tnt&tntemail1=y#

New knowledge may affect one's estimate of a probability. But how is this new knowledge rated? Suppose you and another person are to flip a coin over some benefit. The other provides a coin, and you initially estimate your chance of winning at 1/2, but then a third person blurts out: "The coin is loaded."

You may now have doubts about the validity of your estimate, as well as doubt about whether, if the claim is true, the coin is mildly or strongly biased. So, whatever you do, you will use some process of estimation which may be quite appropriate but which might not be easily quantifiable.

And so one may say that the purpose of a probability ranking is to provide a "subjective" means of deciding on a course of action in "objective reality."

Eminent minds have favored the subjectivist viewpoint. For example, Frank P. Ramsey (10) proposed that probability theory be regarded as a "logic of personal beliefs," noting: "The degree of a belief is just like a time interval [in relativity theory]; it has no precise meaning unless we specify more exactly how it is to be measured."

In addressing this problem, Ramsey cites the Mohs scale of hardness, in which 10 is arbitrarily assigned to diamond, and so on. Taking a psychological perspective, Ramsey rates degrees of belief on a scale of intensity of feeling, while granting that no one feels strongly about things he takes for granted [unless challenged, we add]. And, though critical of Keynes's A Treatise on Probability, Ramsey observed that all logicians -- Keynes included -- supported the degrees-of-belief viewpoint, whereas statisticians in his time generally supported a frequency-theory outlook.

Yet, as Popper points out, "Probability in physics cannot be merely an indication of 'degrees of belief,' for some degrees lead to physically wrong results, whereas others do not" (11).

Overview of key developments on probability
We begin by tracing probability as it relates to wagering decisions in finite games of chance. Classical probability says we calculate an estimate based on proportions. The assumption is that the urn's contents are "well mixed" or the card deck "well shuffled."

What the classical probabilist is saying is that we can do something with this information, even though information about the system is incomplete. The classical probabilist had to assume maximum information entropy under proper shuffling, even though that concept had not yet been developed. Similarly, the "law of large numbers" is implicit. Otherwise, why keep playing a gambling game?

We show in the discussion on entropy below that maximum entropy means loss of all memory or information that would show how to find an output value in, say, fewer than n steps, where n is the number of steps (or, better, bits) in the shuffling algorithm. We might say this maximum Kolmogorov-Chaitin entropy amounts to de facto deterministic irreversibility.

That is, the memory loss or maximum entropy is equivalent to effective "randomization" of card order. Memory loss is implicit, though not specified, in Shannon information theory -- though one can of course assert that digital memory systems are subject to the Second Law of Thermodynamics.

But even if the deck is highly ordered when presented to an observer, as long as the observer doesn't know that, his initial probability estimate, if he is to make a bet, must assume well-shuffling, as he has no reason to suspect a specific permutation.

Despite the fact that the "law of large numbers" is implicit in classical thinking, there is no explicit statement of it prior to Jacob Bernoulli, and some may well claim that classical probability is not a "frequentist" theory.

Yet it is only a short leap from the classical to the frequentist conception. Consider the urn model, with, say, 3 white balls and 2 black balls. One may have repeated draws, with replacement, from one urn. Or one may have one simultaneous draw from five urns, each holding either a black or a white ball (three white and two black in all). In the first case, we have a serial, or frequentist, outlook. In the second, we have a simple proportion as described by the classical outlook.

The probability of two heads in a row is 1/4, as shown by the table.

   HH
   TH
   HT
   TT

Now suppose we have an urn in which we place two balls and specify that there may be 0 to 2 black balls and 0 to 2 white balls. This is the same as having 4 urns, with these contents:

   BB
   BW
   WB
   WW

One then is presented with an urn and asked the probability it holds 2 black balls. The answer is 1/4.

Though that result is trivial, the comparison underscores how classical and frequentist probability are intertwined.

So one might argue that the law of large numbers is the mapping of classical probabilities onto a time interval. But if so, then classical probability sees the possibility of maximum entropy as axiomatic (not that early probabilists necessarily thought in those terms). In classical terms, one can see that maximum entropy is equivalent to the principle of insufficient reason, which to most of us seems quite plausible. That is, if I hide the various combinations of balls in four urns and don't let you see me doing the mixing, then your knowledge is incomplete, but sufficient to know that you have one chance in four of being right.

But, one quickly adds, what does that information mean? What can you do with that sort of information as you get along in life? It is here that, if you believe in blind chance, you turn to the "law" of large numbers. You are confident that if you are given the opportunity to choose over many trials, your guess will turn out to have been right about 25% of the time.

So one can say that, at least intuitively, it seems reasonable that many trials, with replacement, will tend to verify the classically derived probability, which becomes the asymptotic limiting value associated with the law of large numbers. Inherent in the intuition behind this "law" is the notion that hidden influences are random -- if we mean by random that over many trials, these influences tend to cancel each other out, analogous to the fact that Earth is essentially neutrally charged because its atoms are, by and large, randomly oriented with respect to one another, so that the ionic charges virtually cancel out. Notice the circularity problem in that description.

Nevertheless, one could say that we may decide on, say, 10 trials of flipping a coin, and assume we'd be right on about 50% of guesses -- as we have as yet insufficient reason to believe the coin is biased. Now consider 10 urns, each of which contains a coin that is either head up or tail up. Your knowledge is encapsulated by the maximum ignorance in this scenario. If asked to bet on the first draw, the best you can do is say you have a 50% chance of being right (as discussed a bit more below). The notion that, on average, hidden force vectors in a probabilistic scenario cancel out might seem valid if one holds to conservation laws, which are associated with symmetries, but I am uncertain on this point; it is possible that Noether's theorem applies.
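The intuition can be illustrated, though not proved, by simulation; the sketch below assumes a fair coin and leans on a pseudorandom generator, so it quietly presupposes the very well-mixing at issue (the circularity noted above):

    import random

    def head_frequency(n_flips, seed=0):
        # Proportion of heads in n_flips simulated tosses of a fair coin.
        rng = random.Random(seed)
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        return heads / n_flips

    for n in (10, 100, 10_000, 1_000_000):
        print(n, head_frequency(n))
    # The observed proportion tends to settle near 0.5 as n grows --
    # the behavior the "law of large numbers" formalizes.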

So it ought to be apparent that Bernoulli's frequency ideas were based on the principle of insufficient reason, or "subjective" ignorance. The purpose of the early probabilists was to extract information from a deterministic system, some of whose determinants were unknown. In that period, their work was frowned upon because of the belief that the drawing of lots should be reserved for specific moral purposes, such as the settling of an argument or discernment of God's will on a specific course of action. Gambling, though a popular pastime, was considered sinful because God's will was being ignored by the gamers.

Such a view reflects the Mosaic edict that Israel take no census, which means this: To take a census is to count one's military strength, which implies an estimate of the probability of winning a battle or a war. But the Israelites were to trust wholly in their God for victory, and not go into battle without his authorization. Once given the word to proceed, they were to entertain no doubt. This view is contrary to the custom of modern man, who, despite religious beliefs, assesses outcomes probabilistically, though usually without precise quantification.

To modern ears, the idea that probabilities are based on a philosophy that ejects divine providence from certain situations sounds quite strange. And yet, if there is a god, should we expect that such a being leaves certain things to chance? Does blind chance exist, or is that one of the many illusions to which we humans are prone? (I caution that I am not attempting to prove the existence of a deity or to give a probability to that supposition.)

In classical and early frequentist approaches, the "maximum entropy" or well-mixing concept was implicitly assumed. And yet, as Ludwig Boltzmann and Claude Shannon showed, one can think of degrees of entropy that are amenable to calculation.

Richard von Mises has been called the inventor of modern frequentism, which he tried to put on a firm footing by making axioms of the law of large numbers and of the existence of randomness, by which he meant that, over time, no one could "beat the odds" in a properly arranged gambling system.

The Von Mises axioms

1. The axiom of convergence: "As a sequence of trials is extended, the proportion of favorable outcomes tends toward a definite mathematical limit."

2. The axiom of randomness: "The limiting value of the relative frequency must be the same for all possible infinite subsequences of trials chosen solely by a rule of place selection within the sequence (i.e., the outcomes must be randomly distributed among the trials)."

Alonzo Church on the random sequences of Von Mises
http://www.elemenat.com/docs/vonMisesKollektiv.pdf
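The second axiom can be given a rough computational illustration: pick a subsequence by a rule of place selection and check that the relative frequency is unchanged. The sketch below is my own construction, using a pseudorandom bit stream as a stand-in for a Von Mises collective:

    import random

    rng = random.Random(42)
    seq = [rng.randint(0, 1) for _ in range(200_000)]

    def freq(bits):
        return sum(bits) / len(bits)

    # Place-selection rule: keep only the trials that immediately follow a 1.
    selected = [seq[i] for i in range(1, len(seq)) if seq[i - 1] == 1]

    print(freq(seq))       # near 0.5
    print(freq(selected))  # also near 0.5 -- the "system" gains nothing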

Of course, the immediate objection is that declaring axioms does not necessarily mean that reality agrees. Our collective experience is that reality does often seem to be in accord with Von Mises's axioms. And yet, one cannot say that science rests on a testable foundation, even if nearly all scientists accept these axioms. In fact, it is possible that these axioms are not fully in accord with reality and only work within limited spheres. A case in point is Euclid's parallel postulate, which may not hold at the cosmic scale. In fact, the counterintuitive possibilities for Riemann space demonstrate that axioms agreed to by "sensible" scientists of the Newtonian mold are not necessarily the "concrete fact" they were held to be.

Consider the old syllogism:

1. All men are mortal.
2. Socrates is a man.
3. Hence, Socrates is mortal.

It is assumed that all men are mortal. But suppose in fact 92% of men are mortal. Then conclusion 3 is also not certain, but only rather probable.

Following in David Hume's track, we must concede to having no way to prove statement 1, as there might be some exception that we don't know about. When we say that "all men are mortal," we are relying on our overwhelming shared experience, with scientists proceeding on the assumption that statement 1 is self-evidently true.

If we bring the study of biology into the analysis, we might say that the probability that statement 1 holds is buttressed by both observational and theoretical work. So we would assign the system of members of the human species a propensity of virtually 1 as to mortality. That is, we take into account the systemic and observational evidence in assigning an a priori probability.

And yet, though a frequency model is implicit in statement 1, we cannot altogether rule out an exception, not having the power of perfect prediction. Thus, we are compelled to accept a degree of confidence or degree of belief. How is this to be arrived at? A rough approximation might be to posit a frequency of less than 1 in 7 billion, but that would say that the destiny of everyone alive on Earth today is known. We might match a week's worth of death records over some wide population against a week's worth of birth records in order to justify a statement about the probability of mortality. But that isn't much of a gain. We might as well simply say the probability of universal mortality is held to be so close to 1 as to be accepted as 1.

The difficulties with the relative frequency notion of probability are well summarized by Hermann Weyl (13). Weyl noted that the earlier parts of Jacob Bernoulli's Ars Conjectandi were sprinkled with words connoting subjective ideas, such as "hope" and "expectation." In the fourth part of that book, however, Bernoulli introduces the seemingly objective "law of large numbers," which he established with a mathematical proof. Yet, says Weyl, the logical basis for that law has remained murky ever since.

Wikipedia article on 'Ars Conjectandi'
https://en.wikipedia.org/wiki/Ars_Conjectandi

True, says Weyl, Laplace emphasized the quantitative side of probability with the classical definition: the quotient of the number of favorable cases over the number of all possible cases. "Yet this definition presupposes explicitly that the different cases are equally possible. Thus, it contains as an aprioristic basis a quantitative comparison of possibilities."

The conundrum of objectivity is underscored by the successful use of statistical inference in the hard and soft sciences, in the insurance business and in industry in general, Weyl points out.

Yet, if probability theory only concerns relative frequencies, we run into a major problem, Weyl argues. Should we not base this frequency interpretation directly on trial series inherent in the law of large numbers? We might say the limiting value is reached as the number of trials increases "indefinitely." But even so it is hard to avoid the fact that we are introducing "the impossible fiction of an infinity of trials having actually been conducted." He adds, "Moreover, one thereby transcends the content of the probability statement. Inasmuch as agreement between relative frequency and probability p is predicted for such a trial series with 'a probability approaching certainty indefinitely,' it is asserted that every series of trials conducted under the same conditions will lead to the same frequency value."

The problem, as Weyl sees it, is that if one favors "strict causality," then the methods of statistical inference must find a "proper foundation in the reduction to strict law," but this ideal seems to run into the limit of partial acausality at the quantum level. Weyl thought that perhaps physics could put statistical inference on a firm footing, giving the physical example of equidistribution of gas molecules, based on the notion that forces among molecules are safe to ignore in that they tend to cancel out. But here the assumption behind this specimen of the law of large numbers has a physical basis -- namely Newtonian physics, which, in our terms, provides the propensity information that favors the equiprobabilities inherent in equidistribution.

However, I do not concede that this example proves anything. Really, the kinetic gas theories of Maxwell, Boltzmann and Gibbs tend to affirm Newtonian mechanics, but they are based on the rough and ready relative-frequency, empirico-inductive perception apparatus used by human beings and other mammals.

How does one talk about frequencies for infinite sets? In classical mechanics, a molecule's net force vector might point in any direction and so the probability of any specific direction equals zero, leading Weyl to remark that in such continuous cases one can understand why measure theory was developed.

On measure theory
https://files.nyu.edu/eo1/public/Book-PDF/pChapterBBB.pdf

Two remarks:

1. In fact, the force vector's possible directions are limited by Planck's constant, meaning we have a large population of discrete probabilities which can very often be treated as an infinite set.

2. Philosophically, one may agree with Newton and construe an infinitesimal as a discrete unit that exists in a different realm than that of the reals. We see a strong echo of this viewpoint in Cantor's cardinal numbers representing different orders of infinity.

An important development around the turn of the 19th century was the emergence of probabilistic methods of dealing with error in observation and measurement. How does one construct a "good fit" curve from observations which contain seemingly random errors? By "random" an observer means that he has insufficient information to pinpoint the source of the error, or that its source isn't worth the bother of determining. (The word "random" need not be used only in this sense.)

The binomial probability formula is simply a way of expressing possible proportions using combinatorial methods; it is a logical tool for both classical and frequentist calculations.

Now this formula (function) can be mapped onto a Cartesian grid. What it is saying is that the probabilities are highest for those outcome sets containing the greatest (finite) numbers of elements, or permutations. As a simple example, consider a coin-toss experiment: five flips yield j heads and k tails, where j + k = 5 and j ranges from 0 to 5.

This gives the binomial result:

5C5 = 1, 5C4 = 5, 5C3 = 10, 5C2 = 10, 5C1 = 5, 5C0 = 1.

You can easily visualize a symmetrical graph with two center bars 10 units high, flanked on both sides by bars of diminishing height.

Now if p = 1/2 and q = 1/2 (the probability of occurrence and of non-occurrence, respectively), we get the symmetrical distribution:

1/32, 5/32, 10/32, 10/32, 5/32, 1/32

We see here that there are 10 permutations with 3 heads and 2 tails, and 10 with 3 tails and 2 heads, success and failure being equally likely on each toss. So, if you are asked to bet on the number of heads turning up in 5 tosses, you should -- assuming some form of randomness -- choose either 3 or 2.
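The counts and probabilities above can be checked mechanically; a short sketch:

    from math import comb

    n, p = 5, 0.5
    for k in range(n, -1, -1):          # heads, counted from 5 down to 0
        ways = comb(n, k)               # nCk arrangements with k heads
        prob = ways * p**k * (1 - p)**(n - k)
        print(f"{k} heads: {ways} arrangements, probability {ways}/32 = {prob}")
    # The largest terms are k = 3 and k = 2, each 10/32, as stated above.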

Clearly, sticking with the binomial case, there is no reason not to let the number of notional tosses go to infinity, in which case every specific probability reduces to zero. Letting the binomial graph go to infinity gives us the Gaussian normal curve. The normal curve is useful because calculational methods have been worked out that make it more convenient than binomial (or multinomial) probability calculation. And, it turns out that as n increases in the binomial case, probabilities that arise from situations where replacement is logically required are nevertheless well approximated by probabilities arising with the no-replacement assumption. [Please see Appendix E for a proof that the set of finite samples of a population is normally distributed.]

So binomial probabilities are quite well represented by the Gaussian curve when n is large enough. Note that, implicitly, we are assuming "well mixing" or maximum entropy.

So the difference between the mean and the next unit shrinks with n.

50C25/51C25 = 26/51 ≈ 0.509803922

and if we let n run to infinity, that ratio goes exactly to 0.5.

So it made sense, as a useful calculational tool, to use the Gaussian curve, where n runs to infinity.
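That claim is easy to check numerically; for even n the ratio nC(n/2) / (n+1)C(n/2) works out to (n/2 + 1)/(n + 1), which tends to 1/2 as n grows:

    from math import comb

    for n in (10, 50, 500, 5000):
        ratio = comb(n, n // 2) / comb(n + 1, n // 2)
        print(n, ratio)    # n = 50 gives 26/51 = 0.50980..., approaching 0.5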

Yet one should beware of carelessly assuming that such a distribution is some form of "objective" representation of reality. As long as no one is able to fully define the word "random" in whatever aspect, no one can say that the normal curve serves as a viable approximate representation of some arena of reality. Obviously, however, that distribution has proved to be immensely productive in certain areas of science and industry -- though one should not fail to appreciate its history of misuse. At any rate, a great advantage of the normal curve is that it so well represents the binomial distribution.

Certainly, we have here an elegant simplification, based on the assumption of well-mixing or maximum entropy. As long as we use the normal distribution to approximate the possibilities for a finite experiment, that simplification will be accepted by many as reasonable. But if the bell curve is meant to represent some urn containing an infinitude of potential events, then the concept of normal distribution becomes problematic. That is, any finite cluster of, say, heads, can come up an infinity of times. We can say our probability of witnessing such a cluster is low, but how do we ensure well mixing to make sure that that belief holds? If we return to the urn model, how could we ensure maximally entropic shuffling of an infinitude of black and possibly white balls? We have no recourse but to appeal to an unverifiable Platonic ideal or perhaps to say that the principle of insufficient reason is, from the observer's perspective, tantamount to well mixing. (Curiously, the axiom of choice of Zermelo-Fraenkel set theory enters the picture here, whereby one axiomatically is able to obtain certain subsets of an infinitude.)

Keynes takes aim at the principle of indifference (or, in our terms, zero propensity information) in this passage:

"If, to take an example, we have no information whatever as to the area or population of the countries of the world, a man is as likely to be an inhabitant of Great Britain as of France, there being no reason to prefer one alternative to the other.

"He is also as likely to be an inhabitant of Ireland as of France. And on the same principle he is as likely to be an inhabitant of the British Isles as of France. And yet these conclusions are plainly inconsistent. For our first two propositions together yield the conclusion that he is twice as likely to be an inhabitant of the British Isles as of France. Unless we argue, as I do not think we can, that the knowledge that the British Isles are composed of Great Britain and Ireland is a ground for supposing that a man is more likely to inhabit them than France, there is no way out of the contradiction. It is not plausible to maintain, when we are considering the relative populations of different areas, that the number of names of subdivisions which are within our knowledge, is, in the absence of any evidence as to their size, a piece of relevant evidence.

"At any rate, many other similar examples could be invented, which would require a special explanation in each case; for the above is an instance of a perfectly general difficulty. The possible alternatives may be a, b, c, and d, and there may be no means of discriminating between them; but equally there may be no means of discriminating between (a or b), c, and d" (14).

Modern probability texts avoid this difficulty by appeal to set theory. One must properly define sets before probabilities can be assigned.

Two points:
1. For most purposes, no one would gain knowledge by applying probability rankings to Keynes's scenario. However, that doesn't mean a situation will never arise in which it is worthwhile to apply probabilistic methods, though of course the vagueness of the sets makes probability estimates equally vague.

2. If we apply set theory, we are either using naive set theory, where assumptions are unstated, or axiomatic set theory, which rests on unprovable assertions. In the case of standard ZFC set theory, Goedel's incompleteness theorem means that the formalism is either incomplete or inconsistent. Further, ZFC cannot prove its own consistency, so the possibility of inconsistency cannot be ruled out from within the system.

Randomness4 arises when willful ignorance is imposed as a means of obtaining a classical form of probability, or of having insufficient reason to regard events as other than equiprobable.

Consider those exit polls that include late sampling, which are the only exit polls where it can be assumed that the sample set yields a quantity close to the ratio for the entire number of votes cast.

This is so, it is generally believed, because if the pollsters are doing their jobs properly, the pollster's selection of every nth person leaving a polling station screens out any tendency to select people who are "my kind."

In fact, the exit poll issue underscores an existential conundrum: suppose the exit poll ratio for candidate A is within some specified margin of error for a count of the entire vote. That is to say, with a fairly high number of ballots there are very likely to be occasional ballot-count errors, which, if random, will tend to cancel. But the level of confidence in the accuracy of the count may be only, say, 95%. If the exit poll has a low margin of error in its own tally -- perhaps the pollsters write down responses with 99% accuracy -- then one may find that the exit poll's accuracy is better than the accuracy of the entire ballot count.

A recount may only slightly improve the accuracy of the entire ballot count. Or it may not provably increase its accuracy at all, if the race is especially tight and the difference is within the margin of error for ballot counting.

A better idea might be to have several different exit polls conducted simultaneously with an average of results taken (the averaging might be weighted if some exit pollsters have a less reliable track record than others).
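For comparison purposes, the usual large-sample margin of error for a polled proportion can be sketched as follows; the sample size and proportion here are hypothetical, chosen only for illustration:

    from math import sqrt

    def margin_of_error(p_hat, n, z=1.96):
        # Approximate 95% margin of error for a sample proportion.
        return z * sqrt(p_hat * (1 - p_hat) / n)

    # Hypothetical exit poll: 2,000 respondents, 51% for candidate A.
    print(margin_of_error(0.51, 2000))   # roughly 0.022, i.e. about 2.2 points

Whether that sampling error turns out to be larger or smaller than the error in the official count is exactly the conundrum described above.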

So as we see -- even without the theorems of Kurt Goedel and Alan Turing and without appeal to quantum phenomena -- some statements may be undecidable. It may be impossible to get definitive proof that candidate A won the election, though in most cases recounts, and enough attention to sources of bias, would possibly alter the error probabilities drastically. But even then, one can't be certain that in a very tight race the outcome wasn't rigged.

It is important to understand that when randomness4 is deployed, the assumption is that influences that would favor a bias tend to cancel out. In the case of an exit poll, it is assumed that voters tend to arrive at and leave the polling station randomly (or at least pseudorandomly). The minor forces affecting their order of exiting tend to cancel, it is believed, permitting confidence in a sample based on every nth voter's disclosure of her vote.

In another important development in probability thinking, Karl Popper in the mid-20th century proposed the propensity idea as a means of overcoming the issue of "subjectivity," especially in the arena of quantum mechanics. This idea says that physical systems have elementary propensities, or tendencies, to yield a particular outcome for some property. In his thinking, propensity is no more "occult" a notion than the notion of force. The propensity can be deduced because it is an a priori (a term Popper disdains) probability that is fundamental to the system. The propensity is as elementary a property as is the spin of an electron; it can't be further reduced or described in terms of undetected vibrations (though he didn't quite say that).

The propensity probability can be approximated via repeated trials, but applies immediately on the first trial. By this, Popper avoids the area of hidden variables and in effect quantizes probability, though he doesn't admit to having done so. What he meant to do was minimize quantum weirdness so as to save appearances -- that is, "external" reality.

Popper wasn't always clear as to what he meant by realism, but it is safe to assume he wanted the laws of physics to hold whether or not he was sleeping. Even so, he was forced to concede that it might be necessary to put up with David Bohm's interpretation of quantum behaviors, which purports to save realism only by sacrificing locality and accepting "spooky action at a distance."

The notion of propensity may sometimes merge with standard ideas of statistical inference. Consider this passage from J.D. Stranathan's history of experimental physics:

"In the case of the Abraham theory the difference between the observed and calculated values of m/mo are all positive; and these differences grow rather large at the higher velocities. On the Lorentz theory the differences are about as often positive as negative; the sum of the positive errors is nearly equal to the sum of negative ones. Furthermore, there is no indication that the error increases at the higher velocities. These facts indicate that the errors are not inherent in the theory; the Lorentz theory describes accurately the observed variation" (15).

Hendrik A. Lorentz was first to propose relativistic mass change for the electron only, an idea generalized by Einstein to apply to any sort of mass. Max Abraham clung to a non-relativistic ether theory.


Go to Chapter 3 HERE.
