Chapter 5

The empirico-inductive concept
More on induction
More on causality
Footnotes
https://manyworlds784.blogspot.com/p/footnotes.html

The empirico-inductive concept
On induction and Bayesian inference, John Maynard Keynes wrote:
To take an example, Pure Induction can be usefully employed to strengthen an argument if, after a certain number of instances have been examined, we have, from some other source, a finite probability in favour of the generalisation, and, assuming the generalisation is false, a finite uncertainty as to its conclusion being satisfied by the next hitherto unexamined instance which satisfies its premise.
He goes on to say that pure induction "can be used to support the generalisation that the sun will rise every morning for the next million years, provided that with the experience we have actually had there are finite probabilities, however small, derived from some other source, first, in favour of the generalisation, and, second, in favour of the sun's not rising to-morrow assuming the generalisation to be false," adding: "Given these finite probabilities, obtained otherwise, however small, then the probability can be strengthened and can tend to increase towards certainty by the mere multiplication of instances provided that these instances are so far distinct that they are not inferable one from another" (35).

Keynes's book is highly critical of a theory of scientific induction presented by a fellow economist, William Stanley Jevons, that uses Laplace's rule of succession as its basis. Keynes argues here that it is legitimate to update probabilities as new information arrives, but that -- paraphrasing him in our terms -- the propensity information and its attendant initial probabilities cannot be zero. In the last sentence of the quotation above, Keynes is giving the condition of independence, a condition that is often taken for granted but that rests on assumptions about randomness that we will examine further as we proceed. Related to that is the assumption that two events have enough in common to be considered identical. This assumption, however, must either be accepted as a primitive or be based on concepts used by physicists; we will question this viewpoint later in our discussion. Other Bayesian probabilists, such as Harold Jeffreys, agree with Keynes on this.

One might say that the empirico-inductive approach, at its most unvarnished, assumes a zero or near-zero information value for the system's propensity. But such an approach yields experimental information that can then be used in probability calculations. A Bayesian algorithm for updating probabilities based on new information -- such as Laplace's rule of succession -- might or might not be gingerly accepted for a specific case, such as one of those posed by J. Richard Gott III.

Article on Gott
http://en.wikipedia.org/wiki/J._Richard_Gott

It depends on whether one accepts near-zero information for the propensity. How does one infer a degree of dependence without any knowledge of the propensity? If it is done with the rule of succession, we should be skeptical. In the case of sun risings, for all we know, the solar system -- remember, we are assuming essentially no propensity information -- is chaotic and only happens to be going through an interval of periodicity or quasi-periodicity (when trajectories cycle toward limit points -- as in attractors -- ever more closely). Maybe tomorrow a total eclipse will occur as the moon shifts relative position, or some large body arrives between the earth and sun. This seems preposterous, but only because in fact we are aware that the propensity information is non-zero; that is, we know something about gravity and have observed quite a lot about solar system dynamics.
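Laplace's rule of succession, mentioned above, is easy to state: after s successes in n trials, and assuming a uniform prior over the unknown propensity, assign probability (s + 1)/(n + 2) to success on the next trial. A minimal sketch (the function name and the sample counts are mine, for illustration):

```python
# Laplace's rule of succession: after s successes in n independent trials,
# and assuming a uniform prior over the unknown propensity, the probability
# that the next trial succeeds is (s + 1) / (n + 2).

from fractions import Fraction

def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Posterior predictive probability of success on the next trial."""
    return Fraction(successes + 1, trials + 2)

# If the sun has risen on every one of n observed mornings:
for n in (10, 1000, 1_000_000):
    print(n, float(rule_of_succession(n, n)))
```

Note that with no observations at all the rule returns 1/2, which is exactly the sort of "probability from utter ignorance" that Keynes objects to.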

But Gott argues that we are entitled to employ the Copernican principle in some low-information scenarios. The "information" in this principle says that there is no preferred orientation in space and time for human beings. With that in mind, the "doomsday argument" follows, whereby the human race is "most likely" to be about halfway through its course of existence, taking into account the current world population. We note that the doomsday argument has commonalities with Pascal's wager on the existence of God. That is, Pascal assigned a probability of 1/2 each to existence and to non-existence based on the presumption of utter ignorance. Yet, how is this probability arrived at? There are no known frequencies available. Even if we use a uniform continuous distribution from 0 to 1, the prior isn't necessarily found there, system information being ruled out and expert opinion unwelcome. That is, how do we know that an initial probability exists at all?

The doomsday argument
http://en.wikipedia.org/wiki/Doomsday_argument

With respect to the doomsday scenario, there are frequencies, in terms of average lifespan of an arbitrary species (about a million years), but these are not taken into account. The fact is that the doomsday scenario's system "information" that we occupy no privileged spacetime position is, being a principle, an assumption taken as an article of faith. If we accept that article, then we may say that we are less likely to be living near the beginning or ending of the timeline of our species, just as when one arrives at a bus stop without a schedule, one expects that one probably won't have to wait the maximum possible time interval, or something near the minimum, between buses. Similarly, we hazard this guess based on the idea that there is a low level of interaction between one's brain and the bus's appearance, or based on the idea that some greater power isn't "behind the scenes" controlling the appearance of buses. If one says such ideas are patently absurd, read on, especially the sections on noumena (Part VI, see sidebar).

Another point: an inference in the next-bus scenario is that we could actually conduct an experiment. Suppose Bus 8102 is scheduled to arrive at Main and 7th streets at 10 minutes after the hour, and a set of random integers in [0,60] is printed out. The experimenter takes as his "arrival time" the randomly selected number of minutes past the hour; these numbers are matched against when the 8102 actually arrives, the results compiled and the arithmetic mean taken. Over a sufficient number of tests, the law of large numbers suggests that the average waiting time is a half hour. Because we know already that we can anticipate that result, we don't find it necessary to actually run the trials. So, is that Bayesianism, or idealistic frequentism using imaginary trials?
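The imagined trials can be sketched in a few lines of code; the schedule detail (10 past the hour) follows the scenario above, while the trial count and seed are arbitrary:

```python
# Monte Carlo version of the bus-stop experiment. The bus arrives at 10
# minutes past each hour; the rider's "arrival time" is drawn uniformly
# from [0, 60). By the law of large numbers the mean wait until the next
# bus should settle near 30 minutes.

import random

def mean_wait(trials: int = 100_000, bus_minute: int = 10, seed: int = 1) -> float:
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        arrival = rng.uniform(0, 60)
        total += (bus_minute - arrival) % 60  # minutes until the next bus
    return total / trials

print(round(mean_wait(), 1))  # settles near 30.0
```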

In the case of the plausibility that we are about halfway through the lifespan of our species, it is hard to imagine even a fictional frequency scenario. Suppose we have somehow managed to obtain the entire population of Homo sapiens sapiens who have ever lived or who ever will live. From that finite set, a person is chosen randomly, or as close to randomly as we can get. What is the probability our choice will come from an early period in species history? There is no difference between that probability and the probability our choice came from about the halfway mark. Of course, we hasten to concede that such reasoning doesn't yield much actionable knowledge. What can anyone do with a probability-based assessment that the human species will be extinct in X number of years, if X exceeds one's anticipated lifetime?

Concerning imaginary trials, E.T. Jaynes (36), a physicist and Bayesian crusader, chided standard statistics practitioners for extolling objectivity while using fictional frequencies and trials. A case in point, I suggest, is the probability in a coin toss experiment that the first head will show up on an odd-numbered flip. The probability is obtained by summing all possibilities to infinity, giving an infinite series limit of 2/3. That is, Σ a_k = Σ (1/2)^(2k-1) = 1/2 + 1/8 + 1/32 + ... = 2/3 as k goes to infinity. That probability isn't reached after an infinitude of tosses, however. It applies immediately. And one would expect that a series of experiments would tend toward the 2/3 limit. However, such a set of experiments is rarely done. The sum is found by use of the plus sign to imply the logical relation "exclusive or." The idea is that experiments with coins have been done, and so independence has been well enough established to permit us to make such a calculation without actually doing experiments to see whether the law of large numbers will validate the 2/3 result. That is, we say that the 2/3 result logically follows if the basic concept of independence has been established for coin tossing in general.
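The partial sums, and the rarely run experiment itself, can both be checked by machine (trial count and seed are arbitrary):

```python
# Partial sums of 1/2 + 1/8 + 1/32 + ... = sum of (1/2)^(2k-1), which
# converges to 2/3: the chance that the first head lands on an odd flip.
partial = sum(0.5 ** (2 * k - 1) for k in range(1, 40))
print(partial)  # ~0.6666...

# The "rarely done" set of experiments, simulated:
import random
rng = random.Random(7)

def first_head_is_odd() -> bool:
    flip = 1
    while rng.random() >= 0.5:  # tails: flip again
        flip += 1
    return flip % 2 == 1

trials = 100_000
freq = sum(first_head_is_odd() for _ in range(trials)) / trials
print(freq)  # hovers near 2/3
```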

Jaynes was an ardent backer of a "common sense" view of probability theory, which he tried to establish using a symbolic logic. [See Appendix B: Jaynes and 'common sense.']  

Concerning the doomsday argument, we note that in the last few hundred years the human population has been increasing exponentially. Prior to that, however, its numbers went up and down in accord with Malthusian population dynamics, as indicated by logistic-type differential equations, variants of which set a threshold below which a population almost certainly goes extinct. That harsh Malthusian reality becomes ever more likely as the global population pushes the limits of sustainability. So this tends to rule out some normally distributed population over time -- whereby the current population is near the middle and low population tails are found in past and future -- simply because the population at some point may well fall from its current exponential growth curve back to a jaggedly charted curve reminiscent of stock market charts.
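The threshold idea can be illustrated with a toy model. Note that the plain logistic equation has no extinction threshold; the variant below adds a critical level A (an Allee-type term), and all parameter values are invented for illustration:

```python
# Toy population model with an extinction threshold A and carrying
# capacity K:  dP/dt = r * P * (P/A - 1) * (1 - P/K).
# Below A the population decays toward zero; above A it approaches K.

def simulate(p0: float, r=0.1, A=50.0, K=1000.0, dt=0.01, steps=20_000) -> float:
    p = p0
    for _ in range(steps):
        p += dt * r * p * (p / A - 1) * (1 - p / K)
        p = max(p, 0.0)  # population cannot go negative
    return p

print(round(simulate(40.0)))  # starts below A: collapses toward 0
print(round(simulate(60.0)))  # starts above A: climbs toward K
```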

The bus stop scenario can serve to illustrate our expectation that random events tend to cancel each other out on either side of a mean. That is, we expect the randomly chosen numbers below the mean of 30 to show low correlation with the randomly chosen numbers above the mean, but we nevertheless think that their average will be 30. This is tantamount to shifting to the interval [-30,30], where 0 represents the median 30 of the interval [0,60]; we suspect that if we add all the shifted numbers, the sum will be close to zero. Why do "randoms" tend to cancel? Elsewhere, we look into "hidden variables" views.

Of course, our various conceptions of randomness and our belief in use of such randomness as a basis for predictions would be greatly undermined by even one strong counterexample, such as an individual with a strong "gift" of telekinesis who is able to influence the computer's number selection. At this point, I am definitely not proposing that such a person exists. However, if one did come to the attention of those whose mooring posts are the tenets of probability theory and inferential statistics, that person (or the researcher reporting on the matter) would come under withering fire because so many people have a strong vested emotional interest in the assumptions of probability theory. They would worry that the researcher is a dupe of the Creationists and Intelligent Design people. However, let us not tarry too long here.

More on induction
Jevons saw the scientific method as based on induction and used Laplace's rule of succession as a "proof" of scientific induction. Yet neither he nor Pearson, who also cited it favorably, included a mathematical explanation of Laplace's rule, unlike Keynes, who analyzed Laplace's rule and also offered an amended form of it. Jevons, Pearson and Keynes all favored forms of Bayesian reasoning, often called "the inverse method."

Jevons and probability
http://plato.stanford.edu/entries/william-jevons/

On causality and probability, Jevons wrote: "If an event can be produced by any one of a certain number of different causes, the probabilities of the existence of these causes as inferred from the event, are proportional to the probabilities of the event as derived from these causes," adding: "In other words, the most probable cause of an event which has happened is that which would most probably lead to the event supposing the cause to exist; but all other possible causes are also taken into account with probabilities proportional to the probability that the event would have happened if the cause existed" (37).

Jevons then uses standard ball and urn conditional probability examples.
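A typical example of the kind Jevons has in mind, sketched here with invented urn contents:

```python
# Two urns: A holds 3 white and 1 black ball; B holds 1 white and 3 black.
# An urn is chosen at random and a white ball is drawn. The posterior
# "probability of the cause" urn A is proportional to the probability of
# the event (a white draw) as derived from that cause.

from fractions import Fraction

prior = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
likelihood_white = {"A": Fraction(3, 4), "B": Fraction(1, 4)}

# Bayes's theorem: posterior proportional to prior times likelihood.
evidence = sum(prior[u] * likelihood_white[u] for u in prior)
posterior = {u: prior[u] * likelihood_white[u] / evidence for u in prior}
print(posterior["A"], posterior["B"])  # 3/4 and 1/4
```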

A point of dispute here is the word "cause." In fact, in the urn and ball case, we might consider it loose usage to say that it is most probable that urn A is the cause of the outcome.

Still, it is fair to say that urn A's internal composition is the most probable relevant predecessor to the outcome. "Causes" are the hidden sub-vectors of the net force vector, which reaches the quantum level, where causation is a problematic idea.

The problem of causation deeply occupied both Pearson and Fisher, who were more favorably disposed to the concept of correlation as opposed to causation. We can see here that their areas of expertise would tend to promote such a view; that is to say that they, not being physicists, would tend to favor positing low propensity information. Philosophically, they were closer to the urn of nature concept of probability than they might have cared to admit.

And that brings us back to the point that a probability method is a tool for guidance in decision-making or possibly in apprehending truth, though this second item is where much ambiguity arises.

One must fill in the blanks for a particular situation. One must use logical reasoning, and perhaps statistical methods, to go from mere correlation to causation, with the understanding that the problem of cause and effect is a notorious philosophical conundrum.

Cause-effect is in many respects a perceptual affair. If one steps "outside" the spacetime block (see section on spacetime), where is cause-effect?

Also, consider the driver who operates his vehicle while under the influence of alcohol and becomes involved in an auto accident. He is held to be negligent as if by some act of will, whereby his decision to drink is said to have "caused" the accident. First, if a person's free will is illusory, as seems to be at least partly true if not altogether true, then how do we say his decision caused anything? Second, some might term the decision to drink and drive the "proximate" cause of the accident. However, there are many other influences (causes?) that sum to the larger "cause." The interlinked rows of dominos started falling sometime long ago -- if one thinks in a purely mechanistic computer-like model. How does one separate out causes? We address this issue from various perspectives as we go along.

Brain studies tend to confirm the point that much that passes for free will is illusory. And yet, this very fact seems to argue in favor of a need for a core "animating spirit," or amaterial entity: something that is deeper than the world of phenomena that includes somatic functions; such a liberated pilot spirit would, it seems to me, require a higher order spirit to bring about such a liberation. I use the word "spirit" in the sense of amaterial unknown entity and do not propose a mystical or religious definition; however, the fact that the concept has held for centuries suggests that many have come intuitively to the conclusion that "something is in there."

I realize that here I have indulged in "non-scientific speculation" but I argue that "computer logic" leads us to this means of answering paradoxes. That is, we have a Goedelian argument that points to a "higher frame." But in computer logic, the frames "go all the way up" to infinity. We need something outside, or that is greater than and fundamentally different from, the spacetime block with which to have a bond.

Pearson in The Grammar of Science (38) makes the point that a randomized sequence means that we cannot infer anything from the pattern. But, if we detect a pattern, we can then write an algorithm for its continuation. So we can think of the program as the cause, and it may or may not give a probability of 1 as to which number appears at future step n.

I extend Pearson's idea here; he says the analogy should not be pressed too far but I think it makes a very strong point; and we can see that once we have an algorithm, we have basic system information. The longer the recognizable sequence of numbers, the higher the probability we assign it for non-randomness; see my discussion on that:

A note on periodicity and probability
http://kryptograff5.blogspot.com/2013/08/draft-1-please-let-me-know-of-errors.html

Now when we have what we suspect is a pattern, but have no certain algorithm, then we may find ways to assign probabilities to various conjectures as to the underlying algorithm.

A scientific theory serves as a provisional algorithm: plug in the input values and obtain predictable results (within tolerances).

If we see someone write the series

1,2,4,8,16,32

we infer that he is doubling the previous integer each time. There is no reason to say that this is definitely the case (for the moment disregarding what we know of human learning and psychology), but with a "high degree of probability," we expect the next number to be 64.

How does one calculate this probability?

The fact that the series climbs monotonically would seem to provide a floor probability at any rate, so that a nonparametric test would give us a useful value. Even so, what we have is a continuum. Correlation corresponds to moderate probability that A will follow B, causation to high probability of same. In modern times, we generally expect something like 99.99% probability to permit us to use the term "cause." But even here, we must be ready to scrap our assumption of direct causation if a better theory strongly suggests that "A causes B" is too much of a simplification.
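One such floor can be computed directly. Under the null hypothesis that the six values are in random order, every ordering of six distinct numbers is equally likely, so a strictly increasing run has probability 1/6!; the doubling pattern itself is of course far more specific than mere monotonicity:

```python
# Probability that n distinct, exchangeable values happen to appear in
# strictly increasing order: one ordering out of n! equally likely ones.

from math import factorial

def p_increasing(n: int) -> float:
    return 1 / factorial(n)

print(p_increasing(6))  # 1/720, about 0.0014
```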

For example, a prosecutor may have an apparently air-tight case against a husband in a wife's murder, but one can't completely rule out a scenario whereby CIA assassins arrived by black helicopter and did the woman in for some obscure reason. One may say that the most probable explanation is that the husband did it, but full certainty is rare, if it exists at all, in the world of material phenomena.

And of course the issue of causation is complicated by issues in general relativity -- though some argue that these can be adequately addressed -- and quantum mechanics, where the problem of causation becomes enigmatic.

Popper argued that in "physics the use of the expression 'causal explanation' is restricted as a rule to the special case in which universal laws have the form of laws of 'action by contact'; or more precisely, of 'action at a vanishing distance' expressed by differential equations" (39) [Popper's emphasis].

The "principle of causality," he says, is the assertion that any event that is amenable to explanation can be deductively explained. He says that such a principle, in the "synthetic" sense, is not falsifiable. So he takes a neutral attitude with respect to this point. This relates to our assertion that theoretic systems have mathematical relations that can be viewed as cause and effect relations.

Popper sees causality in terms of universal individual concepts, an echo of what I mean by sets of primitives. Taking up Popper's discussion of "dogginess," I would say that one approach is to consider the ideal dog as an abstraction of many dogs that have been identified, whereby that ideal can be represented by a matrix with n unique entries. Whether a particular object or property qualifies as being associated with a dogginess matrix depends on whether that object's or property's matrix is sufficiently close to the agreed universal dogginess matrix. In fact, I posit that perception of "reality" works in part according to such a system, which has something in common with the neural networks of computing fame.

(These days, of course, the ideal dog matrix can be made to correspond to the DNA sequences common to all canines.)
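As a loose illustration of the matrix idea (not the author's formalism; the feature names, template values and threshold below are all invented), template matching might look like this:

```python
# Match an observed feature vector against a stored "dogginess" template
# by cosine similarity, a crude stand-in for the matrix comparison
# described in the text.

def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

# Hypothetical template compiled from many observed dogs
# (features, in order: fur, tail-wag, bark, meow).
DOG_TEMPLATE = [0.9, 0.8, 0.95, 0.1]

def looks_like_a_dog(observed, threshold: float = 0.95) -> bool:
    return cosine_similarity(observed, DOG_TEMPLATE) >= threshold

print(looks_like_a_dog([0.85, 0.9, 0.9, 0.05]))  # dog-like percept: True
print(looks_like_a_dog([0.2, 0.1, 0.0, 0.95]))   # cat-like percept: False
```

New encounters would then update DOG_TEMPLATE itself, which is the learning process described below.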

But, in the case of direct perception, how does the mind/brain know what the template, or matrix ideal, of a dog is? Clearly, the dogginess matrix is compiled from experience, with new instances of dogs checked against the previous matrix, which may well then be updated.

A person's "ideal dog matrix" is built up over time, of course, as the brain integrates various percepts. Once such a matrix has become "hardened," a person may find it virtually impossible to ignore that matrix and discover a new pattern. We see this tendency especially with respect to cultural stereotypes.

Still, in the learning process, a new encounter with a dog or representation of a dog may yield only a provisional change in the dogginess matrix. Even if we take into account all the subtle clues of doggy behavior, we nevertheless may be relating to something vital, in the sense of nonphysical or immaterial, that conveys something about dogginess that cannot be measured. On the other hand, if one looks at a still or video photograph of a dog, nearly everyone, other than perhaps an autistic person or a tribesman unaccustomed to photos, agrees that he has seen a dog. And photos are these days nothing but digital representations of binary strings that the brain interprets in a digital manner, just as though it is using a neural matrix template.

Nevertheless, that last point does not rule out the possibility that, when a live dog is present, we relate to a "something" within, or behind, consciousness that is nonphysical (in the usual sense). That is, the argument that consciousness is an epiphenomenon of the phenomenal world cannot rule out something "deeper" associated with a noumenal world.

The concept of intuition must also be considered when talking of the empirico-inductive method. (See discussion on types of intuition in the "Noumenal world" section of Part VI; link in sidebar.)

More on causality
David Hume's argument that one cannot prove an airtight relation between cause and effect in the natural world is to me self-evident. In his words:

"Matters of fact, which are the second objects of human reason, are not ascertained in the same manner [as are mathematical proofs]; nor is our evidence of their truth, however great, of a like nature with the foregoing. The contrary of every matter of fact is still possible, because it can never imply a contradiction, and is conceived by the mind with the same facility and distinctness, as if ever so conformable to reality. That the sun will not rise tomorrow is no less intelligible a proposition, and implies no more contradiction, than the affirmation, that it will rise. We should in vain, therefore, attempt to demonstrate its falsehood. Were it demonstratively false, it would imply a contradiction, and could never be distinctly conceived by the mind..."

To summarize Hume:

I see the sun rise and form the habit of expecting the sun to rise every morning. I refine this expectation into the judgment that "the sun rises every morning."

This judgment cannot be a truth of logic because it is conceivable that the sun might not rise. This judgment cannot be conclusively established empirically because one cannot observe future risings or not-risings of the sun.

Hence, I have no rational grounds for my belief, but custom tells me that its truthfulness is probable. Custom is the great guide of life.

We see immediately that the scientific use of the inductive method itself rests on the use of frequency ratios, which themselves rest on unprovable assumptions. Hence a cloud is cast over the whole notion of causality.

This point is made by Volodya Vovk: "... any attempt to base probability theory on frequency immediately encounters the usual vicious cycle. For example, the frequentist interpretation of an assertion such as Pr(E) = 0.6 is something like: we can be practically (say, 99.9%) certain that in 100,000 trials the relative frequency of success will be within 0.02 of 0.6. But how do we interpret 99.9%? If again using frequency interpretation, we have infinite regress: probability is interpreted in terms of frequency and probability, the latter probability is interpreted in terms of frequency and probability, etc" (40).
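Vovk's 99.9% figure is easy to check by simulation (trial counts and seed are arbitrary), though, as he notes, the check only produces another frequency in need of interpretation:

```python
# In what fraction of repeated experiments does the relative frequency
# over 100,000 Bernoulli(0.6) trials land within 0.02 of 0.6? Practically
# always -- but that meta-frequency is itself just another frequency,
# which is exactly the regress Vovk describes.

import random

rng = random.Random(42)

def within_band(n: int = 100_000, p: float = 0.6, band: float = 0.02) -> bool:
    successes = sum(rng.random() < p for _ in range(n))
    return abs(successes / n - p) <= band

experiments = 50
hits = sum(within_band() for _ in range(experiments))
print(hits / experiments)  # essentially 1.0
```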

Infinite regress and truth

We have arrived at a statement of a probability assignment, as in: Statement S: "The probability of Proposition X being true is y."

We then have:

Statement T: "The probability that Statement S is true is z."

Now what is the probability of T being true? And we can keep doing this ad infinitum.

Is this in fact conditional probability? Not in the standard sense, though I suppose we could argue for that also.

Statement S is arrived at in a less reliable manner than Statement T, presumably, so that such a secondary statement can perhaps be justified.

This shows that, at some point, we simply have to take some ancillary statement on faith.

Pascal's wager and truth

"From nothing, nothing" is what one critic has to say about assigning equal probabilities in the face of complete ignorance.

Let's take the case of an observer who claims that she has no idea of the truthfulness or falseness of the statement "God exists."

As far as she is concerned, the statement and its negation are equally likely. Yes, it may be academic to assign a probability of 1/2 to each statement. And many will object that there are no relevant frequencies; there is no way to check numerous universes to see how many have a supreme deity.

And yet, we do have a population (or sample space), that being the set of two statements {p, ~p}. Absent any other knowledge, it may seem pointless to talk of a probability. Yet, if one is convinced that one is utterly ignorant, one can still take actions:
1. Flip a coin and, depending on the result, act as though God exists, or act as though God does not exist.
2. Decide that a consequence of being wrong about existence of a Supreme Being is so great that there is nothing to lose and a lot to gain to act as though God exists (Pascal's solution).
3. Seek more evidence, so as to bring one closer to certainty as to whether p or ~p holds.
In fact, the whole point of truth estimates is to empower individuals to make profitable decisions. So when we have a set of equiprobable outcomes, this measures our maximum ignorance. It is not always relevant whether a large number of trials has established this set and its associated ratios.
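Pascal's option 2 is, in modern terms, an expected-value argument. The payoff numbers below are invented; the point is only that a large enough asymmetry in payoffs swamps any nonzero probability:

```python
# Expected value of each course of action under the "utter ignorance"
# assignment p = 1/2. Payoffs are purely illustrative.

def expected_value(p_exists: float, payoff_if_exists: float, payoff_if_not: float) -> float:
    return p_exists * payoff_if_exists + (1 - p_exists) * payoff_if_not

p = 0.5
wager_for = expected_value(p, 10**9, -1)        # huge gain if right, small cost if wrong
wager_against = expected_value(p, -(10**9), 0)  # huge loss if wrong, nothing gained if right
print(wager_for > wager_against)  # True under these assumed payoffs
```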

That is, one can agree that the use of mathematical quantifications in a situation such as Pascal's wager is pointless, and yields no real knowledge. But that fact doesn't mean one cannot use a form of "probabilistic reasoning" to help with a decision. Whether such reasoning is fundamentally wise is another question altogether, as will become apparent in the sections on noumena (Part VI, see sidebar).

There have been attempts to cope with Hume's "problem of induction" and other challenges to doctrines of science. For example, Laplace addressed Hume's sun-rising conundrum with the "rule of succession," which is based on Bayes's theorem. Laplace's attempt, along with such scenarios as "the doomsday argument," may have merit as thought experiments, but cannot answer Hume's basic point: We gain notions of reality or realities by repetition of "similar" experiences; if we wish, we could use frequency ratios in this respect. But there is no formal way to test the truthfulness of a statement or representation of reality.

Go to Chapter 6 HERE.
