Chapter 3

Law of large numbers
Probability distributions
Footnotes
https://manyworlds784.blogspot.com/p/footnotes.html

Law of large numbers
In my view, a basic idea behind the "law of large numbers" is that minor influences tend to cancel one another out as the number of trials grows without bound. We might consider these influences to be small force vectors that can have a butterfly effect on which path is taken. If we map these small vectors onto a sine wave graph, we can see heuristically how the little bumps above the axis tend to be canceled by the little bumps below the axis, in a partially destructive interference. We can also see how small forces so mapped occasionally superpose constructively, where, if the amplitude is sufficient, a "tipping point" is reached and the coin falls head side up.
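
As a rough illustration of this picture -- my own sketch, not anything from the cited sources -- here is a short Python simulation in which each coin toss is decided by the sum of many tiny force vectors; the running frequency of heads drifts toward 1/2 as tosses accumulate.

import random

def toss(n_influences=200):
    # Sum many tiny, roughly offsetting "force vectors"; the sign of the
    # total decides which way the coin tips.
    total = sum(random.uniform(-1, 1) for _ in range(n_influences))
    return 1 if total > 0 else 0  # 1 = heads, 0 = tails

def running_frequency(n_tosses=20000):
    heads = 0
    for i in range(1, n_tosses + 1):
        heads += toss()
        if i in (10, 100, 1000, 10000, 20000):
            print(f"after {i:>6} tosses: frequency of heads = {heads / i:.4f}")

if __name__ == "__main__":
    random.seed(1)
    running_frequency()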

In fact, two forms -- the weak and the strong -- of this law have been elucidated. This distinction however doesn't address the fundamental issues that have been raised.

On the law of large numbers
http://www.mhhe.com/engcs/electrical/papoulis/graphics/ppt/lectr13a.pdf

The strong law
http://mathworld.wolfram.com/StrongLawofLargeNumbers.html

The weak law
http://mathworld.wolfram.com/WeakLawofLargeNumbers.html

Keynes raises, from a logician's perspective, strong objections to the law of large numbers, though he considers them minor from the physicist's point of view. His solution is to eschew explaining reproducible regularities in terms of accumulations of accidents. "...for I do not assert the identity of the physical and the mathematical concept of probability at all; on the contrary, I deny it" (15).

This amounts to tossing out the logical objections to the "law," and accepting that "law" on an ad hoc or axiomatic basis. However, he makes an attempt at a formalistic resolution.

Neither is that law always valid, he says. "The rule that extreme probabilities have to be neglected ... agrees with the demand for scientific objectivity." That is, there is the "obvious objection" that even an enormous improbability always remains a probability, however small, and that consequently even the most "impossible" processes -- i.e., those which we propose to neglect -- will someday happen. And that someday could be today.

Keynes, to make his point, cites some extraordinarily improbable distributions of gas molecules in the Maxwell's demon thought experiment. "Even if a physicist happened to observe such a process, he would be quite unable to reproduce it, and therefore would never be able to decide what had really happened in this case, and whether he had made an observational mistake" (16).

Citing Arthur Eddington's statement to the effect that some things in nature are impossible, while other things don't happen because of their remote probability, Keynes says that he prefers to avoid non-testable assertions about whether extremely improbable things in fact occur. Yet, he observes that Eddington's assertion agrees well with how the physicist applies probability theory (17).

I note that if a probability is so remote as to be untestable via experiment, then, as E.T. Jaynes says, a frequentist model is not necessarily hard and fast. It can only be assumed that the probability assignments are adequate guides for some sort of decision-making. Testing is out of the question for extreme cases.

So, I suggest that Keynes here is saying that the scientific basis for probability theory is intuition.

A problem with Keynes's skepticism regarding highly improbable events is that without them, the notion of randomness loses some of its power.

The mathematics of chaos and catastrophe theories makes this clear. In the case of a "catastrophe" model of a continuously evolving dynamical system, sudden discrete jumps to a new state are inevitable, though it may not be so easy to say when such a transition will occur.

Concerning catastrophe theory
http://www.physics.drexel.edu/~bob/PHYS750_NLD/Catastrophe_theory.pdf

Nonlinear dynamics and chaos theory
http://www2.bren.ucsb.edu/~kendall/pubs_old/2001ELS.pdf

We must also beware of applying the urn of nature scenario. An urn holds some definite ratio of white to black balls. But a nonlinear dynamic system is problematic to model with an urn. Probabilities apply well to uniform -- which is to say, for practical purposes, periodic -- systems. One might possibly justify Laplace's rule of succession on this basis. However, quasi-periodic systems may well give a false sense of security, perhaps masking sudden jolts into atypical, possibly chaotic, behavior. Wasn't everyone carrying on as usual when, in 2004, a tsunami killed some 230,000 people in 14 countries bordering the Indian Ocean?

So we must be very cautious about how we use probabilities concerning emergence of high-information systems. Here is why: A sufficiently rich mix of chemical compounds may well form a negative feedback dynamical system. It would then be tempting to apply a normal probability distribution to such a system, and that distribution very well may yield reasonable results for a while. But, if the dynamical system is nonlinear -- which most are -- the system could reach a threshold, akin to a chaos point, at which it crosses over into a positive feedback system or into a substantially different negative feedback system.

The closer the system draws to that tipping point, the less the normal distribution applies. In chaotic systems, normal probabilities, if applicable, must be applied with great finesse. Hence to say that thus and such an outcome is highly improbable based on the previous state of the system is to misunderstand how nonlinearities can work. In other words, a Markov process (see below) is often inappropriate for predicting "highly improbable" events, though it may do as a good enough approximation in many nonlinear scenarios.
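
To make the feedback distinction concrete, here is a minimal Python sketch -- an illustration of the general point, with arbitrarily chosen parameters rather than any particular physical model. A linear system with damping (negative) feedback yields output that a normal distribution fits tolerably well; nudge the feedback coefficient past 1 and the same system runs away, so the previously fitted distribution wildly misjudges what can happen.

import random
import statistics

def simulate(feedback, steps=5000, noise=1.0, seed=0):
    # x[t+1] = feedback * x[t] + gaussian noise; |feedback| < 1 behaves like a
    # self-damping (negative feedback) system, feedback > 1 like runaway
    # positive feedback.
    rng = random.Random(seed)
    x, series = 0.0, []
    for _ in range(steps):
        x = feedback * x + rng.gauss(0, noise)
        series.append(x)
    return series

damped = simulate(0.5)
print("damped regime: mean %.2f, sd %.2f, largest |x| %.1f"
      % (statistics.mean(damped), statistics.stdev(damped), max(map(abs, damped))))

runaway = simulate(1.02, steps=500)
print("runaway regime: largest |x| after 500 steps = %.1f" % max(map(abs, runaway)))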

It is noteworthy that Keynes thought that the work of Pafnuty Chebyshev and Andrey Markov should replace Laplace's rule, implying that he thought a Markov process adequate for most probabilistic systems (18). Certainly he could not have known much of what came to be known as chaos theory and nonlinear dynamics.

Another issue is the fact that an emergent property may not be obvious until it emerges (echoes of David Bohm's "implicate order"). Consider the Moebius band. Locally, the surface is two-sided, such that a vector orthogonal to the surface has a mirror vector pointing in the opposite direction. Yet, at the global scale, the surface is one-sided, and a "mirror" vector in fact points out from the same surface as its partner does.

If a paper model of a Moebius strip were partially shown through a small window and an observer were asked if she thought the paper was two-sided, she would reply: "Of course." Yet, at least at a certain scale at which thickness is ignored, the paper strip has one side.

What we have in the case of catastrophe and chaos events is often called pseudorandomness, or effectively incalculable randomness. In the Moebius band case, we have a difficulty on the part of the observer of conceptualizing emergent properties, an effect also found in Ramsey order.

We can suggest the notion of unfoldment of information, thus: We have a relation R representing some algorithm.

Let us suppose an equivalence relation such that

(i) aRa (reflexivity).

(ii) aRb <--> bRa (symmetry).

(iii) aRb and bRc --> aRc (transitivity).

The redundancy, or structural information, is associated with R. So aRa corresponds to 0 Shannon information in the output. The reflexivity condition is part of the structural information for R, but in the reflexive case this redundancy tells us nothing new. The structural information is relevant in the latter two cases. In those cases, if we do not know the structure, or redundancy, in R, we say the information is enfolded. Once we have discovered some algorithm for R, we say the information has been revealed and is close to zero, though not quite zero, as we may not have advance knowledge concerning the variables.

Some would argue that what scientists mean by order is well summarized by an order relation R on a set A -- a relation that is transitive and antisymmetric but not, in general, reflexive. However, I have yet to convince myself on this point.
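
To keep the definitions straight, here is a small Python check -- purely my own illustration -- of reflexivity, symmetry, transitivity and antisymmetry for relations given as sets of ordered pairs.

def is_reflexive(rel, domain):
    return all((a, a) in rel for a in domain)

def is_symmetric(rel):
    return all((b, a) in rel for (a, b) in rel)

def is_antisymmetric(rel):
    return all(a == b for (a, b) in rel if (b, a) in rel)

def is_transitive(rel):
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)

# "Same remainder mod 3" on {0,...,5}: reflexive, symmetric, transitive.
domain = range(6)
equiv = {(a, b) for a in domain for b in domain if a % 3 == b % 3}
print(is_reflexive(equiv, domain), is_symmetric(equiv), is_transitive(equiv))

# "Strictly less than" on the same set: transitive and antisymmetric, not reflexive.
order = {(a, b) for a in domain for b in domain if a < b}
print(is_reflexive(order, domain), is_antisymmetric(order), is_transitive(order))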

Ramsey order

John Allen Paulos points out an important result of network theory that guarantees that some sort of order will emerge. Ramsey proved a "strange theorem," stating that if one has a sufficiently large set of geometric points and every pair of them is connected by either a red line or a green line (but not by both), then no matter how one paints the lines, there will always be a large subset of the original set with a special property. Either every pair of the subset's members will be connected by a red line or every pair of the subset's members will be connected by a green line.

"If, for example, you want to be certain of having at least three points all connected by red lines or at least three points all connected by green lines, you will need at least six points," says Paulos.

"For you to be certain that you will have four points, every pair of which is connected by a red line, or four points, every pair of which is connected by a green line, you will need 18 points, and for you to be certain that there will be five points with this property, you will need -- it's not known exactly -- between 43 and 55. With enough points, you will inevitably find unicolored islands of order as big as you want, no matter how you color the lines," he notes.

Paulos on emergent order
http://abcnews.go.com/Technology/WhosCounting/story?id=4357170&page=1
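
Paulos's six-point case can be verified by brute force. The following Python sketch -- my own check, not Paulos's -- confirms that every red/green coloring of the 15 edges among six points contains a one-color triangle, while five points admit a coloring with none.

from itertools import combinations, product

def has_mono_triangle(n, coloring):
    edges = list(combinations(range(n), 2))
    color = dict(zip(edges, coloring))
    return any(color[(a, b)] == color[(a, c)] == color[(b, c)]
               for a, b, c in combinations(range(n), 3))

# Every 2-coloring of the 15 edges among 6 points has a monochromatic triangle.
edges6 = list(combinations(range(6), 2))
print(all(has_mono_triangle(6, c) for c in product("RG", repeat=len(edges6))))  # True

# But 5 points admit a coloring with no monochromatic triangle:
# color the 5-cycle red and the diagonals green.
cycle = {(i, (i + 1) % 5) for i in range(5)}
coloring5 = ["R" if (a, b) in cycle or (b, a) in cycle else "G"
             for a, b in combinations(range(5), 2)]
print(has_mono_triangle(5, coloring5))  # False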

In other words, no matter what type or level of randomness is at work, "order" must emerge from such networking. Hence one might run across a counterintuitive subset and think its existence highly improbable -- that the subsystem can't have fallen together randomly. So again, we must beware the idea that "highly improbable" events are effectively nonexistent. Yes, if one is applying probabilities at a near-zero propensity, and using some Bayesian insufficient-reason rationale, then such an emergent event would be counted as virtually impossible. But, with more knowledge of the system dynamics, we must parse our probabilistic questions more finely.

On the other hand, intrinsic fundamental randomness is often considered doubtful except in the arena of quantum mechanics -- although quantum weirdness does indeed scale up into the "macro" world (see noumena sections in Part VI, link in sidebar). Keynes of course knew nothing about quantum issues at the time he wrote Treatise.

Kolmogorov used his axioms to try to avoid Keynesian difficulties concerning highly improbable events.

Kolmogorov's 1933 book (19) gives these two conditions:
A. One can be practically certain that if C is repeated a large number of times, the relative frequency of E will differ very little from the probability of E. [He axiomatizes the law of large numbers.]

B. If P(E) is very small, one can be practically certain that when C is carried out only once, the event E will not occur at all.

But if faint chances are ruled out beyond some limit, doesn't this cut to the heart of what randomness means?

Kolmogorov's 'Foundations' in English
http://www.mathematik.com/Kolmogorov/index.html

And, if as Keynes believed, randomness is not all that random, we lose the basic idea of independence of like events, and we bump into the issue of what is meant by a "regularity" (discussed elsewhere).

Statisticians of the 19th century, of course, brought the concept of regularity into relief. Their empirical methods disclosed various recurrent patterns, which then became fodder for the methods of statistical inference. In those years, scientists such as William Stanley Jevons began to introduce probabilistic methods. It has been argued that Jevons used probability in terms of determining whether events result from certain causes as opposed to simple coincidences, and via the method of least squares. The first approach, notes the Stanford Encyclopedia of Philosophy, "entails the application of the 'inverse method' in induction: if many observations suggest regularity, then it becomes highly improbable that these result from mere coincidence."

Encyclopedia entry on Jevons
http://plato.stanford.edu/entries/william-jevons/

Jevons also employed the method of least squares to try to detect regularities in price fluctuations, the encyclopedia says.

Statistical regularities, in my way of thinking, are a reflection of how the human mind organizes the perceived world, or world of phenomena. The brain is programmed to find regularities (patterns) and to rank them -- for the most part in an autonomic fashion -- as an empirico-inductivist-frequentist mechanism for coping.

Yet, don't statistical regularities imply an objective randomness, which in turn implies a reality larger than self? My take is that the concept of intrinsic randomness serves as an idealization: it answers our desire for a mathematical, formalistic representation of the phenomenal world and, in particular, our desire to predict properties of macro-states from the partial, or averaged, information we have of the micro-states -- as when the macro-state information of a threshold line for species extinction stands in for the not-very-accessible information about the many micro-states of survival and mortality of individuals.

Averaging however does not imply intrinsic randomness. On the other hand, the typical physical assumption that events are distinct and do not interact without recourse to known physical laws implies independence of events, which in turn implies effectively random influences. In my estimation, this sort of randomness is a corollary of the reductionist, isolationist and simplificationist method of typical science, an approach that can be highly productive, as when Claude Shannon ignored the philosophical ramifications of the meaning of information.

The noted probability theorist Mark Kac gives an interesting example of the relationship between a deterministic algorithm and randomness (35).

Consider the consecutive positive integers {1, 2, ..., n} -- say, with n = 10^4 -- and, corresponding to each integer m in this range, consider the number f(m) of m's distinct prime factors.

Hence, f(1) = 0, f(2) = f(3) = f(5) = 1, f(4) = f(2^2) = 1, f(6) = f(2*3) = 2, f(60) = f(2^2 * 3 * 5) = 3, and so forth.

Kac assigns these outputs to a histogram of the number of prime divisors, centering on ln(ln n) and suitably adjusting the size of the base interval. He obtains an excellent approximation -- which improves as n rises -- to the normal curve. The statistics of the number of prime factors are, Kac wrote, indistinguishable from the statistics of the sizes of peas or of displacements in Brownian motion. And yet the algorithm is fully deterministic, meaning from his perspective that there is neither chance nor randomness.
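
A rough numerical sketch of Kac's histogram, in Python; I am assuming the standard Erdos-Kac centering at ln(ln n) and spread sqrt(ln ln n), which appears to be the "suitable adjustment" meant above. At n = 10^4 the bell shape is only approximate, consistent with the remark that the fit improves as n rises.

import math
from collections import Counter

def distinct_prime_factors(m):
    # Count distinct prime divisors by trial division.
    count, d = 0, 2
    while d * d <= m:
        if m % d == 0:
            count += 1
            while m % d == 0:
                m //= d
        d += 1
    return count + (1 if m > 1 else 0)

n = 10**4
counts = Counter(distinct_prime_factors(m) for m in range(2, n + 1))
center = math.log(math.log(n))        # about 2.2 for n = 10^4
spread = math.sqrt(center)
print(f"centering ln ln n = {center:.2f}, spread sqrt(ln ln n) = {spread:.2f}")
for k in sorted(counts):
    print(f"f(m) = {k}: share {counts[k] / (n - 1):.3f} of the integers up to {n}")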

We note that in classical dynamical systems, there is also no intrinsic randomness, and that probabilities are purportedly determined by activities below the threshold of convenient observation. And yet the fact that prime factors follow the normal curve is remarkable and deserving of further attention. There should, one would think, be a relationship between this fact and Riemann's conjecture.

Interestingly, primes fall in the holes of the sieve of Eratosthenes, suggesting that they do not follow any algebraic formula (and indeed they do not, apart from special cases that have no bearing on the general run of algebraic formulas). Is it so surprising that primes occur "erratically" when one sees that they are "anti-algebraic"? In general, non-algebraic algorithms produce outputs that are difficult to pin down exactly for some future state. Hence, probabilistic methods are called for. In that case, one hopes that some probability distribution/density will fit well enough.

But, the fact that the normal curve is the correct distribution is noteworthy, as is the fact that the samples of prime factors follow the central limit theorem.

My take is that the primes do not fall in an equally probable pattern, a fact that is quite noticeable for low n. However, as n increases, the dependence tends to weaken. So at 10^4 the dependence among prime factors is barely detectable, making their occurrences effectively independent events. In other words, the deterministic linkages among primes tend to cancel or smear out, in a manner similar to sub-threshold physical variables tending to cancel.

In a discussion of Buffon's needle problem and Bertrand's paradox, Kac wishes to show that if probability theory is made sufficiently rigorous, the layperson's concerns about its underlying value can be answered. He believes that sufficient rigor will rid us of the "plague of paradoxes" entailed by the different possible answers to such puzzles.

However, 50 years after Kac's article, Bertrand's paradox still stimulates controversy. The problem is often thought to be resolved by specification of the proper method of setting up an experiment. That is, the conceptual probability is not divorced from the mechanics of an actual experiment, at least in this case.
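
The point about the mechanics of the experiment can be seen numerically. In this Monte Carlo sketch -- my own illustration -- three standard ways of drawing a "random chord" of a unit circle give three different probabilities (about 1/3, 1/2 and 1/4) that the chord is longer than the side, sqrt(3), of the inscribed equilateral triangle.

import math
import random

def chord_len_random_endpoints():
    # Method 1: two random points on the circumference.
    a, b = random.uniform(0, 2 * math.pi), random.uniform(0, 2 * math.pi)
    return 2 * abs(math.sin((a - b) / 2))

def chord_len_random_radius():
    # Method 2: random radius, random point on it; chord perpendicular there.
    d = random.uniform(0, 1)
    return 2 * math.sqrt(1 - d * d)

def chord_len_random_midpoint():
    # Method 3: chord midpoint chosen uniformly in the disk.
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return 2 * math.sqrt(1 - (x * x + y * y))

side = math.sqrt(3)
trials = 100000
for name, method in [("endpoints", chord_len_random_endpoints),
                     ("radius", chord_len_random_radius),
                     ("midpoint", chord_len_random_midpoint)]:
    hits = sum(method() > side for _ in range(trials))
    print(f"{name:9s}: P(chord > sqrt(3)) ~ {hits / trials:.3f}")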

And because actual Buffon needle trials can be used to arrive at acceptable values of pi, we have evidence that the usual method of computing the Buffon probabilities is correct, and further that the notion of equal probabilities is for this problem a valid assumption, though Kac argued that only a firm mathematical foundation would validate that assumption.
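
For the Buffon case, a minimal simulation along the usual lines (my own sketch): a needle of length l dropped on lines spaced d apart, with l <= d, crosses a line with probability 2l/(pi d), so the crossing frequency can be inverted to estimate pi. The angle is drawn using math.pi, so this is an illustration of the usual computation rather than an independent derivation.

import math
import random

def buffon_pi_estimate(trials=200000, needle=1.0, spacing=1.0, seed=42):
    rng = random.Random(seed)
    crossings = 0
    for _ in range(trials):
        center = rng.uniform(0, spacing / 2)   # distance from needle center to nearest line
        angle = rng.uniform(0, math.pi / 2)    # needle's angle to the lines
        if center <= (needle / 2) * math.sin(angle):
            crossings += 1
    # P(cross) = 2 * needle / (pi * spacing), so invert for pi.
    return 2 * needle * trials / (spacing * crossings)

print(buffon_pi_estimate())   # typically lands close to 3.14159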

A useful short discussion on Bertrand's paradox is found here:

Wikipedia article on Bertrand's paradox
https://en.wikipedia.org/wiki/Bertrand_paradox_%28probability%29

At any rate, whether the universe ( = both the phenomenal and noumenal worlds) follows the randomness assumptions above is open to debate. Note that the ancients had a law of gravity (though not articulated as such). Their empirical observations told them that an object falls to the ground if not supported by other objects. The frequency ratio was so high that any exception would have been regarded as supernatural. These inductive observations led to the algorithmic assessments of Galileo and Newton. These algorithmic representations are very successful, in limited cases, at prediction. These representations are deductive systems. Plug in the numbers, compute, and, in many cases, out come the predictive answers. And yet the highly successful systems of Newton and Einstein cannot be used, logically, as a means of excluding physical counterexamples. Induction supports the deductive systems, and cannot be dispensed with.

A statement such as "The next throw of a die showing 5 dots has a probability of 1/6" is somewhat inadequate because probabilities, says Popper, cannot be ascribed to a single occurrence of an event, but only to infinite sequences of occurrences (i.e., back to the law of large numbers). He says this because any bias in the die can only be logically ruled out by an infinity of trials (20). Contrast that with Weyl's belief (21) that symmetry can provide the basis for an expectation of zero bias (see Part IV) and with my suggestion that below a certain threshold, background vibrations may make no difference.

One can see the harbinger of the "law of large numbers" in the urn model's classical probability: If an urn contains, say, 3 white balls and 2 black, then our ability to predict the outcome is given by the numbers 3/5 and 2/5. But why is that so? Answer: it is assumed that if one conducts enough experiments, with replacement, guesses for white or black will asymptotically approach the ratios 3/5 and 2/5. Yet why do we consider that assumption reasonable in "real life" and without bothering with formalities? We are accepting the notion that the huge aggregate set of minor force vectors, or "causes," tends to be neutral. There are two things to say about this:
1. This sort of randomness excludes the operation of a God or superior being. At one time, the study of probabilities with respect to games of chance was frowned upon on grounds that it was blasphemous to ignore God's influence or to assume that that influence does not exist (Bernoulli was prudently circumspect on this issue). We understand that at this point, many react: "Aha! Now you are bringing in religion!" But the point here is that the conjecture that there is no divine influence is an article of faith among some scientifically minded persons. This idea of course gained tremendous momentum from Darwin's work.

2. Results of modern science profoundly challenge what might be called a "linear perspective" that permits "regularities" and the "cancelation of minor causes." As we show in the noumena sections (Part VI, see sidebar), strange results of both relativity theory and quantum mechanics make the concept of time very peculiar indeed, meaning that causality is stood on its head.

Keynes tells his readers that Siméon Denis Poisson brought forth the concept of the "law of large numbers" that had been used by Bernoulli and other early probabilists. "It is not clear how far Poisson's result [the law of large numbers as he extended it] is due to a priori reasoning, and how far it is a natural law based on experience; but it is represented as displaying a certain harmony between natural law and the a priori reasoning of probabilities."

The Belgian statistician Adolphe Quetelet, says Keynes, did a great deal to explain the use of statistical methods. Quetelet "belongs to the long line of brilliant writers, not yet extinct, who have prevented probability from becoming, in the scientific salon, perfectly respectable. There is still about it for scientists a smack of astrology, of alchemy" (21a). It is difficult to exorcise this suspicion because, in essence, the law of large numbers rests on an unprovable assumption, though one that tends to accord with experience.

This is not to say that various people have not proved the weak and strong forms once assumptions are granted, as in the case of Borel, who was a major contributor to measure theory, which he and others have used in their work on probability. Yet we do not accept that, because a measure-theoretic framework exists that encompasses probability ideas, it follows that the critical issues have gone away.

On Adolph Quetelet
http://mnstats.morris.umn.edu/introstat/history/w98/Quetelet.html

Keynes makes a good point about Poisson's apparent idea that if one does enough sampling and analysis, "regularities" will appear in various sets. However, notes Keynes, one should beware the idea that "because the statistics are numerous, the observed degree of frequency is therefore stable."

Keynes's insight can be appreciated with respect to iterative feedback functions. Those which tend to stability (where the iterations are finitely periodic) may be thought of in engineering terms as displaying negative feedback. Those that are chaotic (or pre-chaotic with spurts of instability followed by spurts of stability) are analogous to positive feedback systems. So, here we can see that if a "large" sample is drawn from a pre-chaotic system's spurt of stability, a wrong conclusion will be drawn about the system's regularity. And again we see that zero or near-zero propensity information, coupled with the assumption that samples represent the population (which is not to say that samples are not normally distributed), can yield results that are way off base.
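
A concrete, purely illustrative case in Python: the logistic map just below its period-3 window shows long "laminar" stretches that look almost perfectly periodic, broken by chaotic bursts. A large sample drawn entirely from one laminar stretch would badly misjudge the long-run behavior. The parameter value and thresholds here are my own choices for the demonstration.

def logistic_orbit(r, x0=0.5, steps=6000, burn_in=500):
    # Iterate x -> r*x*(1-x), discarding a burn-in transient.
    x = x0
    for _ in range(burn_in):
        x = r * x * (1 - x)
    orbit = []
    for _ in range(steps):
        x = r * x * (1 - x)
        orbit.append(x)
    return orbit

# Just below the tangent bifurcation at r = 1 + sqrt(8) ~ 3.8284, the orbit is
# intermittent: nearly period-3 "spurts of stability" broken by chaotic bursts.
orbit = logistic_orbit(3.8275)

# A step is "laminar" if the orbit almost repeats itself three iterations later.
laminar = [abs(orbit[i] - orbit[i - 3]) < 0.01 for i in range(3, len(orbit))]
print("share of steps in near-period-3 stretches:",
      round(sum(laminar) / len(laminar), 3))

# Length of the longest stable-looking stretch -- a sample drawn inside it
# would suggest a tidy period-3 "regularity."
longest = current = 0
for flag in laminar:
    current = current + 1 if flag else 0
    longest = max(longest, current)
print("longest stable-looking stretch:", longest, "steps")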

Probability distributions
If we don't have at hand a set of potential ratios, how does one find the probability of a probability? If we assume that the success-failure model is binomial, then of course we can apply the normal approximation. With a continuous distribution we don't get the probability of a probability, of course, though we would if we used the more precise binomial distribution with finite n. But we see that in practice the "correct" probability distribution is often arrived at inductively, after sufficient observations. The Poisson distribution is suited to rare events; the exponential distribution to radioactive decay. In the latter case, it might be argued that induction is joined by the deductive method associated with the rules of quantum mechanics.
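
For instance -- my own sketch, with an arbitrary choice of n and p -- the exact binomial probability of k successes in n trials can be set beside its normal approximation; the finite binomial assigns a definite probability to each outcome, while the continuous normal supplies only a density.

import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mean, sd):
    return math.exp(-((x - mean) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

n, p = 100, 0.5
mean, sd = n * p, math.sqrt(n * p * (1 - p))
for k in (40, 45, 50, 55, 60):
    exact = binomial_pmf(k, n, p)
    approx = normal_pdf(k, mean, sd)   # density times a bin width of 1
    print(f"k={k}: binomial {exact:.4f}  vs  normal approximation {approx:.4f}")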

Clearly, there is an infinitude of probability distributions. But in the physical world we tend to use a very few: among them, the uniform, the normal and the exponential. So a non-trivial question is: what is the distribution of these distributions, if any? That is, can one rationally assign a probability that a particular element of that set is reflective of reality? Some would argue that this is precisely the Bayesians' point. Their methods, they say, give the best ranking of initial probabilities, which, by implication, suggests the most suitable distribution.

R.A. Fisher devised the maximum likelihood method for determining the probability distribution that best fits the data, a method he saw as superior to the inverse methods of Bayesianism (see below). But, in Harold Jeffreys's view, the maximum likelihood is a measure of the sample alone; to make an inference concerning the whole class, we combine the likelihood with an assessment of prior belief using Bayes's theorem (22).

Jeffreys took maximum likelihood to be a variation of inverse probability with the assumption of uniform priors.
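
The contrast can be made concrete with a hypothetical coin example (my own sketch): for k heads in n tosses, the maximum likelihood estimate is k/n, while combining the same likelihood with a uniform Beta(1,1) prior via Bayes's theorem gives a Beta(k+1, n-k+1) posterior whose mean, (k+1)/(n+2), is Laplace's rule of succession.

def mle(k, n):
    # Maximum likelihood estimate of the success probability.
    return k / n

def posterior_mean_uniform_prior(k, n):
    # Posterior under a uniform prior is Beta(k+1, n-k+1); its mean is
    # Laplace's rule of succession.
    return (k + 1) / (n + 2)

for k, n in [(0, 3), (2, 3), (20, 30), (200, 300)]:
    print(f"{k}/{n} successes: MLE = {mle(k, n):.3f}, "
          f"posterior mean (uniform prior) = {posterior_mean_uniform_prior(k, n):.3f}")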

In Jae Myung on maximum likelihood
http://people.physics.anu.edu.au/~tas110/Teaching/Lectures/L3/Material/Myung03.pdf

For many sorts of data, there is the phenomenon known as Benford's law, in which leading-digit frequencies are distributed not uniformly but logarithmically. Not all data sets conform to this distribution. For example, if one takes data from plants that manufacture beer in liters and then converts those data to gallons, one wouldn't expect the distribution of digits to remain the same in both cases. True, but there is a surprise.

In 1996, Theodore Hill, upon offering a proof of Benford's law, said that if distributions are taken at random and random samples are taken from each of these distributions, the significant digit frequencies of the combined samples would converge to the logarithmic distribution, such that probabilities favor the lower digits in a base 10 system. Hill refers to this effect as "random samples from random distributions." As Julian Havil observed, "In a sense, Benford's Law is the distribution of distributions!" (23).

MathWorld: Benford's law
http://mathworld.wolfram.com/BenfordsLaw.html

Hill's derivation of Benford's law
http://www.gatsby.ucl.ac.uk/~turner/TeaTalks/BenfordsLaw/stat-der.pdf
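
Hill's "random samples from random distributions" effect can be sketched numerically. In the following Python illustration -- my own construction, with arbitrary choices for the underlying distributions -- samples pooled from many randomly scaled lognormal distributions show leading-digit frequencies close to log10(1 + 1/d).

import math
import random

def leading_digit(x):
    # Shift a positive number into [1, 10) and take its first digit.
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

random.seed(7)
pool = []
for _ in range(2000):
    # One "random distribution": a lognormal with random scale and spread.
    scale = random.uniform(-5, 5)
    spread = random.uniform(0.5, 3)
    pool.extend(random.lognormvariate(scale, spread) for _ in range(50))

counts = [0] * 10
for value in pool:
    counts[leading_digit(value)] += 1

for d in range(1, 10):
    print(f"digit {d}: observed {counts[d] / len(pool):.3f}, "
          f"Benford {math.log10(1 + 1 / d):.3f}")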

Though this effect is quite interesting, it is not evident to me how one would go about applying it in order to discover a distribution beyond the logarithmic. Nevertheless, the logarithmic distribution does seem to be what emerges from the general run of finite data sets. Even so, Hill's proof appears to show that low-digit bias is an objective artifact of the "world of data" that we commonly access. The value of this distribution is shown by its use as an efficient computer coding tool.

It is not excessive to say that Benford's law, and its proof, encapsulates very well the whole of the statistical inference mode of reasoning. And yet plainly Benford's law does not mean that "fluke" events don't occur. And who knows what brings about flukes? As I argue in Toward, the mind of the observer can have a significant impact on the outcome.

Another point to take into consideration is the fact that all forms of probability logic bring us to the self-referencing conundrums of Bertrand Russell and Kurt Goedel. These are often dismissed as trivial. And yet, if a sufficiently rich system cannot be both complete and consistent, then we know that there is an enforced gap in knowledge. So we may think we have found a sublime truth in Benford's law, and yet we must face the fact that this law, and probabilistic and mathematical reasoning in general, cannot account for all things dreamt of, or undreamt of, in one's philosophy.

Go to Chapter 4 HERE.
