The Many Worlds of Probability, Reality and Cognition: Chapter 8

What exactly is entropy?
More on entropy

Footnotes
https://manyworlds784.blogspot.com/p/footnotes.html

What exactly is entropy?
Entropy is a big bone of contention among physicists and probability theorists. Consider: does nature provide automatic "objective" shuffling processes or do we have here an artifact of human observational limitations? (52)

Can information be lost? From the viewpoint of an engineer, Shannon information is conserved. I' = I - I_c, where I is the total information, I' the new information and I_c the information in the constraints, or structural information, or propensity information.

When reflecting on that question, perhaps it would be helpful to look beyond the usual idea of Shannon information in a transitory data stream and include the ideas of storage and retrieval, being careful to grant that those concepts can easily be accommodated in standard information theory. But what of the difficulty in retrieving some bit string? Using a Kolmogorov-Chaitin sort of measure, we have the ratio of input bits to output bits, meaning that we regard maximum entropy, or equilibrium entropy, in this case as occurring when the ratio is near 1. We may or may not get effective reversibility.

A composite number contains the information implying constituent primes. But the information needed to multiply the primes is much less than that needed, usually, to find the primes in the composite. "Operational information" is lost once two primes are encoded as a composite. That is, what one person encodes as a composite in all but trivial cases no one else can decode with as little computation as went into the multiplication.

It is often convenient "for practical purposes" to think in terms of a classical mechanistic dynamic system, which is an approximation of a system in nature. But, we must also acknowledge that another way to look at the ability to retrieve information stems from the Heisenberg Uncertainty Principle. In a typical ensemble there is, at the quantum level, an inherent unpredictability of a specific path. So, one obviously can't reverse a trajectory which isn't precisely known. Again we discover that at the quantum level, the "arrow of time" is blurred in both "directions." Once a detection is made, we may think we know the incoming trajectory; however, Richard Feynman devised his path integral formalism specifically in order to account for the large (infinite?) number of possible trajectories.

As soon as we measure a particle, the HUP gives a measure of information for one property and a measure of entropy for the corresponding property. (One may accept this point as a "principle of correspondence.") Before measurement, the entropy ( = the uncertainty) is the HUP relation.

Before proceeding further, let us pause to consider an important insight given by Bruce Hood.

Hood's statement for Edge
http://edge.org/response-detail/11275

"As a scientist dealing with complex behavioral and cognitive processes, my deep and elegant explanation comes not from psychology (which is rarely elegant) but from the mathematics of physics. For my money, Fourier's theorem has all the simplicity and yet more power than other familiar explanations in science. Stated simply, any complex pattern, whether in time or space, can be described as a series of overlapping sine waves of multiple frequencies and various amplitudes" (53aa).

Hood's observation neatly links the observer's brain to the physics of "external" nature, as the brain must have some way to filter out the signal from the noise. And, further, even the noise is a product of filtration from what we characterize as a phenomenal input (see Toward).

One might think of entropy as the decreasing probability of observing a net force vector composed of coherent sub-vectors. Map a force vector onto a wave graph and consider the possible decomposition unit waves. At the unit level, a few wave forms may completely interfere constructively or destructively, but most are out of phase with each other, especially when frequencies vary. The higher the overall amplitude of the composite waveform, the less likely that sub-waveforms are all precisely in phase or that half are precisely one phase over (deconstructively interfering) from the other half.

We are using a proportion definition of probability here, such that over a particular interval, the set of mixed waveforms (that don't interfere coherently) is much larger than the set of completely coherent waveforms. In fact, we may regard the population of waveforms that fits on an interval to be composed of sample waveforms, where the sample is a sub-form composed of more sub-forms and eventually composed of unit forms. By the Central Limit Thereom, we know the set of samples is normally distributed. So it then becomes evident that the coherent waveforms are represented by normal curve tails for constructive and deconstructive coherence and the central region under the curve represents "incoherent" waveforms. Mixed waveforms have degrees of coherence, measured by their standard deviations.

Knowing that waveform coherence is normally distributed, we then say that equilibrium, or maximum, entropy occurs at the mean of waveform coherence.

In fact, this isn't quite right, because the normal curve strictly speaking represents infinity. However, as sub waveforms are added, and assuming that none has an outrageously high amplitude, the irregularities smooth out and the amplitude goes to zero, a peculiar form of destructive interference. Of course, the energy does not go to zero although in such an ideal scenario it does go to infinity.

Now if the number of unit waveforms is constant, then, no matter how irregular the composite waveform, there exists at least one composite waveform that must be periodic. Hence, one might claim that a truly closed system perforce violates the principle of entropy (a point that has been a source of much discussion). We can see here an echo of Ramsey order -- discussed elsewhere -- which is an important consideration when thinking about entropy.

So when talking of entropy -- and of probability in general -- we are really saying that the observer is most likely stationed near the mean of the probability distribution. Do you see the Bayesian bogeyman hiding in the shadows?

It seems to me that there is no need to account for the information, entropy or energy represented by Maxwell's demon. What we have is a useful thought experiment that establishes that there is a small, but finite probability that dissipation could spontaneously reverse, as we can rarely be sure of the macro-propensity of a system based on its micro-states. After all, even in disorderly systems, occasionally the wave forms representing the net force vector cohere.

The job of Maxwell's imp, it will be recalled, was to open a door between two gas-filled containers whenever he spotted a swifter molecule. In this way, Maxwell said, the creature would "without expenditure of work raise the temperature of B and lower that of A in contradiction to the second law of thermodynamics" (54a).

Plainly, Maxwell was deliberately disregarding the work done by the demon.

This seems an opportune point to think about the concept of work, which is measured in units of energy. Let us consider work in terms of a spherical container filled with a gas at equilibrium. The total work, in terms of pressure on the container wall, is 0. The potential work is also 0. Suppose our demon magically vaporizes the container wall without influencing any gas molecules. The net force vector (found by summing all the micro-force vectors) is, with very high probability, nearly 0. Hence, the potential power is nearly 0. (Of course, if one takes an incremental wedge of the expanding gas, one can argue that that wedge contains conditions necessary for work on that scale. That is, a number of molecules are moving collectively by inverse square and so can exchange kinetic energy with other molecules and push them outward. Regarded in this way, a bomb performs work on surrounding objects, even though at the instant of explosion the total work, both actual and potential, is -- in an idealized scenario -- zero.)

But if a valve is opened, a set of molecules rushes to the region of lower pressure and the net force vector for this exiting gas, if contained in a pipe, is associated with non-zero power and is able to do work (such as push a piston). That is, the probability is fantastically high that the net vector is not zero or close to it, assuming sufficient initial pressure. Hence work, whether potential or actualized, requires a non-zero net force vector, which is tantamount to saying that its waveform representation shows a great deal of constructive interference (coherence).

In classical terms, dispersion means that for most ensembles of gas molecules, the initial arrangement of the micro-states (when we take a notional snapshot) is most probably asymmetric by the simple proportion that the set of symmetric micro-states to the set of asymmetric micro-states is minute. Now an asymmetric ensemble of micro-states, in general, takes far, far longer to return to the snapshot condition than is the case for a symmetric ensemble of micro-states. (Think of an asymmetric versus a perfectly symmetric break of a rack of pool balls on a frictionless surface.) Hence, the probability that an observer will see a violation of the second law -- even in the fictional classical case -- is remote.

The interesting thing about using a chaos analogy for entropy is that there is information in the chaos. That is, the attractor, (i.e., the set of limit points) gives information analogous to the information provided by, say, the extinction threshold of a continuous logistic equation.

Now we must expect statistical fluctuations that give order without employing a chaos model. That is, over infinity we expect any randomly generated run to recur an infinitude of times. But in that case, we are assuming absolute randomness (and hence, independence). There are cosmological models that posit an infinity of Paul Conants but in general the cosmos and all its subsystems are regarded as finite, and the cosmos is regarded as closed (though this property is also a bone of contention as physicists posit existence of a super-universe containing numerous cosmoses).

At any rate, one might argue that in a very large system, coherent waveforms (representing various levels of complexity) are "quite likely" to occur sometime or somewhere.

What does all this say about the presumed low entropy near the big bang versus maximum entropy in the far future (the heat death of the universe)? I would say it is conceivable that the true topology of the universe will show that "beginning" and "end" are human illusions. This point is already evident when modeling the cosmos as a four-dimensional spacetime block. The deck of cards model of entropy is OK at a certain range, but seems unlikely to apply at the cosmic level.

At any rate, classical ideas imply maximum entropy in the mixing process, but also imply that a closed system holding a finite number of particles is periodic, though a period may be huge.

In the case of the infinitely roomy urn, we are compelled to agree on a level of granularity in order to assess the probability of a particular cluster (run). So it is difficult to utterly justify "entropy" here.

Another point of controversy is the ergodic hypothesis, explained by Jan Von Plato thus: "The ergodic (quasi-ergodic) hypothesis is the assumption that a mechanical system goes through all states (comes arbitrarily close to all states) in its evolution. As a consequence of Liouville's theorem, this amounts to giving the same average kinetic energies to all degrees of freedom of the system (equipartition of energies). In the case of specific heats, this is contradicted, so that the ergodic hypothesis has to be given up in many cases" (54).

The ergodic hypothesis has been deemed important to the physics of gas molecules and hence entropy, though Jaynes argued that Shannon's maximum entropy sufficed for Jaynes's Bayesian calculations and methods (55).

The ergodic hypothesis
https://en.wikipedia.org/wiki/Ergodic_hypothesis

I note that the ergodic assumption is violated at the quantum level in the sense of "borrowing" of large amounts of energy, for conservation violation, which is presumably reconciled when longer time intervals are regarded (the "borrowings" are "paid off").

So an alternative notion is that the ergodic principle is from time to time violated, though probabilities can be used in such cases, we have the issue of circularity: it would seem that physical probability assumptions rest on the ergodic hypothesis, which in turn rests on probability assumptions. In addition, the ergodic conjecture fails to take proper account of quantum indeterminism, which is important in the gas scenario.

Jaynes argues that a Laplacian principle of insufficient reason should be replaced by his definition of maximum entropy.

"The max-entropy distribution may be asserted for the positive reason that it is uniquely determined as the one which is maximally non-commital with regard to measuring information, instead of the negative one that there was no reason to think otherwise," Jaynes writes, adding: "Thus, the concept of entropy supplies the missing criterion of choice which Laplace needed to remove the apparent arbitrariness of the principle of insufficient reason..."

Jaynes notes that thermodynamic and Shannon entropy are identical except for Boltzmann's constant, and suggests that Boltzmann's constant be made equal to 1, in line with Jaynes's program to make entropy "the primitive concept with which we work, more fundamental even than energy." This idea is reminiscent of Popper making propensity an archaic property on an equal footing with force.

In the classical mechanics conception, says Jaynes, "the expression 'irreversible process' represents a semantic confusion; it is not the physical process that is irreversible, but rather our ability to follow it. The second law of thermodynamics then becomes merely the statement that although our information as to the state of the system may be lost in a variety of ways, the only way in which it can be gained is by carrying out of further measurements."

Brillouin asserts that an argument of Joseph Loschmidt and another of Ernst Zermelo and Henri Poincare regarding reversibility do not apply in the physical world because of what we now call "butterfly effect" unpredictability and the impossibility of exact measurement (55aa).

As said, quantum uncertainty is an important consideration for entropy. Jaynes comments: "Is there any operational meaning to the statement that the probabilities of quantum mechanics are objective?" (56)

In the case of the urn with a finite number of possible draws, maximum entropy is equivalent to maximum homogenization, which corresponds to the classical probability of cases/possibilities, where that proportion is all the knowledge available. What the observer is then doing when calculating the probability of a specific property on a draw is stating the fact that that number encapsulates all the information available.

A point that bears repetition: The observer is forced to assume "maximum mixing" even though he doesn't know this has actually been done. Perhaps all or most of the black balls have been layered atop most of the white balls. As long as he doesn't know that, he must assume maximum mixing, which essentially means that if all combinations are considered, particular runs are unlikely using various parametric tests or by the consideration that binomial curve logic applies.

Even if he knows there are 99 black balls and 1 white ball, he cannot be sure that the white ball hasn't been strategically placed to greatly increase his chance of drawing it. But he can do nothing but assume this is not the case, at least for the first draw.

So, again I repeat that, if possible, he would like to hear that there is no such bias, that the mixture is "fair." By fair, in this example, is meant maximum entropy: the observer has been reassured that it is objectively true that no maneuver has been employed to introduce bias and that measures have been taken to "randomize" the order of balls. Perhaps the urn has been given a good shake and the balls bounce around in such a way that the results of chaos theory apply. It becomes effectively impossible to predict order. In this case, the lone white ball might end up atop the black balls; but it would happen pseudorandomly. So we regard maximum entropy as occurring when more shaking does not substantially lengthen any calculation that might backtrack the trajectories of the bouncing balls. Once the Kolmogorov-Chaitin limit has been essentially reached, we are at entropy equilibrium.

Though it is true that at some point in the far future continued shaking yields a return to the initial order, this fact carries the caveat that the shake net force vector must be a constant, an impossibility in the real world.

Now let us consider what constitutes good mixing for an ordered 52-card deck. That is, to "most likely" get rid of the residuals of order, while conceding that order is a subjective concept.

Rather than talk of repeated shuffles, we suggest that a random number generator (perhaps tied to a Geiger counter) chooses numbers between 0 and 53. If a number shows up more than once, subsequent instances are ignored, a process that is terminated on the 51st card. We see that we have maximally randomized the card order and so have maximum entropy insofar as an observer who is presented with the deck and turns the first card over. Assuming draws without replacement, the maximum entropy ( = minimum information) changes on each draw, of course.

On the first draw, his chance of picking an ace of spades is (52!)^(-1). This chance he posits on the notion of fairness, or maximum entropy. This assumption is distinct from the principle of indifference where he may be presented with a deck and asked to estimate the probability of an ace of spades. From his perspective, he may say (52!)^(-1) because he has no reason to believe the deck has been stacked; but this assumption is not the same as being told that the deck has been well shuffled.

At any rate, once we have shuffled the deck with a random number generator, we gain nothing by using our random number generator to reshuffle it. The deck order is maximally unpredictable in case 1, or at maximum, or equilibrium entropy.

Nevertheless, it may be agreed that given enough shuffles our randomization algorithm yields a probability 1 of a return to the initial order (which in this case is not the same as certainty).

The number of shuffles that gives a probability of 0.99 of returning to the original permutation is, according to my calculation on WolframAlpha, on the order of 10⁶⁴. That is, in the abstract sense, maximum entropy has a complementary property that does indeed imply a probability near 1 of return to the original order in a finite period of time.

So one can take entropy as a measure of disorder, with an assumption of no bias in the mixing process, or uncertainty, where such an assumption cannot be made. Yet, one might conclude that we do not have a complete measure for dispersion or scattering. On the other hand, as far as I know, nothing better is available.

Consider manual shuffling of cards by a neutral dealer. Because of our observation and memory limitations, information available to us is lost. So we have the case of observer-centric entropy.

Consider two cards, face up. They are turned face down and their positions swapped once. An attentive observer will not lose track.

If we now go to three cards, we have 3!, or 6 permutations of card order. If the dealer goes through the same permutations in the same order repeatedly, then the observer can simply note the period and check to see whether the shuffle ends with remainder 0, 1 or 2, do a quick mental calculation, and correctly predict the cards before they are turned over.

But suppose the permutation changes with each shuffle. The observer now finds it harder to obtain the period, if any. That is, it may be that the initial permutation returns after k shuffles. But what if the dealer -- perhaps it is a computer program -- is using an algorithm which gives permutation A for the first shuffle, B for the second ... F for the sixth, and then reorders the shuffles as FABCDE, and again EFABCD and so on.

The work of following the shuffles becomes harder. Of course, this isn't the only routine. Perhaps at some point, the cycle is changed to EBCDA, followed by AEBCD, followed by DEBCA followed by ADEBC and so on.

With 52 cards, of course, the possibility of keeping track is exponentially difficult. In this sense, maximum or equilibrium entropy occurs when the observer cannot predict the position in the deck of an arbitrary card. That is, he has become maximally ignorant of the position of the card other than knowing it must be in there somewhere.

Some deterministic algorithms require output information that is nearly as large as the input information (including the information describing the algorithm). When this information ratio is near 1, the implication is that we are maximally ignorant in this respect. There are no calculational shortcuts available. (Even if we include the potential for quantum computing, the issue of minimum computational complexity is simply "pushed back" and not resolved.)

Such an algorithm in the card-shuffling case might, for example, form meta-permutations and meta-meta-permutations and so on. That is, we have 6 permutations of 3 cards. So we shuffle those 6 permutations before ordering ("shuffling") the cards. But there are 6! (720) permutations of the previous set; there are 720! permutations of that set, and so on. In order to determine which order to place the 3 cards in, we first perform these "meta-orderings." So it will take the computer quite some time and work to arrive at the final order. In that case, an observer, whose mind cannot follow such calculations, must content herself with knowing that an arbitrary card is in one of three positions.

That is, we have maximum or equilibrium entropy. Of course, 720! is the fantastic quantity 2.6 x 10¹⁷⁴⁶ (according to WolframAlpha), an absurd level of difficulty, but it illustrates the principle that we need not be restricted in computational complexity in determining something as straightforward as the order of 3 cards. Here we exceed Chaitin complexity (while noting that it is usual to describe the minimum complexity of a computation by the shortest program for arriving at it).

This amounts to saying that the entropy is related to the amount of work necessary to uncover the algorithm. Clearly, the amount of work could greatly exceed the work that went into the algorithm's computation.

Note that the loss of information in this respect depends upon the observer. If we posit some sort of AI sentience that can overview the computer's immense computation (or do the computation without recourse to an earthly machine), in that case the information is presumably not lost.

In the case of gas molecules, we apply reasoning similar to that used in card shuffling. That is, given sufficient time, it is certain that "classical" gas in a perfect container will reach a state in which nearly all molecules are in one corner and the remainder of the container holds a vacuum. On the other hand, the total energy of the molecules must remain constant. As there is no perfect container, the constant total kinetic energy of the molecules will gradually diminish as wall molecules transmit some kinetic energy to the exterior environment ("the enropy of the universe goes up").

So is the universe a perfect container? As it evidently has no walls, we cannot be sure that the various kinetic energies do or do not return to some original state, especially in light of the fact that the Big Bang notion cannot be tracked back into the Planck time interval.

And what of the intrinsic randomness inherent in quantum measurements, which makes it impossible to track back all trajectories, as each macro-trajectory is composed of many quantum "trajectories." The macro-trajectory is assumed to be what one gets with the decoherence of the quantum trajectories, but it would only take one especially anomalous "decoherence" to throw off all our macro-trajectory calculations. In addition, no one has yet come up with a satisfactory answer to the Schroedinger's cat thought experiment (see Noumena II, Part VI).

Maximum entropy can be viewed in terms of a binomial, success/failure scenario in which, as we have shown, the mean simply represents the largest set of permutations in a binomial distribution.

We should recognize that:

1. If we know the distribution, we have at hand system information that we can use to guide how we think about our samples.

2. The fact that the set of finite samples of a population is normally distributed gives us meta-information about the set of populations (and systems). (For a proof, see Appendix.)

So this fact permits us to give more credibility to samples close to the mean than those in the tails. However, our unstated assumption is that the cognitive process does not much affect the credibility (= probability area) of a specific outcome.

At any rate, these two points help buttress the case for induction, but do not nail that case down by any means.

More on entropy
One may of course look at a natural system and see that entropy is increased, as when an egg breaks upon falling to the floor. But how about when a snowflake melts in the sun and then its puddle refreezes after dusk? Has the organization gone down by much? Crystals come apart in a phase transition and then others are formed in another phase shift. Yes, over sufficient time the planet's heat will nose toward absolute zero, perhaps leaving for some eons a huge mass of "orderly" crystals. A great deal depends on how precisely we define our system -- and that very act of definition requires the mind of an observer.

Roger Penrose says people tend to argue that information is "lost" to an observer once it gets past a black hole's event horizon (58). But, I note, it never gets past the event horizon with respect to the observer. So, in order to make the case that it is lost, one needs a super-observer who can somehow manage to see the information (particles) cross the event horizon.

It is certainly so that those physicists who believe in a single, discrete, objective cosmic reality, accept as a matter of course the idea that Shannon information can, in theory, precisely describe the entire content of the cosmos at present, in the past and in the future -- even if such a project is technically beyond human power. By this, they are able to screen out as irrelevant the need for an observer to interpret the information. And, those who believe in objective information tend to be in the corner of those who believe in "objective" probabilities, or, that is, physical propensities. Yet, we should be quick to acknowledge a range of thought among physicists on notions of probability.

In 2005, Steven Hawking revived a long-simmering argument about black holes and entropy.

"I'm sorry to disappoint science fiction fans, but if information is preserved, there is no possibility of using black holes to travel to other universes. If you jump into a black hole, your mass energy will be returned to our universe but in a mangled form which contains the information about what you were like but in a state where it can not be easily recognized. It is like burning an encyclopedia. Information is not lost, if one keeps the smoke and the ashes. But it is difficult to read. In practice, it would be too difficult to re-build a macroscopic object like an encyclopedia that fell inside a black hole from information in the radiation, but the information preserving result is important for microscopic processes involving virtual black holes."

Another view is that of Kip S. Thorne, an expert on general relativity (56a), who believes black holes provide the possibility of wormholes connecting different points in Einstein spacetime (more on this in Noumena I in Part VI).

Hawking's updated black hole view
http://www.nature.com/news/2004/040712/full/news040712-12.html

Hawking paper on information loss in black holes
http://arxiv.org/pdf/hepth/0507171.pdf

What does it mean to preserve information in this scenario? Preservation would imply that there is some Turing machine that can be fed the scrambled data corresponding to the physical process and reconstruct the original "signal." However, this can't be the case.

Quantum indeterminism prevents it. If one cannot know in principle which trajectory a particle will take, then neither can one know in principle which trajectory it has taken. Those "trajectories" are part of the superposed information available to an observer. The Heisenberg uncertainty principle ensures that some information is indeed lost, or at least hidden in superposed states. So, thinking of entropy in terms of irreversibility, we have not so much irreversibility as inability, in principle, to calculate the previous state of the particles when they were organized as an encyclopedia.

"When I hear of Schoedinger's cat, I reach for my gun," is a quotation attributed to Hawking. That is, as an "objective realist," he does not fully accept information superposition with respect to an observer. Hence, he is able to claim that information doesn't require memory. (It is noteworthy that Hawking began his career as a specialist in relativity theory, rather than quantum theory.) Of course, if one were to find a way out of the sticky superposition wicket, then one might argue that the information isn't lost, whether the black hole slowly evaporates via Hawking radiation, or whether the particularized energy is transmitted to another point in spacetime via "tunneling." If bilocality of quantum information is required by quantum physics anyway, why shouldn't bilocality be posited for black hole scenarios?

So, assuming the superposition concept, does information superposition mean that, objectively, the particles won't one day in the far distant future reassemble as an encyclopedia? If the cosmos can be represented as a perfect container, the answer is yes with probability 1 (but not absolute certainty), given enough time. But this is the sort of probability assessment we have disdained as an attempt to use a probability tool at an inappropriate scale.

Consider just one "macro" particle in a vacuum inside a container with perfectly elastic walls in a zero-gravity field. If we fire this particle with force F at angle theta we could calculate its number of bounces and correspending angles as far into the future as we like. If we are given the force and angle data for any particular collision with a wall, we can in principle calculate as far backward in time as we like, including the points when and where the particle was introduced and with what force.

But even if we fire a quantum level particle at angle theta with force F, there is no guarantee that it will bounce with a classically derived angle. It interacts with a quantum particle in the wall and shoots toward an opposing wall's position in accord with a quantum probability amplitude. Assuming no quantization of space with regard to position, we then have a continuous number of points covered by the probability of where the next "bounce" will occur. And this continuity -- if it holds -- reinforces probability zero of reversibility.

At any rate, informational entropy is guaranteed by the HUP where equilibrium is "formal" incalculability in reverse.

Claude Shannon did not specify an observer for his form of entropy. Nevertheless, one is implicit because he was spotlighting signal versus noise. A signal without an observer is just so much background clutter. So we have a legitimate issue of information as a subjective phenomenon -- even though, from an engineering standpoint, it was a brilliant idea to ignore the property of cognition.

Leon Brillouin defines Shannon information as abstract "free information" and the data associated with physical entropy as "bound information." Shannon entropy is a term he eschews, reserving the term "entropy" for thermodynamic systems. Brillouin is then able to come up with inequalities in which the two types of information intersect. For example the "negentropy" of a physical system plus Shannon information must be greater than or equal to zero.

In fact, Brillouin finds that the smallest possible amount of negentropy required in an observation equals k ln 2, which is about 0.7 k, which is equivalent to 10^-16 Kelvin, in cgs units. That is, he says, one bit of information cannot be obtained for less than that negentropy value.

In his system, information can be changed into negentropy, and vice versa. If the transformation is reversible, there is no loss.

Any experiment that yields information about a physical system produces on average an increase in the entropy of the system or its surroundings. This average increase is greater than or equal to the amount of information obtained. In other words, information must always be paid for by negentropy.

My take is that he means that negentropy corresponds to the system's advance (propensity) information. He then feels able to dispose of Maxwell's imp by saying the critter changes negentropy into information and back into negentropy.

In a close parallel to algorithmic information's output to input ratio, Brillouin defines the efficiency of an experimental method of observation as the ratio of information obtained Δ I to the cost in negentropy
| Δ N |, which is the entropy increase Δ S accompanying the observation. Δ N = -- Δ S.

We note that work is defined in energy units and so there is no distinction, other than conceptually, between work and any other form of energy transformation. So I suggest that negentropy is a means of measuring a property associated with a capacity for work.

Entropy is inescapable, Brillouin says, because -- even without the Heisenberg uncertainty relation -- exact measurement of physical processes is impossible. Points may exist on a Cartesian plane, but they can never be precisely located, even with interferometers or other precision methods. If that is so, the same holds for measurement of time intervals. Hence there is always a degree of uncertainty which is tantamount to intrinsic disorder.

However, Brillouin, in an echo of Popper's intrinsic propensity, writes that "the negentropy principle of information, is actually a new principle, and cannot be reduced to quantum and uncertainty relations."

To Brillouin, free information is encoded in pure thought. He gives this scenario:

A person possesses free information in his mind.

He tells a friend about it in English, requiring a physical process. So the information has been transformed from free to bound, via sound waves and-or electromagnetic waves. If there were errors in his mind's coding of the transmission, some free information will be lost.
Further, distortion and thermal noise in the communication channel will result in loss of some bound information.
The friend is hard of hearing, and he misses a few words. Bound information is lost, his hearing organs being part of the physical process. Yet, once this pared information is in the friend's mind, it is now free information.
After a while, the friend forgets some of the information, representing a loss of free information.

It seems evident that Brillouin suspects a mind/mody dischotomy. Otherwise, the information held in the mind would be associated with a physical system, whereby the mind is modeled as a software program deployed in the brain's hardware (56b).

I am guessing that Brillouin's purpose is to find some way to discriminate between the principle of insufficient reason and what he takes to be the objective reality of dissipation processes described in terms of probability distributions (or densities).

Brillouin distinguishes between absolute information and "distributive" (relative) information. Absolute information would exist platonically in The Book, to borrow Paul Erdos's whimsical term for the transcendental place where mathematical theorems are kept. Relative information must be potentially of use, even though the thoughts of the consumers are disregarded.

In my view, absolute information is conveyed via ideal channels, meaning the physical entropy of the channel is disregarded. Relative information may well be construed to travel an ideal channel, with the proviso that that channel have an idealized physical component, to wit: a lower bound for physical entropy, which Brillouin in his thorough analysis provides.

Let us consider the Shannon information in the observer's brain.

We regard the brain's operating system as a circuit that carries information. We have the information in the circuit parts and the information being carried in the circuit at some specific time. Considering first the latter case, it is apparent that the operating system is a feedback control system (with quite a number of feedback loops), which, at least notionally, we might model as a software program with many feedback loops. Because of the feedback loops, the model requires that the program be treated as both transducer and receiver. So we can think of the brain's "software" as represented by a composite, iterative function, f(x-1) = x.

We may address the complexity of the data stream by relating this concept to the added redundancy given by each feedback loop, or in computer terms, by each logic gate. Each logic gate changes the probabilities of characters in the data stream. Hence, the redundancy is associated with the set of logic gates.

Now it may be that much of the data stream remains in an internal closed, or almost closed loop (which might help us get a bit of insight into the need for sleep of sentient animals). However, in the time interval [t_a,t_b], f(x_a) serves to represent the transducer and f(x_b) the receiver, with the channel represented here by the algorithm for the function. We must synchronize the system, of course, setting t_unit to correspond to the state in which f(x_a) has done its work, yielding x_a+1.

By such constraints, we are able to view the brain and the brain's mind, if modeled as a software program running on a hardware system that is not representable as a universal Turing machine, but must be viewed as a special TM with a "long" description number.

However, as Penrose and others have said, it is not certain that the computing model applies fully to the conscious mind, though it does seem to fit nicely with learned and instinctive autonomic behaviors, including -- at least in part -- unconscious emotional motivations.

What prevents perfect error correction in machine and in nature?

By error in nature we mean a sentient being misreading the input data. The "erroneous" reading of data requires an assessment by some observer. If we model a brain as a Turing machine -- in particular as a TM modeling a feedback control system -- we see that the TM, no matter how seemingly complex, can't make calculational errors in terms of judgments, as a TM is fully deterministic and doesn't have room for the mystical process of judgment (unless you mean calculation of probabilities). Even if it miscalculates 2 + 3, it is not making a judgment error. Rather the feedback loop process is at a stage where it can't calculate the sum and resorts to a "guess." But, the guess occurs without choice. The guess is simply a consequence of a particular calculation at that stage.

So we can't at this juncture really discuss error made in what might be called the noumenal realm, and so "error" in machine and in nature, if excluding this noumenal realm, mean the same thing: the exact signal or data stream transmitted is not received.

In that case, consider the Shannon ideal of some perfect error correction code. Suppose the ideal turns out to require a convergent infinite series, as it well could? Truncation could reduce error vastly, but the Shannon existence theorem would in that case hold only in the infinite limit.

So, if, at root, we require information to have the property of observability, then black hole entropy might be undefined, as is the proposed black hole singularity.

One can say that the observable entropy of a black hole is low, whereas its observable order is high. But, on the other hand, the concept of entropy is easier to accept for a system involving many particles. Consider an ordinary light bulb's chaotic (low "order") emission of photons versus a laser's focused beam (high "order"). Both systems respond to the second law, though the second case is "farther from equilibrium" than the first. But what of a one-photon-at-a-time double slit experiment? We still have entropy, but must find it in the experimental machinery itself.

One might say that a black hole, from the outside, contains too few components for one to be able to say that it has a high information value.

In this respect, Brillouin uses Boltzmann's entropy formula where W has been replaced by P, for Planck's "complexions," or quantized possible orderings, for an insulated system containing two bodies of different heat in contact with each other. The total number of complexions for the composite system is P = P₁(E₁) x P₂(E₂). Using basic physical reasoning, he arrives at the well-known result that the most probable distribution of energies corresponds to the two bodies having the same temperature.

But then he notes: "The crucial point in this reasoning is the assumption that P₁(E₁) and P₂(E₂) are continuous functions with regular derivatives. This will be true only for large material systems, containing an enormous number of atoms, when the energies E₁ and E₂ are extremely large compared to the quantum discontinuities of energy." That is, infinity assumptions closely approximate the physical system when n is large enough (56bb). Of course, without considering Hawking radiation, we see that the low entropy of a black hole is associated with very few numbers.

Yet entropy is often thought of as a physical quantity. Though it is a quantity related to the physical world, its statistical nature is crucial. One might think of entropy heuristically in terms of chaos theory. In that area of study, it is possible to obtain information about the behavior of the system -- say the iterative logistic equation -- when we can predict transformations analogous to phase shifts at specific points (we jump from one period to another). However, Feigenbuam's constant assures that the system's behavior becomes chaotic: the "phase shifts" occur pseudorandomly. Aside from the attractor information, the system has close to zero predictive information for values past the chaos threshold.

On Feigenbaum's constant
http://mathworld.wolfram.com/FeigenbaumConstant.html

So here we see that the tendency to "disorder" rises to a point of "maximum entropy" measurable in terms of Feigenbaum's constant. In the case of the iterative logistic equation, chaos looms as we approach the initial value 3. Between 0 and 3, predictability occurs to varying degree. At 3, we have chaos, which we may regard as maximum entropy.

With respect to chaos and nonlinear dynamics, Kolmogorov-Sinai entropy enters the picture.

K-S entropy is associated with the probabilities of paths of a chaotic system. This entropy is similar to Shannon entropy but the probabilities on the kⁿ branching paths are in general not equal at the nth set of branches. In his discussion of K-S entropy, Garnett P. Williams (60) puts K-S entropy (61) in terms of the number of possible phase space routes, and he writes:

H_{_{Δ t}} = Σ P_s (1/P_s) running from i = 1 to N_t, which is the number of phase space routes; s denotes sequence probabilities associated with specific paths.

"K-S entropy, represents a rate," notes Williams. "The distinctive or indicative feature about a dynamical system isn't entropy by itself but rather the entropy rate," which is simply the above expression divided by time.

Williams points out that K-S entropy is a limiting value; for discrete systems of observations, two limits are operative: the number that occurs by taking t to infinity in the expression above and letting the data box size (formed by overlaying an nxm grid over the system's output graph) go to zero.

He underscores three interpretations of K-S entropy:

Average amount of uncertainty in predicting the next n events.
Average rate at which the accuracy of a prediction decays as prediction time increases.
Average rate at which information about the system is lost.

This last interpretation is important when discussing what we mean by physical information (absolute versus relative). Also note that the K-S entropy reflects the Shannon redundancy of the system. That is, even a chaotic system contains structural information.

Now suppose we regard any system as describable as a Turing machine, regarding the output tape's content as a signal.

Questions:

A. What is the ratio of the information in the algorithm to the information in the final output tape?

B. Can we confidently check the output and discern the algorithm (how does this question relate to entropy)?

With respect to B, if we only have the output and no other information, it is impossible to know for certain that it is a readout of a particular algorithm. For example, suppose we encounter the run 0101010101. We have no way of knowing that the next digit, if any, will be a zero. Perhaps a different sequence will be appended, such as 001100110011, or anything.

We can of course use statistical methods to estimate the probability that the next digit is a zero, and so we may think of entropy as related to the general success of our probabilistic method. The fuzzier the estimation, the greater the "entropy."

Recall that if we consider the binomial probability distribution, a rise in entropy accords with the central region of the bell curve. All this says is that systems about which our knowledge is less exact fall toward the center. Considering that the binomial curve converges with the Gaussian normal curve
-- which is good for much of energetics -- it seems that there is indeed a close relationship between physical entropy and Shannon entropy.

With respect to A, perhaps more significant is the point that if we have a Universal Turing Machine and start a program going for each consecutive integer (description number), most will crash: go into a closed loop before printing the first digit (which includes freezing on algorithm step 1) or run forever in a non-loop without printing the first digit.

So the overwhelming majority of integers do not express viable Turing machines, if we are interested, that is, in a computable number. We see immediately that order of this sort is rare. But of course we have here only a faint echo of the concept of entropy. At any rate, we can say that if a TM is nested (its output in the substring [m,n] is a TM description number), we can consider the probability of such a nested TM thus: consider the ratio of the output string length to the substring length; the closer that ratio is to 1, the more orderly or improbable we may evaluate it to be and so as that ratio goes to 0 we can think of this as being "entropy-like."

Also, see my

Note on Wolfram's principle of computational equivalence
http://paulpages.blogspot.com/2013/04/thoughts-on-wolframs-principle-of.html

And the more nested TMs we encounter, each with "low entropy," the more we tend to suspect nonrandom causation.

When we say nonrandom, what we really seem to mean is that the event is not consistent with an accepted probability distribution and so we wonder whether it is more consistent with a different distribution. Still, we might be more interested in the fact that a randomly chosen TM's output string is highly likely to have either a very long period, or to be aperiodic (with aperiodic strings swamping all other outputs). Additionally, not only is the output information likely to be high (in the strict sense), but the input information is also likely to be high with respect to what humans, even assisted by computers, normally grapple with.

As for an arbitrary member of the set of TMs that yields computables, the algorithmic information (in the Chaitin-Kolmogrov sense) is encapsulated by that TM's complexity, or ratio of input bits (which include specific initial values with the bit string describing the specific program) to output bits. In this respect, maximally complex TMs are found at the mean of the normal curve, corresponding to equilibrium entropy. This follows from the fact that there are far more aperiodic outputs than periodic ones. An aperiodic output of bit length n requires m steps such that at the infinite limit, m = n.

It is straightforward that in this case TMs of maximum complexity accord with maximum entropy found at the normal curve mean.

Note, however, that here maximum complexity is not meant to suggest some arbitrary string having probability 2^-n. An arbitrary string of that length, singled out from the set of possible combinations, can be construed as "maximally complex," as when talking of a lifeform describable (?) by a specific bit string of length n, versus the set of bit strings of that length.

Where we really come to grips with entropy is in signal-to-noise ratio, which is obvious, but what we want to do is consider any TM as a transducer and its output as a signal, thus apparently covering all (except for the cosmos itself) physical systems. What we find is that no computer hardware can print a software output with 100% infallibility. The reason is that quantum fluctuations in the circuit occur within constrained randomness, meaning occasionally a 1 is misrepresented as a 0, and this can't be prevented. As Shannon proved with his noiseless channel theorem, employment of error correction codes can drastically reduce the transmission of errors, but, I suggest, only in the ideal case can we have a noise-free channel.

Shannon's groundbreaking paper
http://plan9.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf

So, no matter how well designed a system, we find that quantum effects virtually guarantee noise in a sufficiently long signal. If the signal is reprocessed (and possibly compiled) so as to serve as an input value, that noise may well be amplified in the next output, or that is the next output waveform will have its own noise plus the original noise. We see here that an increase in entropy (noise) is strictly related to the rules of quantum mechanics, but is also related to what is defined as a system. (The relationship of noise to dissipated heat is apparent here.)

Yet, Shannon's noiseless channel theorem means that in the theoretical limit, information can be conserved. No one has come up with a perfectly efficient method of transmission, although even simple error-correction systems exceed 90 percent efficiency. However, the physical entropy of the transmitter and receiver guarantees that over time transmission is not noiseless.

If we use an electronic or optical cable for the channel, we have these sorts of noise that affect the data stream:

Impulse noise, which is noticeable for low intensity signals. It is noticeable at a scale where quantum fluctuations become important.
Shot noise, which tends to be Poisson distributed, is often construed as a sum of impulse noises.
Thermal noise (Johnson-Nyquist noise) resulting from excited atoms in the channel exchanging energy quanta with the electronic or photonic current.
1/f noise is not well understood, but has the characteristic that it follows a power law distribution, which can be obtained by analysis of the normal curve.
Crosstalk occurs when signals of two channels (or more) interfere, sending one or more unwanted sub-signals down one or both channels. The undesired data stream may be decodable by the receiver (you overhear a part of another conversation, for example).
Gaussian or white noise is normally distributed when the data stream is analyzed. Gaussian noise tends to be composed of a number of uncorrelated or weakly correlated sources, as in shot noise, thermal noise, black body noise (from warm objects such as the earth) and celestial noise, as in cosmic rays and solar particles. The lack of correlation accounts for the normal curve sort of randomness.

In the case of 1/f noise, we can approach this issue by considering noise to be an anti-signal, where the maximum entropy Gaussian noise defines a complete anti-signal. (We note in passing that an anti-signal corresponds to the condition of maximum Shannon information.) So a 1/f anti-signal is composed of units with conditional probabilities. From this, we argue that in general a 1/f anti-signal has anti-information ~I_1/f < ~I_white. This gives us ~I_white - I_c = ~I_1/f. That is, the 1/f anti-signal contains structural (or propensity) information that reduces the noise.

Also affecting the signal:

Attenuation, or decreasing power (or amplitude of a specific frequency) along the channel. This is a consequence of cumulative effects of such things as thermal noise.
The signal's wave packet is composed of different wavelengths, which correspond to different frequencies. These frequencies over time get out of synchrony with one another. Envelope delay distortion is a function of the amount of delay among frequency components.
Phase jitter occurs when the signal components are not in phase. If viewed on an oscilloscope, the signal appears to wiggle horizontally.
Amplitude jitter occurs when the amplitude varies.
Nonlinear distortion is seen when the harmonics of the fundamental signal frequency are not equally attenuated.
Transients occur when the signal's amplitude abruptly changes. Impulse noise is often the agent behind transients (56c).

So we can perhaps see Brillouin's point that there is for virtually all practical systems a lower bound of physical entropy which cannot be avoided, which decreases the signal information. We see that the physical entropy of the channel corresponds well with Shannon's entropy -- if we are not considering long-term entropy. Suppose some error-correction process is found that reduces signal loss to zero, in accord with Shannon's noiseless channel theorem. The device that does the error correcting must itself experience entropy ("degradation" of energy) over time, as must the channel, the transducer and the receiver.

Scholarpedia article on 1/f noise
http://www.scholarpedia.org/article/1/f_noise

Shot noise
https://www.princeton.edu/~achaney/tmve/wiki100k/docs/Shot_noise.html

This brings us to the assumption that the information "in" or representing some block of space or spacetime can be said to exist platonically, meaning that not only do we filter out the observer but we also filter out the transmitter, the receiver and the information channel. This seems to be what some physicists have in mind when they speak of the information of a sector of space or the entire cosmos. But this assumption then requires us to ignore the physical entropy of the equipment. Even if we use an arbitrary physical model, we should, I would say, still include a lower limit for the physical entropy. That is, even idealized information should be understood to be subject to idealized decay.

Another non-trivial point. We have much argument over whether the entropy of the universe was "very high" near the Big Bang, before quantum fluctuations pushed the system into asymmetries, hence increasing entropy. Yet, if we consider the cosmos to be a closed system, then the total entropy is constant, meaning that the current asymmetries will smooth out to nearly symmetrical. Of course, if space expands forever, then we must wonder whether, for entropy calculation, the cosmos system is closed. Supposing it is a higher-dimensional manifold which is topologically cohesive (closed), then it is hard to say what entropy means, because in an (n > 3 + t) spacetime block, the distances between points "transcend" space and time. Now as entropy is essentially the calculation of a large effect from the average of many tiny vectors representing points in space and time, we face the question of whether our probabilistic measuring rod is realistic at the cosmic scale. And, if that is the case, we must beware assuming that probabilistic reasoning is both consistent and complete (which, by Goedel, we already know it can't be).

Brillouin shows that Boltzmann's equation (using Planck's energy quanta rather than work energy) can be obtained by the assumption of continuity, which permits the taking of derivatives, which then establishes that the most probable final state of two blocks in thermal communication is equality of temperature for both blocks. The continuity assumption requires "large material systems, containing an enormous number of atoms, when the energies E₁ and E₂ are extremely large compared to the quantum discontinuities of energy" (57).

That is, the approximation that deploys an infinitude only works when the number of micro-states is very large. So then, when we're in the vicinity of a black hole or the Big Bang, the entropy is considered to be very low. There are few, if any, identifiable particles (excluding those outside the scope of investigation). Hence, there is a serious question as to whether the continuity approximation applies, and whether entropy -- in the physical sense -- is properly defined at this level.

A common view is that, as a direct implication of the Second Law, there had to be extreme organization present at the Big Bang.

Consider the analogy of a bomb before, or at the instant, of detonation. The bomb is considered to have high organization, or information, being representable as a bit string that passes various tests for nonrandomness, whereas if a snapshot is taken of the explosion at some time after initiation and it is expressed as a bit string, that string would almost certainly show apparent randomness. How is the bomb more ordered than the explosion? Upon seeing the bomb's string representation we might say we have reason to believe the string is part of a small subset of TM output tapes, as opposed to the blast particles, the representative string appearing to be part of a larger subset of TM tapes.

Or, we might say that the string representing the blast is found within a standard deviation of the normal curve mean and that the string representing the bomb is found in a tail and so might be indicative of some unknown probability distribution (assuming we wish to avoid the implication of design).

In Cycles, Penrose talks about scanning cosmic background radiation to see whether there is a "significant deviation from 'Gaussian behavior'" (58). So here the idea is that if we don't see a normal distribution, perhaps we may infer "higher order" -- or, really, a different probability distribution that we construe as special.

Cosmic-scale entropy poses difficulties:

A. If the cosmos cannot be modeled as a TM, then information and entropy become undefined at cosmic scale.

B. Nonrandom information implies either intelligence or an undetermined secondary probability distribution. We note that the Second Law came about as a result of study of intelligently designed machines. At any rate, with respect to the Big Bang, how proper is it to apply probabilities (which is what we do when we assign an entropy value) to an event about which we know next to nothing?

If the universe were fully deterministic (a Laplacian clockwork), has a sufficiently long life and is closed, every event will repeat, and so at some point entropy, as seen from here and now, must begin to decrease (which is the same as saying the system's equilibrium entropy is a constant). However, as said, quantum fluctuations ensure that exact deterministic replicability is virtually impossible. This is very easily seen in the Schroedinger cat thought experiment (discussed in Noumena II, Part VI).

On the arrow of time. There is no arrow without consciousness. Newton's absolute time, flowing equably like a river, is a shorthand means of describing the human interface with nature -- a simplification that expediently neglects the brain's role.

And of course, what shall dark matter and dark energy do to entropy? One can speculate, but we must await developments in theoretical physics.

Interestingly, like the Big Bang theory, the out-of-fashion steady state theories predicted that distant galaxies should be accelerating away from us.

Three features of the steady state universe are found in currently accepted observational facts: the constant density of the universe, its flat geometry, and its accelerating expansion (explained by steady staters as a consequence of the creation of new matter) (59).

Summarizing:

1. Our concept of informational entropy (noise) increases toward the past and the future because of limits of memory and records -- though "absolute realists" can argue that the effect of observer, though present, can be canceled.

2. Entropy also increases toward the past because of the quantum measurement problem. We can't be sure what "the past" is "really." In a quantum system, phase space may after all be aperiodic so that it is undecidable whether a low entropy state is returned to.

3. It is questionable whether the entropy concept is applicable to the whole universe. I suspect that Goedel's incompleteness theorem and Russell's paradox apply.

The Many Worlds of Probability, Reality and Cognition

Chapter 8

No comments:

Post a Comment

Chapter 10

Report Abuse