The Talk.Origins Archive Post of the Month: October 2000

Subject:    Re: The information smokescreen
Newsgroups: talk.origins
Date:       October 7, 2000
Message-ID: gordon-11739F.19062607102000@[127.0.0.1]

Ok, I'm feeling foolish, so I'll take the plunge and try to defend the idea that thermodynamics is connected to information theory (though the connection is probably not what you'd expect). I've been working on this essay for a while now, but I don't really consider it complete yet (I haven't finished the reading I should do, and there are several more things that I need to cover -- especially the question of entropy's status as a state function, and some side issues raised by RNA synthesis). I was hoping to have more time to work on it before posting, but since the topic's come up again, you're getting it as-is. Consider yourselves warned...

(Also, this may look like a post-and-run, because I write slowly and am amazingly unreliable about replying. But I've been reading this group for about 16 years now, and don't plan to vanish anytime soon.)

This is all going to get rather long-winded and technical, and I suspect most of you won't find it worth wading through, so I'll start with the important part: my conclusions.

First: the information-theoretic entropy (or Shannon-entropy) of a physical system contributes to its thermodynamic entropy, at the rate of 1 bit of Shannon-entropy => k (Boltzmann's constant) * ln 2 = 9.57e-24 Joule/Kelvin of thermodynamic entropy.

Second: my first conclusion above doesn't really matter because under realistic conditions, the information contribution to thermodynamic entropy is so small that it can be safely ignored. For example, one terabyte (2⁴³ bits) of information corresponds to only 8.42e-11 J/K of thermo-entropy, which is the same as the entropy difference between 1 cc of water (about a thimbleful) at a temperature of 300 Kelvin (about room temperature) and the same amount of water at a temperature of 300.000000006 Kelvin (also pretty dang close to room temperature). This is a huge amount of information, but a negligible amount of thermo-entropy.
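The arithmetic behind these two figures can be checked with a minimal Python sketch. It assumes that 1 cc of water is about 1 g with a specific heat of roughly 4.18 J/(g K), and that the temperature rise is small enough for dS ~= C dT / T to hold; the function names are made up for illustration.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K (exact SI value)

def bits_to_thermo_entropy(bits):
    """Thermodynamic entropy (J/K) equivalent to `bits` of Shannon-entropy."""
    return bits * K_BOLTZMANN * math.log(2)

def warming_for_entropy(delta_s, heat_capacity, temperature):
    """Temperature rise (K) that produces entropy change delta_s.

    Uses dS = C ln(T2/T1) ~= C dT / T, which is fine when dT << T.
    """
    return delta_s * temperature / heat_capacity

per_bit = bits_to_thermo_entropy(1)
terabyte = bits_to_thermo_entropy(2 ** 43)
# Assumption: 1 cc of water ~ 1 g, specific heat ~ 4.18 J/(g K).
d_t = warming_for_entropy(terabyte, heat_capacity=4.18, temperature=300.0)

print(f"1 bit      -> {per_bit:.3e} J/K")
print(f"1 terabyte -> {terabyte:.3e} J/K")
print(f"same entropy as warming 1 cc of water at 300 K by {d_t:.2e} K")
```

Under those assumptions the warming comes out at a few billionths of a kelvin, which is the sense in which the information term is negligible.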

Third: it's possible to take my first conclusion even further, and regard a physical system's entire thermodynamic (well, stat-mechanical actually) entropy as the Shannon-entropy of its microstate (i.e. its precise physical state), multiplied by k ln 2. In fact, if you regard k ln 2 as simply a units conversion factor (from bits to thermo-style units) then the system's Boltzmann-entropy is exactly the Shannon-entropy of its microstate.

Fourth: there's a bit of a controversy about whether Shannon-entropy should be regarded as information, or the opposite of information. I'll argue that the former view makes more sense, although neither view gives a definition of "information" much like the commonsense definition most people use. (For the moment, I'll simply note that erasing a computer's memory decreases its Shannon-entropy. Does that sound like an increase in information to you?)

Fifth: the connection between information and thermodynamics is even more irrelevant to the creation(/intelligent design)/evolution argument than my second conclusion would suggest, because Shannon-entropy is not monotonically related to the sort of information that creationists(/ID-ists) think evolution can't produce. For example, consider 3 strands of DNA, all of the same length: A) one copied exactly from another DNA strand; B) one coding for a completely novel, but viable, living organism; and C) one with a completely random sequence of bases. Of these strands, C is the least constrained, and therefore has the highest Shannon- and thermo-entropies; A is the most constrained, and therefore has the least (new) Shannon-entropy, and the least (new) thermo-entropy. B is in between the other two, despite the fact that it's the only one that contains any novel information of the sort creationists(/ID-ists) are concerned about. Put another way: if thermodynamic entropy were the relevant constraint, it should be easier for a completely novel organism to appear, than for an existing organism to be replicated; it's not, therefore thermodynamic entropy is not the relevant constraint.

So, if this is all as irrelevant as I say it is, what's the point? Why even bother? Well, some people find this sort of angels-on-the-head-of-a-pin-counting fascinating, and I am one such person. Therefore, let's get down to business. I'll start with a historical approach to the problem.

(Note: most of this is summarized from Maxwell's Demon: Entropy, Information, Computing, by H. S. Leff and A. F. Rex (1990) and the papers they reprint; if you're interested in the subject, I recommend tracking down a copy of this book. It's out of print, but don't let that stop you...)

The connection between entropy and information was first made in analysing Maxwell's Demon. For those unfamiliar with this, it's a thought experiment James Clerk Maxwell came up with around 1867, which seemed to imply that an intelligent being (aka the demon) could decrease entropy. The demon guarded a very small door between two rooms full of gas (initially at the same temperature, pressure, etc). The demon could open and close the door, depending on the gas molecules approaching it at any given moment (remember, it's a very small door). In the original version, he opened it only when a slower-than-average molecule was approaching from (e.g.) the right, or when a faster-than-average molecule was approaching from the left. After he'd been at this a while, the gas in the room on the right would be hotter than the gas on the left, and the total entropy of the gas would have decreased. Unless there was also a compensating entropy increase involved, this would violate the second law of thermodynamics.

(There are also many variants, such as one in which the demon only lets molecules pass in one direction, thus producing a pressure difference; it comes to much the same thing.)

(Note: some people seem to think that the second law of thermodynamics applies differently to intelligent beings and/or their creations, than to unintelligent undesigned things. This is incorrect; the second law in its standard formulations applies equally to humans, steam engines, and rocks. Maxwell's demon is interesting in part because it seems to imply an exception to this equality before the law.)

('Nother note: although this hypothetical being is generally referred to as a demon, this should not be taken to mean it's either supernatural or evil. It's just the name the wee beastie's known by, that's all.)

The apparent problem presented by Maxwell's demon has been "resolved" in various ways over the years. The first resolution of interest here is due to Leo Szilard (1929), who showed that there was an entropy increase associated with making and remembering the measurements (e.g. of approaching gas molecules) necessary for a Maxwell's demon to do its work. He also derived a lower limit for that entropy increase, which reduces to k ln 2 in the case of a symmetric binary measurement. Unfortunately, it's a bit hard to tell from his paper exactly how and where this entropy increase appears.

Leon Brillouin (1950a, 1950b, 1956) had a go at clarifying matters by showing that in order to see the incoming gas molecules against the background of thermal radiation put out by more distant gas molecules (and the walls, etc), the demon must illuminate the area around the door with a high-temperature light source, which heats the surrounding gas and thus produces enough entropy to satisfy the second law (he did not, as far as I can tell, consider the possibility that there might be more efficient ways to detect gas molecules). He then went (IMHO) completely overboard in identifying information with negative entropy, leading to much of the confusion on the subject we're used to seeing pop up from time to time in talk.origins (and elsewhere). To be fair, his conclusions are not unreasonable: given that information can only be gotten at the cost of an entropy increase elsewhere, and that one can then spend information (by opening/closing the door) to get a decrease in entropy elsewhere, it makes sense to think of information as a sort of anti-entropy. It just turns out to be wrong (or at least highly misleading).

The first bit of counterevidence came from Rolf Landauer's (1961) analysis of the limits of the thermodynamic efficiency of computation. He showed that logically irreversible operations -- essentially, those that destroyed information -- produced entropy. As Charles H. Bennett and Landauer (1985) put it:

Landauer also argued that logical irreversibility was a necessary feature of (useful) computation, since performing an entire computation reversibly would require preserving not only the input data, but also an impractical quantity of intermediate results from the entire course of the computation. Bennett (1973) found a way around this: by running the entire computation (keeping piles of intermediate results), copying the desired part of the result, then running the computation in reverse to eat up those unwanted intermediate results, it would be possible to wind up with just the input data and desired output data. This opened up the theory of reversible computation, which is even more thoroughly irrelevant than most of what I'm talking about, so I'll just refer the interested reader to Richard Feynman (1996) and Bennett (1982), and get back to the topic at hand...

Bennett (1982, section 5) also dropped the other shoe on Brillouin, by showing that there are more efficient ways to make measurements. As he puts it:

Ironically, while this moots Brillouin's analysis of the demon, it actually fits well as a refinement of Szilard's. To oversimplify just a bit, Szilard broke down the demon's phases of operation thus:

Szilard showed that phase 1 must produce entropy. Bennett broke it down further:

...and showed that the entropy increase comes in phase 1a, not 1b. This puts rather the opposite spin on things: entropy increase is associated, not with the production of information, but with its destruction; hence information is not the opposite of entropy, it's more-or-less the same thing as entropy. In fact, if the demon has a large memory capacity, it's possible to omit step 1a for a number of cycles of the demon's operation, and produce an arbitrarily large decrease in the gas's entropy at the cost of filling the demon's memory with the results of old measurements. In this way, it is possible to convert the gas's entropy into information. It's still entropy, mind you -- the entropy of the demon's memory banks increases with the amount of information stored there, so all we've really accomplished is to move entropy from the gas to the demon's memory, and change its form somewhat.

Actually, Bennett (still 1982, section 5; see also Feynman 1996, pp. 137-148) has an even more elegant way to accomplish this conversion. He imagines a heat engine that takes in blank data tape and heat, and produces work and tape full of random data. The principle is fairly general, but let's use a version in which each bit along the tape consists of a container with a single gas molecule in it, and a divider down the middle of the container. If the molecule is on the left side of the divider, it represents a zero; if it's on the right, it represents a one. The engine is fed a tape full of zeros, and what it does with each one is to put it in contact with a heat bath at temperature T, replace the divider with a piston, allow the piston to move slowly to the right, and then withdraw the piston and replace the divider in the middle (trapping the gas molecule on a random side). While the piston moves to the right, the gas does (on average) kT ln 2 of work on it, and absorbs (on average) kT ln 2 of heat from the bath (via the walls of the container). Essentially, it's a single-molecule ideal gas undergoing reversible isothermal expansion. And while the results on a single bit will vary wildly (as usual, you get thermal fluctuations on the order of kT, which is as big as the effect we're looking at), if you do this a large number of times, the average will tend to dominate, and things start acting more deterministic.

Now, the operation of this engine is thermodynamically reversible. That means that just as it can convert heat+blank tape into work+random tape, it can equally well be run in reverse to convert work+random tape into heat+blank tape. It also means we can apply the formula for entropy change in a reversible process, dS = dQ/T, to show that the random tape has k ln 2 per bit higher entropy than the blank tape.
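The kT ln 2 of work per bit, and the k ln 2 of entropy that follows from dS = dQ/T, can be checked numerically. This is a minimal Python sketch that assumes a one-molecule ideal gas (p = kT/V) and an arbitrarily chosen bath temperature of 300 K; `isothermal_work` is an illustrative name, not anything from Bennett or Feynman.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def isothermal_work(temperature, v_start, v_end, steps=100_000):
    """Work done by a one-molecule ideal gas expanding at fixed temperature.

    Integrates p dV numerically (midpoint rule) with p = kT / V.
    """
    dv = (v_end - v_start) / steps
    work = 0.0
    for i in range(steps):
        v_mid = v_start + (i + 0.5) * dv
        work += K_BOLTZMANN * temperature / v_mid * dv
    return work

T = 300.0  # assumed bath temperature in kelvin
# The piston starts at the divider (half the container) and ends at the wall.
work = isothermal_work(T, v_start=0.5, v_end=1.0)  # volume units cancel
heat_in = work          # isothermal ideal gas: internal energy is unchanged
delta_s = heat_in / T   # dS = dQ/T for the reversible process

print(f"work per bit    : {work:.4e} J")
print(f"kT ln 2         : {K_BOLTZMANN * T * math.log(2):.4e} J")
print(f"entropy per bit : {delta_s:.4e} J/K")
```

Note that this only reproduces the average behaviour; it says nothing about the kT-sized fluctuations on any single bit.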

At this point, I should probably digress a bit on the question of just what sort of information it is we're talking about here. By the commonsense definition most of us generally use, information is closely associated with meaning (one version I rather like, is that information is defined as data together with its interpretation). But while one can regard the "information" in the demon's memory banks as meaningful (i.e. it records the former state of the gas particles), it's rather hard to think of any way to claim that the random output tape of that heat engine means anything at all. The first thing you have to realize is that the relevant definitions of information are those from statistical information theory, which are quite different from the commonsense definition. For one thing, they take no notice at all of meaning. As Claude Shannon (1948) puts it,

... and Shannon's measure of information, which he dubbed entropy, is essentially a measure of the size of that set of messages (or, if you prefer, a measure of how unconstrained the message is). The random tape has more possible sequences (is more unconstrained) than the blank tape, and therefore has higher Shannon-entropy, and in this interpretation, more information.

This is not the only interpretation of Shannon-entropy, and the lack of constraint it measures. The other view, which I think originated with Norbert Wiener (1948) (and was adopted by Brillouin and many others), identifies information with decreases in Shannon-entropy, increases in constraint, the elimination of possibilities, and (as it is usually put) the resolution of uncertainty. The two views aren't as contradictory as they might appear; they both (mostly) use the same math, they just interpret the various quantities differently. I think the difference can best be illustrated by an example: suppose Alice has some information (I don't particularly care what it is -- the result of a measurement, a coin flip, or something she wrote -- whatever), which she keeps secret from everyone else. Wiener would say that to Alice, her secret is information, and to everyone else it's the opposite of information. I dislike this approach for several reasons, one of which is its subjectivity; I don't like having the amount of information associated with something depend on who you ask. I'd rather say that Alice's secret is information; from Alice's point of view it's information she does have (i.e. knowledge), and from everyone else's point of view it's information they don't have (i.e. ignorance, or uncertainty). Wiener views knowledge as the opposite of ignorance; I regard them as the same thing, viewed from opposite directions (much like, say, forward and backward).

Another objection to Wiener's interpretation is that, while Shannon-entropy is certainly useful as a measure of uncertainty, it's also often useful as a measure of other things, some of which even Wiener would agree are information. Quoting Shannon (1948) again:

Information theory was originally developed for analysing communication, so let's look at an example of communication: Suppose Alice sends Bob an email message consisting of two enclosed documents, which I'll call X and Y. Suppose some mail server along the route loses the second attachment, so Bob only receives X. To simplify things, assume X and Y are statistically independent, that the mail server's behavior is deterministic, and that the message carries no information other than the two attachments. Let's look at some of the relevant measures of information/uncertainty/whatever involved:

So, since X is the portion of the message that was successfully communicated, and the entropy of X is the amount of information successfully communicated, how am I to escape the conclusion that the amount of information carried by X is precisely the entropy of X?
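The email example can be made concrete with a toy calculation. The sketch below is in Python and rests on assumptions that are not in the post: the probabilities for X and Y are invented, and the quantities computed (entropy sent, entropy received, mutual information, equivocation) are the standard information-theoretic ones, not necessarily the measures the post originally listed.

```python
import math
from collections import defaultdict
from itertools import product

def entropy(dist):
    """Shannon-entropy in bits of a {outcome: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, index):
    """Marginal distribution of one component of a joint distribution."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[index]] += p
    return dict(out)

# Hypothetical, statistically independent attachments.
p_x = {"x1": 0.5, "x2": 0.25, "x3": 0.25}
p_y = {"y1": 0.5, "y2": 0.5}

# Joint distribution over (sent, received): Alice sends (x, y), the
# deterministic mail server drops y, so Bob receives x alone.
joint = {((x, y), x): p_x[x] * p_y[y] for x, y in product(p_x, p_y)}

h_sent = entropy(marginal(joint, 0))       # H(X) + H(Y)
h_received = entropy(marginal(joint, 1))   # H(X)
h_joint = entropy(joint)
mutual_info = h_sent + h_received - h_joint  # information that got through
equivocation = h_joint - h_received          # what Bob is still missing

print(f"sent {h_sent} bits, received {h_received} bits")
print(f"mutual information {mutual_info} bits, equivocation {equivocation} bits")
```

With these made-up numbers the mutual information between what was sent and what was received equals the entropy of X, and the equivocation equals the entropy of the lost attachment Y.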

I should perhaps emphasize how little either view of Shannon-entropy has to do with what we usually call "information". Consider four possibly information-containing symbol sequences:

B is the least constrained, and thus has the highest Shannon-entropy, so by my interpretation it has the most information. C is the most constrained, so it has the lowest Shannon-entropy, and by Wiener's interpretation the highest information (or maybe he would say we have the most a priori information about its sequence). But A is the only one that means anything (oh, come now, some of this must mean something), and thus the only one carrying any commonsense-information. But its Shannon-entropy is in between B and C, and about the same as D.
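A rough Python sketch shows the same ordering on some stand-in sequences. The strings below are hypothetical and are not the sequences from the post, and the estimate is a crude one: it uses single-symbol frequencies only, so it ignores correlations between symbols and cannot tell meaningful text from a scramble of the same letters.

```python
import math
import random
import string
from collections import Counter

def per_symbol_entropy(seq):
    """Zero-order estimate in bits/symbol, from symbol frequencies alone."""
    n = len(seq)
    return sum(c / n * math.log2(n / c) for c in Counter(seq).values())

rng = random.Random(0)  # fixed seed so the run is repeatable
alphabet = string.ascii_lowercase + " "

# Hypothetical stand-ins, chosen only for illustration.
text = ("the connection between entropy and information was first made "
        "in analysing maxwells demon")
scrambled = "".join(rng.sample(list(text), len(text)))  # same letters, no meaning
noise = "".join(rng.choice(alphabet) for _ in range(5000))
constant = "a" * len(text)

entropies = {
    "text": per_symbol_entropy(text),
    "scrambled": per_symbol_entropy(scrambled),
    "noise": per_symbol_entropy(noise),
    "constant": per_symbol_entropy(constant),
}
for name, h in entropies.items():
    print(f"{name:10s} {h:.3f} bits/symbol")
```

The uniformly random string scores highest and the constant string scores zero, while the meaningful text and its scramble score the same and land in between, so the measure takes no notice of meaning.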

I should also point out how little all of this matters. If there were no connection between Shannon-entropy and thermo-entropy:

The connection creates exceptions to both of these principles, but the exceptions are so small that they can usually be safely ignored:

Now, these effects can get significant when dealing with atomic-scale memory devices, such as DNA and RNA. Bennett (1982, section 5) points out that the way RNA strands are synthesized in a sequence-specific way (by RNA polymerase) and degraded in a sequence-nonspecific way (by enzymes such as polynucleotide phosphorylase) wastes about kT ln 4 (~= 1.4 kT) of free energy per base. This is a small, but not completely insignificant, part of the free energy consumed by the process (about 20kT for synthesis; he doesn't give the figure for degradation).

(Actually, the RNA example is worth looking at in more detail than I'm going to bother with here. kT ln 4 per base is enough to throw the equilibrium reactant concentrations off by a factor of 4 between the sequence-specific and -nonspecific reactions, which should be experimentally measurable [but I haven't seen a measurement of it yet]. This difference makes sense from the point of view of kinetics: a sequence-specific synthesis reaction can, at any given moment, only react with one-quarter of the available bases that a nonspecific reaction could. Also, it's interesting to examine exactly where the entropy decrease due to sequence-specific synthesis is: in the DNA, the RNA, or only in the entropy of both taken together? I'd argue for the last, meaning that this is one of those cases where entropies don't sum nicely.)
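The RNA figures follow from a few lines of arithmetic, sketched here in Python. The sketch assumes four equally likely, independent bases and takes the 20 kT synthesis cost quoted from Bennett at face value; the variable names are illustrative only.

```python
import math

BASES = 4  # assumption: four equally likely, independent bases

bits_per_base = math.log2(BASES)          # Shannon-entropy per base
waste_in_kt = math.log(BASES)             # kT ln 4, in units of kT
equilibrium_shift = math.exp(waste_in_kt) # Boltzmann factor for that energy
share_of_synthesis = waste_in_kt / 20.0   # against the ~20 kT quoted above

print(f"{bits_per_base} bits per base")
print(f"kT ln 4 = {waste_in_kt:.3f} kT")
print(f"equilibrium shifted by a factor of {equilibrium_shift:.2f}")
print(f"about {share_of_synthesis:.1%} of the synthesis free energy")
```

The factor of 4 is just exp(ln 4), which is why the thermodynamic figure and the one-quarter kinetic argument agree.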

Before ending this section, I'd like to note that these consequences -- such as they are -- are all in the realm of thermodynamics, not information theory. There's still no information-theoretic analog of the second law of thermodynamics. Abstract information-processing systems can still produce and destroy Shannon-entropy without limit. The only time there's any limit on the increase or decrease of Shannon-entropy is when it's encoded in the state of a system that's subject to thermodynamics, and then the limits are due to thermodynamics, not any information-theoretic consideration.

Back in my conclusions, I promised to go overboard with this connection (but over the other side of the boat from Brillouin), and have a go at identifying all thermo-entropy (Boltzmann-entropy, actually) with the Shannon-entropy of a system's microstate. This is really just a simple bit of math, given the similarity between Boltzmann's formula for the entropy of a system in terms of the probabilities of all of the microscopically distinct states it might be in:

    S = -k \sum_i p_i \ln p_i

and Shannon's formula for the entropy of a set of possible messages (or states, or whatever) in terms of their probabilities:

    H = -\sum_i p_i \log p_i

The choice of a base for the logarithm in Shannon's formula is essentially arbitrary, except that it determines the units of the result. Base 2 is traditional, because it gives the result in bits, which are the most popular units for measuring information. But they're not the only legitimate unit. If you happen to want the entropy in trits, just use base 3; for decimal digits, use base 10; for nats, use base e (natural log, just like the Boltzmann formula). And if you want the result in Joules per Kelvin you can use base e^7.243e22, or take the result in nats and multiply by k. Either way, you'll get the same result you would've from Boltzmann's formula.
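The units argument can be checked with a short Python sketch. It assumes the standard forms of the two formulas (S = -k sum p ln p and H = -sum p log p) and an invented four-state distribution. Because e^7.243e22 is far too large to hold in a float, the hypothetical `shannon_entropy` helper takes the natural log of the base rather than the base itself.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def shannon_entropy(probs, ln_base):
    """Shannon-entropy with logs taken to the base whose natural log is ln_base."""
    return -sum(p * math.log(p) for p in probs if p > 0) / ln_base

def boltzmann_entropy(probs):
    """Thermodynamic entropy in J/K, assuming the form -k * sum(p ln p)."""
    return -K_BOLTZMANN * sum(p * math.log(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]  # made-up microstate probabilities

bits = shannon_entropy(probs, math.log(2))
trits = shannon_entropy(probs, math.log(3))
nats = shannon_entropy(probs, 1.0)
# Base e^(1/k): its natural log is 1/k, roughly 7.243e22.
joules_per_kelvin = shannon_entropy(probs, 1.0 / K_BOLTZMANN)

print(f"{bits} bits = {trits:.4f} trits = {nats:.4f} nats")
print(f"{joules_per_kelvin:.4e} J/K via the huge base")
print(f"{boltzmann_entropy(probs):.4e} J/K via Boltzmann's formula")
```

Multiplying the result in nats by k, or the result in bits by k ln 2, gives the same number, which is the sense in which k ln 2 acts as a units conversion factor.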

(Actually, if you want me to do this units stuff properly, let me claim that information, temperature, and energy are dimensionally related: temperature = energy / information; 1 Kelvin = 9.57e-24 Joule/bit; and Boltzmann's constant is properly written k = 1.38e-23 Joule/(Kelvin*nat) = 9.57e-24 Joule/(Kelvin*bit) = 1. If you allow that, I can use the same units for information and thermo-entropy without blinking.)

What sense can we make of this? I'd say that it means the entropy of a system is the total amount of information represented by the system's state. Wiener's followers would prefer to say it means the entropy of a system is a measure of our uncertainty of (or lack of information about) the precise state of that system. Either way, it doesn't affect the physics at all -- they're just different ways of looking at the same old familiar entropy.

Bennett, Charles H. (1973), "Logical reversibility of computation", IBM Journal of Research and Development, v. 17, pp. 525-532. Reprinted in Leff and Rex (1990), pp. 197-204.

Bennett, Charles H. (1982), "The thermodynamics of computation -- a review", International Journal of Theoretical Physics, v. 21, pp. 905-940. Reprinted in Leff and Rex (1990), pp. 213-248.

Bennett, Charles H. and Rolf Landauer (1985), "The fundamental physical limits of computation", Scientific American, v. 253, pp. 48-56.

Brillouin, Leon (1950a), "Maxwell's demon cannot operate: Information and entropy. I", Journal of Applied Physics, v. 22, pp. 334-337. Reprinted in Leff and Rex (1990), pp. 134-137.

Brillouin, Leon (1950b), "Physical entropy and information. II", Journal of Applied Physics, v. 22, pp. 338-343.

Brillouin, Leon (1956), Science and Information Theory, Academic Press Inc, New York.

Feynman, Richard P. (1996), edited by Anthony J. G. Hey and Robin W. Allen, Feynman Lectures on Computation, Addison-Wesley, ISBN 0-201-48991-0.

Landauer, Rolf (1961), "Irreversibility and heat generation in the computing process", IBM Journal of Research and Development, v. 5, pp. 183-191. Reprinted in Leff and Rex (1990), pp. 188-196.

Leff, Harvey S. and Andrew F. Rex, eds. (1990), Maxwell's Demon: Entropy, Information, Computing, Princeton University Press, New Jersey, ISBN 0-691-08727-X and 0-691-08726-1.

Shannon, Claude E. (1948), "A mathematical theory of communication", Bell System Technical Journal, v. 27, pp. 379-423 and 623-656. Reprinted in Claude E. Shannon and Warren Weaver, The Mathematical Theory of Communication (University of Illinois Press, Urbana, 1949); and http://cm.bell-labs.com/cm/ms//what/shannonday/paper.html

Szilard, Leo (1929), "On the decrease of entropy in a thermodynamic system by the intervention of intelligent beings", Zeitschrift für Physik, v. 53, pp. 840-856. English translations: Behavioral Science, v. 9, pp. 301-310 (1964); B. T. Feld and G. Weiss Szilard, The Collected Works of Leo Szilard: Scientific Papers, (MIT Press, Cambridge, 1972), pp. 103-129; J. A. Wheeler and W. H. Zurek, Quantum Theory and Measurement (Princeton University Press), pp. 539-548; and Leff and Rex (1990), pp. 124-133.

Information and Thermo-Entropy