Pseudogenes: Matching ad-hoc rationalizations against constrained explanations

Post of the Month: July 2014

Howard Hershey

Subject:    | From a Biblical Creationist perspective ...
Date:       | 04 Jul 2014
Message-ID: | ef319927-e824-4598-bcf5-a5a120caf3cf@googlegroups.com

The first quote being replied to is from a Biblical Young Earth Creationist named itbgcthate9 who posted here claiming that the designation "pseudogene" is an evolutionist interpretation.
All odd inset quotes (">" and ">>>") are from him.
The even inset quotes (">>" and ">>>>") are earlier replies from Roger Shrubber.

> My terminology was correct. That pseudogenes are non-functional is an evolutionary
> interpretation. Molecular biologists are starting to come to a different conclusion.

Howard Hershey begins his Post of the Month:

That pseudogenes cannot produce a full-length protein that has the same function as the related non-pseudogene is an interpretation based on an understanding of how the genetic code works (known since about the mid-1960s). That is, it is based on a knowledge of molecular biology. The *identification* of pseudogenes required the ability to sequence DNA and identify similarities in sequence. How, exactly, does any of this, either recognizing that a sequence cannot produce a functional protein or that two sequences are related, involve an evolutionary interpretation as an a priori assumption?

I agree that finding such related but non-functional sequences, combined with the knowledge that duplications/deletions of stretches of DNA are among the *already known* types of mutation that can occur in organisms (and often without significant harm to organismal function), does lead one to conclude that such duplications (harmless in themselves) could allow further accumulation of mutations that would lead to a DNA sequence with multiple errors wrt being translated into a protein. That is, evolution certainly can *explain* why pseudogenes can be found in organismal DNA. That is what scientific theories do. And nothing about that explanation would exclude the possibility of some fraction of pseudogenes acquiring a new functionality. Indeed, some functional genes are known to have arisen by the fusion of two unrelated genes.

Your theory, that *all* identified pseudogenes must have crucially important functions and must have been designed for that function, looks untenable. Finding a few examples of pseudogenes that have some detectable function is not evidence that *all* pseudogenes have functionality. The way to test your idea would be to delete a pseudogene and see if that has any significant biologic consequence. Best to choose a pseudogene like the broken vitamin C synthesis gene.

From "Roger Shrubber":

>>>> We have observed such pseudogenes to arise by well understood mechanisms of mutation.
>>>> That's a fact. No evolutionary theory yet.

From "itbgcthate9":

>>> Mutation is a fact, we agree on that.

>>>> Adding evolutionary theory, we expect that when this sort of damage occurs to
>>>> non-essential genes, they will be inherited by subsequent generations but be subject
>>>> to further mutation at the rate of mutation with no selective pressure to not mutate.

That is, evolution can *explain* the existence of sequences that are related to a functional protein-coding gene but do not actually encode any functional protein. That is what theory does.

>>> How do you know they are 'non-essential'? I used the term 'junk DNA' but you didn't
>>> and perhaps it isn't. I said I think we need to learn a lot more about exactly whether
>>> this is junk. It may turn out to be no friend of evolution.

Deletion is a pretty good (but not perfect) test for non-essentiality. Creationists are the ones claiming that *all* DNA is "essential". They have to demonstrate that universal essentiality. I would say that no observable effect after deletion and no ability to make any protein or a full, functional protein are pretty fair tests (the latter can be gotten from sequence alone). Moreover, there is a lot of DNA not in pseudogenes, functional genes, nor in regions that have any obvious regulatory effect in genes where surrounding sequence is searched for sequences that have a significant regulatory effect (for example by doing mutations) on a gene. In addition it is clear that even within coding sequences there is quite a bit of sequence change that has little or no effect (which is why the same protein from relatively distant animals can differ).

From "Roger Shrubber":

>> In the case of biosynthetic routes for vitamins that are found in ready supply in food
>> stuffs, those routes are non-essential. That's pretty obvious. There are organisms that
>> can synthesize all 20 genetically encoded amino acids but many that do not. Those that
>> do not get those amino acids from their diet. Vitamins work the same way. This is a
>> familiar concept to those who have studied biology, or at least it should be. It also
>> forms part of evidences for evolution.

> You have assumed that because pseudogenes do not participate in synthesis

They do not form a functional protein (often none at all), if that is what you mean by "participate in synthesis".

> they perform no biological function. This assumption may be correct, but evolutionary
> preconceptions have hampered research in this area,

No, they haven't. Everyone recognized that non-coding sequences *can* have regulatory effects. In fact, a great deal of effort has gone into finding the typically short sequences imbedded in long stretches of DNA that have significant regulatory effects. That is what led scientists to regions like HOX boxes.

> and functions may yet be found.

But probably won't. Again, lots of mutational studies have shown us that much eucaryote DNA sequence can be grossly mutated, including by deletion as well as point mutation, with little or no detectable functional consequence. Even coding sequence can be significantly changed (including small deletions that don't change reading frames) and still produce quite normally functioning proteins that are equivalent in function. This can be seen by the fact that there are many proteins in different organisms that perform the same function with quite different sequences.

> You should be more concerned with how evolutionary mechanisms created these complex
> routes in the first place, instead of how easily they seem to break them (assuming they
> are broken).

Interestingly enough, the same mechanism that often leads to pseudogenes (duplication followed by further mutation) can produce related proteins that have slightly different functions. That is why we often see *families* of genes. Similarly, we sometimes (but more rarely) see fusion genes (due to deletion or certain kinds of duplication or DNA rearrangements) that have a new or novel functionality.

>>>> Our observations of pseudogenes finds that in a way that is consistent with our
>>>> models of common descent previously determined by the fossil record.

>>> This is an interpretation.

Of course it is. That is what scientific theory does; it explains (interprets) observations in a consistent and constrained way.

>>> Another is that God used the same 'blueprint' in His creation.

The difference between a scientific theory and your religious 'theory' or explanation is that your theory is unconstrained and can explain anything by an appeal to the old "that's the way God wanted to do it in his unexplained 'blueprint'". It is also inconsistent with evidence in that you imply that all DNA must have exactly the sequence it does have, while we *know* that all individual organisms do not have the same sequence and many function quite well.

>> That is an ad-hoc rationalization. Do you understand the difference between a
>> scientific inference from data and a rationalization to support a preformed conclusion?

> The problem is your "scientific inference" is nothing more than interpretation, based on
> assumptions and incomplete data.

Sure. And so is yours. The difference is that the scientific explanation is consistent with the evidence and constrained to non-supernatural explanations while your interpretation is inconsistent with the evidence (assuming that you are claiming that all DNA has significant utility) and also is unconstrained, relying on appeals to a supernatural entity that can explain anything.

>>>>> What is strange, however, is that natural selection hasn't eliminated this so called
>>>>> junk. Further, because of the lack of selective pressure, why has this junk not
>>>>> mutated beyond all recognition?

Subject change: selective pressure (or not) on junk DNA

There is a difference between selective pressure to eliminate 'non-coding' sequence and the lack of selective pressure to eliminate it. Only *if* there is selective pressure to eliminate it would most (excluding only the most recent examples) non-coding DNA be eliminated. The opposite of selection (positive or negative) is neutral drift. That is what happens to 'junk'.

Given enough time, junk would indeed eventually be mutated beyond recognition. In that case it would simply be non-coding DNA that doesn't have any detectable functional counterpart (either internally or in another species). Pseudogenes do differ in their amount of similarity to their functional counterpart, from those that are barely different (a point mutation that can sometimes revert back to function) to those that are barely recognizable.

>>>> Why do you think this is strange? We know the rate of mutation. We know the cost of
>>>> carrying extra DNA. The cost is so negligible that there's no real selective pressure
>>>> to eliminate extra DNA.

That is the case for *most* eucaryotes, but not all. Some of the yeasts do appear to undergo selection for small genomes.

>>> The cost is substantial, and mechanisms for removal are known.

Really? Evidence. Particularly for the mechanisms of removal. Deletion would work, but we are talking about removal of only those sequences that serve no coding, regulatory, or other function. That is quite a bit harder to do.

>> The cost is not substantial. The metabolic cost of maintaining the entire human genome
>> is less than 0.02% of the energy of a cell. That's the whole genome. The cost of a
>> single gene will be more than 100,000 fold less than that. So less than 2 parts per
>> million of the energy cost of maintaining a cell. If one made the crude approximation
>> of this being a survival advantage, the selection coefficient would be too small to be
>> eliminated within 40 million generations.

>> I'm thinking you don't actually understand the theory of evolution that you want to
>> reject.

> You have omitted the cost of manufacturing this "junk" in the first place.

Duplication. Rearrangement (translocation, inversion). Single events can produce significant amounts of repetitive DNA. Almost no cost.

> Even if the cost is small relative to other cellular activities, if it represents over
> 90% of the genome, compounded over the course of millions of generations then it's a
> substantial cost. It becomes even worse if some of the "junk" is translated, which is
> supposed to be a requirement for evolution.

Didn't you read what he wrote? Or didn't you understand it? And evolution has no ability to predict the future. It can only respond to the impact of the change after it has occurred compared to what exists without that change. And the cost of most smallish duplications (or even polyploidy for many organisms) is energetically negligible.
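The cost figures quoted above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names are invented for this example, the effective population size of 10,000 is a hypothetical value, and the test "drift dominates when |s| < 1/(2*Ne)" is a common rule of thumb rather than anything stated in the thread. It also follows the quoted "crude approximation" of treating the energy fraction as if it were a selection coefficient.

```python
def per_gene_cost_fraction(genome_cost_fraction, fold_reduction):
    """Fraction of a cell's energy budget attributable to one gene-sized stretch."""
    return genome_cost_fraction / fold_reduction


def drift_dominates(selection_coefficient, effective_population_size):
    """Rule of thumb (an assumption here): selection is ineffective
    when |s| is smaller than 1 / (2 * Ne)."""
    return abs(selection_coefficient) < 1.0 / (2 * effective_population_size)


# Figures quoted in the thread: whole genome < 0.02% of cell energy,
# a single gene more than 100,000-fold less than that.
genome_fraction = 0.02 / 100           # 0.02% expressed as a fraction
gene_fraction = per_gene_cost_fraction(genome_fraction, 100_000)

# Hypothetical effective population size, chosen only for illustration.
ne = 10_000

print(f"per-gene cost fraction: {gene_fraction:.1e}")
print(f"drift threshold 1/(2*Ne): {1.0 / (2 * ne):.1e}")
print("drift dominates:", drift_dominates(gene_fraction, ne))
```

On these numbers the per-gene figure comes out on the order of parts per billion, well inside the quoted bound of "less than 2 parts per million", and far below the drift threshold for any plausible population size.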

>>>> The pseudo genes do gain additional mutations through time but are still recognizable
>>>> as being very like previously functional genes that have been mutated.

>>> Another interpretation. There are a lot of places to look. It's not surprising that
>>> similarities can be found.

As I mentioned, there is a range of similarity in pseudogenes. If a sequence doesn't have enough similarity to a functional gene, it simply isn't called a pseudogene. It is just called non-coding DNA. And there is a lot of it, most of which has no detectable regulatory function either.

>> That's your worst excuse yet. If it was random sequence matches you would expect many
>> such random matches to one or two exons scattered across the genome rather than all 6
>> exons and all 5 introns in series. You don't. Instead you see an accumulation of
>> mutations consistent with there being no selection for sequence and an evolutionary
>> separation consistent with the mutations rates we observe to be acting today.

> If you compare the DNA of any two organisms, regardless of where they are on the
> evolutionist's tree of life, there is inevitably a portion of almost identical DNA.

Minimum length of DNA segment for unique identification

A stretch of random artificial DNA needs to be about 17 nt long to include all possible sequences present only once in a hypothetical 17 billion nt genome. So any identical sequences significantly longer than 18-30 nts can be assumed to be related by more than chance alone. Either they are closely related genes or duplicates or repetitive elements of various kinds. Similar arguments can be made for longer sequences that differ by x nts. The longer and more similar the sequences the more obvious it is that the sequences are not similar by chance alone. Pseudogenes, to be identified and called pseudogenes, must have significant detectable similarity *and* also have mutations that prevent them from producing functional proteins.
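The 17 nt figure can be checked with a short calculation. The sketch below is illustrative only: it assumes a uniformly random sequence over four equally likely bases (real genomes are not random), and the function names are made up for this example.

```python
def min_unique_length(genome_size_nt, alphabet_size=4):
    """Smallest k for which the number of possible k-mers
    is at least as large as the genome."""
    k = 1
    while alphabet_size ** k < genome_size_nt:
        k += 1
    return k


def expected_chance_matches(k, genome_size_nt, alphabet_size=4):
    """Approximate expected number of places where one specific k-mer
    appears by chance in a random genome of the given size."""
    return genome_size_nt / alphabet_size ** k


# The hypothetical 17 billion nt genome from the text.
genome = 17_000_000_000
print(min_unique_length(genome))  # 17, since 4**17 = 17,179,869,184

# Chance matches drop off steeply as the identical stretch gets longer.
for k in (17, 20, 30):
    print(k, expected_chance_matches(k, genome))
```

Each added nucleotide divides the expected number of chance matches by four, which is why identical stretches well beyond this length are taken to reflect relatedness rather than coincidence.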

> This is actually evidence for a common Creator, not common descent (the non-coding
> DNA in particular should have been mutated beyond all recognition).

Non-coding DNA that is not undergoing selection (positive or negative) will change at a rate of u per nt, the rate of fixation of a selectively neutral mutation. [u is the mutation rate per nt per generation. In a population of size N, there will be u*N new mutations per generation at any specific nt in a gene duplicate. The chance of eventual fixation of any specific neutral mutation is 1/N. Thus the rate of fixation is u*N/N = u per nt site per generation. u is typically around 10^-8.] Thus, a 1000 nt non-functional duplication (say one where only half the protein coding sequence is present) would have about a 10^-5 chance of a new mutation being fixed per generation. Without going into the math, it is pretty clear that it would take a significant number of generations before half of the 1000 nts (that is, 500 nts) changed to something else under conditions of selective neutrality.
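For anyone who does want the math, here is a minimal sketch of the neutral-fixation argument. The function names are invented for this example. The last function rests on a simplifying assumption that is not in the post: substitutions hit sites independently at rate u (a Poisson model), and back mutation and repeat hits that restore the original base are ignored.

```python
import math


def neutral_fixation_rate_per_site(u, population_size):
    """New mutations per generation at a site (u * N) times the chance
    that any one neutral mutation eventually fixes (1 / N)."""
    new_mutations_per_generation = u * population_size
    fixation_probability = 1.0 / population_size
    return new_mutations_per_generation * fixation_probability


def fixation_rate_for_segment(u, segment_length_nt):
    """Expected neutral fixations per generation across a whole segment."""
    return u * segment_length_nt


def generations_until_fraction_changed(u, fraction):
    """Generations until the given fraction of sites has been hit at least
    once, assuming independent Poisson substitutions at rate u per site."""
    return -math.log(1.0 - fraction) / u


u = 1e-8  # typical mutation rate per nt per generation, as given in the text

# Population size cancels out: the result is u whatever N is.
print(neutral_fixation_rate_per_site(u, 10_000))
print(neutral_fixation_rate_per_site(u, 1_000_000))

# The 1000 nt duplication from the text: about 1e-5 per generation.
print(fixation_rate_for_segment(u, 1000))

# Roughly 7e7 generations before half the sites have changed.
print(generations_until_fraction_changed(u, 0.5))
```

The population size drops out of the first function, which is the point of the bracketed derivation: under neutrality the long-term rate of change equals the mutation rate, however large or small the population.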

> Furthermore, we don't just see a random accumulation of mutations. There is
> considerable evidence that some mutations are not random, suggesting genomes
> were designed to mutate in certain "hotspots".

Unfortunately there is no correlation of these mutational hotspots with the functional effect of such a change. Achondroplastic dwarfism is one of the most common genetic abnormalities to occur in humans, and it is due to a mutational hotspot (in this case a point mutation). There are other hotspots for inversion, deletion, and translocation (often at or near repetitive sequences; e.g. fragile X). And other hotspots for small duplication/deletion events (e.g. Huntington's chorea). If that is part of the "design" that your God intended, He doesn't come off looking like a particularly nice guy.

[BTW, the likely reason why such sites are not strongly selected against is precisely because even the hotspot mutations are rare.]

> And there's another complication; in the so called "ultraconserved" regions there's
> been virtually no mutation at all in the supposed hundreds of millions of years.

These are easy to explain. It isn't that mutation doesn't occur at these spots. It is that when it does, the result is lethal or highly deleterious and the mutants are selected against. This typically occurs in proteins where nearly every amino acid in the protein connects to the substrate, such as histones or cytochrome c. Typically small proteins.

But that is a different beast from a nt site that, because of its flanking sequence, is harder to mutate than the average nt site, the opposite of being a mutational hotspot. It is likely that such sites that are harder than average to mutate are not under strong selective pressure either, but I don't know that for sure.

> Even for an evolutionist, the logical conclusion must surely be that pseudogenes do
> serve some kind of purpose.

Let's see. Pseudogenes are relatively easy to generate via duplication of some sort. But they aren't under strong (or even significant) selective pressure either to remain unchanged or to be eliminated by some mechanism that wouldn't introduce more damage. It is hard to see the mechanism that would *specifically* remove only the duplicate sequences. I fail to see why the logical conclusion would be that pseudogenes serve some purpose. And there are deletion and mutation experiments that indicate that they are not particularly crucial to any cell function in general (although that doesn't exclude that as a possibility).

>>>> I thought you claimed to understand what evolution claims? It does not appear that
>>>> you do. In fact, you are echoing some common misunderstandings about evolution in
>>>> your comments above.

>>> Sorry, but your clarifications haven't helped me.

Chimp versus human DNA sequence differences

>> Pay more attention. Do you know the rate of mutation that we observe today and how we
>> observe it?

> Measuring mutation rates is not an exact science. I don't think anyone knows what the
> actual rates are, or even if the current estimates are accurate. For example, one method
> is to use the observed "error rate" of DNA replication, but the results can vary widely.
> Further, replication isn't the only source of mutation and the problem may be compounded
> by mutations in the repair mechanisms themselves. Another method measures differences in
> the genomes of related individuals, but the identification of genuine new mutations is
> difficult, and again the results appear inconclusive (even in the narrow areas that have
> been researched).

> So the answer to your question is no, I don't know the rate of mutation we observe
> today. You are welcome to enlighten me.

I see that there are a lot of things you don't know. That you should. Or could, if you looked at the actual science.

>> Do you know the rate at which differences between various apes have been apparently
>> accumulating according to evolutionary models?

> The most commonly quoted rate seems to be 2.5 * 10^-8 per base pair per generation. But
> this is based on the measurement of nucleotide substitutions between humans and chimps,
> with assumptions about the time since "divergence" and the population size of the "MRCA"
> (among others). So common descent is assumed, and then "proved" by subsequent
> calculations!

Wrong again. The question asked was, "Is the amount of *observed* difference in the DNA sequences of modern humans and modern chimps consistent with the hypothesis that most of those differences are selectively neutral (which would be due to the *slowest* mechanism of DNA change), given the assumption that the time of divergence was 5-7 mybp (which is the best current scientific estimate) and assuming a certain generation time?" (The size of the MRCA [Most Recent Common Ancestor] population has very little to do with it.) Given those assumptions (which are the best ones we have), the answer was "Yes."

Now, if you have *evidence* that humans and chimps do not share any ancestry, you can present that evidence. If you have evidence that humans and chimps diverged only 3000 years ago, you can present that evidence. If you have evidence that *any* of the assumptions are wrong, you can certainly present that evidence.

The argument scientists make is that it is possible for the observed amount of difference between human and chimp DNA to have accumulated after divergence *if* humans and chimps did diverge, *if* the populations diverged around so many years before present, *if* mutation rates observed today have been the same throughout this time frame, and *if* myriad other assumptions hold. That is not a "proof". It is more evidence that the *observed results* are consistent with the other evidence in an evolutionary framework.
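The consistency check described here amounts to one line of arithmetic, sketched below. Several things are assumptions made for illustration: the function name, the 20-year generation time (the post only says "a certain generation time"), and the use of the 2.5 * 10^-8 figure quoted earlier in the thread as the per-generation rate. A real test would plug in an independently measured mutation rate, and this simple version ignores variation already present in the ancestral population.

```python
def expected_neutral_divergence(mutation_rate, years_since_split, generation_time_years):
    """Expected fraction of neutral sites that differ between two lineages.

    Under neutrality each lineage fixes changes at the mutation rate,
    so differences accumulate along both branches (hence the factor 2).
    """
    generations = years_since_split / generation_time_years
    return 2 * mutation_rate * generations


mutation_rate = 2.5e-8   # per base pair per generation, the figure quoted in the thread
generation_time = 20     # years; hypothetical value for illustration

# The 5-7 million year range given in the post.
for years in (5e6, 6e6, 7e6):
    divergence = expected_neutral_divergence(mutation_rate, years, generation_time)
    print(f"{years:.0e} years -> expected divergence {divergence:.2%}")
```

Changing any input (rate, split time, generation time) changes the output in a predictable way, which is exactly why the argument is open to attack: show that one of the inputs is wrong and the consistency disappears.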

Again, pick an assumption and show us the evidence that it is false. Don't complain that, given the evidence we have as to times, mutation rates, generations, etc., the results are consistent with an evolutionary explanation. Attack the assumptions by showing us that they are wrong. Show us that the *evidence* really shows that the amount of DNA difference we observe cannot be explained because you have evidence that, say, human and chimp fossils can be seen to be identical, unchanged, and present in all geologic strata back to the beginning of life on this planet. Show us *evidence* that there really has only been 4000 years since the universe was created. Show us that mutational rates changed dramatically between Egyptian mummies and modern humans. Whatever. Something that really attacks the real assumptions.

>> Do you know what the correspondence is between these two rates?

> Any correspondence between the observed rate, and the rate based on the assumption of
> common descent exists only in the imagination of an evolutionist. And this would be true
> even if the observed rate was accurately known. The assumed rate can easily be
> manipulated post-hoc by feeding in different parameters, so it doesn't explain anything.

Actually the only assumption is that the two lineages start with an identical sequence. Obviously the easiest way for the two lineages to start with an identical sequence is for them to be, at that time, members of an interbreeding population (aka, the MRCA). But you can also posit that, despite identical DNA or near identical DNA, the two modern species' lineages were created separate and yet the two were even then identifiable as chimp and human (rather than being neither modern species; as fossils we really only know the human side of things). Perhaps DNA didn't matter as much back then. Finding the fossil modern human and modern chimp in geological layers where we only see hominid species that are now extinct would help your cause. Especially since you probably also think that all these species (modern and extinct) were created at the same time (and in a very short time-frame, a mere week).

>>>>> From a YEC perspective, we'd expect some commonality between all created forms
>>>>> (pointing to a single designer). However, I think we need to learn a lot more before
>>>>> jumping to conclusions about so called junk DNA.

>>>> You do realize that does not make any sense, right?

>>> No, sorry.

Vitamin C synthesis mutation common to all great apes

>>>> Why would you expect all great apes to have stretches of DNA that look like the final
>>>> gene needed to make Vitamin C but that don't work to make vitamin C?

>>> Mutation.

Sure. But mutation in what species? All of the great apes simultaneously? All having evidence of the same initial inactivating mutation? Rather than an ancestor having the mutation which is passed on, with subsequent changes in different lineages, to all the descendants? If you are assuming mutation, you probably must make it a mutation that was "designed" to be put in all the great apes. Or not. But how to explain it? Is it due to mutation as a natural event? Or is it mutation, designed by God? If natural, common descent is the best way to explain the pattern in the primates. If by God, why design a common mutation in a gene that now cannot function?

>>>> And you've said designer instead of creator but also YEC. Does this mean you think
>>>> that something, not necessarily the God of the bible, designed life recently? But
>>>> apparently you think they did not design each species independently but had to reuse
>>>> DNA from other created species, even reusing parts that did not work?

>>> I meant the God of the Bible was the designer.

>> Why would your creator god insert defective genes into his de novo created genomes?

> After the Fall, all of creation was cursed.

So the mutations were natural events after the Fall? That just happened to occur in all the primates by chance? Or mutated purposely by God in such a way as to imply common descent?

>> Apparently you are suggesting something about a "common blueprint". But then apparently
>> the blueprint kept changing as well. And that changing blueprint keeps changing in ways
>> that create the pattern that evolutionary theory deduces to arise from common descent.

Common blueprint

> It appears that the protein coding genes are informationally simplistic - the "common
> blueprint" used to "build" all the different organisms. Evidence for this is the
> identical genetic patterns that are identified in all organisms, regardless of how they
> are separated in evolutionary terms.

Can you explain what the above means? What "identical genetic pattern" are you talking about? The canonical genetic code (which is not universal)? DNA (or RNA) as the genetic material? That certainly is consistent (even more so, given the exceptions in the genetic code) with common descent.

> The sea anemone for example, shows significant genetic homology to humans, but not the
> fruit fly (so much for common descent). In fact, it's entirely possible the non-coding
> DNA helps produce genetic diversity and if this proves to be the case, it would
> demonstrate an incredibly efficient, well-designed system.

You (or more likely, your 'creationist' source of misinformation) are misinterpreting this data. What it shows is that fruit flies and nematodes have lost more gene families than humans have relative to the sea anemone. That is, the anemone is more complex than nematodes and fruit flies in terms of number of gene families. This is in agreement with other earlier studies showing that fruit flies and nematodes have lost genes.

http://www.the-scientist.com/?articles.view/articleNo/25223/title/Surprises-in-sea-anemone-genome/

However, sea anemones have both animal-like and plant-like regulatory systems.

http://medienportal.univie.ac.at/presse/aktuelle-pressemeldungen/detailansicht/artikel/sea-anemone-is-genetically-half-animal-half-plant/

>> It's certainly possible to just say "because God did it that way" but that is a
>> rationalization, not a conclusion based on a simple model. If the same rationalization
>> can be used to "explain" anything, then in fact it explains nothing because it does not
>> serve to distinguish between why this and not that or the other things.

Evolution can be used to explain any observation

> The same applies to "evolution did it". See above for an example of circular reasoning.

No, evolution, as a scientific explanation, is quite constrained by the evidence. If the evidence were different, say by clear evidence that the earth is only 4000 years old, evolution as we know it would be impossible. If we found fossil modern humans and all other modern animals and plants as well as extinct animals and plants distributed throughout geologic layers either randomly or by ecosystem, that would be inconsistent with common descent and could be due to a global flood or some other mechanism. If we found that there was no branching similarity in DNA and protein sequences or that the differences were always due more to functional differences rather than time since divergence differences, we could more readily attribute the differences to "design necessity". But none of that is the case.

Sharing between species of nonfunctional regions of genome

>>>> Because you did not simply say, "I don't know", you said you expect the commonality.
>>>> But that commonality extends far beyond what is functional, and that is an observed
>>>> fact, not some theoretical result.

>>> No, it's an interpretation. There's still the possibility that pseudogenes turn out to
>>> be no friend of evolution.

Wishful thinking.

>> There have been a few instances where some evidence has been found that certain
>> pseudogenes may have the ability to bind, as transcribed RNA, to similar sequence DNA
>> genes that are not pseudogenes. However, a few cases that were studied in depth found
>> that the effects were artifacts of poorly designed experiments. It is only even
>> postulated that a minority of pseudogenes might have this effect. So biochemically, it
>> does not look like pseudogenes have phenotypic impacts on cells. This is consistent
>> with the current theory of evolution.

Moreover, your position would be that *all* pseudogenes and *all* non-coding DNA (there is a lot more of this) must have important (aka, selectively valuable) function. There is more than enough information to say that this is not true.

Known function of Pseudogenes

> I think your information is out-of-date. Pseudogenes are now known to play a role in
> gene expression, gene regulation and generation of genetic diversity.

Evidence? I would not argue that no pseudogenes can have a "function" of some sort. But that is a matter of quantity, not quality. What percent of pseudogenes have these "functions" of which you speak? 1% or 95%? And what evidence do you have to back up that amount?

> Sadly, the evolutionary preconception that "junk DNA" is just a collection of molecular
> fossils could have set us back many years. Now that it's finally being studied, I repeat my
> claim that there's still the possibility that pseudogenes turn out to be no friend of evolution.

Evolution makes use of 'useful' accidents. That does not mean that it makes use of *all* accidents. Only those that have some utility in a particular time and place (local environment). And 'utility' in evolution is measured by the metric of relative reproductive success. Pseudogenes are accidents. We know how they are created by mutation. We know that they *are* created by mutation. Why should we be shocked if some of those accidents are useful on the metric of reproductive success? I would also not be shocked if some of them were harmful on the metric of reproductive success. And I would not be at all shocked if most of them were selectively neutral.

The Talk.Origins Archive Post of the Month: July 2014
