- Feb 4, 2021
- 15 min read

The Great Gedankenexperiment

My research is all about understanding evolution. For now, you can think of this research as being motivated by a simple question: “Is evolution repeatable?” If evolution IS repeatable, then it might also be predictable. However, if evolution is not repeatable, then specific evolutionary outcomes are much more difficult to anticipate.

To address this question, we first need to cover a few basics about how evolution actually works. Despite the wonderful complexity it produces, evolution is a rather simple process. You can think of it as a recipe with only three ingredients, each with its own simple rule.

The first ingredient we call adaptation, also known as selection. The rule is simple: Some individuals perform better than others. The second ingredient we call chance, also known as random variation. The rule: Organisms need to be different. The sources of those differences are mutations, genetic drift, and other processes that occur at random. We’ll stick with chance for now. Our third ingredient is history. The rule: Attributes get passed down from one generation to the next through inheritance. So there we have it. Our recipe is complete. Evolution will occur anytime we combine adaptation, chance, and history.

The goal of my research is to understand how these three essential ingredients work together to determine if evolution is predictable.

Before we jump into the details of our experiment, let’s talk about a very famous thought experiment, which I call “The Great Gedankenexperiment”.

In 1990, Stephen Jay Gould proposed a rather clever way to determine if evolution is repeatable. He wrote:

I call this experiment “replaying life’s tape.” You press the rewind button and, making sure you thoroughly erase everything that actually happened, go back to any time in the past, say, to the seas of the Burgess Shale. Then let the tape run again and see if the repetition looks at all like the original. If each replay strongly resembles life’s actual pathway, then we must conclude that what really happened pretty much had to occur. But suppose that the experimental versions all yield sensible results strikingly different from the actual history of life? What could we say then about the predictability of self-conscious intelligence? Or of mammals? Or of vertebrates? Or of life on land? Or simply of multicellular persistence for 600 million years… […] The bad news is that we cannot possibly perform the experiment.

Gould realized that if adaptation were the primary cause of evolutionary change, then evolution would be highly repeatable. But if the repeats are unpredictable, it would demonstrate the importance of historical contingency, the impact of chance and history. To understand why, let’s look at this picture.

Fig. 1

This picture depicts evolutionary change primarily due to adaptation, chance, and history. On the X-axis, that’s going from left to right, we see ancestral values. You can think of this as the score given to an organism’s ancestor at the start of evolution. Going up and down, on the Y-axis, we have derived values. You can think of this as the score assigned to the descendant populations at the end of evolution. On the left we see that if adaptation dominates evolution, evolution starts from different values (that’s indicated by these values being spread along the x-axis), but since adaptation makes evolution repeatable, the derived values are all the same. Selection has forced them towards a common outcome.

If evolution is primarily due to chance, then we will see that even if populations are identical at the start of evolution, random processes will nevertheless produce differences between the derived values. This is depicted by these populations having an identical ancestral value of 1 at the start of evolution. But at the end, the descendant populations’ values are all different. To understand the influence of history, we also need to think about populations that start with different values. When history dominates evolutionary change, the derived values retain differences at the end of evolution, as opposed to converging toward a common value as they would under the influence of adaptation.

The important thing to remember is that we can use replaying the tape of life to measure the effects of our three ingredients: adaptation, chance, and history. We can repeat the history of life from a common starting point to assess chance. But we need multiple starting points to distinguish between the effects of adaptation and history.
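To make the three idealized patterns concrete, here is a minimal sketch in Python. The scores are invented purely for illustration (they are not data from the experiment), and using the standard deviation as the measure of "spread" is my own simplifying assumption.

```python
import random
import statistics


def spread(values):
    """Population standard deviation, used here as a simple stand-in for 'spread'."""
    return statistics.pstdev(values)


rng = random.Random(42)  # fixed seed so the chance scenario is reproducible

# All numbers below are hypothetical scores chosen only to illustrate the patterns.
scenarios = {
    # Adaptation dominates: different starting points, one common outcome.
    "adaptation": {"ancestral": [1.0, 2.0, 3.0, 4.0],
                   "derived": [5.0, 5.0, 5.0, 5.0]},
    # Chance dominates: identical starting points, outcomes scattered at random.
    "chance": {"ancestral": [1.0, 1.0, 1.0, 1.0],
               "derived": [1.0 + rng.gauss(0, 1) for _ in range(4)]},
    # History dominates: the starting differences are still there at the end.
    "history": {"ancestral": [1.0, 2.0, 3.0, 4.0],
                "derived": [1.0, 2.0, 3.0, 4.0]},
}

for name, scores in scenarios.items():
    print(f"{name:>10}: ancestral spread = {spread(scores['ancestral']):.2f}, "
          f"derived spread = {spread(scores['derived']):.2f}")
```

Under adaptation the derived spread collapses to zero, under chance it appears from nothing, and under history it is simply carried over from the ancestors.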

Even though Gould considered this to be purely an exercise of the imagination, I am going to tell you about a very real experiment I’ve conducted with digital organisms based on replaying the tape of life.

Fig. 2

The experiment has two phases. In the first phase, we made copies of a single individual and put each copy in its own digital world. Since the environment for each population is the same, the experiment is literally a replay of evolutionary history multiple times under identical conditions. The difference is that the replays all take place at the same time, in parallel, instead of one after the other. Think of this star as representing each population starting in its own world, but since they are copies of a single individual, there is absolutely no difference between the populations. The black lines going from left to right indicate each population evolving in complete isolation from the others. As time goes on, the populations become different from one another. Early on there are only a few differences between these populations. But as time goes on, their history gets deeper and they become very different from each other. We let these populations evolve for a long time. In Avida, the software we used for this experiment, the unit of time is called an update. These populations evolved for a total of 500,000 updates.

Let’s turn our attention to the second phase. This is where we perform the replay experiments that we want to focus on. Let’s look only at these red populations for now. We made copies of a single individual from each of the Phase I populations. You can think about this as going into each one and isolating a new star. This way we can once again replay the tape of life from a common start. But this time, we’re using a new environment to restart evolution, with one set of copies from each of the unique histories we created in the old environment. This allows us to have the different starting points we need to estimate the contributions of adaptation, chance, and history at the end of Phase II.

We measured each ancestor at the start of evolution in the new environment. Then at the end of Phase II, we measured the derived values. You can think of that as measuring each one of the individual red tips. The difference between the score of each ancestor at the beginning and the average score of its descendants reflects adaptation. Let’s say the average difference was 4 points. We’d say adaptation explains a change of 4 points, because it explains the repeated change across all the red groups. Now, within each red group, evolution was replayed from a common starting point, so only chance can explain differences between populations within the same group. Let’s say on average the spread within each red group is 4 points. We’d say that random variation, or chance, is responsible for a spread of 4 points. Of course, each one of these red groups began with a single individual from a different Phase I lineage. Each group’s unique history is responsible for differences between red groups. Again, let’s say that the spread between group averages is also 4 points. In this simple example, we’d have about 12 points of total evolutionary change to account for, with adaptation, chance, and history making equal contributions of about 4 points each.

I’ll spare you the details of the analysis, but I wanted to show you how we can isolate the effects of adaptation, chance, and history using our experimental design.
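For readers who like to see the bookkeeping, here is a rough sketch of the idea in Python. It is not the actual analysis (which I'm sparing you, and which differs in its details). The function name `estimate_contributions`, the three made-up groups, and the choice of standard deviation as the measure of "spread" are all assumptions for illustration.

```python
from statistics import mean, pstdev


def estimate_contributions(ancestral, derived):
    """Simplified estimates of adaptation, chance, and history.

    ancestral: dict mapping group name -> score of that group's ancestor
    derived:   dict mapping group name -> list of replicate scores at the end
    """
    group_means = {group: mean(scores) for group, scores in derived.items()}
    # Adaptation: average change from each ancestor to its descendants' mean.
    adaptation = mean(group_means[group] - ancestral[group] for group in derived)
    # Chance: average spread among replicates that shared one ancestor.
    chance = mean(pstdev(scores) for scores in derived.values())
    # History: spread among the group averages.
    history = pstdev(list(group_means.values()))
    return adaptation, chance, history


# Hypothetical scores for three groups with three replays each.
ancestral = {"A": 10.0, "B": 12.0, "C": 14.0}
derived = {
    "A": [12.0, 14.0, 16.0],
    "B": [14.0, 16.0, 18.0],
    "C": [16.0, 18.0, 20.0],
}

adaptation, chance, history = estimate_contributions(ancestral, derived)
print(f"adaptation = {adaptation:.2f}, chance = {chance:.2f}, history = {history:.2f}")
```

In this toy data set every group gains 4 points on average, and the within-group and between-group spreads happen to be equal, so chance and history come out the same.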

The exciting thing about this design, however, is that we can measure how the interaction of adaptation, chance, and history changes with what we call the “footprint of history”. You’ll notice that all of these red populations are actually isolated very early in Phase I. The lineages had only recently begun their unique evolutionary histories. These populations had only been evolving for 20,000 updates, or about 3,000 generations. However, when we performed our replays in the new environment, we also included groups of copies isolated from the same lineages after 100,000 updates, or about 15,000 generations. Those are the green populations. We also isolated samples from the end of Phase I that evolved for the full 500,000 updates. That’s about 65,000 generations. Those are the blue populations. Within each color, we can estimate adaptation, chance, and history as described. But the differences in those estimates between colors reflect the differences created by the accumulation of historical effects throughout Phase I.

Essentially, we are replaying the tape of life using our most highly evolved organisms. That would be the blue populations from the end of Phase 1. At the same time, we’re allowing them to compete head-to-head with their ancestors from deep in the evolutionary past…as well as even more primitive ancestors, from the dawn of time, in parallel virtual universes.

Before we jump into the results, let me mention a few small details.

First, we started Phase I with 10 populations instead of the five depicted here. We also made 10 copies of everyone at the start of Phase II instead of the five shown here. We performed the experiment twice, using different new environments. The first new environment was similar to the old environment. We named it “Overlapping” because it shares common resources with the ancestral environment. The other new environment was very different from the old environment. We named it “Orthogonal” because it has no resources in common with the old environment. We wanted to see if the effect of the footprint of history depends on how foreign the new environment is in relation to the old environment. Also, we measured two attributes of the organisms. One was the organism’s fitness. You can think of this as a score that tells us how well the organism takes advantage of the resources in the environment. Organisms with higher fitness reproduce faster and thus have an evolutionary advantage. The other attribute is genome length. It’s simply a measure of the number of instructions that make up the organism’s genetic code.
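If it helps to see the full layout of Phase II, the design can be enumerated in a few lines of Python. The variable names and dictionary keys are mine, chosen for illustration; the counts and update values come straight from the description above.

```python
from itertools import product

environments = ["Overlapping", "Orthogonal"]
# Depth of history: color label -> updates of Phase I evolution before sampling.
depths = {"red": 20_000, "green": 100_000, "blue": 500_000}
lineages = range(1, 11)    # 10 Phase I populations
replicates = range(1, 11)  # 10 copies of each sampled individual

design = [
    {"environment": env, "color": color, "updates": updates,
     "lineage": lineage, "replicate": rep}
    for env, (color, updates), lineage, rep
    in product(environments, depths.items(), lineages, replicates)
]

per_environment = sum(1 for run in design if run["environment"] == "Overlapping")
red_per_environment = sum(
    1 for run in design
    if run["environment"] == "Overlapping" and run["color"] == "red"
)

print(len(design))          # total Phase II replays across both environments
print(per_environment)      # replays in one environment
print(red_per_environment)  # shallow-history replays in one environment
```

That works out to 100 replays per depth of history, 300 per environment, and 600 overall.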

The software program used for this experiment is called Avida. Avida calculates fitness and genome length automatically. At the beginning of Phase I the initial ancestor is a generic individual. It can make copies of itself but is otherwise a program full of “no operation” instructions that don’t do anything. As the organism makes copies of itself, mutations occur at random. Over time, the organisms evolve to perform various computational tasks. If a task they perform is rewarded, they get a boost that helps them reproduce faster.
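To get a feel for how the three ingredients fit together in software, here is a deliberately tiny toy model in Python. To be clear, this is not Avida and does not use any Avida code: the bit-string genomes, the `fitness` rule (one point per "task" bit), and every parameter value are assumptions made up for illustration.

```python
import random

GENOME_LENGTH = 20      # hypothetical: each position is one possible "task"
POPULATION_SIZE = 50
MUTATION_RATE = 0.01    # chance of flipping each bit when copying


def fitness(genome):
    """Toy fitness: a baseline of 1 plus one point per task performed."""
    return 1 + sum(genome)


def next_generation(population, rng):
    # Adaptation: fitter genomes are more likely to be picked as parents.
    weights = [fitness(genome) for genome in population]
    parents = rng.choices(population, weights=weights, k=len(population))
    offspring = []
    for parent in parents:
        # History: the child inherits the parent's genome.
        # Chance: each bit may flip at random during the copy.
        child = [1 - bit if rng.random() < MUTATION_RATE else bit for bit in parent]
        offspring.append(child)
    return offspring


def replay(seed, generations=200):
    """Replay the tape from the same blank ancestor with a different random seed."""
    rng = random.Random(seed)
    population = [[0] * GENOME_LENGTH for _ in range(POPULATION_SIZE)]
    for _ in range(generations):
        population = next_generation(population, rng)
    return sum(fitness(genome) for genome in population) / len(population)


final_means = [replay(seed) for seed in range(5)]
for seed, value in enumerate(final_means):
    print(f"replay {seed}: mean fitness {value:.2f}")
```

Every replay starts from an identical ancestor that performs no tasks, so any differences between the final averages come from the random seed alone.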

Let’s get into the results. We should first look at what happened during Phase I.

Fig. 3

Here we can see the same star we saw in the previous picture that represents the initial ancestor of all populations at the start of Phase I. Time is depicted from left to right on the x-axis. Each color represents a single replay, and the colors used for fitness and genome length match, so we can easily compare the evolution of fitness and genome length for each population. It’s important to note that the values on the y-axis, going up and down the left side of each plot, are log-transformed (base 10). That just allows us to show large numbers more easily. On this scale, each step of 1 represents a factor of 10. So the number 10 becomes 1, 100 becomes 2, 1000 becomes 3, and so on.

Let’s start at the top with fitness. The initial ancestor had a fitness of less than 1 (which is why it’s depicted below 0 on this scale) and there was no variation between populations. As time went on, however, all 10 populations increased drastically in fitness. It’s important to note that not only did the average increase, but the amount of spread between populations also increased with time. At the end, the raw fitness scores ranged from about 10 to over 30 million. That’s a lot of spread!
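If you want to check the scale yourself, the transformation is a one-liner in Python. The value 0.25 below is a made-up stand-in for "a fitness of less than 1", not the ancestor's actual score.

```python
import math


def to_log_scale(raw_value):
    """Convert a raw score to the log10 scale used on the y-axis."""
    return math.log10(raw_value)


for raw in (10, 100, 1000):
    print(raw, "->", to_log_scale(raw))

# A raw fitness below 1 lands below 0 on this scale (0.25 is hypothetical).
print(0.25, "->", round(to_log_scale(0.25), 2))

# 30 million sits between 7 and 8 on this scale.
print(30_000_000, "->", round(to_log_scale(30_000_000), 2))
```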

However, when we look down below at genome length we see a very different story. The populations all start out with a genome length of 100, depicted by this star at the number 2. Over the course of Phase 1, populations do increase in the spread of their genome lengths, but not by very much and there is no overall direction. Some got longer, some got shorter, and the direction of change often switches. Notice how much more squiggly the lines for genome length are compared to fitness.

Alright, now remember, we don’t actually estimate adaptation, chance, and history until after Phase II, because all of these Phase I populations share a single history. Let’s look at what happened during Phase II. We’ll start with the most primitive populations. Those are the ancestors we sampled here at 20,000 updates. They have a shallow history. We’ll start by just looking at what happens when we used the new environment that was similar to the old environment.

Fig. 4

Remember, we’re now looking at 100 replays of the tape of life! But this time there are 10 replays that start from a single individual from each of the 10 Phase I populations. So these populations are grouped by color and the colors correspond to the Phase I population their ancestor is taken from. Now we have 10 different histories, whereas we only had one during the first phase. We compare the measurements of the ancestors here at the beginning of Phase II and then we measure all of the values at the end and we use those to estimate adaptation, chance, and history.

The most important thing to notice on this graph is that evolution happened just like it did in Phase I. There is massive improvement in the fitness of all populations and the variation increases dramatically. Now the range of fitness scores is from about 10 to numbers with about 18 zeroes after them. I have no idea how to even pronounce such numbers [quintillion]. But genome length doesn’t change much at all and similarly shows no overall direction. Don’t worry too much about trying to interpret this graph. It’s just useful for visualizing the raw data.

Let’s look at our estimates of adaptation, chance, and history.

Fig. 5

Here we can see the estimates of adaptation, chance, and history for populations with a shallow history after evolving in the new environment that was similar to the old environment. To the right, we’ve also plotted the ancestral variation. That’s the amount of spread that existed between the different ancestors at the beginning of Phase II. It’s shown for comparison.

Our estimates are still on that log10 scale so we can more easily represent extreme numbers. Let’s start with genome length. We see that our estimate for adaptation is almost zero. Since the margin of error includes 0, we’d say that adaptation is not making a significant contribution to genome length for these populations. However, history and chance do make significant contributions.
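The "margin of error includes 0" rule can be written down directly. This is only a sketch of the reasoning: the function name `is_significant` and the example numbers are hypothetical, and it assumes the margin of error is symmetric around the estimate.

```python
def is_significant(estimate, margin_of_error):
    """Treat an estimate as significant only if its interval excludes zero."""
    lower = estimate - margin_of_error
    upper = estimate + margin_of_error
    return not (lower <= 0.0 <= upper)


# Hypothetical numbers on the log10 scale, for illustration only.
print(is_significant(0.01, 0.05))  # interval spans zero
print(is_significant(0.30, 0.05))  # interval sits entirely above zero
```

The first call mirrors the situation for adaptation in genome length, and the second mirrors an estimate like history or chance that clears zero.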

When we look at fitness, though, things are very different. History and chance still make significant contributions, but evolution is clearly dominated by adaptation. In other words, the evolution of higher fitness was very repeatable. Before we go deeper, let’s take a quick look at what happened in the other environment.

Fig. S2

Remember that this other new environment is very different from the old environment. Nevertheless, the patterns of adaptation, chance, and history are nearly the same. The estimates in the similar environment are much higher, but for now just focus on the main patterns. Remember, it’s how the estimates of adaptation, chance, and history compare to each other that we’re most interested in, and those relationships are nearly identical.

Adaptation does not contribute significantly to genome length, but history and chance do. All three contribute significantly to the evolution of fitness, but evolution is clearly dominated by adaptation. As a matter of fact, if we covered the numbers on the y-axis you probably would not be able to tell the difference.

There is one key difference with regard to genome length that I’d like to call your attention to. In the less similar environment, the estimate of chance is the highest estimate, edging out history, but in the more similar environment, history takes a slight victory.

Let’s see what happens to these patterns when we extend the footprint of history. We’ll simply rerun our analysis but this time we’ll include the populations with both intermediate and deep evolutionary histories from Phase 1. Let’s first discuss what happens to genome length. This time we’ll look at both environments at the same time. Remember that shallow history is red, intermediate history is green, and deep history is blue.

Fig. 7

The main thing to notice is that in general, the situation hasn’t changed much from what we saw when we were only looking at the populations with a shallow history. The contribution of adaptation is negligible. And the contributions of history and chance are both significant. The contribution of history increases as the footprint of history extends from shallow to intermediate to deep. However, the contribution of chance declines as the depth of history increases from shallow to intermediate to deep.

So as time marches on, history goes up and chance goes down. We can ignore adaptation for now.

Fig. 8

Now let’s have a look at the situation regarding the other attribute, fitness. Again, we’ll look at both environments at the same time. Let’s start by taking note of what hasn’t changed. Adaptation, chance, and history all make significant contributions to fitness in both environments, regardless of the depth of history. Also, just like we saw with genome length, the contribution of history increases with the depth of history and the contribution of chance decreases with the depth of history.

However, the most interesting result of these experiments is observed when we look at the contribution of adaptation. When we replayed the tape of life for these 300 populations in the new environment that was very similar to the old environment, the contribution of adaptation declined sharply with the depth of history.

As a matter of fact, you can see that the margins of error for the red and blue populations don’t overlap at all. This is by far the most significant impact of the footprint of history in any of the estimates throughout the experiment. This causes a very important shift in the dynamics of evolution.

For populations with a shallow footprint of history (the red ones), the contribution of adaptation is several times that of history. But for populations with the deepest footprint of history (the blue ones), the contribution of history is slightly higher than that of adaptation. This means that by extending the period of time we allowed the ancestor populations to evolve in the old environment, we can actually observe the footprint of history becoming too deep to be overwhelmed by adaptation. In other words, we see a major shift in the relative contributions of adaptation, chance, and history. This result was very exciting!

But this “replaying life’s tape” fantasy-come-to-life had another huge surprise in store for us.

When we reran the tape of life using the exact same 300 individuals, only this time in the new environment that was totally different from the old environment, the effect of the footprint of history on adaptation completely vanished! In this environment, adaptation reigned supreme, and the most highly evolved digital organisms evolved almost in lock-step with even the most primitive ancestors. As a matter of fact, we can actually plot these estimates over time.

Fig. 9

In this final plot our colors are the same. The solid lines are adaptation, the dashed-dotted lines are history, and the dashed lines are chance. Those faded dotted lines are the ancestral spread, which is useful for comparison.

I only want to draw your attention to one important difference. When we look at the evolution of fitness in the more similar environment up top, we can see that for the shallow (red) populations, adaptation becomes the dominant ingredient very early in the experiment. However, for the blue populations with deep history, adaptation hadn’t eclipsed history even by the end. In the other environment, adaptation overwhelmed even the deepest footprint of history. It becomes the dominant ingredient for every depth of history, and it does so even more rapidly than it came to dominate the shallowest history in the more similar environment.

We’ve looked at a lot of information! I’m sure you’d like to leave with a clear picture of what it all means.

Let’s start with the basics. Remember that evolution requires three essential ingredients: adaptation, chance, and history. We’ve seen that each can be the “main” ingredient. We use replay experiments to estimate the potency of each ingredient. This can help us understand the balance between what makes evolution repeatable (adaptation) versus the factors that contribute to historical contingency (chance and history). The balance between adaptation, chance, and history largely depends on what attribute we’re measuring.

Fitness and similar traits that are very important to an organism’s ability to reproduce are said to be under strong selection. These attributes are likely to be dominated by adaptation. Other traits, like genome length, are not vital to reproductive success and are said to be under weak selection. The evolution of these traits is likely to be dominated by history and chance.

The depth of history is almost certainly an important consideration when thinking about the relative contributions of adaptation, chance, and history. Almost every estimate of adaptation, chance, and history fell in order according to the depth of history.

The influence of history consistently increased as the footprint of history deepened, while the effect of chance consistently decreased. This held regardless of the attribute or environment.

The most interesting effect of the depth of history was on adaptation. The effect is likely not important for traits under weak selection. But for traits under intense selection, the effect of deepening the footprint of history on adaptation can depend on the relationship between the old and new environments. If the old and new environments are similar, history can constrain adaptation. In this situation, deep populations begin the replays with a head start from having additional time to adapt to a similar habitat, but this pre-specialization causes them to adapt less rapidly in the new environment. In the less similar environment, training in the old environment does not carry over, so all of the cohorts evolve at the same rate, effectively erasing the legacy of even the deepest footprint of history.

In the end, I hope you can begin to understand, even just a little, how such a simple three-ingredient recipe can come together to produce incredibly complex results. Remember that organisms, both in the real world and in computers, have many traits. Some of those traits will be more like fitness, and so they may evolve more predictably. Other traits will be more like genome length, and they’ll be dominated by randomness and inheritance. But evolution is constantly acting on all traits simultaneously. This means adaptation, chance, and history are constantly working in a potentially unlimited number of combinations at the same time. By designing the right experiment, we can observe how the balance between these three ingredients changes if we allow evolution more time to work its magic!

Be good to yourselves…and each other. Stay curious. Thanks for dropping by the virtual evolution lab!

Follow me at www.twitch.tv/pocketlocker86 for live streams. I stream science, music, gaming, and behind-the-scenes broadcasts of my content production process.
