Genetic Algorithms in Ruby, Part I

Dhaivat Pandya

Charles Darwin produced the theory of the “survival of the fittest” (though the term was coined by Herbert Spencer), which means that the fittest and quickest adapting organisms in a specific environment will always prevail over the others. This makes a lot of sense; the organisms that are stronger are more likely to survive and also more likely to pass on their genetic code. What makes this interesting is that there are occasional mutations in the chromosomes of organisms. So, a mutation might produce an elephant whose vision was much better than the others, who would then be likely to survive longer since he’d be able to avoid lions better than the rest, and pass on his special trait. This means that if we want to “optimize” the strength of a certain species of animals, we can probably just wait around for a couple million years while mutations occur and the fittest win over the rest.

Computer scientists and mathematicians of the 1950’s started to borrow ideas from biology and brought them to computer science. One of those ideas was genetic algorithms. With genetic algorithms (GAs), we replicate the process of evolution on a computer to optimize a function or to perform search queries. In plain english, we define a rule that tells you when an “organism” is better than another. We mutate these slightly and have the fittest reproduce to get (after some number of cycles) an optimal “organism”. I say “organism” (in quotes) because there aren’t really any organisms; we’re just using a simplified model of one. So, it turns out that genetic algorithms enable a computer to make “smart” decisions for optimization or searching (or a host of other applications).

If you didn’t really understand much of the above, that’s perfectly fine. We’ll be writing our own genetic algorithm code (in Ruby, of course) to solve an optimization problem, and every step of the way will be explained with much more depth than the above.

Details

Let’s start with an example problem, from which we will develop the idea of genetic algorithms.

Suppose we have a machine, which, when given a binary number four bits long (meaning it has four digits in binary) performs an XOR operation on it with the number 1010 and outputs the result. For example, if the input is 1100, the output would be 0110 (or, in decimal, 6).

We want to find out what input would make the output of this machine as great as possible.

When using genetic algorithms, we follow some steps:

  1. An organism is represented by a chromosome, which is simply a string
  2. We randomly generate a bunch of organisms (i.e. strings)
  3. We determine the fitness of the organisms by taking the string of each and recording the return value from the “fitness function” (i.e. the rule that the machine follows; in our case, the XOR operation)
  4. We assign probabilities to each organism based on fitness (i.e. higher fitness implies higher probability.) Afterwhich, we pick an organism that we then “reproduce”, and the organisms that were unable to reproduce die out.
  5. All of the organisms find mates with which the strings are “crossed over”.
  6. We perform many iterations of this cycle.

Again, don’t worry if that went too quick for you. We’ll walk through each step in detail.

So, our organisms in this case are very easily represented; they’re just bit strings. We’ll have a binary number 4 bits long; something like 1011, which represents the input of the machine.

Then, we have to randomly generate these bit strings, which is quite easily done.

We take all of the randomly generated bit strings (called the organism/chromosome pool) and measure their fitnesses by passing them through the fitness function, which is the XOR operation.

You might be wondering why we don’t just have a pool large enough that it can encompass all of the possible inputs. Then, we could just plug them into the fitness function and pick the organism with the highest fitness; it would be just 16 total bit strings that we have (since we’re considering four bits). Suppose that instead of four bits, we were considering 40 bits. If we were to build up a pool of all the possible bit strings, we’d have 1099511627776 total bits; that’s a lot! In short, it is because of efficiency that we don’t consider all possible inputs, and, in some cases, there may be infinitely many of them. In fact, the very point of using genetic algorithms is to improve optimization efficiency, if this isn’t a concern, brute force methods are much easier to implement.

Back to the algorithm. The probabilities are assigned based on fitness, and, a few organisms are picked that are then “cloned”, i.e. a copy of the chromosome is inserted into the chromosome pool.

Then, to produce new chromosomes (that are based on old ones), the chromosomes are mated with other chromosomes with parts of the chromosomes interchanging with one another.

So, that’s the basic idea of genetic algorithms. Don’t expect to understand all of it just yet; but, a general idea should be starting to form. Just keep reading; it’ll all fall together.

Organism

The basic structure of genetic algorithms is the organism, so, it will beneficial for us to define a class around it.

That gives us a data structure onto which we can attach methods, so that this component of the GA code will be entirely reusable. The constructor generates a random bit string of a given length when one isn’t provided, since that’s something we’ll be doing a lot.

If you test this class’s constructor out with a arguments like 4 for length and 100 for so_many, it spits out a gigantic list of binary numbers as an output (if you choose to print the bitstring for each Chromosome object).

I don’t know about you, but, I feel pretty cool when I have binary numbers scrolling down my screen!

Fitness

Now, we need to get our fitness function down flat; that’s what everything is hinged on. We are XORing the input with 1010 and then returning the result.

Here it is in our beloved Ruby:

If you test it out with 1100, you should get a result of 6 or 0110 in binary, which is correct.

Weighted Randoms

A special challenge we have to deal with is “weighted” random generation. So, we have to assign probabilities to each organism and then pick based on those probabilities.

This is often called a roulette wheel selection, since it resembles a roulette wheel, with portions being of different sizes. This isn’t the fastest way to do it, but it’s one of the simplest and easiest to implement.

We are able to do this by using an array on which portions are reserved for each chromosome depending upon what the fitness of the chromosome. Here’s the implementation added to our Chromosome class:

We’re doing the process in two parts; one in which the roulette “wheel” is set up, and, another, in which a random number is chosen from the roulette wheel. Of course, the roulette wheel should be updated just before the roulette is going to be used.

In Part II of this article, I will finish our study by focusing on bits like reproduction, finding mates, and “crossing” our chromosomes.

Win an Annual Membership to Learnable,

SitePoint's Learning Platform

  • Mikael Konutgan

    Camel case? using a while loop to iterate over a range? Methods that start with get and set? Class variables? to_s with parenthesis? Seriously, the author should look into some Ruby best practices and a style guide. He might as well just write Java…

    The material is interesting, though the article is too simple to be taken seriously.

    • Dhaivat Pandya

      Hi Mikael,

      I’ve tried to put your suggested changes into action into this part, as well as next.

      As for the content of the article – the second part gets a lot more interesting.

      • Mikael Konutgan

        Looking a lot, lot better. One last tip would be to remove all `return` statements. You’ve already not used them in some of the methods.

        By the way, I’m very interested in evolution and natural selection, I’ll be looking forward to part 2.

        • http://ckcarroll.herokuapp.com Cameron Carroll

          Why remove return statements? Simply stating a function’s result is confusing because it’s not certain whether you’re returning a variable or invoking a method with no arguments.

          Return statements, if a bit more verbose, are completely clear.

          • Pullo

            That’s not actually correct. Using a return statement does not make it completely clear.
            Writing ‘return foo’ could still return either the value of the variable ‘foo’, or the return value of the method call to ‘foo’.

            In ruby it is usually considered good style to use implicit returns, unless you are doing something like terminating a method prematurely. E.g.

            def foo
            return nil
            puts “hello”
            end

            In this case “hello” is never printed.

  • Ursel

    Dude, are you insane? Camelcased????????? Please remove anything saying you write ruby…

    • http://www.ruprict.net/ Glenn Goodrich

      Wow…you guys are really adamant about style, huh? So much so that you’d tell a guy to disassociate with Ruby?

      I would say your reaction is far worse than any style infraction by this (or any) author.

      • David Jenkins

        Well said. I’m always amazed at how many people seem so eager to give a community — in this case the Ruby community — a reputation for being nasty snobs.

    • Dhaivat Pandya

      Fixed. Happy?

  • Diego

    It was actually Herbert Spencer who coined the term Survival of the Fittest. FTFY

    • Dhaivat Pandya

      Oh, thanks. Will add that in.

  • markus

    Found some small errors:

    def initialize (length, bitstring=nil)
         @bitstring ||= rand(2**length+1).to_s(2).rjust(length, ‘0’)    
    fitness  
    end  

    you are never using the bitstring argument and the instance variable can not be set therefore this is the same as your code: 

    def initialize (length)
    @bitstring = rand(2**length+1).to_s(2).rjust(length, ‘0’)
    fitness
    end

    I guess you had a typo there because the number argument is not used in the body. I think you meant

    def self.random_chromosomes(length, number)
        return Array.new(number) { Chromosome.new(length) }  
    end

    • Dhaivat Pandya

      Second one is correct, the first one is fixed in the next part.

  • http://codetunes.com/ yanoo

    @bitstring = rand(2**length+1).to_s(2).rjust(length, ‘0’)

    shouldn’t that be just rand(2**lenght)? rand generates number between 0 <= x < max which sounds correct for bits (max value 2**length – 1)

  • Zamo

    Another small step in improving code readability would be to prepend and append spaces after “=”, e.g. “roulette = []” or “bitstring = nil”. Excellent work on the article and improving it based on (perhaps not so constructive) criticism!

  • http://letterstobreathe.com chris

    Thankyou Dhaivat

    I love the fact you informed me about GA, I love the fact you (inadvertantly) informed me about ruby style and I love the fact that you didn’t let the manner of the criticism (“he” knows who “he” is) get in the way of your own or my learning from it.