A Tour Through Random Ruby


This article covers various ways that you can generate random (usually pseudo-random) information with Ruby. Random information can be useful for a variety of things, in particular testing, content generation, and security. I used Ruby 2.0.0, but 1.9 should produce the same results.

Kernel#rand and Random

In the past, a random number within a range might be generated like this:

rand(max - min + 1) + min

For example, if you wanted to generate a number between 7 and 10, inclusive, you would write:

rand(4) + 7

Ruby lets you do this in a much more readable manner by passing a Range object to Kernel#rand.

>> rand(7..10) 
=> 9
>> rand(1.5..2.8)
=> 1.67699693779624
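If you are maintaining older code that uses the arithmetic idiom, the usual trap is an off-by-one error: rand(n) never returns n, so the span needs a + 1. The sketch below draws many values with each form and compares the sets of integers that come out. The helper names are my own, used only for illustration.

```ruby
# Sanity check: the arithmetic idiom and the Range form should cover
# exactly the same inclusive set of integers.
srand(1234) # fixed seed so the run is repeatable

def arithmetic_rand(min, max)
  # rand(n) returns an integer in 0...n (n itself excluded), hence the + 1
  rand(max - min + 1) + min
end

def range_rand(min, max)
  rand(min..max) # an inclusive Range (two dots) includes max
end

arithmetic_values = Array.new(10_000) { arithmetic_rand(7, 10) }.uniq.sort
range_values      = Array.new(10_000) { range_rand(7, 10) }.uniq.sort

p arithmetic_values # with this many draws, every value in 7..10 should show up
p range_values
```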

Kernel#srand sets the seed for Kernel#rand. This can be used to generate a reproducible sequence of numbers. This might be handy if you are trying to isolate / reproduce a bug.

>> srand(333)
>> 10.times.map { rand(10) }
=> [3, 3, 6, 3, 7, 7, 6, 4, 4, 9]
>> 10.times.map { rand(10) }
=> [7, 5, 5, 8, 8, 7, 3, 3, 3, 9]

>> srand(333)
>> 10.times.map { rand(10) }
=> [3, 3, 6, 3, 7, 7, 6, 4, 4, 9]
>> 10.times.map { rand(10) }
=> [7, 5, 5, 8, 8, 7, 3, 3, 3, 9]

If you need multiple generators, then you can access the complete interface to Ruby’s PRNG (Pseudo-Random Number Generator) through Random.

>> rng = Random.new
>> rng.rand(10) 
=> 4

Random.new can take a seed value as an argument. The #== operator returns true if two Random objects have the same internal state (they started with the same seed and have generated the same number of values).

>> rng1 = Random.new(123)
>> rng2 = Random.new(123)
>> rng1 == rng2
=> true
>> rng1.rand
=> 0.6964691855978616
>> rng1 == rng2
=> false
>> rng2.rand
=> 0.6964691855978616
>> rng1 == rng2
=> true
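Because a Random object carries its own state, you can hand one to methods like Array#shuffle through their random: keyword and get repeatable results without calling srand, which reseeds the global generator that the rest of your program shares. A sketch, where shuffled_deck is a hypothetical helper:

```ruby
# Build a reproducible permutation from a seed, using a dedicated
# Random instance instead of the global generator behind Kernel#rand.
def shuffled_deck(seed)
  rng = Random.new(seed)
  (1..10).to_a.shuffle(random: rng)
end

first  = shuffled_deck(42)
second = shuffled_deck(42)

p first == second            # same seed, same order
p first.sort == (1..10).to_a # still a permutation of 1..10

# Reseeding the global generator with srand does not affect a
# separately seeded Random instance.
rng_a  = Random.new(2024)
rng_b  = Random.new(2024)
draw_a = rng_a.rand(100)
srand(1)  # only reseeds the generator used by Kernel#rand
rand(100)
draw_b = rng_b.rand(100)
p draw_a == draw_b
```

Array#sample accepts the same keyword, for example arr.sample(3, random: rng).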

Random Array Elements

If you wanted a random element from an Array, you could index the Array with a random number like this:

>> arr = [1, 2, 3, 4, 5]
>> arr[rand(arr.size)]
=> 1

This isn’t necessary. As of Ruby 1.9, you can use Array#sample. It was previously known as Array#choice.

>> [1, 2, 3, 4, 5].sample
=> 4

Two consecutive #sample calls may return the same element. To get several elements without picking the same position twice, pass the number you want to #sample.

>> [1, 2, 3, 4, 5].sample(2)
=> [4, 1]

Since #sample is only available for Array, for other collections you will need to either do it the old-fashioned way or convert them to Array first.
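A sketch of the convert-first approach for a few common collections, assuming Ruby's standard Set library. For an integer Range you can skip the conversion entirely, since rand (and Random#rand) accept a Range directly.

```ruby
require 'set'

rng = Random.new(1) # seeded only so the example is repeatable

hash  = { apple: 3, banana: 5, cherry: 7 }
set   = Set[:red, :green, :blue]
range = (1..1_000)

# Hash: convert to [key, value] pairs, or sample only the keys.
pair = hash.to_a.sample(random: rng)
key  = hash.keys.sample(random: rng)

# Set: Set#to_a produces an Array you can sample from.
color = set.to_a.sample(random: rng)

# Integer Range: no Array needed, pass the Range to rand.
number = rng.rand(range)

p pair, key, color, number
```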

Actually Random Numbers

Sometimes pseudo-random numbers are not good enough. If they are based on something predictable, they can be predicted and exploited by an attacker.

RealRand is a wrapper for three genuine random number generator services:

  • random.org: generates randomness from atmospheric noise
  • FourmiLab (HotBits): uses radioactive decay
  • random.hd.org (EntropyPool): claims to use a variety of sources, including local processes / files / devices, web page hits, and remote web sites.

Note: As of this writing, the RealRand homepage appears to contain examples for 1.x, where RealRand’s classes are grouped under the Random module. The newest version of the gem (2.0.0) groups the classes under the RealRand module, as in these examples.

$ gem install realrand

>> require 'random/online'

>> rorg = RealRand::RandomOrg.new
>> rorg.randbyte(5)
=> [197, 4, 205, 175, 84]

>> fourmi = RealRand::FourmiLab.new
>> fourmi.randbyte(5)
=> [10, 152, 184, 66, 190]

>> entropy = RealRand::EntropyPool.new
>> entropy.randbyte(5)
=> [161, 98, 196, 75, 115]

In the case of the RandomOrg class, you also have the #randnum method, which takes the number of values you want and, optionally, a minimum and a maximum.

>> rorg.randnum(5)
=> [94, 3, 94, 56, 97]

>> rorg.randnum(10, 3, 7)
=> [7, 7, 7, 5, 7, 4, 4, 5, 6, 7]

Random Security

Ruby ships with SecureRandom for generating things like UUIDs (Universally Unique Identifiers), session tokens, etc.

>> require 'securerandom'

>> SecureRandom.hex
=> "e551a47137a554bb08ba36de34659f60"

>> SecureRandom.base64
=> "trwolEFZYO7sFeaI+uWrJg=="

>> SecureRandom.random_bytes
=> "\x10C\x86\x02:\x8C~\xB3\xE0\xEB\xB3\xE7\xD1\x12\xBDw"

>> SecureRandom.random_number
=> 0.7432012014930834
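SecureRandom has a few more methods that come up when generating tokens and identifiers. The sketch below sticks to methods that ship with the standard library; the byte count passed to hex sets the length, since each byte becomes two hex characters.

```ruby
require 'securerandom'

token   = SecureRandom.hex(16)              # 16 random bytes as 32 hex characters
url_tok = SecureRandom.urlsafe_base64       # Base64 using '-' and '_' instead of '+' and '/'
uuid    = SecureRandom.uuid                 # random (version 4) UUID string
die     = SecureRandom.random_number(6) + 1 # integer in 1..6

p token, url_tok, uuid, die
```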

“Secure” probably depends on who you are. SecureRandom uses the following random number generators:

  • openssl
  • /dev/urandom
  • Win32

A glance at the code reveals that it defaults to OpenSSL::Random.random_bytes. It looks like PIDs and process clock times (nanosecond) are used for entropy whenever the PID changes.
I suspect that this is enough for most things, but if you need an extra layer of protection, you could use RealRand for additional entropy. Unfortunately, SecureRandom does not have anything like a #seed method, so you will need to seed OpenSSL directly. Note: OpenSSL seeds are strings.

>> require 'openssl'
>> require 'random/online'
>> rng = RealRand::RandomOrg.new
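# randbyte returns an Array of integers; pack('C*') turns it into a
# 256-byte binary string (join would give a string of decimal digits)
>> OpenSSL::Random.random_add(rng.randbyte(256).pack('C*'), 0.0)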

You can read why I used 0.0 here. According to the patch discussion, the second argument to #random_add is the estimated amount of entropy; it was previously being overestimated, so it was changed to 0.0. The OpenSSL documentation for RAND_add lists three arguments: the buffer, the number of bytes to mix into the PRNG state, and the estimated entropy. OpenSSL::Random.random_add takes only two because Ruby's binding passes the string's length as the byte count, so its second argument corresponds to RAND_add's third. In other words, 0.0 does not mean that zero bytes get mixed in: all of the seed bytes are added to the PRNG state, and OpenSSL is simply told not to count them toward its entropy estimate.

Random Numbers Based on Probability Distributions

Let’s say you wanted to generate random, yet realistic, human masses (i.e. weights for non-égalité imperials). A naive attempt might look like this:

>> 10.times.map { rand(50..130) }
=> [92, 84, 77, 55, 95, 127, 120, 71, 105, 94]

Now, although you could find human beings that are 50 kilograms (110 lbs), and you could find some that are 130 kilograms (286 lbs), most people are not that extreme, so the spread above is unlikely for a truly random sample (unless it was drawn mostly from members of McDonald's Anonymous and professional wrestlers).

One option is to just ignore the extreme ranges:

>> 10.times.map { rand(55..85) }
=> [58, 80, 55, 65, 58, 70, 71, 82, 79, 60]

The numbers that would generally be obtained are a little better now, but they still don’t approximate reality. You need a way to have the majority of the random numbers fall within a smaller range, while a smaller percentage fit within a much larger range.

What you need is a probability distribution.

Alas, Ruby is not strong in the math department. Most of the statistics solutions I came across were copy/paste algorithms, unmaintained libraries/bindings with little documentation, and hacks that tap into math environments like R. They also tended to assume an uncomfortably deep knowledge of statistics (okay maybe like one semester, but I still should not have to go back to college to generate random numbers based on a probability distribution).

This document claims that human traits like weight and height are normally distributed. Actually, quite a few things show up in normal distributions.

Rubystats is one of the simpler libraries I encountered. It can generate random numbers from normal, binomial, beta, and exponential distributions.

For this example I used a normal distribution with a mean mass of 68 kg and a standard deviation of 12 kg (just guesses, not to be taken as science).

$ gem install rubystats

>> require 'rubystats'
>> human_mass_generator = Rubystats::NormalDistribution.new(68, 12)
>> human_masses = 50.times.map { human_mass_generator.rng.round(1) }
=> [62.6, 75.4, 62.1, 66.2, 50.9, 58.9, 70.8, 51.4, 60.9, 63.5, 72.0,
    48.2, 62.3, 63.0, 75.3, 62.6, 103.0, 62.3, 46.6, 66.2, 62.7, 92.2, 
    76.1, 85.1, 77.5, 75.9, 57.1, 68.3, 63.8, 53.3, 51.6, 75.4, 61.9, 
    67.7, 58.2, 64.2, 83.3, 69.0, 75.5, 68.8, 60.4, 83.8, 76.2, 81.0, 
    60.9, 61.2, 55.5, 53.1, 61.4, 79.0]

There are roughly 2.2 pounds in a kilogram (about 2.2046, to be more precise), for those of you to whom these numbers mean little.

>> human_weights = human_masses.map { |i| (i * 2.2).round(1) }
=> [137.7, 165.9, 136.6, 145.6, 112.0, 129.6, 155.8, 113.1, 134.0, 139.7,
    158.4, 106.0, 137.1, 138.6, 165.7, 137.7, 226.6, 137.1, 102.5, 145.6, 
    137.9, 202.8, 167.4, 187.2, 170.5, 167.0, 125.6, 150.3, 140.4, 117.3, 
    113.5, 165.9, 136.2, 148.9, 128.0, 141.2, 183.3, 151.8, 166.1, 151.4, 
    132.9, 184.4, 167.6, 178.2, 134.0, 134.6, 122.1, 116.8, 135.1, 173.8]

If this is up your alley, you might also want to check out the gsl, distribution, and statistics2 gems.
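If you only need one distribution and would rather avoid a dependency, a normal distribution can be sampled from plain uniform numbers with the Box-Muller transform. This is a standard textbook technique, not necessarily what Rubystats does internally, and normal_rand is a hypothetical helper name. The mean of 68 kg and standard deviation of 12 kg are the same guesses as above.

```ruby
# Box-Muller transform: two independent uniform numbers become one
# normally distributed number with the given mean and standard deviation.
def normal_rand(mean, std_dev, rng = Random.new)
  u1 = 1.0 - rng.rand # rng.rand is in [0, 1); 1.0 - u is in (0, 1], so log(u1) stays finite
  u2 = rng.rand
  z  = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
  mean + std_dev * z
end

rng    = Random.new(2013) # seeded only so the run is repeatable
masses = Array.new(10_000) { normal_rand(68, 12, rng) }

# Check that the sample roughly matches the requested parameters.
sample_mean = masses.inject(:+) / masses.size
variance    = masses.map { |m| (m - sample_mean)**2 }.inject(:+) / masses.size
sample_sd   = Math.sqrt(variance)

puts "mean: #{sample_mean.round(1)}, standard deviation: #{sample_sd.round(1)}"
p masses.first(10).map { |m| m.round(1) }
```

Box-Muller actually yields two independent normal values per pair of uniforms (the sine term gives the second); this sketch discards it to stay short. A normal distribution also has no hard bounds, so clamp or re-draw if extreme values are unacceptable for your use.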

Random Strings

There is a good page on Stack Overflow which has several solutions for generating random strings. I liked these:

>> (0...8).map { (65 + rand(26)).chr }.join
=> "FWCZOUOR"

>> (0...50).map{ ('a'..'z').to_a[rand(26)] }.join
=> "ygctkhpzxkbqggvxgmocyhvbocouylzfitujyyvqhzunvgpnqb"
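Both snippets rely on Kernel#rand, which is fine for test data but predictable, and the second one rebuilds ('a'..'z').to_a on every iteration. Here is a variation using Array#sample, plus a SecureRandom-backed version for strings that need to be hard to guess. The helper names random_string and secure_random_string are hypothetical.

```ruby
require 'securerandom'

LETTERS      = ('A'..'Z').to_a # built once, not on every iteration
ALPHANUMERIC = LETTERS + ('a'..'z').to_a + ('0'..'9').to_a

# Non-secure: fine for test data, not for tokens or passwords.
def random_string(length, chars = LETTERS, rng = Random.new)
  Array.new(length) { chars.sample(random: rng) }.join
end

# For anything an attacker might try to guess, draw indices from SecureRandom.
def secure_random_string(length, chars = ALPHANUMERIC)
  Array.new(length) { chars[SecureRandom.random_number(chars.size)] }.join
end

p random_string(8)
p random_string(8, ALPHANUMERIC, Random.new(1)) # repeatable for a given seed
p secure_random_string(20)
```

On Ruby 2.5 and later, SecureRandom.alphanumeric(20) covers the secure alphanumeric case directly.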

Webster

Webster is an English / English-sounding word generator. It could be useful for generating confirmation codes in western localizations.

$ gem install webster

>> require 'webster'
>> w = Webster.new
>> w.random_word
=> "unavailed"

>> 20.times.map { w.random_word }
=> ["bombo", "stellated", "kitthoge", "starwort", "poleax", "lacinia",
    "crusty", "hazelly", "liber", "servilize", "alternate", "cembalist", 
    "dottore", "ullage", "tusculan", "tattlery", "ironness", "grounder", 
    "augurship", "dupedom"]

random-word

The random-word gem claims to use the massive WordNet dictionary for its methods. You ever had somebody accuse you of using “them big words”? Those are the kinds of words that random-word appears to produce.

$ gem install random-word

>> require 'random-word'

>> 10.times.map { RandomWord.adjs.next }
=> ["orthographic", "armenian", "nongranular", "ungetatable", 
    "magnified", "azimuthal", "geosynchronous", "platitudinous", 
    "deep_in_thought", "substitutable"]

>> 10.times.map { RandomWord.nouns.next }
=> ["roy_wilkins", "vascular_tissue", "bygone", "vermiform_process",
    "anamnestic_reaction", "engagement", "soda_niter", "humber", 
    "fire_salamander", "pyridoxamine"]

>> 10.times.map { RandomWord.phrases.next }
=> ["introvertive glenoid_cavity", "sugarless reshipment", 
    "anticipant cyclotron", "unheaded ligustrum_amurense", 
    "dauntless contemplativeness", "nativistic chablis", 
    "scapular typhoid_fever", "warlike dead_drop", 
    "pyrotechnic varicocele", "avionic cyanite"]

If you want to get rid of those underscores, just add a gsub:

>> 10.times.map { RandomWord.nouns.next.gsub('_', ' ') }
=> ["litterbug", "nebe", "business sector", "stochastic process",
    "playmaker", "esthesia", "granny knot", "purple osier", 
    "sterculia family", "ant cow"]

Faker

Faker is useful for generating testing data. It has a rather large library of data, so you might be able to generate procedural game content as well.

$ gem install faker

>> require 'faker'

>> 20.times.map { Faker::Name.name }
=> ["Gilberto Moen", "Miss Caleb Emard", "Julie Daugherty", 
    "Katelin Rau", "Sheridan Mueller", "Cordell Steuber", 
    "Sherwood Barrows", "Alysson Lind II", "Kareem Toy", 
    "Allison Connelly", "Orin Nolan", "Dolores Kessler", 
    "Kassandra Hackett Jr.", "Mikayla Spencer II", "Lonie Kertzmann", 
    "Emile Walsh V", "Tara Emmerich", "Mrs. Beryl Keeling", 
    "Jerry Nolan DVM", "Linnie Thompson"]

>> 10.times.map { Faker::Internet.email }
=> ["catherine.schamberger@toy.net", "eleonore@heaney.net",
    "toni@colliermoore.org", "merl_miller@pfeffer.net",
    "florine_dach@gusikowski.net", "bernadine@walter.net",
    "stevie.farrell@crooks.net", "janick@satterfield.name",
    "leanna.lubowitz@bogisich.biz", "rey@kutch.info"]

>> 10.times.map { Faker::Address.street_address } 
=> ["3102 Jasen Haven", "8748 Huel Parks", "1886 Gutkowski Creek", 
    "837 Jennie Spurs", "4921 Carter Coves", "7714 Ida Falls", 
    "8227 Sawayn Bypass", "269 Kristopher Village", "31185 Santos Inlet", 
    "96861 Heaney Street"]

>> 10.times.map { Faker::Company.bs }
=> ["aggregate extensible markets", "repurpose leading-edge metrics",
    "synergize global channels", "whiteboard virtual platforms", 
    "orchestrate ubiquitous relationships", "enable interactive e-services", 
    "engineer end-to-end convergence", "deploy enterprise e-services", 
    "benchmark wireless solutions", "generate impactful eyeballs"]

Impressed yet? Faker also offers data for multiple locales. For example, maybe you are making a game that takes place in Germany, and you want random character names of the Deutsch variety.

>> Faker::Config.locale = :de
>> 10.times.map { Faker::Name.name }
=> ["Mara Koehl", "Penelope Wagner", "Karolina Kohlmann", "Melek Straub",
    "Marvin Kettenis", "Lyn Behr", "Karina Deckert", "Janne Damaske", 
    "Sienna Freimuth", "Lias Buder"]

Or maybe you would like company catch phrases…in Spanish.

>> Faker::Config.locale = :es
>> 5.times.map { Faker::Company.catch_phrase }
=> ["adaptador interactiva Extendido", "línea segura tangible Distribuido",
    "superestructura asíncrona Diverso", "flexibilidad bidireccional Total",
    "productividad acompasada Re-implementado"]

Of course, there’s Lorem Ipsum stuff as well.

>> Faker::Lorem.paragraph
=> "Sit excepturi et possimus et. Quam consequatur placeat fugit aut et
    sint. Sint assumenda repudiandae veniam iusto tenetur consequatur."

Make sure you check the docs to see what else it can do. Also, if this is really your thing, look at the functional predecessor of Faker, Forgery. It has fallen out of use but seems easy to adapt.

random_data

One of the downsides of Faker is that it doesn’t seem to provide gender-specific name generation. The random_data gem does, although it could use some work (as of version 1.6.0).

$ gem install random_data

>> require 'random_data'
>> 20.times.map { Random.first_name_female }
=> ["Donna", "Sharon", "Anna", "Nancy", "Betty", "Margaret", "Maria",
    "Helena", "Carol", "Cheryl", "Donna", "Cheryl", "Sharon", "Jennifer", 
    "Helena", "Cheryl", "Jessica", "Elizabeth", "Elizabeth", "Sandra"]

>> 20.times.map { Random.first_name_male }
=> ["Richard", "William", "Arthur", "David", "Roger", "Daniel", "Simon",
    "Anthony", "Adam", "George", "George", "David", "Christopher", 
    "Steven", "Edgar", "Arthur", "Richard", "Kenneth", "Philip", "Charles"]

Looking at these names, they’re a bit…well, let’s just say there’s no “Sheniquoi.”

To be fair, it does have some pretty cool date and location methods. Random.date appears to pick dates near the current one.

>> 10.times.map { Random.date.strftime('%a %d %b %Y') }
=> ["Mon 16 Sep 2013", "Sat 21 Sep 2013", "Tue 24 Sep 2013", 
    "Sat 28 Sep 2013", "Thu 03 Oct 2013", "Fri 20 Sep 2013", 
    "Mon 23 Sep 2013", "Tue 24 Sep 2013", "Sun 29 Sep 2013", 
    "Thu 03 Oct 2013"]

>> 30.times.map { Random.zipcode }
=> ["33845", "87791", "27961", "94156", "40897", "24887", "51985", "12099",
    "82247", "33015", "77437", "93497", "35269", "94426", "58919", "50170", 
    "99952", "62229", "73271", "34316", "17547", "24590", "99613", "52954", 
    "95117", "38454", "70195", "84415", "97096", "58282"]

>> 30.times.map { Random.country }
=> ["Fiji", "Sudan", "Cambodia", "Belgium", "Rwanda", "Czech Republic",
    "Marshall Islands", "Georgia", "Saudi Arabia", 
    "United Arab Emirates", "Switzerland", "Uganda", "Uruguay", "Somalia", 
    "Ukraine", "Canada", "Jamaica", "Cape Verde", "Indonesia", "Sudan", 
    "Malaysia", "Virgin Islands (U.S.)", "Turkmenistan", "Libya", "Sweden", 
    "St. Vincent and the Grenadines", "Korea, Dem. Rep.", "Faeroe Islands", 
    "Myanmar", "Zimbabwe"]

Note: According to the random_data github page, “zipcodes are totally random and may not be real zipcodes.”

Raingrams

The raingrams gem is probably the most interesting thing in this tutorial. It can produce random sentences or paragraphs based on provided text. For example, if you are some kind of sick, depraved, YouTube comment connoisseur, you could create a monstrosity that generates practically infinite YouTube comments, retraining the model with the worst comments as you go, scraping the depths of absurdity, until you get something like:

“no every conversation with a democrat goes like neil degrasse tyson is basically carl sagan black edition at nintendo years old when I was your age I thought greedy corporations worked like this comment has been deleted because the video has nothing to do with what this mom makes 30 dollars a day filling out richard dawkins surveys which is still a better love story than twilight.”

According to Wikipedia, “an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application.”

Raingrams describes itself as a “flexible and general-purpose ngrams library written in Ruby.” It generates text content by building models based on words occurring in pairs, trios, etc.; there doesn't seem to be a limit on the complexity of the model you can use, but the included model classes go from BigramModel to HexagramModel.
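To make the idea concrete, here is a toy word-level bigram generator in plain Ruby. It is only a sketch of the general technique (ToyBigramModel and its methods are my own names, not raingrams' API); I have not checked how raingrams implements its models, and a real library has to deal with details like sentence boundaries and punctuation that this toy ignores.

```ruby
# A toy word-level bigram model: for each word, remember every word that
# followed it in the training text, then walk those links at random.
class ToyBigramModel
  def initialize
    @followers = Hash.new { |hash, word| hash[word] = [] }
  end

  def train(text)
    text.split.each_cons(2) { |current, following| @followers[current] << following }
    self
  end

  # Start from start_word and keep picking a recorded follower until a
  # word with no followers is reached or max_words is hit.
  def generate(start_word, max_words: 20, rng: Random.new)
    words = [start_word]
    while words.size < max_words
      candidates = @followers.fetch(words.last, []) # fetch avoids adding empty entries
      break if candidates.empty?
      words << candidates.sample(random: rng)
    end
    words.join(' ')
  end
end

model = ToyBigramModel.new.train(
  "When you are courting a nice girl an hour seems like a second. " \
  "When you sit on a red-hot cinder for a second that seems like an hour."
)
puts model.generate('When', rng: Random.new(3))
```

A trigram or quadgram version would key the hash on the previous two or three words instead of one, which is why those models need much more training text before most keys have more than one follower to choose from.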

$ gem install raingrams

Creating and training a model is easy.

>> require 'raingrams'
>> model = Raingrams::BigramModel.new
>> model.train_with_text "When you are courting a nice girl an hour seems like a second. When you sit on a red-hot cinder for a second that seems like an hour. That's relativity."
>> model.random_sentence
=> "When you sit on a nice girl an hour."

If you include the Raingrams module, you don’t need to use it as a namespace.

>> include Raingrams
>> model = BigramModel.new

One of the really nice things about Raingrams is the ability to train it with files or web pages instead of just strings. Raingrams provides the following training methods:

  • Model#train_with_paragraph
  • Model#train_with_text
  • Model#train_with_file
  • Model#train_with_url

I was pleasantly surprised to find that #train_with_url works…pretty well! It isn’t perfect, and it can create sentences that are cut off, but writing a filter to discard broken sentences is probably easier than writing a scraper for every single site you want to train your models with.

Bigram models can work with very small data sets, but they tend to produce rather incoherent results.

>> require 'raingrams'
>> include Raingrams
>> model = BigramModel.new
>> model.train_with_url "http://en.wikipedia.org/wiki/Central_processing_unit"
>> model.random_sentence
=> "One notable late CPU decodes instructions rather than others before him
    such as pipelining and 1960s no arguments but still continued by eight 
    binary CPU register may not see 4."

Coherence to the point of almost believability seems to start with quadgrams. Unfortunately, quadgrams require quite a bit of data in order to produce “random” text.

>> model = QuadgramModel.new
>> model.train_with_url "http://en.wikipedia.org/wiki/Central_processing_unit"
>> model.random_sentence
=> "Tube computers like EDVAC tended to average eight hours between failures
    whereas relay computers like the slower but earlier Harvard Mark I which was
    completed before EDVAC also utilized a stored-program design using punched 
    paper tape rather than electronic memory."

If you wanted to create an “H. P. Lovecraft-sounding” prose generator, you could train n-gram models on his stories.

>> model = QuadgramModel.new
>> model.train_with_url "http://www.dagonbytes.com/thelibrary/lovecraft/mountainsofmaddness.htm"
>> model.random_sentence
=> "Halfway uphill toward our goal we paused for a momentary breathing 
    spell and turned to look again at poor Gedney and were standing in a 
    kind of mute bewilderment when the sounds finally reached our 
    consciousness the first sounds we had heard since coming on the camp 
    horror but other things were equally perplexing."

>> model.random_sentence
=> "First the world s other extremity put an end to any of the monstrous
    sight was indescribable for some fiendish violation of known natural 
    law seemed certain at the outset."

The missing apostrophe in “world s” is not a typo on my part; it was already missing in the source text. You will need to watch for things like that.

Conclusion

Ruby has a lot to offer when it comes to random data. What's more, many of these libraries would be easy to modify or improve upon. If you are a newcomer to Ruby and you want to get involved, this is a great opportunity.

