The Ruby Transmogrifier

One of the things computers are good at is moving data. When you have to migrate data from one format to another, I have found that Ruby makes my job a lot easier. A while back I had a task that involved moving data. We were getting dozens of data sets that needed to be converted from CSV to a fixed-length text file. This is the story of how we got it done.

Defining the Finished Output.

We were given a definition file describing how the data needed to be formatted for importing. Let’s say it looked like this:

FIELD NAME      FORMAT
NAME            A(50)
ADDRESS1        A(50)
ADDRESS2        A(50)
CITY            A(50)
STATE           A(2)
ZIP             A(10)
CONTACT         A(50)
CONTACTPHONE    A(10)
ACCOUNTOPENED   9(8)

What does this mean? We’re really focused on the format. ‘A’ means alphanumeric with white space added to the right. ‘9’ means numeric with white space added to the left. The number is the length of the field, so ‘A(10)’ is an alphanumeric field of 10 characters.
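To make the notation concrete, here is a minimal sketch of how a format string could be split into its type and its length. The parse_format helper and the FORMAT_PATTERN constant are hypothetical names of my own, not something the definition file gives us, and the sketch assumes ‘A’ and ‘9’ are the only types we will ever see.

```ruby
# Hypothetical helper: split a format spec such as "A(50)" or "9(8)"
# into its type letter and its field length.
FORMAT_PATTERN = /\A([A9])\((\d+)\)\z/

def parse_format(spec)
  match = FORMAT_PATTERN.match(spec.strip)
  raise ArgumentError, "unrecognized format: #{spec.inspect}" unless match

  { type: match[1], length: match[2].to_i }
end

spec = parse_format("A(10)")
puts spec[:type]    # prints A
puts spec[:length]  # prints 10
```

Anything that does not match the pattern raises an ArgumentError, so a typo in the definition would surface early instead of producing a badly sized field.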

Our source data is in a CSV file. Something like:

name,address1,address2,city,state,zip,contact,contactPhone,accountOpened
Wonder widgets,1600 Vassar Street,,dallas,tx,75220,Tim Smith,214-555-1212,12052001
Timmy's Bikes,2723 Auburn Street,Building 3,Erie,PA,16508-1234,,814-555-4321,03232011

The output just needs to be a text file.

Wonder widgets  1600 Vassar Street             dallas tx75220     Tim Smith 214-555-121220011205
Timmy's Bikes   2723 Auburn Street  Building 3 Erie   PA16508-1234          814-555-432120110323

How would you do that?

Time to Start Transmogrifying

How would you transmogrify the data? From the target definition, we know the name will be 50 characters long. You can write a method to take the name column and make it 50 characters long. Want to plow down that path? We can re-factor later. Remember you have to pass in text.

def nameFixing(name)
  name = name.ljust(50)
end

Good old ljust will add the extra white space we need. Does this seem too easy? It is Ruby, but maybe we should test this.

require 'test/unit'

def nameFixing(name)
  name = name.ljust(50)
end

class NameLengthTest < Test::Unit::TestCase
  def test_name_short_should_equal_50
    name = nameFixing "boo"
    assert_equal(name.length, 50)
  end
end

Save that file and go ahead and run the test.

$ ruby ljust_test.rb
Loaded suite ljust_test
Started
.
Finished in 0.000802 seconds.

1 tests, 1 assertions, 0 failures, 0 errors, 0 skips

Test run options: --seed 27164

Sweet, it worked.

What happens if it’s more than 50 characters long? Go ahead and take a stab at it.

require 'test/unit'

def nameFixing(name)
  name = name.ljust(50)
end

class NameLengthTest < Test::Unit::TestCase
  def test_name_short_should_equal_50
    name = nameFixing "boo"
    assert_equal(name.length, 50)
  end

  def test_long_short_should_equal_50
    name = nameFixing "dgrtefdgdfshrtyutyuykhjkmbnmvcbdfgrthjghjgghvdfgrstthg"
    assert_equal(name.length, 50)
  end
end

 

$ ruby ljust_test.rb
Loaded suite ljust_test
Started
F.
Finished in 0.001179 seconds.

1) Failure:
test_long_short_should_equal_50(NameLengthTest) [ljust_test.rb:15]:
<54> expected but was
<50>.

2 tests, 2 assertions, 1 failures, 0 errors, 0 skips

Test run options: --seed 40377

Did you expect that? ljust adds white space, but does not slice off extra characters. What do you think would slice off the extra characters? You got it: slice.
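If you want to see the two methods side by side before touching the tests, here is a small sketch. The sample strings are made up for illustration.

```ruby
short = "boo"
long  = "a name that is far too long"

p short.ljust(10)     # padded on the right with spaces up to 10 characters
p long.ljust(10)      # returned unchanged: ljust never shortens a string
p long.slice(0, 10)   # "a name tha" (only the first 10 characters)
p short.slice(0, 10)  # "boo" (slice never pads)
```

Neither method does the whole job on its own, which is why we need both.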

If the name is less than 50 characters we need to add white space to the right. If the name is over 50 we need to slice off the extra characters. If the name has exactly 50 characters we leave it alone. Did we just describe our test cases?

require 'test/unit'

def nameFixing(name)
  case
  when name.length > 50
    name = name.slice(0..49)
  when name.length < 50
    name = name.ljust(50)
  else
    name
  end
end

class NameLengthTest < Test::Unit::TestCase
  def test_name_short_should_equal_50
    name = nameFixing "boo"
    assert_equal(name.length, 50)
  end

  def test_name_long_should_equal_50
    name = nameFixing "dgrtefdgdfshrtyutyuykhjkmbnmvcbdfgrthjghjgghvdfgrstthg"
    assert_equal(name.length, 50)
  end

  def test_name_50_should_equal_50
    name = nameFixing "dgrtefdgdfshrtyutyuymbnmvcbdfgrthjghjgghvdfgrstthg"
    assert_equal(name.length, 50)
  end
end

Do you think this will pass? Try it.

Now that you have a way to make the name the proper length, we need to continue with the other columns. Did you notice a pattern with the definitions? Alphanumeric or numeric and a certain length.
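One way to picture that pattern is to write the definition down as data. This is only a sketch: FIELD_LAYOUT and record_width are hypothetical names, and the values are copied from the definition file shown earlier.

```ruby
# Hypothetical table of the target layout: [field name, type, length].
FIELD_LAYOUT = [
  ["NAME",          "A", 50],
  ["ADDRESS1",      "A", 50],
  ["ADDRESS2",      "A", 50],
  ["CITY",          "A", 50],
  ["STATE",         "A", 2],
  ["ZIP",           "A", 10],
  ["CONTACT",       "A", 50],
  ["CONTACTPHONE",  "A", 10],
  ["ACCOUNTOPENED", "9", 8]
]

# Add up the field lengths to get the width of one output record.
def record_width(layout)
  layout.sum { |_name, _type, length| length }
end

puts record_width(FIELD_LAYOUT)  # prints 280
```

Every row of the table has the same two knobs, a type and a length, which is what the rest of the article turns into method parameters.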

Why don’t we go ahead and tackle the length of the column. You should just be able to pass in a variable to set that. Try it.

require 'test/unit'

def nameFixing(name,len)
  case
  when name.length > len
    name = name.slice(0..(len-1))
  when name.length < len
    name = name.ljust(len)
  else
    name
  end
end

class NameLengthTest < Test::Unit::TestCase
  def test_name_short_should_equal_50
    name = nameFixing "boo",50
    assert_equal(name.length, 50)
  end

  def test_name_long_should_equal_50
    name = nameFixing "dgrtefdgdfshrtyutyuykhjkmbnmvcbdfgrthjghjgghvdfgrstthg",50
    assert_equal(name.length, 50)
  end

  def test_name_50_should_equal_50
    name = nameFixing "dgrtefdgdfshrtyutyuymbnmvcbdfgrthjghjgghvdfgrstthg",50
    assert_equal(name.length, 50)
  end
end

Now you’re passing in the length we need to set it to. Save and rerun the test. Remember: check yourself before you wreck yourself. I need to practice that too.

$ ruby ljust_test.rb
Loaded suite ljust_test
Started
...
Finished in 0.000863 seconds.

3 tests, 3 assertions, 0 failures, 0 errors, 0 skips

Test run options: --seed 30117

Still green. We can set the length to whatever we want. We’ve only really worked with the alphanumeric side. For numeric we need to add white space to the left side. But first, we haven’t checked that alphanumeric values really get their white space on the right side of the string. Why don’t you go ahead and add that to the test. I’ll wait.

def test_name_should_have_white_space_on_the_right
  name = nameFixing "should be 18", 18
  assert_equal(name, "should be 18      ")
end

Are your tests passing? Cool. Let’s move on to numeric columns. Pretty much the same idea, but with white space on the left. Here’s my quick and dirty code:

require 'test/unit'

def nameFixing(name,len)
  case
  when name.length > len
    name = name.slice(0..(len-1))
  when name.length < len
    name = name.ljust(len)
  else
    name
  end
end

def numbFixing(numb,len)
  case
  when numb.length > len
    numb = numb.slice(0..(len-1))
  when numb.length < len
    numb = numb.rjust(len)
  else
    numb
  end
end

class NameLengthTest < Test::Unit::TestCase
  def test_name_short_should_equal_50
    name = nameFixing "boo",50
    assert_equal(name.length, 50)
  end

  def test_name_long_should_equal_50
    name = nameFixing "dgrtefdgdfshrtyutyuykhjkmbnmvcbdfgrthjghjgghvdfgrstthg",50
    assert_equal(name.length, 50)
  end

  def test_name_50_should_equal_50
    name = nameFixing "dgrtefdgdfshrtyutyuymbnmvcbdfgrthjghjgghvdfgrstthg",50
    assert_equal(name.length, 50)
  end

  def test_name_should_have_white_space_on_the_right
    name = nameFixing "should be 18",18
    assert_equal(name, "should be 18      ")
  end

  def test_numb_should_have_white_space_on_the_left
    numb = numbFixing "123",8
    assert_equal(numb, "     123")
  end
end

Go ahead, try it. Is it green? Excellent, but we’re not very DRY.

Time to Refactor.

We can combine the methods, and where we need to add white space we could put an if in there for alphanumeric or numeric. Let’s try that.

require 'test/unit'

def allThingsFixing(data,len,type)
  case
  when data.length > len
    data = data.slice(0..(len-1))
  when data.length < len
    if type == "A"
      data = data.ljust(len)
    else
      data = data.rjust(len)
    end
  else
    data
  end
end

class NameLengthTest < Test::Unit::TestCase
  def test_name_short_should_equal_50
    data = allThingsFixing "boo",50,"A"
    assert_equal(data.length, 50)
  end

  def test_name_long_should_equal_50
    data = allThingsFixing "dgrtefdgdfshrtyutyuykhjkmbnmvcbdfgrthjghjgghvdfgrstthg",50,"A"
    assert_equal(data.length, 50)
  end

  def test_name_50_should_equal_50
    data = allThingsFixing "dgrtefdgdfshrtyutyuymbnmvcbdfgrthjghjgghvdfgrstthg",50,"A"
    assert_equal(data.length, 50)
  end

  def test_name_should_have_white_space_on_the_right
    data = allThingsFixing "should be 18",18,"A"
    assert_equal(data, "should be 18      ")
  end

  def test_numb_should_have_white_space_on_the_left
    data = allThingsFixing "123",8,"9"
    assert_equal(data, "     123")
  end

end

Did you rerun the tests? Did they all pass? Brilliant.

How to Handle the Date.

According to the definition, date needs to be yyyymmdd and luckily we are receiving it mmddyyyy. We just need to cut it up and rearrange it.
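Cutting and rearranging the string is all the article needs, but if you also wanted to catch impossible dates, a sketch along these lines could work. It leans on Date.strptime and strftime from Ruby's standard date library; the mmddyyyy_to_yyyymmdd helper is a hypothetical name and is not part of the method we build below.

```ruby
require 'date'

# Hypothetical helper, separate from the article's method: parse an
# mmddyyyy string as a real calendar date and write it back as yyyymmdd.
# Date.strptime raises an error for input such as "02302003" (February 30).
def mmddyyyy_to_yyyymmdd(text)
  Date.strptime(text, "%m%d%Y").strftime("%Y%m%d")
end

puts mmddyyyy_to_yyyymmdd("08152003")  # prints 20030815
```

The trade-off is that bad source data now stops the run with an exception instead of flowing quietly into the output file, which may or may not be what you want.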

Will you be able to pass that data blindly into our method? We do know that the name of the column is “accountOpened.” To keep things moving we could pass a column name into the method. Time to write a test.

def test_date_should_format_properly
  data = allThingsFixing "08152003",8,"9","accountOpened"
  assert_equal(data, "20030815")
end

Did you remember to add the new variable you’re passing into the method?

Here’s what I did with the method:

def allThingsFixing(data,len,type,column)
  case
  when data.length > len
    data = data.slice(0..(len-1))
  when data.length < len
    if type == "A"
      data = data.ljust(len)
    else
      data = data.rjust(len)
    end
  else
    data
  end
end

You can go ahead and run the test. It should fail.

$ ruby ljust_test.rb
Loaded suite ljust_test
Started
F.....
Finished in 0.001405 seconds.

1) Failure:
test_date_should_format_properly(NameLengthTest) [ljust_test.rb:52]:
<"08152003"> expected but was
<"20030815">.

6 tests, 6 assertions, 1 failures, 0 errors, 0 skips

Test run options: --seed 25697

Now you just need to reformat the date. It looks like we just need to move the year from the back to the front. Now where should we put this? Since there are eight digits in the column and the output calls for eight digits, we’ll put it in the else branch, where data.length == len.

def allThingsFixing(data,len,type,column)
  case
  when data.length > len
    data = data.slice(0..(len-1))
  when data.length < len
    if type == "A"
      data = data.ljust(len)
    else
      data = data.rjust(len)
    end
  else
    if column == "accountOpened"
      data = data.slice(4..7)+data.slice(0..3)
    else
      data
    end
  end
end

There seem to be a lot of if statements. That can be re-factored later. Does the test pass?

$ ruby ljust_test.rb
Loaded suite ljust_test
Started
......
Finished in 0.001030 seconds.

6 tests, 6 assertions, 0 failures, 0 errors, 0 skips

Test run options: --seed 48958
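As for all those if statements, here is one possible direction for that later refactoring, offered as a sketch rather than the finished design. The fix_field name is hypothetical, and the date handling is deliberately left out so the method only deals with width.

```ruby
# Hypothetical refactoring sketch: pad first, then slice.
# Padding is a no-op for long input and slicing is a no-op for short
# input, so short, long, and exact-length values all come out at len.
def fix_field(data, len, type)
  padded = type == "A" ? data.ljust(len) : data.rjust(len)
  padded.slice(0, len)
end

p fix_field("should be 18", 18, "A")  # white space on the right
p fix_field("123", 8, "9")            # white space on the left
p fix_field("abcdefgh", 3, "A")       # "abc"
```

Like the case version, this keeps the first len characters when a value is too long, for numeric columns as well as alphanumeric ones.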

Transmogrification Complete!

How would you get the data into the Transmogrifier? Stay tuned!


  • Someone

    I think that you have inadvertently added an extra ‘r’ into ‘transmogrify’ (‘mog’ becomes your ‘morg’).

  • John

    I’m fairly sure the word you are attempting to use is transmogrify not transmorgrify unless you are coining a new word.

  • http://offortunity.co.uk John Edgley

    Excellent article… though it jarred somewhat every time you spelled transmogrify (and derivatives) wrongly… and you even got Glenn Goodrich doing so in the Site Point email! (http://www.merriam-webster.com/dictionary/transmogrify)

  • http://www.ruprict.net/ Glenn Goodrich

    Ugh…the misspelling is completely on me, folks. I am !smrt.

    Thanks for letting me know.

    Can’t do anything about the newsletter, but them’s the brakes, I guess. :S