The Ruby Transmogrifier, Part II
Getting information into the Transmogrifier
In our last episode, we transmogrified data from one format into another. Now you need to get data into the transmogrifier. We could hard code the file names in there, but that will come back to haunt us. Let’s make it so we can pass in the definition file, the data file, and the name of the output file.
You’ll need to read the definition file first. How do you read a file line by line in Ruby?
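As a preview of where this is heading, here is a rough sketch of pulling three file names out of a list of command-line arguments. The helper name parse_args and the usage message are my own illustration, not part of the finished script, and the list is passed in as a parameter (instead of reading ARGV directly) so you can try it in irb.

```ruby
# Hypothetical helper: pull the three file names out of an argument list.
# In a real script you would call it as parse_args(ARGV).
def parse_args(argv)
  unless argv.length == 3
    raise ArgumentError, "usage: ruby transmogrifier.rb DEFINITIONS DATA OUTPUT"
  end

  definition_file, data_file, output_file = argv
  { definitions: definition_file, data: data_file, output: output_file }
end

files = parse_args(%w[def.txt data.csv formated.txt])
puts files[:definitions]  # def.txt
puts files[:output]       # formated.txt
```

Failing early with a usage message is one option; the version later in this post simply reads ARGV[0], ARGV[1], and ARGV[2].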
require 'csv'
File.open("def.txt") do |file|
  while line = file.gets
    puts line
  end
end
You can save the file. I saved it as transmogrifier.rb.
You can go ahead and run it.
$ ruby transmogrifier.rb
FIELD NAME FORMAT
NAME A(50)
ADDRESS1 A(50)
ADDRESS2 A(50)
CITY A(50)
STATE A(2)
ZIP A(10)
CONTACT A(50)
CONTACTPHONE A(10)
ACCOUNTOPENED 9(8)
It works. We have the definitions; now we need to load up the data. In transmogrifier.rb, go ahead and add the code to do that. Remember what to do?
File.open("data.csv") do |file|
  while line = file.gets
    puts line
  end
end
If you’re curious go ahead and run it.
Now that you have the definitions and the data loaded you need to send them to the transmogrifier. You need to send the data, length of the field, the type, and the column name. Three of the four come from the definition file.
Let’s write out the Type, Length, and Field name. The Field name starts at the beginning of a line.
We could split the line up and put it in an array. Since the file uses spaces, not tabs, for aligning things, we can use split; called with no arguments, it splits on runs of whitespace.
require 'csv'
definitions = Array.new
File.open("def.txt") do |file|
  while line = file.gets
    definitions = line.split()
    puts definitions[0]
    puts definitions[1]
  end
end
...
That gets us the field name but the Type and Length are still together. Were you thinking of regular expressions to find the Type and the Length?
require 'csv'
definitions = Array.new
File.open("def.txt") do |file|
  while line = file.gets
    definitions = line.split()
    field = definitions[0]
    # the type is the first character of the format: 9 for numeric, otherwise A
    type = definitions[1] =~ /\A9/ ? '9' : 'A'
    # drop the type character, then keep only the digits of the length
    length = definitions[1].slice(1..definitions[1].length).gsub(/[^0-9]/, "")
    puts 'field => ' + field.upcase + ' type => ' + type + ' length => ' + length.to_s
  end
end
Go ahead and run it.
$ ruby transmogrifier.rb
field => FIELD type => A length =>
field => NAME type => A length => 50
field => ADDRESS1 type => A length => 50
field => ADDRESS2 type => A length => 50
field => CITY type => A length => 50
field => STATE type => A length => 2
field => ZIP type => A length => 10
field => CONTACT type => A length => 50
field => CONTACTPHONE type => A length => 10
field => ACCOUNTOPENED type => 9 length => 8
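If you were thinking of a single regular expression, here is one way it could look. This is only a sketch: parse_format and FORMAT_PATTERN are names I made up, and it assumes every format is an A or a 9 followed by a length in parentheses, which is all the sample definition file contains.

```ruby
# Hypothetical pattern: one type character (A or 9), then the length in parentheses.
FORMAT_PATTERN = /\A([A9])\((\d+)\)\z/

# Hypothetical helper: split a format such as "A(50)" into type and length.
def parse_format(format)
  match = FORMAT_PATTERN.match(format)
  return nil unless match  # e.g. the "NAME" token from the header row

  { type: match[1], length: match[2].to_i }
end

parsed = parse_format("9(8)")
puts parsed[:type]    # 9
puts parsed[:length]  # 8
puts parse_format("NAME").inspect  # nil
```

Returning nil for anything that does not look like a format would also give you an easy way to skip the header row, which the loop above currently treats as a field.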
Now we’re getting somewhere. That’s three of the four variables that get passed to the transmogrifier. Maniacal laugh.
One Question.
How do you think you will get the data into that array?
What if we put the array’s values into a hash? Then we put that hash into another hash, where the key is the field name and the value is the first hash.
Then all we need to do is loop through the data, find the key in the hash that goes with each column, and send all of that to the transmogrifier. Keeping things simple here.
First, go ahead and put the definition data into a hash. Then stuff that into a second hash. For fun, go ahead and output that last hash so we can see it.
require 'csv'
object_name = Hash.new
definitions = Array.new
File.open("def.txt") do |file|
  while line = file.gets
    definitions = line.split()
    field = definitions[0]
    # the type is the first character of the format: 9 for numeric, otherwise A
    type = definitions[1] =~ /\A9/ ? '9' : 'A'
    length = definitions[1].slice(1..definitions[1].length).gsub(/[^0-9]/, "")
    # inner hash: the formatting rules for one field
    object_formatting = Hash.new
    object_formatting["type"] = type
    object_formatting["length"] = length
    object_formatting["field"] = field
    # outer hash: upper-case field name => formatting rules
    object_name[field.upcase] = object_formatting
  end
end
puts object_name
Run it and let’s see what we have.
$ ruby transmogrifier.rb
{"FIELD"=>{"type"=>"A", "length"=>"", "field"=>"FIELD"}, "NAME"=>{"type"=>"A", "length"=>"50", "field"=>"NAME"}, "ADDRESS1"=>{"type"=>"A", "length"=>"50", "field"=>"ADDRESS1"}, "ADDRESS2"=>{"type"=>"A", "length"=>"50", "field"=>"ADDRESS2"}, "CITY"=>{"type"=>"A", "length"=>"50", "field"=>"CITY"}, "STATE"=>{"type"=>"A", "length"=>"2", "field"=>"STATE"}, "ZIP"=>{"type"=>"A", "length"=>"10", "field"=>"ZIP"}, "CONTACT"=>{"type"=>"A", "length"=>"50", "field"=>"CONTACT"}, "CONTACTPHONE"=>{"type"=>"A", "length"=>"10", "field"=>"CONTACTPHONE"}, "ACCOUNTOPENED"=>{"type"=>"9", "length"=>"8", "field"=>"ACCOUNTOPENED"}}
...
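To see why a hash keyed by field name is handy, here is a small sketch of looking up the rules for one column. The helper definition_for is hypothetical, and the hash is a hand-built slice of the output above so the example runs without def.txt.

```ruby
# A hand-built slice of the hash printed above.
object_name = {
  "STATE" => { "type" => "A", "length" => "2", "field" => "STATE" },
  "ACCOUNTOPENED" => { "type" => "9", "length" => "8", "field" => "ACCOUNTOPENED" }
}

# Hypothetical helper: find the definition for a column name, ignoring case.
# Returns nil when the column has no definition.
def definition_for(definitions, column)
  definitions[column.upcase]
end

definition = definition_for(object_name, "accountOpened")
puts definition["type"]         # 9
puts definition["length"].to_i  # 8
puts definition_for(object_name, "fax").inspect  # nil
```

Note that the length is stored as a string, so it needs a to_i before you can use it as a number.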
Lovely. Now you need to process the data file. Let’s get the Field name and the data for each column.
CSV.foreach("data.csv", headers: true) do |data|
  data.headers.each do |field|
    puts field + ' => ' + data[field].to_s
  end
end
Hold on! What is that CSV.foreach() call? Let’s take a look at that.
That line will process a CSV file and treat the first line as a header and not data.
When you run that, your output should look like this:
name => Wonder widgets
address1 => 1600 Vassar Street
address2 =>
city => dallas
state => tx
zip => 75220
contact => Tim Smith
contactPhone => 214-555-1212
accountOpened => 12052001
name => Timmy's Bikes
address1 => 2723 Auburn Street
address2 => Building 3
city => Erie
state => PA
zip => 16508-1234
contact =>
contactPhone => 814-555-4321
accountOpened => 865289
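If you want to poke at headers: true without the data file, here is a small sketch using CSV.parse on an inline string. The sample string is a trimmed-down stand-in for data.csv, not the real file.

```ruby
require 'csv'

# A header line followed by one data line.
sample = "name,city,state\nWonder widgets,dallas,tx\n"

# With headers: true, CSV.parse returns a CSV::Table whose rows can be
# indexed by column name, just like the rows CSV.foreach yields.
table = CSV.parse(sample, headers: true)

puts table.headers.inspect  # ["name", "city", "state"]
puts table.length           # 1 (the header line is not counted as data)

row = table[0]
puts row["city"]            # dallas
```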
We’re Almost Home.
Now we have the data and the definitions. Let’s send that information to the transmogrifier.
As we loop through the data, we need to find the matching key from the hash we made earlier. When we find a match, we send it to the transmogrifier.
CSV.foreach("data.csv", headers: true) do |data|
  data.headers.each do |field|
    object_name.each do |key, value|
      if field.upcase == key
        # use the upper-case key for the column name instead of changing the CSV header in place
        line = transmogrifier(data[field].to_s, value["length"].to_i, value["type"], key)
        print line
      end
    end
  end
  puts ' '
  puts ' --------- '
end
That will do all that we talked about. Had we written tests like in the first part, we would have caught a bunch of gotchas. What if there is no data for a column? How do we get all the data on one row? How do we save the data? How do we pass the definition and data file names in?
Here’s what I came up with before refactoring:
require 'csv'
def transmogrifier(data,len,type,column)
  proper_formatted = ''
  unless data.nil?
    case
    when data.length > len
      # too long: cut it down to the field length
      data = data.slice(0..(len-1))
    when data.length < len
      # too short: pad text on the right, numbers on the left
      if type == "A"
        data = data.ljust(len)
      else
        data = data.rjust(len)
      end
    else
      if column == "ACCOUNTOPENED"
        data = data.slice(4..7)+data.slice(0..3)
      else
        data
      end
    end
    proper_formatted += data
  end
  proper_formatted += '|' # added pipes to see where the field ends
  proper_formatted
end
# load the definitions: ARGV[0] is the definition file
object_name = Hash.new
file = File.new(ARGV[0],"r")
definitions = Array.new
while (line = file.gets)
  definitions = line.split()
  field = definitions[0]
  type = definitions[1] =~ /\A9/ ? '9' : 'A'
  length = definitions[1].slice(1..definitions[1].length).gsub(/[^0-9]/, "")
  object_formatting = Hash.new
  object_formatting["type"] = type
  object_formatting["length"] = length
  object_formatting["field"] = field
  object_name[field.upcase] = object_formatting
end
file.close
# ARGV[1] is the data file, ARGV[2] is the output file
aFile = File.new(ARGV[2], "w")
CSV.foreach(ARGV[1], headers: true) do |data|
  data.headers.each do |field|
    object_name.each do |key, value|
      if field.upcase == key
        # use the upper-case key for the column name instead of changing the CSV header in place
        line = transmogrifier(data[field].to_s, value["length"].to_i, value["type"], key)
        aFile.write(line)
        print line #so you can see something in the terminal window
      end
    end
  end
  aFile.write("\n") # end of one formatted row
  puts ' '
  puts ' --------- '
end
aFile.close
If you run
$ ruby transmogrifier.rb def.txt data.csv formated.txt
you should have a new file called formated.txt in the folder you are working in.
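Since I mentioned tests, here is a rough sketch of the kind of checks that would have caught some of those gotchas. It uses a copy of the transmogrifier method from the listing above so it runs without any input files. The check helper is my own throwaway, not a real test framework, and the field lengths in the calls are picked just to keep the expected strings short.

```ruby
# Copy of the transmogrifier method from the listing above.
def transmogrifier(data, len, type, column)
  proper_formatted = ''
  unless data.nil?
    case
    when data.length > len
      data = data.slice(0..(len - 1))
    when data.length < len
      data = type == "A" ? data.ljust(len) : data.rjust(len)
    else
      data = data.slice(4..7) + data.slice(0..3) if column == "ACCOUNTOPENED"
    end
    proper_formatted += data
  end
  proper_formatted += '|'
  proper_formatted
end

# Hypothetical helper: print one result line and return true or false.
def check(description, actual, expected)
  passed = actual == expected
  puts "#{passed ? 'ok  ' : 'FAIL'} #{description}: #{actual.inspect}"
  passed
end

results = [
  check("short text is left-justified", transmogrifier("PA", 4, "A", "STATE"), "PA  |"),
  check("short number is right-justified", transmogrifier("865289", 8, "9", "ACCOUNTOPENED"), "  865289|"),
  check("long value is truncated", transmogrifier("214-555-1212", 10, "A", "CONTACTPHONE"), "214-555-12|"),
  check("8-digit date is rearranged", transmogrifier("12052001", 8, "9", "ACCOUNTOPENED"), "20011205|"),
  check("empty column is all padding", transmogrifier("", 3, "A", "ADDRESS2"), "   |"),
  check("nil column is just the pipe", transmogrifier(nil, 3, "A", "CONTACT"), "|")
]
puts results.all? ? "all checks passed" : "some checks failed"
```

The last two checks show why the script calls to_s on each value before passing it in: an empty string gets padded to the field length, while a nil would produce a field with no width at all.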
Now go forth and transmogrify. Maniacal laugh.