  1. #1
    SitePoint Member
    Join Date
    May 2006
    Posts
    21

    Tab Separated .txt with Ruby

    Hi,
    I've been tasked with using Ruby to read through a simple tab-separated .txt file, pulling the data out and reorganizing the columns. There are 5 columns and about 20 entries. It seems simple, but I can't quite figure out where to begin other than opening the file. Can anyone provide some insight?

    A different language or renaming the file is not an option.

    Thanks for any help.
    Last edited by nheinrich; May 22, 2006 at 00:19.

  2. #2
    SitePoint Zealot
    Join Date
    Jul 2005
    Posts
    124
    I would take a look at the IO class, in particular the readlines method. Hope it helps.

  3. #3
    SitePoint Member
    Join Date
    May 2006
    Posts
    21
    Any hints on differentiating the tabbed areas? I've been looking at readlines but can't decipher how to find the tabs. Once I do that I'm hoping to dump them to an array or hash and display them.

    Thanks.

  4. #4
    SitePoint Zealot
    Join Date
    Jul 2005
    Posts
    124
    Couldn't you use the String class's split method to separate things at each occurrence of a \t?
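
A minimal sketch of that suggestion. The `row` string below is sample data shaped like the rows in this thread, not something read from the actual file, and the variable names are only illustrative:

```ruby
# Sample row; the real data would come from the .txt file.
row = "1\tJJ-Shirt\t1\tbrown\tcafepress\n"

# chomp drops the trailing newline, then split("\t") breaks the row at each tab.
fields = row.chomp.split("\t")

p fields
```

Note the double quotes around "\t": inside single quotes Ruby would treat it as a backslash followed by the letter t rather than a tab.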

  5. #5
    SitePoint Member
    Join Date
    May 2006
    Posts
    21
    I think I've got something working with readlines and split, just trying to sort out the array.

    What's the best way to test a standalone .rb file? Or the best way to embed it in HTML and get it to show up? I'm currently just running "ruby test.rb" from the terminal.

  6. #6
    SitePoint Zealot
    Join Date
    Jul 2005
    Posts
    124
    I usually use IRB. Just store the path to your file in a variable and load it every time it changes.

    Code:
    irb
    irb >> test_file = "C:\\Documents and Settings\\All Users\\Desktop\\Test.rb"
    irb >> load test_file

  7. #7
    SitePoint Member
    Join Date
    May 2006
    Posts
    21
    Thanks a bunch for all of your help, iaihmb.

    The terminal has been working alright for what I'm doing, I suppose. No need for more, I just thought it would be fun to drop it into HTML.

    I've got it displaying my line (I'm still trying to figure out how to loop through everything readlines returns, but I may have that working soon). Now I'm trying to work out how to print a tab between the array elements. So far my code looks like this:

    Code:
    f = File.new("info.txt")
    line = f.readlines[0].split
    line.values_at(0, 2, 1, 3, 4)
    puts line.join('        ')
    I've tried \t but it doesn't seem to know what that is. Any ideas?

    Thanks again,
    couldn't have done it without you.

  8. #8
    SitePoint Zealot
    Join Date
    Jul 2005
    Posts
    124
    No problem at all. Give the following example a shot, it should give you a couple of ideas:

    Code:
    # Replace with the path to your file, make sure you use double backslashes
    # as a single backslash is Ruby's escape character.
    test_file = "C:\\Documents and Settings\\All Users\\Desktop\\Test.txt"
    
    # Stores each line of the file in the "lines" array.
    lines = IO.readlines(test_file)
    
    # Loop through each line stored in said array and output the raw content.
    lines.each do |line|
    	p line
    end
    Also, can you paste the raw output of one of the lines? If the entries are truly separated by tabs you should see an occasional \t.
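
For anyone wondering why that example uses `p` rather than `puts`: `p` prints the `inspect` form of a string, so tabs and newlines appear as visible escape sequences. A small sketch with a made-up string (the variable names are illustrative only):

```ruby
sample = "a\tb\n"   # made-up string containing a real tab and a newline

puts sample          # prints the tab and the newline literally
p sample             # prints the inspect form, with the escapes visible

# inspect is what p uses under the hood to build that form.
visible = sample.inspect
```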

  9. #9
    SitePoint Member
    Join Date
    May 2006
    Posts
    21
    I wasn't seeing the \t's the way I was doing it, but I am now. I like this much better, as it now loops through the lines of the file. Here is the output. I've tried a couple of different ways of getting it to split on \t since you posted, but none have worked yet.
    Thanks.

    "1\tJJ-Shirt\t1\tbrown\tcafepress\n"
    "2\tJJ-Mug\t2\tblack\tcafepress\n"
    "3\tJJ-DVD\t4\tN/A\tJJ\n"
    "4\tJJ-205\t10\tN/A\tJJ\n"
    "5\tJJ-THIS\t10\tN/A\tJJ\n"
    "6\tJJ-Pen\t2\twhite\tpenco\n"



    And the code I'm currently using:

    Code:
    lines = IO.readlines("info.txt")
    
    lines.each do |line|
    line.split('\t')
    line.values_at(0, 2, 1, 3, 4)
    puts line.join('        ')
    end

  10. #10
    SitePoint Zealot
    Join Date
    Jul 2005
    Posts
    124
    Edit: Use double quotes on line 4.
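
The reason the quotes matter, as a short sketch: Ruby only interprets escape sequences such as \t inside double-quoted strings. The `row` value here is made-up sample data:

```ruby
single = '\t'   # two characters: a backslash followed by the letter t
double = "\t"   # one character: an actual tab

row = "1\tJJ-Shirt\t1"   # made-up sample row containing real tabs

p single.length        # 2
p double.length        # 1
p row.split(single)    # row has no backslash-t sequence, so nothing is split
p row.split(double)    # splits at each tab
```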

  11. #11
    SitePoint Member
    Join Date
    May 2006
    Posts
    21
    Still having the issue

    Code
    Code:
    lines = IO.readlines("info.txt")
    
    lines.each do |line|
      line.split("\t")
      line.values_at(0, 2, 1, 3, 4)
      puts line.join('       ')
    end

    Output
    item name subcat color vendor
    1 JJ-Shirt 1 brown cafepress
    2 JJ-Mug 2 black cafepress
    3 JJ-DVD 4 N/A JJ
    4 JJ-205 10 N/A JJ
    5 JJ-THIS 10 N/A JJ
    6 JJ-Pen 2 white penco
    quiz.rb:9: undefined method `values_at' for "item\tname\tsubcat\tcolor\tvendor\n":String (NoMethodError)
    from quiz.rb:7

  12. #12
    SitePoint Member
    Join Date
    May 2006
    Posts
    21
    In doing a bit more testing, I don't think it's seeing that line of code at all. If I try to split on something easier to find, like 'm', it still doesn't split. I'm a bit lost.

  13. #13
    SitePoint Zealot
    Join Date
    Jul 2005
    Posts
    124
    You're calling values_at on a string: line is the block variable holding the current element of lines, and each of those elements is a String. values_at is an Array method, so it only works on an array (lines is one, line is not). Just curious, why are you using values_at? To limit the result, or something else? I'm sure there is a better way to go about it.

    Edit: What's the current problem? I'm a bit lost. You can store the items in a hash after separating them with split (which now works because of the double quotes), and then print them out if that's what you're after. Now I'm confused. :P
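
One possible reading of the hash idea, as a sketch only. It assumes the first line of the file holds the column names (as the output pasted earlier in this thread suggests); the `lines` array stands in for what IO.readlines would return, and `header` and `records` are illustrative names:

```ruby
# Sample lines copied from the output pasted earlier in this thread.
lines = [
  "item\tname\tsubcat\tcolor\tvendor\n",
  "1\tJJ-Shirt\t1\tbrown\tcafepress\n",
  "2\tJJ-Mug\t2\tblack\tcafepress\n"
]

# Assumption: the first line holds the column names, the rest are data rows.
header = lines.first.chomp.split("\t")

records = lines.drop(1).map do |line|
  # Pair each column name with the matching field, then turn the pairs into a hash.
  header.zip(line.chomp.split("\t")).to_h
end

records.each { |record| puts "#{record['name']} (#{record['color']})" }
```

With each row stored as a hash, columns can be printed in any order by name instead of by position.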

  14. #14
    SitePoint Member
    Join Date
    May 2006
    Posts
    21
    hahaha.
    I have to swap two of the columns which is the reason why I'm attempting to reorder them with values_at.

    What I'm assuming is happening (which isn't saying much) is that I split 'line', reorder it, and then print it out. From what I can tell, split isn't working: it should create an array out of line (I may be wrong here), at which point I could reorder it with values_at.

    Just to clarify,
    I'm trying to grab the info from the text file
    reorder columns 2 and 3 (1 and 2 if you start with 0)
    and print them back out.

    Hope that makes a bit more sense now.
    Thanks

  15. #15
    SitePoint Member
    Join Date
    May 2006
    Posts
    21
    Just to verify, I tried this

    lines.each do |line|
    p line.split("\t")
    end



    Which is giving me this


    ["item", "name", "subcat", "color", "vendor\n"]
    ["1", "JJ-Shirt", "1", "brown", "cafepress\n"]
    ["2", "JJ-Mug", "2", "black", "cafepress\n"]
    ["3", "JJ-DVD", "4", "N/A", "JJ\n"]
    ["4", "JJ-205", "10", "N/A", "JJ\n"]
    ["5", "JJ-THIS", "10", "N/A", "JJ\n"]
    ["6", "JJ-Pen", "2", "white", "penco\n"]


    So split is working. I'm a bit confused, though: when you apply something inside a do/end loop, it's not permanent then?

  16. #16
    SitePoint Zealot
    Join Date
    Jul 2005
    Posts
    124
    Alright, I'll come up with an example, but I just thought of something else. On line 4, you've got:

    Code:
    line.split("\t")
    This doesn't modify line, it returns an array. Try changing that line to:

    Code:
    line = line.split("\t")
    I'll have an example in, ~20 minutes or so.

    Edit: See, it's returning an array. split doesn't modify line. It is called on line, takes the string that should divide the line as its argument, and returns a new array.
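
A short sketch of that point, using one of the sample rows from this thread (the variable names are illustrative). values_at behaves the same way: it returns a new array rather than reordering the original in place.

```ruby
line = "1\tJJ-Shirt\t1\tbrown\tcafepress"   # sample row from this thread

line.split("\t")            # the returned array is discarded here
p line.class                # line is still a String

fields = line.split("\t")   # keep the returned array in a variable
p fields.class              # Array

reordered = fields.values_at(0, 2, 1, 3, 4)   # also returns a new array
p reordered
```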

  17. #17
    SitePoint Member
    Join Date
    May 2006
    Posts
    21
    clarifying that made a world of difference for me.

    So now I have this

    Code:
    lines = IO.readlines("info.txt")
    
    lines.each do |line|
    line = line.split("\t")
    line = line.values_at(0, 2, 1, 3, 4)
    print line.join("\t\t")  # two tabs to separate the columns better
    end
    Which is printing this (after I modified one of the shirt names to play nice)
    Code:
    item            subcat          name            color           vendor
    1               1               JJShirt         brown           cafepress
    2               2               JJ-Mug          black           cafepress
    3               4               JJ-DVD          N/A             JJ
    4               10              JJ-205          N/A             JJ
    5               10              JJ-THIS         N/A             JJ
    6               2               JJ-Pen          white           penco
    So it works (yay!)
    I'd still like to check out your example though if you'd like to make it, to see a proper way of handling this.
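
One way to keep the columns lined up without editing the data is to pad each field to a fixed width instead of joining with tabs. This is only a sketch: `format_row` is a hypothetical helper, the width of 12 is an arbitrary choice, and the `lines` array stands in for what IO.readlines would return.

```ruby
# Sample lines taken from the output pasted earlier in this thread.
lines = [
  "item\tname\tsubcat\tcolor\tvendor\n",
  "1\tJJ-Shirt\t1\tbrown\tcafepress\n",
  "6\tJJ-Pen\t2\twhite\tpenco\n"
]

# Hypothetical helper: swaps columns 1 and 2, then pads every field to `width`.
def format_row(line, width = 12)
  fields = line.chomp.split("\t").values_at(0, 2, 1, 3, 4)
  # ljust pads with spaces on the right; rstrip trims the padding after the last field.
  fields.map { |field| field.ljust(width) }.join.rstrip
end

formatted = lines.map { |line| format_row(line) }
puts formatted
```

Because chomp removes the newline before splitting, puts can add it back, and the last column no longer carries a stray "\n".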

  18. #18
    SitePoint Zealot
    Join Date
    Jul 2005
    Posts
    124
    Haha, I'm new to Ruby as well. :P If you're only going to have 5 columns in each row, then I believe you're good to go. I've come up with the following example, which will accommodate any number of columns, though I'm sure someone will be able to improve upon it.

    Test.txt:

    Code:
    R1E1	R1E2	R1E3	R1E4	R1E5	
    R2E1	R2E2	R2E3	R2E4	R2E5	
    R3E1	R3E2	R3E3	R3E4	R3E5	
    R4E1	R4E2	R4E3	R4E4	R4E5	
    R5E1	R5E2	R5E3	R5E4	R5E5
    Test.rb:

    Code:
    test_file = "C:\\Documents and Settings\\David Sissitka\\Desktop\\Test.txt"
    
    lines = IO.readlines(test_file)
    
    lines.each_with_index do |line, row_index|
    	current_row = line.split("\t")
    	puts "Row: #{row_index}"
    	# Swap the second and third entries (indexes 1 and 2).
    	current_row[1], current_row[2] = current_row[2], current_row[1]
    	
    	current_row.each_with_index do |column, column_index|
    		puts "Column #{column_index}: #{column}"
    	end
    end
    Sample Output:

    Code:
    Row: 0
    Column 0: R1E1
    Column 1: R1E3
    Column 2: R1E2
    Column 3: R1E4
    Column 4: R1E5
    Column 5: 
    Row: 1
    Column 0: R2E1
    Column 1: R2E3
    Column 2: R2E2
    Column 3: R2E4
    Column 4: R2E5
    Column 5: 
    Row: 2
    Column 0: R3E1
    Column 1: R3E3
    Column 2: R3E2
    Column 3: R3E4
    Column 4: R3E5
    Column 5: 
    Row: 3
    Column 0: R4E1
    Column 1: R4E3
    Column 2: R4E2
    Column 3: R4E4
    Column 4: R4E5
    Column 5: 
    Row: 4
    Column 0: R5E1
    Column 1: R5E3
    Column 2: R5E2
    Column 3: R5E4
    Column 4: R5E5
    Results are in the form RXEY, where X is the current row and Y is the current entry (column) within that row.
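
The swap itself can be pulled out into a small helper so the same code handles rows of any width. This is a sketch rather than part of the example above: `swap_columns` is a hypothetical name, and the seven-column row is made up for illustration.

```ruby
# Hypothetical helper: returns a copy of `fields` with positions a and b swapped.
def swap_columns(fields, a, b)
  swapped = fields.dup
  swapped[a], swapped[b] = swapped[b], swapped[a]
  swapped
end

# Made-up row with seven columns, in the same RXEY naming style.
row = "R1E1\tR1E2\tR1E3\tR1E4\tR1E5\tR1E6\tR1E7".split("\t")

p swap_columns(row, 1, 2)
```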

  19. #19
    SitePoint Member
    Join Date
    May 2006
    Posts
    21
    That makes sense, I can see why that would be better if I had a bunch of columns, especially if it were to expand (depending on the expansion).

    Thanks for all your help, I really appreciate it. I think I understand looping in Ruby 100x more than I did a few hours ago.

    /bow

