I'm new to Ruby, or strictly speaking, new to programming. I am currently working on a project that requires writing a parser in Ruby for Apache web server log files, in order to produce page-visit statistics, e.g. the most popular page (or the top 10 pages) on a specific day or in a specific week.

I have read a few books and the related docs several times, but I still have no idea how to write the parser class. I'm posting the problem here in the hope that the experts can give me a hand.

The daily log file contains entries, each of which follows the standard format shown below. A hyphen ("-") indicates that the information is not available.

clientIP identd userid time request statusCode objSize
for example:
- - [06/Oct/2005:17:03:08 +0100] "GET /interface/video-ipod.html HTTP/1.1" 200 5657
- - [06/Oct/2005:17:03:10 +0100] "GET /php/adlog.htm" 200 43

Each piece of information is separated by a space, and each entry is on its own line; the whole file consists of lines of entries in this format.

To parse this file, I have some basic ideas:

1. To read through the *.log file, I use this code:
source = File.new("12-01-2005.log", "r")
while (line = source.gets)
  # parse each line here
end
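
A minimal sketch of the reading step, assuming the one-entry-per-line layout described above. File.foreach is a standard Ruby method that opens the file, yields it line by line, and closes it afterwards; the helper name read_log_lines is made up for illustration.

```ruby
# Hypothetical helper: returns the non-empty lines of a log file,
# with the trailing newline removed from each one.
def read_log_lines(path)
  lines = []
  # File.foreach opens the file, yields one line at a time, and closes it.
  File.foreach(path) do |line|
    line = line.chomp
    lines << line unless line.empty?
  end
  lines
end
```

It could then be called as read_log_lines("12-01-2005.log"), with each returned line handed to whatever does the parsing.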

2. For each line (or entry), some parsing expressions:
for clientIP, e.g., it can be expressed by /[0-9]+(.[0-9]+)?/
for identd, it is always a hyphen "-", so it can be expressed by /-/
for userid, it is arbitrarily many characters, so /[a-zA-Z0-9]+/
for time, e.g. [06/Oct/2005:17:03:08 +0100], as it starts with "[" and ends with "]", it can be expressed by /^[$]/
for the request, e.g. "GET /interface/video-ipod.html HTTP/1.1", it can be /^"$"/
the other two are simply digits: /[0-9]+/
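
One way the per-field ideas above might be combined is a single regular expression with one capture group per field. This is only a sketch under the assumption that every line has exactly the seven fields listed earlier; LOG_PATTERN and parse_line are made-up names. Note that inside a pattern "[" and "]" have to be escaped as \[ and \] to match literally, and that the time and request fields contain spaces, so they are matched by their surrounding brackets and quotes rather than by splitting on spaces.

```ruby
# Assumed layout: clientIP identd userid [time] "request" statusCode objSize
# \S+ matches a run of non-space characters; "-" is accepted for objSize.
LOG_PATTERN = /\A(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)\z/

# Hypothetical helper: returns a hash of fields, or nil if the line
# does not match the assumed layout.
def parse_line(line)
  m = LOG_PATTERN.match(line.chomp)
  return nil unless m

  {
    client_ip: m[1],
    identd:    m[2],
    userid:    m[3],
    time:      m[4],
    request:   m[5],
    status:    m[6].to_i,
    size:      m[7] == "-" ? nil : m[7].to_i  # "-" means not available
  }
end
```

A line that does not fit the layout gives nil, so malformed entries can be skipped or reported instead of crashing the parser.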

3. The result of the parser class could probably be an array for further use; that is, each parsed entry is stored as an "entry" object in an array.
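
A rough sketch of what such a class could look like, again assuming the seven-field layout described earlier. Entry, LogParser and top_pages are all made-up names, and the counting helper is just one possible way to get the "top 10 pages" statistic from the resulting array.

```ruby
# Hypothetical value object holding one parsed log entry.
Entry = Struct.new(:client_ip, :identd, :userid, :time, :request, :status, :size)

class LogParser
  # Assumed layout: clientIP identd userid [time] "request" statusCode objSize
  PATTERN = /\A(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)\z/

  # Turns any enumerable of lines into an array of Entry objects.
  # Lines that do not match the assumed layout are skipped.
  def parse(lines)
    lines.each_with_object([]) do |line, entries|
      m = PATTERN.match(line.chomp)
      next unless m

      size = m[7] == "-" ? nil : m[7].to_i
      entries << Entry.new(m[1], m[2], m[3], m[4], m[5], m[6].to_i, size)
    end
  end
end

# Counts requests per path and returns the n most requested pages
# as [path, count] pairs, most popular first.
def top_pages(entries, n = 10)
  counts = Hash.new(0)
  entries.each do |entry|
    path = entry.request.split(" ")[1] # "GET /page HTTP/1.1" -> "/page"
    counts[path] += 1 if path
  end
  counts.sort_by { |_path, count| -count }.first(n)
end
```

Since File.foreach without a block returns an enumerator of lines, something like LogParser.new.parse(File.foreach("12-01-2005.log")) should produce the array, which can then be passed to top_pages.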

This is the first step I need to do. I have learnt a little, and the above is what I have designed so far. I suspect some parts are incorrect and others need to be improved. Also, as I have no experience in Ruby, I cannot put all of this together into a class. I am hoping the experts here could help me with a solution. Every little helps! Thanks very much!