  1. #1
    SitePoint Enthusiast menkes
    Join Date
    Jul 2002
    Location
    Channel Islands
    Posts
    75

    Large files - Need Help Using Memory

    I have a script (not web based) that consolidates account data from several files. As each account is consolidated, the individual records are "marked" to indicate the record's purpose, and the marked records are written to the consolidation file. Here is an example (consolidation file - cons.dat):

    C <35 records of customer account data> ...from cust.dat
    C2<12 records of seller data> ...from seller.dat
    DH<20 records of phone line info> ...from detail.dat
    DD<38 records of call detail> ...from detail.dat
    DS<4 records of call summary info> ...calculated from internal arrays

    When I run my script on small files, it does exactly what I intended. But on large files the machine bogs down completely.

    The computer = Dual Xeon 450MHz processors, 1GB RAM, running W2K Server. The only other thing running is SQL Server 2K (a hog, I know).

    Here are the basic steps:

    1. Read in customer data. I put this in a hash because it must be dynamically accessed several thousand times (same with 2 & 3 below).
    Code:
    while (<HDRFILE>)
    {
    	chomp;                            # strip the trailing newline
    	my @hdr_fld = split(/\t/, $_);    # tab-delimited record
    	my $hdr_invoice = $hdr_fld[1];    # invoice number is the lookup key
    	$header{$hdr_invoice} = $_;       # store the whole line, keyed by invoice
    }
    2. Same as 1, but from summary file (3MB).

    3. Same as above but conference call data (100K).

    4. This is the meat of the script. This is the detail file and it runs about 1.5GB in size. It starts out:
    Code:
    while(<INFILE>)
    {
    	# Remove line feed
    	chomp;
    	my $line = $_;

    	# Split line into an array using tab delimiter
    	my @fld = split(/\t/, $line);
    ...based on the key field, I grab data from the hashes built in steps 1-3, build new hashes for summarizing, and do some minor calculations. The new summary hashes contain from 1 to ~200 lines of data, then are cleared for the next set. As you can see, I read in one detail line at a time; I then write out the detail line, and possibly additional lines from:
    - The hashes built in steps 1-3
    - The summary hashes
    - Logic (i.e., a header record to indicate the start of a new invoice)
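    For anyone trying to picture that loop, here is a minimal self-contained sketch of the shape described above. The field positions, record marks, hash names, and sample data are all assumptions for illustration, not the poster's actual code:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sketch only: field positions, record marks, and sample data are assumed.
    my %header = ( 1001 => "1001\tACME", 1002 => "1002\tGLOBEX" );  # from step 1
    my %summary;                  # per-invoice summary data, cleared per invoice
    my $current = q{};
    my @out;                      # lines destined for the consolidation file

    while (my $line = <DATA>) {
        chomp $line;
        my @fld     = split /\t/, $line;
        my $invoice = $fld[0];                     # assumed key position

        if ($invoice ne $current) {
            # Flush the summary rows for the invoice just finished
            push @out, "DS\t$_\t$summary{$_}" for sort keys %summary;
            %summary = ();
            # A header record marks the start of the new invoice
            push @out, "C \t$header{$invoice}" if exists $header{$invoice};
            $current = $invoice;
        }

        push @out, "DD\t$line";                    # the detail line itself
        $summary{ $fld[1] } += $fld[2];            # minor running calculation
    }
    push @out, "DS\t$_\t$summary{$_}" for sort keys %summary;   # final flush

    print "$_\n" for @out;

    __DATA__
    1001	local	2
    1001	intl	5
    1002	local	1
    ```

    The point of the sketch is only the control flow: one pass over the detail file, per-invoice summary hashes that stay small because they are flushed and cleared on each invoice change.
    
    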

    I write out each line to the consolidation file. I do have a concern that it may be the output file causing the problem, but I really don't know.
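    One quick way to test that suspicion, assuming all the writes go through a single filehandle, is to point that handle at the null device for one timing run: the processing logic is unchanged, but the disk writes disappear. The flag name here is made up, not from the actual script:

    ```perl
    # Sketch: $timing_run is a hypothetical flag flipped for the test run.
    # 'NUL' is the null device on Windows ('/dev/null' on Unix).
    my $timing_run = 1;
    my $target = $timing_run ? 'NUL' : 'cons.dat';
    open(CONSFILE, ">", $target) or die "Cannot open $target: $!";
    ```

    If the run is dramatically faster with the output discarded, the bottleneck is on the output side rather than in the hashing and calculations.
    
    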

    I know most people will probably not even read this far, but if you are one of those who did... please help. I am in a desperate situation on this one and need results quickly. If you can resolve this and would like to be paid, that's fine; send me a PM. Otherwise, post away. I will provide whatever additional info you need. I am at my computer all day, so I will respond within a few minutes to any valid post.

    Thanks.

  2. #2
    SitePoint Wizard redemption
    Join Date
    Sep 2001
    Location
    Singapore
    Posts
    5,269
    How big is your consolidation file? I've written a similar script before, and I found that if the consolidation file is very big, it is most likely the thing causing the slowdown (I had some tens of MB of output which, when disabled, sped things up a whole lot). That could be your problem.

    I'm not so sure whether the 1.5GB input file could be your problem, because I've never had to deal with such a large input file before. *shrug*

  3. #3
    SitePoint Enthusiast menkes
    Join Date
    Jul 2002
    Location
    Channel Islands
    Posts
    75
    The final output file will be roughly 10% larger than the big input file, making it 1.6 to 1.7GB. I am concerned that this is the problem, but I don't really know. The script "hangs" near the end for a considerable amount of time, which leaves me thinking it is having issues with the output file.

    <Wondering if I should write several output files and then just concatenate them?>
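    If splitting the output does help, recombining the pieces afterwards is cheap. A sketch with made-up part-file names (on a Unix shell; on W2K the equivalent would be `copy /b cons_part1.dat+cons_part2.dat cons.dat`):

    ```shell
    # Hypothetical part files standing in for the per-chunk outputs
    printf 'DD\tchunk one\n' > cons_part1.dat
    printf 'DD\tchunk two\n' > cons_part2.dat

    # Concatenate the pieces into the final consolidation file
    cat cons_part1.dat cons_part2.dat > cons.dat
    ```
    
    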

    As a follow-up, I did get the script to run last night. It took 10.5 hours to complete. So I am still looking for assistance; my processing window is closer to 2 hours.

    Thanks.

