Read a big text file and count the unique numbers in it

Hi, I have a text file at this address.

First I read the file, and I want to count the numbers. I think I should store them in an array and then delete the repeated numbers. Every line has two numbers separated by a space.

Below is my code, but I think it doesn't work because of the memory limit.

It seems to be too easy.

<?php
echo "hello, welcome to retrieval page";
ini_set('max_execution_time', 360);
echo "<br>";
//$file = file('web-graph.txt');
ini_set('memory_limit', '2048M');
$sumlink = 0; // number of links

$sumpage = array();
$file = fopen("web-graph.txt", "r") or exit("Unable to open file!");

while (!feof($file)) {
    // count the links
    if (fgets($file)) {
        $sumlink++;
        // count the pages
        $str = fgets($file);
        $pos = strpos($str, " ");
        $adad = substr($str, 0, $pos);
        $sumpage[] = $adad;
    }
}
echo "Number of links = " . $sumlink;

echo "<br><br><br><br>";

$f = array_unique($sumpage);
$x = sizeof($f);
echo "Number of pages = " . $x;

fclose($file);

?>

What's your point?

Rather than using an array to load the entire file and then drop it down to the unique numbers, wouldn’t it be easier to use an indexed array?

$sumpage[$adad] = "x";

Then lose the array_unique() and set $x = count($sumpage)?

I can’t see how you are accessing the second number on each line. Don’t you also need to do that?

Thanks.

Yes, I need that one too, but first I need to add them all to an array. array_unique is for deleting the duplicate numbers in the array so that I can count the distinct ones.

I get that the array_unique was to delete duplicates, but I figured that if the problem is that you’re using too much memory, using an indexed array might help because instead of loading all the numbers (which might overflow memory) and then deleting duplicates, it would mean you will only create an array element for each unique number, as the array is being built.

Yes, array_unique is for deleting duplicates,
but I can't see how to use an indexed array to delete duplicates…

The point of creating an indexed array is that you won’t have any duplicates. Because the index of the array is the number ($adad), each time it finds the same number it will either create the new array element, or overwrite the existing one.
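
As a minimal sketch of that difference (the sample values below are made up for illustration and are not from the real file), compare appending every value and calling array_unique() afterwards with using the value as the array key from the start:

```php
<?php
// Hypothetical sample values standing in for the numbers read from the file.
$values = ['105', '7', '105', '42', '7', '105'];

// Approach 1: append everything, then remove duplicates afterwards.
// The array holds one element per input value until array_unique() runs.
$appended = [];
foreach ($values as $v) {
    $appended[] = $v;
}
$countAppended = count(array_unique($appended));

// Approach 2: use the value itself as the key.
// A repeated value overwrites its own element, so the array never
// holds more than one element per distinct value.
$keyed = [];
foreach ($values as $v) {
    $keyed[$v] = 1;
}
$countKeyed = count($keyed);

echo "array_unique: $countAppended, keyed: $countKeyed\n";
echo "elements held: " . count($appended) . " vs " . count($keyed) . "\n";
```

Both approaches report the same number of distinct values; the keyed version just never stores the duplicates in the first place.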

@droopsnoot
The bigger problem is that I can't add them all to an array.

I edited my code as below:


<?php

$file = fopen("web-graph.txt", "r") or exit("Unable to open file!");
while (!feof($file)) {
    if (fgets($file)) {
        $str = fgets($file);
        $arr[] = explode(" ", $str);
    }
}
var_dump($arr);
echo "<br>" . "here:";
echo count($arr);

fclose($file);

?>

But I received this error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 32 bytes) in C:\xamp\htdocs\webgraf\counter.php on line 45

How do I fix this?

If you’re genuinely using more memory than you have available, then you’ll have to figure out a different way of solving the problem. Maybe what you need to do is create a database table (perhaps a temporary one), add a row containing each number, then you can count them either during the build or later with a query. You might be able to do something with an area of memory and set the appropriate bit to 1 - what’s the maximum value of any of the numbers, and are there any negative ones? Are there decimals, or just integers?
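
To illustrate the "set the appropriate bit to 1" idea, here is a rough sketch that uses a PHP string as a bitmap. It assumes the answers to the questions above are "non-negative integers only" and that a maximum value is known in advance; neither is confirmed in this thread. The function names `bitmap_create` and `bitmap_mark`, the limit of 1000000 and the sample lines are all hypothetical.

```php
<?php
// Assumption: every value is a non-negative integer no larger than $maxValue.
// One bit per possible value, so 8 values fit in each byte of the string.
function bitmap_create(int $maxValue): string
{
    return str_repeat("\0", intdiv($maxValue, 8) + 1);
}

// Sets the bit for $n and returns true if that bit was not set before.
function bitmap_mark(string &$bitmap, int $n): bool
{
    $byte = intdiv($n, 8);
    if ($n < 0 || $byte >= strlen($bitmap)) {
        throw new OutOfRangeException("value $n does not fit in the bitmap");
    }
    $mask = 1 << ($n % 8);
    $old = ord($bitmap[$byte]);
    if ($old & $mask) {
        return false; // already seen
    }
    $bitmap[$byte] = chr($old | $mask);
    return true;
}

// Hypothetical sample data in place of the real file.
$lines = ["1 2", "2 3", "1 3", "40 1"];

$bitmap = bitmap_create(1000000); // roughly 122 KB for values up to one million
$unique = 0;
foreach ($lines as $line) {
    foreach (explode(' ', trim($line)) as $field) {
        // Only mark fields that consist purely of digits.
        if (preg_match('/^\d+\z/', $field) && bitmap_mark($bitmap, (int) $field)) {
            $unique++;
        }
    }
}
echo "unique numbers: $unique\n";
```

The memory used depends only on the largest possible value, not on how many lines the file has, which is why the range of the numbers matters for this approach.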

So does an indexed array still run out of memory?

<?php
echo "hello, welcome to retrieval page";
ini_set('max_execution_time', 360);
echo "<br>";
//$file = file('web-graph.txt');
ini_set('memory_limit', '2048M');
$sumlink = 0; // number of links
$sumpage = array();
$file = fopen("web-graph.txt", "r") or exit("Unable to open file!");
$str = fgets($file);
while (!feof($file)) {
    // count the links
    $sumlink++;
    // count the pages
    $tmp = explode(" ", $str);
    $sumpage[$tmp[0]] = 1;
    $sumpage[$tmp[1]] = 1;
    $str = fgets($file);
}
echo "Number of links = " . $sumlink; // number of lines in file

echo "<br><br><br><br>";
$x = count($sumpage);
echo "Number of pages = " . $x;

fclose($file);

?>

Awesome, well done, it worked. This one counts the unique numbers, right?

Can you explain this, the part that I quoted?

Unindexed array:

$sumpage[0] = 105;
$sumpage[1] = 105;
$sumpage[2] = 105;
...
$sumpage[13450] = 105;

With a plain array like the above in your original code, every time you create a new array element it gets a new element number starting from zero, and your number from the file is put into the contents of that element. So if you have a file that contained the number 105 over and over again, 13451 times, you’d see the above. You can see that takes up 13451 array elements (0-13450) which contain the same number. Using an indexed array, where it doesn’t really have an element number but uses something else instead, it removes the need to create duplicates. So for the same file, we’d get

$sumpage[105] = 1;

and no matter how many times your file contains the value 105, it will only ever create that one array element, and a separate one for each different number. You could enhance the code to count how many times each value appears. Also, the sample code I posted doesn’t check whether explode() returns 0, 1 or 2 elements; in live code you’d need to do that.
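
A possible shape for those two enhancements, sketched under assumptions: the sample lines are made up, and splitting with preg_split() is one way to validate the field count, not necessarily what the original code would use.

```php
<?php
// Hypothetical sample lines; in the real script each one would come from fgets().
$lines = ["1 2\n", "2 3\n", "\n", "7\n", "1  3\n"];

$counts = [];  // number => how many times it appeared
$skipped = 0;  // lines that did not contain exactly two fields

foreach ($lines as $line) {
    // Split on any run of whitespace and drop empty pieces, so blank
    // lines, double spaces and the trailing newline create no empty fields.
    $fields = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
    if (count($fields) !== 2) {
        $skipped++;
        continue;
    }
    foreach ($fields as $number) {
        // Start at 0 the first time a number is seen, then increment.
        $counts[$number] = ($counts[$number] ?? 0) + 1;
    }
}

echo "unique numbers: " . count($counts) . "\n";
echo "skipped lines: $skipped\n";
print_r($counts);
```

count($counts) still gives the number of distinct values, while each element now records how often that value occurred.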

It’s still only good luck that this version happened to be under the memory limit - that would depend on the spread of numbers and how likely duplicates are. But I’m glad it helped.

Thanks so much for your valuable time.

Awesome solution and perfect explanation.

Hi, on the large file this code gives the wrong number: the real number of links in this big file is near 900000, but your code says it's 1700000.

With a small text file I found that when the numbers on a line are the same as each other, it counts them both.

Have you added code to check that both elements in $tmp have something in them? Also have you checked for leading or trailing spaces?

Hi,

    if ($sumpage[$tmp[0]] != 1 || $sumpage[$tmp[0]] == null) {
        $sumpage[$tmp[0]] = 1;
    }
    if ($sumpage[$tmp[1]] != 1 || $sumpage[$tmp[1]] == null) {
        $sumpage[$tmp[1]] = 1;
    }
    $str = fgets($file);
}

With this I checked beforehand whether the index already has a value, but there was no change. I'm mixed up by this.

The big problem is that when a left-hand number appears on the right-hand side for the first time, the code assumes it's a new unique number, and if a left-hand number appears more than once on the right-hand side, the code counts just one more unique number.

The code can only recognise unique numbers on the same side; if a number repeats on the other side, the code assumes it's a new number.

I can't understand it.

See the comment in the other topic - you need to trim the $tmp elements to get rid of leading and trailing spaces. For an array index, “1” is not the same as "1 ".
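
A small sketch of why this matters, using a made-up input line. One common source of such a trailing character is that fgets() returns the line with its newline still attached, so the second number on a line can end up as "2\n" rather than "2"; whether that is exactly what happened with the file in this topic is an assumption.

```php
<?php
// A line as fgets() would return it: the trailing newline is still attached.
$line = "1 2\n";
$tmp = explode(" ", $line);
var_dump($tmp[1] === "2");      // bool(false): the value is "2\n"

// Without trimming, "2" and "2\n" become two different array keys.
$untrimmed = [];
$untrimmed["2"] = 1;
$untrimmed[$tmp[1]] = 1;
echo count($untrimmed), "\n";   // 2

// With trim(), both forms map to the same key.
$trimmed = [];
$trimmed[trim("2")] = 1;
$trimmed[trim($tmp[1])] = 1;
echo count($trimmed), "\n";     // 1
```

This matches the symptom described above: the same number is counted once when it appears on the left and again when it appears on the right.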

Hi, problem solved.

I removed that space, and bingo.

Thanks
