  1. #1
    SitePoint Member
    Join Date
    Aug 2004
    Location
    in a house
    Posts
    17

    Remove text from string

    I am trying to remove all the text before the URL and the text after the file name. The result of the following code is an output of the original input .txt file with nothing removed from the lines.

    If I remove the unwanted text before processing the file I am able to break apart the URL perfectly. I want to be able to remove the unwanted text within the PHP page.

    Here is the format of the log I am working with.
    Code:
    07/01/2004.07:05:44a [Nick] http://www.mydomain.com/filename.swf this file rocks
    What I have tried so far with parse_url is not working for me.
    PHP Code:
    <?php
    $ipffile = "buffer.txt";
    $opfile = "buffer_clean.txt";
    //
    // open the url to be read
    $fd = fopen($ipffile, "r") or die("Server is not responding....");
    $fp = fopen($opfile, "w+");

    while (!feof($fd)) {
    $url_parts = parse_url($url);
        if (!empty($url)) {
            $scheme = $url_parts['scheme'];
            $user = $url_parts['user'];
            $pass = $url_parts['pass'];
            $host = $url_parts['host'];
            $port = $url_parts['port'];
            $path = $url_parts['path'];
            $fragment = $url_parts['fragment'];
            $query = $url_parts['query'];

    // Break the link into its parts and finish processing


    }
    glue_url($url_parts);
    // write the contents of the url to the file
    // read the file and just rewrites it out to the clean.txt file
    fwrite($fp, $url);
    ?>

  2. #2
    SitePoint Guru
    Join Date
    Jul 2004
    Location
    Raleigh, NC
    Posts
    783
    perhaps i'm misunderstanding the problem, but if you want to remove everything before and after the url, why don't you just use a regex to retain only the url? please correct me if i've misinterpreted your problem
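
A minimal sketch of the regex idea suggested here, assuming each log line holds at most one http or https URL and that the URL contains no spaces. The helper name `extract_url` is made up for illustration, and the type hints assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the thread): returns the first URL in a line, or null.
function extract_url(string $line): ?string
{
    // ~ is the delimiter, so the slashes in "http://" need no escaping.
    // \S+ stops at the first whitespace, which drops the trailing comment.
    if (preg_match('~https?://\S+~', $line, $m) === 1) {
        return $m[0];
    }
    return null;
}

$line = '07/01/2004.07:05:44a [Nick] http://www.mydomain.com/filename.swf this file rocks';
echo extract_url($line), "\n";
```

Because the pattern matches only the URL itself, the date, the nick and the trailing comment never need to be removed separately.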

  3. #3
    Dangermouse (SitePoint Wizard)
    Join Date
    Oct 2003
    Posts
    1,024
    If the spaces are constant, you could do this for example:

    PHP Code:
    $str = "07/01/2004.07:05:44a [Nick] http://www.mydomain.com/filename.swf this file rocks";
    list($date, $name, $url) = explode(" ", $str);
    echo $url;
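
A small variation on the explode() idea, offered as a sketch: passing a limit as the third argument keeps the free-text comment together instead of splitting it on every space. It assumes every line has the date, nick and URL fields followed by a comment; the variable `$comment` is introduced here for illustration.

```php
<?php
$line = '07/01/2004.07:05:44a [Nick] http://www.mydomain.com/filename.swf this file rocks';

// A limit of 4 keeps everything after the third space together in $comment.
list($date, $name, $url, $comment) = explode(' ', $line, 4);

// Strip the square brackets around the nickname.
$name = trim($name, '[]');

echo $url, "\n";
echo $comment, "\n";
```

A line with fewer than four fields would leave some of these variables unset, so real input probably needs a count() check before the list() assignment.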

  4. #4
    SitePoint Member
    Quote Originally Posted by Dangermouse
    If the spaces are constant, you could do this for example:

    PHP Code:
    $str = "07/01/2004.07:05:44a [Nick] http://www.mydomain.com/filename.swf this file rocks";
    list($date, $name, $url) = explode(" ", $str);
    echo $url;
    Thank you for the help, got it figured out. I just needed a push in the right direction.
    I worked with your list script and combined it with Darchangel's suggestion to "just use a regex to retain only the url? "

    The problem ended up being how I was opening the file for reading. It was being read as one long string. I found this in another post and it worked. The $cURL = fread($handle, filesize($ipffile)); is what fixed the problem.
    PHP Code:
    // get contents of a file into a string
    //$filename = "buffer.txt";
    $handle = fopen($ipffile, "r");
    $cURL = fread($handle, filesize($ipffile));

    $cCleaned = preg_replace("/^http:\/\//", "", $cURL);

    print "$cURL<br>$cCleaned<p>";

    foreach (preg_split("/\//", $cCleaned) as $cTmp) {
        print "Do something with $cTmp<br>";
    }

    I also found a post from wwb_99 with a PostParser class, http://www.sitepoint.com/forums/show...t=preg_replace, that gave me a MakeHyperlinks($text) function for building hyperlinks from the data.

    PHP Code:
    // open the url to be read
    $fd = fopen($ipffile, "r") or die("Server is not responding....");
    $fp = fopen($opfile, "w+");

    // get contents of a file into a string
    //$filename = "buffer.txt";
    $handle = fopen($ipffile, "r");
    $cURL = fread($handle, filesize($ipffile));

    include 'PostParser.php';
    $cCleaned = preg_replace('/(\r\n|\n|\r)/', "<br />\n", $cURL);

    foreach (preg_split("/http:\/\//", $cCleaned) as $url) {
        //$url_parts = parse_url($url);

        if (!empty($url)) {
            $data = $url;
            $pp = new PostParser($data);

            echo $pp->ToHTML();
        }
    }
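
For comparison with the fread() approach above, here is a sketch that reads the log one line at a time with file() and keeps only the URL from each line. The temporary file stands in for buffer.txt so the example is self-contained, and the second sample line is invented to show a line without a URL.

```php
<?php
// Build a small stand-in for buffer.txt (the second line is a made-up sample).
$tmp = tempnam(sys_get_temp_dir(), 'log');
file_put_contents(
    $tmp,
    "07/01/2004.07:05:44a [Nick] http://www.mydomain.com/filename.swf this file rocks\n" .
    "07/01/2004.07:06:10a [Nick] no link on this line\n"
);

$urls = [];
// file() returns one array entry per line; the flags drop newlines and empty lines.
foreach (file($tmp, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
    if (preg_match('~https?://\S+~', $line, $m)) {
        $urls[] = $m[0];
    }
}
unlink($tmp);

print_r($urls);
```

Working per line keeps each URL tied to its own date and nick, which is harder once the whole file has been read into a single string.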

  5. #5
    sweatje (eschew sesquipedalians)
    Join Date
    Jun 2003
    Location
    Iowa, USA
    Posts
    3,749
    There is already a function to do:
    PHP Code:
    $cCleaned = preg_replace('/(\r\n|\n|\r)/', "<br />\n", $cURL);
    //instead use
    $cCleaned = nl2br($cURL);
    Also, you should not choose a regex delimiter that is a character you are going to use in the expression:
    PHP Code:
    foreach(preg_split("/http:\/\//", $cCleaned) as $url)
    //could be
    foreach(preg_split('~http://~', $cCleaned) as $url)
    it is much cleaner to read without all the escaping.
    Jason Sweat ZCE - jsweat_php@yahoo.com
    Book: PHP Patterns
    Good Stuff: SimpleTest PHPUnit FireFox ADOdb YUI
    Detestable (adjective): software that isn't testable.
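
A short sketch comparing the two replacements discussed in this post, with made-up sample strings. For input that uses plain "\n" line endings the results are identical; for "\r\n" input, nl2br() keeps the original "\r\n" after the tag while the preg_replace() version normalises it to "\n".

```php
<?php
$text = "line one\nline two";

// nl2br() inserts "<br />" before each newline and keeps the newline itself.
$viaNl2br = nl2br($text);

// The preg_replace() call from the thread, for comparison.
$viaRegex = preg_replace('/(\r\n|\n|\r)/', "<br />\n", $text);

var_dump($viaNl2br === $viaRegex);

// With ~ as the delimiter, the slashes in "http://" need no escaping.
$pieces = preg_split('~http://~', 'a http://one.example b http://two.example');
print_r($pieces);
```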

  6. #6
    SitePoint Member
    Quote Originally Posted by sweatje
    There is already a function to do:
    PHP Code:
    $cCleaned = preg_replace('/(\r\n|\n|\r)/', "<br />\n", $cURL);
    //instead use
    $cCleaned = nl2br($cURL);
    Also, you should not choose a regex delimiter that is a character you are going to use in the expression:
    PHP Code:
    foreach(preg_split("/http:\/\//", $cCleaned) as $url)
    //could be
    foreach(preg_split('~http://~', $cCleaned) as $url)
    it is much cleaner to read without all the escaping.
    Sweatje, thanks for the help. I used $cCleaned = nl2br($cURL) and it really simplified my work. I ended up removing 50 lines of unneeded code. The output from the file is breaking at the space like I need it to be.


    I now need to remove the lines I don't want to put in the DB.
    Example of the output.
    07/01/2004.07:05:44a
    [withnail]
    http://www.mydomain.com/file/path/filename.txt
    mario
    rocks
    My question is this: would it be easier to write the lines into a temp database and then use a query to copy only the lines I want to another database, or should I send them to another text file and process it again to remove the unwanted lines?
    I am thinking about the processing time involved if the temp database were large, like 1K-2K records.

    Thanks, AR Com
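
One further option, sketched under the assumption that the tokens are already in a PHP array and that only entries starting with http:// or https:// belong in the database: filter them in memory before anything is written. The `$tokens` array below copies the example output from this post.

```php
<?php
$tokens = [
    '07/01/2004.07:05:44a',
    '[withnail]',
    'http://www.mydomain.com/file/path/filename.txt',
    'mario',
    'rocks',
];

// preg_grep() keeps only the matching entries (with their original keys);
// array_values() then reindexes the result from 0.
$urls = array_values(preg_grep('~^https?://~', $tokens));

print_r($urls);
```

With this, neither a temp table nor a second text file is needed for the filtering step itself.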

  7. #7
    eschew sesquipedalians silver trophy sweatje's Avatar
    That sounds like a one shot operation, not something that is being done each request? If so, then it does not really matter what the performance is. Do whatever is easier for you to write.

  8. #8
    SitePoint Member
    Quote Originally Posted by sweatje
    That sounds like a one shot operation, not something that is being done each request? If so, then it does not really matter what the performance is. Do whatever is easier for you to write
    That's the trouble, it's not easy for me to write it because what I was trying was not working.
    After rereading the suggestions, I have used the suggestion from Dangermouse and have accomplished a portion of the task.

    Now the $path variable holds /folder1/folder2/folder3/filename.swf and the $fragment and $query vars are empty.
    Can someone help me split the filename out of $path and into the $query var?

    Here is the current solution I am using.
    PHP Code:
    <?PHP
    session_start();
    /*************************************** 
    **                                    **
    **        Log_Cleaner.php             **
    **                                    **
    ***************************************/

    include('dbconnect.php');

    //-------------------------------------
    // getting the info from the file into variables
    // --------------------------------------
    // assign file name to a variable
    $file_name = "buffer.txt";
    $opfile = "buffer_clean3.txt";

    // open the file for reading
    $fd = fopen($file_name, 'r');

    // open the file for writing
    $fp = fopen($opfile, "w+");

    // put the entire file into an array
    $all_lines = file($file_name);

    // count the number of lines in the file and store it in a variable
    $how_many_lines = count($all_lines);

    // Take each line and 
    // 1. separate each line into its individual elements (aka fields aka pieces of information)
    // 2. grab the value(s) you are going to test and place them in a variable. Remember the
    //     count for the elements starts at 0 (zero). Also be careful that your variables don't 
    //     contain extraneous parts (like extra spaces, newlines and tabs). Use the trim() function to
    //     clean up.
    // 3. do your evaluation.

    // put the file lines into an array
    for ($i = 0; $i < $how_many_lines; $i++) {
    $all_fields[$i] = explode(" ", $all_lines[$i]);

    $date = ($all_fields[$i][0]);
    $name = ($all_fields[$i][1]);
    $url = ($all_fields[$i][2]);

    // write the contents of the url to the screen
    echo $url;
    echo '<BR>';

    {
    $url_parts = parse_url($url);
    $scheme = $url_parts['scheme'];
    $user = $url_parts['user'];
    $pass = $url_parts['pass'];
    $host = $url_parts['host'];
    $port = $url_parts['port'];
    $path = $url_parts['path'];
    $fragment = $url_parts['fragment'];
    $query = $url_parts['query'];

    $sql_insert = "INSERT IGNORE INTO files (scheme,user,pass,host,port,path,fragment,query) 
    VALUES('$scheme', '$user', '$pass', '$host', '$port', '$path', '$fragment', '$query')";
    # RUN THE QUERY 

    mysql_query($sql_insert, $db);

    mysql_checkerror();
    }

    // write the contents of the url to the file   
    fwrite($fp, $url); fwrite($fp, "\n");
    }

    // close the file connections
    fclose($fd); fclose($fp);

    ?>
    Thank you all for your help.
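
On the empty $fragment and $query values mentioned above: parse_url() only returns the components that are actually present in the URL, and a URL with no "?" or "#" has no query or fragment. The sketch below fills the missing keys with empty strings so every column of the INSERT has a value. The helper name `url_parts_with_defaults` is hypothetical, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: parse_url() plus an empty default for every expected key.
function url_parts_with_defaults(string $url): array
{
    $defaults = [
        'scheme' => '', 'user' => '', 'pass' => '', 'host' => '',
        'port' => '', 'path' => '', 'fragment' => '', 'query' => '',
    ];
    $parts = parse_url($url);
    if ($parts === false) {
        // parse_url() returns false for seriously malformed URLs.
        return $defaults;
    }
    // Values found by parse_url() override the empty defaults.
    return array_merge($defaults, $parts);
}

$parts = url_parts_with_defaults('http://www.mydomain.com/folder1/folder2/folder3/filename.swf');
echo $parts['host'], "\n";
echo $parts['path'], "\n";
var_dump($parts['query']);
```

Reading the components from the merged array avoids undefined-index notices for URLs that lack a user, port, query or fragment.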

  9. #9
    SitePoint Member
    Finally got it, disregard my last post. Thanks all.
    PHP Code:
    // this creates an array called $parts containing the various parts of the URL 
    $parts = explode('/', $path);
    // Grab the last element of the array: "file.zip" 
    $file = array_pop($parts);
    echo $file.'<BR>';

    $query = $file;
    $fragment = $url_parts['fragment'];
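
As an alternative to explode() plus array_pop(), PHP's built-in basename() and pathinfo() can pull the file name out of the path directly. This is a sketch using the sample path from the previous post; the variables `$ext` and `$name` are introduced here for illustration.

```php
<?php
$path = '/folder1/folder2/folder3/filename.swf';

// basename() returns the trailing name component of a path.
$file = basename($path);

// pathinfo() can return the extension and the name without the extension.
$ext  = pathinfo($path, PATHINFO_EXTENSION);
$name = pathinfo($path, PATHINFO_FILENAME);

echo $file, "\n";
echo $ext, "\n";
echo $name, "\n";
```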

