  1. #1
    SitePoint Evangelist
    Join Date
    Feb 2004
    Location
    Sofia, Bulgaria
    Posts
    421
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    strpos VS regular expressions

    hi all,
    every day i check some IT sites for news, one by one.. so i decided to make a php script that checks them automatically and shows me the news.. i made the following function:

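    // $content - the whole page read from URL
    // $offset - current offset in the $content
    // $max_offset - strlen($content)

    function copy_string($start_str, $end_str) {
        // the three variables above live outside the function,
        // so they have to be imported into its scope
        global $content, $offset, $max_offset;

        $start_pos = strpos($content, $start_str, $offset);
        // >= so that a match sitting exactly at $offset is not skipped
        if (($start_pos !== false) AND ($start_pos >= $offset)) {
            $start_pos += strlen($start_str);
            $end_pos = strpos($content, $end_str, $start_pos);
            if ($end_pos !== false) {
                // move the offset forward, but never past the end of the page
                if ($end_pos < $max_offset) {
                    $offset = $end_pos;
                }
                else {
                    $offset = $max_offset;
                }
                return substr($content, $start_pos, ($end_pos - $start_pos));
            }
        }
        return 0; // nothing found
    }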

    then i downloaded some ready-to-use php scripts for the same purpose.. some of them use similar functions, others use ereg/preg_match expressions..
    my questions are:
    1. what is the best way to do this kind of script?
    2. which is the fastest way for files larger than 100K?
    3. advantages/disadvantages of the methods?
    4. optimization suggestions for my function?
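
For readers following along, here is a minimal sketch of the same strpos idea without global variables. The name `extract_between()` and the sample HTML are made up for illustration and are not from the original post. One difference to note: this variant moves the offset past the end marker, whereas the function above leaves it at the start of the end marker.

```php
<?php
// Hypothetical helper (not from the original post): returns the text between
// $start and $end, searching from $offset, and advances $offset past $end.
// Returns false when either marker is missing.
function extract_between(string $content, string $start, string $end, int &$offset)
{
    $startPos = strpos($content, $start, $offset);
    if ($startPos === false) {
        return false;
    }
    $startPos += strlen($start);

    $endPos = strpos($content, $end, $startPos);
    if ($endPos === false) {
        return false;
    }

    // Continue the next search after the end marker.
    $offset = $endPos + strlen($end);
    return substr($content, $startPos, $endPos - $startPos);
}

// Usage: collect every headline from a made-up page.
$html = '<h2>First story</h2><p>text</p><h2>Second story</h2>';
$offset = 0;
$titles = [];
while (($title = extract_between($html, '<h2>', '</h2>', $offset)) !== false) {
    $titles[] = $title;
}
print_r($titles);
```

Passing the offset by reference keeps the "remember where we stopped" behaviour of the original while letting one script parse several pages without sharing state.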

  2. #2
    SitePoint Zealot
    Join Date
    Jun 2003
    Location
    hamburg, germany
    Posts
    103
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by dacool
    1. what is the best way to do this kind of script?
    2. which is the fastest way for files larger than 100K?
    3. advantages/disadvantages of the methods?
    4. optimization suggestions for my function?
    1.) preg_match / preg_match_all
    2.) non-php, external application, cronjob
    3.) hmm ...
    4.) use caching and regexps, it's MUCH easier

    kai
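
As a rough illustration of the preg_match / preg_match_all suggestion, the sketch below does the same "text between two markers" extraction with a regular expression. The function name `extract_with_regex()` and the sample HTML are assumptions made for the example, not something posted in this thread.

```php
<?php
// Hypothetical helper: regex counterpart of a strpos-based extractor.
// preg_quote() escapes the markers so they are matched literally;
// the /s modifier lets "." match newlines inside the captured text.
function extract_with_regex(string $content, string $start, string $end, int &$offset)
{
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';
    if (preg_match($pattern, $content, $m, PREG_OFFSET_CAPTURE, $offset) !== 1) {
        return false;
    }
    // With PREG_OFFSET_CAPTURE each entry is [matched text, byte offset].
    $offset = $m[0][1] + strlen($m[0][0]);
    return $m[1][0];
}

$html = '<h2>First story</h2><p>text</p><h2>Second story</h2>';

// One match at a time, resuming from the previous position.
$offset = 0;
$first = extract_with_regex($html, '<h2>', '</h2>', $offset);
$second = extract_with_regex($html, '<h2>', '</h2>', $offset);

// Or everything in one call: $all[1] holds the captured group of each match.
preg_match_all('/<h2>(.*?)<\/h2>/s', $html, $all);
print_r($all[1]);
```

The regex version is shorter once patterns get more complicated (optional attributes, alternative markers), which is presumably what "MUCH easier" refers to; for two fixed literal markers the strpos version does the same job.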

  3. #3
    Non-Member
    Join Date
    Jan 2004
    Location
    Planet Earth
    Posts
    1,764
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    You'll probably find what you need over at www.phpclasses.com anyway, so why not register and see what is there?

  4. #4
    SitePoint Evangelist
    Join Date
    Feb 2004
    Location
    Sofia, Bulgaria
    Posts
    421
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    some tests

    thank you, guys.. i did some tests with my function and preg_match, and my function came out about 14-18 times faster than preg_match.. so i decided to use it in my script.. the page used for the tests was about 50KB.. the test procedure: 1. get the important content (about 20KB); 2. parse the content. the results were:
    my function - 1. 0.0021ms; 2. 0.0003ms (only first occurrence)
    preg_match - 1. 0.0373ms; 2. 0.0041ms (only first occurrence)

    i'll probably do some more tests, when i add other sites to the script.. if somebody has found other tests on that topic, please post a URL here..
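
For anyone who wants to repeat this kind of comparison, here is a minimal timing sketch. It is not the test script used above: the synthetic page of roughly 50 KB, the `<h2>` markers, the iteration count and the helper `time_it()` are all assumptions, and the numbers it prints will depend on the machine and PHP version.

```php
<?php
// Synthetic page of roughly 50 KB with the interesting part at the very end.
$content = str_repeat('<p>filler text</p>', 3000) . '<h2>Headline</h2>';
$iterations = 1000;

// Hypothetical helper: average time per call in milliseconds.
function time_it(callable $fn, int $iterations): float
{
    $t0 = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (microtime(true) - $t0) / $iterations * 1000.0;
}

// strpos/substr approach: first occurrence only.
$viaStrpos = function () use ($content) {
    $s = strpos($content, '<h2>');
    if ($s === false) {
        return false;
    }
    $s += strlen('<h2>');
    $e = strpos($content, '</h2>', $s);
    return $e === false ? false : substr($content, $s, $e - $s);
};

// preg_match approach: first occurrence only.
$viaRegex = function () use ($content) {
    return preg_match('/<h2>(.*?)<\/h2>/s', $content, $m) === 1 ? $m[1] : false;
};

$strposMs = time_it($viaStrpos, $iterations);
$regexMs = time_it($viaRegex, $iterations);

printf("strpos:     %.5f ms per call\n", $strposMs);
printf("preg_match: %.5f ms per call\n", $regexMs);
```

Averaging over many iterations matters here because a single call is far too short to time reliably with microtime().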

  5. #5
    SitePoint Zealot patrikG's Avatar
    Join Date
    Aug 2003
    Location
    Sussex, UK
    Posts
    129
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Alternatively, try RSS - makes life much, much easier and faster. There is a nice article on that at http://www.sitepoint.com/article/get-off-your-rss

  6. #6
    SitePoint Evangelist
    Join Date
    Feb 2004
    Location
    Sofia, Bulgaria
    Posts
    421
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    thanks for the idea, patrikG.. the problem is that the pages i want to get are neither RSS nor XML.. they are plain HTML.. for that reason i need a script to parse the content.. besides, the format and the information are different for each page and therefore i need a separate script for each page.. but thanks for the post anyway..

    i've got another question for you: can i put my own avatar here? and how, if i can?

  7. #7
    SitePoint Zealot patrikG's Avatar
    Join Date
    Aug 2003
    Location
    Sussex, UK
    Posts
    129
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The beauty of an XML feed (which is what RSS is) is that you can easily transform it into anything - a PDF, WML, HTML etc. XML is basically just a clever way of standardising data to be transferred/exchanged.
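
To make the "transform it into anything" point concrete, here is a small sketch that turns an RSS feed into an HTML list using PHP's SimpleXML extension. The feed content and URLs are invented for the example; a real script would load the feed from the site instead of a string.

```php
<?php
// Made-up RSS 2.0 feed, inlined so the example is self-contained.
$rss = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First story</title><link>http://example.com/1</link></item>
    <item><title>Second story</title><link>http://example.com/2</link></item>
  </channel>
</rss>
XML;

$feed = simplexml_load_string($rss);
if ($feed === false) {
    die("could not parse the feed\n");
}

// Transform each <item> into an HTML list entry.
$html = "<ul>\n";
foreach ($feed->channel->item as $item) {
    $html .= sprintf(
        "  <li><a href=\"%s\">%s</a></li>\n",
        htmlspecialchars((string) $item->link),
        htmlspecialchars((string) $item->title)
    );
}
$html .= "</ul>\n";

echo $html;
```

Because the structure is standardised, no per-site start and end markers are needed; the same loop works for any feed that follows the RSS 2.0 layout.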

    Your own avatar? Should be up somewhere under mySitePoint (top left of the page). Could be that you'd have to have a certain postcount to be able to do that...

  8. #8
    SitePoint Evangelist
    Join Date
    Feb 2004
    Location
    Sofia, Bulgaria
    Posts
    421
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    patrikG, i know that about XML, but the people who made the sites i want to get info from perhaps don't

    Off Topic:


    i checked there for an avatar, but i can only choose from the available ones.. i can't upload my own.. maybe i need more posts, as you mentioned..

