  1. #1
    AdSpeed.com Son Nguyen
    Join Date
    Aug 2000
    Location
    Silicon Valley
    Posts
    2,241
    Hi,

    Does anyone know of a PHP class that gets meta tags?
    I know about the built-in function get_meta_tags, but it doesn't work very well (other people say so too) with the various ways meta tags can be written.
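For reference, here is a minimal sketch of what the built-in does on well-formed markup. The sample page and temp-file handling are assumptions made only for illustration; get_meta_tags() takes a file name or URL rather than a string, and it is driven by name/content attribute pairs, so markup that departs from that shape is where hand-written parsing tends to come in.

```php
<?php
// Sketch: get_meta_tags() reads from a file or URL, so write a sample page to a temp file first.
$page = "<html><head>\n"
      . "<meta name=\"description\" content=\"A test page\">\n"
      . "<meta name=\"Keywords\" content=\"php, meta\">\n"
      . "</head><body></body></html>";

$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $page);

// Returns an associative array keyed by the lower-cased name attribute.
$tags = get_meta_tags($file);
unlink($file);

print_r($tags);
```

Note that the key for the second tag comes back as "keywords" even though the page spells it "Keywords".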

    Any link would be very helpful.
    Thanks
    - Son Nguyen
    AdSpeed.com - Ad Serving and Ad Management Made Easy

  2. #2
    SitePoint Evangelist
    Join Date
    May 2000
    Location
    Canada
    Posts
    533
    Have you ever tried using regular expressions on the file yourself?

  3. #3
    AdSpeed.com Son Nguyen
    Join Date
    Aug 2000
    Location
    Silicon Valley
    Posts
    2,241
    Oh yeah, I did, but it was in Perl and I haven't really tested it thoroughly.

    Secondly, having a class is much better: I just need to use it, or apply a fix to it if needed.

    - Son Nguyen

  4. #4
    SitePoint Enthusiast
    Join Date
    Oct 2001
    Location
    London
    Posts
    26

    How to get meta-tags using PHP

    I would use something like the following. I took it out of one of my classes which is why you see $this-> everywhere.

    $this->meta_tag can hold any tag name you like, e.g. "description" or "keywords", or "description|keywords|title" for all of them.

    $haystack is the HTML source of any web page.
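To make the role of the two variables concrete, here is a small sketch of how an alternation such as "description|keywords|title" behaves once it is dropped into a pattern. The pattern is deliberately simplified (double-quoted attributes only, name before content), and $metaTag and the sample $haystack are illustrative stand-ins, not values from the original class.

```php
<?php
// Simplified, illustrative pattern: double-quoted attributes only, name before content.
$metaTag  = 'description|keywords|title';   // hypothetical value of $this->meta_tag
$haystack = '<meta name="description" content="Ad serving made easy">' . "\n"
          . '<meta name="keywords" content="ads, serving">';

preg_match_all(
    '/<meta[^>]+name="(' . $metaTag . ')"[^>]+content="([^"]+)"/i',
    $haystack,
    $matches,
    PREG_PATTERN_ORDER
);

// With PREG_PATTERN_ORDER, $matches[1] lists the tag names that matched
// and $matches[2] lists the content values, in the same order.
print_r($matches[1]);
print_r($matches[2]);
```

Because the alternation sits inside a capture group, each value in $matches[2] can be paired with its tag name in $matches[1] by index.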

    // -----------------------------------------------------------------------------------
    // Explanation of the regular expression:
    //
    // <meta          # Match the literal text <meta
    // [^>]+          # Match anything, as long as it's not a > character
    // (keywords)     # until we find the requested tag name, e.g. keywords (group 1)
    // [^>]+          # and continue matching anything as long as it's not a >
    // content        # until we find the word content, so we know it's correctly formatted
    // \s*=\s*        # and then match any amount of whitespace before and after the = sign
    // (?|"([^"]+)"   # collect a double-quoted value (group 2) ...
    //   |'([^']+)')  # ... or a single-quoted one; (?| keeps it in group 2 either way
    // /i             # and we don't mind about the case.
    //
    // The original character class [\"|\'] also accepted a literal | and let the
    // value run past a closing ' quote; matching each quote style separately avoids that.
    // -----------------------------------------------------------------------------------

    preg_match_all('/<meta[^>]+(' . $this->meta_tag . ')[^>]+content\s*=\s*(?|"([^"]+)"|\'([^\']+)\')/i', $haystack, $matches, PREG_PATTERN_ORDER);

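    for ($j = 0; $j < count($matches[2]); $j++) {

        // Strip out any rogue tags, characters and spacing
        // (the class also uses an associative array for dupe checking, not shown here).
        $matches[2][$j] = strip_tags($matches[2][$j]);

        // Strip out carriage returns and line feeds
        $matches[2][$j] = str_replace(array("\r", "\n"), "", $matches[2][$j]);

        // Remove any rogue characters. ereg_replace() was removed in PHP 7, so this
        // uses preg_replace(); the /i flag keeps upper-case letters, which the
        // original lower-case-only character class would have stripped.
        $matches[2][$j] = trim(preg_replace("/[^a-z0-9\-+,. ']/i", "", $matches[2][$j]));

        // .
        // .
        // .

        // $matches[2][$j] now holds your cleaned meta tag value from the $haystack.

    }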
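Since the thread asks for a class, here is one way the snippet above might be wrapped into a self-contained one. This is only a sketch: the class name MetaTagExtractor, the extract() method and the sample HTML are all hypothetical, and the pattern handles each quote style separately with a PCRE branch-reset group rather than the single character class used in the post.

```php
<?php
// Hypothetical wrapper class; the name and method are illustrative only.
class MetaTagExtractor
{
    /** @var string Alternation of tag names, e.g. "description|keywords" */
    private $metaTag;

    public function __construct($metaTag)
    {
        $this->metaTag = $metaTag;
    }

    /** Returns an array of lower-cased tag name => list of cleaned values. */
    public function extract($haystack)
    {
        // (?| ... ) is a branch-reset group: the value lands in group 2
        // whether it was double-quoted or single-quoted.
        $pattern = '/<meta[^>]+(' . $this->metaTag . ')[^>]+content\s*=\s*'
                 . '(?|"([^"]+)"|\'([^\']+)\')/i';
        preg_match_all($pattern, $haystack, $matches, PREG_SET_ORDER);

        $result = array();
        foreach ($matches as $m) {
            $value = strip_tags($m[2]);                                      // rogue tags
            $value = str_replace(array("\r", "\n"), '', $value);             // line breaks
            $value = trim(preg_replace("/[^a-z0-9\-+,. ']/i", '', $value));  // rogue characters
            $result[strtolower($m[1])][] = $value;
        }
        return $result;
    }
}

$html = '<meta name="description" content="Bob\'s page!">' . "\n"
      . "<META NAME='keywords' CONTENT='php, regex'>";
$extractor = new MetaTagExtractor('description|keywords');
print_r($extractor->extract($html));
```

Like the snippet it is based on, this only finds tags where the name comes before the content attribute, and the tag-name string is inserted into the pattern as-is so that "a|b" style alternations keep working.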


