  1. #1
    3MTA3
    Join Date
    Jul 2003
    Location
    Florida
    Posts
    1,016

    extracting anchor text

    I want to extract the anchor text of my backlinks. What's the best way to do this? Regular expressions? I already have a listing of the sites that have links to my site so I only need to know how to extract the anchor text.

    Thanks for any info!

  2. #2
    SitePoint Wizard Mike Borozdin
    Join Date
    Oct 2002
    Location
    Edinburgh, UK
    Posts
    1,743
    If by anchor text you mean what I think you mean, the part after the '#' (for example, in somepage.html#1 the anchor text is 1), then use this code:
    PHP Code:
    $matches = array();
    if ( preg_match_all( "/<a href=\"[^<>]*#([^<>]+)\">/i", $text, $matches ) ) {
      foreach ( $matches[1] as $key => $val ) {
         print ( $val . "<br />" );
      }
    }


  3. #3
    3MTA3
    Sorry for not being clearer. By anchor text, I mean the text between the anchor tags, like this:

    Code:
    <a href="http://www.mydomain.com">This is the Anchor Text I Meant</a>

  4. #4
    3MTA3
    OK, so I found some code that retrieves all links from a page, and I modified it into a function that extracts the anchor text of a site's backlink. So far it has worked in all my tests.

    If anyone knows a more efficient way than this, then please speak up:

    PHP Code:
    <?php
    // Anchor Text Extractor
    function GetAnchorText($site, $recip) {
        // read file into array
        $lines = file($recip);

        // join array elements
        $html = implode("", $lines);

        // remove all line breaks
        $html = str_replace("\n", "", $html);

        // put new line break after anchor tag
        $html = str_replace("</a>", "</a>\n", $html);

        // split the string into single lines
        $lines = explode("\n", $html);

        for ($i = 0; $i < count($lines); $i++) {
            // delete everything in front of the anchor tag
            $lines[$i] = preg_replace("/.*<a /i", "<a ", $lines[$i]);

            // keep only the lines that mention the site (case-insensitive)
            if (stripos($lines[$i], $site) !== false) {
                // take everything from the first '>' onward
                $anchortext = strstr($lines[$i], '>');
                // drop the leading '>'
                $anchortext = substr($anchortext, 1);
                // drop the trailing '</a>'
                $anchortext = substr($anchortext, 0, -4) . "\n";
                // remove any tags nested inside the link
                $anchortext = strip_tags($anchortext);
                echo $anchortext;
            }
        }
    }

    $site = "YourSite.com";  // your url goes here
    $recip = "http://www.ThePageWithYourBacklinkOnIt.com"; // reciprocal site
    GetAnchorText($site, $recip);
    ?>
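
    One possible alternative to splitting the page into lines is to let a DOM parser find the links. This is only a minimal sketch, assuming PHP 5 or later with the DOM extension enabled. The helper name anchorTextsFor and the sample HTML are made up for illustration, and unlike the function above it looks for the site name only in the href attribute, not in the whole line:

    ```php
    <?php
    // Sketch: return the anchor text of every link in $html whose href mentions $site.
    // anchorTextsFor() is a hypothetical helper name, not something from this thread.
    function anchorTextsFor($html, $site) {
        $doc = new DOMDocument();
        // Real-world pages are rarely valid HTML, so collect parser warnings quietly.
        libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();

        $texts = array();
        foreach ($doc->getElementsByTagName('a') as $a) {
            // Case-insensitive check of the link target only.
            if (stripos($a->getAttribute('href'), $site) !== false) {
                // textContent is the link text with nested tags such as <b> removed.
                $texts[] = trim($a->textContent);
            }
        }
        return $texts;
    }

    // Made-up sample markup with one matching and one non-matching link.
    $html = '<p><a href="http://www.YourSite.com/"><b>Great</b> widgets</a> and '
          . '<a href="http://example.org/">something else</a></p>';
    echo implode("\n", anchorTextsFor($html, 'yoursite.com')) . "\n"; // prints "Great widgets"
    ```

    To run it against a live page, the HTML could be fetched first with file_get_contents($recip), assuming allow_url_fopen is enabled on the server.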
    Last edited by devised; Oct 23, 2004 at 03:23.

  5. #5
    SitePoint Wizard Mike Borozdin
    Well, if you don't have HTML tags in your anchor text, this will work:
    PHP Code:
    $matches = array();
    if ( preg_match_all( "/<a href=\"[^<>]+\">([^<>]+)<\/a>/i", $text, $matches ) ) {
      foreach ( $matches[1] as $key => $val ) {
         print ( $val . "<br />" );
      }
    }

    $text is the string that contains the links.
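
    If the anchor text can contain HTML tags, one way to loosen the pattern is a lazy capture followed by strip_tags(). This is a rough sketch rather than a complete HTML parser; the function name anchorTextsRegex and the sample string are assumptions made up for the example:

    ```php
    <?php
    // Sketch: extract anchor text even when the link contains nested tags.
    // anchorTextsRegex() is a hypothetical helper name, not something from this thread.
    function anchorTextsRegex($text) {
        $texts = array();
        // [^>]* on both sides of href= allows other attributes and either quote style;
        // (.*?) lazily captures up to the nearest </a>; the s flag lets . span line breaks.
        if (preg_match_all('/<a\s[^>]*href=[^>]*>(.*?)<\/a>/is', $text, $matches)) {
            foreach ($matches[1] as $val) {
                // Remove nested tags such as <b> and trim surrounding whitespace.
                $texts[] = trim(strip_tags($val));
            }
        }
        return $texts;
    }

    // Made-up sample input: one link with a nested tag, one with an extra attribute.
    $text = '<a href="x"><b>Bold</b> link</a> <a class="c" href=\'y\'>plain</a>';
    foreach (anchorTextsRegex($text) as $val) {
        print($val . "<br />");
    }
    ```

    Nested or malformed anchor tags can still confuse a pattern like this, so treat it as a quick fix for reasonably clean markup.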

