
  1. #1
    SitePoint Evangelist
    Join Date
    Oct 2000
    Posts
    430
    Hi, I'm new to PHP (and to server-side scripting in general), but I was wondering if anyone can help me with this problem:

    I'm using a simple PHP script from MC4.com that grabs part of any other web page on the net. I want to use it to take headline links from other sites and place them on my site, where they link back to the specific articles on the source site. This works fine if the headline links I'm grabbing are absolute links, but not if they are relative links: once they appear on my site, a link like /news/article1.html points to my own domain instead of the source site.

    The script is as follows:

    <?php
    ########################
    ## Mandatory Setting ##
    ########################

    $GrabURL = "URL to fetch"; //- Complete URL of the page you're grabbing from!
    $GrabStart = "Start grab here";
    $GrabEnd = "End grab here";

    #############################
    ## Do Not Edit Below Here ##
    #############################
    // Read the whole page, not just the first 20000 bytes
    // (needs allow_url_fopen, just as the original fopen() call did)
    $rf = file_get_contents($GrabURL);
    // eregi() was removed in PHP 7; preg_match() with /i (case-insensitive) and
    // /s ("." also matches newlines) does the same job here. preg_quote()
    // escapes any regex characters in the start/end markers.
    $pattern = '/' . preg_quote($GrabStart, '/') . '(.*)' . preg_quote($GrabEnd, '/') . '/is';
    $grab = preg_match($pattern, $rf, $printing);
    // $printing[1] = str_replace("", "", $printing[1]); Un-Comment This Line for "Replace" purposes!
    if ($grab) {
        echo $printing[1];
    }
    echo "&nbsp;&nbsp;<font face=Verdana size=1>Script Provided By: <a href=\"http://www.4cm.com/\" target=\"_blank\">www.4cm.com</a></font>";
    ####################
    ## End of Script ##
    ####################
    ?>

    Now my question is:
    How do I modify this script so that it automatically changes the relative links it fetches to absolute links?

    I know this is probably a real challenge, but I'm sure there are a few of you PHP geniuses out there who could figure it out.

    I'd be eternally grateful if anyone could figure out a solution.

  2. #2
    Gong!
    Join Date
    May 2000
    Location
    Helsinki, Finland
    Posts
    229
    You could check whether each grabbed URL begins with http:// and, if it doesn't, work out which site you actually grabbed it from.

    If the grabbed URL doesn't start with http://, and you know the site's actual address (like http://www.domain.com), you can join the two together.

    Or in other words:

    1. grab url
    2. check if url is absolute: if (strpos($grabbed_url, "http://") === 0) { ... }
    3. else if it is not absolute, do $new_absolute_url = $grabbed_domain . $grabbed_url;
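A minimal PHP sketch of those three steps. The function name make_absolute() and the sample URLs are hypothetical; it also accepts https://, and it assumes relative links are root-relative (starting with /), as in the examples in this thread.

```php
<?php
// Hypothetical helper: turn a grabbed link into an absolute URL by
// prefixing the site's domain when the link is not already absolute.
function make_absolute($grabbed_url, $grabbed_domain)
{
    // Step 2: a link that already starts with http:// or https:// is absolute
    if (preg_match('#^https?://#i', $grabbed_url)) {
        return $grabbed_url;
    }
    // Step 3: otherwise glue the domain in front of the relative link
    // (assumes root-relative links such as /news/article1.html)
    return rtrim($grabbed_domain, '/') . '/' . ltrim($grabbed_url, '/');
}

echo make_absolute('/news/article1.html', 'http://www.thesite.com'), "\n";
// prints http://www.thesite.com/news/article1.html
echo make_absolute('http://other.example/x.html', 'http://www.thesite.com'), "\n";
// prints http://other.example/x.html
```

A document-relative link such as article1.html would be treated as if it sat at the site root, so this only suits sites whose links start with /.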

    Edit: Thou shalt not be hasty. Perhaps I should have read the example code above more carefully.

    Indeed, you just add another variable, $GrabDomain (e.g. http://www.domain.com), which is the scheme-and-host part of $GrabURL. Then check each grabbed URL for the http:// prefix; if it isn't there, prepend $GrabDomain to the grabbed relative URL.
    <Edited by hmahonen on 11-29-2000 at 07:08 AM>
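$GrabDomain doesn't necessarily have to be typed in by hand: PHP's built-in parse_url() can split it out of $GrabURL. A sketch, assuming $GrabURL holds a full http:// or https:// address (the example value is hypothetical):

```php
<?php
// Hypothetical example value; in the script this is the $GrabURL setting
$GrabURL = "http://www.thesite.com/news/index.html";

// parse_url() splits the URL into parts such as "scheme", "host", "port", "path"
$parts = parse_url($GrabURL);
$GrabDomain = $parts['scheme'] . '://' . $parts['host'];
if (isset($parts['port'])) {
    $GrabDomain .= ':' . $parts['port'];  // keep a non-default port if present
}

echo $GrabDomain, "\n";  // prints http://www.thesite.com
```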

  3. #3
    SitePoint Evangelist
    Join Date
    Oct 2000
    Posts
    430
    Thanks, I sort of get what you're talking about. I did, however, find an alternative solution, which I got to work.


    Say the URL I'm grabbing from is http://www.thesite.com/news/index.html and the link URLs are something like /news/article1.html.

    Then the "Replace" line in the script can be un-commented and altered like this:


    $printing[1] = str_replace("/news/", "http://www.thesite.com/news/", $printing[1]);
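One thing to watch with this str_replace() call: it replaces every occurrence of /news/, including ones inside links that were already absolute. A small sketch with a made-up fragment shows the problem, plus a variant that only matches href values starting with /news/ (it assumes double-quoted href attributes):

```php
<?php
// Hypothetical grabbed fragment mixing a relative and an absolute link
$fragment = '<a href="/news/article1.html">One</a> '
          . '<a href="http://www.thesite.com/news/article2.html">Two</a>';

// Replacing the bare "/news/" also hits the link that was already absolute,
// producing href="http://www.thesite.comhttp://www.thesite.com/news/article2.html"
$naive = str_replace("/news/", "http://www.thesite.com/news/", $fragment);

// Including href=" in the search string only touches links whose value
// starts with /news/, so the absolute link is left alone
$safer = str_replace('href="/news/', 'href="http://www.thesite.com/news/', $fragment);

echo $safer, "\n";
```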


    This seems to work on pretty much everything. Were you advising the same thing?

    The only problem with this (a small one, I must admit) is that this line has to be altered for each site.

    I don't suppose there is any way of having a piece of code do this automatically? Maybe I'm being greedy!
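It should be possible in principle: parse $GrabURL once with parse_url(), then run every href value in the grabbed HTML through a resolver using preg_replace_callback(). The sketch below is an illustration under stated assumptions, not a full RFC 3986 resolver: resolve_link() and absolutize_links() are hypothetical names, only quoted href attributes are rewritten, and ../ segments are left in place (browsers still resolve them).

```php
<?php
// Hypothetical helper: resolve one link against the URL of the page it came from.
// Covers absolute URLs, //host/..., /path, ?query, #fragment and plain relative
// paths; "../" segments are left in place rather than collapsed.
function resolve_link($link, $baseUrl)
{
    $base   = parse_url($baseUrl);
    $origin = $base['scheme'] . '://' . $base['host']
            . (isset($base['port']) ? ':' . $base['port'] : '');
    $path   = isset($base['path']) ? $base['path'] : '/';

    // Empty, or already has a scheme (http:, https:, mailto:, javascript: ...)
    if ($link === '' || preg_match('#^[a-z][a-z0-9+.-]*:#i', $link)) {
        return $link;
    }
    if (substr($link, 0, 2) === '//') {      // protocol-relative: //host/page
        return $base['scheme'] . ':' . $link;
    }
    if ($link[0] === '/') {                  // root-relative: /news/article1.html
        return $origin . $link;
    }
    if ($link[0] === '?') {                  // query only: same page, new query
        return $origin . $path . $link;
    }
    if ($link[0] === '#') {                  // fragment only: same page and query
        $query = isset($base['query']) ? '?' . $base['query'] : '';
        return $origin . $path . $query . $link;
    }
    // Document-relative: article1.html lives in the base page's directory
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $link;
}

// Rewrite every quoted href="..." or href='...' value in an HTML fragment
function absolutize_links($html, $baseUrl)
{
    return preg_replace_callback(
        '/(href\s*=\s*)(["\'])(.*?)\2/i',
        function ($m) use ($baseUrl) {
            return $m[1] . $m[2] . resolve_link($m[3], $baseUrl) . $m[2];
        },
        $html
    );
}

$base = 'http://www.thesite.com/news/index.html';
echo absolutize_links('<a href="/news/article1.html">One</a>', $base), "\n";
// prints <a href="http://www.thesite.com/news/article1.html">One</a>
echo absolutize_links("<a href='article2.html'>Two</a>", $base), "\n";
// prints <a href='http://www.thesite.com/news/article2.html'>Two</a>
```

In the script, this would replace the "Replace" line with something like $printing[1] = absolutize_links($printing[1], $GrabURL); so no per-site string has to be edited by hand.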

