
    Loading Microsoft.com via the XML Dom Extension

    Hi all, I am building a small application to analyse the links of a given webpage and determine which are external and which are internal.
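
    To make the goal concrete, here is a minimal sketch of the classification step I have in mind. The helper name classifyLink and the rule of ignoring a leading "www." are assumptions for illustration only and are not part of my actual code. It also assumes only http(s) and relative links are passed in:

```php
<?php
// Hypothetical helper (illustration only): classify a link as 'internal'
// or 'external' relative to the host of the page it was found on.
// Assumes $href is an http(s) or relative link; schemes such as mailto:
// have no host component and would be reported as 'internal' here.
function classifyLink(string $href, string $pageHost): string
{
    $host = parse_url($href, PHP_URL_HOST);

    // Relative links (no host component) stay on the same site.
    if ($host === null || $host === false) {
        return 'internal';
    }

    // Compare hosts case-insensitively, ignoring a leading "www.".
    $normalize = function (string $h): string {
        return preg_replace('/^www\./', '', strtolower($h));
    };

    return $normalize($host) === $normalize($pageHost) ? 'internal' : 'external';
}
```

    Each href collected from the parsed document would then be passed through this helper together with the host of the page being analysed.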

    When I try to load microsoft.com via PHP's DOM extension (DOMDocument), the encoding seems to be wrong: the output contains only garbled characters. Could you please help me?

    PHP Code:
    function bbGetPageLinks2($url) {
        // Retrieve website via fsockopen
        $f = LL_Admin::getPageContentOverProxy($url);

        // Strip HTML comments, entities and PHP tags before parsing
        $f = preg_replace("@<!--.*?-->?@is", ' ', $f);
        $f = preg_replace('@&(.*?);@', '', $f);
        $f = preg_replace('@<(.*?)\?php@', '', $f);
        $f = preg_replace('@\?>@', '', $f);

        // Keep only the host part of the URL
        $url = parse_url($url);
        $url = $url['host'];

        $dom = new DomDocument;
        $dom->preserveWhiteSpace = false;

        // Encoding conversions tried so far:
        //$f = mb_convert_encoding($f, 'HTML-ENTITIES', "UTF-16");
        //$f = utf8_decode($f);
        //$f = utf8_encode($f);

        $dom->loadHTML($f);

        // Debug output for the problematic page
        if (strpos($url, 'microsoft') !== false)
            echo $dom->saveHTML();


    ...

    Thanks in advance!
    Last edited by DarkAngelBGE; Jun 6, 2007 at 06:46.

