  1. #1
    SitePoint Enthusiast
    Join Date
    May 2009
    Location
    Arizona
    Posts
    50

    htmlspecialchars() outputs null value if accented characters in string

    I've got a head scratcher that I'm not sure how to deal with. I'm using htmlspecialchars() to display user-entered values on a page. On my local server (PHP version 5.1.4) it works just as expected, but on the live site (PHP version 5.2.9), if there are any accented characters in the string, I get a null value back from htmlspecialchars().

    Here is some sample code I've been working with, if it's at all helpful.

    PHP Code:
    $text = "canapés";
    echo 'My text is ' . $text;
    $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    echo 'The encoded text is ' . $text;
    Output:

    My text is canapésThe encoded text is

    Help?

  2. #2
    Keeper of the SFL StarLion's Avatar
    Join Date
    Feb 2006
    Location
    Atlanta, GA, USA
    Posts
    3,748
    Try htmlentities() instead?

  3. #3
    SitePoint Enthusiast dyer85's Avatar
    Join Date
    Nov 2004
    Location
    L2 cache.
    Posts
    46
    I'm unable to reproduce your problem (PHP 5.3, Apache 2, Windows). However, you shouldn't need to specify the charset, since, in your case, the characters affected by the function are in the same positions as in ISO-8859-1 (htmlspecialchars() docs).

    Did you post all the code?
    "Structure padding is the use of extraneous materials to
    enhance the shape of a struct and make it more attractive
    to members of the opposite struct. (see also 'struct
    silicone.')" -- Eric Sosman

  4. #4
    SitePoint Enthusiast
    Join Date
    May 2009
    Location
    Arizona
    Posts
    50
    Yep, that was the whole code that I was using for testing. After the first suggestion to use htmlentities instead (which works fine) I started poking at it some more, and what I've found is that if I don't specify the charset, then it works.

    PHP Code:
    $text = 'canapés aren\'t popular';
    echo 'My text is ' . $text;
    $text = stripslashes(htmlentities($text, ENT_QUOTES));
    echo 'The htmlentities text is ' . $text;

    $text = 'canapés aren\'t popular';
    echo 'My text is ' . $text;
    $text = htmlspecialchars($text, ENT_QUOTES);
    $text = str_replace('&', '&', $text);
    echo 'The encoded text is ' . $text;
    Output:
    My text is canapés aren't popular
    The html entities text is canapés aren& #039;t popular
    My text is canapés aren't popular
    The encoded text is canapés aren& #039;t popular
    I'm the first to admit that what I know about character encoding would fit on the head of a pin, but I'm particularly concerned about it for this project because the site serves an international audience and needs to be able to accurately reproduce character sets from multiple languages.

    P.S. No matter what I do, the forum software is converting my apostrophes so I stuck spaces in there to try to force it.
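
    One possible explanation for "it works without the charset" is that PHP before 5.4 defaulted to ISO-8859-1, a single-byte charset in which every byte is acceptable. The sketch below is only an illustration: it assumes the script file was saved as ISO-8859-1, and it writes the é as an explicit byte escape so the result does not depend on how this example itself is saved.

```php
<?php
// "canapés aren't" with é as the single ISO-8859-1 byte 0xE9
// (an assumption about how the original script was saved).
$latin1 = "canap\xE9s aren't";

// Single-byte charset: every byte is valid, so nothing is rejected.
echo htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1'), "\n";
// canap&eacute;s aren&#039;t

// Same bytes declared as UTF-8: 0xE9 is not a complete UTF-8 sequence,
// so the function gives up and returns an empty string.
var_dump(htmlentities($latin1, ENT_QUOTES, 'UTF-8'));  // string(0) ""
```

    If that assumption holds, the charset argument is not the culprit; it merely exposes a mismatch between the declared encoding and the actual bytes.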

  5. #5
    SitePoint Enthusiast dyer85's Avatar
    Join Date
    Nov 2004
    Location
    L2 cache.
    Posts
    46
    I'm afraid I'm at a loss, since I'm still unable to reproduce the problem. Just out of curiosity, though, what is the output on your live server for the following:
    PHP Code:
    <?php
    header('Content-Type: text/plain; charset=UTF-8');
    var_dump(htmlspecialchars('canapés', ENT_QUOTES, 'UTF-8'));
    You might also try searching for PHP bugs related to htmlspecialchars().

  6. #6
    Keeper of the SFL StarLion's Avatar
    Join Date
    Feb 2006
    Location
    Atlanta, GA, USA
    Posts
    3,748
    I can't really explain it either, since neither of the hex pairs that make up e-acute is a quote character. Perhaps it's being screwed up by the character being read as e' or e` or some other quote-containing sequence? Have you tried parsing the string without the ENT_QUOTES flag?

  7. #7
    SitePoint Enthusiast
    Join Date
    May 2009
    Location
    Arizona
    Posts
    50
    PHP Code:
    header('Content-Type: text/plain; charset=UTF-8');
    var_dump(htmlspecialchars('canapés', ENT_QUOTES, 'UTF-8'));
    Output:
    string(0) ""

    PHP Code:
    header('Content-Type: text/plain; charset=UTF-8');
    var_dump(htmlspecialchars('canapés'));
    echo 'canapés';
    Outputs:
    string(7) "canap�s"
    canap�s

    So, it appears the accented character is not being interpreted properly. Is it possible the server itself can't handle UTF-8 encoding? Do character sets have to be enabled somehow on a server?

    I really appreciate your help on this so far.
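
    The string(7) in that output is a useful clue: "canapés" is 7 bytes in ISO-8859-1 but 8 bytes in UTF-8. A minimal sketch of how one might check this directly, using byte escapes so it does not depend on the editor's encoding; looks_like_utf8() is a hypothetical helper name, not a PHP built-in:

```php
<?php
$latin1 = "canap\xE9s";      // é as the single ISO-8859-1 byte 0xE9
$utf8   = "canap\xC3\xA9s";  // é as the two-byte UTF-8 sequence 0xC3 0xA9

// Hypothetical helper: with the /u modifier, preg_match() fails
// (returns false) when the subject is not valid UTF-8.
function looks_like_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

var_dump(strlen($latin1));           // int(7)
var_dump(strlen($utf8));             // int(8)
var_dump(looks_like_utf8($latin1));  // bool(false)
var_dump(looks_like_utf8($utf8));    // bool(true)
```

    A 7-byte result therefore suggests the literal in the script is stored as ISO-8859-1, whatever the Content-Type header claims.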

  8. #8
    Twitter: @AnthonySterling silver trophy AnthonySterling's Avatar
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    Have you checked what headers the server is sending to your browser prior to the script output? Do you have an external URL you could post or PM?

    If Apache/IIS is sending conflicting headers, this would produce the problem you're having.
    @AnthonySterling: I'm a PHP developer, a consultant for oopnorth.com and the organiser of @phpne, a PHP User Group covering the North-East of England.

  9. #9
    SitePoint Enthusiast
    Join Date
    May 2009
    Location
    Arizona
    Posts
    50
    This is the information from $_SERVER

    Array
    (
    [PATH] => /usr/bin:/bin
    [DOCUMENT_ROOT] => /home/mydomainroot/public_html
    [HTTP_ACCEPT] => text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    [HTTP_ACCEPT_CHARSET] => ISO-8859-1,utf-8;q=0.7,*;q=0.7
    [HTTP_ACCEPT_ENCODING] => gzip,deflate
    [HTTP_ACCEPT_LANGUAGE] => en-us,en;q=0.5
    [HTTP_CONNECTION] => keep-alive
    [HTTP_HOST] => mydomain.com
    [HTTP_KEEP_ALIVE] => 300
    [HTTP_USER_AGENT] => Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10 (.NET CLR 3.5.30729)
    [REMOTE_ADDR] => xx.xx.xxx.xx
    [REMOTE_PORT] => 1564
    [SCRIPT_FILENAME] => /home/mydomainroot/public_html/info.php
    [SERVER_ADDR] => xx.xxx.xx.xx
    [SERVER_ADMIN] => webmaster@mydomain.com
    [SERVER_NAME] => mydomain.com
    [SERVER_PORT] => 80
    [SERVER_SOFTWARE] => Apache/1.3.41 (Unix) mod_gzip/1.3.26.1a mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635.SR1.2 mod_ssl/2.8.31 OpenSSL/0.9.8e-fips-rhel5 PHP-CGI/0.5
    [PHPHANDLER] => /usr/local/php52/bin/php
    [GATEWAY_INTERFACE] => CGI/1.1
    [SERVER_PROTOCOL] => HTTP/1.1
    [REQUEST_METHOD] => GET
    [QUERY_STRING] =>
    [REQUEST_URI] => /info.php
    [SCRIPT_NAME] => /info.php
    [PHP_SELF] => /info.php
    [REQUEST_TIME] => 1271001105
    [argv] => Array
    (
    )

    [argc] => 0
    )

  10. #10
    Twitter: @AnthonySterling silver trophy AnthonySterling's Avatar
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    Oh, I think you've misunderstood me. You need to look at the HTTP headers sent by Apache when you request the script/page in question.

    You can quite easily do this using Firefox and the LiveHTTPHeaders extension.

    For instance, here's my request for www.google.co.uk.

    Code:
    GET / HTTP/1.1 
    Host: www.google.co.uk 
    User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100214 Linux Mint/8 (Helena) Firefox/3.5.8 
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 
    Accept-Language: en-gb,en;q=0.5 
    Accept-Encoding: gzip,deflate 
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 
    Keep-Alive: 300 
    Connection: keep-alive 
    
    HTTP/1.0 200 OK 
    Date: Sun, 11 Apr 2010 16:47:03 GMT 
    Expires: -1 
    Cache-Control: private, max-age=0 
    Content-Type: text/html; charset=UTF-8 
    Content-Encoding: gzip 
    Server: gws 
    Content-Length: 4515 
    X-Cache: MISS from Zeus 
    X-Cache-Lookup: MISS from Zeus:3128 
    Via: 1.0 Zeus:3128 (squid/2.7.STABLE3) 
    Connection: keep-alive

  11. #11
    SitePoint Enthusiast
    Join Date
    May 2009
    Location
    Arizona
    Posts
    50
    Ah, gotcha.

    Here's the info using the LiveHTTPHeaders extension.

    GET /info.php HTTP/1.1
    Host: mydomain.com
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10 (.NET CLR 3.5.30729)
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-us,en;q=0.5
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 300
    Connection: keep-alive
    Cache-Control: max-age=0

    HTTP/1.1 200 OK
    Date: Sun, 11 Apr 2010 17:15:45 GMT
    Server: Apache/1.3.41 (Unix) mod_gzip/1.3.26.1a mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635.SR1.2 mod_ssl/2.8.31 OpenSSL/0.9.8e-fips-rhel5 PHP-CGI/0.5
    X-Powered-By: PHP/5.2.9
    Connection: close
    Transfer-Encoding: chunked
    Content-Type: text/html

  12. #12
    Twitter: @AnthonySterling silver trophy AnthonySterling's Avatar
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    Great, can you throw the following in a stand-alone script and post both the headers and output?
    PHP Code:
    <?php
    header('Content-Type: text/plain; charset=UTF-8');
    echo htmlspecialchars("Anthony's canapés aren't popular at all, in fact, they suck.", ENT_QUOTES, 'UTF-8');
    exit;

  13. #13
    SitePoint Enthusiast
    Join Date
    May 2009
    Location
    Arizona
    Posts
    50
    There is no output, either in the browser or by viewing source. Here are the headers:

    GET /info.php HTTP/1.1
    Host: mydomain.com
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10 (.NET CLR 3.5.30729)
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-us,en;q=0.5
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 300
    Connection: keep-alive

    HTTP/1.1 200 OK
    Date: Sun, 11 Apr 2010 19:18:35 GMT
    Server: Apache/1.3.41 (Unix) mod_gzip/1.3.26.1a mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635.SR1.2 mod_ssl/2.8.31 OpenSSL/0.9.8e-fips-rhel5 PHP-CGI/0.5
    X-Powered-By: PHP/5.2.9
    Connection: close
    Transfer-Encoding: chunked
    Content-Type: text/plain; charset=UTF-8

  14. #14
    Twitter: @AnthonySterling silver trophy AnthonySterling's Avatar
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    Interesting, there's a bug listed which may apply.

    http://bugs.php.net/bug.php?id=43896

    I'll come back to you.

  15. #15
    Twitter: @AnthonySterling silver trophy AnthonySterling's Avatar
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    Try this....
    PHP Code:
    <?php
    header('Content-Type: text/plain; charset=UTF-8');
    echo htmlspecialchars(
        utf8_encode("Anthony's canapés aren't popular at all, in fact, they suck."),
        ENT_QUOTES | ENT_COMPAT,
        'UTF-8'
    );
    exit;
    If it works, your PHP script is saved by your editor as ISO-8859-1.
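
    For text that arrives at runtime rather than from a literal in the script, the same idea can be applied conditionally. This is a rough sketch, not a definitive fix: escape_for_html() is a hypothetical helper, it assumes the only two encodings in play are UTF-8 and ISO-8859-1, and it uses iconv() (assumed to be available) in place of utf8_encode().

```php
<?php
// Hypothetical helper, not from the thread: escape text that may be
// either UTF-8 or ISO-8859-1 (assumption: no other encodings occur).
function escape_for_html($s)
{
    // preg_match() with /u fails on bytes that are not valid UTF-8.
    if (preg_match('//u', $s) !== 1) {
        // Treat the bytes as ISO-8859-1 and transcode them to UTF-8.
        $s = iconv('ISO-8859-1', 'UTF-8', $s);
    }
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

$latin1 = "canap\xE9s aren't popular";  // é as the ISO-8859-1 byte 0xE9

// Without conversion the invalid byte makes htmlspecialchars() return "".
var_dump(htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8'));  // string(0) ""

// With conversion the é survives as UTF-8 and the apostrophe is escaped.
echo escape_for_html($latin1), "\n";
```

    Saving the script itself as UTF-8 in the editor avoids the conversion step entirely for string literals.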

  16. #16
    SitePoint Enthusiast
    Join Date
    May 2009
    Location
    Arizona
    Posts
    50
    I get this for output:

    Browser:
    Anthony&#039;s canap&#233;s aren&#039;t popular at all, in fact, they suck.

    Source code:
    Anthony&#039;s canap&#233;s aren&#039;t popular at all, in fact, they suck.

  17. #17
    Twitter: @AnthonySterling silver trophy AnthonySterling's Avatar
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    Yay! Progress!

    So, we're good?

  18. #18
    SitePoint Enthusiast
    Join Date
    May 2009
    Location
    Arizona
    Posts
    50
    Not quite. The forum software converted my apostrophes in the post above. The actual output doesn't render the entity for the apostrophes, so they come out literally as & #039; in the browser. If I change the header to the following, is that a bad thing to do?

    header('Content-Type: text/html; charset=UTF-8');
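
    As far as I can tell, the literal entities are expected with text/plain, because a browser only decodes entities when it renders the response as HTML. A small sketch of that difference, using html_entity_decode() as a stand-in for what an HTML renderer does; the sample string and variable names are illustrative only:

```php
<?php
// Serve the page as HTML so the browser decodes entities on display.
// The guard only avoids a warning if output has already started.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

$text    = "canap\xC3\xA9s aren't popular";  // UTF-8 bytes for "canapés"
$escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// The page source contains aren&#039;t; an HTML renderer shows aren't.
echo '<p>', $escaped, '</p>', "\n";

// Decoding the entities recovers the original text.
var_dump(html_entity_decode($escaped, ENT_QUOTES, 'UTF-8') === $text);  // bool(true)
```

    With text/plain the same bytes are shown verbatim, which matches the & #039; seen in the browser.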

