PHP and Standards: arg_separator.output

PHP’s configuration directive arg_separator.output allows you to tell PHP how it should separate arguments in a URL and has a default value of ‘&’.

The directive affects all URLs that are generated or modified automatically by PHP. The only time this is likely to affect us is when we use PHP Session Handling along with session.use_trans_sid to auto-generate URLs with session IDs. So, if you don’t use this, the following problem may not affect you.

Theoretically, if your web application preferred separating arguments in a URL with another character, such as ‘;’, then you could tell PHP to use that character instead:


http://www.example.com/url?variable1=value1;variable2=value2
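As a rough sketch, the php.ini fragment for this might look like the following. The arg_separator.input line is an assumption about what you would want alongside it: that companion directive controls which characters PHP accepts when parsing incoming URLs, and it defaults to ‘&’ only, so PHP would not split the URL above into variables without it.

```ini
; Use ';' when PHP generates or rewrites URLs.
; The value must be quoted, because a bare ';' starts a comment in php.ini.
arg_separator.output = ";"

; Accept both ';' and '&' when parsing incoming URLs.
arg_separator.input = ";&"
```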

However, the following note in the PHP manual’s Session Handling section indicates a problem.

Note: The arg_separator.output php.ini directive allows to customize the argument seperator. For full XHTML conformance, specify &amp; there.

This seems very strange. Firstly, the issue has nothing to do with XHTML conformance. A quick glance at the HTML 4.01 Specification, the HTML 3.2 Specification or even this Introduction to SGML upon which HTML is based should remind you that all occurrences of ‘&’ must be escaped (for example, with &amp;), regardless of the version of HTML or XHTML in use. I believe the myth that this is only an XHTML issue may be due to the fact that validating one’s markup became trendy at about the same time as XHTML did.

Secondly, the description of the arg_separator.output directive in the PHP manual indicates that the separator will be used in the URL:

arg_separator.output string The separator used in PHP generated URLs to separate arguments.

PHP has glossed over the distinction between a URL and a URL represented within an HTML attribute value. In the second case, a small selection of characters (& and ' or ") must be escaped. We ought to be able to set arg_separator.output to ‘&’ and PHP should escape this appropriately whenever it writes the URL into an attribute value in HTML (or XHTML, for that matter).
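To make the distinction concrete, here is a minimal sketch of the same URL in both forms. The link text and variable names are placeholders for illustration only; htmlspecialchars() is the standard PHP function for escaping a string before it goes into HTML.

```php
<?php
// A raw URL, as it would appear in an HTTP request or a Location header.
$url = 'http://www.example.com/url?variable1=value1&variable2=value2';

// The same URL written into an HTML (or XHTML) attribute value:
// the ampersand is escaped, the URL it refers to is unchanged.
$link = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">Example</a>';

echo $url, "\n";
// http://www.example.com/url?variable1=value1&variable2=value2
echo $link, "\n";
// <a href="http://www.example.com/url?variable1=value1&amp;variable2=value2">Example</a>
```

A browser decodes the &amp; back to & before requesting the page, so the server still sees the raw URL.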

Sure enough, when using a value of ‘&’, PHP wrongly turns your links into something like:


Tony

This is PHP’s default behaviour, and it is incorrect in all versions of both HTML and XHTML.

A comment in the PHP manual demonstrates the confusion this has caused:

arg_separator.output set to “&” is bad when you want to work with xhtml. Xhtml requires &amp; instead to be written out. This for example prevents validation of xhtml using php sessions. I hope the default value will be changed

The implied difference between XHTML and HTML here is incorrect. The requirement for ampersands in attribute values to be escaped applies equally to HTML (based on SGML) and XHTML (based on XML). Also, the proposed solution of fixing this by setting the default value to ‘&amp;’ is inelegant. Ideally, this value should be ‘&’ and PHP should realise that when it is dealing with attribute values, it needs to convert ‘&’ to ‘&amp;’ itself.

If you use PHP’s session handling and session.use_trans_sid to auto-generate URLs with session IDs, for now you can set arg_separator.output to ‘&amp;’ in your PHP configuration in order to remain well-formed in HTML or XHTML.

Lachlan pointed out to me that changing this value will also affect http_build_query, which is used to generate raw URLs, which should not contain HTML entities. So if you do set arg_separator.output to ‘&amp;’ to work around this problem, avoid using http_build_query, and vice versa.
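The side effect is easy to reproduce. The sketch below assumes a PHP version in which http_build_query() accepts a separator as its third argument; where that argument is available, passing it explicitly is one possible way to keep raw query strings free of entities regardless of the ini setting. The parameter names are placeholders.

```php
<?php
// Simulate the workaround configuration for this script only.
ini_set('arg_separator.output', '&amp;');

$params = ['variable1' => 'value1', 'variable2' => 'value2'];

// With no explicit separator, http_build_query() picks up the ini value,
// so the "raw" query string ends up containing an HTML entity.
$fromIni = http_build_query($params);

// Passing the separator explicitly sidesteps the ini setting.
$raw = http_build_query($params, '', '&');

echo $fromIni, "\n"; // variable1=value1&amp;variable2=value2
echo $raw, "\n";     // variable1=value1&variable2=value2
```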

The most elegant solution would be to insert your own session IDs in URLs and avoid session.use_trans_sid completely, because session.use_trans_sid is voodoo.
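One way to do this by hand is sketched below. The helper name url_with_session() and the ID ‘abc123’ are hypothetical; in a real page you would pass session_name() and session_id() after calling session_start(). The sketch does not handle URLs that contain a fragment (#).

```php
<?php
// Hypothetical helper: append a session ID to a raw URL.
function url_with_session(string $url, string $name, string $id): string
{
    // Use '?' for the first argument, a raw '&' for any later one.
    $glue = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $glue . rawurlencode($name) . '=' . rawurlencode($id);
}

// Build the raw URL first...
$raw = url_with_session('page.php?variable1=value1', 'PHPSESSID', 'abc123');

// ...and escape it only at the point where it is written into HTML.
$html = '<a href="' . htmlspecialchars($raw, ENT_QUOTES, 'UTF-8') . '">Next page</a>';

echo $raw, "\n";  // page.php?variable1=value1&PHPSESSID=abc123
echo $html, "\n"; // <a href="page.php?variable1=value1&amp;PHPSESSID=abc123">Next page</a>
```

Keeping the two steps separate means arg_separator.output can stay at its default of ‘&’, and the same raw URL can be reused in a redirect without any entity in it.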

  • http://diigital.com cranial-bore

    Wow, I can’t believe how much confusion the & vs &amp; issue has caused. There have been a few instances in the forums recently where people have not understood the difference between an actual URL (where & is correct) and the HTML used to create a link to that URL (where &amp; should be used).
    It is disappointing that even PHP is confused about this.

  • Jim

    > This seems very strange. Firstly, the issue has nothing to do with XHTML conformance. A quick glance at the HTML 4.01 Specification, the HTML 3.2 Specification or even this Introduction to SGML upon which HTML is based should remind you that all occurrences of ‘&’ must be escaped (for example, with &amp;), regardless of the version of HTML or XHTML in use.

    This is incorrect. Ampersands only need to be escaped in some circumstances in HTML. They need to be escaped under all circumstances (bar CDATA) in XHTML.

    I don’t know why you think otherwise; the parts of the specifications you point to don’t say anything like that.

    Try it yourself: something like

    This & that.

    is perfectly valid HTML.

    > The most elegant solution would be to insert your own session IDs in URLs and avoid session.use_trans_sid completely, because session.use_trans_sid is voodoo.

    Why not use ‘;’ to completely avoid the escaping issue? PHP supports it, and the W3C recommends it in the part of the HTML 4.01 specification that you link to. When the PHP docs say:

    “For full XHTML conformance, specify &amp; there.”

    …they obviously mean “specify &amp; instead of &”, not that &amp; is necessary for XHTML conformance itself, just that its unescaped value would not be valid.

  • Ren

    Yes, http_build_query() not having an extra optional parameter for the separator was a big oversight.

    Had a rather heated discussion with the developer of it, and worryingly they thought it was fine.

    So now we have another ini setting, along with register_globals and magic_quotes, which we have to think about and check for when writing portable apps.

    I usually test whether arg_separator.output is an entity; if so, I decode and reset it.

  • Jim

    One other thing is that they switched nl2br() from generating HTML to generating XHTML in a minor point release, giving no option to generate valid HTML unless you wrote your own nl2br() to do it manually.

  • Jim

    mmj, what can I say but please read and understand what you quoted. It gives an example of a particular URI that contains particular characters and points out that in these circumstances, the ampersand must be escaped.

    From this, you are deriving “all ampersands must be escaped everywhere”. You are simply wrong. That isn’t what the specification says.

    > I’d also encourage you to try validating some HTML 4 documents using the W3C validator.

    I did that before I posted my comment just to be 100% sure I was remembering the rules correctly. Both the W3C and the htmlhelp.com validators agree with me and not you. By all means, follow your own advice and check this for yourself.

  • http://www.sitepoint.com/ mmj

    Try it yourself: something like

    This & that.

    is perfectly valid HTML.

    Yes it is, but it is not a URL. The blog post was referring to the use of an ampersand in a URL which is part of an HTML attribute. In such cases, the ampersand will never be followed by whitespace, and therefore must always be escaped as specified in the spec linked to. This applies to HTML and XHTML.

  • Jim

    FWIW I was just responding to a different comment posted by mmj that has since been altered. I think it’s pretty bad form to go back and alter what you said after somebody has responded.

  • http://aplosmedia.com/ Eric.Coleman

    I noticed in php5, if you use php.ini-recommended, it automatically has the proper value set in php.ini

    - Eric

  • http://www.assemblysys.com/dataServices/index.php mniessen

    “The most elegant solution would be to insert your own session IDs in URLs and avoid session.use_trans_sid completely, because session.use_trans_sid is voodoo.”

    I posted a related thread in the forums (http://www.sitepoint.com/forums/showthread.php?t=245037) a few days ago and I still haven’t found any solution to that problem. I thought maybe anyone contributing to this blog might help…

    Thank you.

    Micha

  • Sharif

    I encountered this issue several months ago (http://bugs.php.net/bug.php?id=30049) and decided to finally send a message to the PHP internals mailing list about it (http://news.php.net/php.internals/15568).

    If I wasn’t already in the process of sending it, this post would have no doubt caused me to. In reality, however, Gmail’s controversial text ads picked up on arg_separator.output in the draft message and linked me to this site. This post provided some validation for the issues I raised and I’ve thus linked to it in the message. It might be fruitful to have any further discussion of the issue on PHP internals (http://news.php.net/php.internals, internals@lists.php.net).

  • Meint

    Make your life easier and use ; instead of & as the argument separator. The HTTP 1.1 standard allows this and uses ; in the explanatory parts of the standard (see RFC 2616 paragraph 3.2.1 in combination with RFC 2396).

    Cheers

    Meint