PHP and Standards: arg_separator.output
PHP’s configuration directive arg_separator.output allows you to tell PHP how it should separate arguments in a URL and has a default value of ‘&’.
The directive affects all URLs that are generated or modified automatically by PHP. The only time this is likely to affect us is when we use PHP Session Handling along with session.use_trans_sid to auto-generate URLs with session IDs. So, if you don’t use this, the following problem may not affect you.
Theoretically, if your web application preferred separating arguments in a URL with another character, such as ‘;’, then you could tell PHP to use that character instead by setting arg_separator.output to ‘;’.
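As a rough sketch of what that setting does, the snippet below changes the directive at runtime with ini_set() instead of editing php.ini, and uses http_build_query() (which falls back to arg_separator.output when no separator is passed) to make the effect visible. The parameter names and values are made up for illustration.

```php
<?php
// Sketch: switch the output separator to ';' for this script only.
// The php.ini equivalent would be: arg_separator.output = ";"
ini_set('arg_separator.output', ';');

// http_build_query() uses arg_separator.output when no separator is given.
// The keys and values here are hypothetical.
$query = http_build_query(['page' => 2, 'sort' => 'date']);

echo $query, "\n"; // page=2;sort=date
```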
However, the following note in the PHP manual’s Session Handling section indicates a problem.
Note: The arg_separator.output php.ini directive allows to customize the argument seperator. For full XHTML conformance, specify &amp; there.
This seems very strange. Firstly, the issue has nothing to do with XHTML conformance. A quick glance at the HTML 4.01 Specification, the HTML 3.2 Specification or even this Introduction to SGML upon which HTML is based should remind you that all occurrences of ‘&’ must be escaped (for example, with &amp;), regardless of the version of HTML or XHTML in use. I believe that the myth that this is an XHTML-only issue may be due to the fact that validating one’s markup has only become trendy at about the same time as using XHTML has.
Secondly, the description of the arg_separator.output directive given in the PHP manual indicates that the separator will be used in the URL:
arg_separator.output string The separator used in PHP generated URLs to separate arguments.
PHP has glossed over the distinction between a URL, and a URL represented within an HTML attribute value. In the second case, a small selection of characters (& and ‘ or “) must be escaped. We ought to be able to set arg_separator.output to ‘&’ and PHP should escape this appropriately whenever it uses it as an attribute value in HTML (or XHTML, for that matter).
Sure enough, when using a value of ‘&’, PHP wrongly writes the bare, unescaped ampersand into your links whenever it appends the session ID to a URL inside an attribute value.
This is PHP’s default behaviour, and it is incorrect in all versions of both HTML and XHTML.
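To make the distinction between a raw URL and a URL inside an attribute value concrete, here is a minimal sketch. The URL, parameter names and session ID are hypothetical, and the markup is built by hand instead of by PHP's URL rewriter, so it only approximates what the rewriter emits.

```php
<?php
// Hypothetical raw URL, as it should appear in an HTTP request.
$url = 'page.php?foo=bar&PHPSESSID=abc123';

// Wrong: the raw URL is dropped straight into the attribute value,
// leaving a bare '&' in the markup.
$bad = '<a href="' . $url . '">next</a>';

// Right: the URL is escaped for its HTML context first.
$good = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">next</a>';

echo $bad, "\n";  // <a href="page.php?foo=bar&PHPSESSID=abc123">next</a>
echo $good, "\n"; // <a href="page.php?foo=bar&amp;PHPSESSID=abc123">next</a>
```

The URL itself is identical in both cases. Only its representation in the markup differs, and a browser decodes &amp; back to & before it makes the request.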
A comment in the PHP manual demonstrates the confusion this has caused:
arg_separator.output set to “&” is bad when you want to work with xhtml. Xhtml requires &amp; instead to be written out. This for example prevents validation of xhtml using php sessions. I hope the default value will be changed
The implied difference between XHTML and HTML here is incorrect. The requirement for ampersands in attribute values to be escaped applies equally to HTML (based on SGML) and XHTML (based on XML). Also, the proposed solution of fixing this by setting the default value to ‘&amp;’ is inelegant. Ideally, this value should be ‘&’ and PHP should realise that when it is dealing with attribute values, it needs to escape the ‘&’ itself.
If you use PHP’s session handling and session.use_trans_sid to auto-generate URLs with session IDs, for now you can set arg_separator.output to ‘&amp;’ in your PHP configuration in order to keep your HTML or XHTML valid.
Lachlan pointed out to me that changing this value will also affect http_build_query, which is used to generate raw URLs, which should not contain HTML entities. So if you do set arg_separator.output to ‘&amp;’ to work around this problem, avoid using http_build_query, and vice versa.
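The side effect is easy to reproduce. The sketch below simulates the workaround with ini_set() and hypothetical parameters. It also shows one possible way around the conflict, which is not part of the advice above: http_build_query() accepts an explicit separator as its third argument, and that argument takes precedence over the configured value.

```php
<?php
// Simulate the workaround configuration for this script.
ini_set('arg_separator.output', '&amp;');

// Hypothetical parameters.
$params = ['a' => 1, 'b' => 2];

// With no explicit separator, the configured value leaks into the raw URL.
$viaIni = http_build_query($params);

// Passing '&' explicitly ignores arg_separator.output.
$explicit = http_build_query($params, '', '&');

echo $viaIni, "\n";   // a=1&amp;b=2
echo $explicit, "\n"; // a=1&b=2
```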
The most elegant solution would be to insert your own session IDs in URLs and avoid session.use_trans_sid completely, because session.use_trans_sid is voodoo.
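A minimal sketch of that approach follows, assuming a hypothetical helper named url_with_sid() and a made-up session name and ID. In a real application the last two arguments would presumably come from session_name() and session_id(). The helper ignores URL fragments for brevity.

```php
<?php
// Hypothetical helper: append a session ID to a raw URL.
// It returns a raw URL; escaping for HTML is a separate, later step.
function url_with_sid(string $url, string $name, string $id): string
{
    // Use '?' for the first argument and '&' for any further ones.
    $sep = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $sep . rawurlencode($name) . '=' . rawurlencode($id);
}

$raw = url_with_sid('page.php?foo=bar', 'PHPSESSID', 'abc123');

// Escape only at the point where the URL enters an attribute value.
$escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

echo $raw, "\n";     // page.php?foo=bar&PHPSESSID=abc123
echo $escaped, "\n"; // page.php?foo=bar&amp;PHPSESSID=abc123
```

Keeping the raw URL and its HTML representation as two separate values means the same URL can be reused in a redirect header without carrying an entity along.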