One of the subjects I glossed over last week was how to handle UTF-8 in email, because I don’t have a full picture of the best way to solve it. The fundamental problem is summarized nicely in Wikipedia’s article on MIME:
The basic Internet e-mail transmission protocol, SMTP, supports only 7-bit ASCII characters […]. This effectively limits Internet e-mail to messages which, when transmitted, include only the characters sufficient for writing a small number of languages, primarily English. Other languages based on the Latin alphabet typically include diacritics not supported in 7-bit ASCII, meaning text in these languages cannot be correctly represented in basic e-mail.
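In practice, the first question for any given header or body is whether it contains bytes outside 7-bit ASCII at all. A minimal PHP check might look like the following. `needs_mime_encoding()` is a hypothetical helper name, not part of PHP or any library.

```php
<?php
// Hypothetical helper: true if the string contains any byte outside the
// 7-bit ASCII range (0x00-0x7F), i.e. it cannot be sent as-is over an
// SMTP path that is not 8-bit clean. Without the /u flag PCRE matches
// bytes, so every byte of a multi-byte UTF-8 character is caught.
function needs_mime_encoding(string $text): bool
{
    return preg_match('/[\x80-\xFF]/', $text) === 1;
}

var_dump(needs_mime_encoding('Ich bin spaet'));  // bool(false)
var_dump(needs_mime_encoding('Ich bin spät'));   // bool(true)
```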
Part of the problem is that there are several different ways to solve this. Do you send the raw UTF-8 body encoded as quoted-printable or Base64? Could you convert it to UTF-7 instead? Or would it be easier to send the text as HTML and use HTML entities for anything non-ASCII, and if so, does that limit the number of clients that can read the mail? What about encoding headers such as the subject or the sender / recipient names (as with iconv_mime_encode())? There are plenty of gotchas, and clearly more to deal with than PHP’s mail() function handles by default (and that’s after you’ve fixed your code against email injection). In other words, taking a “not invented here” view is going to leave you with a big workload.
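To make the body options a little more concrete, here is a rough sketch of three of them using PHP built-ins: quoted-printable via `quoted_printable_encode()`, Base64 via `base64_encode()` wrapped to 76-character lines with `chunk_split()`, and HTML numeric entities via `mb_encode_numericentity()`, which assumes the mbstring extension is available. UTF-7 is left out, and the matching MIME headers are only noted in comments; this shows the encodings themselves, not a complete message.

```php
<?php
// Sketch only: three ways to make a UTF-8 body safe for 7-bit SMTP.
$body = "...noch 30 Minuten bis Zürich";   // UTF-8 source text

// 1. Quoted-printable: ASCII stays readable, other bytes become =XX,
//    so "ü" (0xC3 0xBC in UTF-8) is written as "=C3=BC".
//    Headers: Content-Type: text/plain; charset=UTF-8
//             Content-Transfer-Encoding: quoted-printable
$qp = quoted_printable_encode($body);

// 2. Base64: the whole body becomes ASCII, split into lines of at most
//    76 characters (the limit RFC 2045 sets for encoded lines).
//    Headers: Content-Type: text/plain; charset=UTF-8
//             Content-Transfer-Encoding: base64
$b64 = chunk_split(base64_encode($body), 76, "\r\n");

// 3. HTML with numeric character references: every code point from
//    0x80 upwards becomes &#NNN;, so the HTML itself is pure ASCII.
//    Requires the mbstring extension.
$html = '<p>' . mb_encode_numericentity(
    htmlspecialchars($body, ENT_QUOTES, 'UTF-8'),
    [0x80, 0x10FFFF, 0, 0x1FFFFF],   // start, end, offset, mask
    'UTF-8'
) . '</p>';

echo $qp, "\n", $b64, $html, "\n";
```

Quoted-printable keeps mostly-Latin text readable in the raw message, while Base64 has a fixed overhead of about a third but is simpler to get right for text that is mostly non-ASCII.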
From browsing around the source (having been reminded by Jad Madi’s blog to take a look), the good news is that eZ Systems appear to have this problem well solved. The ezcMail class and its related classes default to (i.e. assume) ASCII, but you can explicitly tell them to use UTF-8 for the subject, recipient, sender and body (note that the iconv extension is required). In fact it’s a very impressive mail library all round, handling parsing as well as generation, multipart messages and so on.
It looks like you still need to decide how to handle the body of the message (which leaves some open questions about the best approach), but headers are MIME-encoded for you automatically. If I’ve understood ezcMail correctly, the following (untested) example should illustrate the point, producing a message that MTAs which aren’t 8-bit clean can handle:
<?php
// ...

$mail = new ezcMail();

// These are automatically encoded using iconv_mime_encode()
$mail->from = new ezcMailAddress( 'email@example.com', 'Hans Gräser', 'UTF-8' );
$mail->addTo( new ezcMailAddress( 'firstname.lastname@example.org', 'Werner Dröge-Modelmog', 'UTF-8' ) );

$mail->subject = 'Ich bin spät...';
// Flag as UTF-8 - will also be automatically encoded with iconv_mime_encode()
$mail->subjectCharset = 'UTF-8';

// Pass the raw UTF-8 text and request base64 transfer encoding for 7-bit MTAs.
// ezcMailText applies the encoding itself when the message is generated,
// so calling base64_encode() here as well would encode the body twice.
$mail->body = new ezcMailText( '...noch 30 Minuten bis Zürich', 'UTF-8', ezcMail::BASE64 );

// Send the mail
$transport = new ezcMailMtaTransport();
$transport->send( $mail );
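To get a feel for what the automatic header encoding produces, the same kind of RFC 2047 encoding can be done by hand with `iconv_mime_encode()`. This is a sketch of the encoding step only, not ezcMail’s internal code; the preference values chosen here (Base64 scheme, 76-character lines, CRLF line breaks) are assumptions. One subtlety worth showing: in an address header only the display name may be encoded, never the address itself, so the From line builds its encoded-word separately.

```php
<?php
// Sketch: RFC 2047 header encoding by hand. Requires the iconv extension.
$prefs = [
    'scheme'           => 'B',      // Base64 "encoded-words"
    'input-charset'    => 'UTF-8',
    'output-charset'   => 'UTF-8',
    'line-length'      => 76,
    'line-break-chars' => "\r\n",
];

// Subject: the whole value can become an encoded-word,
// of the form "Subject: =?UTF-8?B?<base64>?=".
$subject = iconv_mime_encode('Subject', 'Ich bin spät...', $prefs);

// From: only the display name may be encoded; the address must stay
// plain ASCII, so build the encoded-word for the name alone.
// (A single encoded-word may be at most 75 characters; fine here.)
$name = 'Hans Gräser';
$from = 'From: =?UTF-8?B?' . base64_encode($name) . '?= <email@example.com>';

echo $subject, "\r\n", $from, "\r\n";

// Round trip: decoding should give back the original UTF-8 text.
echo iconv_mime_decode($subject, 0, 'UTF-8'), "\n";
```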
Finally, a general question that Patrice raised at last week’s meeting, and one I’d also be interested to hear opinions on: how much of a problem is SMTP’s 7-bit limitation today? Is 8BITMIME now so widely supported that we can stop worrying about it?
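For what it’s worth, support for 8BITMIME is something an SMTP server advertises: a server that supports the extension (defined in RFC 1652, now RFC 6152) lists the `8BITMIME` keyword in its reply to `EHLO`, and the client may then send `BODY=8BITMIME` with `MAIL FROM`. Below is a minimal sketch of the check, assuming the EHLO reply has already been read into a string; the helper name `supports_8bitmime()` and the sample reply are made up for illustration.

```php
<?php
// Hypothetical helper: does a raw EHLO reply advertise 8BITMIME?
// Each reply line is "250-KEYWORD ..." except the last, "250 KEYWORD ...".
// Extension keywords are case-insensitive, hence /i; /m makes ^ match
// at the start of every line.
function supports_8bitmime(string $ehloReply): bool
{
    return preg_match('/^250[ -]8BITMIME\b/mi', $ehloReply) === 1;
}

// Made-up example reply, for illustration only.
$reply = "250-mail.example.com Hello\r\n"
       . "250-SIZE 10240000\r\n"
       . "250-8BITMIME\r\n"
       . "250 HELP\r\n";

var_dump(supports_8bitmime($reply));   // bool(true)
```

Note that this only tells you about the next hop; a relay further along the path could still lack 8BITMIME, which is part of why the question above is hard to answer from the sender’s side alone.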