When does PDO encode special characters as HTML entities?

I was messing around with some simple PHP scripts using vanilla PHP with PDO to interact with MySQL. As I was throwing around some edge cases to see what happens, I noticed a slightly odd thing. If I execute the following SQL query in a PHP script:

$db = new PDO("mysql:host=localhost;dbname=test", /*other db info*/);

$quotesSql = "CREATE TABLE `Quotin'it!'` (num INT(255), `title` VARCHAR(30));";
$quotesStmnt = $db->prepare($quotesSql);
$quotesStmnt->execute([]);

The name of the output table that is created has the single quotes converted to HTML entities:

+----------------------------------+
| Tables_in_test                   |
+----------------------------------+
| Quotin'it!'              |
+----------------------------------+

(ignore the funny syntax highlighting sitepoint is adding. The point is that the table name is Quotin'it!')

Now obviously you are pretty much never going to have table names with special characters like single quotes in them, so I’m not concerned with the usefulness of this. But I am curious from a technical side: why does PDO encode certain characters this way? Is it even PDO that is doing this? Which special characters get encoded and which don’t? (You’ll notice the exclamation point in my example name did not get encoded.)

The reason I think PDO is the one auto-encoding this is because when I run the same statement in my console [mysql> CREATE TABLE `Quotin'it!'`(num INT(255),`title` VARCHAR(30));] the table name created is unencoded!

+----------------------------------+
| Tables_in_autocomplete_items     |
+----------------------------------+
| Quotin'it!'                      |
+----------------------------------+

The backticks protect the table name correctly in that case.

Has anyone else encountered this? Or does anyone have a link for where to find the documentation about this? Google searching & browsing PHP’s website hasn’t helped me.
Thanks!

Those aren’t HTML entities in this case; they’re UTF8 character codes. It’s likely that your PHP installation is configured in such a way that your PDO connection is set up with a different character set, giving the slightly different results you see.

This might not work, but will ensure your connection is using the correct encoding - try adding a charset parameter to your database dsn, like so:

$db = new PDO("mysql:host=localhost;dbname=test;charset=UTF8", /*other db info*/);

I ran your script and did not get any html encoding.
Using php 7.4.6 and Server version: 5.7.30-0ubuntu0.16.04.1 (Ubuntu)
Pretty standard installation.

Why do you assume it was PDO that did it, and not the database engine?
Your database has a default_encoding … how does that default encoding encode an apostrophe?

Thanks! I tried it adding that parameter, but unfortunately as you predicted, it didn’t end up doing anything.

That’s odd that yours didn’t encode! That is on the right track to discovering why it occurs for me, probably. I ran this script on two machines:

  • One had PHP 7.1.11 and MySQL server version: 5.7.20 (Windows)
  • The other was running PHP 5.4.45 and MySQL server version: 5.6.47 (Red Hat Linux)

And both of them displayed the table names as encoded when running that script. None of the software version differences were that far out of sync between you and I, so this behavior is even more strange.

Like I was saying in the OP:

I’ve not heard of this default_encoding configuration before. Where do I change it, and how do I check my current default_encoding? I can’t seem to find any Google search results that mention that parameter name specicially. Are you talking about the MySQL character set configurations described in this article?

I suppose you could try a status:

use testdb;
status;
--------------
mysql  Ver 14.14 Distrib 5.7.30, for Linux (x86_64) using  EditLine wrapper

Connection id:		32
Current database:	symfony
Current user:		ahundiak@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.7.30-0ubuntu0.16.04.1 (Ubuntu)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	latin1
Db     characterset:	latin1
Client characterset:	utf8
Conn.  characterset:	utf8
UNIX socket:		/var/run/mysqld/mysqld.sock
Uptime:			5 days 8 hours 21 min 14 sec

Ah, thanks!
Status for machine one:

--------------
mysql  Ver 14.14 Distrib 5.7.20, for Win64 (x86_64)

Connection id:          5
Current database:       test
Current user:           root@localhost
SSL:                    Not in use
Using delimiter:        ;
Server version:         5.7.20-log MySQL Community Server (GPL)
Protocol version:       10
Connection:             localhost via TCP/IP
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    cp850
Conn.  characterset:    cp850
TCP port:               3306
Uptime:                 2 hours 9 sec

Status for machine two:

--------------
mysql  Ver 14.14 Distrib 5.6.47, for Linux (x86_64) using  EditLine wrapper

Connection id:          855733
Current database:       test
Current user:           theUsername@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.6.47-cll-lve MySQL Community Server (GPL)
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    latin1
Conn.  characterset:    latin1
UNIX socket:            /var/lib/mysql/mysql.sock
Uptime:                 2 days 7 hours 9 min 24 sec

Interestingly, even though I have utf-8, latin1, and cp850 represented across both of the servers, both of the machines are still encoding the apostrophes. I can try editing the character sets (I’m not sure how to do that) to reflect yours, though I’m not sure if that will make a difference

I thought a bit more about this and I just cannot see any situation where anything to do with the database would ever use html encodings. Just makes no sense at all. I cant see how any sort of character set encoding would involve html at all.

I suspect that somewhere in your original test the output is being passed through an html encoder. Completely independent of the database. Can you conform that all of your testing is being done in a console window? No browser being involved anywhere at all? Not using phpadmin or any of that nonsense?

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.