Is there a way to prevent ampersand warnings?

That’s not an entity reference, but a numeric character reference (NCR). :slight_smile:

Have you tried using digraphs? It’s a bit awkward, but if it’s only for the occasional character you can live with it. I often set my keyboard layout to US English when I’m coding, and use digraphs in Vim to enter the occasional å, ä and ö in body copy.

Type :help digraphs in Vim for more info.

If you memorise them, you can use Ctrl+V NNN in Vim to enter the literal characters, too. :slight_smile:

That’s not an entity reference, but a numeric character reference (NCR).

Have I been wrong all this time calling them “numerical character entity references”?

Type :help digraphs in Vim for more info.

Thanks for that.

If you memorise them, you can use Ctrl+V NNN in Vim to enter the literal characters, too.

I can for the 3-digit ones, but I’ve still gotten stuck on the ones I only know 4 digits for, like the euro symbol, &#8364;

Well, it’s not entirely correct. A character entity is similar to a macro: it’s a symbolic name for a character. An NCR is a numeric reference for a character with the specified code point.

You can use Ctrl+V u XXXX with hex values. E.g., Ctrl+V u 20ac for the ‘€’ character. You still need to memorise them, though. I manage to remember a handful, then I have a small app on localhost to help me with the rest. :slight_smile:
(That one would also be possible to do as an Opera panel.)
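For anyone who wants a similar lookup helper, here is a minimal sketch of what such a thing could look like. It is not the app mentioned above; the function name `codepoint_info` is made up for illustration, and it assumes PHP 7.2 or later with the mbstring extension (for `mb_ord`).

```php
<?php
// Hypothetical helper: report the code point of one character in the forms
// needed for Vim (Ctrl+V u XXXX) and for HTML numeric character references.
function codepoint_info(string $char): array
{
    $cp = mb_ord($char, 'UTF-8'); // code point as an integer

    return [
        'decimal' => $cp,
        'vim'     => sprintf('Ctrl+V u %04x', $cp),
        'ncr_dec' => sprintf('&#%d;', $cp),
        'ncr_hex' => sprintf('&#x%X;', $cp),
    ];
}

// For the euro sign this prints decimal 8364, "Ctrl+V u 20ac",
// "&#8364;" and "&#x20AC;".
print_r(codepoint_info('€'));
```

Run it from the command line, or wrap it in a small form on localhost if you want something closer to a lookup page.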

Bah. Looks like we’re in need of a single obfuscatory line of Perl : )

I get befuddled by these names too.

As I understand it,

&#x26; = hexadecimal character reference

&#38; = numeric character reference

&amp; = entity reference

That works fine unless he has any other encoded entities in the page
e.g. &quot; –> &amp;quot;
But by the sounds of it, that probably won’t be a problem :rolleyes:
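To illustrate the double-encoding concern, here is a small sketch in PHP. The sample title is invented. The fourth argument of `htmlspecialchars` (`double_encode`, available since PHP 5.2.3) is one way to leave references that are already encoded untouched; whether it suits a given page depends on what is stored in it.

```php
<?php
// Made-up sample: one bare ampersand plus two references that are
// already encoded.
$title = 'Fish & chips &quot;to go&quot;';

// Default behaviour: every "&" is escaped, including those that already
// start a reference, so &quot; turns into &amp;quot;.
$double = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

// double_encode = false: existing references are left alone and only the
// bare ampersand is escaped.
$single = htmlspecialchars($title, ENT_QUOTES, 'UTF-8', false);

echo $double, "\n"; // Fish &amp; chips &amp;quot;to go&amp;quot;
echo $single, "\n"; // Fish &amp; chips &quot;to go&quot;
```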

Character references are numeric or symbolic names for characters that may be included in an HTML document; they begin with an “&” sign and end with a semicolon (“;”).

Character references in HTML may appear in two forms:

  1. Numeric character references (NCR) either decimal or hexadecimal.
  2. Character entity references.

The easiest way to remember: NCR means “numeric”, so you don’t need to remember what a ‘character entity reference’ is; it’s simply the other, named kind.

Off Topic:

Look into my eyes, look deep into my eyes… not the hand, look into my eyes… memorise “NCR” to mean Number.

This “mind-trick” may not work on the Jedi Swede though, and he’ll probably try resisting or say I am wrong.

&#x26; = hexadecimal numeric character reference
&#38; = decimal numeric character reference
&amp; = character entity reference
%inline; = parameter entity reference (e.g., in a DTD)
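A quick way to convince yourself that the three HTML forms name the same character is to decode them. This is only a sketch using PHP’s `html_entity_decode`; the euro group is added for comparison, and the array names are arbitrary. Parameter entity references are a DTD feature and are not handled by this function.

```php
<?php
// Each group holds a hexadecimal NCR, a decimal NCR and a character
// entity reference for the same character.
$groups = [
    'ampersand' => ['&#x26;', '&#38;', '&amp;'],
    'euro'      => ['&#x20AC;', '&#8364;', '&euro;'],
];

$decoded = [];
foreach ($groups as $name => $refs) {
    foreach ($refs as $ref) {
        $decoded[$name][] = html_entity_decode($ref, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    }
    // Prints "ampersand: & & &" and then "euro: € € €".
    echo $name, ': ', implode(' ', $decoded[$name]), "\n";
}
```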

Many thanks for the advice and the solutions. All warnings now removed and the page does seem to load that much quicker now that the browser does not have to decide what to do with the ampersands.

I opted to write a script to replace all ampersands where necessary. I did check and display the results before allowing the UPDATE script to operate.

Here is a link to the validation.

Here is the script:

function remove_ampersand()
{
    // NOTE: the mysql_* functions used here were removed in PHP 7.
    $sql    = 'SELECT id, title FROM jokes ORDER BY id LIMIT 100000;';
    mysql_connect("MY_server", "MY_username", "MY_password");
    $result = mysql_query($sql);

    // iterate all records
    while ($row = mysql_fetch_object($result)) {
        // is there an ampersand?
        if (strstr($row->title, "&")) { // returns FALSE if no ampersand
            $tmp = $row->title;

            // $tmp = str_replace('& ', '&amp; ', $tmp);
            // $tmp = htmlspecialchars($tmp);
            $tmp = htmlentities($tmp, ENT_QUOTES, 'UTF-8');

            // show the id, the old title and the new title
            echo '<br />', $row->id;
            echo '<br />', $row->title;
            echo '<br />', $tmp;

            if (FALSE) { // TRUE only when safe to replace all
                // Use separate variables here so the SELECT result that the
                // while loop is iterating is not overwritten.
                $update = "UPDATE jokes SET title='" . mysql_real_escape_string($tmp)
                        . "' WHERE id='" . (int) $row->id . "'";
                mysql_query($update);
            }
        }
    }
    // die;
    // return $result; // not required
}

I will endeavour in the future to only save “clean code” to prevent GIGO :slight_smile:


I’m glad that worked. I would have done a mass regex search for &[A-Za-z]+[0-9]*; first, though, just to make sure you don’t have existing entities. But if it works, good for you…
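As a rough sketch of that kind of pre-check in PHP: the helper name `find_existing_references` and the sample titles are made up, and the pattern is a widened variant of the one above that also catches decimal and hexadecimal numeric references, so treat it as an assumption rather than a complete entity matcher.

```php
<?php
// Hypothetical helper: list the character references already present in a
// string, so titles containing them can be reviewed before a bulk replace.
function find_existing_references(string $text): array
{
    // Named (&amp;), decimal (&#38;) and hexadecimal (&#x26;) forms.
    $pattern = '/&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);/';
    preg_match_all($pattern, $text, $matches);

    return $matches[0];
}

// Invented sample titles.
$titles = ['Tom & Jerry', 'Price: &#8364;5', 'R&amp;D &quot;jokes&quot;'];

foreach ($titles as $title) {
    $found = find_existing_references($title);
    echo $title, ' => ', $found ? implode(' ', $found) : '(none)', "\n";
}
```

Titles that report "(none)" contain only bare ampersands and should be safe to encode in bulk; the others are the ones to inspect by hand.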

It sounds like you’re very lucky you caught this before it got too large. I can picture myself experiencing the panic of discovery and then spending hours going through the tedium, not a pretty picture.

I have limited knowledge of regex, but I realised that simply replacing the ampersand would no doubt have caused further problems because of the quotes.

I decided to just display the problematic titles and if there was likely to be a problem in the global replace then I manually corrected the warnings.

Once satisfied there would be no repercussions, I changed the if (FALSE) test to TRUE and ran the UPDATE on the remainder.

The problem looked far worse than it actually was, I am pleased to say. I was lucky and delighted that there were only a couple of dozen problematic titles.

I was trying to ensure there were no warnings or errors which I firmly believe slow down loading and page display. Where the script is not specific, each browser has to check for every possible alternative and then make a best guess.

The page now loads quite quickly considering there are over 1,700 titles, each with links, mostly hidden and revealed by jQuery, without having to re-display the complete page after a single selection.

I am thoroughly impressed with jQuery and will try to use it more often.