Yet another take on this...
Code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>•• toXHTML ••</title>
<style type="text/css">
body {
background: buttonface;
}
form {
width: 90%;
font: 11px monospace;
margin: 40px auto;
}
.tbox {
width: 100%;
height: 200px;
font: 12px monospace;
padding: 4px;
border: 1px #000 solid;
}
</style>
</head>
<body>
<script type="text/javascript">
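String.prototype.H2X =
[
new RegExp().compile(/[< ]+([^= ]+)/gi), // 0: tag and attribute names, to be lowercased
new RegExp().compile(/(\S*\s*=\s*)?_moz[^=>]*(=\s*[^>]*)?/gi), // 1: _moz* editor attributes
new RegExp().compile(/\s*=\s*(['"])?(([^>" ]| (?=[^"=]+['"]))+)\1?/gi), // 2: attribute values, to be double-quoted
new RegExp().compile(/\/>/g), // 3: slashes on closing brackets (stripped first, on every tag)
new RegExp().compile(/<(br|hr|img|input|link|meta)([^>]*)>/gi), // 4: empty elements, to be self-closed
new RegExp().compile(/(checked|compact|declare|defer|disabled|ismap|multiple|no(href|resize|shade|wrap)|readonly|selected)/gi), // 5: minimized boolean attributes
new RegExp().compile(/(="[^']*)'([^'"]*")/), // 6: stray single quote inside a double-quoted value
new RegExp().compile(/&(?=[^<]*>)/g), // 7: ampersands inside a tag
new RegExp().compile(/<\s+/g), // 8: whitespace after an opening bracket
new RegExp().compile(/\s+(\/)?>/g), // 9: whitespace before a closing bracket
new RegExp().compile(/\s{2,}/g) // 10: runs of whitespace
];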
String.prototype.toXHTML = function()
{
return this.replace(this.H2X[0], function(match){return match.toLowerCase();}).
replace(this.H2X[1], ' ').replace(this.H2X[2], '="$2"').replace(this.H2X[3], '>').
replace(this.H2X[4], '<$1$2 />').replace(this.H2X[5], '$1="$1"').replace(this.H2X[6], '$1$2').
replace(this.H2X[7], '&amp;').replace(this.H2X[8], '<').replace(this.H2X[9], '$1>').replace(this.H2X[10], ' ');
}
</script>
<form>
1) → enter HTML ↓
<textarea class="tbox" onblur="readout.value=this.value.toXHTML()"></textarea>
2) → click below ↓
<textarea class="tbox" name="readout"></textarea>
</form>
</body>
</html>
A little more OO, I think. I got rid of that callback by the simple expedient of stripping any slashes from closing brackets first, and added some cleanup.
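For anyone skimming, here is the slash-stripping trick on its own, as a minimal sketch rather than the exact code above. The names `VOID_TAGS` and `closeVoidTags` are made up for illustration. Once every `/>` has been reduced to `>`, a plain replacement string can re-add the slash to the empty elements, so no callback is needed to check whether a tag was already closed.

```javascript
// Hypothetical names, for illustration only; not part of the script above.
var VOID_TAGS = /<(br|hr|img|input|link|meta)([^>]*)>/gi;

function closeVoidTags(html) {
  return html
    .replace(/\s*\/>/g, '>')          // 1) drop every existing self-closing slash
    .replace(VOID_TAGS, '<$1$2 />');  // 2) re-add it uniformly to the empty elements
}

var sample = closeVoidTags('<br><br/><img src="a.png" />');
```

One caveat I'm aware of: step 1 will also eat a slash that ends an unquoted attribute value right before the bracket, so it is probably safer to quote the values first.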
Are you sure using .compile() makes a difference? All the documentation I could find indicates that regular expression literals are compiled at load time, and that only RegExp objects built from strings need to be compiled to run more efficiently. Obviously that's wrong, based on your tests.
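In case anyone wants to time it themselves, here are the two ways of building a pattern that I'm comparing, with no .compile() call in either. This is only a sketch: `fromLiteral`, `fromString` and `tidy` are names I made up, and it shows that the two forms behave the same, not which one is faster.

```javascript
// A regex literal: the RegExp object is created when this statement is evaluated.
var fromLiteral = /<\s+/g;

// The same pattern built from a string at run time (note the doubled backslash).
var fromString = new RegExp('<\\s+', 'g');

// Hypothetical helper: strip whitespace after an opening bracket.
function tidy(re, html) {
  return html.replace(re, '<');
}

var a = tidy(fromLiteral, '<  p>< b>');
var b = tidy(fromString, '<  p>< b>');
```

Either object can be stored once, as in the H2X array, and reused on every call.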
As always, feedback welcome. Every time I dump real-world tagsoup in there, I discover something else unexpected happening. cheers /adios