Regex pattern to strip HTML comments but leave conditionals

I’m using regular expressions to search the document and strip out all comments but leave in the conditionals.

function callback($stripComments)
{
    return preg_replace('/<!--(.|\\s)*?-->/', '', $stripComments);
}

ob_start("callback");

However, I’ve started using a rather unusual conditional to serve a stylesheet to every browser apart from IE6 and below:

<!--[if !lte IE 6]><!-->
	<link rel="stylesheet" href="/css/style.css" />
<!--<![endif]-->

Notice the extra <!--> at the end of the opening line.

My regex isn’t too hot and I’m struggling to change the regular expression to match this extra condition.

Can anyone suggest a fix for that extra condition?
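
As a rough illustration of the symptom (the variable names here are made up for this sketch), running the original pattern over the snippet above shows what goes wrong:

```php
<?php
// The pattern from the callback above, unchanged.
$pattern = '/<!--(.|\\s)*?-->/';

// The "everything except IE6" conditional from the question.
$html = "<!--[if !lte IE 6]><!-->\n"
      . "\t<link rel=\"stylesheet\" href=\"/css/style.css\" />\n"
      . "<!--<![endif]-->";

$out = preg_replace($pattern, '', $html);

// Only the <link> line is left; both conditional markers are gone.
echo trim($out), PHP_EOL; // <link rel="stylesheet" href="/css/style.css" />
```

Both conditional markers are matched as ordinary comments and removed while the <link> survives, so IE6 would load the main stylesheet as well.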

There is a downlevel-revealed syntax for conditional comments (http://msdn.microsoft.com/en-us/library/ms537512(VS.85).aspx) which makes the content visible to all web browsers except the IE versions that don’t match the condition. You really should use it instead.


<![if !lte IE 6]>
	<link rel="stylesheet" href="/css/style.css" />
<![endif]>

That might make it easier to achieve your end result too.

Maybe you can update some of your code so that it can be analyzed more clearly.

That means you then have garbage tags in all browsers except for IE. This can result in the web page being totally stuffed in some browsers - particularly in Opera.

Those downlevel-revealed variants are only supposed to be used when you know that the file is only going to be used by Microsoft products. Where any non-Microsoft program needs to be able to read the file you MUST convert them into comments so as to not get the other program confused.

A better way for the OP to resolve their problem is to only use the conditional comments around <link> tags in the head of the page and use IE specific stylesheets to apply all the rest of the desired changes to the page. That way all the comment tags in the body of the page can be easily removed.

@pmw57 downlevel-revealed comments are something I wasn’t aware of and they do sound interesting but I don’t want to risk the display of the site in other browsers as @felgall pointed out.

To re-post what I posted earlier before it was apparently edited by someone…

@vinnz21 Update some of my code? It is completely up to date and I don’t understand what you mean by this. Can you elaborate?

@felgall Thanks for the warning regarding downlevel-revealed comments. I don’t want to risk my site not rendering correctly in other browsers, but I don’t fully understand what you mean by your suggestion. I already use conditional comments only around the link elements, to target styles at IE. The reason I am wrapping the main stylesheet in a conditional comment is to target everything except IE6. I have dropped support for IE6 in my client websites, so now I just feed it a universal IE6 stylesheet. For further information and examples please refer to the Google hosted page of the Universal IE6 stylesheet.

I simply require an adjustment to my regular expression or PHP function which will prevent that additional and specific conditional comment from being stripped. Or any alternative suggestion which will prevent this.

To target everything except IE6 you don’t need conditional comments at all. All you do is place the IE6 specific stylesheet inside conditional comments AFTER the other one.

The most complex that you ever need to get with conditional comments is when you need to target both IE6 and IE7 separately, and then your code could be:

<link rel="stylesheet" href="/css/main.css" />
<!--[if IE 7]>
    <link rel="stylesheet" href="/css/ie7.css" />
<![endif]-->
<!--[if lte IE 6]>
    <link rel="stylesheet" href="/css/ie6.css" />
<![endif]-->

If you code it that way then, to get rid of all comments without touching those, you’d run the comment removal first on the section from the start of the file up to the first <link and then on the section from </head through to the end of the file.
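
A minimal sketch of that idea follows. The function name stripCommentsOutsideLinks and the plain comment pattern are assumptions made for illustration, and it relies on every conditional comment sitting between the first <link and </head:

```php
<?php
// Hypothetical helper: strips comments everywhere except the region
// that runs from the first "<link" up to "</head".
function stripCommentsOutsideLinks($html)
{
    // Plain comment pattern; the "s" modifier lets "." span newlines.
    $pattern = '/<!--.*?-->/s';

    $start = stripos($html, '<link');
    $end   = stripos($html, '</head');

    // No recognizable stylesheet section: leave the markup untouched.
    if ($start === false || $end === false || $end < $start) {
        return $html;
    }

    $before = substr($html, 0, $start);             // start of file .. first <link
    $middle = substr($html, $start, $end - $start); // link elements and conditionals
    $after  = substr($html, $end);                  // </head .. end of file

    return preg_replace($pattern, '', $before)
         . $middle
         . preg_replace($pattern, '', $after);
}

$page = '<head><!-- top --><link rel="stylesheet" href="/css/main.css" />'
      . '<!--[if lte IE 6]><link rel="stylesheet" href="/css/ie6.css" /><![endif]-->'
      . '</head><body><!-- note --><p>x</p></body>';

echo stripCommentsOutsideLinks($page), PHP_EOL;
```

The trade-off is that any ordinary comment placed between the link elements is kept as well, since that region is never passed through the pattern.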

Just one other point is that there is no record of anyone editing your prior post at all (I’d be able to see a message as to when it was edited if it had been as well as who last edited it).

Yes, indeed I used to do it exactly the way you described when I was still supporting IE6 and yes my regex worked perfectly well removing comments without removing the conditionals then.

However, the Universal IE6 stylesheet provides plain text on a simple background, which makes the site both legible and usable. So although I’m not officially supporting IE6 in terms of styling it to look the same, I’m not completely ignoring users who may be stuck with IE6 for whatever reason. They can still use the site perfectly well to retrieve the information; they just miss out on the overall design ‘experience’.

CSS, as I’m sure you know, cascades. Hence Cascading Style Sheets. This means that if IE6 has already picked up any styling information from the main site stylesheet then it will add to this. The IE6 stylesheet doesn’t override it. So this makes a jumbled mess of the Universal IE6 stylesheet. And that’s the reason I need to wrap the main stylesheet in a ‘If NOT IE6’ conditional comment. It needs to start from a clean slate. It just so happens that this conditional comment doesn’t follow the same format as the other conditional comments and therefore my regex or the function needs adjusting. And that’s the reason for my post.

Do you follow me?

I didn’t test this against anything other than the conditional you posted, but it seems to work here.

<?php

$html = '<!-- a comment --><!--[if !lte IE 6]><!-->
	<link rel="stylesheet" href="/css/style.css" />
<!--<![endif]--><!-- a comment -->';

echo preg_replace('#<!--[^\\[<>].*?(?<!!)-->#s', '', $html), PHP_EOL;

?>
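
For completeness, here is one way the pattern might be dropped into the output-buffer callback from the original question. This is only a sketch: the function name stripComments is an assumption, and the null check is an extra precaution rather than part of the suggestion above.

```php
<?php
// Output-buffer callback built around the pattern suggested above.
function stripComments($buffer)
{
    // Only treat "<!--" as the start of a removable comment when the next
    // character is not "[", "<" or ">", and never end a match on "!-->".
    $result = preg_replace('#<!--[^\\[<>].*?(?<!!)-->#s', '', $buffer);

    // preg_replace() returns null on error, so fall back to the raw buffer.
    return $result === null ? $buffer : $result;
}

ob_start('stripComments');
```

Everything echoed after the ob_start() call passes through stripComments() when the buffer is flushed.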

@joebert That’s it! I’ve given it a test with all comments and conditionals in my base template that I’m likely to use and it seems to pass the test. Much appreciated. I really need to brush up on my regular expressions.

Although it’s slightly off-topic, is there a way that you’re aware of to trim the whitespace from the document too? By this I mean the new lines, carriage returns, and tabs (\n, \r, \t), to close up the gaps left by the comments and compact the HTML output a little. I don’t want full-on compression; I just want to tighten things up a little to reduce the overall file size by cutting out wasted space.

Thanks for your help.

You can tack on optional whitespace meta-characters to the beginning and end of the pattern. “\s” will match the characters you mentioned as well as a literal space and a form feed (whatever that is). Inversely, if you ever want to match non-whitespace characters you can use an uppercase S “\S”.

#\\s*<!--[^\\[<>].*?(?<!!)-->\\s*#s
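
A quick way to see the effect, using made-up sample markup (the variable names are assumptions for this sketch):

```php
<?php
// Hypothetical sample: a comment on its own indented line between two blocks.
$html = "<p>one</p>\n\t<!-- a comment -->\n<p>two</p>";

// Same pattern, with optional whitespace absorbed on both sides of the comment.
$compact = preg_replace('#\\s*<!--[^\\[<>].*?(?<!!)-->\\s*#s', '', $html);

echo $compact, PHP_EOL; // <p>one</p><p>two</p>
```

One thing to watch: whitespace around a comment that sits inside running text is removed as well, so "one <!-- x --> two" comes out as "onetwo". Replacing with a single space instead of an empty string may be safer in that situation.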

You sir, are a legend! Spot on again. That’s just what I needed and a helpful hand into some useful regular expressions which I wasn’t aware of.

Many thanks for the help.