Regular Expressions: Remove everything between <script> tags?

Hi folks,

Yeah, this is a lame question, but I figured that someone here might have a quick answer.

I need a regular expression to match a string of characters between <script> tags, including the <script> tags. For some reason the solution isn’t coming immediately to mind.

Anyone have any regexp suggestions?


Well, I dug a little deeper on Google and found this:


Thanks to this page.

If you want to “match a string of characters between <script> tags, including the <script> tags”:

my $string = 'some text here<script language="javascript">script text here</script>some text here';
# $1 = whole block, $2 = opening tag, $3 = text between the tags, $4 = closing tag
$string =~ /((<[\s\/]*script\b[^>]*>)([^>]*)(<\/script>))/i;
print qq~\$1 = $1
\$2 = $2
\$3 = $3
\$4 = $4
~;

$1 matches the entire pattern: <script language="javascript">script text here</script>

$2 matches the opening tag: <script language="javascript">
$3 matches the text between tags: script text here
$4 matches the closing tag: </script>

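This may need changing for real pages: [^>]* does match across line breaks, but it stops at the first >, so the match fails whenever the script body itself contains a > character.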
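One way to cope with multi-line scripts, and with scripts that contain a > character, is a non-greedy .*? together with the /s modifier. The following is only a sketch: script_parts and the sample string are made up for illustration, and a regex is still not a real HTML parser.

```perl
use strict;
use warnings;

# script_parts is a hypothetical helper (not from the thread). In list
# context it returns the opening tag, the body and the closing tag of the
# first <script> block, or an empty list when there is none.
sub script_parts {
    my ($html) = @_;
    # /i: case-insensitive tag names; /s: . also matches newlines.
    # .*? is non-greedy, so the match ends at the first closing tag.
    return $html =~ m{(<script\b[^>]*>)(.*?)(</script\s*>)}is;
}

# Made-up sample: a multi-line script whose body contains '>'.
my $string = <<'HTML';
some text here<script language="javascript">
if (a > b) { alert("hi"); }
</script>some text here
HTML

my ($open, $body, $close) = script_parts($string);
print "open  = $open\nbody  = $body\nclose = $close\n" if defined $open;
```

Using m{...} as the delimiter avoids having to escape the / in the closing tag.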


The script that I’m writing is intended to extract the textual content from an HTML document, leaving only the “important-to-humans” information. For this purpose, I found a great PEAR package (PHP) that does this very well: HTML_Safe. It works better than any combination of the regular expressions that I used.

This one line of code gave me the result I was after:

$result = strip_tags(html_entity_decode($safehtml->parse($result)));

… where $result contains the HTML document and $safehtml is an instance of the HTML_Safe class.

It is SO darned cool! :)

I was using the regular expression above to remove script tags from an HTML page, but I found that it doesn’t cover every case.
The script tag below is not filtered:

<script type="text/javascript" language="javascript">
// some code here

Can anyone suggest an improvement?
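A likely culprit, assuming the elided code contains a > somewhere (a comparison, or markup inside a string), is the [^>]* in the middle of the posted pattern, which cannot cross a >. Here is a rough sketch of a substitution that removes whole script blocks instead. strip_scripts and the sample page are invented for the example, and the body of the sample script is a guess at what "some code here" might look like.

```perl
use strict;
use warnings;

# strip_scripts is a hypothetical helper name, not something from the thread.
sub strip_scripts {
    my ($html) = @_;
    # /g: remove every block; /i: case-insensitive tag names;
    # /s: . also matches newlines; .*? stops at the first closing tag.
    $html =~ s{<script\b[^>]*>.*?</script\s*>}{}gis;
    return $html;
}

# Made-up page built around the tag from the question.
my $page = <<'HTML';
<p>before</p>
<script type="text/javascript" language="javascript">
// some code here
if (x > 1) { document.write("<b>hi</b>"); }
</script>
<p>after</p>
HTML

print strip_scripts($page);
```

A script tag that is never closed is left untouched by this pattern, so for untrusted input a real parser or a filter such as HTML_Safe is probably the safer route.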