Working with HTML markup



by miseldine

This came up for me recently: I needed a way of highlighting keywords in a chunk of HTML when users arrived at the site from a Google search. Highlighting the search terms helps your users quickly locate the information they searched for.

However, a simple string.Replace wouldn’t cut the mustard: it would also replace any occurrence of the keyword inside the HTML markup itself, such as in a link target or an image source, and so would break links or images.

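For example, take the keyword “sitepoint”, which I wish to replace with some HTML that wraps “sitepoint” in highlighting markup. If my image’s file name had “sitepoint” in it, that markup would be inserted into the middle of the image tag, just before the “.jpg”, leaving a broken tag. Not what I want.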
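To make the failure concrete, here is a minimal sketch of the naive approach. The class name, the sample markup and the `<span class="hit">` wrapper are assumptions made for illustration; they are not the markup from the original post.

```csharp
using System;

public static class NaiveReplaceDemo
{
    // Hypothetical helper: wraps every occurrence of the keyword,
    // including occurrences that sit inside tags and attributes.
    public static string NaiveHighlight(string html, string keyword)
    {
        return html.Replace(keyword, "<span class=\"hit\">" + keyword + "</span>");
    }

    public static void Main()
    {
        string html = "<p>Visit sitepoint <img src=\"sitepoint.jpg\"></p>";

        // The keyword in the paragraph text is wrapped as intended, but the
        // one inside the src attribute is wrapped too, which breaks the <img> tag.
        Console.WriteLine(NaiveHighlight(html, "sitepoint"));
    }
}
```

The printed result contains `src="<span class="hit">sitepoint</span>.jpg"`, which no browser will treat as a valid image source.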

So, I hacked together a little function that first removes all the HTML tags from a string, and then puts them back once the keyword replacement has been made. I hope it is of some use, fellow readers:

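// Requires: using System.Collections.Generic;
private string highlightText(string text, string keyword, string highlightColour)
{
    if (string.IsNullOrEmpty(keyword)) return text;

    // Strip the tags, but keep them safe so they can be restored in order.
    List<string> tags = new List<string>();
    string temp = text;
    int start = temp.IndexOf('<');
    while (start != -1)
    {
        // Look for the closing '>' only after the '<'; give up on an unclosed tag.
        int end = temp.IndexOf('>', start);
        if (end == -1) break;
        tags.Add(temp.Substring(start, end - start + 1));
        // Leave a placeholder behind (assumes the text never contains "¬").
        temp = temp.Substring(0, start) + "¬" + temp.Substring(end + 1);
        start = temp.IndexOf('<', start);
    }

    // The string has no HTML now. The wrapper markup was lost from this listing;
    // a <span> with a background colour is an assumption.
    string open = "<span style=\"background-color:" + highlightColour + "\">";
    string body = temp.Replace(keyword, open + keyword + "</span>");
    string keyUp = keyword.Substring(0, 1).ToUpper() + keyword.Substring(1);
    if (keyUp != keyword)
    {
        body = body.Replace(keyUp, open + keyUp + "</span>");
    }

    // Re-insert the tags, one per placeholder, in their original order.
    int pos = 0;
    foreach (string tag in tags)
    {
        pos = body.IndexOf('¬', pos);
        if (pos == -1) break;
        body = body.Remove(pos, 1).Insert(pos, tag);
        pos += tag.Length;
    }
    return body;
}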

It’s not optimised, and it isn’t pretty, but tinker with it and expand upon it at will :)
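For anyone who does want to tinker, one possible variation is sketched below. It is not the function from this post: it swaps the placeholder trick for a regular expression with a MatchEvaluator callback, so tags are passed through untouched and only the text between them is searched. The class name `RegexHighlighter`, the method name `Highlight` and the `<span>` wrapper are hypothetical, and the tag pattern is a simplification that assumes no ">" appears inside attribute values and ignores comments and script blocks.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexHighlighter
{
    // Matches either a whole tag (group "tag") or a run of text between tags (group "text").
    private static readonly Regex TagOrText =
        new Regex(@"(?<tag><[^>]*>)|(?<text>[^<]+)");

    // Hypothetical helper: highlights the keyword only in text outside tags,
    // ignoring case and keeping the casing found in the document.
    public static string Highlight(string html, string keyword, string colour)
    {
        if (string.IsNullOrEmpty(keyword)) return html;

        // Escape the keyword so characters like '.' or '+' are matched literally.
        Regex keywordPattern = new Regex(Regex.Escape(keyword), RegexOptions.IgnoreCase);
        string open = "<span style=\"background-color:" + colour + "\">"; // assumed markup

        return TagOrText.Replace(html, m =>
            m.Groups["tag"].Success
                ? m.Value // tags are returned unchanged
                : keywordPattern.Replace(m.Value, k => open + k.Value + "</span>"));
    }

    public static void Main()
    {
        string html = "<p>Visit Sitepoint <img src=\"sitepoint.jpg\"></p>";
        Console.WriteLine(Highlight(html, "sitepoint", "yellow"));
    }
}
```

Because the keyword match is case-insensitive, this sketch does not need the separate capitalised-keyword pass, and the image tag in the sample comes out exactly as it went in.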

This post has 5 responses so far

  1. A shorter way of doing this is with a regular expression replacement tool that supports callbacks, i.e. one that lets you find a certain pattern and replace it with the return value of a function that takes the matched pattern. You can see an example of the technique using PHP here:

    http://simon.incutio.com/archive/2003/09/20/pirateCode

    The same technique can also be used in Python and Javascript. I’ve never used .NET but from glancing over the docs it looks like the Regex.Replace(String, MatchEvaluator) method would do the job.

     
  2. Ah yes. You’d need to use the MatchCollection I believe from a regular expression and process it accordingly.

    Regular expressions have always confused me to be honest…they’re so pretty yet ugly at the same time :)

     
  3. I reckon Harry Potter would be into regular expressions. Concoct some obscure incantation, unleash it, and it does something very cool but slightly scary. Definitely a dark art.

     
  4. Thank you. Your article gave us the solution we were searching for. Thanks a lot.

     
  5. using System.Text.RegularExpressions;
    ...
    public static string RemoveHTML(string in_HTML)
    {
        // The tag pattern was lost from this comment; "<[^>]*>" is an assumed stand-in.
        return Server.HtmlDecode(Regex.Replace(in_HTML, "<[^>]*>", ""));
    }

    If this function is in a class file rather than a page, user control, etc., there is no Server property in scope, so use the fully qualified reference System.Web.HttpContext.Current.Server.HtmlDecode instead.

     
