  #1
    SitePoint Evangelist

    Find/replace query to tidy up URL data

    I have a table with "news" in it. Some of the content contains links to other sites in the form of HTML markup. I have extracted the data into a text file, and I want to tidy it up so that it validates as XHTML Strict.

    I would like to run an SQL query to pull out the fldBody field, then loop through the recordset and use a regular expression to do some kind of find/replace routine to tidy up the HTML.

    I know this is asking a lot!

    So, if I had this for example, in the fldBody field

    Here is some news about <a href=>google</a> which I read today

    It'd be good to find a way to convert it to:

    Here is some news about <a href="">google</a> which I read today

    I have no idea if this can be done, and I am totally useless at regular expressions. I've spent ages looking at info about them on the web, but cannot work out how they work.

    Any ideas would be much appreciated.
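
    For illustration, the conversion asked about above can be sketched with a single regular expression. This is only a rough sketch, written in Python rather than ASP so it is easy to try out; the names `UNQUOTED_HREF` and `quote_hrefs` are made up for this example, and it only deals with unquoted `href` values on `<a>` tags, not with XHTML validity in general.

```python
import re

# Hypothetical helper: finds an <a ...> tag whose href= value is NOT quoted.
UNQUOTED_HREF = re.compile(
    r"""(<a\b[^>]*?\bhref=)   # group 1: tag start, up to and including href=
        ([^\s"'>]*)           # group 2: the bare value (may be empty)
        (?=[\s>])             # the value must end at whitespace or at >
    """,
    re.IGNORECASE | re.VERBOSE,
)


def quote_hrefs(html):
    """Wrap unquoted href values on <a> tags in double quotes."""
    return UNQUOTED_HREF.sub(r'\1"\2"', html)


before = "Here is some news about <a href=>google</a> which I read today"
print(quote_hrefs(before))
# Here is some news about <a href="">google</a> which I read today
```

    Values that are already quoted are left alone, because the lookahead only accepts whitespace or `>` straight after the bare value.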



  #2
    hgilbert
    That is going to be virtually impossible.

    If you want to push on and attempt the impossible in ASP anyway,
    you would have to use components and have access to a dedicated server.
    You could then use a combination of third-party tools to tidy things up.

    HTML Tidy is one of them, and it's open source.
    There are probably others.

    If the problem is only links (i.e. <a href=> -> <a href="">),
    it's easier to write down a set of rules.
    It is probably easier to do a series of tag inspections plus VB search/replace
    than to use regular expressions, which I am not sure exist in Classic ASP anyway.
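
    As a rough sketch of the "rules plus search/replace" idea, here is one way the link case could be handled with plain string scanning and no regular expressions. It is written in Python purely for illustration, and the function name `quote_hrefs_plain` is made up; the same steps should map onto InStr and Mid in VBScript. It assumes the tag is written exactly as lowercase `<a href=`, which is a simplification. (For what it's worth, VBScript 5.0 and later does provide a RegExp object, so either route is open in Classic ASP.)

```python
def quote_hrefs_plain(html):
    """Quote bare href values using only find/slice operations."""
    marker = "<a href="  # assumption: tags are written exactly like this
    out = []
    pos = 0
    while True:
        start = html.find(marker, pos)
        if start == -1:
            out.append(html[pos:])  # no more links: copy the rest unchanged
            break
        value_start = start + len(marker)
        out.append(html[pos:value_start])  # copy text up to and including href=
        # Rule 1: if the value is already quoted, leave it alone.
        if html[value_start:value_start + 1] in ('"', "'"):
            pos = value_start
            continue
        # Rule 2: a bare value runs until whitespace or the closing >.
        end = value_start
        while end < len(html) and html[end] not in " \t\r\n>":
            end += 1
        out.append('"' + html[value_start:end] + '"')
        pos = end
    return "".join(out)


print(quote_hrefs_plain("Here is some news about <a href=>google</a> which I read today"))
# Here is some news about <a href="">google</a> which I read today
```

    Each extra rule (uppercase tags, other attributes, unclosed tags) means more scanning code, which is where a dedicated tool such as HTML Tidy starts to pay off.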


