SitePoint Sponsor

User Tag List

Results 1 to 5 of 5
  1. #1
    SitePoint Member HuevosRancheros's Avatar
    Join Date
    Mar 2004
    Location
    Santa Monica, CA
    Posts
    19
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Question MS SQL : select with regular expressions

    My specific need is this:

    I have a bunch of data stored as ntext that people have submitted via a content management system. Many of the submitters put a bunch of white space in their articles which show up in html as:

    Code:
    <p>&nbsp;</p>
    It's easy enough to search for this string using a

    Code:
    select like '%<p>&nbsp;</p>%'
    But what I want to do is only replace this string if it's followed by a line return character "\n?" and another "<p>". Make sense?

    In other words, how are line feeds represented in ntext data and then how can I search for occurrences of a line feed in a query?

  2. #2
    SitePoint Addict StephenBauer's Avatar
    Join Date
    Apr 2004
    Location
    USA
    Posts
    263
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Yeah, that is a toughy. I know in T-SQL (not sure if some of these are SQL standard - like the ranges may not be), you can use:

    % for wildcard matching zero or more characters
    _ for wildcard matching a single character

    [a-f] for matching a single character in the given range
    [^a-f] for matching a single character not in the given range

    (the range being "a" through "f" in those examples)

    You can also use the "ESCAPE" keyword to specify cetain characters.

    Also, you can combine these such as: F%RED%99_9

    A more powerful feature would be to use the Windows scripting regex support. These objects can be accessed within T-SQL via the "object" stored procs:

    Code:
    sp_OACreate
    sp_OASetProperty
    sp_OAGetProperty
    sp_OAMethod
    sp_OADestroy
    sp_OAGetErrorInfo
    sp_OAStop
    Here is an example user-defined function (from the net) that allows regex searching (but no replace functionality) and it works:

    Code:
    CREATE FUNCTION dbo.find_regular_expression
    	(
    		@source varchar(5000),
    		@regexp varchar(1000),
    		@ignorecase bit = 0
    	)
    RETURNS bit
    AS
    	BEGIN
    		DECLARE @hr integer
    		DECLARE @objRegExp integer
    		DECLARE @objMatches integer
    		DECLARE @objMatch integer
    		DECLARE @count integer
    		DECLARE @results bit
    		
    		EXEC @hr = sp_OACreate 'VBScript.RegExp', @objRegExp OUTPUT
    		IF @hr <> 0 BEGIN
    			SET @results = 0
    			RETURN @results
    		END
    		EXEC @hr = sp_OASetProperty @objRegExp, 'Pattern', @regexp
    		IF @hr <> 0 BEGIN
    			SET @results = 0
    			RETURN @results
    		END
    		EXEC @hr = sp_OASetProperty @objRegExp, 'Global', false
    		IF @hr <> 0 BEGIN
    			SET @results = 0
    			RETURN @results
    		END
    		EXEC @hr = sp_OASetProperty @objRegExp, 'IgnoreCase', @ignorecase
    		IF @hr <> 0 BEGIN
    			SET @results = 0
    			RETURN @results
    		END	
    		EXEC @hr = sp_OAMethod @objRegExp, 'Test', @results OUTPUT, @source
    		IF @hr <> 0 BEGIN
    			SET @results = 0
    			RETURN @results
    		END
    		EXEC @hr = sp_OADestroy @objRegExp
    		IF @hr <> 0 BEGIN
    			SET @results = 0
    			RETURN @results
    		END
    	RETURN @results
    	END
    The "source" is the string to search, the "regexp" is the regex regular expression to search for, and the "ignorecase" is just that, ignore case sensitivity (true|false).

    The function will return a "1" if found, otherwise I believe a "0" if not.

    What you can then do is if the regex is found, manipulate that string with a similar regex function. You could even use just a regular old "REPLACE" funtion if you can temporarily convert the 'text' to 'varchar(8000)'...on larger entries you will have to get creative.

    For more info on Windows scripting regex:
    http://msdn.microsoft.com/library/en...asp?frame=true

    If you do develop your own "replace" regex function, be sure to post it here!

    Quote Originally Posted by HuevosRancheros
    My specific need is this:

    I have a bunch of data stored as ntext that people have submitted via a content management system. Many of the submitters put a bunch of white space in their articles which show up in html as:

    Code:
    <p>&nbsp;</p>
    It's easy enough to search for this string using a

    Code:
    select like '%<p>&nbsp;</p>%'
    But what I want to do is only replace this string if it's followed by a line return character "\n?" and another "<p>". Make sense?

    In other words, how are line feeds represented in ntext data and then how can I search for occurrences of a line feed in a query?
    Last edited by StephenBauer; May 26, 2004 at 11:22.

  3. #3
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,017
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    ... where ntextfld like '%<p>&nbsp;</p>' + chr(13) + '<p>%'

    at least, i think it's chr(13) -- look it up before you use this, okay?
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  4. #4
    SitePoint Member HuevosRancheros's Avatar
    Join Date
    Mar 2004
    Location
    Santa Monica, CA
    Posts
    19
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks for that regexp script! That'll come in handy for sure.

    Now to figure out how to replace...



    Quote Originally Posted by StephenBauer
    Yeah, that is a toughy. I know in T-SQL (not sure if some of these are SQL standard like the ranges), you can use:

    % for wildcard matching zero or more characters
    _ for wildcard matching a single character

    [a-f] for matching a single character in the given range
    [^a-f] for matching a single character not in the given range

    (the range being "a" through "f" in those examples)

    You can also use the "ESCAPE" keyword to specify untypeable characters.

    Also, you can combine these such as: F%RED%99_9

    A more powerful feature would be to use the Windows scripting regex support. These objects can be accessed within T-SQL via the "object" stored procs:

    Code:
    sp_OACreate
    sp_OASetProperty
    sp_OAGetProperty
    sp_OAMethod
    sp_OADestroy
    sp_OAGetErrorInfo
    sp_OAStop
    Here is an example user-defined function (from the net) that allows regex searching (but no replace functionality) and it works:

    Code:
    CREATE FUNCTION dbo.find_regular_expression
    	(
    		@source varchar(5000),
    		@regexp varchar(1000),
    		@ignorecase bit = 0
    	)
    RETURNS bit
    AS
    	BEGIN
    		DECLARE @hr integer
    		DECLARE @objRegExp integer
    		DECLARE @objMatches integer
    		DECLARE @objMatch integer
    		DECLARE @count integer
    		DECLARE @results bit
    		
    		EXEC @hr = sp_OACreate 'VBScript.RegExp', @objRegExp OUTPUT
    		IF @hr <> 0 BEGIN
    			SET @results = 0
    			RETURN @results
    		END
    		EXEC @hr = sp_OASetProperty @objRegExp, 'Pattern', @regexp
    		IF @hr <> 0 BEGIN
    			SET @results = 0
    			RETURN @results
    		END
    		EXEC @hr = sp_OASetProperty @objRegExp, 'Global', false
    		IF @hr <> 0 BEGIN
    			SET @results = 0
    			RETURN @results
    		END
    		EXEC @hr = sp_OASetProperty @objRegExp, 'IgnoreCase', @ignorecase
    		IF @hr <> 0 BEGIN
    			SET @results = 0
    			RETURN @results
    		END	
    		EXEC @hr = sp_OAMethod @objRegExp, 'Test', @results OUTPUT, @source
    		IF @hr <> 0 BEGIN
    			SET @results = 0
    			RETURN @results
    		END
    		EXEC @hr = sp_OADestroy @objRegExp
    		IF @hr <> 0 BEGIN
    			SET @results = 0
    			RETURN @results
    		END
    	RETURN @results
    	END
    The "source" is the string to search, the "regexp" is the regex regular expression to search for, and the "ignorecase" is just that, ignore case sensitivity (true|false).

    The function will return a "1" if found, otherwise I believe a "0" if not.

    What you can then do is if the regex is found, manipulate that string with a similar regex function. You could even use just a regular old "REPLACE" funtion if you can temporarily convert the 'text' to 'varchar(8000)'...on larger entries you will have to get creative.

    For more info on Windows scripting regex:
    http://msdn.microsoft.com/library/en...asp?frame=true

    If you do develop your own "replace" regex function, be sure to post it here!

  5. #5
    SitePoint Member HuevosRancheros's Avatar
    Join Date
    Mar 2004
    Location
    Santa Monica, CA
    Posts
    19
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    That works! Here are the special characters that I found in help:

    Tab CHAR(9)
    Line feed CHAR(10)
    Carriage return CHAR(13)



    Quote Originally Posted by r937
    ... where ntextfld like '%<p>&nbsp;</p>' + chr(13) + '<p>%'

    at least, i think it's chr(13) -- look it up before you use this, okay?


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •