Using RegEx with LINQ

I’m having a little trouble with using regex in linq. I want to be able to perform a search on values in my database. The goal is to insert a value into the regex and then use that to search a column in a table in my database for that term. I want it to work so that it gets all exact matches and also returns matches in which the search term is a substring of the term in the column.

I was trying to follow this tutorial on msdn, but it doesn’t quite fit perfectly with my problem:

        
public List<Event> GetEventsByName(string eventString)
        {
            PollDbDataContext db = new PollDbDataContext();

            Regex eventName = new Regex(eventString, RegexOptions.IgnoreCase);

            var matchingEvents =
                from events in db.Events
                let matches = eventName.Matches(events.name)
                where eventName.Matches(events.name).Count > 0
                orderby events.name.Count() descending
                select events;

            return matchingEvents.ToList();
        }

I’d appreciate any advice.

Do you really need RegEx for that? Why not just search the string for an .IndexOf() instead? Unless of course you are searching for patterns and not fixed strings.

Using something like this?

        public List<Event> GetEventsByName(string eventString)
        {
            PollDbDataContext db = new PollDbDataContext();

            var matchingEvents =
                from events in db.Events
                where events.name.IndexOf(eventString) > 0
                orderby events.name descending
                select events;

            return matchingEvents.ToList();
        }

I’ve never used IndexOf before, and this is not quite working. Am I using it correctly?

i’m almost sure that regular expressions are not supported in linq queries in LinqToSql or EntityFramework. How would that be translated into SQL?

The only way I got SQLServer to execute regular expressions was with SQL CLR.

Try something like this:

	public partial class TestBed1 : System.Web.UI.Page
	{
		private class Event
		{
			public string text { get; set; }
		}

		private List<Event> GetSearchResults(string eventSearch)
		{
			using (var ldc = new TestBedDataContext())
			{
				return (from s in ldc.TestBeds
						where s.testText.IndexOf("should") > 0
						select new Event
						{
							text = s.testText
						}).ToList();

			}
		}
		protected void Page_Load(object sender, EventArgs e)
		{
			var searchResult = GetSearchResults("should");
			searchList.DataSource = searchResult;
			searchList.DataBind();
		}
	}

Well, I’m thinking that maybe if you want to search for whole words only, for example you’d want the word “will” to return but not the name “William” then maybe RexEx would be an option because you can use whitespace or word boundaries. If you just want to search for the input text in a string no matter if it’s a whole word or not then IndexOf() does the job just fine.

Are they supported in Linq to Objects? If so, you could run a Linq to Sql query using the IndexOf() option and then Linq the resulting List or something like that.

Whoops, refactor fail.

Change
… where s.testText.IndexOf(“should”) > 0
to
… where s.testText.IndexOf(eventSearch) > 0

Actually, with LINQ to SQL, I believe <i>Contains</i> method of System.String would be a better choice. It will translate into a <i>like</i> SQL condition which can be processed more efficiently by the SQL server.

If you want to make sure that the search term matches entire words only, you could filter out the “false positives” from the result set.

Ah, excellent point honeymonster.
Using .IndexOf():

SELECT COUNT(*) AS [value]
FROM [dbo].[TestBed] AS [t0]
WHERE (
    (CASE
        WHEN (DATALENGTH(@p0) / 2) = 0 THEN 0
        ELSE CHARINDEX(@p0, [t0].[name]) - 1
     END)) > @p1
-- @p0: Input NVarChar (Size = 1; Prec = 0; Scale = 0) [M]
-- @p1: Input Int (Size = 0; Prec = 0; Scale = 0) [0]

Using .Contains():

SELECT COUNT(*) AS [value]
FROM [dbo].[TestBed] AS [t0]
WHERE [t0].[name] LIKE @p0
-- @p0: Input NVarChar (Size = 3; Prec = 0; Scale = 0) [&#37;M%]

Sorry about the poor choice.

One way of using Regular expression with LinqToSql is to create a CLR scalar-valued function.


public partial class UserDefinedFunctions
{
[SqlFunction(IsDeterministic = true, IsPrecise = true)]
public static SqlBoolean IsMatch(string input, string pattern)
{
return Regex.IsMatch(input, pattern);
}
}

More info:

deploy the assembly to sqlserver and enable clr integration.
http://msdn.microsoft.com/en-us/library/ms131048.aspx

then add the scalar-valued function to your DataContext


[Function(Name="dbo.IsMatch", IsComposable=true)]
public bool IsMatch([Parameter(DbType="NVarChar(4000)")] string input, [Parameter(DbType="NVarChar(4000)")] string pattern)
{
return (bool)this.ExecuteMethodCall(this, ((MethodInfo)(MethodInfo.GetCurrentMethod())), input, pattern).ReturnValue;
}
 

now you can call it in linq to sql queries


using (var context = new LemonDataContext())
{
var results = context.Employees.Where(employee => context.IsMatch(employee.FirstName, @"^R\\w*$"));
foreach (var employee in results)
{
Console.WriteLine(employee.FirstName);
}
Console.ReadLine();
}

this result in the following sql statement.


exec sp_executesql N'SELECT [t0].[Id], [t0].

, [t0].[FirstName], [t0].[LastName], [t0].[MiddleName], [t0].[NickName]
FROM [dbo].[Employee] AS [t0]
WHERE [dbo].[IsMatch]([t0].[FirstName], @p0) = 1’,N’@p0 nvarchar(6)‘,@p0=N’^R\w*$’

Yea, I would also use the .contains method instead of index of. But just on a side note. If you where going to use index of. you would want to replace

stringval.indexOf() > 0;

with

stringval.indexOf() >= 0;

If the value you are looking for is at the beginning of the string it will return 0. It will return -1 if not found.

Oh, man. That’s right.