A query to find a carriage return and line feed

busboy · December 14, 2018, 5:17am

In my testimonials table I have two columns that contain the wording for the testimonials. Those two columns are body and compliantBody. The only difference between the two is that the body column is for the original text an author submitted, and compliantBody is the text that has been proofed by an editor.

One of the problems that the editor fixes is to take one massive paragraphs that authors sometimes write, and convert it into multiple paragraphs for easier reading. The following query will show me all testimonials in the table that don’t have any paragraph breaks:

select testimonialID, title from testimonials where approved = 'Yes' and body not LIKE CONCAT('%', CHAR(13), CHAR(10),'%');

However, when I change the query to focus on the compliantBody column, the results are not accurate at all. Meaning the database will return thousands of testimonials that really do have multiple paragraphs:

select testimonialID, title from testimonials where approved = 'Yes' and compliantBody not LIKE CONCAT('%', CHAR(13), CHAR(10),'%');

The only thing I can think that would cause this is that our editor uses a Windows based computer, and I use a Mac, and there may be a difference in how Windows adds a carriage return and line feed. If this is the case, does anyone know what codes I should have my query look for?

Thank you!

Mittineague · December 14, 2018, 5:45am

I don’t think that NOT LIKE is doing what you think it’s doing.

For example, something like

name
address

could be

name \r\n
address

I’m assuming what you intend is more like

… end.

Start …

which could be

... end. \r\n 
\r\n 
Start ...

You are correct that there are OS differences. AFAIK, they may be both \r\n or only an \r or only an \n

I think ideally the content should have been made consistent when it was INSERTed. Anyway, with the state it’s in you could chain different NOT LIKEs in the WHERE. But note that leading wildcards can be expensive. I have a feeling a regex would be more efficient.

m_hutley · December 14, 2018, 7:27am

This is actually more dependant on the program used rather than the OS, but in general Windows writes \r\n, and linux-based systems (including Mac) will write just \n.

The simple idea would be to search for the \n (0x0A, or 10), because \r by itself should be exceedingly rare.

The more complex idea would be to search for things that dont contain \n AND things that don’t contain \r.

Consider the logic table:

          NotContains R   NotContains N   AND
[none]      true         true            true
\n          true         false           false
\r         false        true             false
\r\n       false        false           false

Alternatively if your database engine provides a Regex implementation, you can do the same thing by using the presumptive pattern /\r?\n/ (or if you really want to be pedantic, /(\r?\n)|(\r\n?)/

system · March 15, 2019, 2:27pm

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.