Aug 30, 2011, 00:57 #1
Choose whole sentences and ONLY whole sentences RELIABLY with regex
I have found a way to select ANY whole English sentence reliably, regardless of quotation marks and regardless of punctuation marks used inside the sentence for abbreviations, decimals or any other purpose! It tests reliably on any non-accented string!
In human language it reads as follows:
Find a non-accented capital letter that may be preceded by a quotation mark, and check that it is not directly followed by a punctuation mark; this excludes capital-letter abbreviations inside sentences. Then crawl forward by repeating a group that consists of a negative look-ahead and the universal selector character (the dot) until you arrive at the end of the sentence you are in. You know you are there when you find this sequence: an optional quotation mark (the one closing the pair opened at the start of your sentence), the sentence-closing punctuation mark, and the white space that necessarily separates your sentence from the next one. After that, repeat the criteria for the start of a sentence to confirm that a new one really begins! Because of the negative condition in the look-ahead, the repeated group (really just the universal selector) does not consume the closing punctuation mark or the optional quotation mark, so you have to match these separately.
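The description above can be sketched in Python roughly as follows. This is only one possible reading of it, not the exact pattern: the names QUOTE, START, END, BOUNDARY and split_sentences are made up for illustration, the closing quotation mark is allowed on either side of the final punctuation mark, and the end of the text is also accepted as a sentence boundary. All three of those are assumptions.

```python
import re

# Hypothetical building blocks (names are illustrative only).
QUOTE = r"[\"']?"                        # optional quotation mark
START = QUOTE + r"[A-Z](?![.,;:!?])"     # capital not directly followed by punctuation
END = QUOTE + r"[.!?]" + QUOTE           # closing punctuation, quote allowed on either side
# A boundary is a sentence end followed by white space and a new
# sentence start, or (assumption) by the end of the text.
BOUNDARY = END + r"(?:\s+" + START + r"|\s*\Z)"

# Crawl forward one character at a time while no boundary begins here,
# then match the closing punctuation (and quote) separately.
SENTENCE = re.compile(START + r"(?:(?!" + BOUNDARY + r").)*" + END, re.DOTALL)

def split_sentences(text):
    """Return every whole sentence found in text."""
    return SENTENCE.findall(text)

sample = 'The U.S. economy grew 3.5 percent. "Is that so?" she asked. Yes, it did!'
for sentence in split_sentences(sample):
    print(sentence)
# The U.S. economy grew 3.5 percent.
# "Is that so?" she asked.
# Yes, it did!
```

In this sketch an abbreviation that is followed by a capitalized word, such as "Mr. Smith", would still be treated as a sentence end, because the look-ahead only checks for punctuation, white space and a following capital.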
SUGGESTION FOR FURTHER DEVELOPMENT:
Alongside the non-accented capital letters at the start, you can also use hexadecimal escapes to describe accented ANSI capital letters, and so select sentences in other European languages. But this is not an issue for me at present.
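A rough sketch of that suggestion, assuming "ANSI" here means the Latin-1 range, where the accented capitals sit at \xC0-\xD6 and \xD8-\xDE (\xD7 is the multiplication sign, so it is skipped). The names CAPITAL and split_sentences_latin1 are hypothetical, and the rest of the pattern is only one possible reading of the description above, with the end of the text accepted as a boundary.

```python
import re

# Assumption: accented capitals are taken from the Latin-1 range.
CAPITAL = r"[A-Z\xC0-\xD6\xD8-\xDE]"
QUOTE = r"[\"']?"                            # optional quotation mark
START = QUOTE + CAPITAL + r"(?![.,;:!?])"    # capital not directly followed by punctuation
END = QUOTE + r"[.!?]" + QUOTE               # closing punctuation with optional quote
BOUNDARY = END + r"(?:\s+" + START + r"|\s*\Z)"

SENTENCE_LATIN1 = re.compile(
    START + r"(?:(?!" + BOUNDARY + r").)*" + END, re.DOTALL
)

def split_sentences_latin1(text):
    """Return whole sentences, allowing accented Latin-1 capitals at the start."""
    return SENTENCE_LATIN1.findall(text)

sample_fr = "Élodie est arrivée à 9.30 h. Où est-elle? Ça va!"
for sentence in split_sentences_latin1(sample_fr):
    print(sentence)
```

Only the character class for the starting capital changes; the crawling group and the closing part stay the same as for English.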