Converting paragraph words into an array

Here is some code I am testing. Notice the hard return:

$paragraph = "This is the first sentence in my paragraph of text.

Then the next sentence begins with more text.";

$paragraph = strtolower(str_replace( array( "&", "!", '"', ".", "'", ",", "?", "\r", "\n" ), '', $paragraph));

$paragraph = explode(" ", $paragraph);

print_r($paragraph);

When I view the contents of the array, the words on either side of a hard return get joined together. For example:

=> textthen

The word “text” should be its own element in the array, and so should the word “then”. What is my str_replace() missing that is causing this?

Thank you!

You're missing a space in what you are converting those characters to: the line breaks (“\r” and “\n”) should become a space rather than being deleted.
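A minimal sketch of that suggestion, assuming a single “\n” between the two sentences: the punctuation is still deleted, but the line-break characters are replaced with a space, so “text” and “then” stay separate.

```php
<?php
// Sample text with a single hard return between the two sentences.
$paragraph = "This is the first sentence in my paragraph of text.\nThen the next sentence begins with more text.";

// Delete punctuation outright...
$paragraph = str_replace(array("&", "!", '"', ".", "'", ",", "?"), '', $paragraph);
// ...but turn line breaks into spaces so adjacent words are not glued together.
$paragraph = str_replace(array("\r", "\n"), ' ', $paragraph);

$words = explode(' ', strtolower($paragraph));
print_r($words);
```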

OK, so the first line deletes the special characters, and I added a second line that replaces each \r or \n with a space.

$tempBody		= strtolower(str_replace( array("!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "-", "+", "{", "}", ";", ":", "<", ">", "?", ",", "."), '', $body));
$tempBody		= strtolower(str_replace( array("\r", "\n" ), ' ', $tempBody));

When I print out the array, the original problem is gone: the last word of one sentence is no longer combined with the first word of the next. The only problem now is that some elements are empty, and they occur exactly at the end of each paragraph, like so:

[42] => children
[43] => to
[44] => care
[45] => for
[46] => 
[47] => i
[48] => know
[49] => that
[50] => my

Not a huge deal, as I’m only putting the paragraph into an array so I can test each word against a list of keywords. So an empty element is something I can live with, unless you have another idea.
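If the empty elements ever do get in the way, one option is to drop them with array_filter() before comparing against the keywords. This is only a sketch: it assumes the keywords are held in a plain PHP array, and `$keywords` and its contents are hypothetical.

```php
<?php
// Words as produced by explode(), including the empty element at a paragraph break.
$words = array('children', 'to', 'care', 'for', '', 'i', 'know', 'that', 'my');

// Hypothetical keyword list; replace with your own.
$keywords = array('care', 'know', 'school');

// Keep every word that is not the empty string. (array_filter() without a
// callback would also drop the word "0", since PHP treats "0" as falsy.)
$words = array_values(array_filter($words, function ($word) {
    return $word !== '';
}));

// Keywords that occur in the paragraph, in the order they appear in $words.
$matches = array_values(array_intersect($words, $keywords));
print_r($matches); // care, know
```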

Thanks!

You are replacing both “\r” and “\n” with a space.
Some operating systems (Windows, for example) use “\r\n” for newlines, so each line break turns into two spaces.
If you then split on single spaces, you get empty elements.
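A minimal reproduction of where the empty element comes from, assuming a Windows-style “\r\n” line break in the input:

```php
<?php
// Two lines separated by a Windows-style "\r\n" line break.
$text = "care for\r\ni know";

// Replacing "\r" and "\n" separately turns the one line break into two spaces.
$spaced = str_replace(array("\r", "\n"), ' ', $text);

// Splitting on a single space leaves an empty string between those two spaces.
$words = explode(' ', $spaced);
print_r($words);
```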

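I think if you add “\r\n” as the first element of that array, it should work better: str_replace() works through the search array in order, so each “\r\n” pair becomes a single space before the lone “\r” and “\n” are handled.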
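A short sketch of that ordering, using the same kind of Windows-style input:

```php
<?php
$text = "care for\r\ni know";

// str_replace() applies the search array from left to right, so "\r\n" is
// turned into a single space before the lone "\r" and "\n" entries are tried.
$spaced = str_replace(array("\r\n", "\r", "\n"), ' ', $text);

$words = explode(' ', $spaced);
print_r($words);
```

Note that a blank line between paragraphs (“\r\n\r\n”) still becomes two spaces this way, so it can still leave one empty element per paragraph break.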

Thank you, Mittineague, that seems to have done the trick after I took what you gave and tweaked it for a Mac. Here is the code:

$tempBody		= strtolower(str_replace( array("\n\n"), ' ', $body));
$tempBody		= strtolower(str_replace( array("!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "-", "+", "{", "}", ";", ":", "<", ">", "?", ",", "."), '', $tempBody));

strtolower() only needs to be run once; running it a second time doesn’t do anything.

Good catch, removing it now.
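As an alternative to chaining str_replace() calls for the line breaks (a different technique from the one used in this thread, shown only as a sketch), preg_split() can split on any run of whitespace, which covers “\n”, “\r\n” and blank lines alike, and its PREG_SPLIT_NO_EMPTY flag discards empty pieces. The punctuation list is the one from the code above, and strtolower() is called once; the sample `$body` text is made up for illustration.

```php
<?php
// Sample body: two paragraphs separated by a blank line with Windows line endings.
$body = "This is the first paragraph.\r\n\r\nThis is the second one!";

// Same punctuation list as the thread's str_replace() call (single quotes so
// '$' is never read as the start of a variable).
$punctuation = array('!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '-', '+',
                     '{', '}', ';', ':', '<', '>', '?', ',', '.');

// Delete punctuation and lowercase once.
$clean = strtolower(str_replace($punctuation, '', $body));

// Split on any run of whitespace (spaces, tabs, \r, \n). PREG_SPLIT_NO_EMPTY
// drops empty pieces, such as one produced by leading or trailing whitespace.
$words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
print_r($words);
```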

Thank you.
