Preg-Split - explode word and punctuation but keep apostrophes with words

Booktagon is building a library of the best books, each with a summary description. I also split each description into tag words so that people can search for the books.

However, I am having a hard time splitting the descriptions into individual words while keeping apostrophes attached to their words.

The following preg_split code splits words and punctuation into individual portions, which is what I want, but it also splits apart words with apostrophes, which I do not want.

$commentArray = preg_split('/\\s|(\\W+)/', $trimmedText, -1, PREG_SPLIT_DELIM_CAPTURE);

Example: "The dog’s bone was dirty." should become [0] The [1] dog’s [2] bone [3] was [4] dirty [5] .
Six elements total: five words plus one period.

Any help in keeping the apostrophes with the words while also dividing the rest of the sentence into individual words + punctuation would be greatly appreciated.

Try this:

$aa = "The dog's bone was dirty.";
# wanted: [0] The [1] dog's [2] bone [3] was [4] dirty [5] .
$aa = explode(' ', $aa); # splits on spaces only, so the period stays attached to "dirty"
echo '<pre>'; print_r($aa); echo '</pre>';

Output:

Array
(
    [0] => The
    [1] => dog's
    [2] => bone
    [3] => was
    [4] => dirty.
)

Version #2 - eliminate duplicates

$aa = "The dog's bone was dirty. duplicated word duplicated word duplicated word duplicated word";
# wanted: [0] The [1] dog's [2] bone [3] was [4] dirty [5] . plus each repeated word once
$aa = explode(' ', $aa);
$aa = array_flip($aa); # words become keys, so repeats collapse; each value is the word's last position
echo '<pre>'; print_r($aa); echo '</pre>';

Output:

Array
(
    [The] => 0
    [dog's] => 1
    [bone] => 2
    [was] => 3
    [dirty.] => 4
    [duplicated] => 11
    [word] => 12
)
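
If a plain list of unique words is more convenient than an array keyed by word, a minimal sketch using array_unique() and array_values() follows. The variable names are only illustrative, and the period still stays attached to "dirty" because explode() splits on spaces only.

```php
<?php
$text = "The dog's bone was dirty. duplicated word duplicated word";

// Split on spaces, exactly as in the reply above.
$words = explode(' ', $text);

// array_unique() keeps the first occurrence of each word;
// array_values() then renumbers the keys from 0.
$uniqueWords = array_values(array_unique($words));

print_r($uniqueWords);
```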

Excellent. Is there a way to separate that period . from “dirty”?

Basically, it seems that the regex class \W treats the apostrophe the same way it treats every other non-alphanumeric character. What I need is a way to have the apostrophe count as a word character (so that it stays with its word) without the rest of the punctuation being treated that way (so that it still separates).
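
One possible way to get that behaviour is to keep preg_split but replace \W with a negated character class that excludes word characters, whitespace, and both the straight (') and curly (’) apostrophe. The sketch below assumes UTF-8 input (hence the /u modifier) and PHP 7 or later; the function name splitWordsAndPunctuation is made up for illustration.

```php
<?php
// Hypothetical helper, not from the original post: splits text into words
// and punctuation while leaving apostrophes attached to their words.
function splitWordsAndPunctuation(string $text): array
{
    // \s+              : whitespace separates tokens and is discarded
    // ([^\w\s'’]+)     : runs of punctuation other than apostrophes are
    //                    captured so they come back as their own tokens
    $pattern = '/\s+|([^\w\s\'’]+)/u';

    $parts = preg_split(
        $pattern,
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    // preg_split() returns false on error; fall back to an empty list.
    return $parts ?: [];
}

print_r(splitWordsAndPunctuation("The dog’s bone was dirty."));
// Tokens: The, dog’s, bone, was, dirty, .
```

Because the apostrophe is simply excluded from the punctuation class, quote marks written with the same character (as in 'hello') also stay attached to the word, and consecutive marks such as "!!" come back as one token. Whether that is acceptable depends on the descriptions being tagged.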

Try this:

  $aa = str_replace(array("'", '!', '?'), '', $aa);

http://php.net/manual/en/function.str-replace.php
