How to search and highlight words of the same meaning such as "pride", "proud"

I would like to highlight searched words that have the same meaning and that two texts have in common. Let’s say “pride” appears in one text and “proud” appears in the other: I would like to see both in bold and in the same color.

This will highlight words that are found in one or more passages of text (this example uses three).

The passages are passed in as an array.


$passages = [
    'passage one',
    'passage two'
];

Keywords are also passed as an array.

$words = [
    'throne',
    'judgment'
];

Words that share the same meaning are grouped as an array. These words will also be highlighted with the same color.

$words = [
    'throne',
    ['pride', 'proud'],
    'judgment'
];

This example will also recycle the colors. For example, if you have a selection of 5 colors but 6 words are matched, the sixth word will be highlighted using the first color.
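The wrap-around can also be written with the modulo operator instead of resetting a counter. A minimal sketch; `colorFor` is a hypothetical helper, not part of the code in this thread:

```php
<?php
// Hypothetical helper: pick the color for the n-th matched word group,
// wrapping back to the start of the palette once every color has been used.
function colorFor(int $groupIndex, array $colors): string
{
    return $colors[$groupIndex % count($colors)];
}

$colors = ['red', 'green', 'blue', 'yellow', 'orange'];
foreach (range(0, 5) as $i) {
    echo 'group ', $i + 1, ': ', colorFor($i, $colors), PHP_EOL;
}
// group 6 gets "red" again, the first color in the palette
```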

function highlight($passages, array $words, array $colors, $ignoreCase = false)
{
    $wordToColor = [];
    $colorIdx = 0;
    $highlighted = [];

    foreach ($passages as $passage) {
        foreach ($words as $wordGroup) {
            // Escape regex metacharacters so every word is matched literally.
            $quoted = array_map(function ($w) {
                return preg_quote($w, '~');
            }, (array) $wordGroup);
            $re = '~\\b('.implode('|', $quoted).')\\b~';

            if ($ignoreCase) {
                $re .= 'i';
            }

            $passage = preg_replace_callback($re, function ($m) use ($re, $colors, &$wordToColor, &$colorIdx) {
                $word = $m[0];
                if (!isset($wordToColor[$re])) {
                    $wordToColor[$re] = $colors[$colorIdx];
                    $colorIdx = $colorIdx < count($colors) - 1 ? $colorIdx + 1 : 0;
                }

                $color = $wordToColor[$re];

                return sprintf('<b style="color:%s">%s</b>', $color, $word);
            }, $passage);
        }

        $highlighted[] = $passage;
    }

    return $highlighted;
}

$passages = [
    'Once more unto the breach, dear friends, once more. Or close the wall up with our English dead.',
    'On, on, you noblest English, whose blood is fet from fathers of war-proof!',
    "Follow your spirit, and upon this charge cry 'God for Harry, England, and Saint George!"
];

$words = [
    'once more',
    'god',
    ['england', 'english'],
    'blood',
    'spirit',
];

$colors = ['red', 'green', 'blue', 'yellow'];

$highlighted = highlight($passages, $words, $colors, true);

$p = array_reduce($highlighted, function ($html, $passage) {
    return $html .= "<p>$passage</p>";
}, '');

echo <<< EOF_HTML
  <!doctype html>
  <html>
    <head>
      <meta charset="utf-8">
      <title>Highlight</title>
      <style>
        p {
          margin: 16px;
        }
      </style>
    </head>
    <body>
    $p
    </body>
  </html>
EOF_HTML;
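One thing to watch with the function above: it runs one `preg_replace_callback` per keyword group over text that earlier passes have already changed, so a keyword such as `color` or `red` can match inside the `<b style="color:red">` markup that was just inserted. A possible alternative, sketched here under the assumption that the same `$words`/`$colors` format is used, builds a single regex with one named group per word group and scans each passage once. `highlightOnePass` is a hypothetical name.

```php
<?php
// Sketch: highlight every keyword group in ONE preg_replace_callback call per
// passage. The passage is scanned once, so a keyword can never match inside
// <b style="..."> markup that an earlier replacement inserted.
function highlightOnePass(array $passages, array $words, array $colors, bool $ignoreCase = false): array
{
    $words = array_values($words);
    if ($words === [] || $colors === []) {
        return $passages; // nothing to highlight
    }

    // One named group per word group: (?<g0>england|english)|(?<g1>color)|...
    $alternatives = [];
    foreach ($words as $i => $group) {
        $quoted = array_map(function ($w) {
            return preg_quote($w, '~');
        }, (array) $group);
        $alternatives[] = '(?<g' . $i . '>' . implode('|', $quoted) . ')';
    }
    $re = '~\b(?:' . implode('|', $alternatives) . ')\b~' . ($ignoreCase ? 'i' : '');

    $groupColor = []; // word-group index => color, shared by all passages
    $next = 0;
    $highlighted = [];
    foreach ($passages as $passage) {
        $highlighted[] = preg_replace_callback($re, function ($m) use ($words, $colors, &$groupColor, &$next) {
            // The first non-empty named group tells us which word group matched.
            $i = 0;
            while ($i < count($words) - 1 && (!isset($m['g' . $i]) || $m['g' . $i] === '')) {
                $i++;
            }
            if (!isset($groupColor[$i])) {
                // Recycle colors when there are more groups than colors.
                $groupColor[$i] = $colors[$next % count($colors)];
                $next++;
            }
            return sprintf('<b style="color:%s">%s</b>', $groupColor[$i], $m[0]);
        }, $passage);
    }
    return $highlighted;
}

$out = highlightOnePass(
    ['I speak English.', "I'm from England. Red is a color."],
    [['england', 'english'], 'color', 'red'],
    ['red', 'green', 'blue'],
    true
);
echo implode(PHP_EOL, $out), PHP_EOL;
```

Related words still share one color because the color is stored per word group, not per matched word.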

I haven’t been following the earlier thread on this topic, but I can’t understand why your solution has highlighted “Once more” when it only appears in one of the phrases. Or “Blood”, for that matter. What did I miss?

Here is an updated version that can either highlight words only if they appear in all of the passed passages, or highlight them if they appear in one or more of the passages.

function highlight($passages, array $words, array $colors, $ignoreCase = false, $mustMatchAllPassages = false)
{
    $wordToColor = [];
    $colorIdx = 0;
    $highlighted = [];

    $regExp = array_map(function ($wordGroup) use ($ignoreCase) {
        // Escape regex metacharacters so every word is matched literally.
        $quoted = array_map(function ($w) {
            return preg_quote($w, '~');
        }, (array) $wordGroup);
        $re = '~\\b('.implode('|', $quoted).')\\b~';
        if ($ignoreCase) {
            $re .= 'i';
        }
        return $re;
    }, $words);

    if ($mustMatchAllPassages) {
        $regExp = array_filter($regExp, function ($re) use ($passages) {
            foreach ($passages as $passage) {
                if (!preg_match($re, $passage)) {
                    return false;
                }
            }
            return true;
        });
    }

    foreach ($passages as $passage) {
        foreach ($regExp as $re) {
            $passage = preg_replace_callback($re, function ($m) use ($re, $colors, &$wordToColor, &$colorIdx) {
                $word = $m[0];
                if (!isset($wordToColor[$re])) {
                    $wordToColor[$re] = $colors[$colorIdx];
                    $colorIdx = $colorIdx < count($colors) - 1 ? $colorIdx + 1 : 0;
                }

                $color = $wordToColor[$re];

                return sprintf('<b style="color:%s">%s</b>', $color, $word);
            }, $passage);
        }

        $highlighted[] = $passage;
    }

    return $highlighted;
}

$passages = [
    'But when the blast of war blows in our ears then imitate the action of the tiger. Stiffen the sinews, summon up the blood.',
    'On, on, you noblest English, whose blood is fet from fathers of warproof!',
    'Be copy now to men of grosser blood, and teach them how to war.'
];

$words = [
    'god',
    ['war', 'warproof'],
    'blood',
    'sinews',
    ['england', 'english'],
    ['blast', 'noblest', 'grosser'],
];

$colors = ['red', 'green', 'blue', 'orange'];

$highlighted1 = highlight($passages, $words, $colors, true, true);
$highlighted2 = highlight($passages, $words, $colors, true, false);

$html = "<h1>Keywords</h1><p>'god', ['war', 'warproof'], 'blood', 'sinews', ['england', 'english'], ['blast', 'noblest', 'grosser']</p>";
$html .= '<h2>Keywords must appear in all passages</h2>';
$html = array_reduce($highlighted1, function ($html, $passage) {
    return $html .= "<p>$passage</p>";
}, $html);
$html .= '<h2>Keywords can appear in one or more of the passages</h2>';
$html = array_reduce($highlighted2, function ($html, $passage) {
    return $html .= "<p>$passage</p>";
}, $html);

echo <<< EOF_HTML
  <!doctype html>
  <html>
    <head>
      <meta charset="utf-8">
      <title>Highlight</title>
      <style>
        h1, h2 {
          font-size: 16px;
        }
        p {
          margin: 16px;
        }
      </style>
    </head>
    <body>
    $html
    </body>
  </html>
EOF_HTML;

That’s not what I’m looking for. If both texts mention either England or English, though, it should highlight them. Let’s say text 1 says:

I speak English.

and text 2 says

I’m from England.

it should highlight both with one color. But only the words I search for should be highlighted, e.g. when I search for England,English.

Within my URL I have semicolons separating unrelated words, and commas separating words that are somehow similar, like England and English.

Don’t let how the data is passed into the code dictate what you can or can’t do with it. Parse and convert the data if it’s not in a format the function accepts. Write some code that takes your $_GET data and converts it into an array of words.

I have already. But this highlights only if the searched word is found in both texts.

    $kw = isset($_GET["keywords"]) ? $_GET["keywords"] : ""; //avoids an undefined $kw when no keywords are passed

    $keywords = explode(";", $kw); //breaks the unrelated keywords into an array
    $keyword_color = array(); //creates an empty array to put together the keyword and its color
    $kwNotFound = array(); //creates an empty array to put together the keyword not found and its color
    $keywordArr = array(); //holds the groups of related words
    $text = "";
    for($j=0; $j < count($id); $j++){ //counting the db table id
        $text = $text.stripslashes($textData[$j]); // loads the results from the db table for the 1st text
    }
    $text2 = "";
    for($j=0; $j < count($id2); $j++){
        $text2 = $text2.stripslashes($textData2[$j]); // loads the results from the db table for the 2nd text
    }

    for($i=0; $i < count($keywords);$i++){ //counts the number of keywords in total
        $keywordArr[$i] = explode(",", $keywords[$i]); // splits the resembling words into an array. This and what follows is probably what I need to work on.


        for($c=0; $c < count($keywordArr[$i]);$c++){
            if ((preg_match("/".$keywordArr[$i][$c]."/i", $text)) && (preg_match("/".$keywordArr[$i][$c]."/i", $text2))) {
                array_push($keyword_color, array($keywordArr[$i][$c], $colors[$i])); //stores keywords found and their assigned color in the array
            } else {
                array_push($kwNotFound, array($keywordArr[$i][$c], $colors[$i])); //stores keywords NOT found and their assigned color in the array
            }
        }
    }

// the rest writes out the list of keywords found.
    $kwstring = "";
    $wordsfound = "";
    for($j=0; $j < count($keyword_color);$j++){
        if($j > count($keyword_color)-2){
            $kwstring .= "<br />+ ";
            $wordsfound .= " and \n";
        }
        $kwstring .= $keyword_color[$j][0];
        $wordsfound .= "<span style=\"font-weight: bold; color: ".$keyword_color[$j][1]."\">".$keyword_color[$j][0]."</span>";
        if($j < count($keyword_color)-2){
            $kwstring .= "<br />+ ";
            $wordsfound .= ", \n";
        }
    }
    $wordsfound .= ". \n";

At this point I’m a bit confused. Is it my example code that is not highlighting correctly or is it your code that you have just shown? If it’s your code you may want to look at the line

if ((preg_match("/".$keywordArr[$i][$c]."/i", $text)) && (preg_match("/".$keywordArr[$i][$c]."/i", $text2))) {

This reads as “if the word is in both texts then we have found a match”. You may need to do this instead:

  if ((preg_match("/".$keywordArr[$i][$c]."/i", $text)) || (preg_match("/".$keywordArr[$i][$c]."/i", $text2))) {

This now says “if the word is in either text then we have found a match”. Without trying your code I can’t be 100% sure.

Yes, that’s my code. I would like it fixed, or to know if there’s a better/simpler way to write it.

OK. Let’s say the word “pride” appears in one text but “proud” appears in the other text: it should highlight both.

Or let’s say I’m looking for a combination of words that are either synonyms or share a root, for example “sit”, “sat”, “sitting”, or perhaps even “seat”, to appear in one color. If one of these words appears in one text and one of them appears in the other, they should be highlighted.
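If I read the requirement right, the test has to be made per group of related words rather than per word: a group counts as common when some word of the group is in text 1 and some (possibly different) word of the same group is in text 2. The `&&` check in the code above looks at one word at a time, so “English” in one text and “England” in the other never matches. A rough sketch, with hypothetical helper names `groupMatches` and `commonGroups`:

```php
<?php
// Hypothetical helper: true if ANY word of the group occurs in the text.
function groupMatches(array $group, string $text): bool
{
    foreach ($group as $word) {
        // \b stops "war" matching inside "award"; preg_quote escapes metacharacters.
        if (preg_match('~\b' . preg_quote($word, '~') . '\b~i', $text)) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: keep only the groups that are found in BOTH texts,
// where each text may contain a different word from the same group.
function commonGroups(array $groups, string $text1, string $text2): array
{
    return array_values(array_filter($groups, function ($group) use ($text1, $text2) {
        return groupMatches($group, $text1) && groupMatches($group, $text2);
    }));
}

$groups = [['england', 'english'], ['pride', 'proud'], ['sit', 'sat', 'sitting', 'seat']];
$common = commonGroups(
    $groups,
    'I speak English and I am proud of it.',
    "I'm from England. Pride comes before a fall."
);
print_r($common);
```

The groups it returns could then be passed as `$words` to the `highlight()` function posted earlier, so each common group gets a single color.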

I have to ask this, and please don’t take offence, but did you even try my examples? Unless I missed something my code should be doing what you asked for.

Why do you want me to try something that’s not what I’m looking for? The words you highlighted are in different colors in the highlight2.png picture.

Blast is in blue and so is noblest and grosser and they are not related to each other. The words related to each other should be highlighted by the same color such as “write”, “written”, “writing”, “wrote”.

I’ve explained it so many times.

In my second example, blast, noblest and grosser are highlighted in the same color because in my example they are related. I appreciate that in a dictionary they don’t share the same meaning. The words were just chosen to demonstrate the code.

$words = [
    'god',
    ['war', 'warproof'],
    'blood', // Word on its own that has no other meanings.
    'sinews',
    ['england', 'english'],
    ['blast', 'noblest', 'grosser'], // Array indicates related words.
];

Sorry if this comes across as harsh. But you asked.

Because when someone is nice enough to give you example code, the nice thing to do is try it.
Because there is a slight chance you may be able to tweak it to exactly what you’re looking for if it doesn’t do so as is.
Because even if you can’t bang it into shape, you just might learn something from studying it.
Because no member here is obligated to give you custom code to meet your rather vague and changing specifications.

If you really want specific help, you should provide specifics.

That is, outline all the conditions and goals as clearly as you can. For example, as best as I have gleaned from scattered posts:

Haystack - paragraphs of text
Needles - array of words and variants, words may contain spaces
Colors - array of color values to be used to highlight needles found in the haystack

all occurrences of a needle and its variants should have the same highlight color
the same color should not be used for different needles
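Taking the last rule literally (no color shared between needles), having more needles than colors means some needles cannot be highlighted. A minimal sketch of that assignment step, with a hypothetical `assignColors` helper that hands out one color per needle group and returns the leftovers instead of recycling:

```php
<?php
// Hypothetical helper: one distinct color per needle group; groups beyond the
// size of the palette are returned as "unassigned" rather than reusing a color.
function assignColors(array $needles, array $colors): array
{
    $needles = array_values($needles);
    $colors = array_values($colors);
    $n = min(count($needles), count($colors));

    $assigned = [];
    for ($i = 0; $i < $n; $i++) {
        // (array) turns a single word into a one-element group.
        $assigned[] = ['words' => (array) $needles[$i], 'color' => $colors[$i]];
    }
    return ['assigned' => $assigned, 'unassigned' => array_slice($needles, $n)];
}

$result = assignColors(['god', ['war', 'warproof'], 'blood'], ['red', 'green']);
print_r($result);
```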

I’m trying your code. Is this an array within an array?
$words = [ 'god', ['war', 'warproof'], 'blood', 'sinews', ['england', 'english'], ['blast', 'noblest', 'grosser'], ];

What is the difference between this and array()? I want to replace it with the array from $_GET["keywords"].

That’s correct. The example below should parse $_GET and convert it into an array that highlight can work with.

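// Assume site was passed the following parameters:
// www.example.com/index.php?keywords=god;war,warproof;blood;sinews;england,english;blast,noblest,grosser
// Split on ";" into groups, then on "," into related words. Empty entries are
// skipped, because an empty word would produce a pattern that matches everywhere.
$groups = array_filter(explode(';', $_GET['keywords'] ?? ''), 'strlen');
$words = array_map(function ($wordGroup) {
    return array_values(array_filter(array_map('trim', explode(',', $wordGroup)), 'strlen'));
}, array_values($groups));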
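On the `[...]` versus `array()` question: the short syntax was added in PHP 5.4 and builds the same array as `array(...)`, so either form works. The sketch below also shows that parsing a query string turns single words into one-element arrays, which the `highlight()` function handles just like bare strings because it calls `implode` on array groups.

```php
<?php
// [...] and array(...) produce identical arrays.
$short = ['god', ['war', 'warproof'], 'blood'];
$long  = array('god', array('war', 'warproof'), 'blood');
var_dump($short === $long); // bool(true)

// Parsing "god;war,warproof;blood" by hand: every group becomes an array,
// including groups that contain only one word.
$fromQuery = array_map(function ($group) {
    return explode(',', $group);
}, explode(';', 'god;war,warproof;blood'));
var_dump($fromQuery === [['god'], ['war', 'warproof'], ['blood']]); // bool(true)
```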

I’m working on what you gave. I’m having second thoughts. My goal is to find common words or phrases between two texts. What I’m doing right now is guessing and searching one by one which words are in common.

But I’m thinking: why not convert the 1st text into an array of words and search for them by highlighting them in the 2nd text? My dilemma is that I will run out of colors in my colors array to highlight them.

Is there something that will differentiate the words in order of importance? If so then sort the array of words from the first text in that order and then stop processing them when you run out of colours.
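One way to do this, assuming “importance” simply means how often a word occurs in the two texts combined: split both texts into words, keep the ones present in both, skip common filler words, and cut the list at the number of available colors. `rankedCommonWords` and its stop-word list are hypothetical.

```php
<?php
// Hypothetical helper: words found in both texts, most frequent first,
// limited to $limit entries (e.g. the number of available colors).
function rankedCommonWords(string $text1, string $text2, int $limit): array
{
    // Assumed stop-word list; extend it to suit your texts.
    $stopWords = ['the', 'a', 'an', 'and', 'of', 'to', 'in', 'is', 'i', 'it'];

    // str_word_count(..., 1) returns the words of the string as an array.
    $count1 = array_count_values(str_word_count(strtolower($text1), 1));
    $count2 = array_count_values(str_word_count(strtolower($text2), 1));

    $scores = [];
    foreach ($count1 as $word => $n) {
        if (isset($count2[$word]) && !in_array($word, $stopWords, true)) {
            $scores[$word] = $n + $count2[$word]; // combined frequency
        }
    }
    arsort($scores); // highest score first, keys preserved
    return array_slice(array_keys($scores), 0, $limit);
}

$t1 = 'The blood of war. War is war.';
$t2 = 'Blood and war, war again.';
print_r(rankedCommonWords($t1, $t2, 2));
```

Passing `count($colors)` as `$limit` means you never need more colors than you have.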

Looking at your post, I think there’s a misunderstanding. Here’s an attachment of the htm file of what I’m trying to do:
Downloads.zip (749.0 KB)

Does PHP have a way of creating colors?
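As far as I know PHP has no built-in palette generator, but CSS accepts `hsl()` colors, so a palette of any size can be generated by stepping the hue. A sketch, assuming fixed saturation and lightness (70% and 40% are arbitrary choices) and spacing hues by the golden angle of about 137.5° so consecutive colors land far apart on the color wheel; `generateColors` is a hypothetical helper:

```php
<?php
// Hypothetical helper: $n CSS hsl() colors with hues spaced by the golden
// angle; saturation and lightness are fixed, assumed values.
function generateColors(int $n, int $saturation = 70, int $lightness = 40): array
{
    $colors = [];
    for ($i = 0; $i < $n; $i++) {
        $hue = fmod($i * 137.508, 360); // keep the hue within 0..360
        $colors[] = sprintf('hsl(%d, %d%%, %d%%)', (int) round($hue), $saturation, $lightness);
    }
    return $colors;
}

print_r(generateColors(5));
```

Something like `generateColors(count($words))` could then be passed as `$colors`, since the value is dropped straight into `style="color:..."`.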