Any ideas on how to compare and merge array values?

Hello all,

I am importing some fields from a CSV file and also pulling some fields from a database.

I would like to know if there is a way to compare the values of two arrays character by character, so that the field can be updated with their concatenation but without duplication.

$subject value might be “1948”
$output value could be “1948 the next war”

Ideally I would like to combine these after comparing them, without duplicating the content of the first variable, $subject.

I found some information on array_filter, but I don’t know whether it can be used in this case together with array_merge.
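One way array_filter and array_merge could work together here is sketched below. It is a minimal sketch, not a finished solution: it assumes a value from the first array is redundant when some value in the second array already starts with it, which covers both exact duplicates and the “1948” / “1948 the next war” case. The function name mergeWithoutDuplicates is hypothetical.

```php
<?php
// Hypothetical helper. Assumption: a value from $first is redundant when some
// value in $second already starts with it (this also covers exact duplicates;
// empty strings are dropped as well, since every string starts with "").
function mergeWithoutDuplicates(array $first, array $second): array
{
    $first  = array_map('trim', $first);
    $second = array_map('trim', $second);

    // Keep only the values from $first that no value in $second begins with
    $kept = array_filter($first, function ($value) use ($second) {
        foreach ($second as $other) {
            if (strncmp($other, $value, strlen($value)) === 0) {
                return false;
            }
        }
        return true;
    });

    // array_merge renumbers the keys that array_filter leaves with gaps
    return array_merge($kept, $second);
}

print implode(', ', mergeWithoutDuplicates(['1948'], ['1948 the next war'])) . "\n";
print implode(', ', mergeWithoutDuplicates(['1948 gothem story'], ['1948 the next war'])) . "\n";
```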

I have put together a test snippet as follows:


//$row["subject"] = $row["subject"];
//$marc["606"]     = $output;
//array_filter($array2, "even")

print "output " . $output . " " . strlen($output) . "<br />";
if (trim($subject) != trim($output)) {
    // Values differ: append $output to $subject, separated by a comma
    $subject .= ", " . $output;
    //$typographer = substr($output, 0, strrpos($output, ","));
    $marc["606"] = $subject;
    print "not equal " . $marc["606"] . " " . strlen($marc["606"]) . "<br />";
} else {
    // Values are identical: keep a single copy
    $marc["606"] = $output;
    print "equal " . $marc["606"] . " " . strlen($marc["606"]) . "<br />";
}

Thank you,
Peter

I find your question vague.

Maybe you could provide some carefully chosen sample input and output to help express yourself.

Well, using the example variables shown above:

If the $subject value is “1948” and the $output value is “1948 the next war”, the result should simply be “1948 the next war”.

But if the values differ, say the $subject value is “1948 gothem story” and the $output value is “1948 the next war”,

then I want to merge them as
“1948 gothem story, 1948 the next war”

rather than
“1948, 1948 the next war”
or
“1948 gothem story, the next war”

I hope that this example is clearer.

Like this?


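// If $output already starts with $subject, keep $output on its own;
// otherwise join the two values with a comma.
if (0 === strpos($output, $subject)) {
    $result = $output;
} else {
    $result = "$subject, $output";
}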

If you want to do this automatically on an array of values, it sounds like you want something like:
“remove any array element that is a left prefix of any other array element”
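A minimal sketch of that rule, using a strncmp prefix check in place of strpos (the function name removePrefixElements is hypothetical). Exact duplicates are collapsed first with array_unique; otherwise two identical strings, each a prefix of the other, would remove each other.

```php
<?php
// Hypothetical helper: drop every element that is a strict left prefix of
// another element. Exact duplicates are collapsed first so that two equal
// strings do not eliminate each other.
function removePrefixElements(array $values): array
{
    $values = array_values(array_unique(array_map('trim', $values)));

    $kept = array_filter($values, function ($candidate) use ($values) {
        foreach ($values as $other) {
            // Strict prefix: $other is longer and begins with $candidate
            if (strlen($other) > strlen($candidate)
                && strncmp($other, $candidate, strlen($candidate)) === 0) {
                return false;
            }
        }
        return true;
    });

    return array_values($kept);
}

$values = ['1948', '1948 the next war', '1948 gothem story', '1948 the next war'];
print implode(', ', removePrefixElements($values)) . "\n";
```

The check is on raw string prefixes, not whole words, so “1948 the” would also be dropped in favour of “1948 the next war”.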

Thanks, but that code is not what I am trying to figure out.

What I want to know is whether it is possible to scan the first array and compare it with the second, and which array functions one might use to do it.

The only other way I can think of is to pull the array data into temporary variables and then try a comparison with string functions.
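If the two arrays line up position by position, array_map can scan both at once and apply the prefix rule from earlier in the thread to each pair, without separate temporary variables. This is a sketch under that assumption; mergePairs is a hypothetical name, and a missing entry in the shorter array arrives as null, which is treated as an empty string.

```php
<?php
// Hypothetical helper: pair up two arrays by position with array_map and
// apply the "keep $output if it already starts with $subject" rule.
function mergePairs(array $subjects, array $outputs): array
{
    return array_map(function ($subject, $output) {
        // array_map pads the shorter array with null; treat that as ''
        $subject = trim((string) $subject);
        $output  = trim((string) $output);

        if ($output === '') {
            return $subject;
        }
        if ($subject === '' || strpos($output, $subject) === 0) {
            return $output; // $output already starts with $subject
        }
        return "$subject, $output";
    }, $subjects, $outputs);
}

print_r(mergePairs(
    ['1948', '1948 gothem story'],
    ['1948 the next war', '1948 the next war']
));
```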

You’re probably going to have to do what you said: a string comparison.

But if you want help with arrays, it would be best to post some test values and be explicit about what you want: something people can just copy and paste.

Making each of us write out our own (possibly different) test cases is a barrier to getting definitive answers.


$a = array(
  '1948'=>'1948 was not 1984'
, 'and' => 'so on'
);
$b = array(
  '1948 gothem story'
, '1948 the next war'
);


What are the keys? Are they important or relevant? Is the above a meaningful representation of your sample data? If not, think carefully about the edge cases you might come across, and, as I said, be specific about what the result should be.

Hello Cups,

No, the keys are not important. The actual data comes from an exported file and is a multi-dimensional array with a variable structure.

I am not sure, but I think the structure of the file has a problem, and that is why some of the data is partially duplicated in nested sub-arrays.

I will have to think about this one, because from what I have found on the forum, the comparison I have in mind is not that easy.

Thanks
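Since the keys don’t matter, one option for the nested structure is to flatten it with array_walk_recursive and then drop exact repeats with array_unique. This sketch assumes the leaf values are strings; flattenUnique and the sample $record are made up for illustration. It only removes whole duplicates; partially duplicated values still need a prefix or overlap comparison.

```php
<?php
// Hypothetical helper. Assumption: leaf values are strings and keys can be
// ignored. Collects every leaf, then removes exact repeats.
function flattenUnique(array $record): array
{
    $leaves = [];
    array_walk_recursive($record, function ($value) use (&$leaves) {
        $leaves[] = trim((string) $value);
    });
    // array_unique keeps the first occurrence; array_values renumbers
    return array_values(array_unique($leaves));
}

// Made-up nested test data
$record = [
    'subject' => ['1948 the next war', ['1948 the next war', '1948 gothem story']],
    'notes'   => ['1948 gothem story'],
];
print_r(flattenUnique($record));
```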

The point I was trying to make is that I gave you a working solution to the example you posted. I won’t continue asking you for a better example set of input/output or explanation.

crmalibu,

I tried your suggestion and even made a small change as follows:


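// Keep $output alone if it already starts with $rights; otherwise join both.
// mb_strpos($haystack, $needle): search $output for $rights.
if (0 === mb_strpos($output, $rights)) {
    $result = $output;
} else {
    $result = "$rights, $output";
}
print str_repeat("-", 100) . "<br />\n";
print $rights . "<br /><br />";
print $output . "<br /><br />";
print $result . "<br /><br />";
print str_repeat("$", 100) . "<br />\n";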

but unfortunately the results were still the same:


----------------------------------------------------------------------------------------------------
Βενάρδος, Χ., Καθηγητής Μαθηματικών

Καθηγητής Μαθηματικών, Καθηγητής Φιλοσοφίας

Βενάρδος, Χ., Καθηγητής Μαθηματικών, Καθηγητής Μαθηματικών, Καθηγητής Φιλοσοφίας

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

I am looking at two alternatives: string comparison and array_unique.

Thank you

I just finished trying array_unique and array_combine, but they don’t appear to work properly with UTF-8.
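For this kind of record, array_unique may work if it is applied to the comma-separated entries rather than to whole strings: the two fields share the entry “Καθηγητής Μαθηματικών”, but neither string is a prefix of the other, so a strpos test cannot catch it. array_unique compares entries as byte strings, so identical UTF-8 text matches. A sketch, assuming commas only ever separate entries (mergeFieldLists is a hypothetical name):

```php
<?php
// Hypothetical helper. Assumption: a comma separates entries and never
// occurs inside one. Note this splits "Βενάρδος, Χ." into two entries,
// which is harmless here since both are kept.
function mergeFieldLists(string $first, string $second): string
{
    $parts = array_merge(explode(',', $first), explode(',', $second));
    $parts = array_map('trim', $parts);
    $parts = array_filter($parts, 'strlen'); // drop empty entries
    // array_unique compares entries as strings (byte for byte)
    return implode(', ', array_unique($parts));
}

$rights = 'Βενάρδος, Χ., Καθηγητής Μαθηματικών';
$output = 'Καθηγητής Μαθηματικών, Καθηγητής Φιλοσοφίας';
print mergeFieldLists($rights, $output) . "\n";
// Βενάρδος, Χ., Καθηγητής Μαθηματικών, Καθηγητής Φιλοσοφίας
```

This removes a repeated entry wherever it appears in the combined list, not only where the two fields meet.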

Hi Peter,

I’m going to reiterate what the others are saying, I’m afraid.

As per Cups’ excellent example, you’re going to have to put together a small, self-contained code sample for us.

By the sound of it, posting two records from each of the potential sources would do. Then post what you would like to be produced from those examples, detailing how you get there.

For example:

I have flour, I have eggs, I’d like pancakes. :wink:

Hello Anthony, crmalibu, and Cups,

Sorry for the delay, but I had to do some testing, and I finally put together a short list of records with duplicates to understand this better.

From what I see, it looks like the duplication problem can be solved.

Anyway, I am attaching a short list of records to test.
I think a solution might be a string comparison that works backward from the commas, looks for exact matches, and leaves only one copy of each duplicate per record.
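A sketch of that backward-from-the-commas comparison, assuming entries are separated by commas: it looks for the longest run of entries at the end of the first field that exactly matches a run at the start of the second, and keeps that run once. mergeOverlap is a hypothetical name.

```php
<?php
// Hypothetical helper. Assumption: entries are comma-separated.
// Finds the longest suffix of $first's entries that equals a prefix of
// $second's entries and emits that overlap only once.
function mergeOverlap(string $first, string $second): string
{
    $a = array_map('trim', explode(',', $first));
    $b = array_map('trim', explode(',', $second));

    // Try the longest possible overlap first, then shorter ones
    for ($len = min(count($a), count($b)); $len > 0; $len--) {
        // array_slice renumbers keys, so === compares entries in order
        if (array_slice($a, -$len) === array_slice($b, 0, $len)) {
            return implode(', ', array_merge($a, array_slice($b, $len)));
        }
    }
    // No overlap: keep both fields in full
    return implode(', ', array_merge($a, $b));
}

print mergeOverlap('Βενάρδος, Χ., Καθηγητής Μαθηματικών',
                   'Καθηγητής Μαθηματικών, Καθηγητής Φιλοσοφίας') . "\n";
```

Unlike running array_unique over all entries, this only removes the repetition at the seam between the two fields; a repeat elsewhere, as in mergeOverlap('a, b', 'c, a'), is kept.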

One problem I foresee is comparing UTF-8 text for an exact match.
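On the UTF-8 question: PHP strings are byte strings, so === already matches identical UTF-8 text, and the mb_* functions are not needed for equality. What can break an exact match is the same character stored in two forms, such as a precomposed “ά” versus “α” plus a combining accent. A sketch that normalizes only if the intl extension’s Normalizer class happens to be installed (sameEntry is a hypothetical name):

```php
<?php
// Hypothetical helper: exact comparison of two UTF-8 entries.
// === compares bytes, which is enough for identical UTF-8 text. If the intl
// extension is available, both sides are first normalized to NFC so that
// precomposed and decomposed accents compare equal.
function sameEntry(string $a, string $b): bool
{
    $a = trim($a);
    $b = trim($b);
    if (class_exists('Normalizer')) {
        $na = Normalizer::normalize($a, Normalizer::FORM_C);
        $nb = Normalizer::normalize($b, Normalizer::FORM_C);
        // normalize() returns false on invalid UTF-8; keep the raw bytes then
        if ($na !== false && $nb !== false) {
            $a = $na;
            $b = $nb;
        }
    }
    return $a === $b;
}

var_dump(sameEntry('Καθηγητής Μαθηματικών', ' Καθηγητής Μαθηματικών ')); // bool(true)
var_dump(sameEntry('Καθηγητής Μαθηματικών', 'Καθηγητής Φιλοσοφίας'));    // bool(false)
```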

Peter

Hi Peterb,

It seems like you have just solved the problem.

I just thought of an alternative solution. From your example above, it seems that every record you want to output follows a format:

input: “[year] [words]”
output: “[year] [words], [year] [words]”

So, instead of trying to compare character by character, you can put a record into a string, then extract the [year] part of that record.

Hope you understand what I’ve said so far. :smiley:

Then you can do the comparison.

Also, for concatenating, you just check whether the string contains [words] or not.
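A sketch of that idea, assuming the [year] is always four digits at the start of the record (splitRecord and mergeByYear are hypothetical names). Records without a leading year get a null year.

```php
<?php
// Hypothetical helper: split "[year] [words]" into [year, words].
// Assumption: the year is four digits at the very start of the record.
function splitRecord(string $record): array
{
    if (preg_match('/^(\d{4})\s*(.*)$/u', trim($record), $m)) {
        return [$m[1], $m[2]];
    }
    return [null, trim($record)];
}

// Keep $output alone when the years match and $output's words already
// contain $subject's words (or $subject has no words); otherwise concatenate.
function mergeByYear(string $subject, string $output): string
{
    [$sYear, $sWords] = splitRecord($subject);
    [$oYear, $oWords] = splitRecord($output);

    if ($sYear === $oYear && ($sWords === '' || strpos($oWords, $sWords) !== false)) {
        return trim($output);
    }
    return trim($subject) . ', ' . trim($output);
}

print mergeByYear('1948', '1948 the next war') . "\n";
print mergeByYear('1948 gothem story', '1948 the next war') . "\n";
```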

Let me know if you need more explanation. I gotta go! :slight_smile: