Problems extracting keyword from GA cookie

Morning all - I’m hoping someone might be able to help me with a function problem I’m having.

I’m trying to extract keyword information from the Google Analytics cookie with the following code:


<?php
function parse_ga_cookie($cookie)
{
    $values = sscanf($cookie, '%d.%d.%d.%d.utmcsr=%[^|]|utmccn=%[^|]|utmcmd=%[^|]|utmctr=%[^|]');

    if (count($values) !== 8) {
        // return false; trigger_error(...); ... or whatever you like
        throw new InvalidArgumentException("Cookie value '$cookie' does not conform to the __utmz pattern");
    }

    $keys = array('domain', 'timestamp', 'visits', 'sources', 'campaign', 'source', 'medium', 'keyword');

    return array_combine($keys, $values);
}

// Cookie info
$keyword = print_r(parse_ga_cookie($_COOKIE['__utmz']));
?>

Now unfortunately, the $keyword variable is coming up empty all the time and I was hoping someone might be able to tell me where I’m going wrong.

Having looked through it, I’m wondering if the problem lies with the sscanf() function, which parses a string based on placeholders. However, “%[^|]” looks more like a regular expression, which isn’t valid in sscanf().

If that’s the case, how should I go about changing this? Is it just as simple as replacing sscanf() with preg_match()?

Thanks a lot for your help!

$keyword will contain the return value from print_r() which will be TRUE.

Those %[^|] are not regular expressions. They are sscanf() scansets, a basic character-class placeholder (in this case, “some non-pipe characters”), a perhaps small but important distinction.

Finally, a few questions. What does the print_r() actually output? I’m guessing the expected array. Why would you expect a function that returns an array to return a single value (the keywords)?
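To illustrate the point about print_r(), here is a minimal sketch. The array is a hypothetical stand-in for whatever parse_ga_cookie() returns; the values are only for illustration.

```php
<?php
// Hypothetical stand-in for the array returned by parse_ga_cookie().
$parsed = array('medium' => 'organic', 'keyword' => 'best+ereader+reviews');

// Without a second argument, print_r() echoes the dump and returns true,
// so this variable never holds the array or the keyword.
$printed = print_r($parsed);
var_dump($printed); // bool(true)

// Passing true as the second argument returns the dump as a string instead.
$dump = print_r($parsed, true);

// To keep just the keyword, index the returned array directly.
$keyword = $parsed['keyword'];
echo $keyword, "\n";
```

In other words, print_r() is useful for inspecting the array, but the keyword itself has to be read from the array by key.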

Sorry, you’re right - I posted the wrong snippet. I included the print_r() call like that just to make sure that the array was being formed correctly.

I guess I was originally concerned that nothing was coming through in the keyword section of the array, which is why I asked the question about regular expressions. The “%d” seems to indicate I was looking for an integer, whereas the keyword part of the array will obviously contain a string.

If you think it still looks ok though, I’ll continue to test.

Show us some examples of the cookie values that cause the error, and the output from the function.

Thanks for taking the time to look at this Salathe - it’s much appreciated.

Here’s a screenshot of the populated array the last time that I ran the script.

As you can see, the integers have populated without a problem, but the strings aren’t there, which is why I thought there might have been a problem with the sscanf() function.

Show us some examples of the cookie values that cause the error… (:

Sorry Salathe,

Here you go:

217416581.1296138642.1.1.utmccn=(organic)|utmcsr=google|utmctr=best+ereader+reviews|utmcmd=organic

Ahh, the utm* items are in a different order than required for the sscanf() to read them. A regex, or more involved approach, might be necessary.
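One possible "more involved approach" that does not depend on the order of the utm* items is sketched below. It assumes the cookie always starts with four dot-separated numbers followed by pipe-separated key=value pairs, as in the sample above. The function name parse_utmz_any_order() is made up for this example, and the first four key names are simply carried over from the original code.

```php
<?php
// Hypothetical helper: parses a __utmz value regardless of utm* field order.
function parse_utmz_any_order($cookie)
{
    // Split off the four leading numbers; the fifth part keeps the rest intact.
    $parts = explode('.', $cookie, 5);
    if (count($parts) !== 5) {
        throw new InvalidArgumentException("Cookie value '$cookie' does not conform to the __utmz pattern");
    }

    list($domain, $timestamp, $visits, $sources, $campaign) = $parts;
    $result = array(
        'domain'    => (int) $domain,
        'timestamp' => (int) $timestamp,
        'visits'    => (int) $visits,
        'sources'   => (int) $sources,
    );

    // Each pair looks like "utmctr=some+value"; split on the first "=" only.
    foreach (explode('|', $campaign) as $pair) {
        $kv = explode('=', $pair, 2);
        if (count($kv) === 2) {
            $result[$kv[0]] = $kv[1];
        }
    }

    return $result;
}

$info = parse_utmz_any_order('217416581.1296138642.1.1.utmccn=(organic)|utmcsr=google|utmctr=best+ereader+reviews|utmcmd=organic');
echo $info['utmctr'], "\n"; // best+ereader+reviews
```

Because the pairs are stored under their own names (utmccn, utmcsr, and so on), a missing or reordered item does not shift the other values.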

That would make sense.

When you speak of regex, if I’m only looking for one value (in this case the utmctr value) would it be possible to set up a simple pattern match which would simply look for the string ‘utmctr=’ and then assign everything from there up to the pipe in a variable?

Thanks,

Sure, why don’t you have a go while I finish up here at work. :)

I think I managed to get it sorted out last night - it’s not quite as elegant as the previous code, but it seems to solve the problems that were being caused.

Thanks for highlighting the issues Salathe, and for your help!
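The final code was not posted in this thread. For anyone landing here later, a minimal sketch of the single-value match described above (find ‘utmctr=’ and capture everything up to the next pipe) might look like the following. The function name extract_utmctr() is hypothetical, and the urldecode() step assumes the keyword is URL-encoded, as the plus signs in the sample suggest.

```php
<?php
// Hypothetical helper: returns the utmctr value, or null if it is absent.
function extract_utmctr($cookie)
{
    // Match "utmctr=" and capture everything up to the next pipe or the end of the string.
    if (preg_match('/utmctr=([^|]*)/', $cookie, $matches) === 1) {
        return $matches[1];
    }

    return null;
}

$cookie  = '217416581.1296138642.1.1.utmccn=(organic)|utmcsr=google|utmctr=best+ereader+reviews|utmcmd=organic';
$keyword = extract_utmctr($cookie);

echo $keyword, "\n";            // best+ereader+reviews
echo urldecode($keyword), "\n"; // best ereader reviews (assuming URL-style encoding)
```

Returning null when the field is missing lets the caller tell "no keyword in the cookie" apart from an empty keyword.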