GROUP BY intervering with other output

I have a small statistics page for a pay per click advertising campaign. Most of it is woking great thanks to some people here on the forum. I have a small detail that I can’t get to work though.

I have a table click_statistics:


CREATE TABLE IF NOT EXISTS `click_statistics` (
  `click_id` int(8) unsigned NOT NULL auto_increment,
  `advertiser_id` int(8) NOT NULL,
  `ip_address` char(15) NOT NULL,
  `country_code` char(2) NOT NULL,
  `country_name` varchar(50) NOT NULL,
  `click_date` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `referrer` varchar(128) default NULL,
  PRIMARY KEY  (`click_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8;

I query I use to get total clicks, clicks per country and percentages per country is this:


		SELECT
    a.advertiser_id
	, c.country_code 
	, c.country_name
  ,	COUNT(*) AS total_clicks
  , ROUND( 100.0 * COUNT(*) / ( SELECT COUNT(*) 
		FROM click_statistics c )) as perc_clicks
		FROM click_statistics c
    INNER JOIN advertisers a ON c.advertiser_id = a.advertiser_id
    WHERE a.advertiser_id = 1
    GROUP BY country_code
    ORDER BY perc_clicks DESC

The problem I experience is with additional information I would like to provide per country as well. Such as ip_address, referrer and click date. When I tried to integrate these values in the above query the return was just one row per country. So I decided to create a separate query for those values and link them to the first query:


	SELECT
    ip_address
	, referrer
  , click_date
  FROM click_statistics
  WHERE advertiser_id = #getCountryclicks.advertiser_id#
  AND country_code = #getCountryclicks.country_code#

Where getCountryclicks is the first query mentioned! This works just partly. It gives the right number of records per country, but it just show the first row of each value multiple times. (first: ip_address, referrer and click_date).

What am I doing wrong here?

GROUP BY really seems to cause people a lot of issues

below is a brief summary of the rules regarding aggregation i wrote recently in another thread

after you’ve had a chance to absorb this, see if you can apply it to your situation, and if not, just post back and let me know


rule #1 of aggregation is explained briefly as follows

if there is no GROUP BY clause, then every expression in the SELECT clause must be an aggregate function, e.g. COUNT() or SUM() or MIN(), etc., and cannot be one of the columns

break dis rule and de results be unpredictable

rule #2 of aggregation is explained briefly as follows

if there is a GROUP BY clause, then columns are allowed in the SELECT clause alongside aggregate expressions (which are then optional), but every such column in the SELECT clause must be one of the columns in the GROUP BY clause

mysql docs call this a “hidden” column, because it’s in the SELECT clause but hidden (missing) from the GROUP BY clause

break dis rule and de results be unpredictable

:cool:

Hi rudy. Thank you for the reply. Let me see if I get you right. No matter which rule I follow, these three


    a.advertiser_id
  , c.ip_address  
  , c.country_code

need to be removed from the query? Or do i see it wrong

there’s no point in removing columns from the query if you actually need what’s in those columns

just remove the advertiser_id, and then you will see that the query works correctly

you will have both country code and country name in the SELECT clause, but only country code in the GROUP BY clause, and although that would be non-standard SQL, mysql will run it anyway, and what’s more, it’ll work correctly

remember i called it a “hidden” column? see this explanation for why country name is okay in the SELECT clause – http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html

why did you want advertiser? more specifically, which advertiser did you want?

Hi rudy I think I get your point and obviously removed the advertiser_id and JOIN from the query, so now I have the ip_address from the same query but It it keeps giving me just one ip address per country. As a test I have three countries in the database. Since, when I would actually click my self for testing the same ip address would appear, I have changed them all (26) manually. One country should show 12 ip addresses, the second one 8 and the third 6 but like I say I just get 1 ip per country in the output

and it’s an unpredictable one, at that :slight_smile:

perhaps you could explain, in words, what you actually want?

Rudy I got it solved. I created a separate query within the output referring to the first one and now its okay. But it definitely had to do with the advertiser_id since I tried this earlier as well but it wasn’t working. Thank you again

you’re mixing several different types of totals, so you have to break them apart in a UNION query

SELECT 1              AS rowsort
     , 'total clicks' AS rowtype
     , NULL           AS country
     , NULL           AS ip
     , COUNT(*)       AS total
     , 100.0          AS pct          
  FROM click_statistics
UNION ALL
SELECT 2
     , 'clicks per country'
     , country_name
     , NULL
     , COUNT(*)
     , 100.0 * COUNT(*) /
       ( SELECT COUNT(*) FROM click_statistics )
  FROM click_statistics
GROUP
    BY country_name
UNION ALL
SELECT 2
     , 'clicks per ip'
     , country_name
     , ip_address
     , COUNT(*)
     , NULL
  FROM click_statistics
GROUP
    BY country_name
     , ip_address
ORDER
    BY rowsort
     , country
     , rowtype
     , ip

How about showing the final product?

Andy

Hi Rudy you’re a star. That would have probably been my next question in line.

Still working on it. If ready and the site owner agrees I’m more than willing to show the end result. And otherwise I will try to write it down as reference for someone else to use.