SUM() is not giving accurate value

I have the following mySQL query:

"SELECT 
     (SELECT COUNT(id) FROM clients 
          WHERE community_id = " . $community . " 
          AND visit_date BETWEEN CAST('" . $start_date . "' AS DATE) AND CAST('" . $end_date . "' AS DATE))
     AS 'New Clients',
     (SELECT SUM(amount_given) FROM contact_records 
          JOIN users 
          ON users.id = contact_records.careplanner_id 
	  JOIN churches 
	  ON users.geo_id = churches.id 
	  JOIN communities 
	  ON churches.community_id = " . $community . " 
	  WHERE contact_records.created_at BETWEEN CAST('" . $start_date . "' AS DATE) AND CAST('" . $end_date . "' AS DATE))
     AS 'Benevolence Funds ($)' ";

It is consistently giving me five times the correct value for the SUM() in the second part. Could this be because of the JOINs? If so, what kind of JOIN should I be using?

yes :winky:

I think you’re asking the wrong question.

The correct question is “What rows are the joins returning?” If you know that, you can see how you need to adjust your join conditions (or your selection criteria).

Change the SUM() selection to return all the fields being joined and amount_given. I think you’ll see you’re getting duplicate rows based on your selection criteria. Is amount_given on the table you think it is, or is it somewhere which is creating dupes?

of course

in the 2nd subquery, you are selecting contact_records for a certain date range, then finding their careplanners, then the churches they belong to, and then you join to all communities

Sorry I left this for so long - weekends tend to be busy. I am going to try to break this down and figure out exactly what I have written in this query and why it is giving me five times the real value. I don’t use JOINS often enough to have a solid understanding of how they behave.

This in a nutshell is what I am dealing with: I need to find the total amount of funds (which are recorded in the contact records) handed out by all the care planners in a given community for a given time period.

Thank you to both of you for your suggestions. So my last JOIN should have just been a WHERE clause and I get the correct amount.

FROM contact_records 
JOIN users 
ON users.id = contact_records.careplanner_id 
JOIN churches 
ON users.geo_id = churches.id 
WHERE churches.community_id = " . $community . " 
AND contact_records.created_at BETWEEN CAST('" . $start_date . "' AS DATE) AND CAST('" . $end_date . "' AS DATE)
   

I’m confused. The example query has “clients” but I’m not seeing how that relates to “sum” which looks to be per “community”.

I don’t know if I would use JOINs but instead nested SELECTS with GROUP BYs.

And instead of going for the whole enchilada first bite, I usually work my way out from simple to final version fixing breakage as I go.

Does this work?

SELECT `contact_records`.`careplanner_id`
  , SUM(`contact_records`.`amount_given`) 
FROM `contact_records` 
WHERE `contact_records`.`created_at` 
  BETWEEN CAST('" . $start_date . "' AS DATE) 
  AND CAST('" . $end_date . "' AS DATE)
GROUP BY `contact_records`.`careplanner_id`

The problem only existed in the second sub-SELECT of the query. The one that generates “Benevolent Funds ($)”. The other part of the query is fine. And now the second part is also working.

This query is generating a monthly report for an Excel spreadsheet. That is why the two parts don’t seem to be related. Sorry for the confusion.

Your suggestion does not work because the sum has to be for a selected community, not grouped by careplanners.

Sorry, I didn’t mean “does the query give you the final results you’re after” but whether or not the query grouped amounts to users/careplanners so that you could then add getting the geo_id’s to get the community_id’s passing and summing the amounts along the way.

I’ll try that (maybe tomorrow, though). Would that be a more efficient way of getting the results?

TBH, I really don’t know. It could be that both a query using joins and a query having subselects have similar if not identical query plans. A way to find out would be to do EXPLAIN followed by a SHOW WARNINGS for both and compare them.

As I understand the rationale, (admittedly a limited understanding) JOINs are best for gathering unrelated fields from multiple tables. For example, if I wanted church.pastor, community.population, user.age then JOINs would be better.

This is interesting. I ran this

CREATE TABLE `test_contact_records` ( `amount_given` DECIMAL(5,2) 
  , `careplanner_id` TINYINT UNSIGNED 
  , `created_at` VARCHAR(10)
  );
INSERT INTO `test_contact_records` 
  SET `amount_given` = 25.00 
  , `careplanner_id` = 1 
  , `created_at` = '2018/01/01';
INSERT INTO `test_contact_records` 
  SET `amount_given` = 50.00 
  , `careplanner_id` = 2 
  , `created_at` = '2018/02/02';
INSERT INTO `test_contact_records` 
  SET `amount_given` = 75.00 
  , `careplanner_id` = 1 
  , `created_at` = '2018/01/01';
INSERT INTO `test_contact_records` 
  SET `amount_given` = 100.00 
  , `careplanner_id` = 2 
  , `created_at` = '2018/02/01';
INSERT INTO `test_contact_records` 
  SET `amount_given` = 10.00 
  , `careplanner_id` = 3 
  , `created_at` = '2018/03/01';
INSERT INTO `test_contact_records` 
  SET `amount_given` = 20.00 
  , `careplanner_id` = 4 
  , `created_at` = '2018/04/02';
INSERT INTO `test_contact_records` 
  SET `amount_given` = 30.00 
  , `careplanner_id` = 3 
  , `created_at` = '2018/03/01';
INSERT INTO `test_contact_records` 
  SET `amount_given` = 40.00 
  , `careplanner_id` = 4 
  , `created_at` = '2018/04/01';
CREATE TABLE `test_users` ( `id` TINYINT UNSIGNED 
  , `geo_id` TINYINT UNSIGNED 
  );
INSERT INTO `test_users` 
  SET `id` = 1
  , `geo_id` = 10; 
INSERT INTO `test_users` 
  SET `id` = 2
  , `geo_id` = 10; 
INSERT INTO `test_users` 
  SET `id` = 3
  , `geo_id` = 20; 
INSERT INTO `test_users` 
  SET `id` = 4
  , `geo_id` = 20; 

SET @start_date = '2017-05-07';
SET @end_date = '2019-05-06';
SELECT `test_contact_records`.`careplanner_id` AS `ci`
    , SUM(`test_contact_records`.`amount_given`) AS `ag`
    FROM `test_contact_records` 
    WHERE `test_contact_records`.`created_at` 
      BETWEEN CAST(@start_date AS DATE) 
      AND CAST(@end_date AS DATE) 
    GROUP BY `test_contact_records`.`careplanner_id`; 

SELECT `test_users`.`geo_id` 
   , SUM(`tcr`.`ag`) 
FROM `test_users`, (SELECT `test_contact_records`.`careplanner_id` AS `ci`
    , SUM(`test_contact_records`.`amount_given`) AS `ag`
    FROM `test_contact_records` 
    WHERE `test_contact_records`.`created_at` 
      BETWEEN CAST(@start_date AS DATE) 
      AND CAST(@end_date AS DATE) 
    GROUP BY `test_contact_records`.`careplanner_id`) AS `tcr` 
WHERE `test_users`.`id` = `tcr`.`ci`
GROUP BY `test_users`.`geo_id`;   

EXPLAIN SELECT `test_users`.`geo_id` 
   , SUM(`tcr`.`ag`) 
FROM `test_users`, (SELECT `test_contact_records`.`careplanner_id` AS `ci`
    , SUM(`test_contact_records`.`amount_given`) AS `ag`
    FROM `test_contact_records` 
    WHERE `test_contact_records`.`created_at` 
      BETWEEN CAST(@start_date AS DATE) 
      AND CAST(@end_date AS DATE) 
    GROUP BY `test_contact_records`.`careplanner_id`) AS `tcr` 
WHERE `test_users`.`id` = `tcr`.`ci`
GROUP BY `test_users`.`geo_id`;
SHOW WARNINGS; 

DROP TABLE `test_contact_records`; 
DROP TABLE `test_users`; 

and the query plan is (“discourse-csv” is the test db I used)

/* select#1 */ 
select `discourse-csv`.`test_users`.`geo_id` AS `geo_id`
  ,sum(`tcr`.`ag`) AS `SUM(``tcr``.``ag``)` 
from `discourse-csv`.`test_users` 
join (/* select#2 */ 
  select `discourse-csv`.`test_contact_records`.`careplanner_id` AS `ci`
    ,sum(`discourse-csv`.`test_contact_records`.`amount_given`) AS `ag` 
  from `discourse-csv`.`test_contact_records` 
  where (`discourse-csv`.`test_contact_records`.`created_at` 
    between <cache>(cast((@`start_date`) as date)) 
    and <cache>(cast((@`end_date`) as date))) 
  group by `discourse-csv`.`test_contact_records`.`careplanner_id`) `tcr` 
  where (`discourse-csv`.`test_users`.`id` = `tcr`.`ci`) 
group by `discourse-csv`.`test_users`.`geo_id`

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.