Any ideas how to find some duplicates?

I have an issue with a very complex query. The database is Informix but don’t worry about that.

This query is the result of a prepared procedure which needs 8 tables to get to this point.

This is just a small extract of the result, the problematic part.ñ


sessionid	startdatetime		enddatetime		talk		ring		queue
30000034352	11/12/2014 8:34		11/12/2014 8:34		0		0		17
30000034352	11/12/2014 8:34		11/12/2014 8:34		0		0		17
30000034352	11/12/2014 8:34		11/12/2014 8:34		0		0		17
30000034352	11/12/2014 8:34		11/12/2014 8:34		0		0		17
30000034352	11/12/2014 8:34		11/12/2014 8:34		0		0		17
30000034352	11/12/2014 8:34		11/12/2014 8:34		0		0		17
30000034352	11/12/2014 8:34		11/12/2014 8:34		0		0		17
30000034352	11/12/2014 8:34		11/12/2014 8:34		0		0		17
30000034352	11/12/2014 8:34		11/12/2014 8:34		0		0		17
30000034352	11/12/2014 8:34		11/12/2014 8:34		0		0		17
30000034352	11/12/2014 8:34		11/12/2014 8:34		0		0		17
30000034352	11/12/2014 8:34		11/12/2014 8:34		0		0		17
30000034352	11/12/2014 8:34		11/12/2014 8:34		0		1		17

As you can see all the rows are exactly the same except for the last one which has a ringtime of 1 sec. In this particular case. You would think that adding a condition of ringtime > 0 or using a MAX(ringtime) would do the trick and I would get rid of all those rows which worth nothing for me.

My problem is that I can’t do that because there are some records where the ringtime is 0 but it should be counted.

So with a DISTINCT I can, at least, reduce the problem… but I can’t get rid of it… Any ideas?

[quote=“molona, post:1, topic:115193”]… there are some records where the ringtime is 0 but it should be counted.[/quote]and which ones are those?

I don’t have any examples in this particular moth but I know there are some in some other months. I can post an example tomorrow, when I’m back at the office

Basically, the software that registers the calls in the database needs one second to do so. So a ringtime of 3 secs is, in real life, 4 secs.
Also, the software rounds the number down to the lower integer. So a ringtime of 1,8 secs is recorded as 1 sec only.
And some calls where the caller hang up very quickly (like when he realized immediately that he dialed the wrong number) show with a ringtime of 0.
The only way to know is because there are no two entries of that call.

A duplicate, on the other hand, can have a ringtime of 0 or 12 seconds. You know that it is a duplicate because all the information is exactly the same, including the id of the agent who answered the call (info that, obviously, I haven’t posted here). From all the duplicates, only the real one will have a difference and, if it was answered, a talktime.

wow… i’m not sure i’m going to be able to help

That’s my feeling too. That’s why I asked for help. :smiley:

It would be easier to do a query to find those duplicates and then delete them but… I can’t. I only have a read-only access to the database anyway. Which is just as good… I would delete the whole thing! lol

But it is OK. This whole thing is complicated and a piece of s…

Please do the following…

1.) Type all relevant Column Names in an easier to read format - give us a break here!!

Use the [ code ] tags to make it easier to read…

id
first_name
last_name
address
city

2.) Write the Business Requirements in bullet points

  • The system shall register phone calls
  • It takes the system 1 second to log a call
  • A ring time of X seconds is actually X + 1 seconds in real life
  • and so on…

Clearly and succinctly spell out what your system does and what you need.

Do that and I’m sure we can help.

Not sure if I understood what you want the result to be exactly. I haven’t used Informix tho, but could something like this work for you?

SELECT * FROM test_table WHERE ringtime = (SELECT MAX(ringtime) FROM test_table) GROUP BY ringtime

And you would use your current result set in the place of “test_table”.

Thanks @TeNDoLLA

Informix is very special and it doesn’t work well with modern standard SQL.

My problem is that if I use max(ringtime), calls with a ringtime of 0 will not be counted. And there are some valid calls that do have ringtime=0.

Basically, because the system does round the number down and there’s also a small, tiny delay till the software logs the info. So a call the ringed for 0.8 sec will be shown with a ringtime of 0.

@mikey_w I’m sorry about the format. Discourse and tables don’t go well. I’ll try to reformat it and see if it can look easier to read. Yet, all the columns are in one line, at the top.

  1. What you mean with “ring time of 0 will not be counted” if you use the MAX()? Why not? It should return a result still if the max is 0 then it is the max. Or is this something Informix specific that 0 can not be max value or something?

  2. Lets imagine all the calls has ringtime of 0, what makes a call valid in that scenario? What is a valid call?

  3. Is the result you are looking for to get out the valid call with highest ringtime from that result set you described in the first post? Or all the valid calls? Something else?

I tried to fomat the table better, the only think I could do is to reduce the number of fields, so I guess now it looks a bit tidier.

That’s the question :smiley:

This query controls every call made to the call center. A called can be answered or can be ignored (well, probably ignored is not the best term. Basically, a unaswered)

Sometimes, the software does weird things, and records the same call several times, like the example I posted above. In this particular case, the real call has a ringtime (I called it “ring” to make it shorter to fit in the format) of 1.

But this is not always the case.

Sometimes, the software records this data when the call is transferred to another agent. Then ring would have a value of 12 seconds. Those only cases where I can have a duplicate.

Some calls are left unanswered but have a ring of 0. Still valid though. I would say that those calls do not have any other records with the same sessionid (let’s use this field as ID, for simplicity)

I tried to use MAX() before and didn’t work. I’ll try again, just in case I was wrong. Defintively, ring > 0 will not work because sometimes ring=0

Sounds like you have a race condition issue in the application if it log duplicate records.

Does the image you posted contain the whole table?

If, what is the primary/composite key on the table? Does it have one?

While it is not exactly what you asked for, the task you have could most probably easily be handled by the use of a temporary table, and then invalidate records, though not sure if you have the rights to create one?

[quote=“TheRedDevil, post:11, topic:115193”]
Sounds like you have a race condition issue in the application if it log duplicate records.
[/quote] If I do, I can’t do anything to avoid it

[quote=“TheRedDevil, post:11, topic:115193”]
If, what is the primary/composite key on the table? Does it have one?
[/quote]This result is the combination of 5-6 tables, that I can recall. I’m trying to get it down to something mroe manageable because I only need to count records.
I can show you the query but it is kind of complicated :smiley:
Most tables use a composite key made by four fields: sessionid, sequencenumid, nodeid, profileid
One table has a composite key with only two fields. I’ll try to draw the schema and post it

No. A regular month can have between 5,000 to 10,000 records.

[quote=“TheRedDevil, post:11, topic:115193”]
the task you have could most probably easily be handled by the use of a temporary table
[/quote]I thought about it too. But I still need to figure out how to program this thing so duplicates don’t show up.

Although my user is read-only, I can create a temp table. The temp table can be available while I’m connected. If the connections breaks, then the temp table will be deleted.

I cannot create stored procedures but maybe, if I come up with a solution, I can ask someone to do it for me… although I’m not sure who’s the database administrator. In this company everything is Oracle or SQL Server, this is the only Informix database and it is only here because it was installed by the Call Center Software.

This is to be implemented in an automated system that will create reports on the fly

It would most probably be easier if you posted the queries, as right now we are looking at a tiny part of a larger puzzle.
If you know, it would be great if you could also write up the relationship between the tables.

With this information it will be much easier to take a look and perhaps help you with the problem

Not thinking about a query yet, but I’m trying to think of what could be used to distinguish what you want to get from what you want to not get.

I’m unsure what fields are in play or even how many tables are involved so this is a guess.

  • you are interested in getting duplicate caller records
  • all duplicate rows will have different ringtimes

Same originatorid ?
Non-overlapping startdatetime and enddatetime ?
Always have different angentid ?
Sequential sequencenumid ?

You asked for it… but, if it is too much asking, then don’t worry… I won’t sleep in month but somehow I’ll get.

My goals are two:

Remove the duplicates, if possible. Else I’ll ask my boss to allow a 2-10 record difference per month, and see what he thinks.

Reduce this query to give the same results when using COUNT. Basically, I need to give the number of total calls answered and unanswered (the field contanctdisposition indicates if the call has been answered or not)

After that, I need to do a second query with the average time for ringtime, talktime, etc. But that’s something else


    SELECT  DISTINCT       
         fr.sessionid, 	
         fr.sessionseqnum, 	
         contactdisposition,	
         callednumber, 
         acdr.talktime, 
         acdr.ringtime, 	
         acdr.holdtime, 
         acdr.worktime,	
         fr.profileid, 
         fr.nodeid, 
         Nvl(fr.qindex, acdr.qindex) qindex, 	
         fr.queuetime,	
         csq_names
    FROM 
        (SELECT
             fr1.sessionid, 	
             fr1.sessionseqnum, 	
             fr1.profileid, 	
             fr1.nodeid, 	
            contactdisposition, 	
            callednumber, 	
            crdr.qindex,	
           crdr.queuetime
       FROM 
                   (SELECT 	
                        sessionid, 		
                        sessionseqnum,
                	 contactdisposition,         
               		callednumber,
         		origcallednumber, 
         		profileid,
        		nodeid
                  FROM contactcalldetail         
             	WHERE
         		startdatetime between '2013-12-1 00:00:00' and '2013-12-31 23:59:59' AND
                        callednumber='6666' AND         
                        DATEDIFF('ss',startdatetime, enddatetime) >=10 ) fr1,  
        	(SELECT
                       crd.qindex,
         		crd.sessionid,
         		crd.sessionseqnum,
        		crd.profileid,
         		cqdr.targetid,
        		crd.nodeid,
         		cqdr.queuetime
        	FROM  contactroutingdetail crd, contactqueuedetail cqdr, 
               (SELECT 
                       c.recordid  AS recordid 
                   FROM contactservicequeue c 
                WHERE c.csqname like 'CSQ_CC%')  t 
	WHERE 
                cqdr.sessionid = crd.sessionid AND 
                cqdr.sessionseqnum = crd.sessionseqnum AND
 		cqdr.qindex = crd.qindex AND
		cqdr.nodeid = crd.nodeid AND
		cqdr.profileid = crd.profileid AND
		cqdr.targetid = t.recordid) as crdr 
   WHERE 
         fr1.sessionid = crdr.sessionid AND 
         fr1.sessionseqnum = crdr.sessionseqnum AND 
        fr1.profileid = crdr.profileid  AND
	fr1.nodeid = crdr.nodeid ) fr, 
OUTER (agentconnectiondetail acdr, 
             (SELECT r.resourceid, r.resourcename, r.profileid 
              FROM resource r) sa)
 WHERE fr.sessionid = acdr.sessionid AND 
     fr.sessionseqnum = acdr.sessionseqnum AND 
     fr.profileid = acdr.profileid AND 
     fr.nodeid = acdr.nodeid AND 
     (Nvl(fr.qindex, acdr.qindex) = acdr.qindex or acdr.qindex <= 0) AND 
    acdr.resourceid = sa.resourceid AND
     acdr.profileid = sa.profileid

I tried to format it as clearly as possible. But I guess I still need to control tabs here :slight_smile:

OMG that’s way too complex for my abilities. If anyone can read it and see what it does my bet is on @r937

Do you see why it is driving me crazy? And informix doesn’t like INNER JOINS, you have to use the old style way… placing the conditions in the WHERE

1 Like

nope, sorry… not going near it

You’re a my hero! It is an intelligent decision… It doesn’t help me much but I fully understand it :smiley:

@molona what database server is being used (if it’s MySQL, are you using InnoDB?)

If the server-side language is PHP and the server is MySQL using the InnoDB engine, you might want to look into the use of PDO and the use of PDO’s transactions.

Can you please post the output of SHOW CREATE TABLE queries for each table ( @r937 would that be the same query to get the table structure no matter what database server @molona is using) and can you also post some sample data for each table?