Best approach for adding timestamp to return most popular rows

Hi all,

The goal is to query my db_table and return the most popular rows based on the hits count for the last 7 days, 14 days and so on, something I can then easily change if needed.

I need a way to check the hits count over a certain date.

Small example

I have a table called: venues_table with the following columns:

id (int)
name (varchar)
hits (mediumint)
SELECT name 
  FROM venues_table
  ORDER BY hits

Typical row:

id: 2
name: Venue name
hits: 34

How and where do I set the timestamp?

Do I need another table?
What is the best approach for something like this?

Any guidance thanks.

Barry

i would have a table where each row is a hit, along with the timestamp

it’ll be a “tall but narrow” table, and properly indexed, it’ll perform just fine

I think I’ve just realised what you mean :slight_smile:

Ok, so we create a hits table and record every hit for each venue, then, when we want to find out the popular venues, we simply query the table based on the date and count the hits for each name(index).

Is that what you mean, sound correct?

Example

table_hits

id (int)
hits (mediumint)
name (varchar/index)
date (timestamp or date)

Could I use date instead of timestamp, does it matter?

Quick draft

SELECT name 
  FROM hits_table
  WHERE date(?)
  AND name = ?
  ORDER BY hits
  LIMIT 10

Also wondering, if the above is correct…

Say eventually we have over hundred venues, each venue is receiving 100+ hits a day, spread that over a couple of years… very big table thousands of rows.

Will this cause any problems?
Slow queries? Processing power?

not quite

you want this –

[code]CREATE TABLE venues
( id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT
, name VARCHAR(37) NOT NULL
);

CREATE TABLE venue_hits
( venue_id INTEGER NOT NULL
, dtstamp DATETIME NOT NULL
, PRIMARY KEY ( venue_id , dtstamp )
);[/code]

notice what happens if you run a query like this –

SELECT COUNT(*) as hits FROM venue_hits WHERE venue_id = 937 AND dtstamp >= '2016-10-01' AND dtstamp < '2016-11-01'
notice this gets all of October’s hits for that given venue, and specifically includes timestamps right up until midnight of the last day of the month

but more important, because of the way the primary key is defined, it acts as a covering index and the actual table rows are not accessed at all!!

Thanks for example r937

The problem I have is the venues table is already built which already has venue_id(varchar) as the PK.

notice this gets all of October’s hits for that given venue, and specifically includes timestamps right up until midnight of the last day of the month

This makes sense, though the idea is to show the popular 10 venues in a list. I want to show the names of the venues with the most hits for that date, not count hits for only one venue. Unless I have misunderstood?

October
name 1 (20 hits)
name 2 (14 hits)
name 3 (12 hits)
...

I did think we would need to use a JOIN.

but more important, because of the way the primary key is defined, it acts as a covering index and the actual table rows are not accessed at all!!

Just read a little about covering index, first time I have heard of this, sounds good, this will help with the speedy queries :sunglasses:

@computerbarry

This link maybe useful

[quote=“computerbarry, post:5, topic:242540, full:true”]
This makes sense, though the idea is to show the popular 10 venues in a list. I want to show the names of the venues with the most hits for that date, not count hits for only one venue. [/quote]
this “top ten” report by necessity requires counting all hits (for the selected time period) for all venues

you need another index …

ALTER TABLE venue_hits ADD INDEX ( dtstamp , venue_id )

this, too, will be a covering index

As mentioned @r937, the venues table already has venue_id as the PK which is varchar, I’m generally using the name of the venue as the PK in the venues table.

Slightly confused if I create venue_id as (int) in the hits table, they will not match up when I run the query?
One is int and the other is varchar.

What I have done is create:

id (int) | primary key | index
venue_id (varchar 60) | index
dtstamp (datetime) | primary key | index

Is this correct?

And then run a query something like:

SELECT COUNT(*) as hits
  FROM venue_hits
 WHERE venue_id = "name of venue"
   AND dtstamp >= '2016-10-01'
   AND dtstamp  < '2016-11-01'

I can then query the hits table based on the matching venue_id, which is the name of the venue.
Will this work the same r937 and benefit from the covering index?

And why do we have both id and dtstamp as PKs, I thought only one PK was allowed per table?


And thanks for the link @John_Betong will keep this for reference no need to alter things just yet, this is a new table. No data as yet.

Barry

1 Like

[quote=“computerbarry, post:8, topic:242540, full:true”]
As mentioned @r937, the venues table already has venue_id as the PK which is varchar, I’m generally using the name of the venue as the PK in the venues table.[/quote]
not a good idea… this is where you would use an AUTO_INCREMENT integer PK

[quote=“computerbarry, post:8, topic:242540, full:true”]
What I have done is create:

id (int) | primary key | index
venue_id (varchar 60) | index
dtstamp (datetime) | primary key | index

Is this correct?[/quote]
nope… this is where you would ~not~ use a surrogate PK

you want this –

CREATE TABLE venue_hits ( venue_id INTEGER NOT NULL , dtstamp DATETIME NOT NULL , PRIMARY KEY ( venue_id , dtstamp ) );

absolutely true

perhaps you weren’t aware that the PK could be composite, i.e. consisting of more than one column

not a good idea… this is where you would use an AUTO_INCREMENT integer PK

Seemed like a good idea at the time, the venue name will never change and will never be duplicated in the venues table. It was mainly an SEO purpose for url inclusion.

I have a live instance and the venues table has been running for some time with lots of live data running off venue_id which is tied up with multiple queries already.

you want this –
CREATE TABLE venue_hits
( venue_id INTEGER NOT NULL
, dtstamp DATETIME NOT NULL
, PRIMARY KEY ( venue_id , dtstamp )
);

So there is no way to query the hits table if I don’t create a AUTO_INCREMENT integer PK on the venues table?

One thing is certain, I can’t change the name of venue_id or the type inside the venues table.

Could we change the venue_id inside hits table to id? And then add a id onto venues table? (which would mean moving the PK from venue_id onto id)

And just wondering, how do I group the hits if every id is different?
Meaning, how do I associate the venue to the hit if I have no name?

perhaps you weren’t aware that the PK could be composite, i.e. consisting of more than one column

I wasn’t aware, thanks.

Barry

[quote=“computerbarry, post:10, topic:242540, full:true”]
So there is no way to query the hits table if I don’t create a AUTO_INCREMENT integer PK on the venues table?[/quote]
what??!! of course you can query the table!!!

look, you mention a number of things that can’t change… and i’m getting lost

why don’t you post the tables that ABSOLUTELY CANNOT CHANGE and we’ll go from there

Ok the venues table has lots of columns.
The only column we need is the venue_id, everything else is just details, street name etc.

I have no id column with AUTO_INCREMENT the main column is venue_id which is the venu name.

Table: venues_tbl

venue_id | varchar(60) | PRIMARY
hits | smallint(16)
description | varchar(1600)
…

I also have a hits column which I use currently, hence me creating the new hits table to get the extra queries for the different dates.

In short, I need to create the hits table so I can record all the hits for each venue from the venues_tbl which I’ll then query and show a separate list based on most popular hits for that time period, weather it be 7days october etc.

I think what we’ve done so far is good, just the little issue linking the hits to the venue_id above.

Cheers,
Barry

are there any tables which have a FK to the venue_id in the venues_tbl table?

as a trivial aside, you do not need to append _tbl into the name of a table

also, SMALLINT(16) will hold ~exactly~ the same range of numbers as SMALLINT(2) or SMALLINT(937)

are there any tables which have a FK to the venue_id in the venues_tbl table?

FK in my head, I just remember.
Its not a proper FK in DB terms - I have venue_id inside my reviews table which I use and query for comments about the venue.

I know it’s probably not the best db design, a little old. Seems to work ok for small site easy to remember.

as a trivial aside, you do not need to append _tbl into the name of a table

This is a little old, at the time everybody was using it, I thought it was cool, advanced :grimacing:

also, SMALLINT(16) will hold ~exactly~ the same range of numbers as SMALLINT(2) or SMALLINT(937)

I had an issue some time back if I used 2 the third 3 char would not show, thanks for the sharing, am I little rusted ha

I’ve been writing lots of prepared statements which I might use, in the process of switch from mysql to mysqli.

Barry

i would suggest you alter the venues table to add an auto_increment (call it id) and make the VARCHAR venues_id a UNIQUE key

then you can create the hits table as i outlined before, with venue_id as FK to id

Ok, sounds like a plan :slight_smile:

So keep venue_id inside the venues table as PK, but also add unique to it?
Create the id auto_increment, nothing else?

then you can create the hits table as i outlined before, with venue_id as FK to id

This is what I don’t understand.

You say:

CREATE TABLE venue_hits
( venue_id INTEGER NOT NULL 
, dtstamp DATETIME NOT NULL
, PRIMARY KEY ( venue_id , dtstamp )
);

How do I link the hits to the venue table venue_id without the name inside hits?

Barry

  1. do a SHOW CREATE TABLE for venues_tbl

  2. grab the SQL from that, copy it into a text editor

  3. change it to CREATE TABLE venues_new

  4. remove the PK declaration

  5. add id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT as the first column

  6. add a UNIQUE index on venue_id

you should now have something like this –

CREATE TABLE venues_new ( id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT , venue_id VARCHAR(60) NOT NULL , hits SMALLINT , description VARCHAR(1600) NOT NULL , other_columns WHATEVER , UNIQUE INDEX youneek ( venue_id ) );

now load the new table from your existing table, specifying every existing column name –

INSERT INTO venues_new ( venue_id , hits , description , other_columns ) SELECT venue_id , hits , description , other_columns FROM venues_tbl

after this is done, browse the data inside your new table to assure yourself it is exactly the same as your existing table, except it has an auto_increment id

when you’re happy, run this –

[code]DROP TABLE venues_tbl;

RENAME TABLE venues_new TO venues_tbl;
[/code]

you can now proceed to create the venue_hits table as previously described

Thanks a lot!

I’ll give this a try, just two questions:

, UNIQUE INDEX youneek ( venue_id )

What is youneek?

And back to the hits table, just to clarify, want to make sure I get this right:

CREATE TABLE venue_hits
( venue_id INTEGER NOT NULL 
, dtstamp DATETIME NOT NULL
, PRIMARY KEY INDEX ( venue_id , dtstamp )
);

Where do I add the venue_id name, else how do I know which hits relate to what venue from the venues table?
I’ve also added the INDEX above as you mentioned in the alter table snippet, is this correct?

Barry

it’s the name of the index (useful if you want to drop it)

you don’t

with a join

SELECT v.venue_id AS venue_name
     , h.hits
  FROM venues_new AS v
LEFT OUTER
  JOIN venue_hits AS h
    ON h.venue_id = v.id

look at the ON condition until you understand :slight_smile:

Hi r937

I’ll continue with this tomorrow big task and late now :sleeping:
Will report back once I have set things up with outcome if your about

And thanks for your time!

Speak soon,
Barry