SQL that mimics Google's "Searches related to" SQL

Mr_Tuition · May 22, 2011, 4:37pm

Since someone helped me on this forum, I thought I would give a little something back. My design might not be the greatest, alternative designs would also be appreciated

If you do a Google search and notice the links at the bottom “Searches related to”, this design try’s to do a similar thing:

Searches related to Design

“searches related to” - relevant search terms:

What my design really does, is return the most common queries that other users were searching (1500 seconds before and after) at the time they searched the same term q

Design:
$q = php to get the users query (make sure you have made this safe)
$ip = php to get the users ip (also, make safe)

Log all queries that users make, log the query term (query), log the users ip address (ip) and log the timestamp (ts). I also chuck away duplicates queried (q) from the same person (same ip) within the same hour (ts > ( NOW() -3600 )


"INSERT INTO searchqueries(query, ts, IP)". 
		" SELECT '".$q."', NOW(), '".$ip."'". 
		" FROM DUAL". 
		" WHERE NOT EXISTS".
		" (SELECT query, ts, IP FROM searchqueries WHERE".
		" (query = '".$q."' AND ts > ( NOW() -3600 ) AND IP = '".$ip."')". 
		" ) LIMIT 1";

Next, when a user searches for terms q, find all the ip addresses that have searched the same q:

::SQL1:: select ip as q_ip, ts as q_ts from searchqueries where query = “.$q.”;

for each of these ips get all of the queries (around the hour at the time the user queried q) for each q_ip (and q_ts)

::SQL2:: select query as relatedqueries from searchqueries where ip = q_ip and (ts > (q_ts - 1500)) and (ts < (q_ts + 1500))

combining the ideas of SQL1and SQL2
::SQL1+2:: select rq.query as relatedqueries, q_ip, q_ts from searchqueries rq, (select ip as q_ip, ts as q_ts from searchqueries where query = “Your Query”) as t where ip = q_ip and (ts > (q_ts - 1500)) and (ts < (q_ts + 1500))

now we need to count all of the relatedqueries, order by the most common then return only the top 11

::SQL3:: SELECT relatedqueries, count(relatedqueries) as c FROM ::SQL1+2:: group by query order by c desc, Limit 0, 11

now combine the queries into a another single SQL statement


SELECT relatedqueries, count( relatedqueries ) AS c
FROM (
		SELECT rq.query AS relatedqueries, q_ip, q_ts
		FROM searchqueries rq, (
			SELECT ip AS q_ip, ts AS q_ts
			FROM searchqueries
			WHERE query = '".$q."'
		) AS t
		WHERE ip = q_ip
		AND (
			ts > ( q_ts -8500 )
		)
		AND (
			ts < ( q_ts +8500 )
		)
	) AS x
GROUP BY relatedqueries
ORDER BY c DESC
LIMIT 0 , 11

You might want to check the returning matches, and chuck away the one that is an exact match of q (unless you want it)
So we now have 2 SQL statements above, 1 to insert the users query, another to find the relevant matches

You will now have a list (potentially up to 11 matches) of “searches related to”, or “searches that other people were searching when also searching this query”. This design should get more accurate the more people use it, the beauty is that you don’t have to put in your own data (people are making the related queries every time they query more than once within the same couple of hours).

Obviously, you can go into the DB and tweak things / add rows to begin with, but I haven’t found much of a need to do this so far, it seems to be working well. Something I might add is a spell checker, and then dont add and any wrongly spelt queries / profanities (but this will be done with server side php code)

r937 · May 22, 2011, 7:17pm

tl;dr

sorry, but that’s how it is on a sunday afternoon

Mr_Tuition · May 22, 2011, 8:21pm

yup, I guess there’s a bit much there for a single post

A quick example is the “Related Search Terms” found on my search page:
rightsfortenants.co.uk/search.php?q=Rats
(not a live link)

All these related links were added by the people searching for Rats (and anything else they may have been searching at that time) without them knowing
The above SQL shows you how to make the “Related Search Terms” section (and the data creates its self from other peoples search queries). Everything you need to know SQL side for this code is in the above post.