What's an alternative to MySQL's "ORDER BY RAND()" where limit > 1?

As many know, MySQL’s ORDER BY RAND() does HUGE load to the server when dealing with huge databases. I’ve read many alternatives, but they all seem to only consider getting one random row. What would I do if I wanted, say, 100 random rows?

Here is one alternative code for 1 row that I found:

// what NOT to do:  
$r = mysql_query("SELECT username FROM user ORDER BY RAND() LIMIT 1");  
  
// much better:  
  
$r = mysql_query("SELECT count(*) FROM user");  
$d = mysql_fetch_row($r);  
$rand = mt_rand(0,$d[0] - 1);  
  
$r = mysql_query("SELECT username FROM user LIMIT $rand, 1");  

If I do LIMIT 100, the 100 rows wouldn’t be random anymore; they’d just have an offset, as far as I can see. Anyone have any ideas on ways to do this, that will go easy on the server?

Thanks a lot.

You could always fetch the rows in order, load them into an array and randomize that after the SQL is done using shuffle().

So out of 100,000 rows, I’d still only be getting the same 100 rows in a row? That still doesn’t quite do the trick.

I see now. What you could do is generate a list of random numbers between 1 and the number of rows with PHP’s random function, and use that list to generate one SQL query to get those rows.

Hmm… how would I go about doing that with one SQL query?

assign an ID field to the table, and use the IN(num1,num2…numn) syntax…

This code will generate a query for you using the SQL’s IN() syntax, as suggested by StarLion.

All you have to do is get your maximum, and the actual number of rows you want and plug it in. You may even wish to make a function out of it.

<?php

$randArray = array();

while(count($randArray) <= 100)
{
	$randArray[rand(1, 100000)] = true;
}

$randArray = array_keys($randArray);
sort($randArray);
$sql = "SELECT * FROM table WHERE id IN(";

$firstRow = true;
foreach($randArray as $randRow){
	if(!$firstRow) $sql .= ', ';
	$sql .= $randRow;
	$firstRow = false;
}
$sql .= ')';

echo $sql;

This is a fantastic solution, thanks!

To take it further, would it be possible to only use IDs that exists? Meaning, if there are holes in the ID count, I would only want to choose IDs that exist. Is there a way to do this that would take less resources than ORDER BY RAND()?