Prepared Statement with multiple IDs

There have been a few threads lately which have had pdo queries or prepared statements within loops, which is obviously a silly idea and makes no sense. Prepare outside the loop, then execute within the loop.
However, I have a script where I am preparing a statement within a while loop, to be executed later in further conditions within the loop. I know it’s wrong, but don’t know how to prepare it before the loop in this case.
I think it’s some confusion on my part about what can/cannot be put in a prepared statement as a blind or named param.
It is a query that has a WHERE IN clause and comma separated list which will alter for each loop of the while. The end result looking something like this:-

"SELECT id, Name, Email FROM People WHERE id IN ('1, 3, 6, 10')"

To put this in more context, the script is part of a task management system. This is a small part of the code:-

$sql = $db->query("SELECT * FROM TaskList WHERE Notifcation != '4' AND Status != 'c' ");
while($row = $sql->fetch()){
   $assigned = unserialize ($row['AssignedTo']); // Array of people assigned (IDs)
   // also get other task info... not relevant here
   $alist = implode("', '", $assigned) ; // create CSV of assigned people's IDs
   $qstr = "SELECT id, Name, Email FROM People WHERE id IN ('".$alist."')" ; // create query string
   $sqlass = $db->prepare($qstr); // prepare the statement
   // A load more script that uses this to do stuff
} // end while

It seems I can’t set the CSV part, (the only bit that will change) as a param outside of the while loop, then define it’s value within the loop, as I would prefer to do. When I tried, it does not work.
Is it possible? If so, how?

No, databases generally have no support for passing an array of values to a prepared statement, nor has PDO such a feature. You have to basically do what you did by imploding a list of values into the query (but first sanitizing the values!). Some database libraries like Doctrine DBAL can emulate this feature.

Anyway, storing a serialized list of assigned IDs in a database column is very bad practice. You should have a separate table with assigned IDs of people to the tasks and keep the ID pairs there. In this way you gain some advantages:

  1. Instead of the code above you can use a single query with a JOIN to get a list of assigned people - shorter code and much faster - no need for loops, passing of IDs, etc. - the database does all the work for you.

  2. By using foreign keys you are sure that the assigned IDs are always correct (existing in the database).

  3. You data become universal - at the moment you can query your data only with a language that supports unserialize(), which is probably only PHP. With a related table you can draw the data with SQL, which works in any language.

I don’t know if this will help. I purposely created a poorly designed database

create table people (name varchar(10), skillset varchar(20), primary key (name)); 
create table skills (id tinyint unsigned, skill varchar(10), primary key (id)); 

insert into people (name, skillset) values ('Tom', '1,3,5,7');
insert into people (name, skillset) values ('Dick', '1,2,3');
insert into people (name, skillset) values ('Harry', '7,8');
insert into people (name, skillset) values ('Jack', '1,2,3,4');
insert into people (name, skillset) values ('Jill', '4,5,6');
insert into people (name, skillset) values ('newbie', '1');
insert into people (name, skillset) values ('guru', '1,2,3,4,5,6,7,8');

insert into skills (id, skill) values (1, 'HTML');
insert into skills (id, skill) values (2, 'CSS');
insert into skills (id, skill) values (3, 'JavaScript');
insert into skills (id, skill) values (4, 'PHP');
insert into skills (id, skill) values (5, 'SQL');
insert into skills (id, skill) values (6, 'REGEX');
insert into skills (id, skill) values (7, 'WordPress');
insert into skills (id, skill) values (8, 'SEO');
mysql> select skills.skill from skills
    ->  inner join people
    ->  on find_in_set(skills.id, people.skillset)
    ->  where people.name like 'Jill';
+-------+
| skill |
+-------+
| PHP   |
| SQL   |
| REGEX |
+-------+
3 rows in set (0.04 sec)

mysql> select people.name from people
    ->  inner join skills
    ->  on find_in_set(skills.id, people.skillset)
    ->  where skills.skill like 'HTML';
+--------+
| name   |
+--------+
| Tom    |
| Dick   |
| Jack   |
| newbie |
| guru   |
+--------+
5 rows in set (0.00 sec)

Almost right.

$sqlass = $db->prepare("SELECT id, Name, Email FROM People WHERE id IN (?)");
$sqlass->execute([$alist]);

everything down to the implode statement stays the same.

Each value in the array should be a separate placeholder.

An IN clause expects a comma separated list, not an array - therefore there is only one placeholder as the comma separated list is considered a single value in SQL.

There are no arrays in SQL.

You can have a comma separated list of placeholders. Not to mention if each of those values were strings using a single placeholder would defeat the purpose of variable binding.

Why would it defeat the purpose of variable binding - the purpose of variable binding is to be able to prepare an SQL call once and use it multiple times. There is nothing in the number of placeholders you use for an IN clause that has any impact on this.

Or are you under the mistaken impression that the side effect of keeping the data and code separate is the purpose of prepare rather than just a side effect?

This is wrong. You cannot pass a list of values to IN() with a single placeholder. $alist is a comma separated string like ‘2,4,5’ - this code will not produce an error but a string to integer cast will happen, which in effect will only fetch one row (with id=2 in case of list ‘2,4,5’).

@oddz oddz is right, you have to use multiple placeholders like this:

$sqlass = $db->prepare("SELECT id, Name, Email FROM People WHERE id IN (?,?,?)");
$sqlass->execute([$assigned[0], $assigned[1], $assigned[2]]);

I tried this, and it nearly works. I made a much simpler script for testing purposes, as the real one is a cron job with no visible html. This version (should) just lists the tasks with with a list of assigned people.
But, it is only listing the first ID in the list. I thought it may be to do with quotes, so tried adding the quotes to the beginning and end of the list and removing them altogether, but none worked.

    $sqlass = $db->prepare("SELECT id, Name, Email FROM People WHERE id IN (?)");
    
    $sql = $db->query("SELECT * FROM TaskList WHERE Notify != '4' AND Status != 'c' ");

    echo "<h1>All Tasks</h1>";
    while($row = $sql->fetch()){
    $title = $row["Title"];
    $people = unserialize ($row['AssignedTo']);
    $alist = implode("', '", $people) ;
    //$alist = "'".implode(", ", $people)."'" ;
    //$alist = implode(", ", $people) ;

        echo "<hr>\n<h2>".$title."</h2>\n<p>Assigned To:</p>\n<ul>\n";
    $sqlass->execute([$alist]);
    while($row = $sqlass->fetch()) {
        echo "<li>".$row['Name']."</li>\n";
    }
    echo "</ul>\n" ;
    } // End while row for all tasks

That explains the above.

OK, but I don’t know how many are in the list, until I execute the first query.

You have to construct each query separately with the desired number of question marks, for example:

$sqlass = $db->prepare("SELECT id, Name, Email FROM People WHERE id IN ("
   . rtrim(str_repeat('?,', count($people)), ',')
   . ")");
$sqlass->execute([$people]);

So I’m still re-preparing the statement for each task within the while loop, as before.
Which means the answer is: I can’t prepare the statement before the loop, unless I use Doctrine, as per your first reply.
Well, it’s not a big deal really, the script “works”, but I just thought I may be missing a trick, whereby it could be streamlined a bit by preparing the statement only once.

Correct, you can’t prepare such a statement once for a loop. Even Doctrine doesn’t do this - it only emulates the preparing process so it appears like one but under the hood it detects the question marks in your SQL and constructs the final query each time - and obviously this comes at some performance cost.

1 Like

So there is no real gain there. Thanks for clearing that up.

The only gain is in a nicer way to write your SQL as if it really supported arrays as a parameter. Nowadays, the performance gain from preparing an SQL statement is so small that it is inconsequential in most cases - at least for such simple MySQL queries because the process of parsing a query is extremely fast in MySQL and whatever speed you gain by preparing a statement it is so small that it’s hard to measure. It’s just that prepared statements appear to be more elegant to many programmers.

Anyway, the issue of preparing the statement or not is a pretty minor one in your case - why don’t you acknowledge the elephant in the room and restructure you database accordingly? There is the real speed gain for you and real flexibility gain.

I think it’s a case of not being aware of an elephant. How should I structure the database?

You are right that separate placeholders (defeating the purpose of using prepare in the first place) would be required were a situation to arise where the values to use in the IN clause needed placeholders.

Of course since with a properly designed database the clause only ever gets used where the list of values is already known and can be hardcoded it is easy to forget that you’d need separate placeholders.

That what I mentioned in my first reply here.

Something like this:

CREATE TABLE `people` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(50) NOT NULL,
  `email` varchar(100) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `tasklist` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(200) NOT NULL,
  `notification` smallint(6) NOT NULL,
  `status` char(1) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `tasklist_people` (
  `tasklist_id` int(10) unsigned NOT NULL,
  `people_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`tasklist_id`,`people_id`),
  KEY `FK2_people` (`people_id`),
  CONSTRAINT `FK1_tasklist` FOREIGN KEY (`tasklist_id`) REFERENCES `tasklist` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT `FK2_people` FOREIGN KEY (`people_id`) REFERENCES `people` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The new table in this design is tasklist_people - there you will add IDs of tasks and people who are assigned with them - each in a separate row. The database will not allow you to insert invalid IDs and also it will not allow you to assign the same person to the same task multiple times (because of the primary key on tasklist_id and people_id).

After you fill in your data you can query for people assigned to tasks filtered by your criteria:

$result = $db->query("SELECT p.id, p.name, p.email FROM people p

  INNER JOIN tasklist_people tlp
  ON tlp.people_id=p.id
  
  INNER JOIN tasklist tl
  ON tl.id=tlp.tasklist_id
  
  WHERE tl.notification != '4' AND tl.status != 'c'");

$people = $result->fetchAll(PDO::FETCH_ASSOC);

And this is all you need instead of the code in your first post - no need for loops or passing lists of IDs. MySQL is a relational database and storing data related to other data with the use of foreign keys is one of its main features. If you store serialized lists of related IDs in a text column instead of having separate related tables with associations then you are giving up on all the advantages that the database is giving you - and querying for various data becomes complicated and slower. This can look difficult at first but once you learn about creating related tables and foreign keys you will see how much more elegant and powerful they are and you will never look back!

3 Likes

Up to now, I have only used DBs/tables in a very basic form. For the most part, my tables are stand-alone sets of data that don’t relate to one another. So I have yet to learn about how to make tables relate.
I have just a couple of instances such as this where the tables are related in some way.
So it is something I will want to look into and learn when I find the time.
I will probably want to test it on some dummy scripts and tables, as changing this will have a knock-on effect on a number of scripts in the system, those that populate, edit and display the task data will all have to change.