Threading comments

I’m making a simple script for posting comments on a page. It’s a rerereinvention of the wheel, I know, but it’s an exercise in learning more PHP and MySQL for me.

I’ve figured it all out now, all but how to thread the comments to cater for people replying to each other. Like Wordpress 2.7+ does.

The way I see it, there are a few possible solutions:

  1. Get all the comments from the DB and loop through them, populating a new array based on the order the comments should be output in, including some information on which comment the parent is, the depth it is at, etc.
  2. Create an HTML fragment using PHP’s DOM capabilities and build the comments thread like that, inserting a new DOM element after the parent comment where needed.
  3. Some kind of MySQL black magic.

In the database there would simply be an INT column indicating the ID of the parent comment.

Any constructive criticism and advice on this is very much appreciated.

Would the parent id column be for the id of the previous comment in a thread or for the initial post/comment of the thread?

I personally would have the parent id for a comment indicating the first comment or the thread it belongs to and selecting the comments in order which belong to that thread.

It would be for the comment that the person is replying to, i.e. the one whose “reply” button was clicked.

The database holds all the comments in chronological order, but that’s not how they are to be displayed in the HTML. I suppose another alternative is to scrap auto-incrementing IDs and instead insert the comment into the table immediately after the comment being replied to (taking previous replies to that comment into account) and have a column indicating the depth. Seems a bit messy though.

Threaded comments are always fun, both from the backend database design part and end-user interface part. I have my own solution for 2-way switched view list/thread comments, but it need through refactoring as I learned many things since it was invented.

Speaking of technology, I can’t see much difference between [1] and [2], I take it as just 2 parts of one process, unless you’re talking of [2] as of storing all comments in one solid HTML block, which is just unacceptable.

It is good enough solution as long as we have considerable small amount of comments, say, some dozens of them, or even a few hundreds. But then it goes for thousands or even less but you want to paginate output, there is the only one way - [3].
Getting all comments already in the proper order with one query - so, you can filter it, order it, limit it in any way using SQL features - isn’t it a Holy grail to pursue for?

MySQL has good article for the beginners in the black magic,
http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
But personally I’d go for the thing called Materialized Path, because it’s fairly plain and understandable on one hand, and quite fast on other.
It doesn’t covered in the article above, but it is as simple, as storing the “path” to each node, say, child of node 1 would have path 1, and it’s own child would have path 1,2 etc

I always go with just storing the parent ID - your original idea. Then you need a recursive function to echo it all out because the comments might be nested to n-level depth.

Although yes, some mysql black magic would be the ideal solution if you can :slight_smile:

I would also go for ParentId and recursive function:
I had done something like this before. I don’t want to say that this is the efficient way. I would like to have some comments on this way too.

The table strcture is something like this:


CREATE TABLE `comments` (
	`id` INT(10) NOT NULL AUTO_INCREMENT,
	`parent_id` INT(10) NOT NULL DEFAULT '0',
	`comment` TEXT NOT NULL,
	`date_added` DATETIME NOT NULL,
	`status` SMALLINT(4) NOT NULL DEFAULT '1',
	PRIMARY KEY  (`id`)
);

And data will be stored something like this:


insert  into `comments`(`id`,`parent_id`,`comment`,`date_added`,`status`) values 
(1,0,'Comment 1','2010-02-15 08:40:45',1),
(2,0,'Comment 2','2010-02-15 08:41:15',1),
(3,0,'Comment 3','2010-02-15 08:42:05',1),
(4,1,'Reply 1','2010-02-15 08:44:05',1),
(5,4,'Reply 1.1','2010-02-15 08:40:55',1),
(6,4,'Reply 1.2','2010-02-15 09:10:55',1),
(7,6,'Reply 1.2.1','2010-02-15 10:20:25',1),
(8,6,'Reply 1.2.2','2010-02-15 11:30:15',1),
(9,2,'Reply 2','2010-02-14 11:00:00',1),
(10,9,'Reply 2.1','2010-02-14 10:00:00',1),
(11,9,'Reply 2.2','2010-02-14 12:00:00',1),
(12,10,'Reply 2.2.1','2010-02-14 13:30:00',1),
(13,10,'Reply 2.2.2','2010-02-14 14:00:00',1),
(14,3,'Reply 3','2010-02-13 07:00:00',1),
(15,14,'Reply 3.1','2010-02-13 07:30:00',1),
(16,14,'Reply 3.2','2010-02-13 08:00:00',1),
(17,15,'Reply 3.1.1','2010-02-13 08:20:00',1),
(18,16,'Reply 3.2.1','2010-02-13 08:40:00',1);

Now my PHP script is as below.


mysql_connect("localhost", "root", "raju") or die(mysql_error());
mysql_select_db("test") or die(mysql_error());
$data = array();
$sql = "SELECT * FROM comments WHERE status=1 ORDER BY date_added DESC";
$result = mysql_query($sql) or die(mysql_error());
while($row = mysql_fetch_object($result)){
    $parent = !empty($row->parent_id) ? $row->parent_id : false;
    if($parent)
        $data["parent_" . $parent][] = $row;
    else
        $data["main"][] = $row;
}

foreach($data["main"] as $value){
    echo $value->comment . '<br />';
    if(count($data["parent_" . $value->id]) >= 1){
        showReplies($data, $value->id);
    }
}
function showReplies($data, $id, $counter = ''){
    if(count($data["parent_" . $id]) >= 1){
        $counter .= ' --';
        foreach($data["parent_" . $id] as $value){
            echo $counter . ' ' . $value->comment . '<br />';
            showReplies($data, $value->id, $counter);
        }
    }
}

Thanks for the replies. I hadn’t thought of pagination, Shrapnel, so thanks for pointing it out. It opens a whole new can of worms:

Suppose the pagination limit is 100 comments. 150 people reply to the first commenter (he wrote something very controversial). What do you do then? Start the next page with the same first commenter and continue with the replies there? Or is the limit to 100 first comments (depth = 0)?

Digg seems to have a solution in having a “viewing threshold” system, and their system is pretty complicated. I don’t want to go down a road like that.

Shrapnel, that black magic may be for beginners, but it’s pretty advanced, especially when you have to consider that the depth of comments is potentially infinite.

Thanks for the function, Raju. I’d thought of the recursive function, just not how to do it yet!

What do you do then?

I just limit them.
Next page begins from the middle of the thread

I’d thought of the recursive function

I’d make it slightly different. I use recursive function to “unwrap” hierarchal data to the plain list that can be used in the template.

Yes, that was what I was thinking of, rather than echoing out directly like Raju’s function does.

I use a templating system that can handle recursion so I have a recursive function that uses the template, but doesn’t output HTML directly.

Is it that much tough to manage recursive in my above function? Is the following way is too bad for templating system?


mysql_connect("localhost", "root", "raju") or die(mysql_error());
mysql_select_db("test") or die(mysql_error());
$data = array();
$res = array();
$sql = "SELECT * FROM comments WHERE status=1 ORDER BY date_added DESC";
$result = mysql_query($sql) or die(mysql_error());
while($row = mysql_fetch_object($result)){
    $parent = !empty($row->parent_id) ? $row->parent_id : false;
    if($parent)
        $data["parent_" . $parent][] = $row;
    else
        $data["main"][] = $row;
}

foreach($data["main"] as $value){
    $res[] = $value->comment;
    if(count($data["parent_" . $value->id]) >= 1){
        showReplies($data, $value->id);
    }
}

function showReplies($data, $id, $counter = ''){
    global $res;
    if(count($data["parent_" . $id]) >= 1){
        $counter .= ' --';
        foreach($data["parent_" . $id] as $value){
            $res[] = $counter . ' ' . $value->comment;
            showReplies($data, $value->id, $counter);
        }
    }
}
echo implode("<br />", $res);

The threaded comments system is a very very very complex issue. There are 2 distincly different approaches to implementing this: one is to re-arrange the data (update the value of ‘level’) for all existing messages when a new reply is added.
second solution is to not update existing messages, only include the parent-id for a new reply, then create the threaded view during the display of messages.

The third solution, the one that has been adapted by giants like youtube and facebook is to limit the replies to only allow a reply to the top level comments. This is sort of like the best of both worlds - easy during the insert, no need to update records for existing messages and for a display of messages there is only one simple SQL that selects messages and their replies (if there are any)

It also make pagination easy because you paginate by the comments (top-level only)

I thinks it’s no accident that Youtube and facebook adapted this approach.
The true threaded replies with unlimited levels is just too complicated and almost always should be used with some sort of server caching.

I also decided to go with this approach in all my current and future comments scripts - only allow a reply to a first level comment, and a reply to a second level (reply to a reply) will be also be allowed but will be treated as a reply to the parent comment