Is PHP's community toxic?

ICYMI, all of Reddit’s public comments since 2007 were recently made available. This beautiful data mining opportunity reminded me (and the guys at Typesafe) of a post from about a year back which compares human language in various programming language subreddits.

This graph was particularly interesting.

PHP seems to be in the lead in ALL categories, but most noticeably in using the word shit.

As you may know, before beating the Confederacy and freeing the slaves, at a moon-landing celebratory LAN party in the early nineties, George Washington said:

There are only two kinds of languages: the ones people complain about and the ones nobody uses

That’s precisely what’s at play here - while PHP is far from the oldest language on this list, it’s by far the most approachable. Sure, you can start your basic hacking in JS, make a UI element disappear, write a Fibonacci generator in the browser’s console (soooo 1337 haxx0rz!!1), but when it comes to being productive with a language, nothing gets you up and running as fast as PHP.

Many people keep saying “PHP is not a real language”, and so many PHP devs reply simply with:

[GIF: Woody Harrelson crying]

Thing is, PHP’s ecosystem is so ridiculously approachable and easy to get into, we get an influx of thousands of newbies every day. Most of them give up, yes. Some power through and mature into devs. But Reddit being the anonymity-friendly troll-fostering cesspool that it generally is, some go on a rant-fest.

The sentiment from the Missing Link post still applies - the community is divided into either newbies or (self-proclaimed) pros. A newbie can stumble onto Reddit, risk being shot down (“RTFM, you shit”) for asking the most basic questions, then proclaim “This is shit, I’m outta here” and never interact again. There are very few other “bridges” from beginner to intermediate that people can rely on, and Reddit becomes one by default, so it should be our duty to make it friendlier.

There’s also the problem of the community’s general health. The community’s attitude, particularly on Reddit, is thoroughly toxic. I’d go as far as saying that had I stumbled upon Reddit when I was first starting out, I never would have gone past newbie status.

For instance, all the framework-fanaticism - some people worship certain frameworks and utterly hate others solely because they differ, but in reality they’re pretty much the same, like in real religion. If they just gave the alternative a go, they’d realize they all advocate the exact same stuff and none of it makes much sense. Again, just like in real religion.

The NIHilists and their opponents don’t help either - people desperately reinventing the wheel, and those desperately trying to oppose them.

These, I believe, are some of the main reasons for the high cursing frequency in the PHP subreddit.

How do you feel about the health and patience of the PHP community? Do you feel like it’s poisonous, too? What can we do to change things? Should we just give up on trying to heal Reddit and move to a friendlier-by-default platform?

I think that the Reddit community, in general, is growing toxic in a lot of places. If you keep up with the news, it’s getting ugly there, and I think I recall reading about some migrations of users of some subreddits to other platform(s) - but I don’t recall details.

I think that the newbie vs (self-proclaimed) pro problem is a real one, not just on Reddit - it’s just exacerbated there. The community of programmers in general, and it sometimes seems PHPers in particular, is very “RTFM” (“Google it, noob”). That’s definitely not conducive to fostering growth in low-to-mid-level PHP developers. I like environments like SPF for this, but it’d be nice to have more heavily moderated communities like r/PHP available to developers as an option.


“Contains word/10000 comments”.

Well, that header is potentially misleading due to missing information. How many comments were made in each subreddit overall? Is the number statistically large enough to make this chart valid? (The article mentions 300,000 comments, but divided between 22 categories, if all were even, each category has only a bit over 10,000 comments, barely enough to justify a word/10000 comments metric.) That said…

Where’s the chart for overall percentage?
If the subreddit for visualbasic has 200 comments, and javascript has 15000, then yeah, that’s going to affect the data presented in all of the charts.
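To put rough numbers on that concern: an even split of 300,000 comments over 22 subreddits is about 13,600 each. The sketch below is a minimal illustration, assuming a simple binomial model (each comment independently contains the word with a fixed probability), which is not something the original article uses. The 200 and 15,000 comment counts are the hypothetical sizes from the paragraph above; the 5% hit rate is made up for illustration.

```python
import math

def rate_per_10k(hits, total):
    """Comments containing a word, scaled to 'per 10,000 comments'."""
    return 10000.0 * hits / total

def std_error_per_10k(p, n):
    """Binomial standard error of a per-10,000 rate, assuming each of the
    n comments independently contains the word with probability p."""
    return 10000.0 * math.sqrt(p * (1 - p) / n)

# Average comments per subreddit if the article's 300,000 were split evenly over 22.
print("even split: %.0f comments per subreddit" % (300000 / 22.0))

# The hypothetical subreddit sizes, with a made-up 5% hit rate in both.
for hits, total in [(10, 200), (750, 15000)]:
    p = float(hits) / total
    print("%5d comments: %.0f +/- %.1f per 10,000"
          % (total, rate_per_10k(hits, total), std_error_per_10k(p, total)))
```

Both subreddits show the same rate of 500 per 10,000, but under this model the standard error is roughly 154 for the 200-comment subreddit versus about 18 for the 15,000-comment one, so a small subreddit's bar can swing a lot.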

I agree with your overall statement that a more open language leads to more ‘newbies’ and immaturity. I also notice that ASP isn’t even on the chart (and yet somehow Mathematica is).

I think the reddit PHP ‘community’ (we are not one single community) is, based on this data alone, about as ‘poisonous’ as reddit itself. Reddit, more often than not, is a wild west for people to go and vent and scream. Moderation tends to be either non-existent or heavy-handed (another factor in the data, no doubt). It’s neither designed nor intended to be a helpful place, so drawing inferences beyond the scope of reddit itself is silly.

EDIT: One more thing. If a comment contains “fuck i hate this crap, shit”, it counts in all 4 categories, right? So 1 comment = 4 “word/10000 comments”? Are you still measuring “word/10000 comments” when you use a stacked bar?


And there are even noteworthy PHP devs who like to embellish their prose with Rabelaisian verbiage. Most great comedians also use rough language to make a point, uh…um…I mean a punch line. :smiley:

What gets me is that too often this “friendlier-by-default” kind of platform seems to invite people who try to take advantage of that friendliness. You even see it here on SitePoint. Some users come with a problem and just throw up an “I need a solution to a problem, can you code it for me?” post and hope some gullible dev will do all the work for them. They show not one iota of an attempt to solve the problem themselves. On a forum like Reddit, that would probably get a bevy of inimical responses. Thankfully, the people here are more civil. :smile:



That is a big problem, yes.

Sometimes I feel like a heavily-moderated real-identity-only platform is the way to go for a proper programming community. All this anonymity and free-for-all-ism just assist people in creating these “do my homework” throwaway accounts.

With Reddit’s familiarity and popularity, people tend to think of it as huge. But in fact, Reddit is a drop in the bucket that is the PHP community. The PHP community is massive, and the toxicity of a few of its members has no effect on it in the least. Trying to learn PHP from Reddit would be like getting tips on creating great art from a bathroom wall, though, and I think anyone serious about building their programming skills would come to realize that.


I beg to differ - I think a vocal member of the community, positive or negative, easily has enough influence to direct (either away from the language or towards it) a thousand silent newbies.


I can’t disagree with someone’s ability to dissuade those willing to be dissuaded. But I stand by my further comment on the quality of education one might receive on Reddit. You have to admit that’s kinda scraping the bottom of the barrel as far as PHP instruction on the web is concerned.

I think my main point is that you can’t judge the population of an entire country by the actions of a family from that country. That’s only logical. A further point, I started programming with php in the late 1990s. I know the number of PHP searches I’ve done since then must number in the millions and I can’t remember ever landing on an answer from Reddit. Maybe I just haven’t made the right queries.


Somewhat the same. Never have I intentionally browsed reddit nor ended up there from a search related to web development. As for people dumping on php I don’t think anything of it. It comes with age and everyone wants to use the hip, new cool stuff and I can’t blame them. Give it a few years and there will be something brand new and everyone will be dumping on the “modern” technologies of today. Which is why I always encourage computer science fundamentals over any language. What is in now will be out next year but the fundamentals are mostly consistent across all languages – especially web oriented ones. While I know PHP well I wouldn’t mind at all if it starts to fizzle out in light of more adequate technologies. However, I think that is FAR from happening.


Reddit is a link aggregator. It doesn’t actually have instructions on it.

It does however have comments and self posts. Laracasts is quite popular in /r/webdev and mentioned frequently when someone comes asking for help. The community is very helpful on quite a few subs, with /r/learnprogramming, /r/webdev, /r/web_design, and then the various misc language subs. /r/php however is, and always has been, a cesspool.

So this got me curious, and I went looking into the data this article is based on (assuming I trust his data as is).

And i found something quite interesting.
The chart, while it looks pretty and all, is about half of his data. It’s very much not complete. In fact, if the bars were actually stacked in the way he seems to want them to be, here are some fun facts.

The full breadth of his data collection on “negative emotions” actually looks like this:

(Don’t ask me why verbose is a negative emotion.)

If I pare his data down to just the 4 ‘toxic words’ he selected from his list, but include ALL of the data, I get a chart that looks something like this:

Apparently the groovy guys and gals really like their fecal matter.

But this still isn’t really reflective of the ‘community’, because, as I mentioned previously, someone using all 4 words in one comment makes the community look four times as toxic. So… I took his program and analyzed it.

What he does is take every comment written in a subreddit and compress it down into a SQLite database. He then runs the following query:

SELECT (10000.0 * COUNT(*) / cached_subreddit_comment_counts.cnt) AS result
FROM comments, cached_subreddit_comment_counts
WHERE comments.subreddit = ?
    AND cached_subreddit_comment_counts.subreddit = ?
    AND body LIKE ?

The ?s are bound to the subreddit identifier (twice) and to the body pattern %<chosen word>%. So there’s something good here - he counts each post only once per word (so “shit shit shit shit” doesn’t get multi-counted).
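To see what `body LIKE '%word%'` actually counts, here is a minimal sketch against an in-memory SQLite database. The table and column names come from the query above; the exact column layout and the sample comments are assumptions made up for illustration. Each matching comment counts once no matter how often the word repeats, SQLite’s LIKE is case-insensitive for ASCII by default, and because it is a substring match, `%hate%` also matches “whatever”.

```python
import sqlite3

# Hypothetical miniature version of the schema the query assumes:
# one row per comment, plus a cached total comment count per subreddit.
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE comments (subreddit TEXT, body TEXT)")
c.execute("CREATE TABLE cached_subreddit_comment_counts (subreddit TEXT, cnt INTEGER)")

sample = [
    "shit shit shit shit",     # one comment, word repeated
    "Shit, that broke",        # different case
    "this is shitty code",     # word embedded in a longer word
    "whatever works for you",  # contains the substring "hate"
    "works fine for me",
]
c.executemany("INSERT INTO comments VALUES ('php', ?)", [(b,) for b in sample])
c.execute("INSERT INTO cached_subreddit_comment_counts VALUES ('php', ?)", (len(sample),))

QUERY = (
    "SELECT (10000.0 * COUNT(*) / cached_subreddit_comment_counts.cnt) AS result "
    "FROM comments, cached_subreddit_comment_counts "
    "WHERE comments.subreddit = ? "
    "AND cached_subreddit_comment_counts.subreddit = ? "
    "AND body LIKE ?"
)

def per_10k(subreddit, word):
    """Run the per-word query with the same bindings as the original."""
    c.execute(QUERY, (subreddit, subreddit, "%" + word + "%"))
    return c.fetchone()[0]

print(per_10k("php", "shit"))
print(per_10k("php", "hate"))
```

Here the repeated, capitalised and `shitty` comments each count once (3 of 5 comments, i.e. 6000 per 10,000), and the only `hate` hit is the word “whatever”.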

However, posts containing multiple words from the pool do get counted more than once, so using a stacked bar in his chart is somewhat misleading. Even the ‘sum’ column of his data is not really valid data: while factually accurate, saying ‘there were 500 curses used in 10,000 posts’ doesn’t reflect a community, because of the multi-counting I’ve described above.
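The multi-counting can be seen directly in a small sketch (in-memory SQLite, hypothetical sample comments; only the `body LIKE` matching mirrors the original query). Summing per-word counts, which is what a stacked bar or a ‘sum’ column does, counts a comment once per word it contains, while an `OR` of the `LIKE` conditions counts each comment once.

```python
import sqlite3

# Hypothetical sample data: three comments, one of which uses all four words.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")
conn.executemany(
    "INSERT INTO comments VALUES (?)",
    [("fuck i hate this crap, shit",), ("nice library",), ("crap docs",)],
)

WORDS = ["shit", "fuck", "hate", "crap"]

def per_word_total(words):
    """Sum of per-word comment counts: what a stacked bar adds up."""
    total = 0
    for w in words:
        row = conn.execute(
            "SELECT COUNT(*) FROM comments WHERE body LIKE ?", ("%" + w + "%",)
        ).fetchone()
        total += row[0]
    return total

def distinct_comments(words):
    """Number of comments containing at least one of the words."""
    clause = " OR ".join(["body LIKE ?"] * len(words))
    row = conn.execute(
        "SELECT COUNT(*) FROM comments WHERE " + clause,
        ["%" + w + "%" for w in words],
    ).fetchone()
    return row[0]

print(per_word_total(WORDS), distinct_comments(WORDS))
```

With these three comments the per-word total is 5 (the first comment alone contributes 4), but only 2 comments actually contain any of the words. The `OR` form is the same idea the `relative_word_group_count` function in this post relies on.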

We can, however, infer the actual “number of posts containing a curse word” from his data, by tweaking his code slightly. So let’s stop trusting his data and do it ourselves; we can even do it with more recent data!

I added the following to the python file.

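def relative_word_group_count(c, subreddit, words):
    # Count comments that contain ANY of `words` (each comment counted once),
    # scaled to "per 10,000 comments" in the given subreddit.
    like_clauses = " OR ".join(["body LIKE ?"] * len(words))
    command = (
        "SELECT (10000.0 * COUNT(*) / cached_subreddit_comment_counts.cnt) AS result "
        "FROM comments, cached_subreddit_comment_counts "
        "WHERE comments.subreddit = ? "
        "AND cached_subreddit_comment_counts.subreddit = ? "
        "AND (" + like_clauses + ")"
    )
    # Bind the subreddit twice, then one %word% pattern per LIKE clause.
    params = [subreddit, subreddit] + ["%" + w + "%" for w in words]
    c.execute(command, params)
    res = c.fetchone()[0]
    if not res:
        return 0
    return int(res)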

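def show_word_group_table(c, subreddits, words):
    # CSV with one row per subreddit: "4big" counts comments containing any of
    # the four chart words, "all" counts comments containing any word in `words`.
    big_four = ["shit", "fuck", "hate", "crap"]
    print(words)
    result = ",".join(["subreddit", "4big", "all"]) + "\n"
    for subreddit in subreddits:
        print(subreddit)  # progress output
        result += subreddit + ","
        result += str(relative_word_group_count(c, subreddit, big_four)) + ","
        result += str(relative_word_group_count(c, subreddit, words))
        result += "\n"
    return result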

(and inside the count_word_mentions function)

write_str_to_file('analysis/words_group_all.csv', show_word_group_table(c, subreddits, negative_emotions))

Because this program pulls an entire YEAR’s worth of data out of reddit for all of the subreddits listed, it will take the program some time to run (the number of wget calls is absolutely insane). I will update this thread if/when it completes. (I’d put money on my PC crashing/rebooting before it comes close)


@StarLion that’s an awesome reply, thanks, sure sheds some new light on things :slight_smile:

Is there actually a language called “brainfuck”? :open_mouth:


Sure, there is even a JavaScript variant:

Thanks, though it was meant to be a more rhetorical question, given the topic revolves around bad language. Hehe… a language named with bad language. LOL! :smiley:



You should add the more active subs to it. Many of the ones you listed are basically dead.

Here is my list of tech-related subs. Not all are programming subs, but most are at least somewhat active. Some are left out on purpose, like /r/php, /r/html and /r/html5.
