I am considering building a niche social network, and I am concerned about Drupal’s ability to scale when there are a lot of logged-in users. I have read numerous posts saying that with caching, Drupal fares pretty well for anonymous (non-logged-in) users, but that scalability is an issue with a lot of logged-in users. The site I am developing would have to be able to handle between 200,000 and 1,000,000 logged-in users a month. I could not find many cases where Drupal is being used for a social network with a large number of logged-in users besides drupal.org. If you know of any, please list some below.
If you have experience building social networking sites in Drupal, and have also done similar sites with a custom CMS, what would you consider the ideal approach for developing this web application: Drupal or a custom CMS? What lessons have you learned in trying to scale Drupal for sites with a lot of logged-in users? If custom, PHP, Python or Ruby?
So that’s about 1,400 users per hour (taking the upper figure of 1,000,000 a month)? I don’t think Drupal was designed to handle that amount of traffic. In particular, you will get database problems. Say those 1,400 users each do 3 things in that hour; that means at least 1,400 × 3 = 4,200 database queries per hour, which is at least a query per second. And that is still a very optimistic estimate. I’d recommend you don’t use PHP and MySQL.
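For anyone who wants to check that arithmetic, here is a rough sketch in Python. The 30-day month, the perfectly even spread of traffic, and the one-query-per-action figure are simplifying assumptions, and the function names are made up for illustration. Real traffic is bursty and a page load usually issues more than one query, so treat the result as a lower bound.

```python
# Back-of-the-envelope load estimate using the figures from the thread.
# Assumptions: a 30-day month, traffic spread evenly over every hour,
# and exactly one database query per user action.
HOURS_PER_MONTH = 30 * 24
SECONDS_PER_HOUR = 3600


def users_per_hour(monthly_users: int) -> float:
    """Average number of users arriving per hour."""
    return monthly_users / HOURS_PER_MONTH


def queries_per_second(monthly_users: int, actions_per_user: int = 3) -> float:
    """Average query rate if every user performs `actions_per_user` actions."""
    return users_per_hour(monthly_users) * actions_per_user / SECONDS_PER_HOUR


if __name__ == "__main__":
    for monthly in (200_000, 1_000_000):
        print(
            f"{monthly:>9,} users/month: "
            f"{users_per_hour(monthly):7.0f} users/hour, "
            f"{queries_per_second(monthly):.2f} queries/second"
        )
```

At the upper figure this works out to roughly 1,389 users per hour and a little over one query per second on average.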
I use Drupal primarily at work. I feel it’s fine for straightforward sites, but I would be far more comfortable using a PHP framework for a complex site involving lots of users and, most likely, very custom functionality.
Drupal doesn’t scale very well; it is incredibly resource-inefficient and requires caching right out of the box to be even remotely tolerable.
I personally have a custom framework I built that I use for freelance work, but look at various MVC frameworks (look for a light one, not one with a million features) and go from there.
I have been coming to the same conclusion. It took a lot of Googling to gain some insights. I found a case study about teamsugar.com over on Drupal.org, and they had to hack Drupal to death to get it to perform. There seem to be a lot of queries that are inefficient. I am going the framework route. Now it is between PHP and C#/.NET. I am thinking of Yii, if I go the PHP route. What do you use?
Yii was the one I was thinking of suggesting, but I have never built any real projects with it; I have only worked through some tutorials on the popular MVC frameworks (just so I could have cursory knowledge of how they work), so I wasn’t going to suggest one over another, as I don’t feel qualified in that respect. I simply wouldn’t recommend Drupal. I actually despise Drupal, but it’s what my work uses, so I don’t have a choice.
Mine is one I wrote from scratch, rebuilt 3x completely over the 7 years I’ve been a PHP dev. But it was designed only for me (and another dev friend who uses it for his work).
But Yii from my understanding is pretty light, so definitely explore that option.
Despise Drupal? Wow! That is pretty strong. I loved it when I first found out about all the things you can do with it, but I later found out that it performs like a dog. The turning point was when I attempted to start using Drupal 7. The slow performance out of the box really turned me off. I have been looking at Umbraco, Concrete5, and Modx. None of them seems to really fit the bill. They are all compromises.
I have one site that could be very high traffic, so I am looking at C# and .NET. I also have another one where the audience is smaller, so I am thinking PHP. This should be fun…
Why the strong dislike for Drupal? Do you have any thoughts on other CMSes?
Don’t get me wrong, it’s good for general purpose websites, but then any framework/platform is.
I dislike its performance as well as its inability (like every other large platform) to be ultra flexible. It’s great if you don’t like getting dirty in actual PHP, as the deepest you’ll go are hooks and custom modules, which give quite a bit of flexibility, but I always seem to run into issues where that’s not sufficient and bash my head in trying to find a non-hack solution (this is, I suppose, the main reason I despise it).
It’s all preference of course, I prefer to have absolute control over my code, if there’s a bug - I want to be responsible for it.
With those numbers, even if it was coded in SuperEfficientAndFastImaginaryLanguage, you’ll still need to cluster/load balance at some point.
That said, I always consider CMSes like Drupal and WordPress to cap out at “high” performance. What you want is “ultra high” performance. (Those are my own terms I use at work when we discuss this stuff =p). For ultra high performance, the lower the level at which you can work with the code, the better. Depending on what you want, you may work with a PHP framework, or you may work directly in PHP (this is what I’m doing at this moment actually =p).
As for language, I think PHP can be almost as fast as any compiled language (with APC and proper memcaching). I personally find PHP a little more flexible, so it’s my tool of choice.
Afraid I can’t offer any insights into frameworks since I rarely use them. =p
It seems like you have to invest in more hardware to make Drupal truly high performance. I think Drupal is not really designed to be high performance (high query levels and poor database design for modules such as CCK). I have been googling like crazy trying to find out about scaling Drupal, and most of the examples given are for sites that are serving up static pages that are easily cached. I can also identify with wonshikee’s statement about having to turn on caching from the very beginning to make Drupal bearable. All things being equal, I have come to the conclusion that Drupal will always need more hardware than something that is custom built, and even then, it feels like you are using hardware to cover up something that is sluggish from the start.
Yeah. I personally use WordPress as my basic CMS for most things instead of Drupal, but it’s basically the same deal.
There are lots of performance tweaks you can do on the server end that help:
OpCode Caching (APC is my favorite)
MemCaching (this gets very tricky, use with caution)
Optimizing queries (at least the ones you write)
Using functions that don’t run the same query again
Add caching on your database (lots of options here)
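To make the memcaching and "don’t run the same query again" items a bit more concrete, here is a minimal sketch of the cache-aside pattern in Python. Everything in it is hypothetical: `TTLCache` is a small in-process stand-in for a memcached-style store, and `FakeDatabase` only counts how often it is queried. A real setup would talk to an actual cache server and database, and would need a proper invalidation strategy, which is where the "use with caution" part comes in.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process stand-in for a memcached-style cache (illustrative only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry, value)

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        """Return the cached value, or call `compute` and cache its result."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: no query is run
        value = compute()  # miss or expired: run the query once
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key, e.g. after the underlying row has been updated."""
        self._store.pop(key, None)


class FakeDatabase:
    """Hypothetical database that counts queries so the saving is visible."""

    def __init__(self) -> None:
        self.query_count = 0

    def load_profile(self, user_id: int) -> dict:
        self.query_count += 1
        return {"id": user_id, "name": f"user{user_id}"}


def get_profile(db: FakeDatabase, cache: TTLCache, user_id: int) -> dict:
    """Cache-aside read: check the cache first, fall back to the database."""
    return cache.get_or_compute(f"profile:{user_id}",
                                lambda: db.load_profile(user_id))
```

Calling `get_profile` repeatedly for the same user within the TTL hits the database only once. After a write you would call `cache.invalidate("profile:<id>")` so that readers don’t keep seeing stale data.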
You’ll inevitably have to scale your servers as well (you aren’t going to handle 1,000,000 users a month on a social media site on one server). Clustering your database is relatively easy to set up, but can be costly (MySQL, for example, requires a minimum of 4 boxes to properly cluster). Clustering your web server can be a lot trickier, depending on whether you have local caching and how you handle it. Personally, I prefer to offload any caching to its own server (cluster) and keep the static files in another place as much as possible (especially images).
If you want to support 1,000,000 users, you have a lot of research ahead. There is no way around getting into server management when you are at those numbers.
Back to my point though, the better your code, the less hardware you’ll need to run it.
If you are looking to handle that many users, you are going to need some serious hardware, not just software. I would recommend taking advantage of the many “cloud computing” services out there. Amazon has EC2, Microsoft has Windows Azure, and so on. There are lots to choose from, ranging from Infrastructure as a Service (IaaS), such as Amazon EC2, to Platform as a Service (PaaS), such as Windows Azure.
The difference between IaaS and PaaS is that with IaaS you manage everything from the OS up, and Amazon manages the hardware. With PaaS, you only manage your application; Microsoft manages the OS and its updates for you (in the case of Windows Azure, of course). There are far too many cloud computing services out there for me to keep track of, but Windows Azure is what I’ve been using lately with clients and my own projects. (No, Windows Azure is not only for Microsoft’s technology. PHP works perfectly, and so do Ruby, Java, and even Node.js.)
I do think you would be the first PHP/MySQL website to handle this amount of traffic per hour. That is… if you succeed.
There is no question that Apache can handle this. Apache has advanced features for handling traffic like this.
PHP and MySQL, however, are not designed for this (not even talking about Drupal now). You might get pretty far if you make your own custom optimized framework. But when the traffic increases to several hundred users per hour, your site will become increasingly unstable (you get more and more failed queries and script timeouts).
I think at this point, however, you do not have that many users yet, so you should start small. See where it goes. Once you have money, you can hire some professionals to do the job for you. PHP should get you far enough to make some money.
I have realized that potentially I will have to do quite a bit of work to create a good architecture for the application. I really appreciate the options you have presented, and I am taking note of things that I need to research and implement. I used to be a system admin back in the days of Novell, and the growth of Windows NT. It feels like a lifetime ago, but it seems that I cannot escape my old self. At this point, I have to develop the solution, and then try to grow the community. It could be years before I have to focus more on the architecture, although I am a planner and like to have a clear path forward. The options being presented are great food for thought. I guess we have left Drupal in the dust.
I will start slow and go from there. It is just the planner in me trying to have a clear path forward, and trying to start off in the best possible way to maximize performance and scalability later on. Would you recommend something other than MySQL or MySQL clustering?