50,000 articles in Joomla - Need to split the database

This is perfect.

Now, how do I get this done for 49,000 articles so I can finally clean up my Joomla database?

I am glad it worked.

Read my post #18

It is tedious to set up and to debug but well worth the effort. 🙂

I read it, and did some searches on what to add to the .htaccess for the redirection and related matters. I found how to redirect.

Unless I am reading it wrong…
I couldn’t find how to create the actual static files (49k of them) in the cached folder.

  1. create a /CACHED-FOLDER-NAME/sub-menu/DUMMY-TEST.html

  2. insert an .htaccess script to test for file existence (a sketch follows the list)

  3. Browse URL: http://www.wrestleview.com/sub-menu/DUMMY-TEST
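
The step-2 .htaccess test can be as small as this (a sketch only, assuming Apache with mod_rewrite enabled; CACHED-FOLDER-NAME is a placeholder for whatever the folder of static copies is called):

RewriteEngine On

# leave requests for files that genuinely exist alone
RewriteCond %{REQUEST_FILENAME} !-f
# if a pre-rendered static copy exists, serve it instead of letting Joomla run
RewriteCond %{DOCUMENT_ROOT}/CACHED-FOLDER-NAME/$1.html -f
RewriteRule ^(.*)$ /CACHED-FOLDER-NAME/$1.html [L]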

When the above is working, search for how to dynamically create the static files.

Why do you want to delete the 50,000 articles? Database content usually compresses remarkably well and indexes ensure the related content is retrieved very quickly.

Edit:
To create a demonstration link required:

  1. rendering your web-page
  2. viewing the generated content (Right-click - View Page Source)
  3. Ctrl-A then Ctrl-C to copy all the web-page content into the buffer
  4. saving the content and uploading it to a new file in the cached folder, with an .html extension (a scripted version is sketched below).
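
For 49k articles you obviously cannot do that by hand; a small PHP script along these lines could automate it (a minimal sketch only — the base URL, folder name, and the source of the slug list are assumptions, not your actual setup):

<?php
// Minimal sketch: fetch each rendered page and save it as a static
// .html file in the cached folder. Requires allow_url_fopen enabled.
$base  = 'http://www.wrestleview.com/';
$cache = __DIR__ . '/CACHED-FOLDER-NAME/';
$pages = ['sub-menu/DUMMY-TEST'];   // in practice, read the slugs from the database

foreach ($pages as $page) {
    $html = file_get_contents($base . $page);   // rendered web-page content
    if ($html === false) {
        continue;                               // skip pages that fail to render
    }
    $file = $cache . $page . '.html';
    if (!is_dir(dirname($file))) {
        mkdir(dirname($file), 0755, true);      // create sub-folders as needed
    }
    file_put_contents($file, $html);
}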

Many thanks.

I want to delete 50k articles because of performance bottlenecks. There are too many to fix, and it’s expected: the bigger the database gets, the slower the site will be. Plus my visitors don’t care about yesterday’s news; they want the daily and weekly news. Archives are only visited via search engines. By clearing out thousands of articles, the site will perform 10x faster.

I am using HTTrack and it has already created 7K static articles. I will also try the wget command to copy my entire Joomla site to a static site, and will see which one works better.
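
For reference, the wget equivalent is a one-liner using standard wget mirroring flags (whether it beats HTTrack for this site is exactly what the test will show):

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://www.wrestleview.com/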

Will reply back tomorrow and let you know.

Thanks

With regard to the above URL, I assume the integer 57814 refers to a unique indexed field?

Try extracting 57814 from the URL and searching just on that index:

$url = $_SERVER['REQUEST_URI'];
// hard-coded here for demonstration; normally REQUEST_URI supplies the value
$url = 'misc-news/57814-zayn-vs-joe-on-tonight-s-2-17-wwe-nxt-on-wwe-network';
echo '<br> $url ==> ' . $url;

// strip the leading 'misc-news/' prefix (10 characters)
$idx = substr($url, 10);
echo '<br>$idx ==> ' . $idx;

// intval() keeps only the leading digits, i.e. the article id
$idx = intval($idx);
echo '<br>$idx ==> ' . $idx;

// placeholder table name; in production use a prepared statement
$sql = 'SELECT * FROM your-misc-news-table WHERE idx = ' . $idx;
echo '<br>$sql ==> ' . $sql;

Output:

$url ==> misc-news/57814-zayn-vs-joe-on-tonight-s-2-17-wwe-nxt-on-wwe-network

$idx ==> 57814-zayn-vs-joe-on-tonight-s-2-17-wwe-nxt-on-wwe-network
$idx ==> 57814

$sql ==> SELECT * FROM your-misc-news-table WHERE idx = 57814

Edit:
The search result should be virtually instant and even trebling the number of records will not make your site slower.


I know I’m a bit late to the party here but I’d say 50k articles isn’t that much in the grand scheme of things.

If you say your site is slow I’d be looking at ways to optimise your server, database and content system because even if you remove 49k articles now you’ll get the same problem once the numbers build up again.

You say that after a month articles get viewed less and less, and (unsurprisingly) people don’t access older articles very often. That’s normal, but my question is why do you actually need to keep old articles, especially if nobody has read them in the last 12 months? Older articles that never get read are of no use to your “SEO” at all.


But removing the unique identifier from articles wouldn’t change much, would it?

Google crawls my site several times per day. People link to my articles, Wikipedia being one of many that link to old articles of mine.

With that being said, I get page-not-found error notifications from Google’s webmaster tools if a page is missing. Trust me, I tried it before on old articles and received thousands of “page not found” error messages from Google. This makes me lose points in Google’s ranking algorithm and drops my ranking. I am already paying that price a bit.

Plus I have a large archive of news and information. Some people search for new results on Google, find old ones instead, and still click through. I get thousands of clicks on old articles via search engines daily.


Yes

However, you could create a script that runs through and hits those pages. The typical term for it is a “cache warmer”.
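
A minimal cache warmer in PHP could look something like this (a sketch only; the URL list file, here hypothetically called article-urls.txt, is an assumption):

<?php
// Minimal cache-warmer sketch: request every article URL once so the
// cache (or static-file layer) has a copy ready before real visitors arrive.
$urls = file('article-urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

foreach ($urls as $url) {
    file_get_contents($url);   // the response can be discarded; the request warms the cache
    usleep(100000);            // pause 0.1s so the warmer doesn't hammer the server
}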


@paulwv

But removing the unique identifier from articles wouldn’t change much, would it?

Using the above script in a site that is not Joomla-driven will make the page result return virtually instantly, as long as there is an index on the search field.
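
If the field is not indexed yet, adding one is a single statement (the table and column names here are placeholders):

CREATE INDEX article_idx ON your_misc_news_table (idx);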

I don’t know how Joomla works or how your site is designed as far as indexing web-pages is concerned. To analyse the web-page performance I tried using Pingdom and was amazed that a page that does not appear to have a great deal of content, images, or adverts requires 498 HTTP requests, with a web-page size of 1.1 MB.

Pingdom Site Speed Tests

There is something drastically wrong, because I am sure the majority of those HTTP requests are not needed.

It can be a very difficult and high-risk task to optimize open-source CMSs such as Joomla. Fronting the website with a cache layer is much easier and carries a lower risk of breaking a complex system like Joomla. I’m sure the CSS and JavaScript aren’t aggregated, but making them aggregated and compressed can be an exercise in futility in a system as complex as Joomla. That is not to say one shouldn’t investigate the options to properly optimize the site, given the proper budget and time. Joomla is also withering away as a relevant project, which makes it difficult to find such extensions in the community.


I do like your “withering away” quote and agree with all the points raised.

I would be tempted to transfer the data to a Simple PHP Framework or CMS.

I never liked Joomla and far prefer PHP frameworks, where more control and tweaking can be applied to optimize web-pages.

Historical information is very useful if and only if searching is very easy.

A simple CMS always comes at the cost of the powerful editing capabilities of packages like Joomla. Not to mention the cost of migrating to a new platform is always high. A lot of people who use platforms like Joomla don’t want to migrate to a new one because they are so familiar with it, and I’m talking about the clients, not the developers. There are costs associated not only with building it but with teaching it as well.


You may not like them, but it is what many clients want. Open-source CMSs offer a considerable amount of power and flexibility at a fraction of the cost it would take to build an even somewhat similar custom system. In many respects it just makes more sense to use a well-known open-source CMS than to build one custom. I think in many cases it is the less selfish thing to do, myself. Projects that are likely to touch the hands of many developers are best off using things that have large community backing and thorough resources, including documentation. At least then the next person on the project has a place to start, versus custom solutions, which never have any documentation worth a damn in my experience.

I just inherited a CodeIgniter 1.7 project built 5 years ago by an Indian team, with not one lick of documentation besides some Word docs written right before the last guy, who maintained it for a couple of years, left. CodeIgniter sucks and the documentation is deplorable, but at least it is a starting point. My experience hasn’t been good following developers who have time to thoroughly document software and processes. It is always a jungle apart from any documentation related to open-source software they might have used.

