50,000 articles in Joomla - Need to split the database

I have over 50,000 articles in my database (Joomla). It’s really slowing things down. I need to archive 49,000 articles and leave the last 1,000 on my current server.

I cannot delete those 49K articles because they are reachable via search engines, and deleting them would ruin my SEO. Those articles are still accessed several times a day, but not as much as the rest of the website (the last 1,000 articles), since we publish daily news, about 50 articles a day. Within a month, articles get viewed less and less, and people tend to view the current ones. It’s kind of like CNN.com: you read the daily news and don’t really go back to older articles.

Any idea how to split that database, put it on a different server, and still have the URLs forwarded?

I hope I didn’t confuse anyone.

Any help would be appreciated.

Thanks in advance

Hi paulwv, welcome to the forum

Rather than expand to more databases, might it not make more sense to convert the older pages into static files?

I don’t imagine the content would be changing at this point, and other than design changes not being applied I don’t see where it should be a big problem to visitors.

Hi,

Thank you for responding.

By converting them to static files, would that mean that the 49K articles won’t be called from the CMS to the database? They would be in a folder and every file would have a .php extension?

Please explain.

Thank you

When SitePoint changed from vBulletin to Discourse, rather than importing everything, it was decided to “archive” the “older” inactive threads.
The archived pages still had the vBulletin design, and a lot of the links would no longer work, but the pages were essentially static HTML pages.

I was hoping to provide an example, but it seems all of the links to old content now go to a generic “we’ve moved” page with links that redirect to the Discourse forum.

https://www.google.com/search?q=site%3Awww.sitepoint.com%2Fforums

Anyway, it essentially “freezes” dynamic pages into static pages

Maybe a better option would be to cache pages that aren’t being updated?

Here is an example of a record from a database that has been archived.

Is it possible to add php code to html page w/o converting the page to php? - #10 by John_Betong

That approach is going to be problematic if/when the site is redesigned or features are added to the global layout which are expected to persist to those articles. A simple example would be adding a menu link. If the client by chance changes their mind, then you would have to crawl all the articles manually and add the customization on each static page.

Clients always change their mind. What is an ok sacrifice one second is not an ok sacrifice the next. So if you were to go the static route, I would still leave all those articles intact in the database so they can be regenerated when content is added to the global layout.

I don’t know a whole lot about Joomla but does Joomla have some type of internal caching you can turn on? If all those articles are the same for every user full page caching is most definitely an option.

Regardless of whether they are HTML or PHP pages, I am afraid it would hurt my SEO.

If turning them static won’t hurt SEO, then it’s simple. I will create the old-fashioned SSI tags inside each article. So whenever I need to update header and footer links, all I have to do is change two pages and the change will apply to all of them.

But the question would be, by switching to static, would it hurt my SEO?
Also, how do I switch 49,000 articles to static?

Like I said, I’ve never used Joomla, but I have used Drupal and Magento. In both platforms there are ways to turn on different types of caching, including full page. I would think something similar exists in Joomla or via an add-on. Search Google for caching pages in Joomla. That is where I would start.

Caching is not the issue. I have caching on and have tried plugins… URL forwarding would still be required.

I will look up how to switch articles to static pages. I have an instance with WordPress also. I converted all 50k+ articles to WordPress too, to see which is easier.

It comes down to the same solution.

Check the supplied URL: if it is lowercase and without any extension, the cached page is selected by the .htaccess file.

Any variation of case, or adding any extension, is not an exact match and results in the page being regenerated.
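
To make the matching rule concrete, here is a rough sketch in Python (not the actual .htaccess rules, which are not shown in this thread). The `cache` folder name, the `.html` suffix and the way slashes are flattened into underscores are all assumptions made for illustration.

```python
from pathlib import Path, PurePosixPath
from typing import Optional
from urllib.parse import urlsplit

CACHE_DIR = Path("cache")  # hypothetical cache folder name


def cache_path_for(url: str, cache_dir: Path = CACHE_DIR) -> Optional[Path]:
    """Return the cache file a URL maps to, or None if it is not an exact match."""
    path = urlsplit(url).path.strip("/")
    if not path:
        return None  # the home page is left to the CMS in this sketch
    if path != path.lower():
        return None  # any variation of case is not an exact match
    if PurePosixPath(path).suffix:
        return None  # any extension (.php, .html, ...) is not an exact match
    # Assumed naming scheme: flatten the URL path into a single file name.
    return cache_dir / (path.replace("/", "_") + ".html")
```

A URL such as `https://example.com/news/big-match` would map to `cache/news_big-match.html`, while `/News/Big-Match` or `/news/big-match.php` would fall through to the CMS.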

Check the last Debug line which shows {elapsed_time}, {memory_usage} and time the file was cached.

Deleting the cached file generates a replacement with the latest changes.

If the static content is identical to the dynamic content then the chances are the page will load a lot faster and benefit the SEO.

I have long forgotten how Joomla displays final content; with a bit of luck the page contents can be easily saved.

Failing that, after the page has rendered, use PHP file_get_contents(…) and save the result to your cached folder if and only if the cached file does not exist :smile:

Edit
Actually, if the web-page does exist in the cached folder, then the .htaccess should render the page and not call PHP, MySQL or Joomla. This is what makes my pages fast.
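
The "save only if the cached file does not exist" step could look roughly like this. It is a sketch in Python rather than PHP, and `fetch_html` and `save_if_missing` are hypothetical helper names, not part of Joomla or of the setup described above.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def fetch_html(url: str) -> bytes:
    """Download the fully rendered page, as a browser would receive it."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def save_if_missing(
    url: str,
    cache_file: Path,
    fetch: Callable[[str], bytes] = fetch_html,
) -> bool:
    """Save the rendered page to cache_file only if it is not cached yet.

    Returns True if a new file was written, False if one already existed.
    """
    if cache_file.exists():
        return False  # already cached: do not touch the CMS or the database
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so a half-written page is never served.
    tmp = cache_file.with_name(cache_file.name + ".tmp")
    tmp.write_bytes(fetch(url))
    tmp.replace(cache_file)
    return True
```

Writing to a temporary file and renaming it is a design choice of this sketch, so that a visitor never gets a partially written page.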

There are a multitude of caching levels. I’m referring to caching the whole page, which is effectively the same thing as generating static pages. The static page would be served up until the page expires via an edit or update of its content. Furthermore, there are various services such as Varnish which can be placed in front of your application and effectively do the same thing, with a boatload of options for handling dynamic content.
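
As a rough illustration of "served until it expires via an edit", the freshness check can be as simple as comparing the cached file's timestamp with the article's last-modified time. This is a Python sketch only; `is_cache_fresh` is a hypothetical name, and it assumes the CMS can supply a timezone-aware modification time for each article.

```python
from datetime import datetime, timezone
from pathlib import Path


def is_cache_fresh(cache_file: Path, article_modified: datetime) -> bool:
    """True if the cached page was written after the article's last edit.

    article_modified must be timezone-aware (assumed to come from the CMS).
    """
    if not cache_file.exists():
        return False  # nothing cached yet, so the page has to be generated
    cached_at = datetime.fromtimestamp(cache_file.stat().st_mtime, tz=timezone.utc)
    return cached_at >= article_modified
```

When this returns False, the page would be regenerated and the cached copy replaced.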

If 49,000 pages are cached, don’t they need to be visited once for them to be cached?

Also, if I decide to change ad banners on those pages, I would need to modify the SSI tags (if I use static)…

Plus I need to maintain the same URLs.

If the pages are dynamically cached, then each page has to be visited once to generate the static page. The .htaccess file checks if the URL is in the static cache folder, and if it is, then the static page is rendered.

Deleting the cache folder contents will start generating new static pages with the new banners, etc.
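
Since every page needs one visit before it is cached, the "delete, then regenerate" cycle could be scripted instead of waiting for visitors. A minimal Python sketch, assuming the cached files end in `.html` and that you have a list of article URLs (for example from a sitemap); `purge_cache` and `warm_cache` are hypothetical names.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def purge_cache(cache_dir: Path) -> int:
    """Delete every cached .html file and return how many were removed."""
    removed = 0
    for cached in cache_dir.glob("*.html"):
        cached.unlink()
        removed += 1
    return removed


def warm_cache(urls: Iterable[str], visit: Callable[[str], None]) -> List[str]:
    """Visit each URL once so its static page is regenerated.

    Returns the URLs that could not be visited so they can be retried.
    """
    failed = []
    for url in urls:
        try:
            visit(url)
        except OSError:  # urllib's URLError is a subclass of OSError
            failed.append(url)
    return failed
```

The `visit` argument can be any function that requests the URL, for instance one built on `urllib.request.urlopen`; with 49,000 pages it would be sensible to add a short pause between requests.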

Thanks John.

Now, how is that done? What do I use to convert to static? HTTrack? What about the URL redirects?

The URLs are checked in the .htaccess file and if a matching web-page exists it will be rendered.

Here are my cache folder details, which are dynamically generated and show the elapsed time since the cache folder contents were deleted:

Summation of Cache Folder Contents

@paulwv

“Thanks John.
Now, how is that done? What do I use to convert to static? HTTrack? What about the URL redirects?”

Did you follow the link in my previous post #5?

Can you supply a link to one of your pages and I will try it on my server?

Meanwhile search for “htaccess file exists redirection” and learn how to modify your .htaccess file.

Try creating a cache folder and save a static file with the URL name. Then call the URL from your browser and see if the static page is displayed.

Also search for how to save a Joomla web-page to a static folder.

Edit:
Spelling - not my fortay :frowning:

The website is www.wrestleview.com (click on any link on the newsboard in the middle); that’s how the URLs are.

Try these links:

Link One

Link Two

Link Three