Disallow crawling of anything with a query string

MediaWiki is unusual among CMSs in that its “management” pages (special pages, edit forms, history views, and so on) are not hidden from any visitor. This can cause UX problems as well as information-security concerns.

For me it also causes an SEO problem: hundreds, thousands, or even tens of thousands of such pages get crawled (they will be crawled even if never indexed), and that wastes crawl budget on my already weakly crawled website.


My MediaWiki website runs core only (no extensions).
I expect my readers to read only article pages and category pages.
I don’t recall ever seeing an article or category page URL that contains URL parameters.

Since management pages tend to have URL parameters, I thought of adding the following directive to robots.txt:

Disallow: /index.php?

Is that a problem?
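For reference, the full robots.txt I have in mind is just that one rule plus the defaults (a sketch; the comments reflect my assumptions about a default short-URL setup, where articles live under /wiki/ and resources under /load.php):

```txt
User-agent: *
# Block every index.php URL that carries a query string
# (edit forms, history views, special-page actions, and so on).
# Article pages under /wiki/ have no query string, so they stay crawlable.
# Note: this does not block /load.php, which serves the CSS/JS
# that Google needs to render pages.
Disallow: /index.php?
```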

Hi,

It seems you already got an answer here: https://www.mediawiki.org/wiki/Topic:Wrrku3zw2vsoboyh

Further to what the person in that thread said, here are a couple more points to be aware of:
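One thing you can do before deploying the rule is sanity-check it with Python's built-in robots.txt parser. The example paths below are hypothetical, and note that `urllib.robotparser` treats rules as plain prefix matches, so this only approximates Google's matching of the trailing `?`:

```python
from urllib import robotparser

# A minimal robots.txt containing just the rule from the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php?
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

def crawlable(path):
    """Return True if the rule set allows fetching the given path."""
    return rp.can_fetch("*", "https://example.org" + path)

# Management-style URLs carry query strings and are blocked:
print(crawlable("/index.php?title=Special:RecentChanges"))  # False
# Short-URL article pages have no query string and stay crawlable:
print(crawlable("/wiki/Main_Page"))  # True
```

If both checks behave as expected, the rule does what you want for the URL shapes you listed.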

Just out of interest, are you forced to use MediaWiki CMS? I have seen numerous other threads of yours and you seem to be spending a disproportionate amount of time and energy fighting the thing.


Hello James! I am indeed committed to MediaWiki. :) I just want Google to crawl only certain web pages on my website, as MediaWiki normally allows Google to crawl numerous pages that are irrelevant in my particular case.
