Disallow crawling of anything with a query string

MediaWiki is unusual among CMSs in that its “management” pages (special pages, edit forms, history views, and so on) are not hidden from any visitor. This can cause UX problems as well as information-security concerns.

For me it also causes an SEO problem: hundreds, thousands, or even tens of thousands of such pages get crawled (they will be crawled even if never indexed), and that wastes crawl budget on my already weakly crawled website.


My MediaWiki website runs core only (no extensions).
I expect my readers to read only article pages and category pages.
I don’t recall ever seeing an article or category page URL that contains URL parameters.

Since management pages tend to have URL parameters, I thought of adding the following directive to robots.txt:

Disallow: /index.php?

Is that a problem?
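For reference, the full robots.txt I have in mind is just that one rule plus the defaults (a sketch; the comments reflect my assumptions about a default short-URL setup, where articles live under /wiki/ and resources under /load.php):

```txt
User-agent: *
# Block every index.php URL that carries a query string
# (edit forms, history views, special-page actions, and so on).
# Article pages under /wiki/ have no query string, so they stay crawlable.
# Note: this does not block /load.php, which serves the CSS/JS
# that Google needs to render pages.
Disallow: /index.php?
```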

Hi,

It seems you already got an answer here: https://www.mediawiki.org/wiki/Topic:Wrrku3zw2vsoboyh

Further to what the person in that thread said, here are a couple more points to be aware of:
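One thing you can do before deploying the rule is sanity-check it with Python's built-in robots.txt parser. The example paths below are hypothetical, and note that `urllib.robotparser` treats rules as plain prefix matches, so this only approximates Google's matching of the trailing `?`:

```python
from urllib import robotparser

# A minimal robots.txt containing just the rule from the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php?
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

def crawlable(path):
    """Return True if the rule set allows fetching the given path."""
    return rp.can_fetch("*", "https://example.org" + path)

# Management-style URLs carry query strings and are blocked:
print(crawlable("/index.php?title=Special:RecentChanges"))  # False
# Short-URL article pages have no query string and stay crawlable:
print(crawlable("/wiki/Main_Page"))  # True
```

If both checks behave as expected, the rule does what you want for the URL shapes you listed.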

Just out of interest, are you forced to use MediaWiki CMS? I have seen numerous other threads of yours and you seem to be spending a disproportionate amount of time and energy fighting the thing.


Hello James! I am indeed committed to MediaWiki. :) I just want Google to crawl only certain web pages on my website, as MediaWiki normally allows Google to crawl numerous pages that are irrelevant in my particular case.
