For Google bots, how do I block pages from being checked?

I’m not sure this is the correct place to post this; if it isn’t, please move it to the proper category.

I got some immense help revamping my old website to the new CSS format.

It’s still a work in progress; I’m adding more galleries, etc.
Now I have some questions.
Two links from my home page go to my old websites, which are 20+ years old:
http://www.moskovita-photography.com/stock_photography.htm
They have hundreds of pics and old links, but they tell stories nonetheless.
And this one:
https://moskovita-photography.com/Jacks__Blog.htm
I’d like to block those from being scanned by Google.

When I ran this check from www.semrush.com with my main home page:
https://moskovita-photography.com/
It said these were a major problem:

So you can see why I’d like to have those blocked from Google’s bots: it would take way too long to fix all of them. Can I?
I don’t want this to happen:

Please take action and start fixing these issues to prevent search engines from ignoring or even punishing you.

Thanks in advance.

Hi,
Have you read up on the “robots.txt” option?
https://duckduckgo.com/?t=palemoon&q=robot.txt

Google introduction:
https://support.google.com/webmasters/answer/6062608?hl=en
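For reference, a robots.txt blocking the two pages from the first post might look like the `ROBOTS_TXT` text below (the exact rules are an assumption, not something taken from the site). As a minimal sketch, Python’s standard `urllib.robotparser` can check offline which URLs such a file would keep crawlers away from. Keep in mind that robots.txt lives at the root of each host, so `www.moskovita-photography.com` and `moskovita-photography.com` may each need their own copy if both serve pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served from the site root (/robots.txt).
# The Disallow paths are the two pages mentioned in the first post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stock_photography.htm
Disallow: /Jacks__Blog.htm
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in (
        "https://moskovita-photography.com/",
        "https://moskovita-photography.com/stock_photography.htm",
        "https://moskovita-photography.com/Jacks__Blog.htm",
    ):
        # can_fetch() answers: may this user agent crawl this URL?
        print(url, rp.can_fetch("Googlebot", url))
```

Running it should print `True` for the home page and `False` for the two disallowed pages. This only models crawling rules; it says nothing about whether Google will still index a disallowed URL it finds linked elsewhere.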


Come to think of it, the robots meta tag might be what you could use:
https://duckduckgo.com/?q=robots+meta+tag

Google Dev:
https://developers.google.com/search/reference/robots_meta_tag


It didn’t come out the way I posted it???
So I posted a JPG of it.

Yes. Google warns that robots.txt may not prevent indexing and it is safer to use the meta tag. From the Google support link above:

A robotted page can still be indexed if linked to from other sites
While Google won’t crawl or index the content blocked by robots.txt, we might still find and index a disallowed URL if it is linked from other places on the web. As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the page can still appear in Google search results. To properly prevent your URL from appearing in Google Search results, you should password-protect the files on your server or use the noindex meta tag or response header (or remove the page entirely).

However, I would question the wisdom of removing pages from search because a third-party site says there are “issues”. Do you have reason to believe those “issues” are causing problems with Google? What does Google Search Console say? (I can’t access your Semrush links, so I have no clue as to what the issues are.)
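The “response header” option Google mentions is the `X-Robots-Tag` HTTP header, which is also the way to mark non-HTML files such as images or PDFs, where a meta tag can’t go. On ordinary hosting it would be set in the web server’s configuration; purely to show the mechanics, here is a minimal sketch of a Python WSGI middleware that adds `X-Robots-Tag: noindex` to responses for a hypothetical set of paths (`noindex_middleware`, `NOINDEX_PATHS` and `demo_app` are made-up names).

```python
from typing import Callable, Iterable

# Hypothetical paths that should carry a noindex header.
NOINDEX_PATHS = frozenset({"/stock_photography.htm", "/Jacks__Blog.htm"})


def noindex_middleware(app: Callable, paths: Iterable[str] = NOINDEX_PATHS) -> Callable:
    """Wrap a WSGI app so responses for the given paths get 'X-Robots-Tag: noindex'."""
    noindex_paths = frozenset(paths)

    def wrapped(environ, start_response):
        def start_with_header(status, headers, exc_info=None):
            if environ.get("PATH_INFO", "") in noindex_paths:
                # Add the header; all existing headers pass through unchanged.
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in for whatever actually serves the pages."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<p>Old page</p>"]


if __name__ == "__main__":
    sent = []
    app = noindex_middleware(demo_app)
    app({"PATH_INFO": "/stock_photography.htm"},
        lambda status, headers, exc_info=None: sent.append(headers))
    print(sent[0])
```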


137 images, 144 issues

14 images, 33 issues

I don’t know exactly what the errors are, but I can guess what it’s complaining about, and I’d guess it’s more a matter of tweaks than cutting off your nose to spite your face. (The Nov22 one, at a quick glance, contains many images without an alt attribute, which is probably behind some of those issues.)
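If missing `alt` attributes account for many of the reported issues, it may be quicker to list them than to hide the whole page. A minimal sketch using Python’s standard `html.parser`; the helper name and the sample markup are hypothetical. It only flags an `<img>` with no `alt` at all, since an empty `alt=""` is legitimate for purely decorative images.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that has no alt attribute at all."""

    def __init__(self) -> None:
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> also lands here via the default
        # handle_startendtag implementation.
        if tag == "img":
            names = {name for name, _ in attrs}
            if "alt" not in names:
                self.missing.append(dict(attrs).get("src") or "(no src)")


def find_images_without_alt(html: str) -> list:
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


if __name__ == "__main__":
    sample = '<img src="a.jpg" alt="Sunset"><img src="b.jpg"><img src="c.jpg" alt="">'
    print(find_images_without_alt(sample))  # ['b.jpg']
```

To check a real page, read the saved .htm file into a string and pass it to `find_images_without_alt`.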


If you run this check from www.semrush.com with my main home page:
https://moskovita-photography.com/
you should see the errors.
https://www.semrush.com/siteaudit/campaign/3804001/review/#issue/detail/30

That said, I can probably fix the five links in the “blogs” website,
https://moskovita-photography.com/Jacks__Blog.htm
but it will take forever to fix the ones in my 20+-year-old websites/links/photos,
http://www.moskovita-photography.com/stock_photography.htm
so I don’t want that one scanned by bots.

Meanwhile, is this the correct way to do it?

I’m sorry, but I’m not signing up to Semrush to check the site.

If you want to use the robots meta tag, then you add it in the head of every page which you wish to exclude from indexing.


So putting that in the index page is wrong?


Put it in the webpage(s) I don’t want indexed instead?

A page’s meta tags apply to that page only.


Yes.

Placing it in your home/index page will only prevent that page being indexed.

https://support.google.com/webmasters/answer/93710


So I only need to put this:

<meta name="robots" content="noindex">

in the pages/website I don't want indexed then?

From the Google link I posted above:

<meta> tag

To prevent most search engine web crawlers from indexing a page on your site, place the following meta tag into the <head> section of your page:

<meta name="robots" content="noindex">

To prevent only Google web crawlers from indexing a page:

<meta name="googlebot" content="noindex">

You should be aware that some search engine web crawlers might interpret the noindex directive differently. As a result, it is possible that your page might still appear in results from other search engines.


Like this then?


That is my old website that has hundreds of photos/links, etc. I don’t have the time to fix, so I don’t want it indexed. I also want to do the same for the “blogs” website until I can fix that one.

Should I do this instead on that page with all those links?

Additionally, if you want search engines to both de-index your web page and not follow the links on that page (such as in the case of thank-you pages where you do not want search engines to index the link to your offer), use the “noindex” with the “nofollow” meta tag:

<meta name="robots" content="noindex,nofollow">

That’s entirely up to you. If you don’t want that page to be indexed use noindex. If you don’t want bots to follow links in that page use nofollow.
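To double-check what a page actually tells crawlers after editing it, here is a minimal sketch that reads the `robots` and `googlebot` meta tags out of a page’s HTML and collects their directives (Google treats the names and values case-insensitively, so they’re lowercased here). `robots_directives` is a hypothetical helper; it ignores robots.txt and the `X-Robots-Tag` header.

```python
from html.parser import HTMLParser


class RobotsMetaReader(HTMLParser):
    """Gather directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = {}  # meta name -> set of directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").strip().lower()
        if name in ("robots", "googlebot"):
            content = attrs.get("content") or ""
            # Directives are comma-separated, e.g. "noindex,nofollow".
            found = {d.strip().lower() for d in content.split(",") if d.strip()}
            self.directives.setdefault(name, set()).update(found)


def robots_directives(html: str) -> dict:
    reader = RobotsMetaReader()
    reader.feed(html)
    reader.close()
    return reader.directives


if __name__ == "__main__":
    page = '<head><meta charset="utf-8"><meta name="robots" content="noindex,nofollow"></head>'
    result = robots_directives(page)
    print({name: sorted(values) for name, values in result.items()})
    # {'robots': ['nofollow', 'noindex']}
```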


I added
meta name=”robots” content=”noindex,nofollow”
(I cut the <> from the above; otherwise you can’t see it.)
to my blog and stock photography pages, and when I checked the live source,
the blog one was okay, but the stock photography one had other symbols in it???
meta name=”robots” content=”noindex,nofollow”
(I cut the <> from the above; otherwise you can’t see it.)

I pasted exactly the same thing into both… what gives?

You need to put a backtick at either end of the tag in order to display the code inline, or three backticks on a line above and below the code. Or highlight your code and use the </> button in the editor window.

You haven’t shown both pages, but the one above doesn’t seem to have a character encoding declared. You should have that at the top of every page, e.g.:

<meta charset="utf-8">
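Why do only some characters go wrong? Plain ASCII (including straight quotes) reads the same in almost any encoding, but multi-byte UTF-8 characters such as curly quotes don’t. If a page doesn’t declare its encoding, the browser has to guess, and a Windows-1252 guess turns each curly quote into a few odd symbols. A small Python sketch of that effect; Windows-1252 is an assumption about which fallback the browser used, and `misread_as_cp1252` is a hypothetical helper.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Windows-1252,
    mimicking a browser that guessed the wrong encoding."""
    # errors="replace" because a few bytes (e.g. 0x9D) are unassigned in
    # Python's cp1252 codec; browsers map them to control characters instead.
    return text.encode("utf-8").decode("cp1252", errors="replace")


if __name__ == "__main__":
    print(misread_as_cp1252("“robots”"))   # curly quotes come out garbled
    print(misread_as_cp1252('"robots"'))   # plain ASCII quotes survive unchanged
```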

Ah… that was it, since it’s a 20+-year-old website.
Fixed the
<meta name="robots" content="noindex,nofollow">
problem
As for the backticks, I learned something new. Thanks!
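One more thing worth checking: since straight ASCII quotes can’t be garbled by a missing charset, the odd symbols suggest the page source contained curly quotes (”), which often happens when a tag is copied from a blog post or word processor. HTML only treats straight `"` and `'` as attribute delimiters, so curly ones become part of the value, and `name=”robots”` no longer reads as `robots`. A minimal sketch with Python’s `html.parser` (browsers follow the same rule for unquoted values); `meta_attrs` is a hypothetical helper.

```python
from html.parser import HTMLParser


class MetaAttrs(HTMLParser):
    """Record the attributes of every <meta> tag exactly as parsed."""

    def __init__(self) -> None:
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            self.metas.append(dict(attrs))


def meta_attrs(html: str) -> list:
    parser = MetaAttrs()
    parser.feed(html)
    parser.close()
    return parser.metas


if __name__ == "__main__":
    curly = "<meta name=”robots” content=”noindex,nofollow”>"
    straight = '<meta name="robots" content="noindex,nofollow">'
    print(meta_attrs(curly))     # name is '”robots”', curly quotes included
    print(meta_attrs(straight))  # [{'name': 'robots', 'content': 'noindex,nofollow'}]
```

If the first result shows the curly quotes inside the value, retyping the tag with straight quotes is the fix.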
