Google is complaining that a pure php file does not have title/meta tags and is not mobile friendly.
How do I go about putting a rel=nofollow on the file? Do I have to start with html/head/body/etc tags… and if so do I put that before the php code or after?
The file is in a subfolder that contains only js scripts. I could include the subfolder in the robots file, but I’m afraid that it might also block the main folder: /Main/scripts
Which is of course html, not php, and goes in the <head> of the html.
That depends on what the php is doing.
Though it sounds as if you should block the folder in robots.txt if there is nothing that needs to be indexed in there. The robots.txt does not forcibly block or deny access to a file or folder, it merely requests that crawlers don’t go in and index the things there, and “nice” crawlers will comply. This means that the files there will still be accessible if needed, they just won’t be indexed.
This also begs the question, how is Google finding this php file? There must be a link to it somewhere for the Googlebot to land there. Wherever that link is, it should be nofollow.
Yes, most pages have a currency widget that needs this php file. The file has no html whatsoever: pure php
So, maybe placing the file alone in the robots.txt file. But can I simply enter it there as /maindirectory/subdirectory/phpfile.php without upsetting (hiding) any other files in the main directory and its subfolder?
That should target just that specific file. Though I have not used robots.txt before to disallow specific files. I generally put files I don’t want crawled all in one directory and Disallow the whole directory.
Remember, robots.txt does not stop the files from being accessed, it just stops them being crawled and indexed.
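To check the "just that specific file" claim before deploying it, here is a minimal sketch using Python's standard `urllib.robotparser`. The rule uses the placeholder path from the question, and `other.js` and `index.html` are made-up neighbouring files, so treat all of the paths as assumptions rather than the real site layout.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the path is the placeholder from the question above,
# not a real site layout.
ROBOTS_TXT = """\
User-agent: *
Disallow: /maindirectory/subdirectory/phpfile.php
"""


def is_allowed(path: str, agent: str = "Googlebot") -> bool:
    """Return True if a robots.txt-compliant crawler may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # other.js and index.html are invented names, used only for illustration.
    for path in (
        "/maindirectory/subdirectory/phpfile.php",
        "/maindirectory/subdirectory/other.js",
        "/maindirectory/index.html",
    ):
        print(path, "allowed" if is_allowed(path) else "disallowed")
```

Disallow rules are prefix matches, so this rule would also cover any URL that begins with the same path (for example the same file with a query string), but not its siblings or the parent folder.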
I guess robots.txt could be used as a “don’t bother with it”
But if you are concerned about other files you do want the bots to find maybe you could use header() to indicate that the file is not text/html (the usual default content-type)?
That would be concerning me. If Googlebot has found it, so can others - and only “good” bots respect robots.txt. So in addition to the other steps, I’d be looking to see how this happened in the first place, and fix it.
Yes, I can’t imagine a scenario where you would have a hyperlink to a pure php script like this. I would expect a script of that type to be referenced as an include or cron job or similar.
It looks as if you could solve your Google problems by excluding the /scripts/ directory in robots.txt.
However, I can access that PHP file by direct URL, and my understanding is that that’s a security risk, so you might also want to look into fixing that.
Sorry - I was in a hurry when I looked at the file. I thought it was displaying the unparsed PHP, but now I look again, it appears to be JS. Which seems like an odd way to do things, but not my area of expertise, so if it works…
You brought up an interesting point. I thought that it was not possible to see php files (at least with “View Page Source”), but as you wrote, I can see the whole file through its URL…
So the script is php and js. What we are seeing is the js output which has been modified by the php.
I suppose it is not unusual for raw js code to be visible.
So the solution would be:
I don’t think there is any need for crawlers to go in and index the things in there, or report on what tags are/aren’t present.
Just set the content type to match what the file is actually outputting,
i.e. JSON, XML, plain text, etc., using the appropriate header field.
But as others have said, if the file is in a “scripts” folder that doesn’t contain anything you do want the bots to bother with, adding that folder to robots.txt should be fine.