Google custom search not finding files (pdf, docx etc)

I have set up a Google Custom Search (free version) for a website that contains three pages of links to documents. So far these documents are not found when searching with the preview provided. I believe it is caused by the indexing of these pages and/or their priority in the sitemap.

Does anyone have a suggestion about anything I may be missing, or the likely cause of the problem?

Thanks in advance.

If you go to google.com and do a restricted search for “site:example.com filetype:pdf”, do you get any results?

No results from the search. The pages are showing in the sitemap.

Are you sure you haven’t got any nofollows anywhere, and nothing untoward in robots.txt?

I haven’t added any nofollows (and can’t find any). Robots.txt looks OK. How often does Google check the robots.txt?

All the pages are in the sitemap as seen through Google Custom Search, and the sitemap is indexed. Text on the downloads page is not found, while text on other pages at the same level is found. The three pages (below the downloads page) that contain links to the files are not found when searching for text they contain.

Thank you very much for your help, Stevie. I keep thinking it must be something obvious that I’m missing.

My robots.txt is below.

User-agent: *
Disallow: /admin
Disallow: /dev
Disallow: /?flush
Allow: /members-login/downloads

User-Agent: Googlebot-Image
Disallow: /admin
Disallow: /dev
Disallow: /?flush
Allow: /

I’m concerned by /?flush. I’m not the world’s expert on robots.txt but I’ve never seen that format before, and I’m not sure I understand what you’re trying to achieve with it. Also note that the specs always include the trailing slash, eg /admin/, I don’t know whether that makes a difference or not but I would change it just in case. And finally, there is no such command as Allow, so you can delete those lines.
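To show what I mean, the first block would look something like this (I’ve left your /?flush line alone since I’m not sure what it’s meant to do):

User-agent: *
Disallow: /admin/
Disallow: /dev/
Disallow: /?flush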

The website uses the SilverStripe CMS, and /?flush prevents search engines from flushing the cache - according to the SilverStripe site (I’m no expert in robots.txt either). I looked up “Allow” - it appears to be valid, but search engines may choose to ignore it. I will add the trailing slashes.

Google happily finds PDF and other files in another SilverStripe site I manage, so the CMS shouldn’t be the issue.

I’ll create a ‘new’ page by renaming one of the troublesome ones and see if Google finds it any more palatable!

Thanks for your interest.

Apparently, it exists as a non-standard directive, so you shouldn’t rely on bots recognising it or following it, although it seems Google and Bing do. Personally, I’d stick to the accepted conventions.

To answer your question, Google checks the robots.txt file every time its bots visit your site. You can test your file for Google’s bots, to ensure it’s doing what you expect: https://support.google.com/webmasters/answer/156449?rd=1
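If you want a quick local sanity check as well, Python’s standard urllib.robotparser can tell you whether a given user-agent is allowed to fetch a URL under your robots.txt. It doesn’t reproduce every quirk of Google’s own matching, and the domain and file paths below are placeholders - swap in your real site and one of the actual PDF URLs:

from urllib.robotparser import RobotFileParser

# Placeholder domain and paths - replace with your own site,
# the downloads page and one of the linked PDF files.
rp = RobotFileParser("https://www.example.com/robots.txt")
rp.read()

for url in [
    "https://www.example.com/members-login/downloads",
    "https://www.example.com/members-login/downloads/example.pdf",
]:
    verdict = "allowed" if rp.can_fetch("Googlebot", url) else "BLOCKED"
    print(url, "->", verdict)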

You may need to use the registered version; then you will be able to do it.

Check up with Google’s webmaster tools and see how Google’s bot is behaving on your site. Let us know how it goes.

I have tried to find comparisons of the two versions of the search engine. I found this link: https://support.google.com/webmasters/answer/35287?hl=en. I have another site in which the PDFs are displayed in the Google search results (that site does not have a custom search).

I don’t see any issues in Webmaster Tools (none that I recognise, anyway). The pages containing the links are in the sitemap, but searching for ordinary text on those pages finds nothing. I used Fetch as Google to make sure the pages are crawled. One of these pages did seem to appear and disappear from the sitemap.
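For reference, a short script along these lines will list which URLs in the sitemap mention the download pages - the sitemap location and the “download” filter are guesses on my part, so adjust them to suit:

from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Assumed sitemap location - SilverStripe may publish a sitemap index
# instead, in which case follow its child <loc> entries first.
SITEMAP = "https://www.example.com/sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

tree = ET.parse(urlopen(SITEMAP))
urls = [loc.text for loc in tree.findall(".//sm:loc", NS)]

print(len(urls), "URLs listed in the sitemap")
for url in urls:
    if "download" in url.lower():
        print(url)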

What specifically in the webmaster tools would you like a report on?

Thanks.

The pages with links are on the third level - could this be the issue?