My developer has uploaded an XML sitemap to our website. We have a lot of pages (approximately 35,000) and Google has only picked up about 2,000 of them.

Google Webmaster Tools does list the pages (slowly), but it's also reporting an error with robots.txt, which contains:

User-agent: *
Allow: *

The sitemap appears to pass all tests for Yahoo and Bing, but apparently not for Google. (Google is the most important search engine for our business, so I need to try to work out why it's reporting this error.)

I don't know how I got it, but I also got an error saying that there was no “style data” associated with the sitemap.

My questions…

  1. Do you have any idea why we would be getting the report above? (There are no typos in the sitemap, etc.)
  2. How important is “style” info in the header? If it matters, what should a “generic” style header look like?

I didn’t think an XML sitemap had any style; it is just a list of URLs.

If Google is indexing them eventually, what is the problem? If they are not all totally unique, they may not all get indexed anyway.

If your sitemap has every page, there must be over 175,000 lines in the file (at roughly five lines per URL entry); I would expect it to take Google a while to index them all.

As far as I am aware, there is not much you can get wrong with a sitemap and a robots.txt file. They are what they are.
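For what it's worth, the robots.txt quoted in the question has no Disallow lines, so it should not block anything. A rough way to sanity-check that locally is Python's standard-library urllib.robotparser. This is a sketch only: it is not Google's parser, so it won't necessarily reproduce whatever Webmaster Tools flagged, and the www.example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# robots.txt contents as quoted in the question.
ROBOTS_TXT = """\
User-agent: *
Allow: *
"""

def check_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if Python's robots.txt parser lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    # Placeholder URLs; substitute pages from your own site.
    for url in ("https://www.example.com/", "https://www.example.com/some/page"):
        print(url, check_robots(ROBOTS_TXT, "Googlebot", url))
```

Both URLs should come back as allowed; by contrast, a file containing "Disallow: /" under "User-agent: *" makes check_robots return False.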

XML documents can be styled if they are meant for humans, but a sitemap.xml is just for bots, so it needs no styling.

If you view an XML file in a browser you will sometimes get a message along the lines of “This XML file does not appear to have any style information associated with it. The document tree is shown below.”

That’s not an error, and it doesn’t come from Google.


The robots.txt report says there's an error with styling, yet Google is (slowly) listing the pages, which contradicts Google itself! (Can you see why I want to check myself in for special mental treatment?)

Is there anything wrong with what the Google error report is saying? And if so, what is wrong, please?

What does the XML declaration of the file look like?

This is the first item from my XML file:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xsi:schemaLocation="" xmlns:xsi="" xmlns="">
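The attribute values in that snippet are empty, possibly because the URLs were stripped when it was posted. The sitemaps.org protocol expects the default namespace xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" on the urlset element. A minimal Python sketch to confirm that a file parses and uses that namespace; the two sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def has_sitemap_root(xml_bytes: bytes) -> bool:
    """Parse the document (raises ET.ParseError if it is not well-formed)
    and check that the root element is <urlset> in the sitemap namespace."""
    root = ET.fromstring(xml_bytes)
    return root.tag == f"{{{SITEMAP_NS}}}urlset"

GOOD = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
</urlset>"""

NO_NAMESPACE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset>
  <url><loc>https://www.example.com/</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(has_sitemap_root(GOOD))          # True
    print(has_sitemap_root(NO_NAMESPACE))  # False
```

To check your real file, read it in binary mode (open("sitemap.xml", "rb")) and pass the bytes to has_sitemap_root.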

EDIT: I forgot to ask: how long is your sitemap.xml?

It’s worth checking the size of the XML file. I’m not sure if it’s still current, but Google used to have a limit of 50,000 URLs and 10 MB per file.
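A quick way to check both numbers, sketched in Python. The limits are plain constants you can adjust: 50,000 URLs as mentioned above, and 52,428,800 bytes (50 MB uncompressed), which is the figure the current sitemaps.org protocol gives; to stay under the older 10 MB figure, pass a smaller max_bytes.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 52_428_800  # 50 MB uncompressed, per the sitemaps.org protocol

def sitemap_stats(xml_bytes: bytes) -> tuple:
    """Return (number of <url> entries, size in bytes) for a sitemap document."""
    root = ET.fromstring(xml_bytes)
    return len(root.findall(f"{{{SITEMAP_NS}}}url")), len(xml_bytes)

def within_limits(xml_bytes: bytes, max_urls: int = MAX_URLS,
                  max_bytes: int = MAX_BYTES) -> bool:
    """True if the sitemap stays within both the URL-count and byte-size limits."""
    count, size = sitemap_stats(xml_bytes)
    return count <= max_urls and size <= max_bytes

if __name__ == "__main__":
    # Small made-up sitemap with three URLs, for illustration only.
    sample = (
        b'<?xml version="1.0" encoding="UTF-8"?>'
        b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        + b"".join(b"<url><loc>https://www.example.com/p%d</loc></url>" % i
                   for i in range(3))
        + b"</urlset>"
    )
    print(sitemap_stats(sample)[0], within_limits(sample))  # 3 True
```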

For that many URLs, it may be worth splitting the sitemap into 3 or 4 separate files, e.g.:

File 1 - Key pages, typically covering the home page, top-level sections, categories, etc.
File 2 - Article/story pages
File 3 - General page types
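A sketch of that kind of split in Python: it writes one or more urlset files per group, plus a sitemap index (the sitemapindex format from the sitemaps.org protocol) that lists them. The group names, file names and base URL are hypothetical, and each group is additionally cut into pieces of at most 50,000 URLs.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000

def chunk(urls, size):
    """Yield successive lists of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]

def build_urlset(urls) -> bytes:
    """Serialize a <urlset> with one <url><loc> entry per URL."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = u
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)

def build_index(locations) -> bytes:
    """Serialize a <sitemapindex> pointing at each child sitemap."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in locations:
        ET.SubElement(ET.SubElement(root, "sitemap"), "loc").text = loc
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)

def split_sitemap(groups, base_url, max_per_file=MAX_URLS_PER_FILE):
    """groups maps a hypothetical group name (e.g. "key", "articles") to its URLs.
    Returns {filename: xml_bytes}, including "sitemap-index.xml"."""
    files, locations = {}, []
    for group_name, urls in groups.items():
        for n, part in enumerate(chunk(urls, max_per_file), start=1):
            name = f"sitemap-{group_name}-{n}.xml"
            files[name] = build_urlset(part)
            locations.append(f"{base_url}/{name}")
    files["sitemap-index.xml"] = build_index(locations)
    return files

if __name__ == "__main__":
    groups = {
        "key": ["https://www.example.com/"],
        "articles": [f"https://www.example.com/articles/{i}" for i in range(3)],
    }
    for name in split_sitemap(groups, "https://www.example.com"):
        print(name)
```

Each value can then be written out with open(name, "wb"); submitting the index file in Webmaster Tools should then point Google at all of the individual sitemaps.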

Finally, remove any pages you don’t really need indexed, like password retrieval pages.

That sounds like the BOM (Byte Order Mark) gotcha.

Make sure your text editor saves the file “without BOM”.

There are a number of situations where the BOM, particularly because it is invisible, may cause a problem.
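Since the BOM is invisible in most editors, it's easiest to check the raw bytes. A minimal Python sketch that detects and strips a UTF-8 BOM (the bytes EF BB BF) from a file such as sitemap.xml or robots.txt; fix_file is a hypothetical helper name, and it rewrites the file in place, so keep a backup.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM if present; otherwise return data unchanged."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data

def fix_file(path: str) -> bool:
    """Rewrite the file at `path` without its BOM; return True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    if not has_utf8_bom(data):
        return False
    with open(path, "wb") as f:
        f.write(strip_utf8_bom(data))
    return True

if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + b"User-agent: *\nAllow: *\n"
    print(has_utf8_bom(sample))                  # True
    print(has_utf8_bom(strip_utf8_bom(sample)))  # False
```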

