Google Contradicting Results For XML Sitemap

I’m new here, but you look like a really knowledgeable group of people, so I hope you can help me before I turn to psychiatry to sort me out!

My developer has uploaded an XML sitemap to our website. We have lots of pages (approx. 35k) and Google has only picked up about 2k of them.

Google Webmaster Tools “is” listing the pages (slowly), but it’s also coming up with an error with robots.txt:

User-agent: *
Allow: *
Sitemap: https://mysite.com/sitemap.xml

The sitemap appears to pass all tests for Yahoo and Bing, but not Google, it seems. (Google is most important to our business, so I need to try to work out why it’s coming up with this error.)

I don’t know how I got it, but I got an error saying that there was no “style data” associated with the sitemap.

My questions…

  1. Do you have any idea why we would be getting the report above? (There are no typos in the sitemap etc)
  2. How important is “style” info in the header? If it is important, what should a “generic” style header look like?

I “really” appreciate you taking a moment to look at this as we seem to have reached an impasse - the developer says it’s fine, Google doesn’t!

Many thanks
Tim

I didn’t think an XML sitemap had any style; it is just a list of URLs.

If Google is indexing them eventually, what is the problem? If they are not all totally unique they may not all get indexed anyway.

If your sitemap has every page there must be over 175,000 lines in the file; I would think it would take Google a while to index them all.

As far as I am aware, there is not much you can do wrong with a sitemap and a robots.txt file. They are what they are.

XML documents can be styled, if they are for humans, but a sitemap.xml is just for bots so it needs no styling.

If you view an XML file in a browser you will sometimes get a message along the lines of “This XML file does not appear to have any style information associated with it. The document tree is shown below.”

That’s not an error, and it doesn’t come from Google.

Thanks guys.

The robots.txt is saying there’s an error with styling and coming up with this error, but Google is (slowly) listing the pages, which contradicts Google itself! (Can you see why I want to check myself in for special mental treatment?)

Is there anything wrong with the Google error report? And if so, what is wrong please?

Hopefully you won’t be joining me in the madhouse!!! Apologies if this is the case. Not a deliberate act by Russians to put our top people out of action though LOL

What does the XML declaration of the file look like?

Hi Mittinegue,

What does that mean in layman’s language please?

This is the first item from my XML file:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.mysite.com/</loc>
<lastmod>2018-02-05T22:09:00+00:00</lastmod>
<priority>1.00</priority>
</url>

EDIT: I forgot to ask how long is your sitemap.xml?

It’s worth checking the size of the XML file. I’m not sure if it’s current, but Google used to have a limit of 50,000 URLs and 10MB per file.
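As a rough way to check those numbers, here is a minimal Python sketch that counts the `<url>` entries in a sitemap and reports its size in bytes. The function name `sitemap_stats` is made up for illustration, and the limits to compare against should be taken from Google's current documentation rather than from this snippet.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol (same value as the xmlns in the file).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_stats(xml_bytes: bytes) -> tuple:
    """Return (number of <url> entries, size in bytes) for a sitemap document.

    `sitemap_stats` is a hypothetical helper name, not part of any library.
    """
    root = ET.fromstring(xml_bytes)
    # Only direct <url> children of <urlset> are counted.
    url_count = len(root.findall(f"{{{SITEMAP_NS}}}url"))
    return url_count, len(xml_bytes)
```

To use it, read the file in binary mode (for example `open("sitemap.xml", "rb").read()`) and pass the bytes in.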

For that many URLs it may be worth splitting the sitemap into 3 or 4 individual files, e.g.:

File 1 - Key pages, typically covering home, top level sections, categories etc
File 2 - Article/story pages
File 3 - General page types

Finally, remove any pages you don’t really need indexed, like password retrieval pages etc.
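If you do split it up, the sitemap protocol lets you list the individual files in a sitemap index. The sketch below builds one sitemap per group of pages plus an index that points at them. The group names, the `sitemap-<name>.xml` file names, and the helper names are all assumptions for illustration, so adapt them to however your site actually generates its URLs.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>'


def build_urlset(urls):
    """Build the text of one sitemap file from a list of page URLs."""
    lines = [XML_DECL, f'<urlset xmlns="{SITEMAP_NS}">']
    for url in urls:
        # escape() turns &, < and > into XML entities.
        lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def build_index(sitemap_urls):
    """Build the text of a sitemap index that points at other sitemap files."""
    lines = [XML_DECL, f'<sitemapindex xmlns="{SITEMAP_NS}">']
    for url in sitemap_urls:
        lines.append(f"  <sitemap><loc>{escape(url)}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


def build_split_sitemaps(groups, base_url):
    """Return {filename: xml_text} for each group, plus a 'sitemap.xml' index.

    `groups` maps a hypothetical group name (e.g. "key-pages") to its URLs.
    """
    files = {}
    for name, urls in groups.items():
        files[f"sitemap-{name}.xml"] = build_urlset(urls)
    # Collect the index entries before adding the index file itself.
    index_locs = [f"{base_url}/{filename}" for filename in files]
    files["sitemap.xml"] = build_index(index_locs)
    return files
```

Each value in the returned dictionary can be written to disk with `encoding="utf-8"`, and robots.txt can keep pointing at the single `sitemap.xml` index.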

Thanks Rubble.

I’m not sure that we have this preamble on our sitemap. I don’t have the native file, but I’ll send this to my man and (with our URL added) have him load it and see what happens.

This is a minor but potentially significant issue. I REALLY appreciate your help here.

I’ll be happy to contribute toward helping anyone who needs help with project managing a website upgrade. I’ve just been through my ninth and I think I’ve hit most issues LOL

Tim

Thanks bluedreamer.

I think we are within these thresholds. It’s a stupid thing that we are missing, I believe. :slight_smile:
Appreciate the help though
Tim

I’ve sent this info to my developer (clever guy). It will be great if it solves the problem.

THANKS AGAIN for the help

much appreciated
Tim

You might also find these two help articles from Google informative:

THANKS for all your contributions.

We uploaded some styling code but Google hated that! LOL so that’s removed, and we’re back to square one, with the strange error that Google itself seems to create in the XML file, namely putting the space in front of the first character of the first line! There seems to be no way around this. It’s very odd indeed, and not something I’ve come across before.

That sounds like the BOM (Byte Order Mark) gotcha.

Make sure your text editor is saving the file “without BOM”.

There are a number of situations where the BOM, particularly because it is invisible, may cause a problem.
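If you want to check for it rather than guess, here is a small Python sketch that looks for a UTF-8 BOM at the start of a file's bytes and strips it. Whether a BOM is really what is causing the stray character in this case is not confirmed, and the helper names are made up for illustration.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM; other input is returned unchanged."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Read the sitemap in binary mode (`open("sitemap.xml", "rb")`) before passing it in, because reading it as text can hide the BOM, depending on the encoding used.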
