Is there a way to determine which pages of a website are not being indexed by the search engines?
I know Google Webmasters has a sitemap area where it tells you how many urls have been submitted and how many are indexed out of those submitted. However, it doesn’t necessarily show which urls aren’t being indexed.
There isn't a tool that will do this for you, nor is there any easy process. There is, however, a thread on SEOMoz about doing this using multiple tools. It sounds kind of tedious, but it may help you.
Just Google the URL with quotes around it. If it's been indexed, it will return a result.
Go to Google and search for site:yoururl. If the page is indexed, it will appear in the search results.
Google Webmaster Tools also serves the purpose effectively. It shows a bar chart of the number of links submitted and the number of links indexed. Also, as Eric Watson mentioned, Google the URL you think is not indexed; if a result is returned for the query, the link is indexed, otherwise it is not.
One simple way is to copy the URL and search for it on Google. If your URL is indexed, the page is displayed; otherwise it is not indexed.
You can check the cached URLs by searching Google with the following syntax:
site:yoursite (replace yoursite with your website URL and search on Google; the results show the total number of indexed URLs)
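If you have more than a handful of URLs to look up this way, typing each query by hand gets old fast. Here is a minimal Python sketch that only builds the search link for a site: query so you can open it in a browser. The function name site_query_url and the example.com address are my own placeholders, and the script does not fetch or parse any results.

```python
from urllib.parse import urlencode


def site_query_url(target: str) -> str:
    """Return a Google search link for a site: query on `target`."""
    # Drop the scheme and any trailing slash; this sketch assumes the
    # site: operator is given a bare host or host/path.
    bare = target.split("://", 1)[-1].rstrip("/")
    return "https://www.google.com/search?" + urlencode({"q": f"site:{bare}"})


# example.com is a placeholder; substitute your own page or domain.
print(site_query_url("https://example.com/blog/"))
```

Open the printed link in a browser and see whether the page shows up in the results.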
Put some specific words in the meta tags and search for those “words” in a search engine a few days later. Normally, if the page content is unique and not disallowed in robots.txt, it should be indexed; how long it takes depends on your site structure.
What if the site has like a million URLs?
Yes, you just type site:website URL into the Google search box and press Enter; after that you will see the indexed URLs of your website.
As long as your robots.txt file is not preventing pages from being indexed, all of your pages that are being linked to directly or indirectly from the homepage should be indexed. Just because they don’t rank doesn’t mean they are not indexed.
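To rule out the robots.txt part, you can test your URLs against the file's rules with Python's standard urllib.robotparser module. This is only a rough sketch: the helper name blocked_urls, the sample rules, and the example.com URLs are made up for illustration, and it tells you what the rules disallow, not whether a page is actually in the index.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt text disallows for `agent`."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]


# Hypothetical robots.txt content; paste in your own file instead.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(blocked_urls(SAMPLE_ROBOTS, [
    "https://example.com/",
    "https://example.com/private/report.html",
]))
```

Any URL the script prints is blocked by the rules, so that is the first place to look if it is missing from the search results.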
You can search Google with cache:yourwebsite.com
There are two ways to find out how many of your pages are indexed: run site:yourwebsite.com, and check all of your website's inner pages with cache:yourwebsite.com. Cross-check the figure with GWT, where you can see the number of submitted and indexed pages. It's a bit of a time-consuming job.
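The cross-check can be partly scripted once you have a list of URLs you have confirmed as indexed. The sketch below assumes you collected that list yourself (for example from manual site: checks) and that your sitemap uses the standard sitemaps.org format; the function names and sample data are placeholders, not part of any Google tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Collect every <loc> entry from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def not_indexed(sitemap_xml: str, indexed: set[str]) -> list[str]:
    """Return submitted URLs that are missing from the indexed set."""
    return sorted(sitemap_urls(sitemap_xml) - indexed)


# Hypothetical sitemap and hand-collected list of indexed pages.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://example.com/contact</loc></url>
</urlset>"""

CONFIRMED_INDEXED = {"https://example.com/", "https://example.com/about"}

print(not_indexed(SAMPLE_SITEMAP, CONFIRMED_INDEXED))
```

The output is the list of submitted URLs that are not in your confirmed set, which is the piece the submitted/indexed counts alone do not give you.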
I totally agree with you, but you may also use some tools that tell you whether your page is indexed or not.