Searching a whole online website if pages are not indexed?

Hi there,

I am wondering if it is possible to search for a keyword across an entire website when certain pages are not indexed or the site has no search of its own. Would this be possible in any way?

Any thoughts would be great, thanks!

In code, or just from a browser? I could see how it might be possible (but not necessarily easy) to write some code to crawl the site, extract the visible data, and then perform the search. I could also think of a few ways that the site owner could prevent that, or at least make it difficult.

It could be from the browser, from code, or from the command line, I guess.

For example, if I wanted to find the word “business” on the entire BBC website (I know it is huge and a keyword would return a great many results, but it's just an example) without using Google or the BBC's own search, how could I do this? I would then want a list of all the URLs it appears on. So I guess, in a way, Google doesn’t really come into this.

Not sure if that makes sense.

You have two scenarios, one hard, the other harder:

  1. If the site is static, you crawl the site, collecting all links and downloading each page. Then search those pages for your keyword. See cURL to get started.
  2. If the site isn’t static, the search string you’re looking for may be in a database and only appear on the page given the right secret handshake, or every odd Tuesday of the month. If you just process it as in method #1, you may or may not find the keyword you’re looking for. To get all occurrences of the keyword, you would need to be able to provide the secret handshakes and run your crawling application every other Tuesday to catch that search string.
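For scenario #1, the crawl-then-search idea might look roughly like the sketch below. It is only an illustration using Python's standard library, not a finished tool: the names `PageParser`, `fetch`, and `find_keyword` and the `max_pages` limit are made up for this example, and it ignores robots.txt, rate limiting, and any content that only appears after JavaScript runs.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects href values from <a> tags and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []
        self._skip = 0  # depth inside <script>/<style>, whose text is not visible

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text.append(data)


def fetch(url):
    """Download a page and return its HTML as text (assumes UTF-8)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def find_keyword(start_url, keyword, fetch=fetch, max_pages=50):
    """Breadth-first crawl of one host; return URLs whose text contains keyword."""
    host = urlparse(start_url).netloc
    queue, seen, hits = deque([start_url]), {start_url}, []
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        pages += 1
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        parser = PageParser()
        parser.feed(html)
        if keyword.lower() in " ".join(parser.text).lower():
            hits.append(url)
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return hits
```

Calling something like `find_keyword("https://example.com/", "business")` would return the list of matching URLs found within the page limit. Passing your own `fetch` function lets you add headers, delays, or caching without touching the crawl loop.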

Not to be pedantic, but if a page is not indexed, there’s a reason. The site owners have decided either a) it’s not for public consumption or b) it’s no longer relevant or current.

You can do a site-specific search on Google for a term by adding site:example.com to your query.
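If you want to build that kind of query from code, a small sketch could look like this. The helper name `site_search_url` is made up for the example, and it only constructs the URL you would open in a browser; it does not fetch or parse results.

```python
from urllib.parse import urlencode


def site_search_url(domain, term):
    """Build a Google search URL restricted to one domain via the site: operator."""
    query = f"site:{domain} {term}"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("example.com", "business"))
# prints https://www.google.com/search?q=site%3Aexample.com+business
```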

Or, if you really want to go down a rabbit hole, you can look into domain-specific search engines.

But again, if it’s not indexed, it shouldn’t be considered valid…

Search engines only let you find content that is in their index, but content missing from an index may still be there on the site.
As for your question, a better starting point would be other search engines such as DuckDuckGo, Bing, and Yandex, which may have indexed pages that are not listed on Google.