Google continues its supremacy, despite the clamor over emerging search engines attempting to dethrone it. These providers approach search from different perspectives (semantic, social, community-based) and present their results in fresh forms (design, layout, structure) in the hope that their new package will appeal more to the masses. However, it’s apparent that search dominance is as much about branding, popularity, and financial backing as it is about relevance or looks.
The media supplies the momentum, while many web users watch the search engine war with anticipation, keeping faith that one of the new kids will somehow prove Google-worthy. At first sight these contenders look promising, but deeper scrutiny reveals their inferiority to Google.
The phrase “Google killer” still triggers the sort of reactions that generate high traffic, and traffic is the lifeblood of any online publication. All of Google’s moves are monitored, analyzed, and dissected. Search experts, SEO gurus, analysts, web prophets, tech writers, and even ordinary bloggers cover all news on Google with remarkable fervor. Yet very few of these Google watchers actually know what they’re writing about. In their ignorance, the media hand Google even more power to reign supreme.
Almost all recent articles about search engines and Google killers have had two key effects:
- misleading the public into believing that these emerging search providers have developed new technologies and algorithms foreign to Google
- lowering the credibility of the recently established search engines (for example, Cuil)
The Power of the Giant Derived from Ignorance
What happens when writers conduct limited research for the articles they publish on the Web? What is the effect when they make assertions based on personal perception? How does an editorial chock-full of biased terms influence the page’s search rankings?
In extreme cases, popular bloggers get censured by their readers, assuming the readers know the topic better than the authors themselves. Here, writers should just bite the bullet: a formal admission of error is the ideal approach, but not enough bloggers come clean. In other scenarios, the users suffer. They are introduced to search engine options that cater to specific niches, yet the information is often truncated and inaccurate. Some search providers even cheat, such as one Islamic search engine created for the sole purpose of generating revenue through Google’s Custom Search program (AdSense-integrated).
In both cases Google wins, and the company can only benefit from such momentum. Users may test other search engines only to conclude that, in the end, Google delivers better results. The search startups enjoy the short-term benefits of buzz (traffic, links, free PR, negative or positive) only to be forgotten soon after. To remain in the spotlight, they need to keep adding new features and functions that attract the attention of the press. Meanwhile, Google monitors their moves; the market leader’s reactions and responses are so refined that almost no one recognizes them as Google’s weapons.
Financial Power beyond Defeat
Personally, I am convinced that the main reason Google continues to be the king of search is financial. Google is a money machine. It uses money to make more money. It invests in every possible niche to extend its web domination, and it buys its competitors (or the nearest thing to a competitor) for the same reason. Whatever Google does … is for a good reason.
Microsoft obviously has greater financial power than Google, but its marketing strategies are not as smart. Besides, its main concern is still the desktop; search comes second. Defeating Google would take more resources than Microsoft is willing to spend. The acquisition of Powerset was a smart move, but not the move that will “power boost” Live Search up the search hierarchy. Why? Because Microsoft has merely purchased technology similar to what Google already has.
The Power of a Power Brand
Google is the number one brand in the world. Many companies attempt to dethrone it, but, as you can see, the search giant is a threat to traditional businesses as well, on different levels, of course.
Table 1. Top 10 Most Powerful Brands
| Rank | Brand | Brand Value (US$ millions) | Brand Value Change |
|---|---|---|---|
| 2 | GE (General Electric) | 71,379 | 15% |

Source: Millward Brown Optimor (including data from BrandZ, Datamonitor, and Bloomberg)
Every day we witness the power that the world’s most influential brand holds over consumers. Whatever Google does, the media covers it; and whatever new service or tool the company releases tomorrow, there will always be enough adopters (or fans) to ensure rapid market reach and unprecedented spread.
New Search Technologies? Nah …
As I stated above, by acquiring Powerset, Microsoft acquired competitive search technology. According to the many articles circulating in the media, Powerset “creates a semantic representation by parsing each sentence and extracting its meaning.” This approach is known as semantic search.
Powerset is not the first in the search industry to be labeled a semantic search engine; there are also hakia, Swoogle, ZoomInfo, and many others. Like Powerset, these search engines attempt to deliver results based on the meaning of the web pages they index. Their developers assume that Google bases its results on keywords, and even statements by Google’s own representatives lead us to believe that it doesn’t use semantic algorithms:
“Because we’re processing so much data, we have a lot of context around things like acronyms. Suddenly, the search engine seems smart, like it achieved that semantic understanding, but it hasn’t really,” – Marissa Mayer.
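Yet a patent on identifying semantic units within search queries suggests otherwise. Its summary reads:
SUMMARY OF THE INVENTION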
One aspect of the present invention is directed to a method of identifying semantic units within a search query. The method includes identifying documents relating to the query by matching individual search terms in the query to an index of a corpus and generating substrings of the query. For each of the generated substrings, a value is calculated that relates to the portion of the identified documents that contains the substring. Semantic units are selected from the generated substrings based on the calculated values.
A second aspect of the present invention is directed to a method of locating documents in response to a search query. The method includes generating a list of relevant documents based on individual search terms of the query and identifying a subset of documents that are the most relevant documents from the list of relevant documents. Substrings are identified for the query and a value related to the portion of the subset of documents that contains the substring is generated. Semantic units are selected from the generated substrings based on the calculated values. Finally, the list of relevant documents is refined based on the semantic units.
A third aspect of the present invention is directed to a server that includes a processor, a database, and a memory. The memory includes a ranking component configured to return a list of documents ordered by relevance in response to a search query and a semantic unit locator component that locates semantic units in search queries entered by a user based on a predetermined number of the most relevant documents in the list returned by the ranking component.
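The summary above describes a concrete procedure, so a small sketch may help. The Python below is a minimal illustration of the first aspect only, under assumptions the summary does not spell out: the stand-in ranker (`rank_documents`, which simply counts matching query terms), the `top_k` cutoff, the `threshold` on the “portion of documents” value, the rule of keeping the longest non-overlapping substrings, and the sample corpus are all hypothetical choices, not the patent holder’s actual method.

```python
# Hypothetical sketch of "semantic unit" detection as summarized above:
# rank documents by individual terms, then score each multi-word substring
# of the query by the share of top-ranked documents that contain it.


def rank_documents(terms: list[str], corpus: dict[str, list[str]]) -> list[str]:
    """Hypothetical stand-in ranker: score = number of distinct query terms present."""
    scores = {}
    for doc_id, tokens in corpus.items():
        present = set(tokens)
        score = sum(1 for term in set(terms) if term in present)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document id so results are deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def contains_phrase(tokens: list[str], phrase: list[str]) -> bool:
    """True if `phrase` occurs as a contiguous run of tokens."""
    k = len(phrase)
    return any(tokens[i:i + k] == phrase for i in range(len(tokens) - k + 1))


def semantic_units(query: str, corpus: dict[str, list[str]],
                   top_k: int = 10, threshold: float = 0.5) -> list[str]:
    terms = query.lower().split()
    top = rank_documents(terms, corpus)[:top_k]
    if not top:
        return []
    units: list[str] = []
    covered: set[int] = set()  # query positions already claimed by a unit
    # Generate contiguous substrings of two or more words, longest first.
    for length in range(len(terms), 1, -1):
        for start in range(len(terms) - length + 1):
            span = range(start, start + length)
            if covered.intersection(span):
                continue  # hypothetical rule: units may not overlap
            phrase = terms[start:start + length]
            # Value = portion of the top documents that contain the substring.
            value = sum(contains_phrase(corpus[d], phrase) for d in top) / len(top)
            if value >= threshold:
                units.append(" ".join(phrase))
                covered.update(span)
    return units


# Hypothetical sample data for illustration only.
SAMPLE_CORPUS = {
    "d1": "new york pizza is the best".split(),
    "d2": "best pizza in new york city".split(),
    "d3": "new recipes for pizza dough".split(),
    "d4": "york minster cathedral guide".split(),
}

if __name__ == "__main__":
    print(semantic_units("new york pizza", SAMPLE_CORPUS, top_k=3))
```

Run as a script, it prints `['new york']`: that phrase appears contiguously in two of the three top-ranked sample documents, while the full three-word query appears in only one. The refinement step of the second aspect (re-ranking the list using the selected units) is left out to keep the sketch short.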
Analysts covering Powerset, hakia, and co. will argue that the systems used by the real semantic search engines are actually based on natural language processing (NLP). But how do these engines process data? One possible answer comes from Microsoft:
“Understanding” language means, among other things, knowing what concepts a word or phrase stands for and knowing how to link those concepts together in a meaningful way. It’s ironic that natural language, the symbol system that is easiest for humans to learn and use, is hardest for a computer to master […]
The challenges we face stem from the highly ambiguous nature of natural language. As an English speaker you effortlessly understand a sentence like “Flying planes can be dangerous.” Yet this sentence presents difficulties to a software program that lacks both your knowledge of the world and your experience with linguistic structures. […]
We address these problems using a mix of knowledge-engineered and statistical/machine-learning techniques to disambiguate and respond to natural language input. Our work has implications for applications like text critiquing, information retrieval, question answering, summarization, gaming, and translation.
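The “Flying planes can be dangerous” example can be made concrete with a chart parser. The sketch below counts parse trees with the CKY algorithm over a tiny grammar written just for this sentence; the labels (`ADJ`, `GER`, `VP_BE`, and so on), rules, and lexicon are hypothetical illustrations, not Microsoft’s system. Because “flying” may be read either as an adjective (planes that fly) or as a gerund (the act of flying planes), the parser finds two trees for the same five words.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (left, right) -> parent labels.
BINARY_RULES = {
    ("ADJ", "N"): ["NP"],      # "flying planes" = planes that are flying
    ("GER", "N"): ["NP"],      # "flying planes" = the act of flying planes
    ("BE", "ADJ"): ["VP_BE"],  # "be dangerous"
    ("MOD", "VP_BE"): ["VP"],  # "can" + "be dangerous"
    ("NP", "VP"): ["S"],
}

# Hypothetical lexicon: each word maps to its possible part-of-speech labels.
LEXICON = {
    "flying": ["ADJ", "GER"],
    "planes": ["N"],
    "can": ["MOD"],
    "be": ["BE"],
    "dangerous": ["ADJ"],
}


def count_parses(sentence: str, start_symbol: str = "S") -> int:
    """Count distinct parse trees for `sentence` using CKY over the toy grammar."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    if n == 0:
        return 0
    # chart[i][j] maps a label to the number of trees spanning words[i:j].
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for label in LEXICON.get(word, []):
            chart[i][i + 1][label] += 1
    # Combine adjacent spans bottom-up, multiplying the counts of sub-trees.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for left, left_count in chart[i][k].items():
                    for right, right_count in chart[k][j].items():
                        for parent in BINARY_RULES.get((left, right), []):
                            chart[i][j][parent] += left_count * right_count
    return chart[0][n][start_symbol]


if __name__ == "__main__":
    print(count_parses("Flying planes can be dangerous."))
```

Run as a script, it prints `2`. A real system has to choose between such readings, which is where the mix of knowledge-engineered and statistical techniques mentioned in the quote comes in.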
No matter how many definitions we read, the conclusion is the same: NLP remains a dream that will probably never be achieved on a global scale. Google has the financial clout to employ the best engineers to develop an NLP engine, and it is probably already working on one behind the scenes. Remember what Larry Page said last year at the annual meeting of the American Association for the Advancement of Science:
We have some people at Google (who) are really trying to build artificial intelligence and do it on a large scale. It’s not as far off as people think.
What did he mean? Since some people equate NLP with artificial intelligence, is it safe to assume that Page was actually announcing that Google is working on a natural language processing search engine?
It’s pretty clear why Powerset and hakia don’t define themselves as natural language processing search engines, preferring the semantic search engine label instead. After all, they index meaning much the same way Google does, except that Google doesn’t make a big fuss about this accomplishment.
Relative Brains and Brawn
No one could ever suggest that the competition is short on brain power; its innovations and credentials reveal a good deal of ingenuity and quality. However, a few million dollars in the hands of brilliant people will ultimately not be enough to topple a leviathan with billions behind equally brilliant minds.
Now that Powerset is backed by Microsoft and its vast resources, Google might pause to take note of the startup’s accelerated development. However, Google’s parity with even the strongest competitor, combined with its mountainous lead in market share, still presents a summit that is perhaps out of reach.
Because Google works in silence, constantly improving its algorithms and never revealing too much about its technology, it’s hard to say who or what will dethrone the currently undisputed king of search. People appear to care far less about relevance and looks than about Google being a familiar, popular, resourceful, and effective search engine. In my opinion, if Google falls, it will not be because of a new search engine; the Empire is more likely to fall from within, through self-inflicted blunders, irreparable miscalculations, and bad investments.