Google already dominates the web search market, with between approximately 55% and 65% of the market depending on who you ask. The company's flagship product has been responsible for its phenomenal growth and everyone knows that Google made its fortune by tying its genius search algorithm to advertising. It is perhaps less known, however, that the web giant has opened its search engine for use on any web site, by any service. Dubbed Google Custom Search Engine, or CSE, the product exposes the API behind the world's most powerful search engine. Why is Google offering this API? How can it be used? And what is the connection to vertical search? We explore the what, how, and why of Google CSE in this post.
The Basics of Google Custom Search
You can think of Google Custom Search as a filter over the main Google search engine. This is a bit of a simplification, but it is a good way to initially wrap your head around the concept. By creating a filter, CSE allows its users to restrict search results to sites that match particular URL patterns or keywords.
The resulting engines can be searched via API or a search box that users can place on their web sites. The monetization strategy for Google is straightforward, and not surprisingly it is based on ads. Unless used for academic purposes, custom search engine results display contextual ads just like the regular search engine results do. Creators of custom search engines can earn a cut of the ad revenue by linking their engine to an AdSense account.
For example, you can create a custom search engine that only searches one site. Many sites have done that instead of building their own search solution. You can also restrict the search to a specific list of sites, in essence creating a vertical search engine, which we will discuss at length below.
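To make the filter idea concrete, here is a minimal Python sketch that restricts a list of result URLs to those matching a set of URL patterns. It only illustrates the concept and is not how Google implements CSE; the shell-style wildcard syntax, the function names, and the example URLs are all our own assumptions.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def normalize(url):
    """Drop the scheme and a leading 'www.' so patterns match host + path."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path


def matches_engine(url, patterns):
    """Return True if the URL matches at least one wildcard pattern."""
    target = normalize(url)
    return any(fnmatchcase(target, pattern) for pattern in patterns)


def filter_results(urls, patterns):
    """Keep only the URLs that belong to the custom engine."""
    return [url for url in urls if matches_engine(url, patterns)]


# Hypothetical patterns for a music-review engine (placeholder domains).
PATTERNS = [
    "example-magazine.com/reviews/*",  # one section of a single site
    "*.example-reviews.org/*",         # every subdomain of a review network
]

if __name__ == "__main__":
    candidates = [
        "http://www.example-magazine.com/reviews/album-1",
        "http://example-magazine.com/news/tour",
        "http://blog.example-reviews.org/2007/ritter",
        "http://shop.example.com/cd",
    ]
    for url in filter_results(candidates, PATTERNS):
        print(url)
```

A single-site engine is just the special case of one pattern; a vertical engine is a longer list.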
Custom engines can be created and managed using a simple visual interface or, for more advanced users, an XML file. The UI version is essentially a wizard that prompts the user to fill in basic information about the search engine, list the sites for the engine to index, define the look and feel of the search and results pages, and configure other advanced options. You can make the search engine private or have it listed in Google's custom search directory. Interestingly, you can invite other people to collaborate with you on creating your search engine. The process of creating an engine takes just a few minutes, and when you're done you get a page that looks a lot like Google itself, with just a search box.
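For readers curious about the XML route, the sketch below generates a small annotations file listing the sites for an engine. The element and attribute names (Annotations, Annotation, about, Label) reflect our understanding of Google's annotations format and should be checked against the official documentation before use; the label value and the URL patterns are placeholders we made up.

```python
import xml.etree.ElementTree as ET


def build_annotations(url_patterns, label):
    """Build an annotations document with one entry per URL pattern.

    Assumption: each site is an <Annotation about="..."> element that
    carries a <Label name="..."> tying it to a particular engine.
    """
    root = ET.Element("Annotations")
    for pattern in url_patterns:
        annotation = ET.SubElement(root, "Annotation", about=pattern)
        ET.SubElement(annotation, "Label", name=label)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder sites and label; a real engine would use its own values.
    sites = ["example-magazine.com/reviews/*", "blog.example-reviews.org/*"]
    print(build_annotations(sites, "_cse_exampleengine"))
```

Generating the file from a list like this is one way to keep a large set of sites manageable without clicking through the wizard.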
Custom Search Engine In Action
For this example we created a search engine for music reviews by telling Google Custom Search to index only sites that feature music reviews. In a way, this is like teaching Google semantics, because the sites that we hand-picked contain mostly music review content. There are two major types of sites that we picked - music magazines and music review blogs.
We then searched for a recent album by Josh Ritter - "Historical Conquests of Josh Ritter." The results from CSE only have links to the album review pages.
If we were to search Google directly with the exact same phrase, we would not get just reviews. The matches there would lead to Wikipedia, the artist's home page, and album links at various retail sites, all mixed with the review pages. Interestingly, when we added the word 'review' to the search, the results from Google were similar to the ones returned by our custom search engine.
Still, the results returned by the specialized engine were more precise and targeted. The key to good results is a good selection of sites. The more high-quality music review sites we add to this engine, the better it will perform. It does not need to be a large number of sites, however. Even our initial set of 20 high-quality sites returned good results for a lot of recent music albums.
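We ran the search above through the hosted search box, but the same query could also be sent through the API. The sketch below only builds the request URL and does not call the network. The endpoint and the key, cx, and q parameters are those of Google's Custom Search JSON API as we understand it, so check the current documentation before relying on them; the engine ID and API key are placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Custom Search JSON API (verify before use).
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, engine_id, api_key, exact_phrase=False):
    """Return a request URL for one query against one custom engine."""
    if exact_phrase:
        # Wrap the query in double quotes to ask for an exact-phrase match.
        query = f'"{query}"'
    params = {"key": api_key, "cx": engine_id, "q": query}
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_search_url(
        "Historical Conquests of Josh Ritter",
        engine_id="YOUR_ENGINE_ID",  # placeholder
        api_key="YOUR_API_KEY",      # placeholder
    )
    print(url)
```

Because the engine is already restricted to review sites, the query itself does not need the extra word 'review'.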
Powering Up Vertical Search
Google Custom Search Engine is a platform for building vertical search engines. If the engine contained links to electronics sites, would it come close to Retrevo? Imagine keying every active blog on the Internet into a custom search engine (there is an API, so the process does not need to be manual). Could that yield a search engine that compares to Technorati or Google's own Blog Search? The answer is: very likely. Consider the example of a startup that is doing something along those lines.
Colorado-based Lijit allows people to search the web through the experiences of other people. One of Lijit's core ideas is that each of us is an expert in a particular area. For example, Brad Feld is an expert in venture capital and investment. When you are looking for quality information about venture capital, it makes sense to ask Brad. Lijit's search engine does exactly that by searching through the various pieces of Brad Feld's online existence, including his blog, del.icio.us bookmarks, and Facebook profile.
Behind the scenes, Lijit actually creates an instance of Google Custom Search Engine to do the search. This engine is configured with links to blogs, social network profiles, photos, videos and everything else that defines a person as a vertical. By leveraging Google's infrastructure, Lijit has given itself a huge jump start. If the team had to build its own crawler, that work would likely have consumed all of its technical effort. Instead, the team built on top of Google's offering and focused on presenting the best way to search through online personal experiences.
Vertical Search Is Reduced To UI
Lijit's example naturally leads us to this question: What is the impact that Google CSE has on the vertical search space? Does it make it a commodity? Not entirely, but it does commoditize the infrastructure. There is no longer any need to build a custom crawler. Crawling and indexing web sites and other online information is a huge problem that requires a lot of resources, and even if you have them, there is a very real chance of not getting it right. Look at Microsoft -- they still can't crack it.
So if the infrastructure problem is solved, the innovation is pushed up to the UI level. How the results are presented is what can make a difference. For example, Retrevo further clusters the results on its vertical search engine into different categories, distinguishing reviews, product manuals, etc. It adds semantic understanding not only to the filtering of the underlying sites, but also to the presentation of the results. Given that filtering can be done using Google CSE, the innovation is basically in the presentation of the results.
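As a rough illustration of what presentation-level work might look like, the sketch below groups search results into categories using simple keyword rules. This is not Retrevo's method, which we have not seen; the category names, the keywords, and the result fields (title, url) are hypothetical.

```python
from collections import defaultdict

# Hypothetical keyword rules; the first matching category wins.
CATEGORY_KEYWORDS = {
    "reviews": ("review", "rating", "hands-on"),
    "manuals": ("manual", "user guide", "instructions"),
}


def categorize(result):
    """Pick a category for one result based on its title and URL."""
    text = (result["title"] + " " + result["url"]).lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


def group_results(results):
    """Group results by category, preserving their original order."""
    groups = defaultdict(list)
    for result in results:
        groups[categorize(result)].append(result)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        {"title": "Acme X100 review", "url": "http://example.com/a"},
        {"title": "Acme X100 user manual (PDF)", "url": "http://example.com/b"},
        {"title": "Buy Acme X100", "url": "http://shop.example.com/x100"},
    ]
    for category, items in group_results(sample).items():
        print(category, [item["title"] for item in items])
```

A real product would need far richer signals than keywords, but even this much changes how a flat list of links reads.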
Google CSE is an interesting piece of web infrastructure. On one hand, it simply opens up a different use for Google's core technology. On the other hand, though, it commoditizes the backend of any vertical search engine. However, we think that it's more of a blessing than a problem for the vertical search players, as they can now focus on their core specialty - presentation of the results in the given domain.
Please share with us interesting examples of Google CSE that you've seen online and tell us your thoughts about what Google CSE means for the vertical search space.