Building An Open Source, Distributed Google Clone

Disclosure: the writer of this article, Emre Sokullu, joined Hakia as a Search Evangelist in March 2007. The following article in no way represents Hakia’s views – these are Emre’s personal opinions only.

Google is like a young mammoth, already very strong but still growing. Healthy quarterly results and rising expectations in the online advertising space are the biggest factors helping Google keep its pace on NASDAQ. But now let’s think outside the square and try to figure out a Google killer scenario. You may know that I am obsessed with open source (e.g. my projects openhuman and simplekde), so my proposition will be open source based – and I’ll call it Google@Home.

First let me define what my concept of Google@Home is. Briefly, Google@Home is an open source, distributed clone of Google. We already have many open source search engine projects, with Apache Lucene (which includes the Nutch and Hadoop distributed file system sub-projects) being the most credible one. So the Google@Home concept can be based on one of those open source search engines. Of course, it will have a long way to go before reaching Google’s utility and reach. But more importantly, Google@Home will be a distributed, decentralized system: our desktop computers’ idle time will become part of this new search engine’s computational power. In effect, this allows it to compete with Google’s beefy data centers. This is not a new concept either; SETI@Home and Folding@Home are two well-known scientific projects that use the same grid computing idea at their cores. Indeed, Google itself is the biggest supporter of the Stanford University-based Folding@Home, dedicating the resources of its toolbars to the project.
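To make the idea a little more concrete, here is a minimal sketch of how indexing work might be split across volunteer machines. It is only an illustration under my own assumptions: the function names (assign_node, distribute, build_partial_index, search), the hash-based sharding and the example URLs are all hypothetical, and none of this is taken from Lucene, Nutch or Hadoop.

```python
import hashlib
from collections import defaultdict


def assign_node(url: str, num_nodes: int) -> int:
    """Map a URL to a volunteer node with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def distribute(pages: dict, num_nodes: int) -> list:
    """Split crawled pages (url -> text) into one shard per volunteer node."""
    shards = [dict() for _ in range(num_nodes)]
    for url, text in pages.items():
        shards[assign_node(url, num_nodes)][url] = text
    return shards


def build_partial_index(docs: dict) -> dict:
    """Work done on one volunteer machine: build term -> set of URLs."""
    index = defaultdict(set)
    for url, text in docs.items():
        for term in text.lower().split():
            index[term].add(url)
    return dict(index)


def search(partial_indexes: list, term: str) -> list:
    """Coordinator: query every node's partial index and merge the hits."""
    hits = set()
    for index in partial_indexes:
        hits |= index.get(term.lower(), set())
    return sorted(hits)


if __name__ == "__main__":
    # Made-up pages, used only to exercise the sketch.
    pages = {
        "http://a.example/": "open source search",
        "http://b.example/": "distributed search engine",
        "http://c.example/": "folding proteins at home",
    }
    shards = distribute(pages, num_nodes=3)
    indexes = [build_partial_index(shard) for shard in shards]
    print(search(indexes, "search"))
```

A real system would also have to deal with nodes going offline, untrusted results and ranking, which this toy version ignores entirely.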

Comparison to Wikiasari

The distributed nature of the engine is what makes it different from Wikipedia co-founder Jimmy Wales’ Wikiasari project, which is an open source, wiki-inspired search engine. While Wikiasari’s power may come from Wikipedia, its weakest link is its heavy dependency on humans. The power of the masses worked well in the open, community-driven encyclopedia project, Wikipedia, but vandalism has still been present there, albeit at a manageable level. I’m not sure this can work as well in search engines, though.

Why an open source search engine?

Well, the concept is clear, but you may wonder about the motivation behind it: why would anyone, an organization or a loosely formed group of people, unite around such a project, and why would people dedicate their computers’ idle time to it? Here are some reasons:

A search engine is a platform and should be open, just like operating systems. Do you remember Alex’s post on the image search space? Using himself as an example, he tried to show how lame current image search engines are. The first comment on his entry was from me, and I told him this problem could be solved with open information access and some face recognition algorithms – just like Riya is trying to do. Unfortunately, we don’t have open access to search engine databases; all we have is the directory dmoz, which is clearly insufficient. Currently, most search engine APIs lock themselves off at predefined low limits of daily queries.
Need for a better search engine – collaborative work can always yield better results. Imagine a system that researchers from all around the world, and Google’s competitors, would contribute to. This would create a bigger brains trust than the one in Mountain View. This is again similar to what’s happening with Windows today. Microsoft has one of the world’s biggest tech talent pools on its campuses all around the world, but it’s impossible to compete with the whole world! And that’s why Linux is a clear leader in the server space and keeps leaping forward in the desktop arena too – see Dell’s latest Ubuntu Linux deal and the 3D Linux desktops.
Privacy is a big concern – as the founder of openhuman, this argument surely doesn’t apply to me, but it’s a fact that many people are scared by the idea of being watched by the big G’s eyes. And Google’s compromises in the Chinese market have pushed people to think twice before giving their noisy, but still useful, search history data to Google. Google’s Matt Cutts recently wrote an interesting post on his company’s approach to privacy, but there are still questions remaining in my mind. Google can also be forced to give up its huge stack of information when presented with subpoenas.
Growing number of competitors – not everyone is happy with Google’s rise on NASDAQ. Case in point: the latest Yahoo – Microsoft – eBay partnership deal. Google, instead of creating new markets as Amazon does with its “artificial artificial intelligence” projects and with S3 and EC2, is competing heavily with Yahoo, eBay, Amazon and Microsoft. Many startups are also unhappy with Google disrupting their business and not rewarding their innovation. The best examples are Google Calendar and the broken dreams of 30 Boxes, Kiko and others; further examples are Google Spreadsheets and, lately, the situation with Google Toolbar and StumbleUpon. This is again what happened with Microsoft in the ’80s and ’90s, when it disrupted Sun, IBM, HP and others.

Who would create an open source Google clone?

Perhaps Google itself. Or Google competitors such as Ask or Yahoo. It might also be something that P2P kings Niklas Zennstrom and Janus Friis are up to, besides their Joost project. Everything is possible, but in my opinion the most plausible option would be a joint attack by direct competitors. Indeed, perhaps the best fit would be the classic “closed source” company, Microsoft! This could be a mirror response to Google, which up till now has directed most of its PR at Microsoft’s ‘evil’ closed source approach (e.g. the subtle ‘don’t be evil’ mantra of Google). Stranger things have happened.

Revenues

Another idea: the Google@Home project can make more use of the power of the masses at its core. Google is still reluctant to use the direct power of the masses in its search. Yahoo, on the other hand, seems more ambitious in this arena with its new unified Social Search Unit. As a total underdog, Google@Home would be more open to such innovations and could probably profit from these new paradigms.

How could you support this type of search engine? One option is a complementary distributed, open source ad network. Baris Karadogan has more about this on his blog. (I met him at a conference last week, and it turned out that, surprisingly, we had hatched and blogged about these similar concepts at the same time!)

Conclusion

Yes, this is my ‘Google killer’ scenario. There are many open questions, though. Some of them are:

Is this really feasible? (I think yes, but your technical input is welcome.)
Are there any projects already doing this?
Would it really be a Google killer, or would the user base stay limited to geeks only?

Let us know what you think, and share your own ‘Google killer’ scenarios too!

Disclosure: Emre Sokullu now works for Hakia, as a Search Evangelist. He started at Hakia in March 2007.
