Quora Blocks Startup Search Engines

The popular startup question-and-answer service Quora only allows the largest search engines to index its site. As Gabe Rivera of Techmeme pointed out yesterday, its robots.txt file explicitly grants Google, Bing, Blekko and other big players access, but excludes everyone else. If large sites had imposed these restrictions back when Google was starting, it might never have succeeded and we’d still be stuck with AltaVista. As more publishers move to this whitelist approach, are they stifling innovation?
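To make the whitelist approach concrete, here is a minimal sketch of how such a policy plays out for a well-behaved crawler. The robots.txt content below is a hypothetical illustration, not Quora's actual file, and the crawler name "StartupBot" and the example.com URL are made up for the example. It uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical whitelist-style policy (an assumption for illustration,
# NOT Quora's actual robots.txt): named crawlers get full access,
# every other user agent is shut out.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: bingbot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(user_agent, url="https://example.com/some-question"):
    """Return True if the policy above lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "StartupBot" is a made-up name standing in for any new crawler.
    for bot in ("Googlebot", "bingbot", "StartupBot"):
        print(bot, "allowed" if allowed(bot) else "blocked")
```

Note that the check is purely voluntary: it only affects crawlers that choose to consult the file, so a new search engine that plays by the rules is locked out while a scraper that never reads robots.txt is unaffected.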

Gabriel Weinberg has been struggling to persuade Facebook to add his DuckDuckGo search engine to their list of approved crawlers, with no luck. Concerned about mining of their public profiles, last year Facebook started requiring search engines to sign a legal agreement covering the usage of their data. Unfortunately it seems like the process has turned into a barrier for fledgling search companies like Gabriel’s.

Despite being happy to enter into that contract, he hasn’t heard back after several months. While he’s still able to show Facebook pages thanks to API partners like Bing, this leaves him unable to run his own algorithms to optimally rank and display the results. He’s frustrated by the trend towards whitelisting and points out that malicious or underhanded scrapers ignore the policy file: “Bad bots don’t respect it anyway”. In his view it’s a big drag on innovation too – “really you’re just hurting startups that may use your data in cool ways”.

Both Quora and Facebook offer APIs to access their data, so why do startups need to crawl their sites? After all, web page scraping is often associated with unsavory scammers and copyright infringers. The real loss is that APIs only allow you to ask the questions that the interface designers have anticipated. For example, Gabriel was hoping to build directories listing the Facebook pages for local businesses by location and type, together with snippets of information about them, just as he does for other categories of sites on the web. There’s no way to gather that information through the Facebook API, so without crawling access he’s unable to implement that feature.

As traditional search companies struggle to pull relevant results from an increasing deluge of low-quality content, we need innovative startups to pioneer new approaches. Without the openness that made it possible for Google to grow, the next big thing in search may never happen.

Photo by David Goehring
