How Yahoo’s Latest Acquisition Stole & Broke My Heart

“What do you think about Dapper?” That was the question it felt like everyone asked me for weeks after I wrote up a startup called Dapper.net on TechCrunch in the summer of 2006. “Create an API for any website!” was the company’s unofficial slogan. Almost no one understood exactly what could be done with this powerful point-and-click tool, but everyone I talked to knew it was exciting.

Last week the company was acquired by Yahoo and brief press coverage of the deal called Dapper simply a semantic advertising platform. It was so much more than that, especially for me. Dapper set my imagination on fire, it powered acts of community management magic and it helped me meet Neil Young in person. We spent many long nights together. Four years after I first wrote about it, I still bring Dapper up in conversation frequently – but for a while now it’s been part of a story of heartbreak and caution.

What Dapper Does

Here’s how I described the core service when it launched, in August 2006:

Here’s how it works. Users identify a web site they are interested in extracting data from and view it through the Dapper virtual browser. [Co-founder Jon] Aizen showed me how to do it using Digg as an example. I clicked on a story headline, on the number of diggs and the URL field. I went to another page on the same site and did the same thing so that Dapper could clearly identify the fields I was interested in.

I then went through the various tools available on the site to set certain conditions and thresholds and ended up with XML feeds I could do all kinds of things with. Like send me an email whenever there’s a TechCrunch story on the front page of Digg, or when a search results page shows a TechCrunch story with more than 10 diggs. After I create an end product through the site, other users will be able (after a 24-hour period in which I can edit the project) to use my project either as is, altered to fit their needs or, in the future, in combination with other projects.
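For readers who think in code, here is a minimal sketch of the kind of condition described above: take extracted fields (headline, diggs, URL) and keep only TechCrunch stories with more than 10 diggs. The XML element names, the sample data and the `matching_stories` helper are all assumptions made up for illustration; this is not Dapper's actual output format or API.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical feed: these element names are illustrative only,
# not the format Dapper actually produced.
SAMPLE_XML = """<stories>
  <story><headline>Dapper launches</headline><diggs>42</diggs><url>http://techcrunch.com/dapper</url></story>
  <story><headline>Some other story</headline><diggs>300</diggs><url>http://example.com/other</url></story>
  <story><headline>Quiet day</headline><diggs>3</diggs><url>http://techcrunch.com/quiet</url></story>
</stories>"""


def matching_stories(xml_text, domain, min_diggs):
    """Return (headline, diggs, url) tuples for stories hosted on `domain`
    that have strictly more than `min_diggs` diggs."""
    root = ET.fromstring(xml_text)
    results = []
    for story in root.iter("story"):
        headline = story.findtext("headline", default="")
        diggs = int(story.findtext("diggs", default="0"))
        url = story.findtext("url", default="")
        host = urlparse(url).hostname or ""
        if host == domain and diggs > min_diggs:
            results.append((headline, diggs, url))
    return results


# Stories that would trigger the "more than 10 diggs" email alert.
alerts = matching_stories(SAMPLE_XML, "techcrunch.com", 10)
```

The point of Dapper was that none of this had to be written by hand: the fields were picked by clicking and the threshold was set in the site's tools.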

Below, a 4-minute video demonstrating Dapper that I recorded on New Year’s Eve 2007, after Wired Magazine wrote a post slamming web scraping. I had a sore throat, it was a holiday (on the next New Year’s I eloped to my living room and got married) but it was important that scraping be defended – a screencast had to be made! It was important.

In February, 2008 the startup held an event called DapperCamp in San Francisco. It was sponsored by IBM and MindTouch, because those and other companies were exploring ways to move data around from static websites into dynamic processes using Dapper.

The event was fabulous. I was the least technical person there, but I flew down my young cousin on my Dad’s side, a developer in training, for his first experience in the Bay Area web geek scene. We had a great time and worked late into the night sitting in a little bar brainstorming ideas and scraping feeds from websites.

Our best idea was this: Yahoo’s service MyBlogLog tracked users who navigated to any participating website, and upon visiting a site for the 3rd time, a user appeared in a field labeled “New Fans” on your site’s MyBlogLog page. We used Dapper to scrape an RSS feed of the usernames of all the new people appearing as fans, people who had just made their 3rd visit to ReadWriteWeb, and we set up a workflow to email those people and welcome them to the community here. It was awesome.
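The heart of that workflow is noticing which usernames are new since the last time the feed was checked. A minimal sketch of that step, assuming the scraped feed has already been reduced to a list of usernames; the `find_new_fans` helper and the sample names are hypothetical, and the actual emailing is left out.

```python
def find_new_fans(seen, current_usernames):
    """Return usernames that have not been welcomed yet, in feed order,
    and record them in `seen` so they are only welcomed once."""
    fresh = []
    for name in current_usernames:
        if name not in seen:
            seen.add(name)
            fresh.append(name)
    return fresh


# Two successive polls of a hypothetical "New Fans" feed.
seen = set()
first_poll = find_new_fans(seen, ["alice", "bob"])
second_poll = find_new_fans(seen, ["bob", "carol", "alice"])
```

Each name returned by a poll would then be handed to whatever sends the welcome message.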

We scraped a feed of the most bookmarked ReadWriteWeb pages in Delicious, a feed of RWW stories submitted to Digg and the number of Diggs they had. We monitored those feeds in a dashboard. We scraped feeds, .csv files, image slideshows and more. It was wonderful.

How Dapper Helped Me Meet Neil Young

I love Neil Young, I always have. In my early twenties I hitchhiked all around the country listening to a tape I’d recorded of Neil Young’s Greatest Hits albums that I’d checked out from the public library, until the tape was worn out and unlistenable. It was my personal soundtrack for years.

Years later, I work on the internet. In my personal consulting practice, I used Dapper a lot. I used it in working with a group of accountants to scrape feeds of news updates posted to old-fashioned government agency websites that had no feeds. I once subcontracted as a consultant to a consultant to a consultancy to an analyst service serving a pharmaceutical company. (I thought that was far enough removed that I wouldn’t get any on me, but nonetheless at my first meeting an executive said to me ominously “welcome to Big Pharma.”)

It turns out the client at the end of the long pipeline of invoices sold a diet pill, and young women were complaining on MySpace and forums that the pill sometimes caused leakage from their…and I showed the next consultant in line how to use Dapper to scrape the forums for a feed monitoring said customer complaints. The check cleared and I never went back, but I still thank Dapper for making that work possible. If stranger things were ever piped through the service, I don’t know what they were.

And then Dapper helped me meet Neil Young, in person.

I was working on a blog monitoring project for Sun Microsystems, building a web page that displayed the most recent and the most-talked about blog posts from around the web about 12 different Sun technologies, for use during the company’s huge user conference.


As a part of that work, I was grabbing a feed from Google Blogsearch for long search queries like “Sun+Java-Indonesia….” etc. Google Blogsearch’s own RSS feeds were all full of cruft, though. HTML bolding the search terms in the description field, and more. Not being a developer myself, I couldn’t figure out how to strip that all out. I spent several nights pulling out my hair, worried I wouldn’t be able to create something that was production-ready for this big client.
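For anyone who does write code, the cleanup itself is small. Below is a minimal sketch of stripping tags such as bolded search terms out of a feed item's description, using only the Python standard library. The sample string and the `strip_markup` helper are assumptions for illustration, and this is not the route the project ended up taking.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects only the text content of a fragment, discarding all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(description):
    """Return `description` with HTML tags removed, character references
    decoded, and runs of whitespace collapsed to single spaces."""
    parser = _TextOnly()
    parser.feed(description)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


# Hypothetical description field with bolded search terms.
clean = strip_markup("<b>Sun</b> releases new <b>Java</b> tools &amp; docs")
```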

I tried Yahoo Pipes, I tried other blog search engines, but what I ended up doing was using Dapper to scrape a new feed from the search results pages. Those feeds were nice and clean to display on the project website.

This wasn’t an easy thing to figure out. I tried many different strategies before discovering that one, and even then only with help from the guys at Dapper. As the project proceeded, my contact at Sun came to me and said (paraphrasing) – “Marshall, it looks like you’re going to be able to pull this off after all, but I wonder if you could add one more search query and module to the end product. It is very, very top secret though and you cannot tell anyone about it.”

I said of course I could do that, what was the search query?

“Neil Young,” she said.

Of course I was more than happy to do that. It turned out that the big splashy secret announcement at Sun’s conference was that Neil Young was going to make a surprise appearance on stage to unveil the first ever collection of his entire life’s work, including letters he’d written, scanned-in notes from studio recording sessions, video interviews and of course all his music. All those materials would be made available on Blu-Ray, the media storage format whose players are all required to run Sun’s Java software.

I built a long search query that would automatically deliver the best feed of search results about Neil Young’s news that I knew nothing about yet, and included it in my deliverables. The project was completed days before the big conference and it was exhausting.

Just before the conference began, my Sun contact called me and said, “can we fly you down to the event for an interview with Neil Young as thanks for all your hard work?”

And that’s how Dapper made it possible for me to meet Neil Young. We talked about electric cars (his new passion), about MP3 audio quality, about DRM and more. It was great.

I used Dapper for many, many different things. I still use it regularly (I used it last night, in fact) and if I could stop time and geek out for an evening with no obligations, I’d still probably spend that time playing with Dapper or the similar new tool NeedleBase.

Isn’t That Just an Ad Network?

When Dapper was acquired by Yahoo last week, all the news coverage was brief and called the service a semantic advertising platform. How tragic! Co-founder Eran Shir wrote last week about the acquisition and said that the Dapper team always envisioned themselves making the display advertising world a more meaningful place. If that’s true I’m disappointed. That sure wasn’t what the service’s earliest adopters wanted to use it for.

In February 2008 Dapper announced at its DapperCamp event that it would be launching an advertising technology. The Dapp Factory, as it was called, would no longer just be used to extract data for an undetermined purpose – it would be used to target contextual relevance for ad placement.

A mere 35,000 “Dapps” to perform extraction had been built and the company was struggling to be financially viable. It was a confusing service with a challenging interface on top of a radically new user paradigm. The only clear solution was to become an ad network. To fund the semantic indexing of text fields around the web by turning some of them into advertisements.

It’s cool. I’m ad-supported. But Dapper had promised more than that. It had promised to be an easy and powerful tool that anyone, with no technical skills, could use to render any web page dynamic, to monitor particular fields in pages for updates automatically, to pull sets of data off of pages around the web. It’s magic.

It was beautiful, but people didn’t want it, they didn’t understand it. Because people are stupid. It’s maddening. If you tell people: take this tool, use it to get real-time notifications of changes to the tiniest part of any web page, use it to pull down sets of data from the web with a snap of your fingers, use it to work fast and get first movers’ advantage. Scrape, then grab the fruits of that scraping, then enjoy a fast-growing career and meet your childhood musical heroes! But no, if there’s an unclear step between a technology of empowerment and profit, a step that requires creativity and hard work, then the market at large throws a fit and demands that profit be instead put directly into its spoiled-child’s hand. “I want an ad network!” people say, effectively, “Give me the money directly!”

Dapper as Parable

A beautiful web technology is like a little fairy, whose light shines bright for a short time and then extinguishes. Enjoy it while you can, until an uncaring market starves it to death and it turns into an ad network, for lack of viable alternatives.

Dapper still lets you scrape feeds using its legacy product. Hopefully Yahoo won’t shut that down, if it allows any of the service to survive. But imagine how much more powerful (and stable) this beautiful service might have been if the company could have found a way to monetize its core feed scraping and publishing product. If that had remained the top development priority.

The same thing happens time and time again. “Your technology is too wonderful,” I sometimes tell the most inspiring startups I interview before they launch. “No one will understand how to capture the incredible value you deliver. Your sales people will pound their heads against a wall for months. And then you will become an ad network.”

Companies laugh uneasily. Perhaps because they know how likely it is that I’m right. (Perhaps because they think I’m a creep who ought to be perfectly happy for them if they can manage to build a viable ad network.)

I told Factery Labs that when I saw its demo. That startup provides an API that you can throw any URL at and get in response a feed of “fact-type sentences” extracted from the text behind the submitted URL. It’s awesome. Twitter client Sobees, for example, uses it to offer text summary previews of any links shared by your friends on Twitter. It’s great – but what are the odds that Factery is going to turn into an ad network? I think they are pretty good.

I told the company that and they said, “what’s your shirt size?”

I told them, and a week later a package showed up at my door from Cafe Press. In it was a hooded sweatshirt with the Factery Labs robot logo screen printed on the back of it. Around the logo circled the words: “Factery Labs – Not an Ad Network Yet.”

It’s a cautionary tale – tell people that anyone can blog or Tweet, post a photo or a video, and you will change the world. Tell people that anyone can now extract text and data, process it automatically and treat web content like bowling pins, torches and knives in a capable juggler’s hands. Not enough people, at least so far, will care. You will likely become an ad network.

Maybe that will change someday. Or maybe these freaky little services will remain forever like short-lived fairies, destined to be extinguished before their time.

Either way, I had a lot of great times with Dapper. I hope that technology like it will never stop being born.
