New Startup Analyzes 100,000 Web Pages With a Snap of Your Fingers

Machine processing of large quantities of unstructured text, whether to discover media mentions, map relationships between entities or analyze sentiment, need not be priced out of reach of the everyday web lover or small business.

Tonight two Texas companies announced a collaboration that brings exactly that to market, at a disruptively low price. Web crawling service 80Legs and natural language processing (NLP) company Language Computer Corporation have combined their efforts to create Extractiv, a service that pairs web crawling with semantic analysis. I’ve already put it to use for some awesome bulk text analysis in my own work.

Above: Extractiv correctly identified the people, places and dates in my article today about Jay Adelson’s new job. It misidentified only one geek as an athlete; not bad. Picture this analysis spread over hundreds of thousands or millions of documents and you are, as they say, cooking with gas.

Testing the Tool

To test Extractiv, I gave the company a collection of more than 500 web domains belonging to the top geolocation blogs online and asked its technology to find every appearance of the word “ESRI” (the name of the leading vendor in the geolocation market).

The resulting output consisted of structured cells, each describing a person, place or thing, the type of relationship it had with ESRI, and the URL where the two appeared together. That made it sortable and ready for my analysis.

The task was only partially completed before it was rate limited, because I had submitted so many links from the same domain. More than 125,000 pages were analyzed, 762 documents were found that included my keyword ESRI, and about 400 relations were discovered (including duplicates). What patterns will I discover by sorting all this data in a spreadsheet or otherwise? I can’t wait to find out.
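The output described above is essentially a table of entity, relation and URL rows. As a minimal sketch of the "sort it in a spreadsheet or otherwise" step, assuming the rows have been exported to a CSV file with hypothetical columns named `entity`, `relation` and `url` (Extractiv's actual export format is not described here), the following Python drops duplicate rows and tallies the most common relation types. The sample rows and relation names are made-up placeholders, not results from the ESRI test.

```python
import csv
from collections import Counter

# Hypothetical column names; a real export may use different ones.
FIELDS = ("entity", "relation", "url")


def load_rows(path):
    """Read (entity, relation, url) tuples from a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return [tuple(row[k] for k in FIELDS) for row in csv.DictReader(f)]


def dedupe(rows):
    """Drop exact duplicate rows while preserving first-seen order."""
    return list(dict.fromkeys(rows))


def top_relations(rows, n=10):
    """Count relation types across unique rows, most common first."""
    return Counter(relation for _, relation, _ in dedupe(rows)).most_common(n)


if __name__ == "__main__":
    # Placeholder rows for illustration only.
    sample = [
        ("Example Corp", "partner_of", "http://example.com/a"),
        ("Example Corp", "partner_of", "http://example.com/a"),  # exact duplicate
        ("Jane Doe", "employee_of", "http://example.com/b"),
        ("Example Corp", "partner_of", "http://example.com/c"),
    ]
    print(top_relations(sample))  # [('partner_of', 2), ('employee_of', 1)]
```

Counting over deduplicated rows keeps a relation that was extracted twice from the same page from inflating the tally; skipping the `dedupe` step would instead weight relations by how often they recur.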

That work took the machine about an hour and would have cost me less than $1 on top of a $99 monthly subscription fee. At the next subscription tier, with a base rate of $250 per month, the job would have run faster, with more simultaneous processes.
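To put those numbers in per-page terms, here is a quick back-of-the-envelope calculation in Python. It uses only the figures reported above, and because the job covered more than 125,000 pages for less than $1, both results are upper bounds. The second figure assumes, purely for illustration, that the entire $99 monthly fee is charged against this one job.

```python
def cost_per_page(total_dollars, pages):
    """Average cost per page in dollars."""
    return total_dollars / pages


PAGES = 125_000       # "more than 125,000 pages", so results are upper bounds
JOB_COST = 1.00       # "less than $1" for the job itself
SUBSCRIPTION = 99.00  # monthly fee at the entry tier

job_only = cost_per_page(JOB_COST, PAGES)
with_fee = cost_per_page(JOB_COST + SUBSCRIPTION, PAGES)

print(f"job only: under ${job_only:.6f} per page")              # under $0.000008
print(f"including the monthly fee: under ${with_fee:.4f} per page")  # under $0.0008
```

Even with the whole subscription assigned to a single run, that works out to less than a tenth of a cent per page.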

The machine isn’t perfect, but it looks very impressive for a service that launched just this evening. Would I use Extractiv for my bulk text analysis again in the future? Of course. In fact, I’m already thinking about what text I’d like analyzed next.

This sort of service represents an incredible vision of the future: commodity-level, DIY analysis of bulk user-generated or other content, sortable for pattern detection and, soon, Extractiv says, for sentiment.

The People Behind the Technology

80Legs is led by CEO Shion Deysarkar, a former oil industry computer scientist turned social network data hacking entrepreneur whom we profiled this spring in “Thoughts From the Man Who Would Sell The World, Nicely.” Deysarkar and 80Legs CTO Toan Duong describe themselves online as employed by Creeris Ventures, a Houston venture capital firm with a diverse portfolio including grid computing, jet airplanes and litigation.

On the Language Computer Corporation side, Extractiv’s collaborators include John Lehmann, CEO of LCC since September and president of Extractiv. Also co-founding the company is Andy Hickl, an NLP expert of the highest order, most recently of the question-answering machine Swingly.

About ReadWrite’s Editorial Process

The ReadWrite Editorial policy involves closely monitoring the tech industry for major developments, new product launches, AI breakthroughs, video game releases and other newsworthy events. Editors assign relevant stories to staff writers or freelance contributors with expertise in each particular topic area. Before publication, articles go through a rigorous round of editing for accuracy, clarity, and to ensure adherence to ReadWrite's style guidelines.
