Amazon looks into Perplexity AI for alleged content scraping

tl;dr

  • Amazon is reviewing claims that Perplexity AI scraped content from sites without permission.
  • Perplexity, which uses AWS, is accused of violating robots.txt instructions that forbid data scraping.
  • Forbes alleges Perplexity copied its articles with "eerily similar wording" and insufficient source attribution.

Amazon’s cloud division is reportedly looking into Perplexity AI, an artificial intelligence search startup, to determine if it is violating Amazon Web Services (AWS) rules by scraping websites that have blocked such activities. An AWS spokesperson confirmed to ReadWrite that the company told WIRED it was reviewing the information the publication provided in a media inquiry, as it does with all potential violations of its terms of service. Perplexity operates using servers provided by Amazon’s cloud service.

According to a report by WIRED, Perplexity AI appears to use content from websites that have explicitly prohibited scraping through the Robots Exclusion Protocol, a common web standard. While this protocol is not legally binding, terms of service typically are. AWS stated that it is reviewing the information provided by WIRED as part of its standard procedure for handling reports of potential terms of service violations. To comply with Amazon’s terms, customers are supposed to adhere to robots.txt files when crawling the web.

These files are generally used on websites to instruct bots and web crawlers to refrain from scraping their data, whether for generative AI tools or other uses.
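As an illustration of how a well-behaved crawler might honor these instructions, here is a minimal sketch using Python's standard `urllib.robotparser` module. The bot names (`ExampleAIBot`, `SomeOtherBot`), the sample rules, and the `example.com` URLs are hypothetical and are not taken from any site or crawler named in this story.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is blocked from the whole site,
# every other bot is blocked only from /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SomeOtherBot"):
        for path in ("/news/story.html", "/private/draft.html"):
            url = "https://example.com" + path
            print(agent, path, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

A crawler would run a check like this before each request and skip any URL for which it returns `False`. Nothing in the protocol enforces that step, which is why compliance depends on the crawler's operator.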

“AWS’s terms of service prohibit abusive and illegal activities and our customers are responsible for complying with those terms. We routinely receive reports of alleged abuse from a variety of sources and engage our customers to understand those reports,” the representative said.

Has Perplexity AI plagiarized content?

Forbes claimed that Perplexity has been accessing content from websites that explicitly prohibit such scraping practices.

Forbes’ editor and chief content officer, Randall Lane, charged Perplexity with committing “cynical theft,” accusing the company of creating “knockoff stories” that contain “eerily similar wording” and “entirely lifted fragments” from its articles.

He added: “More egregiously, the post, which looked and read like a piece of journalism, didn’t mention Forbes at all, other than a line at the bottom of every few paragraphs that mentioned ‘sources,’ and a very small icon that looked to be the ‘F’ from the Forbes logo – if you squinted.”

The San Francisco-based AI search startup, Perplexity, once celebrated by top tech investors like Amazon’s Jeff Bezos, has recently faced scrutiny over plagiarism accusations.

Aravind Srinivas, CEO of Perplexity, denied allegations that his company was “ignoring the Robot Exclusions Protocol and then lying about it.” Srinivas acknowledged to Fast Company that Perplexity does use third-party web crawlers in addition to its own, and confirmed that the bot identified by WIRED was among them.

However, he added, “It was accurately pointed out by Forbes that they preferred a more prominent highlighting of the source.” Srinivas also mentioned that sources are now more prominently spotlighted.

ReadWrite has reached out to Amazon and Perplexity for comment.

UPDATED: Comment and clarification from Amazon were added on July 2.

Featured image: Canva / Perplexity AI

Suswati Basu
Tech journalist

Suswati Basu is a multilingual, award-winning editor and the founder of the intersectional literature channel, How To Be Books. She was shortlisted for the Guardian Mary Stott Prize and longlisted for the Guardian International Development Journalism Award. With 18 years of experience in the media industry, Suswati has held significant roles such as head of audience and deputy editor for NationalWorld news, digital editor for Channel 4 News and ITV News. She has also contributed to the Guardian and received training at the BBC. As an audience, trends, and SEO specialist, she has participated in panel events alongside Google. Her…
