
Children’s photos are being ‘illegally used to train AI’

Personal photos of Brazilian children are being used without their knowledge or consent to develop sophisticated artificial intelligence (AI) tools, according to Human Rights Watch. These images are reportedly collected from the internet and compiled into extensive datasets, which companies use to improve their AI technologies.

These tools are then reportedly used to produce harmful deepfakes, putting even more children at risk of exploitation and harm.

Hye Jung Han, a children’s rights and technology researcher and advocate at Human Rights Watch, said: “Children should not have to live in fear that their photos might be stolen and weaponized against them.

“The government should urgently adopt policies to protect children’s data from AI-fueled misuse,” she continued, warning, “Generative AI is still a nascent technology, and the associated harm that children are already experiencing is not inevitable.”

She added: “Protecting children’s data privacy now will help to shape the development of this technology into one that promotes, rather than violates, children’s rights.”

Are children’s images being used to train AI?

The investigation found that LAION-5B, a major dataset used by prominent AI applications and compiled by crawling vast amounts of online content, includes links to identifiable images of Brazilian children. However, LAION told ReadWrite that the dataset has been taken down.

The images often included the children’s names either in the caption or the URL where the image is hosted. In various examples, it was possible to trace the children’s identities, revealing details about the time and place the photos were taken.

Human Rights Watch identified 170 images of children from at least 10 Brazilian states: Alagoas, Bahia, Ceará, Mato Grosso do Sul, Minas Gerais, Paraná, Rio de Janeiro, Rio Grande do Sul, Santa Catarina, and São Paulo. Some of the images date back as far as the mid-1990s. This figure likely represents just a fraction of the children’s personal data in LAION-5B, as less than 0.0001 per cent of the dataset’s 5.85 billion images and captions were reviewed.

The organization claims that at least 85 girls from some of these Brazilian states have been subjected to harassment. Their classmates misused AI technology to create sexually explicit deepfakes using photographs from the girls’ social media profiles and subsequently distributed these manipulated images online.

LAION responds to claims

In response to ReadWrite’s queries, LAION, the German AI nonprofit overseeing LAION-5B, acknowledged that the dataset contained links to some of the children’s photos identified by Human Rights Watch and committed to removing them.

However, it disputed the claim that AI models trained on LAION-5B could reproduce personal data exactly. LAION also stated that it was the responsibility of children and their guardians to delete personal photos from the internet, arguing this was the best way to prevent misuse.

In December, Stanford University’s Internet Observatory reported that LAION-5B contained thousands of suspected child sexual abuse images. LAION took the dataset offline and released a statement saying it “has a zero tolerance policy for illegal content,” reiterating that it offered “only links to content on the public web, with no images.”

A LAION spokesperson told ReadWrite: “LAION is currently working with the Internet Watch Foundation, the Canadian Centre for Child Protection, Stanford, and Human Rights Watch to remove all known references to illegal content from LAION 5B. We are grateful for their support and hope to republish a revised LAION 5B soon.”

However, they added: “LAION 5B was built from a free and open index of the public web. It has never contained images because it is a dataset of URL links and text pointing to images that are hosted by the parties responsible for putting those images online. As such, removing links from a LAION dataset does not remove this content from the web. This is a larger and very concerning issue, and as a non-profit, volunteer organization, we will do our part to help.”

LAION is currently collaborating with Human Rights Watch and others to publish a new, safer version of the dataset.

UPDATED: Response from LAION has been added to article.

Featured image: Canva / Ideogram


Suswati Basu
Tech journalist

Suswati Basu is a multilingual, award-winning editor and the founder of the intersectional literature channel, How To Be Books. She was shortlisted for the Guardian Mary Stott Prize and longlisted for the Guardian International Development Journalism Award. With 18 years of experience in the media industry, Suswati has held significant roles such as head of audience and deputy editor for NationalWorld news, digital editor for Channel 4 News and ITV News. She has also contributed to the Guardian and received training at the BBC. As an audience, trends, and SEO specialist, she has participated in panel events alongside Google. Her…
