Personal photos of Brazilian children are being used without their knowledge or consent to develop sophisticated artificial intelligence (AI) tools, according to Human Rights Watch. These images are reportedly collected from the internet and compiled into extensive datasets, which companies use to improve their AI technologies.
These tools are then reportedly used to produce harmful deepfakes, putting even more children at risk of exploitation and harm.
📢NEW: The personal photos of Brazilian children are being secretly used to build powerful AI tools.
Others are then using these tools to create malicious deepfakes, putting even more children at risk of serious harm. https://t.co/Bf3f8SOK5M
— Hye Jung Han (@techchildrights) June 10, 2024
Hye Jung Han, a children’s rights and technology researcher and advocate at Human Rights Watch, said: “Children should not have to live in fear that their photos might be stolen and weaponized against them.
“The government should urgently adopt policies to protect children’s data from AI-fueled misuse,” she continued, warning, “Generative AI is still a nascent technology, and the associated harm that children are already experiencing is not inevitable.”
She added: “Protecting children’s data privacy now will help to shape the development of this technology into one that promotes, rather than violates, children’s rights.”
Are children’s images being used to train AI?
The investigation revealed that LAION-5B, a major dataset compiled by crawling vast amounts of online content and used by prominent AI applications, includes links to identifiable images of Brazilian children. However, the organization behind the dataset told ReadWrite that it has been taken down.
The images often included the children’s names, either in the caption or in the URL where the image was hosted. In several cases, it was possible to trace a child’s identity and establish when and where the photo was taken.
Human Rights Watch identified 170 images of children across at least 10 Brazilian states, including Alagoas, Bahia, Ceará, Mato Grosso do Sul, Minas Gerais, Paraná, Rio de Janeiro, Rio Grande do Sul, Santa Catarina, and São Paulo, with some photos dating back to the mid-1990s. This figure likely represents just a fraction of the children’s personal data in LAION-5B, as the organization reviewed less than 0.0001 per cent of the dataset’s 5.85 billion images and captions.
The organization claims that at least 85 girls from some of these Brazilian states have been subjected to harassment: classmates used AI tools to create sexually explicit deepfakes from photographs on the girls’ social media profiles and then circulated the manipulated images online.
LAION responds to claims
In response to ReadWrite’s queries, LAION, the German AI nonprofit that oversees LAION-5B, acknowledged that the dataset contained links to the children’s photos identified by Human Rights Watch and committed to removing them.
However, it disputed that AI models trained on LAION-5B can reproduce personal data verbatim. LAION also argued that the responsibility for removing personal photos from the internet lies with children and their guardians, calling this the most effective way to prevent misuse.
In December 2023, Stanford University’s Internet Observatory reported that LAION-5B contained thousands of suspected child sexual abuse images. LAION took the dataset offline and released a statement saying it “has a zero tolerance policy for illegal content,” reiterating that it offered “only links to content on the public web, with no images.”
A LAION spokesperson told ReadWrite: “LAION is currently working with the Internet Watch Foundation, the Canadian Centre for Child Protection, Stanford, and Human Rights Watch to remove all known references to illegal content from LAION 5B. We are grateful for their support and hope to republish a revised LAION 5B soon.”
However, they added: “LAION 5B was built from a free and open index of the public web. It has never contained images because it is a dataset of URL links and text pointing to images that are hosted by the parties responsible for putting those images online. As such, removing links from a LAION dataset does not remove this content from the web. This is a larger and very concerning issue, and as a non-profit, volunteer organization, we will do our part to help.”
LAION is currently collaborating with Human Rights Watch and others to publish a new, safer version of the dataset.
UPDATED: A response from LAION has been added to this article.
Featured image: Canva / Ideogram