A potentially devastating supply chain cyberattack targeting AI leader HuggingFace was averted, according to a recent VentureBeat report. However, the incident highlights lingering vulnerabilities in the rapidly growing field of generative AI.
During a security audit of GitHub and HuggingFace repositories, Lasso Security researchers discovered over 1,600 compromised API tokens that could have enabled threat actors to mount an attack. With full access, attackers could have manipulated popular AI models used by millions of downstream applications.
“The gravity of the situation cannot be overstated,” said Lasso’s research team. “With control over an organization boasting millions of downloads, we now possess the capability to manipulate existing models, potentially turning them into malicious entities.”
Hugging Face is a leading provider of pre-trained models and datasets.
As a leading provider of pre-trained models and datasets for natural language processing, computer vision, and other AI tasks, HuggingFace has become a high-value target. The company’s platform, which is also home to the open-source Transformers library, hosts over 500,000 models relied on by over 50,000 organizations. Attackers are keenly aware that poisoning HuggingFace’s data and models could have an outsized impact across industries implementing AI.
Lasso’s audit focused on API tokens, which serve as keys allowing access to proprietary models and sensitive data. By scanning public code repositories, researchers found the exposed tokens, many of which granted write access or full admin privileges over private assets. With control of these tokens, attackers could have exfiltrated or corrupted AI models and supporting data.
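As a rough illustration of how a scan of this kind can work, the Python sketch below searches the files under a directory for strings that look like Hugging Face tokens. The token pattern (an `hf_` prefix followed by at least 30 alphanumeric characters), the function names, and the masking scheme are all assumptions made for illustration; this is not Lasso’s actual tooling, and real token formats may differ.

```python
import re
from pathlib import Path

# Assumed token shape: "hf_" followed by a long alphanumeric run.
# Real token formats may differ; adjust the pattern as needed.
TOKEN_PATTERN = re.compile(r"\bhf_[A-Za-z0-9]{30,}\b")


def find_tokens(text):
    """Return every substring of text that matches the assumed token shape."""
    return TOKEN_PATTERN.findall(text)


def mask(token):
    """Hide most of a token so that a report does not leak it again."""
    return token[:6] + "..." + token[-2:]


def scan_directory(root):
    """Yield (path, line_number, masked_token) for each suspected token under root."""
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file; skip it
        for line_number, line in enumerate(text.splitlines(), start=1):
            for token in find_tokens(line):
                yield str(path), line_number, mask(token)


if __name__ == "__main__":
    for path, line_number, masked in scan_directory("."):
        print(f"{path}:{line_number}: possible token {masked}")
```

A match only shows that a string has the right shape. Whether a token is still valid, and what permissions it carries, would have to be checked separately.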
The findings highlight three risk areas named in OWASP’s Top 10 for Large Language Model Applications: supply chain vulnerabilities, training data poisoning, and model theft. As AI permeates business and government functions, securing the full supply chain, from data to models to applications, is paramount.
Lasso recommends that companies like HuggingFace implement automatic scanning for exposed API tokens, enforce access controls, and encourage developers to avoid hardcoding tokens in public repositories. Individual tokens should also be treated as identities and secured via multifactor authentication and zero-trust principles.
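One common way to keep tokens out of source code is to read them from the environment at run time. The sketch below assumes the token is stored in an environment variable named `HF_TOKEN`; the helper name `load_token` is hypothetical, and its optional `environ` parameter exists only to make the function easy to test.

```python
import os


def load_token(var_name="HF_TOKEN", environ=None):
    """Return the token stored in an environment variable, or raise if it is unset."""
    env = os.environ if environ is None else environ
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; export it in the shell or inject it "
            "from a secrets manager instead of hardcoding it"
        )
    return token
```

Because the token never appears in the code itself, committing the file to a public repository does not expose it.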
Need for continual monitoring to validate security measures.
For all adopters of generative AI, the incident reinforces the need to continually validate security postures across every potential attack surface. Attackers are incentivized to compromise the AI supply chain, and added vigilance alone won’t thwart determined efforts. Maintaining robust authentication and implementing least-privilege controls, down to the API token level, is essential.