New EU AI checker reveals key shortcomings in major AI models’ compliance

TLDR

  • A new EU AI Act checker reveals that many top AI models fall short of the regulation in key areas.
  • Developed by LatticeFlow AI, Compl-AI scores AI models on robustness, safety, and compliance.
  • Models like OpenAI's GPT-3.5 Turbo scored poorly on avoiding discriminatory output, raising concerns.

A newly launched tool that checks AI models against the European Union's (EU) AI rules has revealed that many leading artificial intelligence models are not meeting those regulations, particularly in key areas like cybersecurity resilience and preventing discriminatory outcomes.

In December, ReadWrite reported that EU negotiators had reached a historic agreement on the world's first comprehensive AI regulations. The resulting AI Act came into force in August, though some details are still being finalized. Its provisions apply in stages to developers of AI apps and models, so the compliance clock is already running.

Now, a new tool is testing generative AI models from major tech companies like Meta and OpenAI across multiple categories, in line with the EU’s comprehensive AI Act, which will be rolled out in stages over the next two years.

Developed by Swiss startup LatticeFlow AI in collaboration with research institutes ETH Zurich and Bulgaria’s INSAIT, the open-source framework, called Compl-AI, assigns AI models a score between 0 and 1 in areas such as technical robustness and safety.

EU AI checker results

According to a leaderboard published by LatticeFlow on Wednesday (Oct. 16), models from Alibaba, Anthropic, OpenAI, Meta, and Mistral all scored an average of 0.75 or higher. However, LatticeFlow's Large Language Model (LLM) Checker also identified weaknesses in certain models, highlighting areas where companies may need to devote more resources to ensure compliance.

The framework assesses LLM responses across 27 benchmarks, covering categories such as "toxic completions of benign text," "prejudiced answers," "following harmful instructions," "truthfulness," and "common sense reasoning." The checker does not produce a single overall model score; instead, each model is scored separately on each area being assessed.

Many models achieved solid scores; Anthropic's Claude 3 Opus, for example, earned 0.89. Others, however, showed serious vulnerabilities. OpenAI's GPT-3.5 Turbo scored just 0.46 in the discriminatory output category, and Alibaba's Qwen1.5 72B Chat fared even worse with 0.37, signaling ongoing concerns about AI models perpetuating human biases, particularly around gender and race.

Some models also fell short in cybersecurity testing. Meta's Llama 2 13B Chat scored 0.42 in the "prompt hijacking" category, which covers a type of cyberattack in which malicious prompts are used to extract sensitive information. Mistral's Mixtral 8x7B Instruct model fared even worse, scoring 0.38.

AI model evaluation welcomed

Thomas Regnier, the European Commission’s spokesperson for digital economy, research, and innovation, commented on the release: “The European Commission welcomes this study and AI model evaluation platform as a first step in translating the EU AI Act into technical requirements, helping AI model providers implement the AI Act.”

“We invite AI researchers, developers, and regulators to join us in advancing this evolving project,” said ETH Zurich Professor Martin Vechev, who is also the founder of INSAIT.

He added: “We encourage other research groups and practitioners to contribute by refining the AI Act mapping, adding new benchmarks, and expanding this open-source framework. The methodology can also be extended to evaluate AI models against future regulatory acts beyond the EU AI Act, making it a valuable tool for organizations working across different jurisdictions.”

LatticeFlow AI co-founder, Dr. Petar Tsankov, stated: “With this framework, any company can now evaluate their AI systems against the EU AI Act technical interpretation. Our vision is to enable organizations to ensure that their AI systems are not only high-performing but also fully aligned with the regulatory requirements.”

ReadWrite has reached out to the European Commission for comment.

Featured image: Ideogram


Suswati Basu
Tech journalist

Suswati Basu is a multilingual, award-winning editor and the founder of the intersectional literature channel, How To Be Books. She was shortlisted for the Guardian Mary Stott Prize and longlisted for the Guardian International Development Journalism Award. With 18 years of experience in the media industry, Suswati has held significant roles such as head of audience and deputy editor for NationalWorld news, and digital editor for Channel 4 News and ITV News. She has also contributed to the Guardian and received training at the BBC. As an audience, trends, and SEO specialist, she has participated in panel events alongside Google. Her…
