Clibrain’s Lince: The LLM That Understands Spanish Like a Native Speaker

Clibrain, a Madrid-based AI startup, has joined the race to create generative AI models optimized for Spanish speakers. The company has released Lince Zero, a Spanish instruction-tuned LLM trained on a dedicated corpus of Spanish-language data. Lince Zero is a 7-billion-parameter taster of a more powerful foundational model (40 billion parameters) that the company has in the pipeline, which will simply be called Lince.

According to Clibrain, Spanish is one of the most spoken languages globally, boasting considerable variety in terms of dialects and variants. The company argues that this linguistic diversity makes it challenging for mainstream models to perform adequately in Spanish. Clibrain aims to address this gap by developing models that can parse and understand more Spanish linguistic nuance than the average LLM.

Clibrain’s LLM, Lince, is based on existing open-source technologies. However, the company says it is not just reusing existing architectures, and it touts its own senior engineering talent in AI. The startup was founded only in April 2023 and has a multidisciplinary team of close to 30 staff, with an R&D lab focused on generative AI at its core.

Clibrain’s co-founder and CEO, Elena Gonzalez-Blanco, brings an educational background in linguistics research and poetry to the startup, combined with a career focus on AI. She points back to her years doing linguistics research as powering a particularly crucial contribution to the project, enabling Clibrain to source unique training data to feed its model-making ambitions.

“We have a unique corpus [of training data],” she says. “I am a linguist; I have, let’s say, 15 years of research in terms of the history of language, Spanish language… a lot of contacts that have not been used for training yet. So we have a unique corpus [as a differentiator].”

Clibrain’s debut model, Lince Zero, is being released under an open-source license. Because this LLM is primarily based on existing open-source technologies, the company cannot yet boast a foundational model of its own. However, it says that is coming soon.

The release of Lince Zero is the first step on Clibrain’s ambitious roadmap. As the parameter counts indicate, these LLMs are far from contending to be the largest models on the block. But, as Gonzalez-Blanco argues, Clibrain’s conviction is that model size, per se, won’t be the killer feature when it comes to gaining a performance advantage in understanding Spanish. Rather, quality attention to linguistic detail will count, and the company hopes this will give it an edge in Spanish markets.

Clibrain’s Lince is far from the first conversational AI model to focus on Spanish. The Barcelona Supercomputing Center’s MarIA project, which launched back in 2021, claimed to be the first “massive” AI system in the Spanish language. Still, Clibrain argues it has surpassed MarIA and pulled together the most technologically “advanced” model focused on the Spanish-speaking market to date.

Many LLMs optimized for languages other than English are out there now, such as Baidu’s Chinese-language model, Ernie, or an LLM family being tuned for German. South Korean tech giant Naver is also working on generative AI models trained in Korean.

However, Clibrain contends that its full focus on the Spanish language will enable its forthcoming foundational model, plus a series of domain-trained models it plans to develop atop the big one, to parse and understand more Spanish linguistic nuance than the average LLM.

Clibrain says Lince Zero’s performance is equivalent to GPT-3, whereas MarIA’s is equivalent to GPT-2. Although benchmarking the linguistic performance of LLMs is a cutting-edge business in and of itself, Clibrain is encouraging Spanish speakers to check out what it has built and start providing feedback.

Clibrain’s co-founders have been bootstrapping development so far, using funds gleaned from previous startup exits. The company doesn’t yet have a hefty investor roster or deep funding. Gonzalez-Blanco says they wanted to focus on developing core models and getting their first products to market rather than on external fundraising. Still, the company may look to raise a more significant investment than the founders could plow in themselves as they continue to progress with the Lince product roadmap.

First reported on TechCrunch

Frequently Asked Questions

Q: What is Clibrain, and what is its goal?

A: Clibrain is a Madrid-based AI startup focused on creating generative AI models optimized for Spanish speakers. The company aims to develop models that can parse and understand Spanish linguistic nuance better than existing language models.

Q: What is Lince Zero?

A: Lince Zero is Clibrain’s debut model release. It is a Spanish instruction-tuned large language model (LLM) trained on a dedicated corpus of Spanish-language data. Lince Zero is a 7-billion-parameter model that previews Clibrain’s more powerful foundational model, which has 40 billion parameters and is currently in development.

Q: What makes Clibrain’s approach unique?

A: Clibrain differentiates itself by leveraging its unique corpus of training data sourced through the linguistics research background of its co-founder and CEO, Elena Gonzalez-Blanco. The company combines existing open-source technologies with its own senior engineering talent in AI to develop its models.

Q: How does Clibrain’s LLM compare to other conversational AI models in Spanish?

A: Clibrain contends that its focus on the Spanish language enables its models to outperform existing models, including the Barcelona Supercomputing Center’s MarIA project. Clibrain claims to have the most technologically advanced model for the Spanish-speaking market.

Q: What are Clibrain’s plans for the future?

A: The release of Lince Zero is the first step in Clibrain’s roadmap. The company plans to develop its foundational model, Lince, and a series of domain-trained models built on top of it. It aims to provide an enhanced understanding of Spanish through quality attention to linguistic detail.

Q: How does Lince Zero’s performance compare to other models?

A: Clibrain states that Lince Zero’s performance is equivalent to OpenAI’s GPT-3 model while suggesting that MarIA’s performance is equivalent to GPT-2. However, benchmarking linguistic performance of language models is an ongoing process.

Featured Image Credit: Jon Tyson; Unsplash; Thank you!

Brad Anderson
Former editor

Brad is the former editor who oversaw contributed content at ReadWrite.com. He previously worked as an editor at PayPal and Crunchbase.
