
CaLICO – a new way to process harmful content

Every day we see more hate speech, targeted threats, coordinated disinformation and other harmful content online. Building a model that can handle this content across multiple languages, whilst also accounting for cultural differences, was always going to be a challenge. At Textgain we love a challenge. That's why we are creating CaLICO, a cutting-edge large language model. We're building it from the ground up in the EU to detect and understand harmful content responsibly in all official EU languages.


Why we believe CaLICO is a big deal

Most language models are not designed to engage with harmful content at all. That means these tools can't help us understand and address problems like online death threats or disinformation campaigns. That's where CaLICO comes in. It's built from scratch to process harmful content responsibly and to help experts genuinely understand it. It works across the EU's 24 official languages plus a number of others, taking diverse cultures and perspectives into account. This makes it versatile and useful for identifying dangerous trends or working out how particular narratives affect people in different ways.

How we built it

Developing CaLICO involved several important steps to make sure it is both effective and responsible. First, we organized the available information into formats the AI can actually work with. Next, we carefully cleaned the data to keep its quality high and remove errors. We also paid particular attention to underrepresented European languages so that smaller language communities receive the same attention as larger ones. Finally, we meticulously anonymized all the information by removing personally identifiable details.

We also used highly specialized sources that most language models are not trained on, which gives CaLICO knowledge that general-purpose models lack. We think the effort was worth it: this attention to detail gives CaLICO the edge on tough, detailed tasks like identifying dangerous patterns in language.

The superpowered tech behind CaLICO

CaLICO exists thanks to European Union support. By winning the Large AI Grand Challenge, Textgain received access to 2 million GPU hours on LUMI, one of the most powerful supercomputers in Europe. LUMI runs on AMD GPUs and the open-source ROCm platform, which gives Textgain's developers more flexibility than proprietary systems would.

Textgain has also partnered with experts through the Epicure HPC project to optimize CaLICO so that it runs faster and more efficiently. Right now, CaLICO can process text during training at over 700,000 tokens per second. That's roughly like having 70,000 people reading at the same time.

Why CaLICO stands out

Plenty of AI models exist, but few focus on ethical AI the way CaLICO does. The model is built according to strict principles of fairness, transparency and privacy, following guidelines from the European Union. It's designed not just to churn out answers but to work alongside specialists, policymakers and other experts to uncover the "why" behind trends in harmful content.

CaLICO isn't about replacing humans. Instead, it strengthens existing intelligence gathering so that people can make smarter decisions. Whether you're investigating malicious narratives or tracing how politicised propaganda spreads, CaLICO helps without being biased or invasive.

What’s next for CaLICO?

The first CaLICO models are already up and running, but there's more to come. The team here at Textgain is fine-tuning CaLICO's features to make it easier to use. For example, we're adding chat functionality so people can interact with the AI more naturally. We are also developing smaller versions of the model that can run privately on local devices, keeping your data secure.

CaLICO will also improve other Textgain products, helping customers better handle the challenges of harmful content online. With the EU placing increasing emphasis on cybersecurity independence, we believe CaLICO has an important role to play.

To find out more about CaLICO, how it can help you, or the work we do here at Textgain, why not get in touch?