San Francisco-based AI startup Deep Cogito has released a new suite of open large language models (LLMs) that it says outperform the leading open-source models of comparable size, a release the company frames as an early step toward its goal of general superintelligence.
The company, whose mission is to build “general superintelligence,” has launched preview models in five parameter sizes: 3B, 8B, 14B, 32B, and 70B. According to Deep Cogito, “each model surpasses the top open models of similar size—including LLaMA, DeepSeek, and Qwen—on most standard benchmarks.”
🧠 IDA: A New Approach to Scalable Intelligence
At the core of these advancements is Deep Cogito’s novel training framework: Iterated Distillation and Amplification (IDA).
Described as a scalable alignment strategy for superintelligence, IDA centers on iterative self-improvement. The process involves two main stages:
- Amplification – The model uses increased computational power to explore improved reasoning and solutions.
- Distillation – These enhanced capabilities are then internalized and compressed back into the model.
This cycle creates what Deep Cogito calls a “positive feedback loop”, enabling models to scale intelligence with compute rather than being limited by the oversight of larger models or human feedback.
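To make the two stages concrete, here is a minimal Python sketch of one IDA cycle as described above. The `amplify` and `distill` functions are hypothetical placeholders standing in for real training infrastructure; this is an illustration of the loop's shape, not Deep Cogito's actual code.

```python
# Hypothetical sketch of the Iterated Distillation and Amplification (IDA) loop.
# All function names and bodies are illustrative placeholders.

def amplify(model, task, extra_compute_budget):
    """Placeholder: spend extra inference-time compute (longer reasoning chains,
    search, or multiple samples) to reach a better solution than the model's
    default answer."""
    return f"improved solution for {task!r} using {extra_compute_budget}x compute"

def distill(model, tasks, solutions):
    """Placeholder: fine-tune the model on the amplified solutions so the
    improved behavior is internalized into its weights."""
    return model  # a real implementation would return updated weights

def ida_training_loop(model, tasks, iterations=3):
    for _ in range(iterations):
        # Amplification: explore better reasoning with more compute.
        solutions = [amplify(model, t, extra_compute_budget=10) for t in tasks]
        # Distillation: compress those gains back into the model itself.
        model = distill(model, tasks, solutions)
        # The updated model seeds the next cycle, forming the feedback loop.
    return model
```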
The company believes IDA overcomes key bottlenecks found in existing training methods like Reinforcement Learning from Human Feedback (RLHF) or traditional model distillation.
“When we study superintelligent systems like AlphaGo, we see two key factors: advanced reasoning and iterative self-improvement,” the company wrote in its research notes. “IDA integrates both into LLM training.”
⚙️ Outperforming LLaMA and DeepSeek
The standout release is the Cogito 70B model, which outperforms Meta's LLaMA 4 109B MoE model, a notable result given the Meta model's far larger total parameter count.
In benchmark testing, Cogito 70B achieved 91.73% on MMLU in standard mode—a 6.40% gain over LLaMA 3.3 70B—and 91.00% in thinking mode, beating DeepSeek R1 Distill 70B by 4.40%.
Performance comparisons across other benchmarks such as MMLU-Pro, GSM8K, ARC, and MATH also show significant gains across Cogito’s full model range (3B–70B), particularly in reasoning mode.
🔧 Optimized for Agents and Code
Built on Qwen and LLaMA checkpoints, the Cogito models are tailored for:
- Code generation
- Function calling
- Agentic behavior and planning tasks
Each model offers dual functionality: it can respond directly like a typical LLM, or invoke a self-reflective reasoning mode before answering, much like Claude 3.5. However, Deep Cogito notes that the team has intentionally deprioritized very long reasoning chains in favor of the faster, distilled outputs that users typically prefer.
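As a rough illustration of that dual-mode behavior, the sketch below assumes a chat model served via Hugging Face transformers and a hypothetical system-prompt toggle for the reasoning mode. The model identifier and the toggle text are assumptions for illustration; the released model cards describe the actual mechanism.

```python
# Minimal sketch of switching between direct answering and the self-reflective
# reasoning mode. Model ID and system-prompt toggle are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepcogito/cogito-70b-preview"  # hypothetical identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def ask(question, thinking=False):
    messages = []
    if thinking:
        # Assumed toggle: a system instruction that enables the reasoning mode.
        messages.append({"role": "system", "content": "Enable deep thinking."})
    messages.append({"role": "user", "content": question})
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512)
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

print(ask("What is 17 * 24?"))                 # direct answer
print(ask("What is 17 * 24?", thinking=True))  # self-reflect before answering
```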

🚀 Fast Iteration, Big Ambitions
Despite the strong performance, Deep Cogito emphasizes this is only an early preview. The current models were trained by a small team in just 75 days, highlighting IDA’s efficiency.
The company plans to:
- Release improved checkpoints for existing sizes.
- Launch larger Mixture-of-Experts (MoE) models in the 109B, 400B, and 671B ranges.
- Keep all models open-source moving forward.
Conclusion
Deep Cogito’s open LLMs mark a bold move toward more efficient, scalable AI development. By leveraging IDA instead of traditional alignment methods, the company may have set a new bar for performance and adaptability in open-source language models.