AI21 Labs has launched Jamba Reasoning 3B, a small open-source model aimed at enterprise use. By enabling extended reasoning and code generation directly on devices such as laptops and mobile phones, the model is intended to reduce data center traffic. Jamba Reasoning 3B handles a context window of more than 250,000 tokens and is reported to run inference efficiently on edge devices.
Co-CEO Ori Goshen said enterprise applications for small models like Jamba Reasoning 3B are growing, since running inference on the device itself relieves pressure on data centers contending with high operational costs. He suggested the industry's future may be a hybrid model in which some computational tasks are performed locally on devices while others are handled by GPUs in data centers.
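A minimal sketch of the hybrid pattern Goshen describes might look like the following. This is purely illustrative, not AI21's implementation: the run_local and run_remote helpers and the routing thresholds are hypothetical stand-ins for a device-side runtime and a hosted large-model API.

```python
# Illustrative sketch of a hybrid local/data-center routing heuristic.
# run_local, run_remote, and the thresholds are hypothetical stand-ins.

def run_local(prompt: str) -> str:
    """Hypothetical on-device small-model call."""
    return f"[local answer to: {prompt[:40]}...]"

def run_remote(prompt: str) -> str:
    """Hypothetical data-center large-model call."""
    return f"[remote answer to: {prompt[:40]}...]"

def route(prompt: str, expected_output_tokens: int) -> str:
    # Heuristic: short, simple requests stay on the device; long-context
    # or generation-heavy requests are offloaded to data-center GPUs.
    if len(prompt) < 2_000 and expected_output_tokens <= 256:
        return run_local(prompt)
    return run_remote(prompt)

print(route("Draft an agenda for tomorrow's 30-minute project sync.", 64))
```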
The architecture of Jamba Reasoning 3B combines elements of the Mamba architecture with Transformer layers, yielding inference speeds estimated at two to four times faster than comparable models. AI21 tested the model on a MacBook Pro, where it processed 35 tokens per second. The model is particularly suited to simpler tasks, such as scheduling and generating agendas, while more complex tasks can still be offloaded to GPU clusters.
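As a rough illustration of what on-device use could look like, the sketch below loads the model through the Hugging Face transformers library and generates a short completion locally. The repository ID and generation settings are assumptions based on AI21's naming; consult the official model card for the exact identifier and recommended configuration.

```python
# A minimal sketch of on-device inference via Hugging Face transformers.
# The repository ID below is an assumption; check AI21's model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ai21labs/AI21-Jamba-Reasoning-3B"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# device_map="auto" places the model on whatever hardware is available
# (e.g. Apple silicon or CPU); requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

prompt = "Draft a three-item agenda for a 30-minute project sync."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy-decode a short completion entirely on the local machine.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```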
In the broader enterprise landscape, there is a growing interest in specialized small models. Meta has introduced MobileLLM-R1, a series of reasoning models for specific applications. Google’s Gemma model was among the first to target portable devices. Other companies like FICO are also developing tailored models for their operational needs.
In benchmark testing, Jamba Reasoning 3B performed competitively against other small models across standard evaluations. Its localized processing is also viewed as a potential advantage for privacy-focused applications, since prompts and outputs need not be transmitted to external servers.
Source: https://venturebeat.com/ai/ai21s-jamba-reasoning-3b-redefines-what-small-means-in-llms-250k-context-on

