Nvidia has announced Nemotron-Nano-9B-V2, a new small language model (SLM) that follows the recent trend toward compact AI models capable of running on devices such as smartwatches and smartphones. The model was pruned from a 12-billion-parameter base down to 9 billion parameters so that it can run on a single Nvidia A10 GPU, a reduction intended to make deployment more efficient, according to Oleksii Kuchiaev, Nvidia's director of AI model post-training.
The model supports several languages, including English, German, and Japanese, and is built for tasks such as instruction following and code generation. A distinguishing feature is the ability to toggle reasoning on and off: when enabled, the model produces a reasoning trace before giving its final answer. Combined with a "thinking budget" that caps the number of internal reasoning tokens, this lets developers balance accuracy against response time in applications such as customer support.
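As a rough illustration of how the reasoning toggle might be used in practice, the sketch below loads the model through Hugging Face transformers and switches reasoning with a control flag in the system prompt. The repo id (nvidia/NVIDIA-Nemotron-Nano-9B-v2) and the "/think" / "/no_think" flags are assumptions drawn from Nvidia's model-card conventions rather than details stated in the article, so treat them as illustrative.

```python
# Minimal sketch: toggling the reasoning trace of Nemotron-Nano-9B-V2.
# Assumptions (not confirmed by the article): the Hugging Face repo id and the
# "/think" / "/no_think" system-prompt flags used to switch reasoning on or off.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "nvidia/NVIDIA-Nemotron-Nano-9B-v2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

def generate(question: str, reasoning: bool, max_new_tokens: int = 512) -> str:
    # Reasoning is switched on or off via a control flag in the system turn.
    messages = [
        {"role": "system", "content": "/think" if reasoning else "/no_think"},
        {"role": "user", "content": question},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    # max_new_tokens serves here as a crude stand-in for the "thinking budget":
    # it bounds the total tokens (reasoning trace plus answer) the model emits.
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(generate("What is 17 * 24?", reasoning=True))   # includes a reasoning trace
print(generate("What is 17 * 24?", reasoning=False))  # direct answer only
```

In a real deployment, a customer-support service could route simple queries through the no-reasoning path for faster responses and reserve the reasoning path, with a tighter token cap, for harder questions.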
Benchmark evaluations indicate that Nemotron-Nano-9B-V2 performs competitively with other small-scale models. It achieved scores of 72.1 percent on AIME25, 97.8 percent on MATH500, and 90.3 percent on IFEval, among others. These results highlight its potential in a landscape where many leading language models have over 70 billion parameters.
The model was trained on a blend of web-sourced, curated, and synthetic datasets spanning domains such as law, finance, and science. Nvidia has released it under a permissive license that allows developers to use it commercially without additional fees. The license does, however, impose certain conditions, including guidelines for responsible deployment and requirements around safety mechanisms.
With Nemotron-Nano-9B-V2, Nvidia aims to cater to developers looking for efficient AI solutions that balance reasoning capability with manageable deployment requirements.
Source: https://venturebeat.com/ai/nvidia-releases-a-new-small-open-model-nemotron-nano-9b-v2-with-toggle-on-off-reasoning/

