Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Nvidia Nemotron Team

Abstract

Nemotron 3 Nano 30B-A3B is a Mixture-of-Experts hybrid Mamba-Transformer language model pretrained on 25 trillion tokens and then post-trained with supervised fine-tuning and large-scale reinforcement learning. The model improves over the prior Nemotron 2 Nano generation while activating fewer parameters per forward pass, delivering strong throughput along with strong reasoning, chat, and agentic performance, and supporting context lengths of up to 1M tokens.