NVIDIA's Nemotron Diffusion Cuts AI Inference Bill by 84%
TL;DR
- Nemotron-Labs Diffusion 8B hits 865 tokens/second on B200 hardware, 6.4× faster than Qwen3-8B autoregressive generation
- Self-speculation mode (diffusion drafts + AR verify) scores 1.2% higher average accuracy than Qwen3-8B across benchmarks, so the gain is not just speed
- The same checkpoint runs in three modes (AR, diffusion,