DiffusionGemma: 700 Tokens/Second Local LLM on RTX 5090
Your AI bill's about to drop a line item.
Google DeepMind released DiffusionGemma today: a 26B open-weight model that pushes 700 tokens/second on an RTX 5090 and breaks 1,000 tokens/second on a single H100. No per-token pricing. No round-trip to some remote server.
Just your hardware, running flat