
2 posts tagged with "GPU"

GPU hardware and cloud compute


GPU Rowhammer Is Real: A Single Bit Flip Drops AI Model Accuracy from 80% to 0.1%

· 13 min read
Dhayabaran V
Barrack AI

A single bit flip in GPU memory dropped an AI model's accuracy from 80% to 0.1%.

That is not a theoretical risk. It is a documented, reproducible attack called GPUHammer, demonstrated on an NVIDIA RTX A6000 by University of Toronto researchers and presented at USENIX Security 2025. The attack requires only user-level CUDA privileges and works in multi-tenant cloud GPU environments where attacker and victim share the same physical GPU.
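One bit can do this much damage because of how floating-point weights are encoded. The sketch below is illustrative and rests on two assumptions. First, the weight is stored as IEEE 754 FP16 (bit 15 is the sign, bits 14 to 10 are the exponent, bits 9 to 0 are the mantissa). Second, the flip lands on the top exponent bit. The helper `flip_bit_fp16` is hypothetical, written for this example using only Python's standard `struct` module.

```python
import struct

def flip_bit_fp16(value: float, bit: int) -> float:
    """Flip one bit (0 = lowest mantissa bit, 15 = sign) of a value's FP16 encoding."""
    # Encode the value as a 16-bit IEEE 754 half and reinterpret it as an unsigned int.
    (raw,) = struct.unpack("<H", struct.pack("<e", value))
    # XOR toggles exactly one bit, which is what a single Rowhammer-induced flip does.
    flipped = raw ^ (1 << bit)
    # Reinterpret the modified bits as an FP16 value again.
    (result,) = struct.unpack("<e", struct.pack("<H", flipped))
    return result

# Hypothetical weights feeding one neuron with all-ones input.
weights = [0.5, 0.25, -0.125]
print(sum(weights))                       # 0.625
print(flip_bit_fp16(0.5, 0))              # 0.50048828125 (low mantissa bit: negligible)
corrupted = [flip_bit_fp16(weights[0], 14)] + weights[1:]
print(corrupted[0])                       # 32768.0 (top exponent bit: 0.5 becomes 2**15)
print(sum(corrupted))                     # 32768.125
```

A flip in the low mantissa bits barely changes a weight. A flip in the top exponent bit of a small weight multiplies it by a power of two (here 2**16), and that single outlier dominates every activation computed from it.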

GPUHammer is not the only GPU hardware vulnerability. LeftoverLocals (CVE-2023-4969) proved that AMD, Apple, and Qualcomm GPUs leak GPU local memory between processes, enough to fully reconstruct another process's LLM responses. NVBleed demonstrated cross-VM data leakage through NVIDIA's NVLink interconnect on Google Cloud Platform. And at RSA Conference 2026, analysts highlighted that traditional security tools monitor only CPU and OS activity, leaving GPU operations completely invisible.

If you are training or running inference on cloud GPUs, this matters. Here is the full technical breakdown.

NVIDIA Spent $20 Billion Because GPUs Alone Can't Win the Inference Era

· 19 min read
Dhayabaran V
Barrack AI

On March 16, 2026, Jensen Huang took the stage at GTC in San Jose and unveiled the NVIDIA Groq 3 LPU: a chip that is not a GPU, does not run CUDA natively, and exists for one reason only. Inference.

Less than three months earlier, on Christmas Eve 2025, NVIDIA paid $20 billion in cash to license Groq's entire patent portfolio, hire roughly 90% of its employees, and acquire all of its assets. It was the largest deal in NVIDIA's history. The company that built the GPU monopoly spent $20 billion on a chip that replaces GPUs for the most latency-sensitive phase of AI inference.

This is not a product announcement recap. Every major outlet has covered the Groq 3 specs. What nobody has published is the synthesis: why the GPU company needed a non-GPU chip, what the data says about GPU architectural limitations during inference decode, and what this means for the thousands of ML teams currently renting GPUs for inference workloads.
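One common back-of-envelope way to see the decode limitation is shown below. It is not necessarily the analysis used in the full post. At batch size 1, generating each token requires reading every model weight from memory once, so memory bandwidth rather than compute caps throughput. The function name and every number here are hypothetical inputs chosen for round arithmetic, not vendor specifications.

```python
def decode_tokens_per_second_upper_bound(
    params_billion: float, bytes_per_param: float, bandwidth_tb_s: float
) -> float:
    """Batch-1 decode ceiling, assuming each token streams all weights from memory once.

    Ignores KV-cache reads, kernel overheads, and compute time, so real throughput is lower.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    bandwidth_bytes_s = bandwidth_tb_s * 1e12
    return bandwidth_bytes_s / weight_bytes

# Hypothetical: an 8B-parameter model in FP16 (2 bytes/param) on a 2.0 TB/s memory system.
print(decode_tokens_per_second_upper_bound(8, 2, 2.0))  # 125.0
```

Under these assumptions, adding compute does not raise the ceiling. Only more memory bandwidth per byte of weights does, which is the lever an inference-specific chip can pull.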

Every claim in this post is sourced. NVIDIA's own projections are labeled as such. Independent benchmarks are cited separately.