
GDDRHammer and GeForge: GPU Rowhammer Now Achieves Full System Compromise

· 15 min read
Dhayabaran V
Barrack AI

Last updated: April 2026. GPU security is an evolving field. Verify current mitigation guidance with your infrastructure provider.

Rowhammer on GPUs just got far more dangerous. This time it is not about corrupting model weights or degrading inference accuracy. Two independent research teams disclosed attacks on April 2, 2026 that escalate GDDR6 memory bit flips into a root shell on the host machine. From an unprivileged CUDA kernel. No authentication required.

The original GPUHammer research demonstrated 8 bit flips on an RTX A6000 and showed that a single strategic flip could drop ImageNet accuracy from 80% to 0.1%. That was a data integrity problem. What GDDRHammer and GeForge demonstrate is a full privilege escalation chain: GPU memory corruption → GPU page table hijacking → CPU memory read/write → root shell.

Both papers will be presented at the 47th IEEE Symposium on Security and Privacy (IEEE S&P 2026), running May 18 through 20 in San Francisco. A third concurrent attack called GPUBreach, from the University of Toronto team behind the original GPUHammer, goes even further by bypassing IOMMU protections entirely. All three are disclosed at gddr.fail and gpubreach.ca.

The RTX A6000 is one of the two confirmed vulnerable GPUs, and it is widely deployed across GPU cloud platforms. This post covers what the attacks actually do, which hardware is affected, what the mitigations cost, and what it means for anyone running GDDR6 GPUs in a shared environment.

How GDDRHammer works

GDDRHammer was developed by researchers at UNC Chapel Hill, Georgia Tech, and Mohamed bin Zayed University of Artificial Intelligence. The paper, code (github.com/heelsec/GDDRHammer), and supplementary materials are all available at gddr.fail.

The attack exploits a flaw in how NVIDIA's default memory allocator (cudaMalloc) places GPU page tables. Under normal operation, page table entries should be isolated from user-controlled data. They are not. The allocator co-locates page tables and user data in the same GDDR6 memory region. That means an attacker who can induce bit flips in adjacent rows can corrupt page table entries.

The team characterized Rowhammer behavior across 25 GDDR6 GPUs. They developed double-sided hammering patterns that exploit GPU parallelism, specifically the SIMT architecture and multi-warp execution model, to generate far more intense memory access patterns than a CPU can produce. The result was roughly 64x more bit flips than the original GPUHammer work.

The actual attack chain has four parts. The attacker uses a memory massaging technique to steer GPU page table entries toward DRAM rows with known-vulnerable bits. Then they hammer adjacent rows to flip bits in those page table entries. A single flip in the right position redirects a GPU virtual address mapping to point at CPU physical memory via the PCIe BAR1 aperture. From there, the GPU performs DMA reads and writes to arbitrary CPU memory. The attacker modifies kernel data structures and gets a root shell.

On the RTX A6000, the team achieved an average of 129 bit flips per memory bank. Compare that to GPUHammer's 8 bit flips across 4 banks.

How GeForge differs

GeForge was built by a separate team at Purdue, University of Rochester, University of Western Australia, HydroX AI, and Clemson. Code is at github.com/stefan1wan/GeForge, and a video demo of the root shell exploit is at gddr.fail/files/geforge-demo.mp4.

The main architectural difference is where in the GPU's address translation hierarchy the attack lands. GDDRHammer corrupts the last-level page table (PT). GeForge goes one level deeper and targets the last-level page directory (PD0). The page directory contains pointers to page tables, so corrupting a PD0 entry lets the attacker forge entirely new page table mappings instead of just modifying existing ones. Broader control.
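GPU address translation walks a multi-level radix tree, so the level an attacker corrupts determines how much they control. The toy two-level walk below illustrates why (the dict-based layout, bit splits, and addresses here are hypothetical teaching values, not NVIDIA's actual PTE format):

```python
PAGE_SIZE = 0x1000  # 4 KiB pages

def translate(pd0: dict, vaddr: int) -> int:
    """Toy two-level walk: PD0 entry -> page table -> physical address."""
    page_table = pd0[vaddr >> 22]                  # PD0 entry selects an entire page table
    phys_page = page_table[(vaddr >> 12) & 0x3FF]  # PT entry selects one 4 KiB page
    return phys_page + (vaddr & (PAGE_SIZE - 1))   # keep the page offset

pt = {0: 0x8000}   # page table: virtual page 0 -> physical 0x8000
pd0 = {0: pt}      # page directory: top bits 0 -> that page table

print(hex(translate(pd0, 0x042)))  # 0x8042
```

Flipping a bit in a PT entry redirects one page; flipping a bit in a PD0 entry swaps in an attacker-forged table covering up to 1,024 pages at once, which is the broader control GeForge gains.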

GeForge introduced three techniques that set it apart. A memory massaging strategy tuned specifically for page directory placement. A non-uniform Rowhammer pattern that varies hammering intensity across rows rather than applying uniform pressure, which produced more bit flips. And a page-anchoring technique that uses timing side-channels to locate GPU physical addresses at runtime, since the GPU physical address layout is not exposed to userspace.

Results: 1,171 bit flips on an RTX 3060. 202 bit flips on an RTX A6000. Both exploits achieve the same end state as GDDRHammer. When IOMMU is disabled (the default on most systems), the attacker gets arbitrary read/write to CPU memory and a root shell from an unprivileged user account.

GPUBreach bypasses IOMMU

This is the one that should concern cloud operators most. GPUBreach, from the University of Toronto Computer Security Lab (the same group behind GPUHammer), will also be presented at IEEE S&P 2026. It is disclosed at gpubreach.ca.

GDDRHammer and GeForge can be blocked by enabling IOMMU, which restricts GPU DMA access to only host memory regions mapped by the OS. GPUBreach sidesteps that entirely. It starts the same way, with Rowhammer bit flips corrupting GPU page tables from an unprivileged CUDA kernel. But instead of trying to DMA into CPU memory (which IOMMU blocks), GPUBreach chains the GPU-side memory corruption with newly discovered memory-safety bugs in the NVIDIA GPU driver. The driver runs as a CPU-side kernel component. Exploiting it bypasses IOMMU because the escalation path goes through software, not hardware DMA.

That means IOMMU alone is not enough. Full technical details of the driver vulnerabilities exploited by GPUBreach are pending the IEEE S&P presentation in May.

Which GPUs are affected

The researchers tested specific models across multiple memory technologies. The picture is clear for now.

GPU | Memory | Architecture | Bit Flips | Status
RTX 3060 | GDDR6 | Ampere | 1,171 | Exploit demonstrated
RTX A6000 | GDDR6 | Ampere | 202 | Exploit demonstrated
RTX 3080 | GDDR6X | Ampere | 0 | Not vulnerable
RTX 4060 / 4060 Ti | GDDR6 | Ada Lovelace | 0 | Not vulnerable
RTX 6000 Ada | GDDR6 | Ada Lovelace | 0 | Not vulnerable
RTX 5050 | GDDR7 | Blackwell | 0 | Not vulnerable
A100 | HBM2e | Ampere | 0 | On-die ECC
H100 | HBM3 | Hopper | Not tested | On-die ECC
H200 | HBM3e | Hopper | Not tested | On-die ECC

Source: gddr.fail, GDDRHammer and GeForge papers (IEEE S&P 2026). "Not vulnerable" = no bit flips observed in testing. "On-die ECC" = always-on hardware error correction, assessed as resistant to current single-bit techniques.

Confirmed vulnerable with exploits demonstrated:

NVIDIA GeForce RTX 3060 (Ampere, GA106, 12 GB GDDR6). Showed 1,171 bit flips in GeForge testing.

NVIDIA RTX A6000 (Ampere, GA102, 48 GB GDDR6). Showed 202 bit flips in GeForge testing and averaged 129 bit flips per bank in GDDRHammer. The GDDRHammer paper states that nearly all tested RTX A6000 cards remained vulnerable under realistic settings.

Tested with no bit flips observed:

GeForce RTX 3080 (Ampere, GDDR6X). GDDR6X appears to have stronger in-DRAM mitigations.

GeForce RTX 4060 and RTX 4060 Ti (Ada Lovelace, GDDR6). Two samples of the Ti were tested. No bit flips on either. Ada-generation memory controllers or newer GDDR6 chip revisions may include improved defenses.

RTX 6000 Ada (Ada Lovelace, GDDR6, 48 GB). Tested by the GDDRHammer team. No bit flips induced. Some press outlets incorrectly reported this GPU as vulnerable, likely confusing it with the Ampere-generation RTX A6000. They are different products.

GeForce RTX 5050 (Blackwell, GDDR7). No bit flips. GDDR7 implements always-on, non-configurable on-die ECC.

Not tested against these attacks but assessed:

A100 (HBM2e). Tested in the original GPUHammer research. No bit flips observed. On-die ECC is standard.

H100 (HBM3) and H200 (HBM3e). On-die ECC enabled by default. The gddr.fail FAQ states that these GPUs "likely mask single-bit flips." The researchers add a caveat: future Rowhammer patterns causing multi-bit flips may bypass ECC, citing prior work like ECCploit and ECC.fail.

The GDDRHammer team tested 25 GDDR6 GPUs in total. Tom's Hardware reports the paper found vulnerabilities in most tested GDDR6 GPUs, suggesting bit flips were observed on additional models beyond the RTX 3060 and A6000 even if full exploits were not demonstrated on all of them.

NVIDIA's response

As of April 5, 2026, NVIDIA has not issued a new security bulletin for GDDRHammer or GeForge. They point to the existing "Security Notice: Rowhammer, July 2025" (nvidia.custhelp.com/app/answers/detail/a_id/5671/), which was originally published July 9, 2025 in response to GPUHammer. NVIDIA characterizes Rowhammer as an industry-wide DRAM issue and says the notice reinforces already known mitigations. No CVE has been assigned as of April 5, 2026. No new driver patches or firmware updates target these attacks.

NVIDIA recommends two mitigations.

First, enabling ECC via nvidia-smi -e 1 followed by a reboot. This activates SECDED error correction that detects and corrects single-bit errors. The trade-offs: approximately 6.25% reduction in usable VRAM (consumed by parity bits) and a performance overhead that varies by workload, typically 5 to 15% for ML inference. ECC is available on professional and datacenter GPUs like the RTX A6000, A5000, and A4000 but generally not on consumer GeForce cards. On Ampere professional GPUs like the A6000, ECC is not enabled by default. Administrators must explicitly enable it. Hopper and Blackwell datacenter GPUs have ECC on by default.
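The VRAM cost is easy to quantify. A back-of-the-envelope sketch using the ~6.25% parity figure quoted above (an approximation; the exact reservation can vary by GPU and driver):

```python
def usable_vram_gb(total_gb: float, ecc_overhead: float = 0.0625) -> float:
    """VRAM left for workloads after the ~1/16 ECC parity reservation on GDDR6."""
    return total_gb * (1.0 - ecc_overhead)

# RTX A6000: 48 GB nominal -> 45 GB usable with ECC on.
# Enable with: nvidia-smi -e 1, then reboot.
print(usable_vram_gb(48))  # 45.0
```

On a 48 GB card that is 3 GB of capacity traded for single-bit error correction, before any runtime performance overhead.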

Second, enabling IOMMU in the system BIOS. This restricts GPU DMA access to only host memory regions explicitly mapped by the OS. IOMMU is disabled by default on most systems. Performance impact in passthrough mode (iommu=pt) is minimal for GPU workloads. Strict DMA translation mode can add 0 to 25% overhead depending on workload, though GPU workloads with large bulk transfers are less affected than networking workloads.
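One way to sanity-check a Linux host is to inspect the kernel command line (/proc/cmdline). The classifier below is a simplified illustration: it only looks at explicit x86 flags and ignores firmware settings and kernel-default enablement, so treat "off-or-default" as "verify further," not a verdict:

```python
def iommu_mode(cmdline: str) -> str:
    """Classify IOMMU setup from kernel command-line contents (simplified)."""
    opts = set(cmdline.split())
    if "iommu=pt" in opts:
        return "passthrough"   # minimal overhead for host devices
    if {"intel_iommu=on", "amd_iommu=on"} & opts:
        return "strict"        # full DMA translation, 0-25% overhead possible
    return "off-or-default"    # nothing explicit on the command line

print(iommu_mode("BOOT_IMAGE=/vmlinuz-6.8 root=/dev/nvme0n1p2 quiet"))  # off-or-default
print(iommu_mode("BOOT_IMAGE=/vmlinuz-6.8 intel_iommu=on iommu=pt"))    # passthrough
```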

NVIDIA also notes that all GDDR7 and HBM GPUs feature on-die ECC that is always on and non-configurable, providing hardware-level Rowhammer protection.

What this means for GPU cloud environments

If you run GDDR6 GPUs in a multi-tenant setup, the threat model is simple. A tenant with standard CUDA execution access (which is exactly what cloud tenants get) could run a Rowhammer attack, corrupt GPU page tables, and escalate to host memory access. From there, data belonging to other tenants on the same host is reachable. NVIDIA's default GPU time-slicing provides sufficient time windows to execute the attack.

The isolation mechanisms that matter:

IOMMU blocks the DMA-based escalation path used by GDDRHammer and GeForge. Major cloud providers typically enable IOMMU on hypervisor hosts since it is essential for VM isolation via VT-d and AMD-Vi. Bare-metal GPU instances may not have it enabled. But IOMMU alone does not stop GPUBreach, which escalates through the GPU driver instead of DMA. GPU passthrough in VMs with VFIO and IOMMU typically achieves 95%+ of bare-metal performance.

MIG (Multi-Instance GPU) provides hardware-level partitioning with isolated DRAM banks, memory channels, L2 cache, and compute units per instance. The GPUHammer paper explicitly states that MIG and Confidential Computing prevent the multi-tenant data co-location required for these exploits. The problem: MIG is only available on datacenter GPUs (A100, A30, H100, H200, B200). The RTX A6000 does not support MIG.

SR-IOV creates hardware Virtual Functions with IOMMU protection per VM, blocking GPU-to-CPU escalation. It does not prevent intra-GPU Rowhammer between VFs sharing the same physical GDDR6 memory.

Time-slicing, the default GPU sharing mode many cloud providers use, provides no protection. Tenants share DRAM banks.

The RTX A6000 is in a difficult position. It is confirmed vulnerable. It does not support MIG. ECC is off by default. If you run shared A6000 instances, enabling both ECC and IOMMU is the minimum. And because GPUBreach can bypass IOMMU through driver bugs, those mitigations are necessary but not sufficient.
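The decision logic above condenses into a rough triage helper. This is an illustrative heuristic based solely on the findings summarized in this post, not a security guarantee; the function name and return strings are my own:

```python
def a6000_tenancy_risk(dedicated: bool, ecc: bool, iommu: bool) -> str:
    """Rough risk triage for GDDR6 hosts (e.g. RTX A6000), per this post's findings."""
    if dedicated:
        return "low: no cross-tenant co-location on the GPU"
    if not (ecc and iommu):
        return "high: shared GDDR6 without the minimum mitigations"
    # ECC + IOMMU block the GDDRHammer/GeForge DMA escalation path,
    # but GPUBreach escalates through GPU driver bugs instead.
    return "elevated: necessary mitigations in place, but not sufficient"

print(a6000_tenancy_risk(dedicated=False, ecc=True, iommu=True))
```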

How Barrack AI A6000 instances are configured

On Barrack AI, as of April 2026, every A6000 instance is provisioned as a dedicated GPU. No other tenant shares the physical GPU while a VM is active. The host infrastructure runs with IOMMU enabled, which blocks the DMA-based escalation path used by GDDRHammer and GeForge. ECC is enabled by default on all A6000 GPUs and should not be disabled.

These three configurations address the primary attack vectors disclosed in this research. Dedicated GPU allocation eliminates the cross-tenant co-location that the attacks require. IOMMU prevents corrupted GPU page tables from reaching host CPU memory via DMA. ECC corrects single-bit flips before they can corrupt page table entries. Single-tenant GPU allocation also means the co-location required for GPUBreach's driver-based escalation is not present.

For H100 instances, HBM3 memory with on-die ECC is always active and non-configurable. No bit flips have been demonstrated on HBM GPUs using current techniques.

The GPU attack surface keeps expanding

This is the fourth major GPU security disclosure in two years. LeftoverLocals (CVE-2023-4969, January 2024) demonstrated uninitialized local memory leakage across process boundaries on Apple, AMD, and Qualcomm GPUs, enough to reconstruct LLM responses. NVIDIA GPUs were not affected by that one. NVIDIAScape (CVE-2025-23266, CVSS 9.0) showed that a three-line Dockerfile exploiting the NVIDIA Container Toolkit could achieve complete host takeover, affecting 37% of cloud environments. GPUHammer (USENIX Security 2025) proved Rowhammer works on GPU GDDR6 memory.

Each disclosure raised the severity ceiling. Data leakage, then container escape, then model corruption, now full system compromise from unprivileged code. The trajectory from 8 bit flips in 2025 to 1,171 in 2026, from accuracy degradation to root shell, from IOMMU-blockable to IOMMU-bypassing, shows a research area that is still accelerating.

The IEEE S&P presentations in mid-May will bring full technical detail. If you are running GDDR6 GPUs in any shared capacity, the time to audit your IOMMU and ECC configuration is now, not after the conference.

FAQ

Which GPUs are confirmed vulnerable to GDDRHammer and GeForge?

The NVIDIA GeForce RTX 3060 (Ampere, GDDR6) and the NVIDIA RTX A6000 (Ampere, GDDR6) are the only two GPUs with publicly demonstrated exploits. The GDDRHammer team tested 25 GDDR6 GPUs and found bit flips in most of them, but full exploit chains are only demonstrated on these two models.

Are H100, H200, or A100 GPUs affected?

Not by current techniques. These GPUs use HBM memory with on-die ECC enabled by default. The gddr.fail FAQ states that on-die ECC "likely masks single-bit flips." The researchers caution that future multi-bit flip attacks could potentially bypass ECC, but no such attack has been demonstrated on HBM GPUs.

Are GDDR6X or GDDR7 GPUs vulnerable?

No bit flips were observed on any tested GDDR6X GPU (including the RTX 3080) or GDDR7 GPU (including the RTX 5050). GDDR6X appears to have stronger in-DRAM mitigations. GDDR7 implements always-on, non-configurable on-die ECC.

Is the RTX 6000 Ada the same as the RTX A6000?

No. The RTX A6000 is Ampere-generation (GA102) with GDDR6 and is confirmed vulnerable. The RTX 6000 Ada is the Ada Lovelace successor with GDDR6 and was tested by the GDDRHammer team with no bit flips observed. Some press coverage has confused the two.

Does enabling IOMMU fully protect against these attacks?

IOMMU blocks the DMA-based escalation used by GDDRHammer and GeForge. It does not protect against GPUBreach, which bypasses IOMMU by exploiting memory-safety bugs in the NVIDIA GPU driver to escalate through software instead of hardware DMA. IOMMU is necessary but not sufficient.

What is the performance cost of enabling ECC on an RTX A6000?

Approximately 6.25% reduction in usable VRAM (consumed by parity bits) and a performance overhead that varies by workload, typically 5 to 15% for ML inference. ECC is enabled via nvidia-smi -e 1 followed by a system reboot. It is not on by default on Ampere professional GPUs.

Has NVIDIA issued a security bulletin for these attacks?

No new bulletin as of April 5, 2026. NVIDIA directs users to the existing "Security Notice: Rowhammer, July 2025" and characterizes Rowhammer as an industry-wide DRAM issue. No CVE has been assigned as of April 5, 2026.

Can an unprivileged cloud tenant execute these attacks?

Yes. The attacks require only standard CUDA execution access, which is the access level GPU cloud tenants receive. NVIDIA's default GPU time-slicing provides sufficient time windows to perform the Rowhammer attack.

What should GPU cloud operators do right now?

Enable ECC on all GDDR6 professional GPUs (accepting the VRAM and performance trade-off). Verify IOMMU is active on all hosts. Avoid time-slicing for multi-tenant GDDR6 GPU sharing. For workloads requiring strong tenant isolation, use datacenter GPUs with MIG support (A100, H100, H200, B200). Monitor NVIDIA security notices and the IEEE S&P 2026 proceedings (May 18 to 20) for updated guidance.

When are the full papers being presented?

Both GDDRHammer and GeForge will be presented at the 47th IEEE Symposium on Security and Privacy (IEEE S&P 2026), May 18 through 20, 2026 in San Francisco. GPUBreach will also be presented at the same conference.

Where can I read the full research?

The papers, code repositories, and FAQ are at gddr.fail. GPUBreach details are at gpubreach.ca. The GDDRHammer code is at github.com/heelsec/GDDRHammer. The GeForge code is at github.com/stefan1wan/GeForge.

OpenAI Codex: How a Branch Name Stole GitHub Tokens

· 12 min read
Dhayabaran V
Barrack AI

BeyondTrust Phantom Labs disclosed a critical command injection vulnerability in OpenAI's Codex cloud environment on March 30, 2026. The vulnerability allowed attackers to steal GitHub OAuth tokens by injecting shell commands through a branch name parameter. A branch name. That is where the entire attack starts.

The flaw affected every Codex surface: the ChatGPT website, Codex CLI, Codex SDK, and the Codex IDE Extension. OpenAI classified it as Critical (Priority 1) and remediated all issues by February 5, 2026, following responsible disclosure that began December 16, 2025. No CVE has been assigned.

B300 Draws 1,400W Per GPU. Most Data Centers Aren't Ready.

· 11 min read
Dhayabaran V
Barrack AI

NVIDIA's B300 GPU draws up to 1,400W per chip. That is double the H100, which shipped barely two years ago.

A single GB300 NVL72 rack, fully loaded with 72 of these GPUs, pulls 132 to 140 kW under normal operation. To put that number in perspective, the global average rack density in data centers sits at roughly 8 kW, so a single GB300 rack draws about 17 times the power of a typical rack. And according to Uptime Institute's 2024 survey, only about 1% of data center operators currently run racks above 100 kW.
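The headline numbers check out with simple arithmetic (taking the midpoint of the 132 to 140 kW range):

```python
gpus_per_rack, watts_per_gpu = 72, 1400
rack_kw = 136       # midpoint of the 132-140 kW GB300 NVL72 range
avg_rack_kw = 8     # global average rack density

print(gpus_per_rack * watts_per_gpu / 1000)  # 100.8 kW from the GPUs alone
print(round(rack_kw / avg_rack_kw))          # 17x a typical rack
```

The gap between the 100.8 kW GPU draw and the 132 to 140 kW rack total is the CPUs, NVLink switches, networking, and power conversion losses.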

Rack power density comparison

That gap between what the B300 demands and what the world's data center infrastructure can actually deliver is the story nobody is telling properly. Behind every cloud GPU instance running Blackwell Ultra is a facility that had to solve problems in power delivery, liquid cooling, and grid access that most buildings on earth are not equipped to handle.

This post breaks down the real infrastructure cost of running B300s, the deployment problems operators have already encountered, and why the electricity grid itself is becoming the binding constraint on AI compute scaling.

GPU Rowhammer Is Real: A Single Bit Flip Drops AI Model Accuracy from 80% to 0.1%

· 13 min read
Dhayabaran V
Barrack AI

A single bit flip in GPU memory dropped an AI model's accuracy from 80% to 0.1%.

That is not a theoretical risk. It is a documented, reproducible attack called GPUHammer, demonstrated on an NVIDIA RTX A6000 by University of Toronto researchers and presented at USENIX Security 2025. The attack requires only user-level CUDA privileges and works in multi-tenant cloud GPU environments where attacker and victim share the same physical GPU.

GPUHammer is not the only GPU hardware vulnerability. LeftoverLocals (CVE-2023-4969) proved that AMD, Apple, and Qualcomm GPUs leak memory between processes, allowing full reconstruction of LLM responses. NVBleed demonstrated cross-VM data leakage through NVIDIA's NVLink interconnect on Google Cloud Platform. And at RSA Conference 2026, analysts highlighted that traditional security tools monitor only CPU and OS activity, leaving GPU operations completely invisible.

If you are training or running inference on cloud GPUs, this matters. Here is the full technical breakdown.

NVIDIA's CUDA Never Clears GPU Memory. Here's a Decade of Research Showing Why That Matters.

· 15 min read
Dhayabaran V
Barrack AI

NVIDIA's official CUDA documentation explicitly states that cudaMalloc() does not clear memory. That means every GPU memory allocation can return data left behind by a previous process. Academic researchers have been exploiting this behavior since 2014, recovering credit card numbers, rendered webpages, LLM responses, and model weights from GPU memory residues. NVIDIA's only documented fix, Confidential Computing on H100, is opt-in and requires specific hardware that most deployments don't use.

This post compiles every verified source on the topic: NVIDIA's documentation, peer-reviewed research from IEEE S&P, USENIX Security, ACM CCS, active CVEs, and NVIDIA's own security bulletins. No speculation. No assumptions. Just what NVIDIA documents, what researchers have proven, and what ML engineers should know.

NVIDIA's own documentation confirms memory is not cleared

The foundation of this entire issue is a single, unambiguous sentence repeated across every NVIDIA CUDA memory allocation API. The official CUDA Runtime API documentation for cudaMalloc() states:

"Allocates size bytes of linear memory on the device and returns in *devPtr a pointer to the allocated memory. The allocated memory is suitably aligned for any kind of variable. The memory is not cleared."

This identical language appears in the documentation for cuMemAlloc, cudaMallocManaged, and cuMemAllocManaged across both the CUDA Runtime API and CUDA Driver API. NVIDIA's own Compute Sanitizer tool includes an --tool initcheck mode specifically designed to detect "Uninitialized global memory read" errors after cudaMalloc, confirming that the returned memory contains whatever data previously occupied those addresses.

Beyond individual allocations, no official NVIDIA documentation guarantees that GPU memory is zeroed between processes or CUDA contexts in standard (non-Confidential Computing) operation. The CUDA C++ Best Practices Guide contains zero mention of security considerations for memory management, focusing exclusively on performance optimization. The cudaFree() documentation describes freeing memory but does not specify whether freed memory is zeroed before reallocation. NVIDIA Developer Forum posts corroborate this: one widely cited thread from 2018 notes "From my experience, the driver does not erase memory after it is freed. You can easily do it yourself from host code using cuda_memset()." NVIDIA staff did not correct this statement.

NVIDIA's own gpu-admin-tools repository on GitHub includes a --clear-memory flag described as "Clear the contents of the GPU memory." The existence of this tool as an explicit administrative action confirms that memory clearing is not a default operation.

For AMD's ROCm/HIP ecosystem, the situation is worse in terms of documentation: AMD's hipMalloc() documentation is entirely silent on whether allocated memory is zeroed or uninitialized. Since HIP is designed as a CUDA-compatible interface and cudaMalloc() explicitly does not clear memory, the documented behavior of the reference implementation points to the same outcome. AMD provides no explicit documentation either way.

This stands in stark contrast to CPU memory behavior. The Linux kernel guarantees zeroed pages to userspace processes via get_zeroed_page() and mmap(), with additional hardening options like CONFIG_INIT_ON_ALLOC_DEFAULT_ON (since kernel v5.3) and CONFIG_INIT_ON_FREE_DEFAULT_ON. No documented GPU equivalent exists.
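The CPU-side guarantee is easy to demonstrate: anonymous pages handed out by the kernel always arrive zero-filled, which is exactly the behavior cudaMalloc does not promise. A minimal check:

```python
import mmap

# Request a fresh anonymous page from the kernel (the mmap() path noted above).
buf = mmap.mmap(-1, 4096)
assert buf[:4096] == b"\x00" * 4096  # the kernel hands back zeroed memory
print("page is zero-filled")
buf.close()
```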

A decade of academic research proves data leaks from GPU memory

The academic literature on GPU memory security is extensive, consistent, and spans over a decade. Multiple peer-reviewed papers across top-tier venues have demonstrated that GPU memory persistence is exploitable.

Lee et al. (IEEE S&P 2014) published "Stealing Webpages Rendered on Your Browser by Exploiting GPU Vulnerabilities," the first in-depth security analysis of GPU memory. They discovered that both NVIDIA and AMD GPUs do not initialize newly allocated GPU memory pages. The researchers recovered rendered webpage textures from GPU memory residues and identified original webpages with up to 95.4% accuracy using pixel sequence matching.

Maurice et al. (Financial Cryptography 2014) published "Confidentiality Issues on a GPU in a Virtualized Environment" and demonstrated cross-VM GPU data recovery. Their finding: GPU global memory is zeroed only in some configurations, and when it does happen, it occurs as a side effect of Error Correction Codes (ECC), not for security reasons. They explicitly warned that memory cleaning is not implemented by the GPU card itself.

Zhou et al. (PoPETs 2017) published "Vulnerable GPU Memory Management: Towards Recovering Raw Data from GPU" and proposed an algorithm for recovering raw images directly from GPU memory residues. The researchers recovered credit card numbers, email contents, usernames, and credentials from GPU memory left by Google Chrome, Adobe PDF Reader, GIMP, and Matlab. Their conclusion: nearly all GPU-accelerated applications are vulnerable to such attacks, and adversaries can launch attacks without requiring any special privileges.

Naghibijouybari et al. (ACM CCS 2018) published "Rendered Insecure: GPU Side Channel Attacks are Practical" and demonstrated the first general side-channel attacks on GPUs, including website fingerprinting with approximately 90% accuracy and the ability to derive internal parameters of neural network models used by other CUDA applications. This research led to CVE-2018-6260.

Pustelnik et al. (IEEE EuroS&P 2024) published "Whispering Pixels: Exploiting Uninitialized Register Accesses in Modern GPUs" and uncovered a vulnerability class where GPU implementations lack proper register initialization before shader execution. On NVIDIA GPUs, reading from uninitialized registers reveals data previously written to GPU memory. This affects products from Apple, NVIDIA, and Qualcomm, and the researchers demonstrated leaking CNN intermediate data and LLM output reconstruction. AMD assigned CVE-2024-21969 for this issue.

Guo et al. (USENIX Security 2024) published "GPU Memory Exploitation for Fun and Profit" and demonstrated practical code injection and code reuse attacks on modern NVIDIA GPUs (Volta and newer), including tampering with DNN model parameters persisting in GPU memory to compromise inference for future requests.

LeftoverLocals demonstrated real-time LLM eavesdropping

The most impactful GPU memory security disclosure to date is LeftoverLocals (CVE-2023-4969), discovered by Tyler Sorensen and Heidy Khlaaf at Trail of Bits and disclosed on January 16, 2024. This vulnerability demonstrated that GPU local memory is not cleared between kernel executions, enabling a co-resident attacker to listen to another user's interactive LLM session in real time.

The proof-of-concept required fewer than 10 lines of OpenCL code. On an AMD Radeon RX 7900 XT running a 7B parameter model on llama.cpp, the attack leaked approximately 5.5 MB per GPU invocation, totaling approximately 181 MB per LLM query. That is enough data to reconstruct the LLM response with high precision. The PoC code is publicly available on GitHub.

LeftoverLocals affected AMD, Apple, Qualcomm, and Imagination Technologies GPUs. NVIDIA GPUs were confirmed not affected. Trail of Bits noted that NVIDIA had likely addressed these memory leak patterns due to prior academic research dating back to the CUDA Leaks paper.

AMD's response was telling: they created a new operating mode designed to prevent processes from running in parallel on the GPU and to clear registers between processes on supported products. This mode is not enabled by default and needs to be set by an administrator.

Container escape vulnerabilities compound the memory risk

While GPU memory persistence creates the data exposure surface, container escape vulnerabilities provide the attack path in cloud environments. Wiz Research has discovered a series of critical vulnerabilities in the NVIDIA Container Toolkit that enable complete host compromise from within a container:

CVE-2024-0132 (September 2024, CVSS 9.0): A Time-of-Check Time-of-Use (TOCTOU) vulnerability in NVIDIA Container Toolkit v1.16.1 and earlier. A specially crafted container image could escape its boundaries and gain full access to the host file system. Wiz estimated approximately 35% of cloud environments had vulnerable versions installed. Discovered by Andres Riancho, Ronen Shustin, and Shir Tamari from Wiz Research.

CVE-2025-23359 (February 2025, CVSS 9.0): The patch for CVE-2024-0132 was incomplete. Trend Micro found that the TOCTOU vulnerability persisted, enabling the same container escape attack on patched systems. Fixed in Container Toolkit v1.17.4.

CVE-2025-23266 "NVIDIAScape" (July 2025, CVSS 9.0): A vulnerability in the Container Toolkit's enable-cuda-compat OCI hook, which inherited environment variables (including LD_PRELOAD) from container images. An attacker could craft a malicious image that, when processed by the privileged hook, loaded a rogue library outside the container, granting root access on the host. Exploitable with a 3-line Dockerfile. Per Wiz, 37% of cloud environments had vulnerable resources. Wiz stated that this vulnerability represents a systemic risk to the AI ecosystem because the NVIDIA Container Toolkit is the backbone for managed AI and GPU services across all major cloud providers.

The January 2026 NVIDIA security bulletin disclosed additional memory-related vulnerabilities including CVE-2025-33220 (CVSS 7.8), a use-after-free in the vGPU Virtual GPU Manager enabling guest-to-host escape, directly threatening multi-tenant GPU virtualization environments. The same bulletin included CVE-2025-33217 (CVSS 7.8, use-after-free in Windows GPU display driver) and CVE-2025-33218 (CVSS 7.8, integer overflow in Windows kernel-mode driver). The January 2026 CUDA Toolkit bulletin added four more CVEs (CVE-2025-33228 through CVE-2025-33231), including high-severity OS command injection flaws in Nsight Systems.

On the AMD side, CVE-2026-23213 (CVSS 5.5) addressed improper MMIO access handling during SMU Mode 1 reset in the Linux kernel's AMDGPU driver, creating race conditions during GPU power management transitions.

The responsibility falls on NVIDIA's driver and firmware layer

The pattern across all the research above points to the same root cause: NVIDIA's GPU driver and firmware do not perform memory sanitization by default. The CUDA API does not zero memory on allocation. The driver does not zero memory on free. No documented automatic scrubbing occurs between CUDA contexts or processes in standard operation. The only documented exception is Confidential Computing mode on H100, which requires explicit opt-in at the firmware level.

This means that regardless of what infrastructure a GPU runs on, whether it is a local workstation, an on-premise cluster, or any hosted environment, the default NVIDIA behavior is the same: memory is not cleared. The security posture of any GPU deployment is bounded by what NVIDIA's driver and firmware do (or don't do) at the hardware level.

MIG provides runtime isolation but not documented temporal isolation

NVIDIA's Multi-Instance GPU (MIG) technology, available on Ampere architecture and newer, provides hardware-level partitioning of a single GPU into up to seven isolated instances. The MIG User Guide states that each instance's processors have separate and isolated paths through the entire memory system, including on-chip crossbar ports, L2 cache banks, memory controllers, and DRAM address busses.

This provides strong runtime isolation: one MIG instance cannot access another's memory during operation.

However, no MIG documentation explicitly addresses memory scrubbing when MIG instances are destroyed and recreated. The documentation notes that created MIG devices are not persistent across system reboots, and that administrators must recreate the desired MIG configurations if the GPU or system is reset. Whether that reset includes memory scrubbing is not specified. The documentation focuses entirely on runtime isolation (preventing concurrent access), not temporal isolation (clearing data between successive tenants of the same partition).

Research scheduled for USENIX Security 2026 ("Behind Bars: A Side-Channel Attack on NVIDIA MIG Cache Partitioning Using Memory Barriers") and NDSS 2026 ("Exploiting TLBs in Virtualized GPUs for Cross-VM Side-Channel Attacks") indicates that even MIG's runtime isolation may have weaknesses through cache and TLB side channels.

Confidential Computing addresses the gap, but only when enabled

NVIDIA's H100 Confidential Computing (CC) mode is the only documented mechanism that explicitly guarantees memory scrubbing between tenants. The official NVIDIA whitepaper ("Confidential Compute on NVIDIA Hopper H100," WP-11459-001) describes the process:

A toggle operation requires a Function Level Reset (FLR) of the GPU for the mode to take effect. During this reset, a memory lock is engaged which blocks access to the GPU's memory until it has been scrubbed, mitigating cold boot attacks. GPU Firmware initiates a scrub of memory and states in registers and SRAMs before the GPU is handed over to the user.

The NVIDIA developer blog confirms this occurs at both boot and tenant shutdown. An ACM Queue publication by NVIDIA engineers further states the scrubbing ensures all the states in registers and SRAMs are correctly reset before the GPU is handed to the next tenant. The scrubbing is managed by the Secure Processor (SEC2) engine on the GPU die.

Three critical limitations constrain CC's practical impact:

CC is opt-in and requires specific infrastructure. The host CPU must support Intel TDX, AMD SEV-SNP, or ARM CCA. The GPU must be explicitly toggled into CC-On mode. The vast majority of cloud GPU deployments do not use CC mode.

HBM memory is not encrypted during computation. The whitepaper explicitly states that the on-package HBM memory is considered secure against common physical attack tools, such as interposers, and is not encrypted. Data runs in plaintext inside the GPU. The security model relies on the physical inaccessibility of on-package HBM.

Memory scrubbing only occurs during FLR (GPU reset between tenants). Within a single CC session, standard CUDA allocation behavior applies. cudaMalloc still returns uncleared memory. CC protects against inter-tenant leakage, not intra-session memory reuse.

What lives in GPU VRAM makes the stakes concrete

GPU VRAM during a typical training or inference session contains:

Model parameters (weights and biases) persist throughout the entire session. Optimizer states, which for Adam include first and second moment estimates, roughly double the memory footprint of the parameters alone. Gradients are computed and stored during backward passes. Activations (intermediate layer outputs) are retained for backpropagation and often consume the largest share of memory. Training data batches, the actual input data such as tokenized text, images, or embeddings, reside in VRAM during processing. For inference, KV caches store attention key-value pairs for sequence generation.

The Ohio Supercomputer Center documents that training a transformer-based model with the Adam optimizer in mixed precision requires approximately 40 bytes of VRAM per parameter, so a 7B parameter model consumes roughly 280 GB across its memory footprint. Every byte of this data is potentially recoverable from uncleared GPU memory.
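The arithmetic behind that estimate is straightforward, reading the figure as roughly 40 bytes per parameter. The helper name below is invented for this sketch, and the 40-byte constant is the cited rule of thumb, not an NVIDIA-documented value:

```python
# Back-of-envelope VRAM footprint for mixed-precision training with Adam.
# bytes_per_param=40 is the rule-of-thumb figure cited in the text; it
# bundles weights, optimizer states, gradients, and activations.

def training_vram_gb(params_billion, bytes_per_param=40):
    """Approximate total training footprint in GB."""
    return params_billion * 1e9 * bytes_per_param / 1e9

# A 7B-parameter model at ~40 bytes/param:
print(training_vram_gb(7))  # -> 280.0
```

The same function makes the scaling obvious: a 70B model under this rule of thumb would need on the order of 2.8 TB spread across many GPUs, all of it equally exposed if memory is never scrubbed.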

No confirmed real-world breaches exploiting GPU memory persistence in production have been publicly reported. All documented cases are researcher proof-of-concepts and coordinated vulnerability disclosures. The gap between demonstrated capability (academic PoCs recovering credit cards, emails, LLM responses, model weights) and documented protections (essentially none in standard deployments) is the core issue.

What you can do about it

Based on what is documented and available today:

Zero your own VRAM. Call cudaMemset(ptr, 0, size) on all allocated buffers before calling cudaFree(). This is not the default behavior of any ML framework. You would need to add this explicitly to your training/inference pipeline.
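The zero-before-free discipline can be sketched in plain Python with a context manager (the real CUDA equivalent is `cudaMemset(ptr, 0, size)` followed by `cudaFree(ptr)`; the `scrubbed_buffer` name here is invented for this illustration, and a `bytearray` stands in for device memory):

```python
# Conceptual sketch of "zero before free" using a context manager.
# In CUDA code this corresponds to calling cudaMemset(ptr, 0, size)
# immediately before cudaFree(ptr).

from contextlib import contextmanager

@contextmanager
def scrubbed_buffer(size):
    buf = bytearray(size)
    try:
        yield buf
    finally:
        # Analogue of cudaMemset(ptr, 0, size) before release:
        # overwrite the contents so nothing survives deallocation.
        for i in range(len(buf)):
            buf[i] = 0

with scrubbed_buffer(16) as buf:
    buf[:4] = b"key!"          # sensitive material lives here briefly

print(bytes(buf))  # prints 16 zero bytes after the block exits
```

Wrapping allocations this way keeps the scrubbing step next to the allocation site, so it cannot be forgotten on an early-return or exception path — the same reason to prefer a small wrapper around `cudaMalloc`/`cudaFree` over scattering `cudaMemset` calls through a pipeline.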

Use single-tenant instances for sensitive workloads. If your workload processes proprietary models, PII, or regulated data, dedicated-host options where the physical GPU is not shared eliminate cross-tenant risk during operation.

Evaluate Confidential Computing where available. NVIDIA's H100 Confidential Computing mode is the only option with documented firmware-level VRAM scrubbing between sessions. It comes with infrastructure requirements and cost premiums, but it is the only NVIDIA-documented solution to the memory persistence problem.

Monitor NVIDIA's security bulletins. Three critical container escape CVEs in 18 months (CVE-2024-0132, CVE-2025-23359, CVE-2025-23266) demonstrate that timely patching of the NVIDIA Container Toolkit is not optional if you run GPU workloads in containers.

Use NVIDIA's gpu-admin-tools for manual scrubbing. NVIDIA's gpu-admin-tools repository on GitHub includes a --clear-memory flag that explicitly clears GPU memory contents. If you manage your own GPU infrastructure, this can be integrated into your teardown process between workloads.

FAQ

Q: Is this different from how CPU memory works? Yes. The Linux kernel guarantees zeroed pages to userspace processes through mmap() and related calls. This has been standard behavior for decades and is further hardened by kernel options like CONFIG_INIT_ON_ALLOC_DEFAULT_ON. No equivalent default behavior exists for GPU memory.
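The CPU-side guarantee is easy to observe from the standard library: anonymous pages handed out by `mmap()` arrive zero-filled from the kernel. (No equivalent check exists for GPU memory through public CUDA APIs.)

```python
# Demonstrate the kernel's zero-page guarantee for anonymous mappings.
import mmap

PAGE = mmap.PAGESIZE
m = mmap.mmap(-1, PAGE)        # anonymous mapping, backed by zero pages
assert m[:PAGE] == b"\x00" * PAGE
print("all", PAGE, "bytes arrived zeroed")
m.close()
```

Whatever previously occupied that physical page, the kernel scrubs it before mapping it into a new process. This is precisely the temporal-isolation property that NVIDIA's default GPU memory path lacks.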

Q: Were NVIDIA GPUs affected by LeftoverLocals? No. NVIDIA confirmed that their devices were not affected by LeftoverLocals (CVE-2023-4969). Trail of Bits noted that NVIDIA had likely addressed these memory leak patterns in their driver due to prior academic research. AMD, Apple, Qualcomm, and Imagination Technologies GPUs were affected.

Q: Does NVIDIA MIG (Multi-Instance GPU) solve this? MIG provides runtime isolation between concurrent tenants on the same physical GPU. Each MIG instance has isolated memory paths, cache banks, and memory controllers. However, no MIG documentation specifies whether memory is scrubbed when MIG instances are destroyed and recreated for a new tenant. Runtime isolation and temporal isolation are different properties.

Q: What is NVIDIA Confidential Computing and does it fix the VRAM persistence issue? NVIDIA's Confidential Computing mode on H100 GPUs is the only documented mechanism that performs firmware-level VRAM scrubbing between tenants. During a GPU Function Level Reset (FLR), the GPU's Secure Processor scrubs all memory and register states before handing the GPU to the next tenant. It requires specific hardware (H100+), compatible CPUs (Intel TDX, AMD SEV-SNP, or ARM CCA), and must be explicitly enabled. It is not the default GPU operating mode.

Q: Does NVIDIA's driver clear memory when a process terminates or a CUDA context is destroyed? No official NVIDIA documentation guarantees this. See the first section of this post for full details on what NVIDIA's documentation does and does not specify.

Q: Should I be worried about NVIDIA's memory behavior in my ML workloads? The documented risk is real but context-dependent. If you are running non-sensitive workloads (public model fine-tuning, open-source inference), the practical risk is low. If you are processing proprietary models, PII, healthcare data, financial data, or any regulated information on GPU infrastructure, NVIDIA's default behavior of not clearing memory on allocation or deallocation is a gap that warrants evaluation against your compliance and security requirements. The mitigation is straightforward: zero your own buffers with cudaMemset before freeing them, and evaluate Confidential Computing for workloads that require firmware-level guarantees.
