Miri, Valgrind, and Sanitizers — Verifying Unsafe Code 🔴

What you’ll learn:

  • Miri as a MIR interpreter — what it catches and what it cannot
  • Valgrind memcheck, Helgrind, Callgrind, and Massif
  • LLVM sanitizers: ASan, MSan, TSan, LSan with nightly -Zbuild-std
  • cargo-fuzz for crash discovery and loom for concurrency model checking
  • A decision tree for choosing the right verification tool

Cross-references: Code Coverage (coverage finds untested paths; Miri checks the tested ones for undefined behavior) · no_std & Features (no_std code often requires unsafe that Miri can verify) · CI/CD Pipeline (wiring the Miri job into the pipeline)

Safe Rust guarantees memory safety and data-race freedom at compile time. But the moment you write unsafe, whether for FFI, hand-written data structures, or performance tricks, those guarantees become your responsibility. This chapter covers the tools that verify your unsafe code actually upholds the safety contract it claims.

Miri — An Interpreter for Unsafe Rust

Miri is an interpreter for Rust's MIR (mid-level intermediate representation). Instead of producing machine code, it executes your program step by step and checks every operation for undefined behavior.

# Install Miri (nightly-only component)
rustup +nightly component add miri

# Run your test suite under Miri
cargo +nightly miri test

# Run a specific binary under Miri
cargo +nightly miri run

# Run a specific test
cargo +nightly miri test -- test_name

How Miri works:

Source → rustc → MIR → Miri interprets MIR
                        │
                        ├─ Tracks every pointer's provenance
                        ├─ Validates every memory access
                        ├─ Checks alignment at every deref
                        ├─ Detects use-after-free
                        ├─ Detects data races (with threads)
                        └─ Enforces Stacked Borrows / Tree Borrows rules

What Miri Catches (and What It Cannot)

Miri detects:

| Category | Example | Would it crash at runtime? |
|---|---|---|
| Out-of-bounds access | ptr.add(100).read() | Sometimes |
| Use after free | Reading a dropped Box | Sometimes |
| Double free | drop_in_place twice | Usually |
| Unaligned access | (ptr as *const u32).read() on an odd address | On some architectures |
| Invalid values | transmute::<u8, bool>(2) | Often silent |
| Dangling references | &*ptr where ptr is freed | Often silent |
| Data races | Two threads, unsynchronized writes | Hard to reproduce |
| Stacked Borrows violation | Aliasing &mut | Often silent |
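The unaligned-access and invalid-value rows above both have sound alternatives that Miri accepts. A minimal sketch; read_u32_at and byte_to_bool are hypothetical helper names, not from any crate.

```rust
use std::ptr;

/// Reads a native-endian u32 starting at `offset`, which may be unaligned.
/// Hypothetical helper for illustration.
fn read_u32_at(bytes: &[u8], offset: usize) -> Option<u32> {
    // Bounds check first so the raw read below stays inside the slice.
    let end = offset.checked_add(4)?;
    if end > bytes.len() {
        return None;
    }
    let p = bytes[offset..end].as_ptr() as *const u32;
    // SAFETY: the 4 bytes are in bounds, and read_unaligned has no alignment
    // requirement, unlike p.read(), which Miri flags on misaligned addresses.
    Some(unsafe { ptr::read_unaligned(p) })
}

/// Converts a byte to bool only if it is a valid bool bit pattern, instead of
/// transmute::<u8, bool>, which is UB for any value other than 0 or 1.
fn byte_to_bool(b: u8) -> Option<bool> {
    match b {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}
```

In production code, the fully safe u32::from_ne_bytes over a 4-byte slice avoids unsafe entirely; the raw-pointer version is here only to mirror the table's example.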

Miri does NOT detect:

| Limitation | Why |
|---|---|
| Logic bugs | Miri checks safety, not correctness |
| Deadlocks and livelocks in general | It only explores the thread interleavings it happens to schedule; it is not a full concurrency model checker |
| Performance problems | It is an interpreter, not a profiler |
| OS/hardware interaction | It cannot emulate devices and most syscalls |
| FFI calls into C code | It cannot interpret C code |
| Paths your tests never reach | It only checks executed code paths |

A concrete example:

#[cfg(test)]
mod tests {
    #[test]
    fn test_miri_catches_ub() {
        let mut v = vec![1, 2, 3];
        let _stale_ptr = v.as_ptr();

        // In practice vec![1, 2, 3] has capacity 3, so this push reallocates.
        v.push(4);

        // ❌ UB: _stale_ptr may dangle after reallocation (Miri reports a use-after-free)
        // let _val = unsafe { *_stale_ptr };

        // ✅ Correct: get a fresh pointer after mutation
        let ptr = v.as_ptr();
        let val = unsafe { *ptr };
        assert_eq!(val, 1);
    }
}

Running Miri on a Real Crate

# Step 1: Run all tests under Miri
cargo +nightly miri test 2>&1 | tee miri_output.txt

# Step 2: If Miri reports errors, isolate them
cargo +nightly miri test -- failing_test_name

# Step 3: Use Miri's backtrace for diagnosis
MIRIFLAGS="-Zmiri-backtrace=full" cargo +nightly miri test

# Step 4: Compare borrow models (Stacked Borrows is the default; Tree Borrows is opt-in)
cargo +nightly miri test
MIRIFLAGS="-Zmiri-tree-borrows" cargo +nightly miri test

Useful Miri flags:
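
# Allow access to host resources (environment, files, clock) that Miri isolates by default
MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test
# Fix the seed for Miri's random choices so a failing run can be reproduced
MIRIFLAGS="-Zmiri-seed=42" cargo +nightly miri test
# Enforce strict provenance (no usable pointers conjured from plain integers)
MIRIFLAGS="-Zmiri-strict-provenance" cargo +nightly miri test
# Flags can be combined in one MIRIFLAGS string
MIRIFLAGS="-Zmiri-disable-isolation -Zmiri-backtrace=full -Zmiri-strict-provenance" \
    cargo +nightly miri test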

Miri in CI:

name: Miri
on: [push, pull_request]

jobs:
  miri:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@nightly
        with:
          components: miri

      - name: Run Miri
        run: cargo miri test --workspace
        env:
          MIRIFLAGS: "-Zmiri-backtrace=full"

Performance note: Miri is often 10-100× slower than native execution. In CI, it is better to focus on crates or tests that actually contain unsafe code.
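Besides choosing which crates run under Miri, individual tests can shrink or skip themselves: Miri sets the miri cfg, so cfg!(miri) and #[cfg_attr(miri, ignore)] work in ordinary code. A minimal sketch; stress_iterations, checksum, and the test names are hypothetical.

```rust
/// Hypothetical iteration count for a stress test: small under Miri, large natively.
fn stress_iterations() -> usize {
    if cfg!(miri) { 100 } else { 10_000 }
}

/// Sums 0..n; stands in for real work inside the test.
fn checksum(n: usize) -> u64 {
    (0..n as u64).sum()
}

#[cfg(test)]
mod tests {
    use super::*;

    // Runs everywhere, but with far fewer iterations under Miri.
    #[test]
    fn stress_runs_everywhere() {
        let n = stress_iterations();
        assert_eq!(checksum(n), (n as u64) * (n as u64 - 1) / 2);
    }

    // Skipped by cargo miri test, still run by cargo test.
    #[test]
    #[cfg_attr(miri, ignore)]
    fn very_slow_test() {
        assert_eq!(checksum(1_000_000), 499_999_500_000);
    }
}
```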

Valgrind and Its Rust Integration

Valgrind is the classic native memory checker from the C/C++ world. It can also inspect compiled Rust binaries, because it analyzes the final machine code rather than the source.

# Install Valgrind
sudo apt install valgrind

# Build with debug info
cargo build --tests

# Run a specific test binary under Valgrind (the hash suffix differs per build; cargo test --no-run prints the exact path)
valgrind --tool=memcheck \
    --leak-check=full \
    --show-leak-kinds=all \
    --track-origins=yes \
    ./target/debug/deps/my_crate-abc123 --test-threads=1

# Run the main binary
valgrind --tool=memcheck \
    --leak-check=full \
    --error-exitcode=1 \
    ./target/debug/diag_tool --run-diagnostics

Valgrind tools beyond memcheck:

| Tool | Command | What it detects |
|---|---|---|
| Memcheck | --tool=memcheck | Memory leaks, use-after-free, buffer overflows |
| Helgrind | --tool=helgrind | Data races and lock-order violations |
| DRD | --tool=drd | Data races, using a different detection algorithm |
| Callgrind | --tool=callgrind | Instruction-level profiling |
| Massif | --tool=massif | Heap memory usage over time |
| Cachegrind | --tool=cachegrind | Cache miss analysis |

Using Callgrind:
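
# Record a profile of the release binary
valgrind --tool=callgrind \
    --callgrind-out-file=callgrind.out \
    ./target/release/diag_tool --run-diagnostics

# Browse the profile in the KCachegrind GUI
kcachegrind callgrind.out
# Or print a text summary of the hottest functions
callgrind_annotate callgrind.out | head -100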

Miri vs Valgrind:

| Aspect | Miri | Valgrind |
|---|---|---|
| Rust-specific UB (aliasing, invalid values) | ✅ | ❌ |
| FFI / C code | ❌ | ✅ |
| Needs nightly | Yes | No |
| Speed | 10-100× slower | 10-50× slower |
| Leak detection | ✅ | ✅ |
| Data race detection | ✅ | ✅ (via Helgrind/DRD) |

Use both:

  • Miri for pure Rust unsafe code
  • Valgrind for FFI-heavy code and whole-program leak checks
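A typical FFI-heavy spot where Valgrind's leak check pays off is handing a Rust-owned object to C as an opaque pointer: if the C side never calls the matching free function, memcheck reports the allocation as "definitely lost". A minimal sketch; Session, session_new, session_push, and session_free are hypothetical names, and a real library would also export them under unmangled symbol names for C callers.

```rust
/// Hypothetical state handed to C as an opaque pointer.
pub struct Session {
    events: Vec<u32>,
}

/// Allocates a Session and transfers ownership of it to the caller.
pub extern "C" fn session_new() -> *mut Session {
    Box::into_raw(Box::new(Session { events: Vec::new() }))
}

/// Records an event and returns the new count (0 for a null handle).
///
/// # Safety
/// `s` must be null or a live pointer returned by `session_new`.
pub unsafe extern "C" fn session_push(s: *mut Session, event: u32) -> usize {
    // SAFETY: guaranteed by the caller contract above.
    match unsafe { s.as_mut() } {
        Some(session) => {
            session.events.push(event);
            session.events.len()
        }
        None => 0,
    }
}

/// Frees the Session. Skipping this call is the leak memcheck reports.
///
/// # Safety
/// `s` must be null or come from `session_new`, and must not be used afterwards.
pub unsafe extern "C" fn session_free(s: *mut Session) {
    if !s.is_null() {
        // SAFETY: `s` came from Box::into_raw and is freed exactly once.
        drop(unsafe { Box::from_raw(s) });
    }
}
```

Running the test binary under valgrind --leak-check=full with the session_free call removed is a quick way to see the report Valgrind produces for this pattern.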

AddressSanitizer, MemorySanitizer, ThreadSanitizer, and LeakSanitizer

LLVM sanitizers are compile-time instrumentation passes paired with runtime checks. They are typically much faster than Valgrind and catch a different slice of bugs.

# -Zbuild-std needs the standard library sources
rustup component add rust-src --toolchain nightly

# AddressSanitizer
RUSTFLAGS="-Zsanitizer=address" \
    cargo +nightly test -Zbuild-std --target x86_64-unknown-linux-gnu

# MemorySanitizer
RUSTFLAGS="-Zsanitizer=memory" \
    cargo +nightly test -Zbuild-std --target x86_64-unknown-linux-gnu

# ThreadSanitizer
RUSTFLAGS="-Zsanitizer=thread" \
    cargo +nightly test -Zbuild-std --target x86_64-unknown-linux-gnu

# LeakSanitizer
RUSTFLAGS="-Zsanitizer=leak" \
    cargo +nightly test --target x86_64-unknown-linux-gnu

Note: ASan, MSan, and TSan generally need -Zbuild-std because the standard library itself must be instrumented too. LSan is the exception, which is why its command above omits -Zbuild-std.

Sanitizer comparison:

| Sanitizer | Overhead | Catches |
|---|---|---|
| ASan | about 2× | Buffer overflow, use-after-free, stack overflow |
| MSan | about 3× | Uninitialized reads |
| TSan | 5× and above | Data races |
| LSan | Minimal | Memory leaks |
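MSan's target, reading uninitialized memory, usually enters Rust code through MaybeUninit buffers. A minimal sketch of the sound pattern that MSan and Miri should accept: write every element through a raw pointer before calling assume_init. Reading an element before it is written is what they would flag. The function name squares is hypothetical.

```rust
use std::mem::MaybeUninit;

/// Builds [0, 1, 4, 9, ...] without zero-initializing the array first.
fn squares() -> [u32; 8] {
    let mut arr: MaybeUninit<[u32; 8]> = MaybeUninit::uninit();
    // Pointer to the first element; array elements are laid out contiguously.
    let base = arr.as_mut_ptr() as *mut u32;
    for i in 0..8 {
        // SAFETY: i < 8 keeps the pointer inside the array, and write()
        // does not read or drop the old (uninitialized) value.
        unsafe { base.add(i).write((i * i) as u32) };
    }
    // SAFETY: all 8 elements were initialized by the loop above.
    unsafe { arr.assume_init() }
}
```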

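A race example (deliberately unsound, so that Miri and TSan have something to report):

use std::cell::UnsafeCell;
use std::sync::Arc;
use std::thread;

// UnsafeCell is !Sync, so Arc<UnsafeCell<u64>> cannot be sent to other threads
// on its own. This wrapper lies to the compiler so the racy code compiles.
struct RacyCell(UnsafeCell<u64>);

// ❌ Unsound: claims shared access is safe without any synchronization.
unsafe impl Sync for RacyCell {}

fn racy_counter() -> u64 {
    let data = Arc::new(RacyCell(UnsafeCell::new(0)));
    let mut handles = vec![];

    for _ in 0..4 {
        let data = Arc::clone(&data);
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                // ❌ Data race: unsynchronized read-modify-write from four threads
                unsafe {
                    *data.0.get() += 1;
                }
            }
        }));
    }

    for h in handles {
        h.join().unwrap();
    }

    unsafe { *data.0.get() }
}

Both Miri and TSan report this race, and they are right to. The fix is to use AtomicU64 or Mutex<u64>.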
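A sketch of the AtomicU64 fix, keeping the same shape as racy_counter; atomic_counter is a hypothetical name. Relaxed ordering is a deliberate choice: atomic read-modify-writes never lose increments, and join() makes every increment visible to the final load.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Same structure as racy_counter, but each increment is an atomic fetch_add.
fn atomic_counter() -> u64 {
    let data = Arc::new(AtomicU64::new(0));
    let mut handles = vec![];

    for _ in 0..4 {
        let data = Arc::clone(&data);
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                // No lost updates: fetch_add is a single indivisible operation.
                data.fetch_add(1, Ordering::Relaxed);
            }
        }));
    }

    for h in handles {
        // join() synchronizes with the finished thread.
        h.join().unwrap();
    }

    data.load(Ordering::Relaxed)
}
```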

cargo-fuzz — Coverage-Guided Fuzzing:
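
cargo install cargo-fuzz
# Create the fuzz/ directory, then add a target named parse_gpu_csv
cargo fuzz init
cargo fuzz add parse_gpu_csv

// fuzz/fuzz_targets/parse_gpu_csv.rs
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    // The parser takes &str, so only valid UTF-8 inputs are forwarded.
    if let Ok(s) = std::str::from_utf8(data) {
        // Errors are fine; the fuzzer is hunting for panics and crashes.
        let _ = diag_tool::parse_gpu_csv(s);
    }
});

# Fuzz for up to 300 seconds
cargo +nightly fuzz run parse_gpu_csv -- -max_total_time=300
# Minimize a crashing input found by the run
cargo +nightly fuzz tmin parse_gpu_csv artifacts/parse_gpu_csv/crash-...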

When to fuzz: any function that consumes untrusted or semi-trusted input is worth fuzzing, such as parsers, configuration readers, protocol decoders, sensor output handlers, network data, and JSON/CSV processors.
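A fuzz target only pays off if the parser reports bad input as an error instead of panicking. The sketch below is a hypothetical stand-in for parse_gpu_csv (the real function's format is not shown in this chapter); it assumes lines of the form "index,temp". The smoke_fuzz helper is not coverage-guided fuzzing: it just feeds pseudo-random inputs from a tiny xorshift generator, which can serve as a quick stable-toolchain sanity check before running cargo-fuzz.

```rust
/// Hypothetical parsed row: GPU index and temperature in degrees Celsius.
#[derive(Debug, PartialEq)]
pub struct GpuRow {
    pub index: u32,
    pub temp_c: f32,
}

/// Hypothetical parser for "index,temp" lines. Malformed input becomes Err.
pub fn parse_gpu_csv(input: &str) -> Result<Vec<GpuRow>, String> {
    let mut rows = Vec::new();
    for (i, line) in input.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue; // tolerate blank lines
        }
        let mut fields = line.split(',');
        let (idx, temp) = match (fields.next(), fields.next(), fields.next()) {
            (Some(a), Some(b), None) => (a.trim(), b.trim()),
            _ => return Err(format!("line {}: expected 2 fields", i + 1)),
        };
        let index = idx.parse::<u32>().map_err(|e| format!("line {}: {}", i + 1, e))?;
        let temp_c = temp.parse::<f32>().map_err(|e| format!("line {}: {}", i + 1, e))?;
        rows.push(GpuRow { index, temp_c });
    }
    Ok(rows)
}

/// Feeds `rounds` pseudo-random inputs to the parser and counts successful
/// parses. A panic in the parser aborts the caller, as it would under cargo-fuzz.
pub fn smoke_fuzz(seed: u64, rounds: usize) -> usize {
    // Bytes the parser cares about, so inputs get past the first checks.
    let alphabet: &[u8] = b"0123456789,.-e \n";
    let mut state = seed | 1; // xorshift64 state must be non-zero
    let mut next = move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    };
    let mut ok = 0;
    for _ in 0..rounds {
        let len = (next() % 24) as usize;
        let bytes: Vec<u8> = (0..len)
            .map(|_| alphabet[(next() % alphabet.len() as u64) as usize])
            .collect();
        // Mirror the fuzz target: only valid UTF-8 reaches the parser.
        if let Ok(s) = std::str::from_utf8(&bytes) {
            if parse_gpu_csv(s).is_ok() {
                ok += 1;
            }
        }
    }
    ok
}
```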

loom — Concurrency Model Checker:

[dev-dependencies]
loom = "0.7"

// Run with: RUSTFLAGS="--cfg loom" cargo test --release
#[cfg(loom)]
mod tests {
    use loom::sync::atomic::{AtomicUsize, Ordering};
    use loom::sync::Arc;
    use loom::thread;

    #[test]
    fn test_counter_is_atomic() {
        // loom::model runs the closure many times, exploring the possible
        // interleavings of the threads spawned inside it.
        loom::model(|| {
            let counter = Arc::new(AtomicUsize::new(0));
            let c1 = counter.clone();
            let c2 = counter.clone();

            let t1 = thread::spawn(move || { c1.fetch_add(1, Ordering::SeqCst); });
            let t2 = thread::spawn(move || { c2.fetch_add(1, Ordering::SeqCst); });

            t1.join().unwrap();
            t2.join().unwrap();

            assert_eq!(counter.load(Ordering::SeqCst), 2);
        });
    }
}

When to use loom: custom lock-free structures, atomics-heavy state machines, or hand-rolled synchronization primitives. For ordinary Mutex/RwLock code, it is usually unnecessary.
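To make production code (not just a test) checkable by loom, a common approach along the lines of loom's README is to import the atomic types through a cfg switch, so the same code compiles against loom under --cfg loom and against std otherwise. A minimal sketch; the Counter type is hypothetical, and a normal build only exercises the std branch.

```rust
// Under RUSTFLAGS="--cfg loom" these resolve to loom's instrumented types;
// in a normal build the std types are used and loom is not needed at all.
#[cfg(loom)]
pub(crate) use loom::sync::atomic::{AtomicUsize, Ordering};
#[cfg(not(loom))]
pub(crate) use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical shared counter written once against the switchable types.
pub struct Counter {
    hits: AtomicUsize,
}

impl Counter {
    pub fn new() -> Counter {
        Counter { hits: AtomicUsize::new(0) }
    }

    /// Increments and returns the previous value, like fetch_add.
    pub fn hit(&self) -> usize {
        self.hits.fetch_add(1, Ordering::SeqCst)
    }

    pub fn get(&self) -> usize {
        self.hits.load(Ordering::SeqCst)
    }
}
```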

When to Use Which Tool

Decision tree for unsafe verification:

Is the code pure Rust (no FFI)?
├─ Yes → Use Miri
│        Also run ASan in CI for extra defense
└─ No
   ├─ Memory safety concerns?
   │  └─ Yes → Use Valgrind memcheck AND ASan
   ├─ Concurrency concerns?
   │  └─ Yes → Use TSan or Helgrind
   └─ Leak concerns?
      └─ Yes → Use Valgrind --leak-check=full

Recommended CI matrix:
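
jobs:
  miri:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@nightly
        with: { components: miri }
      - run: cargo miri test --workspace

  asan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@nightly
        with: { components: rust-src }   # -Zbuild-std needs the std sources
      - run: |
          RUSTFLAGS="-Zsanitizer=address" \
          cargo test -Zbuild-std --target x86_64-unknown-linux-gnu

  valgrind:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: sudo apt-get update && sudo apt-get install -y valgrind
      - uses: dtolnay/rust-toolchain@stable
      # Run every test binary under memcheck through Cargo's target runner
      - run: cargo test
        env:
          CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUNNER: "valgrind --error-exitcode=1 --leak-check=full"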

Application: Zero Unsafe — and When You’ll Need It

The project currently contains zero unsafe blocks, which is an excellent sign for a systems-style Rust codebase. Safe Rust already covers IPMI subprocess calls, GPU queries, PCIe topology parsing, SEL management, and JSON report generation.

When unsafe is likely to appear:

| Scenario | Why unsafe | Recommended verification |
|---|---|---|
| Direct ioctl-based IPMI | Needs raw syscalls | Miri + Valgrind |
| Direct GPU driver queries | FFI to a native SDK | Valgrind |
| Memory-mapped PCIe config | Raw pointer arithmetic | ASan + Valgrind |
| Lock-free SEL buffer | Atomics and pointer juggling | Miri + TSan |
| Embedded/no_std variant | Bare-metal pointer manipulation | Miri |

Preparation pattern:

[features]
default = []
direct-ipmi = []
direct-accel-api = []

#[cfg(feature = "direct-ipmi")]
mod direct {
    //! Direct IPMI device access via /dev/ipmi0 ioctl.
}

#[cfg(not(feature = "direct-ipmi"))]
mod subprocess {
    //! Safe subprocess-based fallback.
}

Key insight: put unsafe paths behind feature flags so CI can verify these high-risk branches independently, while the default safe build stays unaffected.
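With the modules gated as in the skeleton above, callers can stay unaware of which backend was compiled in. A minimal sketch of that dispatch, assuming a hypothetical backend_name function in each module; only the direct-ipmi feature name comes from the Cargo.toml above.

```rust
#[cfg(feature = "direct-ipmi")]
mod direct {
    /// Would front the ioctl-based path (and its unsafe code) in a real build.
    pub fn backend_name() -> &'static str {
        "direct-ipmi"
    }
}

#[cfg(not(feature = "direct-ipmi"))]
mod subprocess {
    /// Safe fallback that shells out to an external tool in a real build.
    pub fn backend_name() -> &'static str {
        "subprocess"
    }
}

// Exactly one of these re-exports is compiled, so the public API never changes.
#[cfg(feature = "direct-ipmi")]
pub use direct::backend_name;
#[cfg(not(feature = "direct-ipmi"))]
pub use subprocess::backend_name;
```

A dedicated CI job can then target the unsafe branch alone, for example with cargo +nightly miri test --features direct-ipmi.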

cargo-careful — Extra UB Checks at Low Overhead

cargo-careful runs your code against a standard library built with debug assertions and extra runtime checks. It still needs a nightly toolchain and is not as thorough as Miri, but its overhead is far lower.

cargo install cargo-careful

cargo +nightly careful test
cargo +nightly careful run -- --run-diagnostics

What it catches:

  • uninitialized memory reads
  • invalid bool / char / enum values
  • unaligned pointer reads/writes
  • overlapping copy_nonoverlapping ranges
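The last item is easy to hit when shifting data within one buffer: source and destination overlap, so ptr::copy (memmove semantics) is required and ptr::copy_nonoverlapping would be UB. A sketch with a hypothetical drop_front helper; in safe code, slice::copy_within does the same job.

```rust
use std::ptr;

/// Removes the first `n` bytes by shifting the rest to the front, in place.
/// Returns the new logical length. Hypothetical helper for illustration.
fn drop_front(buf: &mut [u8], n: usize) -> usize {
    let n = n.min(buf.len());
    let remaining = buf.len() - n;
    let base = buf.as_mut_ptr();
    // SAFETY: both ranges lie inside `buf`. They can overlap (whenever
    // n < remaining), so ptr::copy is used; copy_nonoverlapping would be UB.
    unsafe { ptr::copy(base.add(n), base, remaining) };
    remaining
}
```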
Least overhead                                          Most thorough
├─ cargo test ──► cargo careful test ──► Miri ──► ASan ──► Valgrind ─┤

Troubleshooting Miri and Sanitizers

| Symptom | Cause | Fix |
|---|---|---|
| Miri does not support FFI | Miri cannot execute C code | Use Valgrind or ASan |
| can't call foreign function | Miri hit an extern "C" call | Mock the FFI or gate it with #[cfg(miri)] |
| Stacked Borrows violation | Aliasing violation | Refactor ownership and aliasing |
| Sanitizer says DEADLYSIGNAL | ASan caught memory corruption | Check indexing, slicing, and pointer arithmetic |
| LeakSanitizer: detected memory leaks | A real leak, or an intentional one | Suppress intentional leaks, fix accidental ones |
| Miri is extremely slow | Interpretation overhead | Narrow the test scope |
| TSan false positive | Gaps in how TSan models atomic orderings | Add suppressions cautiously |
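For the "can't call foreign function" row, one option is to give Miri a pure-Rust stand-in via #[cfg(miri)], so the surrounding unsafe Rust is still interpreted while the C call itself is left to Valgrind or ASan. A minimal sketch using libc's abs as the foreign function; the wrapper name c_abs is hypothetical, and unsafe extern blocks need Rust 1.82 or newer.

```rust
#[cfg(not(miri))]
fn c_abs(x: i32) -> i32 {
    // Resolved against the C library that std already links.
    unsafe extern "C" {
        fn abs(x: i32) -> i32;
    }
    // SAFETY: abs only reads its integer argument; callers must not pass
    // i32::MIN, for which C's abs is undefined.
    unsafe { abs(x) }
}

#[cfg(miri)]
fn c_abs(x: i32) -> i32 {
    // Pure-Rust mock with the same contract, so Miri never reaches the extern call.
    x.abs()
}
```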

Try It Yourself

  1. Trigger a Miri UB detection: Write an unsafe function that creates two mutable references to the same i32 and writes through both, run cargo +nightly miri test, then fix it with UnsafeCell or separate allocations.

  2. Run ASan on a deliberate bug: Write an out-of-bounds access, run the tests with RUSTFLAGS="-Zsanitizer=address", and see which line ASan points to.

  3. Benchmark Miri overhead: Compare cargo test --lib with cargo +nightly miri test --lib and measure the slowdown factor.

Safety Verification Decision Tree

flowchart TD
    START["Have unsafe code?"] -->|No| SAFE["Safe Rust<br/>no extra verification needed"]
    START -->|Yes| KIND{"What kind?"}
    
    KIND -->|"Pure Rust unsafe"| MIRI["Miri<br/>catches aliasing, UB, leaks"]
    KIND -->|"FFI / C interop"| VALGRIND["Valgrind memcheck<br/>or ASan"]
    KIND -->|"Concurrent unsafe"| CONC{"Lock-free?"}
    
    CONC -->|"Atomics/lock-free"| LOOM["loom<br/>Model checker"]
    CONC -->|"Mutex/shared state"| TSAN["TSan or Miri"]
    
    MIRI --> CI_MIRI["CI: cargo +nightly miri test"]
    VALGRIND --> CI_VALGRIND["CI: valgrind --leak-check=full"]
    
    style SAFE fill:#91e5a3,color:#000
    style MIRI fill:#e3f2fd,color:#000
    style VALGRIND fill:#ffd43b,color:#000
    style LOOM fill:#ff6b6b,color:#000
    style TSAN fill:#ffd43b,color:#000

🏋️ Exercises

🟡 Exercise 1: Trigger a Miri UB Detection

Write an unsafe function that creates two &mut references to the same i32 and writes through both. Run cargo +nightly miri test, observe the error, and fix it.

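Solution

#[cfg(test)]
mod tests {
    // ❌ Under cargo +nightly miri test, this reports a Stacked Borrows error.
    #[test]
    fn aliasing_ub() {
        let mut x: i32 = 42;
        let ptr = &mut x as *mut i32;
        unsafe {
            let a = &mut *ptr;
            let b = &mut *ptr; // creating `b` from ptr invalidates `a`
            *b = 1;
            *a = 2; // UB: write through the invalidated reference `a`
        }
    }
}

#[cfg(test)]
mod fixed {
    use std::cell::UnsafeCell;

    // ✅ Each &mut is created, used, and dropped before the next one exists.
    #[test]
    fn no_aliasing_ub() {
        let x = UnsafeCell::new(42);
        unsafe {
            let a = &mut *x.get();
            *a = 100;
        }
        unsafe {
            let b = &mut *x.get();
            *b += 1;
        }
        assert_eq!(x.into_inner(), 101);
    }
}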

🔴 Exercise 2: ASan Out-of-Bounds Detection

Create a test with an out-of-bounds array access and run it under ASan.

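Solution

#[test]
fn oob_access() {
    let arr = [1u8, 2, 3, 4, 5];
    let ptr = arr.as_ptr();
    // ❌ Reads past the end of the 5-byte stack array; ASan should report a stack-buffer-overflow.
    let val = unsafe { *ptr.add(10) };
    // black_box keeps the out-of-bounds read from being optimized away.
    std::hint::black_box(val);
}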
RUSTFLAGS="-Zsanitizer=address" cargo +nightly test -Zbuild-std \
  --target x86_64-unknown-linux-gnu -- oob_access

Key Takeaways

  • Miri is the first-choice tool for pure-Rust unsafe code
  • Valgrind is valuable for FFI-heavy code and leak analysis
  • Sanitizers run faster than Valgrind and suit larger test suites
  • loom is for verifying lock-free and atomics-heavy concurrency
  • Run Miri continuously, and schedule heavier checks on a slower cadence