Miri, Valgrind, and Sanitizers — Verifying Unsafe Code 🔴
Miri、Valgrind 与 Sanitizer：验证 unsafe 代码 🔴

What you’ll learn:
本章将学到什么：

Miri as a MIR interpreter — what it catches and what it cannot
把 Miri 当成 MIR 解释器来理解：它能抓什么，抓不到什么

Valgrind memcheck, Helgrind, Callgrind, and Massif
Valgrind 家族工具：memcheck、Helgrind、Callgrind、Massif

LLVM sanitizers: ASan, MSan, TSan, LSan with nightly -Zbuild-std
LLVM Sanitizer：ASan、MSan、TSan、LSan，以及 nightly 下的 -Zbuild-std

cargo-fuzz for crash discovery and loom for concurrency model checking
如何用 cargo-fuzz 找崩溃，以及用 loom 做并发模型检查

A decision tree for choosing the right verification tool
如何选择合适验证工具的决策树

Cross-references: Code Coverage — coverage finds untested paths, Miri verifies the tested ones · no_std & Features — no_std code often requires unsafe that Miri can verify · CI/CD Pipeline — Miri job in the pipeline
交叉阅读： 代码覆盖率负责找没测到的路径；Miri 则负责验证已经测到的路径里有没有未定义行为。no_std 与 feature 讲的很多 unsafe 场景也适合拿 Miri 来校验。CI/CD 流水线则会把 Miri 接进流水线。

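Safe Rust guarantees memory safety and data-race freedom at compile time. But the moment you write unsafe, whether for FFI, hand-written data structures, or performance tricks, those guarantees become your responsibility. This chapter covers the tools that check whether your unsafe code actually upholds the safety contract it claims.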
Safe Rust 会在编译期保证内存安全和无数据竞争。但只要写下 unsafe，无论是为了 FFI、手写数据结构还是性能技巧，这些保证就得自己扛。本章讲的就是：拿什么工具去验证这些 unsafe 代码，真的没有在胡来。

Miri — An Interpreter for Unsafe Rust
Miri：unsafe Rust 的解释器

Miri is an interpreter for Rust MIR. Instead of producing machine code, it executes your program step by step and checks every operation for undefined behavior.
Miri 是 Rust MIR 的解释器。它不生成机器码，而是一步一步执行程序，同时在每个操作点上检查有没有未定义行为。

# Install Miri (nightly-only component)
rustup +nightly component add miri

# Run your test suite under Miri
cargo +nightly miri test

# Run a specific binary under Miri
cargo +nightly miri run

# Run a specific test
cargo +nightly miri test -- test_name

How Miri works:
Miri 大概是这么工作的：

Source → rustc → MIR → Miri interprets MIR
                        │
                        ├─ Tracks every pointer's provenance
                        ├─ Validates every memory access
                        ├─ Checks alignment at every deref
                        ├─ Detects use-after-free
                        ├─ Detects data races (with threads)
                        └─ Enforces Stacked Borrows / Tree Borrows rules

源码 → rustc → MIR → Miri 解释执行 MIR
                    │
                    ├─ 跟踪每个指针的 provenance
                    ├─ 校验每一次内存访问
                    ├─ 检查解引用时的对齐
                    ├─ 抓 use-after-free
                    ├─ 检测线程间数据竞争
                    └─ 执行 Stacked Borrows / Tree Borrows 规则

What Miri Catches (and What It Cannot)
Miri 能抓什么，抓不到什么

Miri detects:
Miri 能抓到的典型问题：

Category 类别	Example 例子	Would Crash at Runtime? 运行时一定会崩吗
Out-of-bounds access 越界访问	`ptr.add(100).read()`	Sometimes 不一定
Use after free 释放后继续用	Reading a dropped `Box`	Sometimes
Double free 重复释放	`drop_in_place` twice	Usually
Unaligned access 未对齐访问	`(ptr as *const u32).read()` on odd address	On some architectures
Invalid values 非法值	`transmute::<u8, bool>(2)`	Often silent
Dangling references 悬垂引用	`&*ptr` where ptr is freed	Often silent
Data races 数据竞争	Two threads, unsynchronized writes	Hard to reproduce
Stacked Borrows violation 借用规则违例	aliasing `&mut`	Often silent
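
To make the "Unaligned access" row concrete, here is a minimal sketch of a sound way to read a u32 from an arbitrary byte offset. The helper name `read_u32_le` is made up for illustration and is not part of any crate discussed here. The point is that `ptr::read_unaligned` stays defined where a plain `.read()` through a cast pointer would be UB that Miri reports.

```rust
use std::ptr;

/// Reads a little-endian u32 starting at `offset`, or returns None if the
/// four bytes would run past the end of `bytes`.
/// (Hypothetical helper, used only to illustrate the alignment rule.)
pub fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    if end > bytes.len() {
        return None;
    }
    // SAFETY: offset..offset + 4 was bounds-checked above.
    let raw = unsafe {
        let p = bytes.as_ptr().add(offset).cast::<u32>();
        // `p.read()` would be UB whenever `p` is not 4-byte aligned;
        // `read_unaligned` has no alignment requirement.
        ptr::read_unaligned(p)
    };
    Some(u32::from_le(raw))
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn reads_from_odd_offset() {
        let bytes = [0xFF, 0x01, 0x00, 0x00, 0x00];
        assert_eq!(read_u32_le(&bytes, 1), Some(1));
        assert_eq!(read_u32_le(&bytes, 2), None);
    }
}
```

In real code the fully safe `u32::from_le_bytes` on a 4-byte sub-slice is usually the better choice; the raw-pointer version is shown only because it is the shape of code Miri is meant to check.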

Miri does NOT detect:
Miri 抓不到的东西：

Limitation 限制	Why 原因
Logic bugs 业务逻辑错误	Miri checks safety, not correctness 它查安全，不查业务含义。
Deadlocks and livelocks 死锁与活锁	It is not a full concurrency model checker 它不是完整并发模型检查器。
Performance problems 性能问题	It is an interpreter, not a profiler 它是解释器，不是性能分析器。
OS/hardware interaction 系统调用和硬件交互	It cannot emulate devices and most syscalls 它没法模拟真实外设和大量系统调用。
All FFI calls 所有 FFI 调用	It cannot interpret C code 它解释不了 C 代码。
Paths your tests never reach 测试没走到的路径	It only checks executed code paths 没执行到的路径它也看不到。

A concrete example:
一个实际例子：

#[cfg(test)]
mod tests {
    #[test]
    fn test_miri_catches_ub() {
        let mut v = vec![1, 2, 3];
        let ptr = v.as_ptr();

        v.push(4);

        // ❌ UB: ptr may be dangling after reallocation
        // let _val = unsafe { *ptr };

        // ✅ Correct: get a fresh pointer after mutation
        let ptr = v.as_ptr();
        let val = unsafe { *ptr };
        assert_eq!(val, 1);
    }
}

Running Miri on a Real Crate
在真实 crate 上跑 Miri

# Step 1: Run all tests under Miri
cargo +nightly miri test 2>&1 | tee miri_output.txt

# Step 2: If Miri reports errors, isolate them
cargo +nightly miri test -- failing_test_name

# Step 3: Use Miri's backtrace for diagnosis
MIRIFLAGS="-Zmiri-backtrace=full" cargo +nightly miri test

# Step 4: Choose a borrow model
# Default: Stacked Borrows
cargo +nightly miri test
# Opt-in: Tree Borrows
MIRIFLAGS="-Zmiri-tree-borrows" cargo +nightly miri test

Useful Miri flags:
常用的 Miri 参数：

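# Allow access to host resources (filesystem, environment, clock)
MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test
# Fix the RNG seed so a failing run can be reproduced
MIRIFLAGS="-Zmiri-seed=42" cargo +nightly miri test
# Enforce strict provenance (integer-to-pointer casts become errors)
MIRIFLAGS="-Zmiri-strict-provenance" cargo +nightly miri test
# Flags can be combined
MIRIFLAGS="-Zmiri-disable-isolation -Zmiri-backtrace=full -Zmiri-strict-provenance" \
    cargo +nightly miri test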

Miri in CI:
CI 里的 Miri：

name: Miri
on: [push, pull_request]

jobs:
  miri:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@nightly
        with:
          components: miri

      - name: Run Miri
        run: cargo miri test --workspace
        env:
          MIRIFLAGS: "-Zmiri-backtrace=full"

Performance note: Miri is often 10-100× slower than native execution. In CI, it is better to focus on crates or tests that actually contain unsafe code.
性能提醒：Miri 经常比原生执行慢 10 到 100 倍，所以在 CI 里最好只挑那些真的带 unsafe 的 crate 或测试来跑。
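
One common way to keep Miri runs affordable is to shrink test inputs when the code is built for Miri, using the `cfg(miri)` flag that Miri sets. The sketch below assumes a hypothetical `sum_via_raw` function as the unsafe code under test; the names and sizes are illustrative only.

```rust
/// Sums a slice by walking a raw pointer: the kind of unsafe loop worth
/// running under Miri. (Hypothetical example function.)
pub fn sum_via_raw(values: &[u64]) -> u64 {
    let base = values.as_ptr();
    let mut total = 0u64;
    for i in 0..values.len() {
        // SAFETY: i < values.len(), so base.add(i) stays inside the slice.
        total = total.wrapping_add(unsafe { *base.add(i) });
    }
    total
}

/// Input size for tests: small under Miri, large natively.
/// `cfg!(miri)` is true only when the code is built for Miri.
pub fn test_len() -> usize {
    if cfg!(miri) { 64 } else { 100_000 }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sum_matches_iterator() {
        let values: Vec<u64> = (0..test_len() as u64).collect();
        assert_eq!(sum_via_raw(&values), values.iter().sum::<u64>());
    }
}
```

Tests that cannot run under Miri at all can be skipped with `#[cfg_attr(miri, ignore)]` instead of being shrunk.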

Valgrind and Its Rust Integration
Valgrind 以及它在 Rust 里的用法

Valgrind is the classic native memory checker from the C/C++ world. It can also inspect compiled Rust binaries, because it operates on the final machine code.
Valgrind 是 C/C++ 世界里非常经典的内存检查工具。它同样能检查 Rust 编译后的二进制，因为它盯的是最终生成的机器码。

# Install Valgrind
sudo apt install valgrind

# Build with debug info
cargo build --tests

# Run a specific test binary under Valgrind
valgrind --tool=memcheck \
    --leak-check=full \
    --show-leak-kinds=all \
    --track-origins=yes \
    ./target/debug/deps/my_crate-abc123 --test-threads=1

# Run the main binary
valgrind --tool=memcheck \
    --leak-check=full \
    --error-exitcode=1 \
    ./target/debug/diag_tool --run-diagnostics

Valgrind tools beyond memcheck:
除了 memcheck，Valgrind 还有这些工具：

Tool	Command	What It Detects 作用
Memcheck	`--tool=memcheck`	Memory leaks, use-after-free, buffer overflows 内存泄漏、释放后访问、越界
Helgrind	`--tool=helgrind`	Data races and lock-order violations 数据竞争和锁顺序问题
DRD	`--tool=drd`	Data races with another algorithm 另一套数据竞争检测算法
Callgrind	`--tool=callgrind`	Instruction-level profiling 指令级性能分析
Massif	`--tool=massif`	Heap memory profile over time 堆内存变化曲线
Cachegrind	`--tool=cachegrind`	Cache miss analysis 缓存命中分析

Using Callgrind:
Callgrind 的典型用法：

valgrind --tool=callgrind \
    --callgrind-out-file=callgrind.out \
    ./target/release/diag_tool --run-diagnostics

kcachegrind callgrind.out
callgrind_annotate callgrind.out | head -100

Miri vs Valgrind:
Miri 和 Valgrind 怎么选：

Aspect 方面	Miri	Valgrind
Rust-specific UB Rust 专属 UB	✅	❌
FFI / C code FFI 与 C 代码	❌	✅
Needs nightly 需要 nightly	✅	❌
Speed 速度	10-100× slower	10-50× slower
Leak detection 泄漏检测	✅	✅
Data race detection 数据竞争	✅	✅（借助 Helgrind/DRD）

Use both:
最务实的做法是两者配合：

Miri for pure Rust unsafe code
纯 Rust unsafe 先交给 Miri。
Valgrind for FFI-heavy code and whole-program leak checks
FFI 重的路径和整程序泄漏分析交给 Valgrind。

AddressSanitizer, MemorySanitizer, ThreadSanitizer, and LeakSanitizer
ASan、MSan、TSan 与 LSan

LLVM sanitizers are compile-time instrumentation passes with runtime checks. They are typically much faster than Valgrind and catch a different slice of bugs.
LLVM sanitizer 是编译期插桩、运行期检查的一类工具。它们通常比 Valgrind 快很多，而且能抓到另一类问题。

rustup component add rust-src --toolchain nightly

RUSTFLAGS="-Zsanitizer=address" \
    cargo +nightly test -Zbuild-std --target x86_64-unknown-linux-gnu

RUSTFLAGS="-Zsanitizer=memory" \
    cargo +nightly test -Zbuild-std --target x86_64-unknown-linux-gnu

RUSTFLAGS="-Zsanitizer=thread" \
    cargo +nightly test -Zbuild-std --target x86_64-unknown-linux-gnu

RUSTFLAGS="-Zsanitizer=leak" \
    cargo +nightly test --target x86_64-unknown-linux-gnu

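Note: ASan, MSan, and TSan are generally run with -Zbuild-std so that the standard library is instrumented as well; LSan is the exception and is run without it here.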
注意：ASan、MSan、TSan 通常都需要 -Zbuild-std，因为标准库本身也要重新插桩。LSan 则相对特殊一些。

Sanitizer comparison:
几种 sanitizer 的对比：

Sanitizer	Overhead 开销	Catches 抓什么
ASan	about 2×	Buffer overflow, use-after-free, stack overflow 越界、释放后访问、栈溢出
MSan	about 3×	Uninitialized reads 未初始化内存读取
TSan	5× and above	Data races 数据竞争
LSan	Minimal	Memory leaks 内存泄漏

A race example:
一个数据竞争例子：

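use std::cell::UnsafeCell;
use std::sync::Arc;
use std::thread;

// UnsafeCell is !Sync, so Arc<UnsafeCell<u64>> cannot be sent to another
// thread. This wrapper (wrongly) promises that sharing is fine, which is
// what lets the racy code below compile.
struct RacyCell(UnsafeCell<u64>);
unsafe impl Sync for RacyCell {}

fn racy_counter() -> u64 {
    let data = Arc::new(RacyCell(UnsafeCell::new(0u64)));
    let mut handles = vec![];

    for _ in 0..4 {
        let data = Arc::clone(&data);
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                // ❌ UB: unsynchronized read-modify-write from four threads
                unsafe {
                    *data.0.get() += 1;
                }
            }
        }));
    }

    for h in handles {
        h.join().unwrap();
    }

    unsafe { *data.0.get() }
}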

Both Miri and TSan can complain about this, and the fix is to use AtomicU64 or Mutex<u64>.
这类代码 Miri 和 TSan 都会骂，而且它们骂得没毛病。修法通常就是回到 AtomicU64 或 Mutex<u64>。
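
As a minimal sketch of the AtomicU64 fix mentioned above (the function name `atomic_counter` is chosen here for illustration), the same four-thread loop becomes:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Same workload as the racy version, but every increment is an atomic
/// read-modify-write, so there is no data race.
pub fn atomic_counter() -> u64 {
    let data = Arc::new(AtomicU64::new(0));
    let mut handles = vec![];

    for _ in 0..4 {
        let data = Arc::clone(&data);
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                // Relaxed is enough for a pure counter: only the final
                // total matters, not ordering against other memory.
                data.fetch_add(1, Ordering::Relaxed);
            }
        }));
    }

    for h in handles {
        // join() synchronizes with the finished thread, so all of its
        // increments are visible to the load below.
        h.join().unwrap();
    }

    data.load(Ordering::Relaxed)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn counts_every_increment() {
        assert_eq!(atomic_counter(), 4000);
    }
}
```

No unsafe block and no `unsafe impl Sync` are needed any more, which is usually a sign the fix went in the right direction.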

cargo-fuzz — Coverage-Guided Fuzzing:
cargo-fuzz：覆盖率引导的模糊测试。

cargo install cargo-fuzz
cargo fuzz init
cargo fuzz add parse_gpu_csv

#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    if let Ok(s) = std::str::from_utf8(data) {
        let _ = diag_tool::parse_gpu_csv(s);
    }
});

cargo +nightly fuzz run parse_gpu_csv -- -max_total_time=300
cargo +nightly fuzz tmin parse_gpu_csv artifacts/parse_gpu_csv/crash-...

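When to fuzz: any function that consumes untrusted or semi-trusted input is a good candidate, for example parsers, config readers, protocol decoders, and JSON/CSV handlers.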
什么时候该 fuzz：只要函数会吃不可信或半可信输入，例如传感器输出、配置文件、网络数据、JSON/CSV，基本都值得 fuzz 一把。
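
What the fuzzer checks, in effect, is that the target never panics or crashes on any input. The sketch below is a hypothetical stand-in for `diag_tool::parse_gpu_csv` (the real function's signature and CSV layout are not shown in this chapter, so the two-column `index,temp` format and the `GpuRow` type are assumptions). It shows the shape that fuzzes well: every malformed input becomes an `Err`, never a panic.

```rust
/// One parsed row. (Assumed layout, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct GpuRow {
    pub index: u32,
    pub temp_c: f32,
}

/// Hypothetical stand-in for diag_tool::parse_gpu_csv.
/// Parses lines of the form `index,temp`; blank lines are skipped.
pub fn parse_gpu_csv(input: &str) -> Result<Vec<GpuRow>, String> {
    let mut rows = Vec::new();
    for (n, line) in input.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        // No indexing or unwrap: each failure path returns an error.
        let (index, temp) = line
            .split_once(',')
            .ok_or_else(|| format!("line {}: expected `index,temp`", n + 1))?;
        let index = index
            .trim()
            .parse::<u32>()
            .map_err(|e| format!("line {}: bad index: {e}", n + 1))?;
        let temp_c = temp
            .trim()
            .parse::<f32>()
            .map_err(|e| format!("line {}: bad temperature: {e}", n + 1))?;
        rows.push(GpuRow { index, temp_c });
    }
    Ok(rows)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn hostile_inputs_return_errors() {
        assert!(parse_gpu_csv("0;41.5").is_err());
        assert!(parse_gpu_csv("-1,20").is_err());
        assert!(parse_gpu_csv("\u{0},\u{ffff}").is_err());
    }
}
```

Hand-written cases like these only cover inputs someone thought of; the fuzz target explores the rest.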

loom — Concurrency Model Checker:
loom：并发模型检查器。

[dev-dependencies]
loom = "0.7"

// Compiled only when the `loom` cfg is set, e.g.:
//   RUSTFLAGS="--cfg loom" cargo test --release
#[cfg(loom)]
mod tests {
    use loom::sync::atomic::{AtomicUsize, Ordering};
    use loom::thread;

    #[test]
    fn test_counter_is_atomic() {
        // loom::model runs the closure once per explored thread interleaving.
        loom::model(|| {
            let counter = loom::sync::Arc::new(AtomicUsize::new(0));
            let c1 = counter.clone();
            let c2 = counter.clone();

            let t1 = thread::spawn(move || { c1.fetch_add(1, Ordering::SeqCst); });
            let t2 = thread::spawn(move || { c2.fetch_add(1, Ordering::SeqCst); });

            t1.join().unwrap();
            t2.join().unwrap();

            assert_eq!(counter.load(Ordering::SeqCst), 2);
        });
    }
}

When to use loom: custom lock-free structures, atomics-heavy state machines, or handmade synchronization. For ordinary Mutex/RwLock code, it is usually unnecessary.
什么时候该用 loom：自定义无锁结构、原子变量很多的状态机、手写同步原语，这些都适合。普通 Mutex/RwLock 场景一般用不上它。

When to Use Which Tool
到底该用哪个工具

Decision tree for unsafe verification:

Is the code pure Rust (no FFI)?
├─ Yes → Use Miri
│        Also run ASan in CI for extra defense
└─ No
   ├─ Memory safety concerns?
   │  └─ Yes → Use Valgrind memcheck AND ASan
   ├─ Concurrency concerns?
   │  └─ Yes → Use TSan or Helgrind
   └─ Leak concerns?
      └─ Yes → Use Valgrind --leak-check=full

unsafe 验证的粗略决策树：

代码是不是纯 Rust，没有 FFI？
├─ 是 → 先上 Miri
│      CI 里再补一层 ASan
└─ 不是
   ├─ 担心内存安全？
   │  └─ 上 Valgrind memcheck + ASan
   ├─ 担心并发问题？
   │  └─ 上 TSan 或 Helgrind
   └─ 担心泄漏？
      └─ 上 Valgrind --leak-check=full

Recommended CI matrix:
建议的 CI 组合：

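jobs:
  miri:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@nightly
        with: { components: miri }
      - run: cargo miri test --workspace

  asan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@nightly
        with: { components: rust-src }  # required by -Zbuild-std
      - run: |
          RUSTFLAGS="-Zsanitizer=address" \
          cargo test -Zbuild-std --target x86_64-unknown-linux-gnu

  valgrind:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: sudo apt-get install -y valgrind
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo build --tests
      # Then run the built binaries under valgrind with --error-exitcode=1,
      # as in the memcheck commands earlier in this chapter.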

Application: Zero Unsafe — and When You’ll Need It
应用场景：当前零 unsafe，以及将来什么时候会需要它

The project currently contains zero unsafe blocks, which is an excellent sign for a systems-style Rust codebase. Safe Rust already covers IPMI subprocess calls, GPU queries, PCIe topology parsing, SEL management, and JSON report generation.
当前工程里几乎没有 unsafe，这对一个偏系统工具的 Rust 代码库来说，其实非常漂亮。像 IPMI 子进程调用、GPU 查询、PCIe 拓扑解析、SEL 管理和 JSON 报告生成，都已经靠 safe Rust 搞定了。

When unsafe is likely to appear:
未来最可能引入 unsafe 的场景：

Scenario 场景	Why `unsafe` 为什么会需要 `unsafe`	Recommended Verification 建议验证方式
Direct ioctl-based IPMI 直接 ioctl 调 IPMI	Need raw syscalls 需要原始系统调用	Valgrind + ASan (Miri only with the ioctl mocked)
Direct GPU driver queries 直接调 GPU 驱动	FFI to native SDK 原生 SDK FFI	Valgrind
Memory-mapped PCIe config 内存映射 PCIe 配置空间	Raw pointer arithmetic 裸指针访问	ASan + Valgrind
Lock-free SEL buffer 无锁 SEL 缓冲区	Atomics and pointer juggling 原子和指针配合	Miri + TSan
Embedded/no_std variant 嵌入式 `no_std` 版本	Bare-metal pointer manipulation 裸机下的指针操作	Miri

Preparation pattern:
一个很稳的准备方式：

[features]
default = []
direct-ipmi = []
direct-accel-api = []

#[cfg(feature = "direct-ipmi")]
mod direct {
    //! Direct IPMI device access via /dev/ipmi0 ioctl.
}

#[cfg(not(feature = "direct-ipmi"))]
mod subprocess {
    //! Safe subprocess-based fallback.
}

Key insight: put unsafe paths behind feature flags so they can be verified independently in CI.
关键思路：把 unsafe 路径放进 feature flag 后面。这样在 CI 里就能单独验证这些高风险分支，而默认安全构建也不会被影响。

`cargo-careful`: Extra UB Checks on Nightly
`cargo-careful`：额外的 UB 检查

cargo-careful builds the standard library with debug assertions and extra checks enabled, then runs your code against it, so it needs a nightly toolchain. It is not as thorough as Miri, but the overhead is far lower.
cargo-careful 会在运行时打开更多检查。它没有 Miri 那么彻底，但开销小得多。

cargo install cargo-careful

cargo +nightly careful test
cargo +nightly careful run -- --run-diagnostics

What it catches:
它比较擅长抓这些问题：

uninitialized memory reads
未初始化内存读取
invalid bool / char / enum values
非法布尔值、字符或枚举值
unaligned pointer reads/writes
未对齐读写
overlapping copy_nonoverlapping ranges
本不该重叠的内存复制区间却重叠了
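
The last item is easy to hit by accident. As a minimal sketch (the helper `shift_left` is a made-up example), shifting a buffer in place needs `ptr::copy`, which tolerates overlap, rather than `ptr::copy_nonoverlapping`:

```rust
use std::ptr;

/// Shifts the contents of `buf` left by `n` elements in place.
/// The last `n` elements keep their old values.
/// (Hypothetical helper, for illustration only.)
pub fn shift_left(buf: &mut [u8], n: usize) {
    if n == 0 || n >= buf.len() {
        return;
    }
    let len = buf.len() - n;
    let base = buf.as_mut_ptr();
    // SAFETY: both ranges lie inside `buf`. Source and destination overlap
    // whenever n < len, so `ptr::copy` (memmove semantics) is required;
    // `ptr::copy_nonoverlapping` here would be undefined behavior.
    unsafe {
        ptr::copy(base.add(n), base, len);
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn shifts_overlapping_range() {
        let mut buf = [1u8, 2, 3, 4, 5];
        shift_left(&mut buf, 2);
        assert_eq!(buf, [3, 4, 5, 4, 5]);
    }
}
```

The safe equivalent is `buf.copy_within(n.., 0)`, which is preferable outside of demonstrations like this one.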

Least overhead                                          Most thorough
├─ cargo test ──► cargo careful test ──► ASan ──► Valgrind ──► Miri ─┤

开销最低                                               检查最重
├─ cargo test ──► cargo careful test ──► ASan ──► Valgrind ──► Miri ─┤

Troubleshooting Miri and Sanitizers
Miri 与 Sanitizer 排障

Symptom 现象	Cause 原因	Fix 处理方式
`Miri does not support FFI`	Miri cannot execute C code Miri 跑不了 C	Use Valgrind or ASan 改用 Valgrind 或 ASan。
`can't call foreign function`	Miri hit `extern "C"` 撞上外部函数了	Mock FFI or gate with `#[cfg(miri)]` mock 掉 FFI，或者单独分支。
`Stacked Borrows violation`	Aliasing violation 借用规则被破坏	Refactor ownership and aliasing 回头整理借用关系。
Sanitizer says `DEADLYSIGNAL`	ASan caught memory corruption 说明真有内存问题	Check indexing and pointer arithmetic 查索引、切片和指针运算。
`LeakSanitizer: detected memory leaks`	Leak exists or leak is intentional 有泄漏，或者故意泄漏	Suppress intentional leaks, fix accidental ones 该抑制的抑制，该修的修。
Miri is extremely slow	Interpretation overhead 解释执行本来就慢	Narrow test scope 缩小测试范围。
`TSan` false positive	Atomic ordering interpretation gap 对原子模型理解有限	Add suppressions cautiously 必要时加抑制规则。
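
The "mock FFI or gate with `#[cfg(miri)]`" fix can look roughly like the sketch below. The C library's `abs` stands in for a real native dependency, and `c_abs` is a hypothetical wrapper name; the sketch does not depend on whether Miri could actually execute this particular function. It assumes Rust 1.82 or newer for the `unsafe extern` syntax (older toolchains write plain `extern "C"`).

```rust
use std::ffi::c_int;

// Real binding: compiled only when NOT running under Miri.
#[cfg(not(miri))]
unsafe extern "C" {
    fn abs(x: c_int) -> c_int;
}

/// Native path: calls into C.
#[cfg(not(miri))]
pub fn c_abs(x: i32) -> Option<i32> {
    if x == i32::MIN {
        // abs(INT_MIN) is undefined in C, so refuse it up front.
        return None;
    }
    // SAFETY: `abs` only requires that the result is representable,
    // which the check above guarantees.
    Some(unsafe { abs(x) })
}

/// Miri path: pure-Rust stand-in with the same contract, so the rest of
/// the test suite can still be interpreted.
#[cfg(miri)]
pub fn c_abs(x: i32) -> Option<i32> {
    x.checked_abs()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn abs_contract_holds_on_both_paths() {
        assert_eq!(c_abs(-7), Some(7));
        assert_eq!(c_abs(i32::MIN), None);
    }
}
```

The mock only keeps Miri useful for the surrounding Rust code; the C side still needs Valgrind or ASan.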

Try It Yourself
动手试一试

Trigger a Miri UB detection: Write an unsafe function that creates two mutable references to the same i32 and writes through both, run cargo +nightly miri test, then fix it with UnsafeCell or separate allocations.
1. 触发一次 Miri 的 UB 报警：写一个 unsafe 函数，让同一个 i32 同时出现两个 &mut，然后跑 cargo +nightly miri test，最后用 UnsafeCell 或分离分配来修它。
Run ASan on a deliberate bug: Write an out-of-bounds access, run the tests with RUSTFLAGS="-Zsanitizer=address", and see which line ASan points to.
2. 故意让 ASan 报一次错：写一个越界访问，再用 RUSTFLAGS="-Zsanitizer=address" 跑测试，观察它如何精确指出问题位置。
Benchmark Miri overhead: Compare cargo test --lib with cargo +nightly miri test --lib and measure the slowdown factor.
3. 测一下 Miri 的开销：对比 cargo test --lib 和 cargo +nightly miri test --lib，算出慢了多少倍。

Safety Verification Decision Tree
安全验证决策树

flowchart TD
    START["Have unsafe code?<br/>代码里有 unsafe 吗？"] -->|No<br/>没有| SAFE["Safe Rust<br/>默认无需额外验证"]
    START -->|Yes<br/>有| KIND{"What kind?<br/>是哪类 unsafe？"}
    
    KIND -->|"Pure Rust unsafe<br/>纯 Rust"| MIRI["Miri<br/>catches aliasing, UB, leaks"]
    KIND -->|"FFI / C interop"| VALGRIND["Valgrind memcheck<br/>or ASan"]
    KIND -->|"Concurrent unsafe"| CONC{"Lock-free?<br/>无锁并发吗？"}
    
    CONC -->|"Atomics/lock-free"| LOOM["loom<br/>Model checker"]
    CONC -->|"Mutex/shared state"| TSAN["TSan or Miri"]
    
    MIRI --> CI_MIRI["CI: cargo +nightly miri test"]
    VALGRIND --> CI_VALGRIND["CI: valgrind --leak-check=full"]
    
    style SAFE fill:#91e5a3,color:#000
    style MIRI fill:#e3f2fd,color:#000
    style VALGRIND fill:#ffd43b,color:#000
    style LOOM fill:#ff6b6b,color:#000
    style TSAN fill:#ffd43b,color:#000

🏋️ Exercises
🏋️ 练习

🟡 Exercise 1: Trigger a Miri UB Detection
🟡 练习 1：触发一次 Miri 的 UB 检测

Write an unsafe function that creates two &mut references to the same i32 and writes through both, run cargo +nightly miri test, observe the error, and fix it.
写一个 unsafe 函数，让同一个 i32 同时出现两个 &mut，跑 cargo +nightly miri test，观察错误，再把它修掉。

Solution 参考答案

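#[cfg(test)]
mod tests {
    #[test]
    fn aliasing_ub() {
        let mut x: i32 = 42;
        let ptr = &mut x as *mut i32;
        unsafe {
            let a = &mut *ptr;
            let b = &mut *ptr; // creating `b` invalidates `a`
            *b = 1;
            // ❌ UB: `a` is used after `b` was created; Miri reports it here.
            // Without this use, Miri has nothing to flag.
            *a = 2;
        }
    }
}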

use std::cell::UnsafeCell;

#[test]
fn no_aliasing_ub() {
    let x = UnsafeCell::new(42);
    unsafe {
        let a = &mut *x.get();
        *a = 100;
    }
}
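
When the goal is simply "two handles that can both modify one i32", a safe alternative is `Cell`, which is built on `UnsafeCell` but never hands out a `&mut`. A minimal sketch (the function name `bump_twice` is illustrative):

```rust
use std::cell::Cell;

/// Mutates the same i32 through two shared handles. No `&mut` is ever
/// created, so there is nothing for the aliasing rules to reject.
pub fn bump_twice(x: &Cell<i32>) -> i32 {
    let a = x;
    let b = x;
    a.set(a.get() + 1);
    b.set(b.get() + 1);
    x.get()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn both_handles_update_the_value() {
        let x = Cell::new(42);
        assert_eq!(bump_twice(&x), 44);
    }
}
```

This version contains no unsafe at all, so it passes under Miri without further argument.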

🔴 Exercise 2: ASan Out-of-Bounds Detection
🔴 练习 2：ASan 越界检测

Create a test with out-of-bounds array access and run it under ASan.
写一个数组越界测试，再在 ASan 下运行它。

Solution 参考答案

#[test]
fn oob_access() {
    let arr = [1u8, 2, 3, 4, 5];
    let ptr = arr.as_ptr();
    unsafe {
        let _val = *ptr.add(10);
    }
}

RUSTFLAGS="-Zsanitizer=address" cargo +nightly test -Zbuild-std \
  --target x86_64-unknown-linux-gnu -- oob_access

Key Takeaways
本章要点

Miri is the first-choice tool for pure-Rust unsafe
Miri 是纯 Rust unsafe 的优先工具。
Valgrind is valuable for FFI-heavy code and leak analysis
Valgrind 特别适合 FFI 较重的路径和泄漏检查。
Sanitizers run faster than Valgrind and are ideal for larger test suites
Sanitizer 通常比 Valgrind 快，更适合较大的测试集。
loom is for lock-free and atomic-heavy concurrency verification
loom 适合无锁结构和原子并发验证。
Run Miri continuously and schedule heavier checks on a slower cadence
Miri 可以持续跑，更重的检查则适合按较慢节奏定时运行。


Rust Engineering Practices | Rust 工程实践