Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Unsafe Rust
Unsafe Rust

What you’ll learn: When and how to use unsafe — raw pointer dereferencing, FFI for calling C from Rust and vice versa, CString / CStr for string interop, and the discipline required to wrap unsafe code in safe interfaces.
本章将学到什么: 什么时候该用 unsafe,以及该怎么用。内容包括原始指针解引用、Rust 与 C 双向调用的 FFI、用于字符串互操作的 CString / CStr,还有怎样把不安全代码包进安全接口里。

  • unsafe 会打开 Rust 编译器平时默认关着的那几扇门。
    也就是说,编译器不再替忙兜底,很多约束要靠代码作者自己守住。
    • Dereferencing raw pointers
      解引用原始指针
    • Accessing mutable static variables
      访问可变静态变量
    • https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html
  • With great power comes great responsibility.
    能力越大,越容易一脚踩进未定义行为。
    • unsafe 本质上是在告诉编译器:“这些不变量由程序员负责保证。”
      编译器平时会替忙检查的那部分,现在全部改成人工担保。
    • Must guarantee no aliased mutable and immutable references, no dangling pointers, no invalid references, and so on.
      必须自己保证:不存在别名的可变与不可变引用,不存在悬空指针,不存在无效引用,等等。
    • The scope of unsafe should be kept as small as possible.
      unsafe 的作用范围越小越好,别一时图省事把整段逻辑全糊进去。
    • Every unsafe block should have a Safety: comment describing the assumptions being made.
      每个 unsafe 块都应该有明确的 Safety: 注释,把成立前提写清楚。

Unsafe Rust examples
unsafe 的基础示例

unsafe fn harmless() {}
fn main() {
    // Safety: We are calling a harmless unsafe function
    unsafe {
        harmless();
    }
    let a = 42u32;
    let p = &a as *const u32;
    // Safety: p is a valid pointer to a variable that will remain in scope
    unsafe {
        println!("{}", *p);
    }
    // Safety: Not safe; for illustration purposes only
    let dangerous_buffer = 0xb8000 as *mut u32;
    unsafe {
        println!("About to go kaboom!!!");
        *dangerous_buffer = 0; // This will SEGV on most modern machines
    }
}

Simple FFI example (Rust library function consumed by C)
简单 FFI 示例:让 C 调用 Rust 库函数

FFI Strings: CString and CStr
FFI 字符串:CStringCStr

FFI 全称是 Foreign Function Interface,就是 Rust 用来和其他语言互相调用的接口机制。最常见的对象当然是 C。
这个概念听着很玄,其实就是“跨语言边界时,双方怎么约定数据和函数调用方式”。

当 Rust 代码和 C 代码交互时,Rust 的 String&str 不能直接等同于 C 字符串。Rust 字符串是 UTF-8 字节序列,不自带结尾的 \0;C 字符串则是以空字符结尾的字节数组。标准库里对应的桥接类型就是 CStringCStr
一个负责“从 Rust 侧构造可交给 C 的字符串”,另一个负责“把来自 C 的字符串借用成 Rust 可读形式”。

TypeAnalogous toUse when
CStringOwned String for C interop
给 C 用的拥有型字符串
Creating a C string from Rust data
把 Rust 数据变成 C 风格字符串时
&CStrBorrowed &str for foreign input
借用型 C 字符串视图
Receiving a C string from foreign code
接收外部代码传进来的 C 字符串时
#![allow(unused)]
fn main() {
use std::ffi::{CString, CStr};
use std::os::raw::c_char;

fn demo_ffi_strings() {
    // Creating a C-compatible string (adds null terminator)
    let c_string = CString::new("Hello from Rust").expect("CString::new failed");
    let ptr: *const c_char = c_string.as_ptr();

    // Converting a C string back to Rust (unsafe because we trust the pointer)
    // Safety: ptr is valid and null-terminated (we just created it above)
    let back_to_rust: &CStr = unsafe { CStr::from_ptr(ptr) };
    let rust_str: &str = back_to_rust.to_str().expect("Invalid UTF-8");
    println!("{}", rust_str);
}
}

Warning: CString::new() returns an error if the input contains an interior null byte \0. That Result needs to be handled. CStr 会在后面的 FFI 例子里反复出现,因为凡是从 C 边界接收字符串,几乎都得走它。
提醒: 如果字符串内部本身带着 \0CString::new() 会返回错误,所以这个 Result 不能随手糊掉。后面几乎所有 FFI 字符串示例都会用到 CStr

  • FFI 导出函数通常要标记 #[no_mangle],这样编译器才不会把符号名改得乱七八糟。
    不然 C 那边按原名去找,大概率直接扑空。
  • We’ll compile the crate as a static library.
    这里先假设把 Rust crate 编译成静态库,交给 C 链接。
#![allow(unused)]
fn main() {
#[no_mangle] 
pub extern "C" fn add(left: u64, right: u64) -> u64 {
    left + right
}
}
  • 然后可以在 C 侧按普通外部函数那样声明并调用它。
    只要 ABI 和符号名对得上,调用方式看起来就很平常。
#include <stdio.h>
#include <stdint.h>
extern uint64_t add(uint64_t, uint64_t);
int main() {
    printf("Add returned %llu\n", add(21, 21));
}

Complex FFI example
更完整的 FFI 例子

  • In the following example, the plan is to build a Rust logging interface and expose it to Python and C.
    下面这个例子里,会做一个 Rust 日志接口,再把它导出给 Python 和 C 使用。
    • The same interface can be used natively from Rust and from C.
      同一套核心逻辑既能被 Rust 直接调用,也能被 C 侧复用。
    • Tools such as cbindgen can generate header files automatically.
      cbindgen 这样的工具可以自动生成 C 头文件,省掉很多手写同步工作。
    • Thin unsafe wrappers can serve as a bridge into safe Rust internals.
      unsafe 包装层的理想职责,是把边界上的脏活做完,再把内部逻辑交回安全 Rust。

Logger helper functions
日志器辅助函数

#![allow(unused)]
fn main() {
fn create_or_open_log_file(log_file: &str, overwrite: bool) -> Result<File, String> {
    if overwrite {
        File::create(log_file).map_err(|e| e.to_string())
    } else {
        OpenOptions::new()
            .write(true)
            .append(true)
            .open(log_file)
            .map_err(|e| e.to_string())
    }
}

fn log_to_file(file_handle: &mut File, message: &str) -> Result<(), String> {
    file_handle
        .write_all(message.as_bytes())
        .map_err(|e| e.to_string())
}
}

Logger struct
日志器结构体

#![allow(unused)]
fn main() {
struct SimpleLogger {
    log_level: LogLevel,
    file_handle: File,
}

impl SimpleLogger {
    fn new(log_file: &str, overwrite: bool, log_level: LogLevel) -> Result<Self, String> {
        let file_handle = create_or_open_log_file(log_file, overwrite)?;
        Ok(Self {
            file_handle,
            log_level,
        })
    }

    fn log_message(&mut self, log_level: LogLevel, message: &str) -> Result<(), String> {
        if log_level as u32 <= self.log_level as u32 {
            let timestamp = Local::now().format("%Y-%m-%d %H:%M:%S").to_string();
            let message = format!("Simple: {timestamp} {log_level} {message}\n");
            log_to_file(&mut self.file_handle, &message)
        } else {
            Ok(())
        }
    }
}
}

Testing
测试

  • Testing the Rust side is easy.
    这部分一旦还在 Rust 语言边界内,测试成本其实很低。
    • Test methods use the #[test] attribute and are not part of the final binary.
      测试函数用 #[test] 标记,编译出的正式二进制里不会带着它们一起跑。
    • Creating mock helpers for tests is straightforward.
      需要伪造输入或辅助对象时,也很好搭。
#![allow(unused)]
fn main() {
#[test]
fn testfunc() -> Result<(), String> {
    let mut logger = SimpleLogger::new("test.log", false, LogLevel::INFO)?;
    logger.log_message(LogLevel::TRACELEVEL1, "Hello world")?;
    logger.log_message(LogLevel::CRITICAL, "Critical message")?;
    Ok(()) // The compiler automatically drops logger here
}
}
cargo test

(C)-Rust FFI
C 与 Rust 的 FFI

  • cbindgen is a very handy tool for generating headers for exported Rust functions.
    给 C 提供接口时,这玩意儿很省心,头文件能自动生成。
    • Can be installed using cargo.
      直接用 cargo 就能装。
cargo install cbindgen
cbindgen 
  • Functions and structs exported across the C boundary typically use #[no_mangle] and, when C needs field-level access, #[repr(C)].
    导出函数基本都绕不开 #[no_mangle]。如果结构体字段布局也要给 C 看,就得再配上 #[repr(C)]
    • The example below uses the classic interface style: pass ** out-parameters and return 0 on success, non-zero on failure.
      下面沿用 C 世界最熟悉的那种接口习惯:通过二级指针把对象传出去,返回 0 表示成功,非零表示失败。
    • Opaque vs transparent structs: SimpleLogger is passed around as an opaque pointer, so C never inspects its fields and #[repr(C)] is unnecessary. If C code needs to read/write fields directly, #[repr(C)] becomes mandatory.
      不透明结构体和透明结构体的区别: SimpleLogger 这里只是作为不透明指针在 C 侧流转,C 根本不碰内部字段,所以可以不加 #[repr(C)]。如果 C 要直接读写字段,那就必须显式保证布局兼容。
#![allow(unused)]
fn main() {
// Opaque — C only holds a pointer, never inspects fields. No #[repr(C)] needed.
struct SimpleLogger { /* Rust-only fields */ }

// Transparent — C reads/writes fields directly. MUST use #[repr(C)].
#[repr(C)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}
}
typedef struct SimpleLogger SimpleLogger;
uint32_t create_simple_logger(const char *file_name, struct SimpleLogger **out_logger);
uint32_t log_entry(struct SimpleLogger *logger, const char *message);
uint32_t drop_logger(struct SimpleLogger *logger);
  • Note how much defensive checking is required at the boundary.
    这地方最忌讳想当然,凡是从外面传进来的指针都得先验一遍。
  • We also have to leak memory deliberately so Rust does not drop the logger too early.
    还有一个很容易忘的点:对象交给 C 管理以后,Rust 这一侧必须先把自动释放停掉,否则刚创建完就没了。
#![allow(unused)]
fn main() {
#[no_mangle] 
pub extern "C" fn create_simple_logger(file_name: *const std::os::raw::c_char, out_logger: *mut *mut SimpleLogger) -> u32 {
    use std::ffi::CStr;
    // Make sure pointer isn't NULL
    if file_name.is_null() || out_logger.is_null() {
        return 1;
    }
    // Safety: The passed in pointer is either NULL or 0-terminated by contract
    let file_name = unsafe {
        CStr::from_ptr(file_name)
    };
    let file_name = file_name.to_str();
    // Make sure that file_name doesn't have garbage characters
    if file_name.is_err() {
        return 1;
    }
    let file_name = file_name.unwrap();
    // Assume some defaults; we'll pass them in in real life
    let new_logger = SimpleLogger::new(file_name, false, LogLevel::CRITICAL);
    // Check that we were able to construct the logger
    if new_logger.is_err() {
        return 1;
    }
    let new_logger = Box::new(new_logger.unwrap());
    // This prevents the Box from being dropped when if goes out of scope
    let logger_ptr: *mut SimpleLogger = Box::leak(new_logger);
    // Safety: logger is non-null and logger_ptr is valid
    unsafe {
        *out_logger = logger_ptr;
    }
    return 0;
}
}
  • log_entry() has the same style of checks: validate pointers, validate UTF-8, then hand off to safe logic.
    log_entry() 也一样,边界层先把脏活干完,再把调用转进去。
#![allow(unused)]
fn main() {
#[no_mangle]
pub extern "C" fn log_entry(logger: *mut SimpleLogger, message: *const std::os::raw::c_char) -> u32 {
    use std::ffi::CStr;
    if message.is_null() || logger.is_null() {
        return 1;
    }
    // Safety: message is non-null
    let message = unsafe {
        CStr::from_ptr(message)
    };
    let message = message.to_str();
    // Make sure that file_name doesn't have garbage characters
    if message.is_err() {
        return 1;
    }
    // Safety: logger is valid pointer previously constructed by create_simple_logger()
    unsafe {
        (*logger).log_message(LogLevel::CRITICAL, message.unwrap()).is_err() as u32
    }
}

#[no_mangle]
pub extern "C" fn drop_logger(logger: *mut SimpleLogger) -> u32 {
    if logger.is_null() {
        return 1;
    }
    // Safety: logger is valid pointer previously constructed by create_simple_logger()
    unsafe {
        // This constructs a Box<SimpleLogger>, which is dropped when it goes out of scope
        let _ = Box::from_raw(logger);
    }
    0
}
}
  • This FFI can be tested from Rust itself, or from a small C program.
    一套边界接口,既可以在 Rust 测试里先跑通,也可以在 C 侧写个小程序做集成验证。
#![allow(unused)]
fn main() {
#[test]
fn test_c_logger() {
    // The c".." creates a NULL terminated string
    let file_name = c"test.log".as_ptr() as *const std::os::raw::c_char;
    let mut c_logger: *mut SimpleLogger = std::ptr::null_mut();
    assert_eq!(create_simple_logger(file_name, &mut c_logger), 0);
    // This is the manual way to create c"..." strings
    let message = b"message from C\0".as_ptr() as *const std::os::raw::c_char;
    assert_eq!(log_entry(c_logger, message), 0);
    drop_logger(c_logger);
}
}
#include "logger.h"
...
int main() {
    SimpleLogger *logger = NULL;
    if (create_simple_logger("test.log", &logger) == 0) {
        log_entry(logger, "Hello from C");
        drop_logger(logger); /*Needed to close handle, etc.*/
    } 
    ...
}

Ensuring correctness of unsafe code
怎么验证 unsafe 代码真的站得住

  • The short version is simple: writing unsafe requires deliberate thought and verification.
    不是“能跑就算对”,而是“必须知道为什么对”。
    • Always document the safety assumptions and have experienced reviewers inspect them.
      安全前提要写出来,最好还得让熟悉这块的人再看一遍。
    • Use tools such as cbindgen、Miri、Valgrind to help validate behavior.
      能借工具验证的地方就别只靠肉眼。
    • Never let a panic unwind across an FFI boundary because that is undefined behavior. Wrap entry points with std::panic::catch_unwind, or configure panic = "abort" if that matches the project needs.
      绝对不要让 panic 跨越 FFI 边界向外展开,那会直接触发未定义行为。常见做法是入口处用 std::panic::catch_unwind 包起来,或者在配置里把 panic 设成 "abort"
    • If a struct crosses the FFI boundary by value or field access, mark it #[repr(C)] to lock down layout.
      凡是跨 FFI 边界按值传递,或者要让 C 直接碰字段的结构体,都应该用 #[repr(C)] 固定内存布局。
    • Consult the Rustonomicon: https://doc.rust-lang.org/nomicon/intro.html
      这个话题真想深挖,Rustonomicon 基本绕不过去。
    • Seek help from internal experts when in doubt.
      遇到拿不准的地方,别硬撑,找更熟的人一起看。

Verification tools: Miri vs Valgrind
验证工具:Miri 和 Valgrind

C++ 开发者通常熟悉 Valgrind 和各种 sanitizer。Rust 在这些工具之外,还有一个非常特别的 Miri,它对 Rust 特有的未定义行为更敏感。
所以两边不是替代关系,更像是互补关系。

MiriValgrindC++ sanitizers (ASan/MSan/UBSan)
What it catchesRust-specific UB such as stacked borrows, invalid enum discriminants, uninitialized reads, aliasing violations
Rust 特有的 UB,像 stacked borrows、非法枚举判别值、未初始化读取、别名违规
Memory leaks, use-after-free, invalid reads/writes, uninitialized memory
内存泄漏、释放后使用、非法读写、未初始化内存
Buffer overflow, use-after-free, data races, generic UB
缓冲区溢出、释放后使用、数据竞争和更通用的 UB
How it worksInterprets MIR, Rust 的中层中间表示
不是跑本机指令,而是解释执行 MIR
Instruments the compiled binary at runtime
在运行时对编译产物做检测
Compile-time instrumentation
编译阶段插桩
FFI supportCannot cross the FFI boundary
过不去 FFI 边界,C 调用会跳过
Works on full compiled binaries including FFI
整套二进制都能查,包括 FFI
Works if the C side is also built with sanitizers
如果 C 那边也开 sanitizer,就能一起看
SpeedAbout 100x slower than native
比原生执行慢很多
Roughly 10x 到 50x slower
比原生慢一个明显量级
Roughly 2x 到 5x slower
相对温和一些
When to usePure Rust unsafe code, invariants, unsafe data structures
纯 Rust 的 unsafe 逻辑和数据结构不变量
FFI code and integration tests of the full binary
FFI 与整体验证
C/C++ side of FFI or performance-sensitive testing
C/C++ 边的检测,以及更重视性能的测试阶段
Catches aliasing bugsYes, via the Stacked Borrows model
能抓
No
抓不到
Partial support only
只能覆盖一部分场景

Recommendation: Use both. Let Miri inspect pure Rust unsafe code, and let Valgrind cover the integrated FFI binary.
建议: 两边一起上。纯 Rust 的 unsafe 逻辑交给 Miri,牵扯 FFI 的整体验证交给 Valgrind。

  • Miri catches Rust-specific UB that Valgrind cannot see.
    像别名违规、非法枚举值这些,Valgrind 看不到,Miri 能看出来。
rustup +nightly component add miri
cargo +nightly miri test                    # Run all tests under Miri
cargo +nightly miri test -- test_name       # Run a specific test

⚠️ Miri requires nightly and cannot execute FFI calls. Isolate unsafe Rust logic into self-contained units when testing it.
⚠️ Miri 需要 nightly,而且执行不了真正的 FFI 调用。所以最好把纯 Rust 的 unsafe 逻辑拆成独立单元去测。

  • Valgrind remains useful for the compiled program including FFI.
    这就是老朋友的价值:它能看整套跑起来之后的真实行为。
sudo apt install valgrind
cargo install cargo-valgrind
cargo valgrind test                         # Run all tests under Valgrind

Catches leaks in Box::leak / Box::from_raw patterns that often show up in FFI code.
Box::leakBox::from_raw 这些 FFI 里常见的配对操作,Valgrind 很适合拿来查有没有漏掉释放。

  • cargo-careful sits somewhere between normal tests and Miri, enabling extra runtime checks.
    如果觉得 Miri 太重、普通测试又太松,可以拿 cargo-careful 做中间层补强。
cargo install cargo-careful
cargo +nightly careful test

Unsafe Rust summary
本章小结

  • cbindgen is an excellent tool when exporting Rust APIs to C.
    如果方向反过来,是从 Rust 去调用 C,则通常会用 bindgen 去处理另一侧的绑定。
    • Use bindgen for the opposite direction, namely importing C interfaces into Rust.
      两者别搞反,一个偏导出,一个偏导入。
  • Never assume unsafe code is correct just because it appears to work. Many bugs hide in invariants that are only violated under rare interleavings or unusual inputs.
    unsafe 代码最会骗人,表面上跑通根本不代表成立。很多问题只会在很偏的输入或时序下冒头。
    • Use tools to verify correctness.
      能测就测,能查就查。
    • If doubt remains, ask experienced reviewers for help.
      还有疑问就继续找人复核,别靠胆子硬顶。
  • Every unsafe block and every caller of an unsafe API should document the safety assumptions being relied on.
    不光 unsafe 块内部要写清楚前提,调用方如果也承担了某些约束,同样应该把这些约束写出来。

Exercise: Writing a safe FFI wrapper
练习:给 FFI 写一个安全包装层

🔴 Challenge — requires understanding raw pointers, unsafe blocks, and safe API design
🔴 挑战题:这题会同时考原始指针、unsafe 块和安全 API 设计。

  • Write a safe Rust wrapper around an unsafe FFI-style function. The exercise simulates a C function that writes a formatted string into a caller-provided buffer.
    给一个 unsafe 风格的 FFI 函数写安全包装层。这个练习模拟的是:C 函数往调用者提供的缓冲区里写一段格式化字符串。
  • Step 1: Implement unsafe_greet, which writes a greeting into a raw *mut u8 buffer.
    第 1 步: 实现 unsafe_greet,把问候语写进原始 *mut u8 缓冲区。
  • Step 2: Write safe_greet, which allocates a Vec<u8>,调用 unsafe_greet,然后返回 String
    第 2 步: 写一个 safe_greet,由它负责分配缓冲区、调用不安全函数、再把结果转回 String
  • Step 3: Add proper // Safety: comments to every unsafe block.
    第 3 步: 每个 unsafe 块都补上明确的 // Safety: 注释。

Starter code:
起始代码:

use std::fmt::Write as _;

/// Simulates a C function: writes "Hello, <name>!" into buffer.
/// Returns the number of bytes written (excluding null terminator).
/// # Safety
/// - `buf` must point to at least `buf_len` writable bytes
/// - `name` must be a valid pointer to a null-terminated C string
unsafe fn unsafe_greet(buf: *mut u8, buf_len: usize, name: *const u8) -> isize {
    // TODO: Build greeting, copy bytes into buf, return length
    // Hint: use std::ffi::CStr::from_ptr or iterate bytes manually
    todo!()
}

/// Safe wrapper — no unsafe in the public API
fn safe_greet(name: &str) -> Result<String, String> {
    // TODO: Allocate a Vec<u8> buffer, create a null-terminated name,
    // call unsafe_greet inside an unsafe block with Safety comment,
    // convert the result back to a String
    todo!()
}

fn main() {
    match safe_greet("Rustacean") {
        Ok(msg) => println!("{msg}"),
        Err(e) => eprintln!("Error: {e}"),
    }
    // Expected output: Hello, Rustacean!
}
Solution 参考答案
use std::ffi::CStr;

/// Simulates a C function: writes "Hello, <name>!" into buffer.
/// Returns the number of bytes written, or -1 if buffer too small.
/// # Safety
/// - `buf` must point to at least `buf_len` writable bytes
/// - `name` must be a valid pointer to a null-terminated C string
unsafe fn unsafe_greet(buf: *mut u8, buf_len: usize, name: *const u8) -> isize {
    // Safety: caller guarantees name is a valid null-terminated string
    let name_cstr = unsafe { CStr::from_ptr(name as *const std::os::raw::c_char) };
    let name_str = match name_cstr.to_str() {
        Ok(s) => s,
        Err(_) => return -1,
    };
    let greeting = format!("Hello, {}!", name_str);
    if greeting.len() > buf_len {
        return -1;
    }
    // Safety: buf points to at least buf_len writable bytes (caller guarantee)
    unsafe {
        std::ptr::copy_nonoverlapping(greeting.as_ptr(), buf, greeting.len());
    }
    greeting.len() as isize
}

/// Safe wrapper — no unsafe in the public API
fn safe_greet(name: &str) -> Result<String, String> {
    let mut buffer = vec![0u8; 256];
    // Create a null-terminated version of name for the C API
    let name_with_null: Vec<u8> = name.bytes().chain(std::iter::once(0)).collect();

    // Safety: buffer has 256 writable bytes, name_with_null is null-terminated
    let bytes_written = unsafe {
        unsafe_greet(buffer.as_mut_ptr(), buffer.len(), name_with_null.as_ptr())
    };

    if bytes_written < 0 {
        return Err("Buffer too small or invalid name".to_string());
    }

    String::from_utf8(buffer[..bytes_written as usize].to_vec())
        .map_err(|e| format!("Invalid UTF-8: {e}"))
}

fn main() {
    match safe_greet("Rustacean") {
        Ok(msg) => println!("{msg}"),
        Err(e) => eprintln!("Error: {e}"),
    }
}
// Output:
// Hello, Rustacean!