Release Profiles and Binary Size 🟡

What you’ll learn:

  • Release profile anatomy: LTO, codegen-units, panic strategy, strip, opt-level
  • Thin vs fat vs cross-language LTO trade-offs
  • Binary size analysis with cargo-bloat
  • Dependency trimming with cargo-udeps and cargo-machete

Cross-references: Compile-Time Tools, Benchmarking, and Dependencies.

The default cargo build --release is already decent. In production deployment, however, there is a big gap between "decent" and "fully optimized". This is especially true for single-binary tools shipped to thousands of machines. This chapter covers the knobs and measurement tools that close that gap.

Release Profile Anatomy

A Cargo profile controls how rustc compiles your code. The defaults are conservative and favor broad compatibility over peak performance and minimum binary size:

# Cargo's built-in defaults for [profile.release] (you don't need to write these)

[profile.release]
opt-level = 3            # Optimization level
lto = false              # Only "thin local" LTO within each crate
codegen-units = 16       # Parallel codegen units
panic = "unwind"         # Stack unwinding on panic
strip = "none"           # Keep symbols and debug info
overflow-checks = false  # Integer overflow wraps silently
debug = false            # No debug info

A production-optimized profile:

[profile.release]
lto = true
codegen-units = 1
panic = "abort"
strip = true
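
Switching to panic = "abort" has a behavioral consequence you should check before adopting it. std::panic::catch_unwind can only recover from a panic when the binary is built with panic = "unwind". The minimal sketch below uses cfg!(panic = "unwind") to detect the strategy of the current build. The helper name run_catching is hypothetical.

```rust
use std::panic;

/// Hypothetical helper: runs `f`, turning a panic into `None`.
/// This only works when the binary is built with `panic = "unwind"`;
/// with `panic = "abort"` the process terminates before `catch_unwind`
/// gets a chance to return.
fn run_catching<T, F>(f: F) -> Option<T>
where
    F: FnOnce() -> T + panic::UnwindSafe,
{
    panic::catch_unwind(f).ok()
}

fn main() {
    // Normal return values pass straight through.
    assert_eq!(run_catching(|| 2 + 2), Some(4));

    // `cfg!(panic = "...")` reflects the panic strategy of this build.
    if cfg!(panic = "unwind") {
        let caught = run_catching(|| -> u32 { panic!("boom") });
        assert!(caught.is_none());
        println!("panic = \"unwind\": the panic was caught");
    } else {
        println!("panic = \"abort\": a panic here would end the process");
    }
}
```

Any code path that relies on catch_unwind, such as a server isolating a panicking request handler, stops being recoverable under panic = "abort".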

The impact of each setting (typical ranges; actual numbers vary by project):

| Setting (default -> optimized) | Binary size | Runtime speed | Compile time |
|---|---|---|---|
| lto = false -> true | -10% to -20% | +5% to +20% | 2x to 5x slower |
| codegen-units = 16 -> 1 | -5% to -10% | +5% to +10% | 1.5x to 2x slower |
| panic = "unwind" -> "abort" | -5% to -10% | Negligible | Negligible |
| strip = "none" -> true | -50% to -70% | No effect | No effect |
| opt-level = 3 -> "s" | -10% to -30% | -5% to -10% | Roughly unchanged |
| opt-level = 3 -> "z" | -15% to -40% | -10% to -20% | Roughly unchanged |

Additional profile tweaks:

[profile.release]
overflow-checks = true      # Keep integer overflow checks (panics) in release
debug = "line-tables-only"  # Minimal debug info: enough for file:line in backtraces
rpath = false               # Already the default
incremental = false         # Already the default for release

# For size-optimized builds:
# opt-level = "z"
# strip = "symbols"         # Equivalent to strip = true
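
With overflow-checks = true, plain integer arithmetic behaves differently in release builds. An overflowing + panics instead of silently wrapping. The sketch below spells out both behaviors with the standard wrapping_add and checked_add methods. Unlike +, these methods behave the same under every profile. The function names are illustrative.

```rust
/// Release default (`overflow-checks = false`): `a + b` wraps on overflow.
/// `wrapping_add` makes that behavior explicit.
fn add_like_default_release(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}

/// With `overflow-checks = true`, `a + b` panics on overflow.
/// `checked_add` detects the same condition but returns `None` instead.
fn add_with_overflow_detection(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}

fn main() {
    // 250 + 10 = 260, which wraps to 260 - 256 = 4 in a u8.
    assert_eq!(add_like_default_release(250, 10), 4);
    assert_eq!(add_with_overflow_detection(250, 10), None);
    assert_eq!(add_with_overflow_detection(200, 55), Some(255));
    println!("wrapping: {}", add_like_default_release(250, 10));
}
```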

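Per-crate profile overrides let hot and cold crates use different strategies:

# Optimize all dependencies (but not workspace members), even in dev builds
[profile.dev.package."*"]
opt-level = 2

# Keep a hot dependency at full speed even if the rest of the release build
# is size-optimized (opt-level = 3 is already the release default)
[profile.release.package.serde_json]
opt-level = 3
codegen-units = 1

# Lightly optimize tests so slow test suites finish sooner
[profile.test]
opt-level = 1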
LTO in Depth: Thin vs Fat vs Cross-Language

Link-time optimization (LTO) lets LLVM optimize across crate boundaries. Without it, each crate is essentially its own optimization island.

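[profile.release]
# Option 1: Fat LTO, whole-program optimization across all crates
lto = true          # Same as lto = "fat"

# Option 2: Thin LTO, most of the benefit at a fraction of the cost
# lto = "thin"

# Option 3: The default, "thin local" LTO within each crate only
# lto = false

# Option 4: Disable LTO completely
# lto = "off"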
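
Without LTO, a caller generally cannot inline a non-generic function defined in another crate, because only a symbol crosses the crate boundary, not the function body. Marking small, hot library functions #[inline] is the usual way to hand the body to downstream crates even when LTO is off. Generic functions are already instantiated in the caller's crate, and recent rustc versions also do this automatically for some very small functions. In the sketch below, a module stands in for a separate library crate, and the names are illustrative.

```rust
// Imagine this module lives in a separate library crate.
mod geometry {
    /// `#[inline]` exports the body so downstream crates can inline it,
    /// even without LTO.
    #[inline]
    pub fn dot(a: [f32; 3], b: [f32; 3]) -> f32 {
        a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    }

    /// Without `#[inline]` (and without LTO), a caller in another crate
    /// typically sees only a symbol to call, not this body.
    pub fn norm_sq(a: [f32; 3]) -> f32 {
        dot(a, a)
    }
}

fn main() {
    let v = [1.0, 2.0, 2.0];
    assert_eq!(geometry::dot(v, v), 9.0);
    assert_eq!(geometry::norm_sq(v), 9.0);
    println!("|v|^2 = {}", geometry::norm_sq(v));
}
```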

Fat LTO vs thin LTO:

| Aspect | Fat LTO (lto = true) | Thin LTO (lto = "thin") |
|---|---|---|
| Optimization quality | Best | About 95% of fat |
| Compile time | Slow | Moderate |
| Memory usage | High | Lower |
| Parallelism | None or very low | Good |
| Recommended for | Final release builds | CI and everyday builds |

Cross-language LTO optimizes Rust and C code together, across the FFI boundary:

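# Cargo.toml
[profile.release]
lto = true

[build-dependencies]
cc = "1.0"

// build.rs
fn main() {
    // Compile the C code to LLVM bitcode so the linker can optimize it
    // together with Rust. This needs clang as the C compiler (CC=clang).
    cc::Build::new()
        .file("csrc/fast_parser.c")
        .flag("-flto=thin")
        .opt_level(2)
        .compile("fast_parser");
}

# Build command
CC=clang RUSTFLAGS="-Clinker-plugin-lto -Clinker=clang -Clink-arg=-fuse-ld=lld" \
    cargo build --release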

This pays off most in FFI-heavy code, especially when Rust frequently calls small C helper functions. In that case inlining across the language boundary finally becomes possible. It also requires a clang whose LLVM version is compatible with the one rustc uses (rustc -vV prints it).

Binary Size Analysis with cargo-bloat

cargo-bloat answers a brutally practical question: which functions and which crates are making the binary fat?

# Install
cargo install cargo-bloat

# Show largest functions
cargo bloat --release -n 20

# Show by crate
cargo bloat --release --crates

# Compare before and after
cargo bloat --release --crates > before.txt
# ... make changes ...
cargo bloat --release --crates > after.txt
diff before.txt after.txt

Common bloat sources and fixes:

| Bloat source | Typical size | Fix |
|---|---|---|
| regex | 200 to 400 KB | Use regex-lite if full Unicode support is unnecessary |
| serde_json | 200 to 350 KB | Consider lighter or faster alternatives, depending on the use case |
| Generics monomorphization | Varies | Use dyn Trait at API boundaries where appropriate |
| Formatting machinery | 50 to 150 KB | Don't derive Debug and other formatting impls on types that don't need them |
| Panic message strings | 20 to 80 KB | Use panic = "abort" and strip |
| Unused features | Varies | Disable default features you don't need |
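
The "use dyn Trait at API boundaries" fix trades a little runtime dispatch for less duplicated machine code. In the sketch below, the generic version is in general compiled once per concrete type it is called with. The trait-object version is compiled once and dispatches through a vtable. The function names are illustrative.

```rust
use std::fmt::Display;

// Generic: a separate copy is generated for every concrete `T`
// this function is called with (monomorphization).
fn describe_generic<T: Display>(item: T) -> String {
    format!("item = {item}")
}

// Trait object: a single compiled body, dispatched through a vtable.
fn describe_dyn(item: &dyn Display) -> String {
    format!("item = {item}")
}

fn main() {
    // Three instantiations: for i32, f64, and &str.
    println!("{}", describe_generic(42));
    println!("{}", describe_generic(1.5));
    println!("{}", describe_generic("text"));

    // One function body serves all three.
    println!("{}", describe_dyn(&42));
    println!("{}", describe_dyn(&1.5));
    println!("{}", describe_dyn(&"text"));
}
```

The runtime cost is an indirect call per invocation. It matters in tight inner loops and rarely at coarse API boundaries.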

Trimming Dependencies with cargo-udeps

cargo-udeps finds dependencies declared in Cargo.toml that the code no longer uses.

# Install
cargo install cargo-udeps

# Find unused dependencies (running it requires a nightly toolchain)
cargo +nightly udeps --workspace

Every unused dependency imposes four kinds of tax:

  1. Longer compile times
  2. Larger binaries
  3. More supply-chain risk
  4. More licensing complexity

Alternative: cargo-machete takes a faster, heuristic approach, but it is more prone to false positives.

cargo install cargo-machete
cargo machete

Alternative: cargo-shear sits between cargo-udeps and cargo-machete in both speed and accuracy, which makes it a good fit for routine checks:

cargo install cargo-shear
cargo shear --fix
# --fix removes the unused dependencies it finds from Cargo.toml
# Slower than cargo-machete, but much faster than cargo-udeps
# Far fewer false positives than cargo-machete

Size Optimization Decision Tree

flowchart TD
    START["Binary too large?"] --> STRIP{"strip = true?"}
    STRIP -->|"No"| DO_STRIP["Add strip = true"]
    STRIP -->|"Yes"| LTO{"LTO enabled?"}
    LTO -->|"No"| DO_LTO["Add lto = true<br/>and codegen-units = 1"]
    LTO -->|"Yes"| BLOAT["Run cargo-bloat<br/>--crates"]
    BLOAT --> BIG_DEP{"Large dependency?"}
    BIG_DEP -->|"Yes"| REPLACE["Replace it or disable<br/>default features"]
    BIG_DEP -->|"No"| UDEPS["Run cargo-udeps,<br/>remove dead deps"]
    UDEPS --> OPT_LEVEL{"Need even smaller?"}
    OPT_LEVEL -->|"Yes"| SIZE_OPT["Use opt-level = 's' or 'z'"]
    
    style DO_STRIP fill:#91e5a3,color:#000
    style DO_LTO fill:#e3f2fd,color:#000
    style REPLACE fill:#ffd43b,color:#000
    style SIZE_OPT fill:#ff6b6b,color:#000
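
The flowchart can also be read as a priority order: apply the cheapest, highest-yield fix that has not been tried yet. The sketch below encodes that order. The SizeState fields and NextStep names are hypothetical, and it assumes cargo bloat --crates has already been run so that has_large_dependency is known.

```rust
/// Hypothetical record of what has already been tried on a too-large binary.
struct SizeState {
    stripped: bool,
    lto_enabled: bool,
    has_large_dependency: bool, // as reported by `cargo bloat --release --crates`
    unused_deps_removed: bool,  // after running `cargo +nightly udeps`
}

#[derive(Debug, PartialEq)]
enum NextStep {
    AddStrip,
    EnableLtoAndSingleCodegenUnit,
    ReplaceOrTrimLargeDependency,
    RemoveUnusedDependencies,
    TrySizeOptLevel, // opt-level = "s" or "z"
}

/// Walks the decision tree top to bottom and returns the first step not yet done.
fn next_step(s: &SizeState) -> NextStep {
    if !s.stripped {
        NextStep::AddStrip
    } else if !s.lto_enabled {
        NextStep::EnableLtoAndSingleCodegenUnit
    } else if s.has_large_dependency {
        NextStep::ReplaceOrTrimLargeDependency
    } else if !s.unused_deps_removed {
        NextStep::RemoveUnusedDependencies
    } else {
        NextStep::TrySizeOptLevel
    }
}

fn main() {
    let state = SizeState {
        stripped: true,
        lto_enabled: true,
        has_large_dependency: false,
        unused_deps_removed: false,
    };
    println!("{:?}", next_step(&state)); // prints "RemoveUnusedDependencies"
}
```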

🏋️ Exercises

🟢 Exercise 1: Measure LTO Impact

Build once with the default release settings, then again with lto = true, codegen-units = 1, and strip = true. Compare binary size and compile time.

Solution
# Default release: clean first so the timed build actually compiles everything
cargo clean
time cargo build --release
ls -lh target/release/my-binary

# Optimized release: add to Cargo.toml
# [profile.release]
# lto = true
# codegen-units = 1
# strip = true
# panic = "abort"

cargo clean
time cargo build --release
ls -lh target/release/my-binary
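
To turn the two ls -lh readings into a percentage, you could use a small Rust helper like the one below. It is a sketch that assumes you copy the first binary aside before rebuilding. The tool name size-diff is hypothetical.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::Path;

/// Percentage change from `before` to `after`; negative means the file shrank.
/// Assumes `before > 0`.
fn percent_change(before: u64, after: u64) -> f64 {
    (after as f64 - before as f64) * 100.0 / before as f64
}

/// Size of a file on disk in bytes (e.g. a built binary).
fn file_size(path: &Path) -> io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}

fn main() -> io::Result<()> {
    // Usage: size-diff <before-binary> <after-binary>
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: size-diff <before-binary> <after-binary>");
        return Ok(());
    }
    let before = file_size(Path::new(&args[1]))?;
    let after = file_size(Path::new(&args[2]))?;
    println!(
        "{before} -> {after} bytes ({:+.1}%)",
        percent_change(before, after)
    );
    Ok(())
}
```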

🟡 Exercise 2: Find Your Biggest Crate

Run cargo bloat --release --crates on a project. Identify the largest dependency and see whether you can slim it down by trimming features or switching to a lighter replacement.

Solution
cargo install cargo-bloat
cargo bloat --release --crates

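# Example changes in Cargo.toml, depending on what the report shows:
# regex-lite = "0.1"    # instead of regex
# serde = { version = "1", default-features = false, features = ["derive"] }

# Measure again and compare
cargo bloat --release --crates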

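Key Takeaways

  • lto = true, codegen-units = 1, strip = true, and panic = "abort" together form a very common production release profile.
  • Thin LTO usually captures most of the optimization gains at a much lower compile cost than fat LTO. For most projects it is the more balanced choice.
  • cargo bloat --crates shows exactly what is taking up space. Measure instead of guessing.
  • cargo-udeps, cargo-machete, and cargo-shear can all remove dead dependencies that slow down builds and inflate binaries. Trimming dependencies often improves compile time, binary size, and supply-chain hygiene at the same time.
  • Per-crate profile overrides let you strengthen hot paths without slowing down compilation of the whole project. Fine-grained profiles are a valuable middle ground.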