Release Profiles and Binary Size 🟡
What you'll learn:
- Release profile anatomy: LTO, codegen-units, panic strategy, strip, opt-level
- Thin vs Fat vs Cross-Language LTO trade-offs
- Binary size analysis with cargo-bloat
- Dependency trimming with cargo-udeps and cargo-machete
Cross-references: Compile-Time Tools, Benchmarking, and Dependencies.
The default cargo build --release is already decent. In production deployment, though, and especially for single-binary tools shipped to thousands of machines, there is a large distance between "decent" and "fully optimized". This chapter covers the knobs and the measurement tools that close that gap.
Release Profile Anatomy
A Cargo profile controls how rustc compiles your code. The defaults are conservative: they favor broad compatibility over maximum performance or minimum binary size:
# Cargo.toml — Cargo's built-in defaults
[profile.release]
opt-level = 3 # Optimization level
lto = false # Link-time optimization OFF
codegen-units = 16 # Parallel codegen units
panic = "unwind" # Stack unwinding on panic
strip = "none" # Keep symbols and debug info
overflow-checks = false
debug = false
A profile tuned for production deployment:
[profile.release]
lto = true
codegen-units = 1
panic = "abort"
strip = true
The approximate impact of each setting (actual numbers depend on the codebase):
| Setting | Default -> Optimized | Binary Size | Runtime Speed | Compile Time |
|---|---|---|---|---|
| lto | false -> true | 10% to 20% smaller | 5% to 20% faster | 2x to 5x slower |
| codegen-units | 16 -> 1 | 5% to 10% smaller | 5% to 10% faster | 1.5x to 2x slower |
| panic | "unwind" -> "abort" | 5% to 10% smaller | Negligible change | Negligible change |
| strip | "none" -> true | 50% to 70% smaller | No effect | No effect |
| opt-level | 3 -> "s" | 10% to 30% smaller | 5% to 10% slower | Roughly unchanged |
| opt-level | 3 -> "z" | 15% to 40% smaller | 10% to 20% slower | Roughly unchanged |
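The ranges in the table do not simply add up when several settings are combined. The sketch below shows the arithmetic under one stated assumption: each reduction applies to whatever is left after the previous one, so the effects multiply. The function name `compound_size` and the 10 MiB starting size are hypothetical, and real builds will differ because the settings interact.

```rust
/// Applies a sequence of fractional size reductions one after another.
/// Assumption (illustrative only): each reduction acts on the size left
/// over from the previous step, so the effects multiply rather than add.
fn compound_size(start_bytes: f64, reductions: &[f64]) -> f64 {
    reductions
        .iter()
        .fold(start_bytes, |size, r| size * (1.0 - r))
}

fn main() {
    // Hypothetical 10 MiB binary. Reductions use the low end of each range:
    // strip 50%, LTO 10%, codegen-units 5%, panic = "abort" 5%.
    let mib = 1024.0 * 1024.0;
    let estimate = compound_size(10.0 * mib, &[0.50, 0.10, 0.05, 0.05]);
    println!("estimated size: {:.2} MiB", estimate / mib);
}
```

Under that assumption the estimate comes out at about 4.06 MiB, noticeably more than the 3 MiB that naively summing the percentages (70%) would suggest. Treat it as a planning estimate and confirm with a real build.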
Additional profile tweaks:
[profile.release]
overflow-checks = true # Keep overflow checks in release
debug = "line-tables-only" # Minimal debug info for backtraces
rpath = false
incremental = false
# For size-optimized builds:
# opt-level = "z"
# strip = "symbols"
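The overflow-checks setting only changes what plain arithmetic operators do on overflow: they panic when it is true and wrap silently when it is false. Code that must behave the same in every profile can use the explicit integer methods instead. A minimal sketch, where `safe_sum` is a hypothetical helper written for this illustration:

```rust
/// Sums bytes and reports overflow as `None`, regardless of how
/// `overflow-checks` is set in the active profile.
fn safe_sum(values: &[u8]) -> Option<u8> {
    values.iter().try_fold(0u8, |acc, &v| acc.checked_add(v))
}

fn main() {
    println!("{:?}", safe_sum(&[100, 100])); // fits in a u8
    println!("{:?}", safe_sum(&[250, 10])); // would exceed 255

    let a: u8 = 250;
    // Explicit alternatives that make the intended behavior visible:
    println!("{}", a.wrapping_add(10)); // wraps modulo 256
    println!("{}", a.saturating_add(10)); // clamps at u8::MAX
}
```

With explicit methods in the places that matter, turning overflow-checks on in release becomes an extra safety net for the remaining plain operators rather than the only line of defense.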
Per-crate profile overrides let hot crates and cold crates use different strategies:
[profile.dev.package."*"]
opt-level = 2
[profile.release.package.serde_json]
opt-level = 3
codegen-units = 1
[profile.test]
opt-level = 1
LTO in Depth: Thin vs Fat vs Cross-Language
Link-Time Optimization (LTO) lets LLVM optimize across crate boundaries. Without it, each crate is essentially its own optimization island.
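[profile.release]
# Option 1: Fat LTO, optimizes across all crates as one unit (same as lto = "fat")
lto = true
# Option 2: Thin LTO, parallel and cheaper, keeps most of the benefit
# lto = "thin"
# Option 3: Default, "thin local" LTO within each crate only (nothing cross-crate)
# lto = false
# Option 4: Disable LTO entirely
# lto = "off"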
Fat LTO vs Thin LTO:
| Aspect | Fat LTO (true) | Thin LTO ("thin") |
|---|---|---|
| Optimization quality | Best | About 95% of fat |
| Compile time | Slow | Moderate |
| Memory usage | High | Lower |
| Parallelism | None or very low | Good |
| Recommended for | Final release builds | CI and everyday builds |
Cross-language LTO optimizes Rust and C code together, across the FFI boundary:
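# Cargo.toml
[profile.release]
lto = true
[build-dependencies]
cc = "1.0"
// build.rs
// Assumes the C compiler is clang (for example CC=clang) with an LLVM
// version compatible with the one rustc uses.
fn main() {
    cc::Build::new()
        .file("csrc/fast_parser.c")
        .flag("-flto=thin") // emit LLVM bitcode so the linker can optimize it
        .opt_level(2)
        .compile("fast_parser");
}
# Shell
RUSTFLAGS="-Clinker-plugin-lto -Clinker=clang -Clink-arg=-fuse-ld=lld" \
    cargo build --release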
This matters most in FFI-heavy code where Rust frequently calls small C helpers, because only then can LLVM inline across the language boundary.
Binary Size Analysis with cargo-bloat
cargo-bloat answers a brutally practical question: which functions and which crates are making the binary fat?
# Install
cargo install cargo-bloat
# Show largest functions
cargo bloat --release -n 20
# Show by crate
cargo bloat --release --crates
# Compare before and after
cargo bloat --release --crates > before.txt
# ... make changes ...
cargo bloat --release --crates > after.txt
diff before.txt after.txt
Common bloat sources and fixes:
| Bloat Source | Typical Size | Fix |
|---|---|---|
| regex | 200 to 400 KB | Use regex-lite if full Unicode support is unnecessary |
| serde_json | 200 to 350 KB | Consider a lighter or faster alternative, depending on the use case |
| Generics monomorphization | Varies | Use dyn Trait at API boundaries |
| Formatting machinery | 50 to 150 KB | Avoid deriving Debug everywhere and overly rich formatting paths |
| Panic message strings | 20 to 80 KB | Use panic = "abort" and strip |
| Unused features | Varies | Disable default features you do not need |
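The monomorphization row is easiest to see side by side. The following sketch contrasts a generic function with a `dyn Trait` version; both function names are hypothetical, and the example only illustrates the general mechanism rather than measuring any size difference.

```rust
use std::fmt::Display;

/// Generic version: the compiler emits a separate copy of this function
/// for every concrete `T` it is called with (monomorphization).
fn describe_generic<T: Display>(value: T) -> String {
    format!("value = {value}")
}

/// `dyn` version: one compiled copy; the `Display` call is dispatched
/// through a vtable at runtime.
fn describe_dyn(value: &dyn Display) -> String {
    format!("value = {value}")
}

fn main() {
    // Three instantiations of `describe_generic`: i32, f64 and &str.
    println!("{}", describe_generic(42));
    println!("{}", describe_generic(2.5));
    println!("{}", describe_generic("text"));

    // A single `describe_dyn` body serves all three calls.
    println!("{}", describe_dyn(&42));
    println!("{}", describe_dyn(&2.5));
    println!("{}", describe_dyn(&"text"));
}
```

The trade-off is an indirect call per use, so this tends to pay off at cold API boundaries with large function bodies, not in hot inner loops. Running cargo bloat before and after is the way to confirm whether it helped.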
Trimming Dependencies with cargo-udeps
cargo-udeps finds dependencies that are declared in Cargo.toml but no longer used by the code.
# Install (running it requires a nightly toolchain)
cargo install cargo-udeps
# Find unused dependencies
cargo +nightly udeps --workspace
Every unused dependency carries four kinds of tax:
- More compile time
- Larger binaries
- More supply-chain risk
- More licensing complexity
Alternative: cargo-machete takes a faster, heuristic approach, at the cost of a higher chance of false positives.
cargo install cargo-machete
cargo machete
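To see why a heuristic scan is fast but can misreport, consider the toy check below. It is a deliberately simplified illustration and not cargo-machete's actual algorithm: the function `possibly_unused` and the crate name `derive-helper` are hypothetical, and the matching is plain substring search.

```rust
/// Returns the declared dependencies whose crate name never appears in
/// `source` as `name::` or `use name`. Purely textual: nothing is compiled.
fn possibly_unused<'a>(declared: &[&'a str], source: &str) -> Vec<&'a str> {
    declared
        .iter()
        .copied()
        .filter(|dep| {
            // Crate names use `_` in code even when Cargo.toml uses `-`.
            let ident = dep.replace('-', "_");
            let path_use = format!("{ident}::");
            let import = format!("use {ident}");
            !source.contains(&path_use) && !source.contains(&import)
        })
        .collect()
}

fn main() {
    // `derive-helper` is a made-up crate name for this illustration.
    let declared = ["serde_json", "regex", "derive-helper"];
    let source = r#"
        use regex::Regex;
        fn parse(s: &str) { let _ = serde_json::from_str::<u32>(s); }
    "#;
    println!("{:?}", possibly_unused(&declared, source));
}
```

A text scan like this never invokes the compiler, which is where the speed comes from. It is also why a crate that is only reached through a macro expansion or a renamed import can be flagged even though it is in use.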
Alternative: cargo-shear, whose speed and accuracy usually sit between cargo-udeps and cargo-machete, which makes it a good fit for routine checks:
cargo install cargo-shear
cargo shear --fix
# Slower than cargo-machete, but much faster than cargo-udeps
# Far fewer false positives than cargo-machete
Size Optimization Decision Tree
flowchart TD
START["Binary too large?"] --> STRIP{"strip = true?"}
STRIP -->|"No"| DO_STRIP["Add strip = true"]
STRIP -->|"Yes"| LTO{"LTO enabled?"}
LTO -->|"No"| DO_LTO["Add lto = true<br/>and codegen-units = 1"]
LTO -->|"Yes"| BLOAT["Run cargo-bloat<br/>--crates"]
BLOAT --> BIG_DEP{"Large dependency?"}
BIG_DEP -->|"Yes"| REPLACE["Replace it or disable<br/>default features"]
BIG_DEP -->|"No"| UDEPS["Run cargo-udeps<br/>remove dead deps"]
UDEPS --> OPT_LEVEL{"Need even smaller?"}
OPT_LEVEL -->|"Yes"| SIZE_OPT["Use opt-level = 's' or 'z'"]
style DO_STRIP fill:#91e5a3,color:#000
style DO_LTO fill:#e3f2fd,color:#000
style REPLACE fill:#ffd43b,color:#000
style SIZE_OPT fill:#ff6b6b,color:#000
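The same decision tree can be written as a small function, which is handy as a checklist in a release script. This is a sketch of the flowchart only; the `BuildState` struct, its field names and the `plan` function are hypothetical.

```rust
/// Snapshot of a project's current state. Field names are hypothetical.
#[derive(Debug, Clone, Copy)]
struct BuildState {
    stripped: bool,
    lto_enabled: bool,
    has_large_dependency: bool,
    needs_smaller_still: bool,
}

/// Walks the flowchart and returns the steps to take, in order.
fn plan(s: BuildState) -> Vec<&'static str> {
    if !s.stripped {
        return vec!["add strip = true"];
    }
    if !s.lto_enabled {
        return vec!["add lto = true and codegen-units = 1"];
    }
    // Both cheap fixes are in place, so measure before changing more.
    let mut steps = vec!["run cargo bloat --release --crates"];
    if s.has_large_dependency {
        steps.push("replace the dependency or disable its default features");
        return steps;
    }
    steps.push("run cargo-udeps and remove dead dependencies");
    if s.needs_smaller_still {
        steps.push("use opt-level = \"s\" or \"z\"");
    }
    steps
}

fn main() {
    let state = BuildState {
        stripped: true,
        lto_enabled: true,
        has_large_dependency: false,
        needs_smaller_still: true,
    };
    for step in plan(state) {
        println!("- {step}");
    }
}
```

The ordering mirrors the chart: the cheap, low-risk settings come first, measurement comes before dependency surgery, and the speed-for-size trade of opt-level "s" or "z" is left for last.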
🏋️ Exercises
🟢 Exercise 1: Measure LTO Impact
Build once with the default release settings, then build again with lto = true, codegen-units = 1, and strip = true. Compare binary size and compile time.
Solution
# Default release (clean first so the timing covers a full build)
cargo clean
time cargo build --release
ls -lh target/release/my-binary
# Optimized release: add to Cargo.toml:
# [profile.release]
# lto = true
# codegen-units = 1
# strip = true
# panic = "abort"
cargo clean
time cargo build --release
ls -lh target/release/my-binary
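Eyeballing two ls -lh lines is fine for a first look, but a percentage is easier to track over time. The sketch below assumes each build's binary has been copied aside before the next build; the helper names `percent_change` and `file_size` and the `sizediff` usage string are hypothetical.

```rust
use std::{env, fs, io};

/// Signed percentage change from `before` to `after`.
/// Negative means the file got smaller. Returns 0.0 for an empty baseline.
fn percent_change(before: u64, after: u64) -> f64 {
    if before == 0 {
        return 0.0;
    }
    (after as f64 - before as f64) / before as f64 * 100.0
}

/// Size of the file at `path` in bytes.
fn file_size(path: &str) -> io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}

fn main() -> io::Result<()> {
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: sizediff <before-binary> <after-binary>");
        return Ok(());
    }
    let before = file_size(&args[1])?;
    let after = file_size(&args[2])?;
    println!(
        "{before} -> {after} bytes ({:+.1}%)",
        percent_change(before, after)
    );
    Ok(())
}
```

A typical workflow would copy target/release/my-binary to a side location after the default build, rebuild with the optimized profile, and then pass both paths to this program.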
🟡 Exercise 2: Find Your Biggest Crate
Run cargo bloat --release --crates on a project. Identify the largest dependency and check whether it can be slimmed down by trimming features or swapping in a lighter replacement.
Solution
cargo install cargo-bloat
# Baseline measurement
cargo bloat --release --crates
# Example changes in Cargo.toml:
# regex-lite = "0.1"
# serde = { version = "1", default-features = false, features = ["derive"] }
# Measure again after the changes
cargo bloat --release --crates
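Key Takeaways
- lto = true, codegen-units = 1, strip = true, and panic = "abort" together form a very common production release profile.
- Thin LTO usually captures most of the optimization benefit at a much lower compile cost than Fat LTO. For most projects it is the more balanced choice.
- cargo bloat --crates shows exactly what is taking up space. Don't guess, measure.
- cargo-udeps, cargo-machete, and cargo-shear all help remove dead dependencies that slow builds and inflate binaries. Trimming dependencies tends to improve compile time, binary size, and supply-chain quality at once.
- Per-crate profile overrides give hot paths heavier optimization without dragging down compile speed for the whole project. Fine-grained profiles are a valuable middle ground.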